Hide keyboard shortcuts

Hot-keys on this page

r m x p   toggle line displays

j k   next/prev highlighted chunk

0   (zero) top of page

1   (one) first highlighted chunk

1""" 

2This module provides functions to perform full Procrustes analysis. 

3 

4This code was originally written by Justin Kucynski and ported over from 

5scikit-bio by Yoshiki Vazquez-Baeza. 

6""" 

7 

8import numpy as np 

9from scipy.linalg import orthogonal_procrustes 

10 

11 

12__all__ = ['procrustes'] 

13 

14 

15def procrustes(data1, data2): 

16 r"""Procrustes analysis, a similarity test for two data sets. 

17 

18 Each input matrix is a set of points or vectors (the rows of the matrix). 

19 The dimension of the space is the number of columns of each matrix. Given 

20 two identically sized matrices, procrustes standardizes both such that: 

21 

22 - :math:`tr(AA^{T}) = 1`. 

23 

24 - Both sets of points are centered around the origin. 

25 

26 Procrustes ([1]_, [2]_) then applies the optimal transform to the second 

27 matrix (including scaling/dilation, rotations, and reflections) to minimize 

28 :math:`M^{2}=\sum(data1-data2)^{2}`, or the sum of the squares of the 

29 pointwise differences between the two input datasets. 

30 

31 This function was not designed to handle datasets with different numbers of 

32 datapoints (rows). If two data sets have different dimensionality 

33 (different number of columns), simply add columns of zeros to the smaller 

34 of the two. 

35 

36 Parameters 

37 ---------- 

38 data1 : array_like 

39 Matrix, n rows represent points in k (columns) space `data1` is the 

40 reference data, after it is standardised, the data from `data2` will be 

41 transformed to fit the pattern in `data1` (must have >1 unique points). 

42 data2 : array_like 

43 n rows of data in k space to be fit to `data1`. Must be the same 

44 shape ``(numrows, numcols)`` as data1 (must have >1 unique points). 

45 

46 Returns 

47 ------- 

48 mtx1 : array_like 

49 A standardized version of `data1`. 

50 mtx2 : array_like 

51 The orientation of `data2` that best fits `data1`. Centered, but not 

52 necessarily :math:`tr(AA^{T}) = 1`. 

53 disparity : float 

54 :math:`M^{2}` as defined above. 

55 

56 Raises 

57 ------ 

58 ValueError 

59 If the input arrays are not two-dimensional. 

60 If the shape of the input arrays is different. 

61 If the input arrays have zero columns or zero rows. 

62 

63 See Also 

64 -------- 

65 scipy.linalg.orthogonal_procrustes 

66 scipy.spatial.distance.directed_hausdorff : Another similarity test 

67 for two data sets 

68 

69 Notes 

70 ----- 

71 - The disparity should not depend on the order of the input matrices, but 

72 the output matrices will, as only the first output matrix is guaranteed 

73 to be scaled such that :math:`tr(AA^{T}) = 1`. 

74 

75 - Duplicate data points are generally ok, duplicating a data point will 

76 increase its effect on the procrustes fit. 

77 

78 - The disparity scales as the number of points per input matrix. 

79 

80 References 

81 ---------- 

82 .. [1] Krzanowski, W. J. (2000). "Principles of Multivariate analysis". 

83 .. [2] Gower, J. C. (1975). "Generalized procrustes analysis". 

84 

85 Examples 

86 -------- 

87 >>> from scipy.spatial import procrustes 

88 

89 The matrix ``b`` is a rotated, shifted, scaled and mirrored version of 

90 ``a`` here: 

91 

92 >>> a = np.array([[1, 3], [1, 2], [1, 1], [2, 1]], 'd') 

93 >>> b = np.array([[4, -2], [4, -4], [4, -6], [2, -6]], 'd') 

94 >>> mtx1, mtx2, disparity = procrustes(a, b) 

95 >>> round(disparity) 

96 0.0 

97 

98 """ 

99 mtx1 = np.array(data1, dtype=np.double, copy=True) 

100 mtx2 = np.array(data2, dtype=np.double, copy=True) 

101 

102 if mtx1.ndim != 2 or mtx2.ndim != 2: 

103 raise ValueError("Input matrices must be two-dimensional") 

104 if mtx1.shape != mtx2.shape: 

105 raise ValueError("Input matrices must be of same shape") 

106 if mtx1.size == 0: 

107 raise ValueError("Input matrices must be >0 rows and >0 cols") 

108 

109 # translate all the data to the origin 

110 mtx1 -= np.mean(mtx1, 0) 

111 mtx2 -= np.mean(mtx2, 0) 

112 

113 norm1 = np.linalg.norm(mtx1) 

114 norm2 = np.linalg.norm(mtx2) 

115 

116 if norm1 == 0 or norm2 == 0: 

117 raise ValueError("Input matrices must contain >1 unique points") 

118 

119 # change scaling of data (in rows) such that trace(mtx*mtx') = 1 

120 mtx1 /= norm1 

121 mtx2 /= norm2 

122 

123 # transform mtx2 to minimize disparity 

124 R, s = orthogonal_procrustes(mtx1, mtx2) 

125 mtx2 = np.dot(mtx2, R.T) * s 

126 

127 # measure the dissimilarity between the two datasets 

128 disparity = np.sum(np.square(mtx1 - mtx2)) 

129 

130 return mtx1, mtx2, disparity 

131