Affine Invariant Dynamic Time Warping and its Application to Online Rotated Handwriting Recognition
Yu Qiao and Makoto Yasuhara
University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585, Japan
[email protected], yas@ is.uec.ac.jp
Abstract
Dynamic Time Warping (DTW) has been widely used to align and compare two sequences. DTW deals efficiently with local warping or deformation between sequences. However, it cannot account for affine transformations of the sequences, such as rotation, shift and scale. This paper introduces a novel Affine Invariant Dynamic Time Warping (AI-DTW) method, which handles the affine transformation and the sequence alignment in a unified framework. We propose an iterative algorithm that estimates the optimal transformation matrix and warping path by updating them alternately. Recognition experiments on online rotated handwritten data show that AI-DTW achieves a recognition rate of 95.54%, which is significantly higher than that of the classical DTW method (65.87%).
1. Introduction
Comparing two sequences of data (e.g. shapes, time series) is a fundamental and important problem in the pattern recognition field. The key problem in sequence comparison is to align one sequence with another, in other words, to determine the correspondences between the elements of the two sequences. Dynamic Time Warping (DTW) [1] solves this efficiently by searching for the optimal warping path, along which the accumulated distance or distortion is minimized. By using the Dynamic Programming (DP) algorithm, the best warping path can be found in polynomial time. DTW was originally developed in automatic speech recognition to cope with different speaking speeds [1], [2], and has been widely used in handwriting/document recognition [3-6], image analysis, and other fields. DTW can deal with local deformation or warping of the sequence by finding an optimal alignment (warping path) between two input sequences. However, it does not account for global affine transformations that the sequences may undergo, such as rotation, shift and scale. These transformations can greatly distort the elements in the sequence and thus affect both the optimization procedure and the result of DTW.
Perhaps the simplest approach to this problem is to normalize the sequence (i.e. to compensate for the affine transformation) before using DTW. But normalization always needs heuristics or prior knowledge and thus brings uncertainty into the system. Another idea is to use affine invariant features. Unfortunately, the design of effective affine invariant features is a difficult problem in itself [7]. To overcome these difficulties, this paper proposes a novel method, Affine Invariant Dynamic Time Warping (AI-DTW), which copes with the affine transformation and the sequence alignment problem in a unified framework. We define the objective as a function of the transformation matrix and the warping path, and develop an iterative algorithm that finds the optimum by alternately updating the transformation matrix and the warping path. The method itself does not rely on any assumption or constraint about the affine transformation or the alignment. When such constraints are available, however, they can easily be incorporated into the method. As an application, we have applied AI-DTW to online rotated handwriting recognition and found that it achieves a significant improvement in recognition rates compared with DTW.
The remainder of this paper is organized as follows. In Section 2, we provide a brief review of DTW. Section 3 formulates the AI-DTW problem and develops the optimization algorithm for it. In Section 4, we apply AI-DTW to online rotated handwriting recognition and compare the recognition rates with the classical DTW. The paper is concluded in Section 5.
2. Dynamic Time Warping
In this paper, the term "sequence" denotes a variable-length series of elements, and each "element" is a vector of dimension K that represents a measurement at a certain time or position. For example, in the context of online handwriting, the sequence is a series of points and each element consists of the (x, y) coordinates of a point. Consider two sequences $T = (t_1, t_2, \ldots, t_n)$ and $R = (r_1, r_2, \ldots, r_m)$, where $t_i$ and $r_j$ are the elements at index i and j in T and R respectively. An alignment from T to R can be represented by a warping path $w = \{w(1), w(2), \ldots, w(n)\}$, where $j = w(i)$, $j \in [1, m]$, $i \in [1, n]$, means that the i-th element in T is aligned to the j-th element in R. Note that this definition is not symmetric, that is, the warping path from T to R and the one from R to T may be different. There is a general symmetric definition of the warping path, and details can be found in [1]. For simplicity, we focus on the non-symmetric version in this paper; the method we develop can easily be generalized to a symmetric warping path. Suppose we have a distance measure (e.g. the Euclidean distance) between any two elements $t_i$ and $r_j$, denoted by $d(t_i, r_j)$, or simply by $d(i, j)$. The accumulated distance along the warping path w can be calculated by:
$$D_w(T, R) = \frac{1}{n} \sum_{i=1}^{n} d(i, w(i)). \tag{1}$$
The objective of DTW is to find the warping path w that minimizes the distance $D_w(T, R)$. The DTW distance between T and R is calculated by
$$DTW(T, R) = \min_{w} \{D_w(T, R)\}, \tag{2}$$
and the associated best warping path is denoted by:
$$DTW\_PATH(T, R) = \arg\min_{w} \{D_w(T, R)\}. \tag{3}$$
The warping path w is subject to several constraints [1], [2]:
• Boundary constraints: $w(1) = 1$, $w(n) = m$;
• Monotonicity constraint: $w(k+1) \geq w(k)$;
• Local continuity constraint: $w(k+1) - w(k) \leq 1$.
The local continuity constraint is introduced to ensure the smoothness of the warping path. Note that there are many other kinds of local continuity constraints [1]. DTW finds the best warping path efficiently using dynamic programming. Under the definition above, we recursively calculate the following distance:
$$D(i, j) = \min\{D(i, j-1),\ D(i-1, j-1),\ D(i-1, j)\} + d(i, j), \tag{4}$$
and finally reach $D(n, m) = D(T, R)$.
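The recursion in Eq. (4) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes the Euclidean distance for d(i, j), uses 0-based indices, and the function name `dtw` is hypothetical. It returns the accumulated distance D(n, m) together with one optimal path, given as (i, j) index pairs recovered by backtracking.

```python
import math

def dtw(T, R):
    """Accumulated distance D(n, m) of Eq. (4) and one optimal path of (i, j) pairs (0-based)."""
    n, m = len(T), len(R)
    INF = float("inf")
    # D has one extra row and column so that D[0][0] = 0 starts the recursion.
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(T[i - 1], R[j - 1])  # assumed element distance
            D[i][j] = d + min(D[i][j - 1], D[i - 1][j - 1], D[i - 1][j])
    # Backtrack from (n, m) to (1, 1), always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path

# Example: R repeats one point of T, so the two sequences align at zero cost.
T = [(0, 0), (1, 0), (2, 0)]
R = [(0, 0), (1, 0), (1, 0), (2, 0)]
total, path = dtw(T, R)
```

The table costs O(nm) time and memory, which is the polynomial cost mentioned in Section 1. Dividing the returned accumulated distance by n gives the normalization used in Eq. (1).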
3. Affine Invariant Dynamic Time Warping
In this section, we first formulate the problem of affine invariant dynamic time warping (AI-DTW). We then develop an iterative algorithm to solve the AI-DTW optimization problem. Finally, we discuss a special case: rotation, scale and shift invariant DTW.
3.1 Formulation of AI-DTW As discussed in Section 1, the traditional DTW cannot cope with the affine transformation of sequence.
Assume that the input sequences have undergone an unknown affine transformation. To compare the two sequences T and R correctly, this transformation must be taken into account when calculating the distance. For convenience, we extend each $t_i$ in T by appending a component 1, $t_i \to [t_i, 1]$; the same operation is applied to each $r_j$. Each extended vector thus has dimension K+1. We have:
$$D_w(T, R, A) = \frac{1}{n} \sum_{i=1}^{n} d(t_i, r_{w(i)} A), \tag{5}$$
where A denotes the transformation matrix of size (K+1)×(K+1) and w represents the warping path. Our objective is to find the optimal matrix A and the optimal path w by minimizing the distance between T and R:
$$\arg\min_{w, A} \{D_w(T, R, A)\}. \tag{6}$$
Eq. (6) unifies w and A in a single objective. This makes it possible to cope with sequence variation caused by both global affine transformations and local deformations.
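To make the extended-vector notation concrete, the following sketch evaluates the distance of Eq. (5) for a given path w and matrix A. It is illustrative only: the helper names (`extend`, `apply_affine`, `warped_distance`) are hypothetical, d is assumed to be the Euclidean distance, and indices are 0-based. With row vectors, a 2-D rotation and shift is stored as a 3×3 matrix whose last row holds the shift and whose last column is (0, 0, 1).

```python
import math

def extend(seq):
    """Append the constant component 1 to every element: t -> [t, 1]."""
    return [tuple(p) + (1.0,) for p in seq]

def apply_affine(seq_ext, A):
    """Row-vector convention of Eq. (5): each extended element r becomes r A."""
    k = len(A)
    return [tuple(sum(r[a] * A[a][b] for a in range(k)) for b in range(k))
            for r in seq_ext]

def warped_distance(T, R, w, A):
    """D_w(T, R, A) of Eq. (5), with w[i] the 0-based index in R aligned to T[i]."""
    T_ext = extend(T)
    RA = apply_affine(extend(R), A)
    return sum(math.dist(T_ext[i], RA[w[i]]) for i in range(len(T))) / len(T)

# Example: T is R rotated by 90 degrees and shifted by (2, 3).
R = [(1, 0), (0, 1)]
T = [(2, 4), (1, 3)]
A = [[0, 1, 0],
     [-1, 0, 0],
     [2, 3, 1]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
d_with_A = warped_distance(T, R, [0, 1], A)
d_without = warped_distance(T, R, [0, 1], identity)
```

In this example the distance vanishes once the correct A is applied, while the identity matrix leaves a large residual even though the alignment w is the same.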
3.2 Optimization of AI-DTW
Directly optimizing Eq. (6) is hard, because it involves both a nonlinear factor w and a linear factor A. In what follows, we divide the problem into two sub-problems, each of which can be solved independently, and we show that Eq. (6) can be optimized by solving the two sub-problems alternately. In the first sub-problem, the path w is assumed to be given, and the question becomes how to find the optimal A. Using the least-squares error criterion, we have
$$\arg\min_{A} \Big\{ \sum_{i=1}^{n} \| t_i - r_{w(i)} A \|^2 \Big\}. \tag{7}$$
Construct the following two matrices:
$$D_t = [t_1, t_2, \ldots, t_n]^{\top}, \qquad D_r = [r_{w(1)}, r_{w(2)}, \ldots, r_{w(n)}]^{\top}, \tag{8}$$
where $^{\top}$ denotes the matrix transpose. It is easy to see that the optimal A in Eq. (7) can be obtained by:
$$A = (D_r^{\top} D_r)^{-1} D_r^{\top} D_t. \tag{9}$$
To speed up the calculation of Eq. (9), one can first apply singular value decomposition (SVD) [8] to the matrix $D_r$. In the second sub-problem, we assume that the transformation matrix A is known, and the objective is to find the optimal warping path w. This can be solved by the DP algorithm with the same techniques as in DTW, as described in Section 2. We now combine the two to solve Eq. (6). The basic idea is to first fix the transformation matrix A and update the warping path w, then fix w and update A, and so on. By alternately updating (optimizing) A and w, the solutions of the two sub-problems improve one another during the process. If the squared Euclidean distance is used to calculate $d(t_i, r_j)$ in the second sub-problem, the convergence of this method can be analyzed in a way similar to the Expectation-Maximization (EM) algorithm [9].

[Figure 1: plots not reproduced. Panel titles in order of appearance, first row: Input T; Input R, err. = 26.9028; Itr. 1, err. = 7.3915; Itr. 4, err. = 5.3841; error versus iteration. Second row: Input T; Input R, err. = 36.9075; Itr. 1, err. = 9.8098; Itr. 4, err. = 6.8033; error versus iteration.]
Figure 1 Examples of AI-DTW. Top row: digit 2. Bottom row: digit 8. Left column: input pattern T. 2nd column: input pattern R after rotation. 3rd column: pattern R after the first iteration of AI-DTW. 4th column: pattern R after the 4th iteration. Right column: error curve over the iteration number.

The details of our algorithm are as follows:

Optimization Algorithm of AI-DTW:
  Initialize
    the warping path $w^{(1)} = DTW\_PATH(T, R)$;
    the iteration number k = 1.
  While not converged
    k = k + 1;
    Update the transformation matrix by:
      $$A^{(k)} = \arg\min_{A} \Big\{ \sum_{i=1}^{n} \| t_i - r_{w^{(k-1)}(i)} A \|^2 \Big\}.$$
    Update the warping path by:
      $w^{(k)} = DTW\_PATH(T, R A^{(k)})$.
  End While

Some examples of the above algorithm are illustrated in Figure 1. Most of the error reduction is achieved in the first few iterations, and AI-DTW usually converges within a few iterations.
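The alternating scheme above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical; the squared Euclidean distance is used in both steps; the least-squares step solves the normal equations of Eq. (9) by plain Gaussian elimination rather than SVD, and assumes that $D_r^{\top} D_r$ is non-singular (the aligned elements of R are not all collinear). The DP enforces the three constraints listed in Section 2 directly (each element of T is matched to exactly one element of R, and w advances by 0 or 1), which is one possible reading of the asymmetric path and requires len(R) <= len(T); it is not the recursion of Eq. (4) verbatim.

```python
def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def dtw_path(T, R):
    """Asymmetric DP: w[i] (0-based) is the index in R matched to T[i].
    Enforces w[0] = 0, w[n-1] = m-1 and w[i+1] - w[i] in {0, 1}; needs len(R) <= len(T)."""
    n, m = len(T), len(R)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    D[0][0] = sqdist(T[0], R[0])
    for i in range(1, n):
        for j in range(m):
            prev = min(D[i - 1][j], D[i - 1][j - 1] if j > 0 else INF)
            D[i][j] = sqdist(T[i], R[j]) + prev
    w = [m - 1] * n
    for i in range(n - 1, 0, -1):  # backtrack from (n-1, m-1)
        j = w[i]
        w[i - 1] = j - 1 if j > 0 and D[i - 1][j - 1] <= D[i - 1][j] else j
    return w

def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting (M square, non-singular)."""
    k = len(M)
    aug = [list(M[r]) + list(B[r]) for r in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [[aug[r][k + c] / aug[r][r] for c in range(len(B[0]))] for r in range(k)]

def fit_affine(T_ext, R_ext, w):
    """Eq. (9): A = (Dr^T Dr)^{-1} Dr^T Dt, where the rows of Dr are r_{w(i)}."""
    Dr = [R_ext[j] for j in w]
    k = len(T_ext[0])
    DrtDr = [[sum(r[a] * r[b] for r in Dr) for b in range(k)] for a in range(k)]
    DrtDt = [[sum(r[a] * t[b] for r, t in zip(Dr, T_ext)) for b in range(k)] for a in range(k)]
    return solve(DrtDr, DrtDt)

def apply_affine(R_ext, A):
    k = len(A)
    return [tuple(sum(r[a] * A[a][b] for a in range(k)) for b in range(k)) for r in R_ext]

def mean_sqdist(T_ext, RA, w):
    return sum(sqdist(t, RA[j]) for t, j in zip(T_ext, w)) / len(w)

def ai_dtw(T, R, max_iter=20, tol=1e-9):
    """Alternate between the least-squares update of A and the DP update of w."""
    T_ext = [tuple(p) + (1.0,) for p in T]
    R_ext = [tuple(p) + (1.0,) for p in R]
    k = len(T_ext[0])
    A = [[float(a == b) for b in range(k)] for a in range(k)]
    w = dtw_path(T_ext, R_ext)          # initial path with A = identity
    errs = [mean_sqdist(T_ext, R_ext, w)]
    for _ in range(max_iter):
        A = fit_affine(T_ext, R_ext, w)  # fix w, update A
        RA = apply_affine(R_ext, A)
        w = dtw_path(T_ext, RA)          # fix A, update w
        errs.append(mean_sqdist(T_ext, RA, w))
        if errs[-2] - errs[-1] < tol:    # stop when the error no longer decreases
            break
    return A, w, errs
```

Because each step minimizes the same squared-error objective with the other variable held fixed, the recorded errors in this sketch cannot increase from one iteration to the next (up to rounding), which mirrors the EM-style argument in the text.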
3.3 Rotation, Scale and Shift Invariant DTW
In many 2-dimensional shape analysis applications, such as online handwriting, it is often necessary to use a rotation, scale and shift invariant (RSS invariant for short) method. Rotation, scale and shift can be seen as special cases of the affine transformation. We discuss this problem separately for the following reasons: (1) The RSS invariant problem can be solved more efficiently, without the matrix calculation in Eq. (9). (2) In some applications, AI-DTW may over-fit the input sequence. For example, in online handwriting, samples of the digits '2' and '7' can be transformed into shapes very similar to the digit '1' if the samples are shrunk along the x coordinate. For such cases, it is more natural to use the same scale rate for the x and y coordinates. Given a point (x, y), the coordinates after the rotation, scale and shift transformation can be calculated by:
$$x' = r\cos(\theta)\,x + r\sin(\theta)\,y + \Delta x, \qquad y' = -r\sin(\theta)\,x + r\cos(\theta)\,y + \Delta y, \tag{10}$$
where r denotes the scale rate, θ is the rotation angle, and Δx, Δy are the shift parameters. For simplicity, we denote the transformation in Eq. (10) by $(x', y') = f(x, y)$, and assume that each element $t_i = (x_{t,i}, y_{t,i})$ is a 2-dimensional vector composed of the coordinates of point $t_i$. Similarly, $r_j = (x_{r,j}, y_{r,j})$. Given the warping path w, the first sub-problem discussed in Section 3.2 can be reformulated as:
$$\arg\min_{r, \theta, \Delta x, \Delta y} G(T, R) = \sum_{i=1}^{n} \| t_i - f(r_{w(i)}) \|^2. \tag{11}$$
By solving the equations ∂G/∂r = 0, ∂G/∂θ = 0, ∂G/∂Δx = 0, ∂G/∂Δy = 0, we have:
$$\begin{aligned}
\theta &= \tan^{-1}\!\left(\frac{d}{c}\right), \qquad r = \frac{\sqrt{c^2 + d^2}}{e}, \\
\Delta x &= \bar{x}_r - r\cos(\theta)\,\bar{x}_t - r\sin(\theta)\,\bar{y}_t, \\
\Delta y &= \bar{y}_r + r\sin(\theta)\,\bar{x}_t - r\cos(\theta)\,\bar{y}_t,
\end{aligned} \tag{12}$$
where
$$\bar{x}_r = \frac{1}{n}\sum_{i=1}^{n} x_{r,w(i)}, \qquad \bar{y}_r = \frac{1}{n}\sum_{i=1}^{n} y_{r,w(i)}, \qquad \bar{x}_t = \frac{1}{n}\sum_{i=1}^{n} x_{t,i}, \qquad \bar{y}_t = \frac{1}{n}\sum_{i=1}^{n} y_{t,i},$$
$$c = \sum_{i=1}^{n} \{ -(x_{t,i} - \bar{x}_t)(x_{r,w(i)} - \bar{x}_r) - (y_{t,i} - \bar{y}_t)(y_{r,w(i)} - \bar{y}_r) \},$$
$$d = \sum_{i=1}^{n} \{ -(y_{t,i} - \bar{y}_t)(x_{r,w(i)} - \bar{x}_r) + (x_{t,i} - \bar{x}_t)(y_{r,w(i)} - \bar{y}_r) \},$$
$$e = \sum_{i=1}^{n} \{ (x_{t,i} - \bar{x}_t)^2 + (y_{t,i} - \bar{y}_t)^2 \}.$$
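A closed-form fit of this kind can be sketched as below. This is an illustration under assumptions, not the authors' code: `fit_rss` and `apply_rss` are hypothetical names, and the function is written for a generic pair of point lists, finding the f of Eq. (10) that minimizes the sum of squared distances between f(src[i]) and dst[i]. The shift expressions in Eq. (12) map the mean of T onto the mean of the aligned elements of R, which corresponds to src = T and dst = [R[w[i]] for each i]; swapping the two arguments fits the transformation in the opposite direction. The sums are accumulated without the leading minus signs of c and d, which leaves the ratio d/c unchanged, and `atan2` is used so that the quadrant of θ is resolved. The `fix_scale` flag is an added convenience for the rotation-and-shift-only case with r = 1.

```python
import math

def fit_rss(src, dst, fix_scale=False):
    """Least-squares (r, theta, dx, dy) such that f(src[i]) is close to dst[i], f as in Eq. (10)."""
    n = len(src)
    mxs = sum(p[0] for p in src) / n
    mys = sum(p[1] for p in src) / n
    mxd = sum(p[0] for p in dst) / n
    myd = sum(p[1] for p in dst) / n
    a_num = b_num = e = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        us, vs, ud, vd = xs - mxs, ys - mys, xd - mxd, yd - myd
        a_num += us * ud + vs * vd   # proportional to r * cos(theta)
        b_num += vs * ud - us * vd   # proportional to r * sin(theta)
        e += us * us + vs * vs       # spread of the centered source points
    theta = math.atan2(b_num, a_num)
    r = 1.0 if fix_scale else math.hypot(a_num, b_num) / e
    # The optimal shift maps the mean of src onto the mean of dst.
    dx = mxd - r * math.cos(theta) * mxs - r * math.sin(theta) * mys
    dy = myd + r * math.sin(theta) * mxs - r * math.cos(theta) * mys
    return r, theta, dx, dy

def apply_rss(p, r, theta, dx, dy):
    """Eq. (10) applied to a single point p = (x, y)."""
    x, y = p
    return (r * math.cos(theta) * x + r * math.sin(theta) * y + dx,
            -r * math.sin(theta) * x + r * math.cos(theta) * y + dy)

# Example: recover known parameters from exactly transformed points.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
dst = [apply_rss(p, 2.0, 0.5, 3.0, -1.0) for p in src]
r, theta, dx, dy = fit_rss(src, dst)
```

Only a handful of sums over the aligned pairs are needed, so this step avoids the matrix inversion of Eq. (9), which is the efficiency argument given in reason (1) above.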
4. Experiments on Online Rotated Handwriting Recognition
To examine the utility of AI-DTW, we carried out recognition experiments on a set of rotated online handwritten digit data from the UCI Repository of machine learning databases [10]. The database includes 7,494 training samples and 3,498 writer-independent testing samples. If a sample consists of more than one stroke, we connect the strokes into a single stroke for simplicity. All the samples are normalized to a size of 120×90. We sample 20 points at equal spatial intervals along the pen trace of each sample. Each of the testing samples was then rotated by an angle randomly generated in [-π/2, π/2]. Some of the rotated testing samples are shown in Fig. 2.

Figure 2 Examples of Rotated Testing Samples

DTW has been shown to be an efficient method for calculating distances in online handwriting recognition [3-5]. Here we select the simple k-Nearest Neighbor (NN) classifier to compare the recognition performance of two distance metrics: DTW and AI-DTW. As the online samples have been normalized, to be fair to both DTW and AI-DTW, we use the rotation and shift invariant version with the scale rate fixed at r = 1 (see Section 3.3). The recognition results are summarized in Table 1. AI-DTW performed much better than DTW, which indicates that AI-DTW is a more effective method for comparing online rotated shapes than classical DTW. The main drawback of the NN classifier is its low speed. However, the computation time can be reduced greatly by using learned prototypes [4]. Due to the page limitation, we omit the detailed discussion of this point.

Table 1 Recognition rates of DTW and AI-DTW (k is the number of neighbors used)
k         1        3        5        10
DTW       64.00%   64.44%   65.44%   65.87%
AI-DTW    95.54%   95.11%   94.65%   94.08%
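The classification step can be sketched generically as follows. This is an assumed, simplified illustration rather than the experimental code: `knn_classify` is a hypothetical name, and `distance` stands for any function returning a dissimilarity between two samples, for instance a DTW or AI-DTW distance. Ties in the vote are resolved in favor of the label that appears first among the nearest neighbors.

```python
from collections import Counter

def knn_classify(sample, train, distance, k=1):
    """Label of `sample` by majority vote among its k nearest training samples.

    train: list of (sequence, label) pairs.
    distance: callable giving the dissimilarity between two sequences.
    """
    nearest = sorted(train, key=lambda item: distance(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy usage with scalar "sequences" and an absolute-difference distance.
train = [(0.0, "a"), (1.0, "a"), (10.0, "b")]
label = knn_classify(9.0, train, lambda p, q: abs(p - q), k=1)
```

Each query requires one distance evaluation per training sample, which is the speed drawback noted above and the motivation for using learned prototypes.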
5. Conclusion
This paper proposes the affine invariant dynamic time warping method for comparing two sequences that may exhibit local shape deformation and may have undergone an unknown global affine transformation. We formulated the estimation of the transformation matrix and the warping path as a unified optimization problem and developed an iterative algorithm to find the optimal solution. Experiments on online rotated handwriting indicate that our method significantly improves on the recognition rates of DTW. For future work, we are considering 1) studying the regularization of the transformation matrix and 2) combining statistical learning methods with AI-DTW.
References
[1] L. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition," Prentice Hall PTR, 1993.
[2] H. Sakoe and S. Chiba, "A Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Trans. on ASSP, 26(1):43-49, 1978.
[3] H. Mitoma, S. Uchida, and H. Sakoe, "Online character recognition using eigen-deformations," 9th IWFHR, Tokyo, 2004.
[4] J. Alon, V. Athitsos, and S. Sclaroff, "Online and Offline Character Recognition Using Alignment to Prototypes," Proc. ICDAR, Korea, 2005.
[5] C. Bahlmann and H. Burkhardt, "The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping," IEEE Trans. on PAMI, 26(3):299-310, 2004.
[6] T. Rath and R. Manmatha, "Word image matching using dynamic time warping," Proc. CVPR, vol. 2, pp. 521-527, 2003.
[7] C. R. P. Dionisio and H. Y. Kim, "New features for affine-invariant shape classification," Proc. ICIP, pp. 2135-2138, 2004.
[8] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, "Numerical Recipes in C++: The Art of Scientific Computing," Cambridge University Press, Cambridge, England, 1992.
[9] A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[10] S. Hettich, C. L. Blake, and C. J. Merz, "UCI Repository of machine learning databases," 1998. [http://www.ics.uci.edu/~mlearn/MLRepository.html]