2014 14th International Conference on Frontiers in Handwriting Recognition

A Feature Extraction Method for Cursive Character Recognition Using Higher-Order Singular Value Decomposition

Mohammad Reza Ameri*, Mehdi Haji*, Andreas Fischer†, Dominique Ponson‡, Tien D. Bui*

* Computer Science and Software Engineering Department, Concordia University, Montréal, Québec H3G 1M8, Canada
Email: mo [email protected], m [email protected], [email protected]
† Département de Génie Électrique, École Polytechnique de Montréal, Montréal, Québec H3T 1J4, Canada
Email: andreas.fi[email protected]
‡ IMDS Software, 152 Notre-Dame Est, suite 100, Montréal, Québec H2Y 3P6, Canada
Email: [email protected]

2167-6445/14 $31.00 © 2014 IEEE  DOI 10.1109/ICFHR.2014.92

Abstract—The use of Higher-Order Singular Value Decomposition (HOSVD) and other tensor decomposition methods is popular in the face recognition domain, yet a direct application to handwritten character recognition has not shown promising results so far. Character recognition is commonly performed in two steps: feature extraction and classification. In this paper, we propose a feature extraction algorithm based on HOSVD which is then combined with standard statistical classification. The algorithm constructs a tensor from the training data and applies HOSVD in order to obtain a feature extractor matrix for arbitrary character images. We evaluate the proposed handwriting features in combination with SVM classification for character recognition on the CEDAR benchmark data set. The results indicate that our proposed approach significantly outperforms the standard HOSVD classification method.

Index Terms—Feature evaluation and selection, optical character recognition, tensor decompositions, higher-order singular value decomposition (HOSVD).

I. INTRODUCTION

Recognition of handwritten characters using computers has been one of the first and most successful applications of pattern recognition. Optical Character Recognition (OCR) has been an active field of research for more than three decades. Hundreds of approaches have been proposed for the recognition of handwritten characters for different scripts [1]. For machine-printed Latin scripts, character recognition methods achieve very high recognition rates, at least when the level of noise is low [2]. When clear imaging is available, typical recognition rates for machine-printed characters exceed 99%. However, OCR is prone to errors when dealing with handwritten characters. Commercial applications with near-perfect recognition accuracy are only available for restricted tasks such as bank check reading [3]. In the general case, the problem is still considered widely unsolved [4].

The difficulty of recognizing handwritten characters lies in the fact that there can be as many handwriting styles as there are people. In fact, it is widely believed that each individual's handwriting is unique. In forensic science, handwriting identification and verification are based on the principle that no two people's handwritings are exactly alike. This means that the number of shapes a handwritten character can take is very large, which is challenging for pattern recognition. Fig. 1 shows some examples of letters from the NIST SD19 database [5] which may be mistaken for 'a' without context. In a recent study, we have shown that there are at least 29 pairs of letters that may have almost identical shapes in cursive Latin handwriting [6]. Our motivation for using HOSVD for character recognition is inspired by the success of this method for face recognition [7]. Shapes of faces can also be very similar, making it necessary to distinguish different faces based on details in the image.

Handwritten character recognition is commonly performed in two steps: feature extraction and classification. Feature extraction is a crucial first step that determines how well the different characters can be distinguished in the respective feature space. For an early survey, we refer to [8]. Examples of state-of-the-art feature sets include wavelet-based representations of low-quality printed characters as well as handwritten characters [9]–[11]. In general, it cannot be predicted which feature set performs best for a specific recognition task. Yet, there are methods for feature space transformation that are applicable to any feature set and may be able to improve the class separability. Exemplary applications for cursive handwriting include the use of Principal Component Analysis (PCA) and Independent Component Analysis (ICA) [12] as well as non-linear kernel PCA [13].

Recent advances in image representation include the development of sparse representations, which have proven successful for various applications in computer vision and pattern recognition [14]. The underlying idea is to describe an image with respect to a linear combination of representative samples such that only few coefficients of the linear combination are non-zero. Unlike PCA, the goal is not to create a feature space with a small orthogonal basis but instead to use an extensive dictionary of representative samples to create an overcomplete basis. Following this procedure, semantic information like class membership can be propagated directly from the non-zero coefficients.

Higher-Order Singular Value Decomposition (HOSVD) is a promising tensor decomposition method for sparse representation. It has a high visibility in the domain of face recognition, for instance for the recognition of facial expressions [7]. A successful application to handwriting recognition was recently reported in [15] for the ten-class problem of digit classification. Promising accuracies are reported with respect to an efficient implementation, which includes a reduction of the training set, and the standard HOSVD classification method, which selects the class with the highest similarity directly from the sparse representation.

In this paper, we investigate the application of HOSVD to the more challenging problem of handwritten character recognition, which involves a larger number of classes than digit recognition. An experimental evaluation on the CEDAR benchmark database [16] shows that the achieved recognition accuracy is surprisingly low when compared with other state-of-the-art methods. In order to improve these initial results, we propose a different recognition approach that considers the sparse HOSVD representation as a feature vector, which is then used for statistical classification. In combination with Support Vector Machine classification using a Radial Basis Function kernel (RBF SVM), we demonstrate that the proposed approach significantly outperforms standard HOSVD classification [15], which is designed to perform character representation as well as recognition with HOSVD. In contrast, we use HOSVD only for feature extraction. Recognition is then performed by the SVM classifier.

The remainder of this paper is organized as follows. First, the SVD and HOSVD terminology is introduced in Section II. Then, we present the proposed HOSVD-based feature extraction algorithm in Section III and provide experimental results in Section IV. Finally, conclusions are drawn in Section V.

Fig. 1. Ambiguity of cursive characters for the character 'a': (a) 'a' or 'c', (b) 'a' or 'u', (c) 'a' or 'Q', (d) 'a' or 'w'.

II. HIGHER-ORDER SVD

Higher-Order Singular Value Decomposition (HOSVD) is obtained by extending the concept of SVD to tensors. The SVD of an arbitrary matrix A is defined by

A = U Σ V^T

where U is an orthogonal matrix containing the eigenvectors of A A^T in its columns, V is an orthogonal matrix containing the eigenvectors of A^T A, and Σ is a diagonal matrix containing the singular values of A, sorted in decreasing order. The number of non-zero singular values of A is equal to rank(A). Let Â be an approximation of A with lower rank: if rank(A) = n, an approximation Â with rank(Â) = k, k < n, is obtained by reconstructing A with only the k largest singular values of Σ.

Tensors are characterized by their order N and denoted N-tensors: vectors are first-order tensors (N = 1), matrices are second-order tensors (N = 2), and third-order tensors are three-dimensional arrays of data (N = 3). HOSVD generalizes SVD to such tensors. Unfolding, or flattening, of tensors [15] is the first step toward HOSVD; it converts the tensor to matrix form. For a 3-tensor A ∈ R^(I×J×K), one unfolding is possible for each dimension:

A_(1) ∈ R^(I×JK):  a_ijk = a^(1)_(i,v),  v = j + (k − 1)J
A_(2) ∈ R^(J×IK):  a_ijk = a^(2)_(j,v),  v = k + (i − 1)K
A_(3) ∈ R^(K×IJ):  a_ijk = a^(3)_(k,v),  v = i + (j − 1)I
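These unfoldings amount to axis permutations followed by reshapes. The following sketch (a minimal illustration, not the authors' code) realizes the three unfoldings above in numpy, with 0-based column indices:

```python
import numpy as np

def unfold(A, mode):
    """Mode-n unfolding of a 3-tensor A of shape (I, J, K).

    Follows the index conventions above, written 0-based:
      mode 1: (I, J*K), column v = k*J + j  (j varies fastest)
      mode 2: (J, I*K), column v = i*K + k  (k varies fastest)
      mode 3: (K, I*J), column v = j*I + i  (i varies fastest)
    """
    I, J, K = A.shape
    if mode == 1:
        # move k before j so a C-order reshape makes j the fastest index
        return A.transpose(0, 2, 1).reshape(I, K * J)
    if mode == 2:
        return A.transpose(1, 0, 2).reshape(J, I * K)
    if mode == 3:
        return A.transpose(2, 1, 0).reshape(K, J * I)
    raise ValueError("mode must be 1, 2 or 3")
```

Each entry of the tensor appears exactly once in each unfolding, so the three matrices contain the same data in three different arrangements.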

The second ingredient is matrix–tensor multiplication. For a matrix F ∈ R^(Jn×In) and a tensor A ∈ R^(I1×...×IN), the n-mode tensor–matrix multiplication is defined by

(A ×_n F)(i_1, ..., i_(n−1), j_n, i_(n+1), ..., i_N) = Σ_(i_n = 1)^(I_n) A(i_1, ..., i_N) F(j_n, i_n)
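The n-mode product is a single tensor contraction; a minimal numpy sketch (illustrative, not from the paper) for the 3-tensor case:

```python
import numpy as np

def mode_n_product(A, F, mode):
    """n-mode product A x_n F of a 3-tensor A with a matrix F (Jn x In):
    mode n of A is contracted against the second index of F."""
    if mode == 1:
        return np.einsum('ajk,ia->ijk', A, F)
    if mode == 2:
        return np.einsum('iak,ja->ijk', A, F)
    if mode == 3:
        return np.einsum('ija,ka->ijk', A, F)
    raise ValueError("mode must be 1, 2 or 3")

# products along different modes commute, e.g. (A x1 F) x2 G == (A x2 G) x1 F
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 6))
F = rng.standard_normal((3, 4))   # L x I
G = rng.standard_normal((2, 5))   # M x J
lhs = mode_n_product(mode_n_product(A, F, 1), G, 2)
rhs = mode_n_product(mode_n_product(A, G, 2), F, 1)
```

The commutativity demonstrated at the end is exactly the property stated next in the text.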

The following property holds for a tensor A ∈ R^(I×J×K) and matrices F ∈ R^(L×I), G ∈ R^(M×J) and H ∈ R^(N×K):

(A ×_1 F) ×_2 G = (A ×_2 G) ×_1 F = A ×_1 F ×_2 G ∈ R^(L×M×K)

The SVD theorem states that any matrix F ∈ R^(M×N) can be written as F = U Σ V^T. Considering the matrix as a second-order tensor, this can be expressed in terms of n-mode tensor–matrix multiplication as

F = Σ ×_1 U ×_2 V

and for a third-order tensor it becomes

A = ϕ ×_1 U ×_2 V ×_3 W    (1)

where A ∈ R^(I×J×K), U ∈ R^(I×I), V ∈ R^(J×J) and W ∈ R^(K×K) are orthogonal matrices and ϕ ∈ R^(I×J×K) is an all-orthogonal tensor. To compute the matrices U, V and W in Eq. 1, the unfoldings A_(1), A_(2), A_(3) must be decomposed by SVD:

A_(1) = U S^(1) (V^(1))^T
A_(2) = V S^(2) (V^(2))^T
A_(3) = W S^(3) (V^(3))^T

and the tensor ϕ can be obtained by multiplying the tensor A with U^T, V^T and W^T in 1-mode, 2-mode and 3-mode, respectively:

ϕ = A ×_1 U^T ×_2 V^T ×_3 W^T.

The tensor ϕ and the matrices U, V and W from Eq. 1 can be used to compute an approximation of A, denoted by Â. A (k1, k2, k3)-rank approximation of A is obtained by keeping the first k1, k2 and k3 columns of U, V and W, respectively, to produce U' ∈ R^(I×k1), V' ∈ R^(J×k2) and W' ∈ R^(K×k3), together with the sub-tensor ϕ' ∈ R^(k1×k2×k3) containing the leading k1 × k2 × k3 elements of ϕ. This procedure results in the tensor approximation

Â = ϕ' ×_1 U' ×_2 V' ×_3 W'.    (2)

Eq. 1 and Eq. 2 are the key elements of the proposed feature extraction algorithm in Section III. More information about HOSVD and tensor decomposition can be found in [15] and [17].

III. FEATURE EXTRACTION

Feature selection is one of the most important steps in pattern and character recognition because it affects the performance in terms of recognition accuracy and resource usage. Our feature extraction method is based on HOSVD, similar to [15]. A compression measure p controls the size of the feature vector: smaller values of p result in more tensor compression and lower resource usage.

Our feature extraction method is described as follows. Consider a training data set consisting of M sample images of K classes (K is 10 for numerals and 26 for alphabetical letters). Let J_i denote the number of sample images in class i, i = 1, ..., K; hence M = Σ_(i=1)^K J_i. Each image of size m × n in the data set is resized to a 20 × 20 matrix using bicubic approximation, and each 20 × 20 matrix is then converted into a vector of size 400 by concatenating its columns.

The first step of our method is to construct a tensor A to represent the data set described above. The tensor A is a 3-tensor, A ∈ R^(I×J×K), where I = 400 is the normalized size of an image, J = max(J_i), i = 1, ..., K, and K is the number of classes. The tensor A can thus be considered as a stack of K matrices A_i, i = 1, ..., K, where each matrix A_i has dimension 400 × J_i (see Fig. 2). However, many data sets contain different numbers of samples for different classes, and since the tensor A is built from these matrices, they must all have the same size. One solution is to extend all matrices to the size of the largest, J = max(J_i), i = 1, ..., K; the extension can be done by replicating columns of A_i.

Fig. 2. Constructing the tensor A from the training data set.

The next step is to apply the HOSVD decomposition described in the previous section. An important step of our method, which exploits the low-rank approximation and also reduces the computational cost, is to use the approximated tensor Â instead of A. Â is constructed as follows. The compression measure p determines the size of the approximated tensor Â, which in turn produces the feature extraction matrix Û from the HOSVD; the length of the feature vector depends on the size of Û. To compute Â, we use Eq. 1 to decompose A by HOSVD, and Eq. 2 with U' = U(:, 1:p), V' = V(:, 1:p), W' = W and ϕ' = ϕ(1:p, 1:p, :), where p < I and p < J. The notation (:, 1:p) means selecting all rows and the first p columns. Eq. 1 is then used again to decompose Â and obtain Û ∈ R^(400×p), which represents the feature matrix for the training images. In the training as well as the testing stage, an image represented by a vector d ∈ R^400 is projected onto the space spanned by Û to obtain the image feature vector d_p = Û^T d ∈ R^p. Our feature extraction method is summarized in Algorithm 1.
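As an illustration, the extraction step can be sketched in a few lines of numpy (a minimal reimplementation under the notation above, not the authors' code; random data stands in for the character images):

```python
import numpy as np

def hosvd_feature_extractor(A, p):
    """Compute the feature extraction matrix U_hat (I x p) from a data
    tensor A of shape (I, J, K), following Eq. 1 / Eq. 2.

    The left singular vectors of an unfolding do not depend on its
    column ordering, so plain reshapes suffice here."""
    I, J, K = A.shape
    U, _, _ = np.linalg.svd(A.reshape(I, -1), full_matrices=False)
    V, _, _ = np.linalg.svd(A.transpose(1, 0, 2).reshape(J, -1), full_matrices=False)
    W, _, _ = np.linalg.svd(A.transpose(2, 0, 1).reshape(K, -1), full_matrices=False)
    # core tensor phi = A x1 U^T x2 V^T x3 W^T (Eq. 1)
    phi = np.einsum('ijk,ia,jb,kc->abc', A, U, V, W, optimize=True)
    # truncated reconstruction A_hat (Eq. 2): keep p components in modes 1, 2
    A_hat = np.einsum('abc,ia,jb,kc->ijk',
                      phi[:p, :p, :], U[:, :p], V[:, :p], W, optimize=True)
    # decompose A_hat again; its leading mode-1 singular vectors form U_hat
    U_hat, _, _ = np.linalg.svd(A_hat.reshape(I, -1), full_matrices=False)
    return U_hat[:, :p]

# toy usage: random data standing in for the 400 x J x K character tensor
rng = np.random.default_rng(0)
A = rng.standard_normal((400, 30, 26))
U_hat = hosvd_feature_extractor(A, p=16)
d = rng.standard_normal(400)          # a vectorized 20 x 20 character image
d_p = U_hat.T @ d                     # p-dimensional feature vector
```

The projection Û^T d at the end is exactly the feature vector d_p used for both training and test images.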

Algorithm 1 HOSVD feature extraction
Input: Training and test set images, compression measure p.
Output: HOSVD features of training and test set images.
 1: for all image ∈ training set do
 2:   d ⇐ resize image to 20 × 20, reshape to 400 × 1
 3:   add d to TrainVectors
 4: end for
 5: for all image ∈ test set do
 6:   d ⇐ resize image to 20 × 20, reshape to 400 × 1
 7:   add d to TestVectors
 8: end for
 9: for all classes i do
10:   A_i ⇐ all TrainVectors ∈ class i
11:   A(:, :, i) ⇐ A_i
12: end for
13: Decompose [ϕ, U, V, W] ⇐ A by Eq. 1
14: ϕ' ⇐ ϕ(1:p, 1:p, :)
15: U' ⇐ U(:, 1:p)
16: V' ⇐ V(:, 1:p)
17: Compose Â ⇐ [ϕ', U', V', W] by Eq. 2
18: Decompose [ϕ̂, Û, V̂, Ŵ] ⇐ Â by Eq. 1
19: for all d ∈ TrainVectors do
20:   add Û^T d to TrainFeatures
21: end for
22: for all d ∈ TestVectors do
23:   add Û^T d to TestFeatures
24: end for
25: return TrainFeatures, TestFeatures

TABLE I
UPPER CASE EXPERIMENT

   p  | Cross Validation |  Test   | HOSVD classic
  16  |      86.66%      | 85.80%  |    62.25%
  32  |      89.52%      | 89.75%  |    75.85%
  48  |      89.19%      | 90.34%  |    80.10%
  64  |      88.16%      | 89.75%  |    82.37%
 128  |      85.89%      | 87.34%  |    84.78%

TABLE II
LOWER CASE EXPERIMENT

   p  | Cross Validation |  Test   | HOSVD classic
  16  |      86.98%      | 86.64%  |    66.29%
  32  |      88.61%      | 88.35%  |    76.22%
  48  |      87.72%      | 85.78%  |    79.77%
  64  |      87.18%      | 85.41%  |    80.63%
 128  |      84.29%      | 84.55%  |    82.47%

IV. EXPERIMENTS

In order to evaluate the proposed algorithm and to compare it with the standard HOSVD classification approach, we used the CEDAR data set [16], which contains scanned forms of cursive handwritten mail documents. The samples are split into disjoint sets for training and testing. We used the BINANUMS folder, which contains alphanumeric characters, and conducted two experiments for upper case and lower case characters, respectively. The samples for training and testing were selected according to the provided guidelines. The upper case data contains 11453 characters for training and 1327 characters for testing. The lower case data contains 7692 characters for training and 816 characters for testing. This setup was preserved for all experiments.

For classification we used the LIBSVM toolkit [18] to train a Support Vector Machine classifier (SVM) with a Radial Basis Function (RBF) kernel. A cross validation was performed on the training set over 110 pairs (C, γ) of the SVM parameter C and the RBF parameter γ. The best parameters (C_best, γ_best) were used for recognition on the test set.

The proposed combination of HOSVD-based feature extraction and RBF SVM classification is compared with the standard HOSVD classification approach, which selects the most similar class directly from the sparse HOSVD-based representation. We verified the correctness of our implementation of the reference system, which is similar to the system proposed in [15] for digit recognition, on a ZIP code data set [19].

In Table I and Table II we report recognition accuracies for the upper case and lower case experiments, respectively, for different compression measures p ∈ {16, 32, 48, 64, 128}. For the proposed system, results are indicated for cross validation on the training set as well as for recognition on the test set. For the reference system (HOSVD classic), results are indicated for recognition on the test set.

When compared with recent benchmark results on the CEDAR database, the recognition accuracy achieved by HOSVD classic is surprisingly low. For instance, the authors of [20] report one of the best current results on this database: a recognition accuracy of 95.90% for upper case characters, 93.50% for lower case characters, and 94.73% for merged upper case and lower case characters, based on recursive subdivision features and RBF SVM classification. Although our proposed method does not yet reach these benchmark results, it shows significant improvements over HOSVD classic. In 9 out of 10 cases, the improvements in recognition accuracy on the test set are statistically significant (t-test, α = 0.01). Furthermore, the proposed method achieves its best results for relatively low values of p, that is, a high compression and therefore a low resource usage can be achieved.
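The cross validation described above can be sketched as follows. The sketch uses scikit-learn's SVC (which wraps LIBSVM) in place of the LIBSVM command-line tools, with random data standing in for the HOSVD features; the paper does not specify its grid of 110 (C, γ) pairs, so the ranges below are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# random stand-ins for the p-dimensional HOSVD features of 26 classes
rng = np.random.default_rng(0)
X_train = rng.standard_normal((208, 16))
y_train = np.repeat(np.arange(26), 8)      # balanced toy labels

param_grid = {
    'C': np.logspace(-1, 4, 11),           # 11 candidate values for C
    'gamma': np.logspace(-4, 1, 10),       # 10 candidates -> 110 (C, gamma) pairs
}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X_train, y_train)
C_best = search.best_params_['C']
gamma_best = search.best_params_['gamma']
```

The selected (C_best, γ_best) pair would then be used to train a final model on the full training set and evaluate it on the held-out test set.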

V. CONCLUSION

In this paper, we have investigated the application of HOSVD-based classification for handwritten character recognition. Compared with earlier work on digit classification [15], a larger number of classes is taken into account, and a surprisingly low recognition accuracy is observed on the CEDAR benchmark database. In order to improve these initial results, we propose a different recognition approach that considers the sparse HOSVD representation as a feature vector, which is then used for statistical classification. In combination with RBF SVM classification, we demonstrate that the proposed approach significantly outperforms standard HOSVD classification.

There is clearly a potential for HOSVD-based character recognition, yet the recognition accuracy of the proposed approach is still below other state-of-the-art results. Future work includes the investigation of different image representations prior to HOSVD decomposition as well as a combination of the proposed HOSVD features with other statistical classifiers. Finally, an application of HOSVD to the task of keyword spotting could be a promising line of future research.

ACKNOWLEDGMENT

This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under grant numbers RGPIN 9265-2010 and CRDPJ 395169-09, and by fellowship project P300P2-151279 of the Swiss National Science Foundation.

REFERENCES

[1] P. R. Cavalin, R. Sabourin, C. Y. Suen, and A. S. Britto Jr., "Evaluation of incremental learning algorithms for HMM in the recognition of alphanumeric characters," Pattern Recognition, vol. 42, no. 12, pp. 3241–3253, 2009.
[2] H. Fujisawa, "Forty years of research in character and document recognition – an industrial perspective," Pattern Recognition, vol. 41, no. 8, pp. 2435–2446, 2008.
[3] N. Gorski, V. Anisimov, E. Augustin, O. Baret, and S. Maximov, "Industrial bank check processing: The A2iA check reader," Int. Journal on Document Analysis and Recognition, vol. 3, pp. 196–206, 2001.
[4] H. Bunke and T. Varga, "Off-line Roman cursive handwriting recognition," in Digital Document Processing, B. Chaudhuri, Ed. Springer, 2007, pp. 165–173.
[5] P. J. Grother, "NIST special database 19 – handprinted forms and characters database," National Institute of Standards and Technology (NIST), Tech. Rep., 1995.
[6] M. Haji, "Arbitrary keyword spotting in handwritten documents," Ph.D. dissertation, Concordia University, 2012.
[7] H. Wang and N. Ahuja, "Facial expression decomposition," in Proc. 9th Int. Conf. on Computer Vision, vol. 2, 2003, pp. 958–965.
[8] O. D. Trier, A. K. Jain, and T. Taxt, "Feature extraction methods for character recognition – a survey," Pattern Recognition, vol. 29, no. 4, pp. 641–662, 1996.
[9] P. Wunsch and A. F. Laine, "Wavelet descriptors for multiresolution recognition of handprinted characters," Pattern Recognition, vol. 28, no. 8, pp. 1237–1249, 1995.
[10] G. Chen, T. Bui, and A. Krzyzak, "Contour-based handwritten numeral recognition using multiwavelets and neural networks," Pattern Recognition, vol. 36, no. 7, pp. 1597–1604, 2003.
[11] X. Wang, X. Ding, and C. Liu, "Gabor filters-based feature extraction for character recognition," Pattern Recognition, vol. 38, no. 3, pp. 369–379, 2005.
[12] A. Vinciarelli and S. Bengio, "Off-line cursive word recognition using continuous density HMMs trained with PCA and ICA features," in Proc. 16th Int. Conf. on Pattern Recognition, vol. 3, 2002, pp. 81–84.
[13] A. Fischer and H. Bunke, "Kernel PCA for HMM-based cursive handwriting recognition," in Proc. 13th Int. Conf. on Computer Analysis of Images and Patterns, 2009, pp. 181–188.
[14] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, "Sparse representation for computer vision and pattern recognition," Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
[15] B. Savas and L. Eldén, "Handwritten digit classification using higher order singular value decomposition," Pattern Recognition, vol. 40, no. 3, pp. 993–1003, 2007.
[16] J. J. Hull, "A database for handwritten text recognition research," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 5, pp. 550–554, 1994.
[17] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM J. Matrix Anal. Appl., vol. 21, pp. 1253–1278, 2000.
[18] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, pp. 27:1–27:27, 2011.
[19] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. Springer, 2009.
[20] G. Vamvakas, B. Gatos, and S. J. Perantonis, "Handwritten character recognition through two-stage foreground sub-sampling," Pattern Recognition, vol. 43, no. 8, pp. 2807–2816, 2010.