Feature Generation by Simple-FLDA for Pattern Recognition

M. Fukumi, S. Karungaru
Faculty of Engineering, University of Tokushima
2-1, Minami-Josanjima, Tokushima, Japan
[email protected]

Y. Mitsukura
Faculty of Education, Okayama University
3-1-1, Tsushima-Naka, Okayama, Japan
[email protected]
Abstract—In this paper, a new feature generation method for pattern recognition is proposed, which is derived approximately from a geometrical interpretation of the Fisher linear discriminant analysis (FLDA). In the fields of pattern recognition and signal processing, principal component analysis (PCA) is popular for data compression and feature extraction, and iterative learning algorithms for obtaining the PCA eigenvectors, including neural-network approaches, have been presented and proven effective in many applications. Recently, however, FLDA has been used in many fields, especially face image analysis. The drawbacks of FLDA are the long computation time caused by large covariance matrices and the fact that the within-class covariance matrix is usually singular. FLDA requires minimization of the within-class variance, but the inverse of the within-class covariance matrix cannot be obtained, because the data dimension is generally higher than the number of data and the matrix therefore has many zero eigenvalues. To overcome this difficulty, a new iterative feature generation method, the simple-FLDA, is introduced and its effectiveness is demonstrated on pattern recognition problems.
I. INTRODUCTION

In statistical pattern recognition [1], principal component analysis (PCA), Fisher linear discriminant analysis (FLDA), and factor analysis have been widely utilized and their effectiveness has been demonstrated in many applications, especially face recognition [2]. Recently, many new algorithms have been presented in the fields of statistical pattern recognition and neural networks [3]-[5]. In particular, the simple-PCA is a simple and fast learning algorithm whose effectiveness has been demonstrated in face information processing [6][7]. Furthermore, PCA has been extended to higher-order nonlinear spaces [8][9]. In [2], Eigenfaces and Fisherfaces were compared and the effectiveness of Fisherfaces was shown by computer simulations. However, face images are generally large compared to the number of available images, and therefore the within-class covariance matrix contains many zero eigenvalues [1]. In that case the method cannot yield Fisherfaces directly; usually PCA is first used to compress the data dimension, and FLDA is then applied to the compressed data to yield eigenvectors. This process requires a huge amount of matrix computation.
FLDA is usually better than PCA as a feature generator in pattern recognition [1][2][9], but the singularity of the within-class covariance matrix makes FLDA difficult to apply. Furthermore, PCA can cause information loss, because its accumulated relevance after data compression cannot reach 100 %. On the other hand, simple iterative algorithms for achieving PCA have been proposed [3]-[5]. These algorithms operate on the data directly rather than on matrices and are easy to implement. Note that PCA is based on the distribution of all data and is not necessarily effective for pattern classification, although it has been used in many pattern recognition problems. In this paper, a simple algorithm to achieve FLDA is presented. It is a simple iterative algorithm that does not use any matrix computation, and it is derived from a geometrical interpretation of maximizing the between-class variance and minimizing the within-class variance. We call it the simple-FLDA or simple-FLD. The matrix formulation of FLDA leads to an eigenvalue problem, which is difficult to solve when the matrix is large; our approach requires no matrix computation. This paper presents simulation examples for a simple two-class separation, rotated coin recognition, and face recognition problems using the simple-FLDA. The results of computer simulations demonstrate that this algorithm has better feature generation properties than PCA.
II. SIMPLE-FLDA

The simple-FLDA is derived to satisfy the properties of FLDA, namely maximization of the between-class variance and minimization of the within-class variance. In FLDA these properties are expressed as a ratio, which leads to an eigenvalue problem solved by matrix computation including matrix inversion, just as in PCA. To construct the discriminant space in which the trace of the within-class covariance matrix is minimized and the trace of the between-class covariance matrix is maximized, we can define the discriminant criterion
J = \mathrm{tr}(\Sigma_W^{-1}\Sigma_B)    (1)

using the within-class covariance matrix \Sigma_W and the between-class covariance matrix \Sigma_B in the discriminant space. Usually we have to solve an eigen-equation to obtain the eigenvectors that maximize the criterion in eq.(1). However, the within-class covariance matrix \Sigma_W is always singular, because the number N of images in the learning set is much smaller than the number of pixels in each image. If the number of classes is c, then the rank of \Sigma_W is at most N - c [1][2].

Fig. 1. Maximization of between-class variance.
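To make the matrix formulation concrete, the following NumPy sketch computes \Sigma_W, \Sigma_B, and the criterion of eq.(1) directly; the function and variable names are ours, not the paper's, and the computation only succeeds when \Sigma_W is non-singular, which is exactly what fails for image-sized data.

```python
import numpy as np

def fisher_criterion(X, y):
    """J = tr(Sigma_W^{-1} Sigma_B) for sample rows X and class labels y.

    Only works when Sigma_W is non-singular (many more samples than features);
    for image-sized data Sigma_W is singular, which motivates the simple-FLDA.
    """
    y = np.asarray(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)        # between-class scatter
    return np.trace(np.linalg.solve(S_w, S_b))  # tr(S_w^{-1} S_b)
```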
To avoid the matrix calculation of the eigenvalue problem, the simple-PCA was presented [5]. The simple-PCA is a simple iterative algorithm that obtains eigenvectors without any matrix calculation. The iterative learning is carried out using all input data, and the objective is to find the vector direction that maximizes the data variance. After convergence, the first eigenvector is obtained. The second eigenvector is then obtained by the same procedure after subtracting the first eigenvector component, and the third, fourth, and so on follow in the same way.
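As a reference point for the derivation that follows, here is a minimal NumPy sketch of the simple-PCA iteration as we read the description above (threshold the projection, sum the signed samples, normalize, deflate). The function name, the random initialization, and the fixed iteration count are our assumptions rather than details taken from [5].

```python
import numpy as np

def simple_pca(X, n_components, n_iters=30, seed=0):
    """Iterative simple-PCA sketch: X is (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                           # work with zero-mean data
    vectors = []
    for _ in range(n_components):
        a = rng.normal(size=X.shape[1])
        a /= np.linalg.norm(a)
        for _ in range(n_iters):                     # fixed count instead of a convergence test
            y = X @ a                                # projections of all samples
            signs = np.where(y >= 0.0, 1.0, -1.0)
            phi = (signs[:, None] * X).sum(axis=0)   # sum samples, flipped to the positive side
            a = phi / np.linalg.norm(phi)
        vectors.append(a)
        X = X - np.outer(X @ a, a)                   # deflation: remove the found component
    return np.array(vectors)
```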
A. Maximization of between-class variance

In this paper, the simple-FLDA is derived in a simple form. First, maximization of the between-class variance is introduced through a geometrical interpretation. Let h_j be the mean vector of each class; in Fig.1, h_j is shown as a black dot. The mean of all data is assumed to be zero. The class-mean vectors are treated as the input vectors of the simple-PCA, so the process of maximizing their variance is the same as in the simple-PCA. The maximization is carried out as follows. For a class mean h_j, the projection onto the current vector is

y_n = (a_n^k)^T h_j    (2)

and the actual learning quantity is given as

\Phi(y_n, h_j) = \begin{cases} h_j & \text{if } y_n \ge 0 \\ -h_j & \text{otherwise} \end{cases}    (3)

In Fig.1, the dotted line shows the line (hyperplane) on which eq.(2) equals 0. The threshold function (3) is summed over every class-mean vector. Through this process, it is hoped that a_n^k converges to the n-th eigenvector, where a_n^k denotes the estimate of the n-th eigenvector after the k-th iteration. The eigenvectors obtained in this way maximize the between-class variance; in principle this is the same as the simple-PCA, and eq.(3) can be replaced by other forms [5]. The eigenvectors are therefore effective only up to (c-1) if the number of classes is c [1][2]. However, since this is an approximate iterative algorithm, more eigenvectors can be derived when numerical error is taken into account.

B. Minimization of within-class variance

Next, a method for minimizing the within-class variance is introduced. In Fig.2, a data vector x_j in a class is shown as a black dot; x_j has zero mean within its class. Consider the relationship between the vector x_j and an arbitrary vector a_n^k that converges to an eigenvector. The direction that minimizes the length of the projection of x_j is the direction orthogonal to x_j. Therefore such orthogonal directions are computed and summed to minimize the within-class variance. Let b_j be the direction obtained by subtracting the x_j direction from the vector a_n^k. This calculation is easy to perform and is given by

b_j = a_n^k - (\hat{x}_j \cdot a_n^k)\,\hat{x}_j    (4)

where

\hat{x}_j = x_j / \|x_j\|    (5)

The corresponding learning quantity is

\Phi_i(b_j, x_j) = \|x_j\|\, b_j / \|b_j\|    (6)
where b_j is normalized in size. In Fig.2, b_j has the same direction as the dotted line, which is perpendicular to x_j. In learning, this quantity is summed and averaged over all sample vectors in the same class; the summation and averaging are weighted by the vector length \|x_j\|, so that samples with larger norms have a larger influence. It is then hoped that the accumulated vector converges to a direction that minimizes the within-class variance. By combining eqs.(3) and (6), we obtain the simple-FLDA update as follows:
\Phi_n^k = \sum_{i=1}^{c} N_i\,\Phi(y_n, h_i) + \sum_{i=1}^{c}\sum_{j=1}^{N_i} \Phi_i(b_j, x_j)    (7)

a_n^{k+1} = \Phi_n^k / \|\Phi_n^k\|    (8)
where N_i is the number of data in class i. As shown in eq.(7), the maximization of the between-class variance and the minimization of the within-class variance are achieved simultaneously, and after convergence the corresponding eigenvector is obtained.
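The following NumPy sketch shows one update step for a single direction, combining eqs.(2)-(8) as reconstructed here (including the N_i weight on the between-class term in eq.(7)); all names and the data layout are ours.

```python
import numpy as np

def simple_flda_step(a, class_means, class_data):
    """One simple-FLDA update of a single direction a, following eqs.(2)-(8).

    class_means: list of class-mean vectors h_i (global mean removed)
    class_data:  list of (N_i, d) arrays of within-class zero-mean samples x_j
    """
    phi = np.zeros_like(a)
    for h, X in zip(class_means, class_data):
        # between-class part, eqs.(2)-(3), weighted by the class size N_i
        y = a @ h
        phi += len(X) * (h if y >= 0.0 else -h)
        # within-class part, eqs.(4)-(6): accumulate directions orthogonal to each sample
        for x in X:
            norm_x = np.linalg.norm(x)
            if norm_x == 0.0:
                continue
            x_hat = x / norm_x                      # eq.(5)
            b = a - (x_hat @ a) * x_hat             # eq.(4)
            norm_b = np.linalg.norm(b)
            if norm_b > 0.0:
                phi += norm_x * b / norm_b          # eq.(6), weighted by ||x_j||
    return phi / np.linalg.norm(phi)                # eq.(8)
```

Iterating this step until a stops changing, and then deflating as described below, would yield the eigenvectors one by one.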
Fig. 2. Minimization of within-class variance.
After convergence of the first eigenvector, the next eigenvector can be computed. In this case, the first eigenvector component must be subtracted from the input data, in the same way as in the simple-PCA [5]. Furthermore, the first eigenvector component must also be subtracted from the class-mean vectors h_j and from the initial vector for the next eigenvector.
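A minimal sketch of this deflation step, assuming the converged direction a has unit norm; the helper name and the per-class data layout are ours.

```python
import numpy as np

def deflate(a, class_means, class_data, a_init):
    """Remove the converged unit vector a from samples, class means, and the next initial vector."""
    means = [h - (h @ a) * a for h in class_means]
    data = [X - np.outer(X @ a, a) for X in class_data]
    a0 = a_init - (a_init @ a) * a
    return means, data, a0 / np.linalg.norm(a0)
```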
III. COMPUTER SIMULATION

In this paper, to demonstrate the effectiveness of the simple-FLDA, computer simulations for rotated coin recognition and two face recognition problems are carried out.
Fig. 3. Coin images. The top is a 500 won coin and the bottom is a 500 yen coin.

Table 1. Recognition accuracy (%) for coin recognition.
EVs   S-FLDA   S-PCA
1     73.4     60.0
2     98.1     84.4
3     98.1     88.4
4     100.0    90.0
5     100.0    96.6
10    100.0    95.6
20    100.0    95.3
A. Rotated coin recognition problem

We evaluated the feature generation property on the rotated coin recognition problem [10]. The coins used are a Japanese 500 yen coin and a South Korean 500 won coin, as shown in Fig.3. (Note that a new 500 yen coin has recently come into use in Japan.) Their heads and tails are also distinguished. The two coins have the same size, weight, and color. Therefore four kinds of coin images are recognized using a simple minimum distance classifier, which is the same as k-NN (nearest neighbor) with k=1. Each coin image is 200x200 pixels with 256 gray levels. The images are transformed into frequency spectra by a 32x32-point FFT (Fast Fourier Transform) in the polar coordinate system; the resulting 1,024-dimensional Fourier spectra are rotation invariant. Owing to the symmetry of the spectra, the data dimension actually used is 544. The eigenvectors are learned using 10 samples per class, and the remaining 40 samples of each class are used for evaluation, so the total number of test data is 160. The accumulated relevance exceeds 80% when 20 eigenvectors are obtained. These eigenvectors are used to generate features for the test examples: the dot product between each eigenvector and a test sample is a feature for coin recognition, and these features are classified by the minimum distance classifier. In Table 1, "EVs" indicates the number of eigenvectors, and the columns labeled S-FLDA and S-PCA give the classification accuracy (%) obtained with features generated by the simple-FLDA and the simple-PCA, respectively. As shown in Table 1, the recognition accuracy of the simple-FLDA is better than that of the simple-PCA.
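As an illustration of the classification step, the following hedged sketch projects samples onto the learned eigenvectors to obtain dot-product features and then applies the minimum distance (1-NN) rule described above; the function names and data layout are ours.

```python
import numpy as np

def project(eigvecs, samples):
    """Features = dot products of each sample with each learned eigenvector."""
    return samples @ eigvecs.T

def classify_min_distance(eigvecs, train_X, train_y, test_X):
    """Minimum distance (1-NN) classification in the projected feature space."""
    F_train = project(eigvecs, train_X)
    F_test = project(eigvecs, test_X)
    # Euclidean distance from every test feature vector to every training one
    dists = np.linalg.norm(F_test[:, None, :] - F_train[None, :, :], axis=2)
    return np.asarray(train_y)[np.argmin(dists, axis=1)]
```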
B. Face recognition

The first face recognition problem uses face images with expressions, as shown in Fig.4. Five facial expressions are used: neutral, laughing, surprise, anger, and sadness. Four subjects each provide two sets of the five expressions, so the total number of images is 40. The images are 100x100 pixels, converted to gray scale with 256 levels, and further compressed to 2,500 pixels; the length of the eigenvectors to be obtained is therefore 2,500. Personal identification was carried out on these images. The number of training images was 6 per subject, 24 in total, and the remaining 16 images were used for evaluation. The results are shown in Table 2. The accumulated relevance was 83% when the number of eigenvectors in PCA was 10. The features obtained using the eigenvectors are classified into five classes by the minimum distance classifier. As shown in Table 2, the recognition accuracy of the simple-FLDA is better than that of the simple-PCA.
Fig. 4. Face images with expressions.

Table 2. Recognition accuracy (%) for the first face recognition.
EVs   S-FLDA   S-PCA
1     52.4     47.6
2     60.7     42.9
3     62.9     49.5
4     60.5     50.4
5     63.9     57.3
10    60.3     55.9

Face image examples for the second problem are shown in Fig.5; they include images from the University of Oulu database. Each person has 6 images, which include faces with slight expressions. This is a personal identification experiment with images of 50 persons. The image size shown in Fig.5 is 100x100 pixels, which is reduced to 50x50 pixels in the experiments, so the length of the eigenvectors to be obtained is 2,500. The number of learning images per person for the eigenvectors is 5, and we evaluate the effectiveness of FLDA by leave-one-out cross-validation, so the number of trials is 6. The accumulated relevance is about 80% when the number of eigenvectors is 10 for the 10-person case and 15 for the others; we used these numbers of eigenvectors. The simulation results are shown in Table 3, where "No. of subjects" is the number of subjects used in the personal identification experiment. As shown in Table 3, S-FLDA is better than S-PCA in recognition accuracy. The number of iterations needed for convergence is 5 to 10 for the simple-PCA and 20 to 40 for the simple-FLDA, so the computation time for obtaining eigenvectors with the simple-FLDA is slightly longer than with the simple-PCA.

Fig. 5. Face images.

Table 3. Recognition accuracy (%) for the second face recognition.
No. of subjects   S-FLDA   S-PCA
10                98.3     96.7
20                100.0    97.5
30                100.0    98.3
40                99.2     98.3
50                99.7     98.7

Finally, we evaluated the weighting of eq.(7) in the simple-FLDA. The first and second terms in eq.(7) usually have the same importance. Introducing weights w_1 and w_2 gives

\Phi_n^k = w_1 \sum_{i=1}^{c} N_i\,\Phi(y_n, h_i) + w_2 \sum_{i=1}^{c}\sum_{j=1}^{N_i} \Phi_i(b_j, x_j)    (7')

where setting w_1 = w_2 recovers eq.(7), meaning that the original data distribution is maintained. Here we evaluated eq.(7') for varying values of the ratio w_1/w_2. Fig.6 shows the result obtained for the first face recognition problem. As shown in Fig.6, the recognition accuracy is essentially invariant to the ratio, whereas the number of iteration cycles needed to obtain the eigenvectors changes drastically. The appropriate value of the ratio is about 0.5 to 2.0, as shown in Fig.6. The same tendency is observed in the other problems.

Fig. 6. Recognition accuracy and the number of iteration cycles obtained for varying values of the ratio w_1/w_2.
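For completeness, the weighted combination in eq.(7') only changes how the two accumulated terms are merged before the normalization of eq.(8); a minimal sketch (names ours) is:

```python
import numpy as np

def combine_weighted(phi_between, phi_within, w1=1.0, w2=1.0):
    """Eq.(7'): weighted sum of the two accumulated terms, followed by the
    normalization of eq.(8); w1 == w2 recovers eq.(7)."""
    phi = w1 * phi_between + w2 * phi_within
    return phi / np.linalg.norm(phi)
```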
IV. CONCLUSION

In this paper, the simple-FLDA, which yields features suitable for pattern recognition, was presented. It is derived from a geometrical interpretation of FLDA. The method was applied to 2-class separation, rotated coin recognition, and face image recognition problems. The results of the computer simulations show that the feature generation property of the simple-FLDA is better than that of the simple-PCA. In contrast to the Fisherface approach, the simple-FLDA requires no matrix computation and causes no information loss.
ACKNOWLEDGMENT
This research was partially supported by SECOM Science and Technology Foundation.
REFERENCES
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons (1973).
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 711-720 (1997).
[3] T. D. Sanger, "Optimal Unsupervised Learning in a Single Layer Linear Feedforward Neural Network," Neural Networks, Vol. 2, No. 6, pp. 459-473 (1989).
[4] S. Y. Kung, Digital Neural Networks, Prentice-Hall (1993).
[5] M. Partridge and R. Calvo, "Fast dimensionality reduction and simple PCA," IDA, 2, pp. 292-298 (1997).
[6] M. Nakano, F. Yasukata, and M. Fukumi, "Recognition of Smiling Faces Using Neural Networks and SPCA," International Journal of Computational Intelligence and Applications, Vol. 4, No. 2, pp. 153-164 (2004).
[7] H. Takimoto, Y. Mitsukura, M. Fukumi, and N. Akamatsu, "A Feature Extraction Method for Personal Identification System by Using Real-Coded Genetic Algorithm," Proc. of 7th SCI 2003, Orlando, USA, Vol. IV, pp. 66-70 (2003).
[8] B. Scholkopf et al., "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Technical Report No. 44, Max-Planck-Institute, Germany (1996).
[9] M. H. Yang, "Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Kernel Methods," Proc. of Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 215-220, Washington, D.C. (2002).
[10] M. Fukumi, S. Omatu, and Y. Nishikawa, "Rotation Invariant Neural Pattern Recognition System Estimating a Rotation Angle," IEEE Trans. on Neural Networks, Vol. 8, No. 3, pp. 568-581 (1997).