Feature Generation by Simple FLD

Minoru Fukumi 1 and Yasue Mitsukura 2

1 University of Tokushima, Dept. of Information Science and Intelligent Systems, 2-1, Minami-Josanjima, 770-8506, Japan
[email protected]
2 Okayama University, Faculty of Education, 3-1-1, Tsushima-Naka, Okayama, 700-8530, Japan
[email protected]

Abstract. This paper presents a new algorithm for feature generation, derived approximately from a geometrical interpretation of Fisher linear discriminant (FLD) analysis. In the fields of pattern recognition and signal processing, principal component analysis (PCA) is often used for data compression and feature extraction, and iterative learning algorithms for obtaining eigenvectors have been presented for pattern recognition and image analysis. Their effectiveness, in terms of computational time and recognition accuracy, has been demonstrated in many applications. Recently, however, FLD analysis has also been used in this field, especially in face image analysis. The drawback of FLD is the long computational time required to process the large between-class and within-class covariance matrices. FLD also requires minimization of the within-class variance, but the inverse of the within-class covariance matrix cannot be obtained when the data dimension is higher than the number of samples, because the matrix then has many zero eigenvalues. To overcome this difficulty, a new iterative feature generation method, the simple FLD, is introduced and its effectiveness is demonstrated.

1 Introduction

In statistical pattern recognition [1], principal component analysis (PCA), Fisher linear discriminant (FLD) analysis, and factor analysis have been widely used, and their effectiveness has been demonstrated in many applications, especially in face recognition [2]. Recently, many new algorithms have been presented in the fields of statistical pattern recognition and neural networks [3]-[5]. In particular, the simple PCA is a simple and fast learning algorithm whose effectiveness has been demonstrated in face information processing [6][7]. Furthermore, extensions to higher-order nonlinear spaces have been carried out [8][9]. In [2], Eigenface and Fisherface were compared, and the effectiveness of Fisherface was shown by computer simulations. However, face images are large compared to the number of image samples, and therefore the within-class covariance matrix has many zero eigenvalues [1]. The method thus cannot yield Fisherface directly. In this case, PCA is usually applied first to compress the data dimension, and FLD is then used to obtain eigenvectors. This process requires a huge amount of matrix computation. FLD is usually better than PCA as a feature generator for pattern recognition [1][2][9], but the zero eigenvalues of the within-class covariance matrix make FLD analysis difficult. On the other hand, simple iterative algorithms for achieving PCA have been proposed [3]-[5]. These algorithms operate on the data directly instead of on matrices and are easy to implement. Notice that PCA is based on the distribution of all data and is not necessarily effective for pattern classification, although it has been used in many pattern recognition problems. In this paper, a simple algorithm to achieve FLD analysis is presented. It is a simple iterative algorithm and does not use any matrix computation. The algorithm is derived from a geometrical interpretation of the maximization of between-class variance and the minimization of within-class variance. The matrix formulation of FLD analysis leads to an eigenvalue problem, which requires solving a matrix equation; this is difficult when the matrix is large. Our approach does not require matrix computations. We call it the simple-FLD or simple-FLDA. Computer simulations demonstrate that this algorithm has better feature generation properties than PCA.

2 Simple-FLD

The simple-FLD (simple-FLDA) is derived to satisfy the FLD properties, namely maximization of the between-class variance and minimization of the within-class variance. In FLD analysis these properties are expressed as a ratio, which leads to an eigenvalue problem solved by matrix computation, including matrix inversion. To avoid the matrix calculation of the eigenvalue problem, the simple-PCA was presented [5]. The simple-PCA is a simple iterative algorithm that obtains eigenvectors without matrix calculation. Its iterative learning is carried out using all input data, and its objective is to maximize the data variance. After convergence, the first eigenvector is obtained. The second eigenvector is then obtained by the same procedure after subtracting the first eigenvector component from the data, and the third, fourth, and so on follow in the same way.

2.1 Maximization of Between-Class Variance

In this paper, FLD is derived in a simple form. First, maximization of the between-class variance is introduced by geometrical interpretation. The mean vector h_j of each class is computed; in Fig. 1, h_j is shown as a black dot. The mean of all data is assumed to be zero. The class-mean vectors are treated as the input vectors of the simple-PCA, so the process of maximizing their variance is the same as in the simple-PCA. The maximization is carried out as follows.

$$y_n = (\mathbf{a}_n^k)^{\mathsf T}\,\mathbf{h}_j \qquad (1)$$

$$\Phi(y_n, \mathbf{h}_j) = \begin{cases} \ \ \mathbf{h}_j & \text{if } y_n \ge 0 \\ -\mathbf{h}_j & \text{otherwise} \end{cases} \qquad (2)$$

In Fig. 1, the dotted line shows the hyperplane on which eq. (1) equals 0. Through this process, a_n^k is expected to converge to the n-th eigenvector, where a_n^k denotes the estimate of the n-th eigenvector after the k-th iteration. In principle this is the same as the simple-PCA. If the number of classes is c, at most (c-1) eigenvectors are therefore meaningful. However, since this is an approximate iterative algorithm, a few more eigenvectors may be derived to account for numerical error.

Fig. 1. Maximization of between-class variance.
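As an illustration, the following is a minimal Python sketch of this between-class step; the function and variable names are our own, not from the paper. It projects each class-mean vector (with the overall mean already removed) onto the current estimate a_n^k and accumulates h_j or -h_j according to eq. (2).

```python
import numpy as np

def between_class_step(a, class_means):
    """Accumulate the between-class contribution of eqs. (1)-(2).

    a           : current estimate of the n-th eigenvector, shape (d,)
    class_means : class-mean vectors h_j (overall mean removed), shape (c, d)
    Returns the sum of the h_j reflected onto the positive side of a.
    """
    phi = np.zeros_like(a)
    for h in class_means:
        y = a @ h                      # eq. (1): projection onto a_n^k
        phi += h if y >= 0 else -h     # eq. (2): reflect onto the positive side
    return phi
```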

2.2 Minimization of Within-Class Variance

Next, a method for minimizing the within-class variance is introduced. In Fig. 2, a data vector x_j in a class is shown as a black dot; x_j has zero mean within its class. Consider the relationship between x_j and an arbitrary vector a_n^k that should converge to an eigenvector. The direction that minimizes the length of the projection of x_j is the direction orthogonal to x_j. Therefore such orthogonal directions are computed and summed to minimize the within-class variance. Let b_j be the direction obtained by subtracting the x_j component from a_n^k. This is easily computed as

$$\mathbf{b}_j = \mathbf{a}_n^k - (\hat{\mathbf{x}}_j \cdot \mathbf{a}_n^k)\,\hat{\mathbf{x}}_j \qquad (3)$$

where

$$\hat{\mathbf{x}}_j = \frac{\mathbf{x}_j}{\|\mathbf{x}_j\|} \qquad (4)$$

The actual learning quantity is given as

$$\Phi_i(\mathbf{b}_j, \mathbf{x}_j) = \|\mathbf{x}_j\|\,\frac{\mathbf{b}_j}{\|\mathbf{b}_j\|} \qquad (5)$$

where b_j is normalized to unit length and then weighted by the length of x_j. In Fig. 2, b_j has the same direction as the dotted line perpendicular to x_j. During learning, this quantity is summed and averaged over all sample vectors in the same class. The weighting by the vector length of x_j gives a larger influence to samples with larger norms. The accumulated direction is thus expected to converge to a direction that minimizes the within-class variance.
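The within-class contribution of eqs. (3)-(5) can be sketched in Python as follows; this is a minimal illustration with our own names, it assumes the samples are already centered on their class mean, and the per-class averaging mentioned above is omitted so that the result matches the plain sum used in eq. (6) below.

```python
import numpy as np

def within_class_step(a, class_samples, eps=1e-12):
    """Accumulate the within-class contribution of eqs. (3)-(5).

    a             : current estimate of the n-th eigenvector, shape (d,)
    class_samples : samples x_j of one class, centered on the class mean, shape (N_i, d)
    Returns the weighted sum of directions orthogonal to each x_j.
    """
    phi = np.zeros_like(a)
    for x in class_samples:
        x_hat = x / (np.linalg.norm(x) + eps)                      # eq. (4)
        b = a - (x_hat @ a) * x_hat                                # eq. (3): remove the x_j component
        phi += np.linalg.norm(x) * b / (np.linalg.norm(b) + eps)   # eq. (5)
    return phi
```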

$$\Phi_n^k = \sum_{i=1}^{c} \Phi(y_n, \mathbf{h}_i) + \sum_{i=1}^{c}\sum_{j=1}^{N_i} \Phi_i(\mathbf{b}_j, \mathbf{x}_j) \qquad (6)$$

$$\mathbf{a}_n^{k+1} = \frac{\Phi_n^k}{\|\Phi_n^k\|} \qquad (7)$$

where c is the number of classes and N_i is the number of data in class i. As shown in eq. (6), the maximization of the between-class variance and the minimization of the within-class variance are achieved simultaneously, and the corresponding eigenvector is obtained after convergence.

Fig. 2. Minimization of within-class variance.
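Putting the two contributions together, a complete simple-FLD iteration per eqs. (1)-(7) might look like the Python sketch below. This is only a minimal reading of the method with our own names: the initialization is random (the paper only says that different initial values led to similar eigenvectors), and the deflation used to obtain subsequent vectors follows the component subtraction described for the simple-PCA in Sec. 2.

```python
import numpy as np

def simple_fld(X, y, n_vectors, n_iters=20, eps=1e-12, seed=0):
    """Minimal sketch of the simple-FLD iteration, eqs. (1)-(7).

    X : data matrix, shape (N, d); y : class labels, shape (N,)
    Returns an (n_vectors, d) array of estimated discriminant vectors.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                                            # overall mean set to zero
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])       # h_i
    centered = [X[y == c] - X[y == c].mean(axis=0) for c in classes]  # within-class x_j

    vectors = []
    for _ in range(n_vectors):
        a = rng.standard_normal(X.shape[1])
        a /= np.linalg.norm(a)
        for _ in range(n_iters):
            phi = np.zeros_like(a)
            for h in means:                        # between-class term, eqs. (1)-(2)
                phi += h if (a @ h) >= 0 else -h
            for Xi in centered:                    # within-class term, eqs. (3)-(5)
                for x in Xi:
                    x_hat = x / (np.linalg.norm(x) + eps)
                    b = a - (x_hat @ a) * x_hat
                    phi += np.linalg.norm(x) * b / (np.linalg.norm(b) + eps)
            a = phi / (np.linalg.norm(phi) + eps)  # eq. (7)
        vectors.append(a)
        # Deflation (assumed, following the simple-PCA description in Sec. 2):
        # remove the found component before estimating the next vector.
        means = means - np.outer(means @ a, a)
        centered = [Xi - np.outer(Xi @ a, a) for Xi in centered]
    return np.array(vectors)
```

The default of 20 iterations matches the 10 to 20 iterations reported for the simulations in Sec. 3.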

3 Computer Simulation

To show the effectiveness of the simple-FLD, computer simulations are carried out for a simple 2-class separation problem and a rotated coin recognition problem. An example of the data distribution for the 2-class separation is shown in Fig. 3. Data of class 1 are generated by

$$y = 0.5x - 0.5 + U(-0.01, 0.01) \qquad (8)$$

where x is a uniform random number, and are illustrated as black dots in Fig. 3. Data of class 2 are generated by

$$y = 2x + 1.0 + U(-0.01, 0.01) \qquad (9)$$

Fig. 3. 2-class separation problem in a 2-dimensional space.

In Fig. 3, data of class 2 are illustrated as squares. U(-0.01, 0.01) denotes a uniform random number between -0.01 and 0.01. There are 200 data points for each class. These data are classified using the simple-PCA and the simple-FLD, each followed by a minimum distance classifier. A new feature is generated as the inner product (projection) between the eigenvector and the data; the eigenvector has length 1. The results are illustrated in Figs. 4 and 5; each figure shows the obtained eigenvector and the projection of the data onto it. The number of learning iterations of the simple-PCA and the simple-FLD is about 10 to 20. The inner product of the two eigenvectors is 0.0044, so they are almost orthogonal. As shown in Figs. 4 and 5, the features obtained by the simple-FLD achieve almost 100% classification accuracy with only a simple threshold, whereas the features obtained by the simple-PCA are difficult to classify by thresholding because the distributions of the two classes overlap. We tested the convergence property by changing the initial values of the eigenvector and obtained a similar eigenvector each time. Furthermore, a matrix-type linear discriminant analysis was computed for the same data; its eigenvector pointed in almost the same direction as that of the simple-FLD.
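A minimal Python sketch of this 2-class experiment is given below. The function names are ours, the range of x is not stated in the paper and is assumed to be U(0, 1), and the projection-plus-threshold evaluation is written for an arbitrary unit vector rather than for the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_two_class_data(n=200):
    """Generate the 2-class data of eqs. (8)-(9); x ~ U(0, 1) is an assumption."""
    x1 = rng.uniform(0.0, 1.0, n)
    c1 = np.column_stack([x1, 0.5 * x1 - 0.5 + rng.uniform(-0.01, 0.01, n)])  # eq. (8)
    x2 = rng.uniform(0.0, 1.0, n)
    c2 = np.column_stack([x2, 2.0 * x2 + 1.0 + rng.uniform(-0.01, 0.01, n)])  # eq. (9)
    return c1, c2

def threshold_accuracy(a, c1, c2):
    """Project both classes onto the unit vector a and classify by the sign of
    the projection (threshold 0 after removing the overall mean)."""
    data = np.vstack([c1, c2])
    data = data - data.mean(axis=0)
    scores = data @ (a / np.linalg.norm(a))
    labels = np.r_[np.zeros(len(c1)), np.ones(len(c2))]
    pred = (scores < 0).astype(int)     # the sign convention is arbitrary
    acc = (pred == labels).mean()
    return max(acc, 1.0 - acc)          # report the better of the two sign assignments
```

For instance, `threshold_accuracy(np.array([0.705755, -0.708456]), *make_two_class_data())` evaluates the simple-FLD direction reported in Fig. 5 under these assumptions.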

Next, we evaluated the feature generation property on a rotated coin recognition problem [10]. The coins are a Japanese 500 yen coin and a South Korean 500 won coin, which have the same size, weight, and color. Their heads and tails are also distinguished, so four kinds of coin images are recognized using the simple minimum distance classifier. The eigenvectors are learned using 10 samples per class, and the remaining 40 samples per class are used for evaluation; the total numbers of learning and test data are therefore 40 and 160, respectively. The coin images are transformed by a 32x32-point FFT, because they are randomly rotated and rotation-invariant features are therefore needed. The data dimension is 544 owing to the symmetry of the spectra. The accumulated relevance exceeds 80% when the number of eigenvectors is 20. In Table 1, EV is the number of eigenvectors; the rows labeled S-FLD and S-PCA give the classification accuracy obtained with features generated by the simple-FLD and the simple-PCA, respectively.
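As a rough illustration of the dimension bookkeeping only, the sketch below maps a 32x32 image to 544 FFT magnitude coefficients by keeping one half of the spectrum (the other half is its complex conjugate, so 32 x 17 = 544). How rotation invariance is actually obtained in [10] is not detailed in this paper, so this should be read as an assumed preprocessing sketch, not the authors' exact feature extraction.

```python
import numpy as np

def fft_half_spectrum_features(img32):
    """Sketch: 32x32 image -> 544 magnitude-spectrum coefficients.
    Only half of the spectrum is kept, by conjugate symmetry of the FFT
    of a real-valued image (32 rows x 17 columns = 544 values)."""
    assert img32.shape == (32, 32)
    spectrum = np.fft.rfft2(img32)      # shape (32, 17) for real input
    return np.abs(spectrum).ravel()     # 544 features
```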

Fig. 4. A resulting eigenvector obtained by the simple-PCA. The vector direction is (0.7115678, 0.702618).

Fig. 5. A resulting eigenvector obtained by the simple-FLD. The vector direction is (0.705755, -0.708456).

Table 1. Recognition accuracy (%) for coin recognition.

EV      1      2      3      4      5      10     20
S-FLD   73.4   98.1   98.1   100.0  100.0  100.0  100.0
S-PCA   60.0   84.4   88.4   90.0   96.6   95.6   95.3

As shown in Table 1, the recognition accuracy of the simple-FLD is better than that of the simple-PCA. The simple-FLD achieves 100% recognition accuracy even with the minimum distance classifier; more sophisticated classifiers could improve the accuracy further.

4 Conclusion

We presented the simple-FLD to generate features suitable for pattern recognition. It is derived from a geometrical interpretation of FLD analysis. The method was applied to a 2-class separation problem and a rotated coin recognition problem. The computer simulation results show that the feature generation property of the simple-FLD is better than that of the simple-PCA.

References

1. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons (1973)
2. P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 711-720 (1997)
3. T. D. Sanger, "Optimal Unsupervised Learning in a Single Layer Linear Feedforward Neural Network," Neural Networks, Vol. 2, No. 6, pp. 459-473 (1989)
4. S. Y. Kung, Digital Neural Networks, Prentice-Hall (1993)
5. M. Partridge and R. Calvo, "Fast Dimensionality Reduction and Simple PCA," IDA, 2, pp. 292-298 (1997)
6. M. Nakano, F. Yasukata, and M. Fukumi, "Recognition of Smiling Faces Using Neural Networks and SPCA," International Journal of Computational Intelligence and Applications, Vol. 4, No. 2, pp. 153-164 (2004)
7. H. Takimoto, Y. Mitsukura, M. Fukumi, and N. Akamatsu, "A Feature Extraction Method for Personal Identification System by Using Real-Coded Genetic Algorithm," Proc. of 7th SCI 2003, Orlando, USA, Vol. IV, pp. 66-70 (2003)
8. B. Scholkopf et al., "Nonlinear Component Analysis as a Kernel Eigenvalue Problem," Technical Report No. 44, Max-Planck-Institute, Germany (1996)
9. M. H. Yang, "Kernel Eigenfaces vs. Kernel Fisherfaces: Face Recognition Using Kernel Methods," Proc. of Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 215-220, Washington, D.C. (2002)
10. M. Fukumi, S. Omatu, and Y. Nishikawa, "Rotation Invariant Neural Pattern Recognition System Estimating a Rotation Angle," IEEE Trans. on Neural Networks, Vol. 8, No. 3, pp. 568-581 (1997)