Discriminant Analysis of Principal Components for Face Recognition

W. Zhao, R. Chellappa
Center for Automation Research, University of Maryland, College Park, MD 20742-3275

A. Krishnaswamy
Electrical Engineering Dept., Stanford University, Stanford, CA 94305
Abstract
In this paper we describe a face recognition method based on PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis). The method consists of two steps: first we project the face image from the original vector space to a face subspace via PCA, then we use LDA to obtain the best linear classifier. The basic idea of combining PCA and LDA is to improve the generalization capability of LDA when only a few samples per class are available. Using PCA, we construct a face subspace in which we apply LDA to perform classification. Using the FERET dataset, we demonstrate a significant improvement when principal components rather than original images are fed to the LDA classifier. The hybrid classifier using PCA and LDA also provides a useful framework for other image recognition tasks.
1 Introduction
The problem of automatic face recognition is a composite task that involves detection and location of faces in a cluttered background, normalization, recognition and verification. Depending on the nature of the application, e.g. the sizes of the training and testing databases, clutter and variability of the background, noise, occlusion, and finally speed requirements, some of these subtasks can be very challenging. Assuming that segmentation and normalization have been done, we focus on the subtask of person recognition and verification and demonstrate the performance using a testing database of about 3800 images. Many methods have been proposed for face recognition, and one of the key components of any such method is facial feature extraction. A facial feature could be a gray-scale image or a low-dimensional abstract feature vector, and it could be either global or local. There are two major approaches to facial feature extraction for recognition: holistic template-matching-based systems and geometrical local-feature-based schemes [1]. The algorithm we present belongs to the first category.

Partially supported by the ONR MURI contract N00014-95-1-0521, under ARPA order C635.
2 LDA of Principal Components face recognition system

2.1 PCA and LDA
Principal Component Analysis is a standard technique for approximating the original data with lower-dimensional feature vectors [2]. The basic approach is to compute the eigenvectors of the covariance matrix and approximate the original data by a linear combination of the leading eigenvectors. The mean square error (MSE) of the reconstruction is equal to the sum of the remaining eigenvalues. The feature vector here is the vector of PCA projection coefficients. PCA is appropriate when the samples are drawn from one class or group (superclass). LDA, on the other hand, produces an optimal linear discriminant function f(x) = W^T x which maps the input into the classification space, in which the class identity of the sample is decided based on some metric such as Euclidean distance. A typical LDA training is carried out via scatter matrix analysis [2]. We compute the within-class and between-class scatter matrices as follows:

S_w = \frac{1}{M} \sum_{i=1}^{M} \Pr(C_i) \Sigma_i \qquad (1)

S_b = \frac{1}{M} \sum_{i=1}^{M} \Pr(C_i) (m_i - m)(m_i - m)^T \qquad (2)

Here S_w is the within-class scatter matrix, showing the average scatter Σ_i of the sample vectors x of the different classes C_i around their respective means m_i:

\Sigma_i = E[(x - m_i)(x - m_i)^T \mid C = C_i] \qquad (3)

Similarly, S_b is the between-class scatter matrix, representing the scatter of the conditional mean vectors m_i around the overall mean vector m. Various measures are available for quantifying the discriminatory power [2], the commonly used one being

J(W) = \frac{\| W^T S_b W \|}{\| W^T S_w W \|} \qquad (4)

Here W is the optimal discriminating projection, which can be obtained by solving the generalized eigenvalue problem [10]:

S_b W = \lambda S_w W \qquad (5)

The distance measure used in matching can be a simple Euclidean distance or a weighted Euclidean distance. It has been suggested that the weighted Euclidean distance gives better classification than the simple one [6], where the weights are the normalized versions of the eigenvalues defined in (5). But it turns out that this weighted measure is sensitive to whether the corresponding persons have been seen during the training stage. To account for this, we devised a simple scheme to detect whether the person in the testing image has been trained on, and then use a weighted or a simple Euclidean distance respectively.
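As a concrete illustration of this training procedure, the following Python sketch computes the scatter matrices of equations (1)-(3) and solves the generalized eigenvalue problem (5). It is a minimal sketch under our own naming conventions, not the authors' implementation; NumPy and SciPy are assumed.

```python
# A minimal sketch of LDA training via scatter-matrix analysis, following
# eqs. (1)-(5). Variable names and the empirical priors are our choices.
import numpy as np
from scipy.linalg import eigh

def lda_fit(features, labels):
    """features: (n_samples, d) array; labels: (n_samples,) class ids."""
    classes = np.unique(labels)
    d = features.shape[1]
    m = features.mean(axis=0)                    # overall mean vector
    Sw = np.zeros((d, d))                        # within-class scatter, eq. (1)
    Sb = np.zeros((d, d))                        # between-class scatter, eq. (2)
    for c in classes:
        Xc = features[labels == c]
        prior = len(Xc) / len(features)          # Pr(C_i), estimated empirically
        mi = Xc.mean(axis=0)
        Sw += prior * np.cov(Xc, rowvar=False, bias=True)   # Sigma_i of eq. (3)
        diff = (mi - m)[:, None]
        Sb += prior * (diff @ diff.T)
    Sw /= len(classes)                           # the 1/M factors cancel in eq. (5)
    Sb /= len(classes)                           # but are kept here for fidelity
    # Generalized eigenvalue problem Sb W = lambda Sw W, eq. (5). Note that
    # eigh requires Sw to be positive definite -- precisely what fails in the
    # raw image space and motivates the PCA subspace of the next section.
    evals, evecs = eigh(Sb, Sw)
    order = np.argsort(evals)[::-1]              # eigh sorts ascending; reverse
    k = len(classes) - 1                         # rank of Sb is at most M - 1
    return evecs[:, order[:k]], evals[order[:k]]
```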
2.2 LDA of Principal Components
Both PCA and LDA have been used for face recognition [3, 4, 5, 6, 7]. With PCA, the input face images usually need to be warped to a standard face because of the large within-class variance [4, 5]. This preprocessing stage reduces the within-class variance dramatically, thus improving the recognition rate. We first built a simple system based on pure LDA [6], but the performance was not satisfactory on a large dataset of persons not present in the training set. Although the pure LDA algorithm has no trouble discriminating the trained samples, we have observed that it does not perform very well in the following three cases:

1. when the testing samples are from persons not in the training set;
2. when markedly different samples of trained classes are presented;
3. when samples with different backgrounds are presented.

Basically this is a generalization problem, since the pure LDA based system is very much tuned to the specific training set, which has as many classes as persons, with only 2 to 4 samples per class. Although this may be seen as a drawback, it actually works well for the verification task. As Pentland et al. [3] have observed, we also notice that face images occupy a face subspace. Thus it makes sense to compare face images only in the face subspace. By projecting the original images into this subspace, we hope to solve the above generalization problem to some extent. PCA is a conventional method for obtaining such a subspace, so our natural choice is to combine PCA and LDA, which has been previously explored by Weng et al. [7]. Combining PCA and LDA, we obtain a linear projection which maps the input image x first into the face subspace y, and then into the classification space z:

y = \Phi^T (x - m) \qquad (6)

z = W_y^T y \qquad (7)

z = W_x^T (x - m) \qquad (8)
where Φ is the PCA transform and W_y is the best linear discriminating transform in the PCA feature space, so that W_x^T = W_y^T Φ^T. After this composite linear projection, classification is performed in the classification space based on some distance measure criterion. One thing that needs to be pointed out is that in the implementation of PCA, in many cases, even though the covariance matrix C is a full-rank matrix, its large condition number will create numerical problems. One way around this is to compute the eigenvalues and eigenvectors of C + εI instead of C, where ε is a positive number. This is based on the following lemma:
Lemma 1. Matrices C and C + εI have the same eigenvectors but different eigenvalues, with the relationship λ_{C+εI} = λ_C + ε, as long as λ + ε is not equal to zero.
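The lemma is easy to verify numerically. The following self-contained snippet is our own illustration, with an arbitrary random symmetric matrix standing in for the covariance matrix C; it confirms that adding εI shifts every eigenvalue by ε while leaving the eigenvectors unchanged (up to sign).

```python
# A quick numerical check of Lemma 1 (our own illustration).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
C = A @ A.T                                      # symmetric PSD "covariance"
eps = 1e-3

w_C, v_C = np.linalg.eigh(C)
w_reg, v_reg = np.linalg.eigh(C + eps * np.eye(5))

print(np.allclose(w_reg, w_C + eps))             # eigenvalues shift by eps
print(np.allclose(np.abs(v_C), np.abs(v_reg)))   # same eigenvectors, up to sign
```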
3 Experiments
To process the face images, we manually locate the eyes, then perform geometric normalization with the eye locations fixed, and perform intensity normalization (histogram equalization or zero-mean-unit-variance). The normalized image size is chosen to be 48 × 42, since similar performance has been observed with an image size of 96 × 84 in our experiments. To obtain the principal components, we used 1038 FERET images from 444 classes. We then retained the eigenvectors corresponding to the top 300 eigenvalues, based on the observation that the higher-order eigenvectors do not look like faces (figures 7, 8). A wrong choice of this number results in bad performance: we tested the algorithm that performs LDA on principal components using the first 15 eigenvectors and using 1000 eigenvectors, on both the USC dataset and the Stirling dataset. Both choices produced lower scores, although the latter still did better than the pure LDA algorithm. Since an orthonormal linear projection can be viewed as a projection onto a set of bases, we can visualize these bases. Three different sets of bases from three different linear projections are shown here: (1) the pure LDA projection W (figure 5), (2) the pure PCA projection Φ (figure 7), and (3) the PCA + LDA projection W_x (figure 6). All these bases are computed using the FERET training set, the PCA + LDA bases being based on the first 300 PCA bases. Through this visualization, we can clearly see that subspace LDA provides a constrained version of pure LDA and thus generalizes better.
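For concreteness, the PCA stage can be sketched as follows. This is our own minimal implementation, not the authors' code: it builds the face subspace from flattened training images via an SVD (avoiding an explicit 2016 × 2016 covariance matrix for 48 × 42 images) and keeps the leading eigenvectors; the `project` helper then applies the composite projection of equations (6)-(8), with W_y produced by an LDA step such as the `lda_fit` sketch above.

```python
# A minimal sketch of the PCA stage (our own code, not the authors').
import numpy as np

def pca_fit(images, n_components=300):
    """images: (n_samples, h*w) array of flattened, normalized face images."""
    mean_face = images.mean(axis=0)
    X = images - mean_face
    # Rows of Vt are eigenvectors of the covariance X^T X / n ("eigenfaces");
    # the corresponding eigenvalues are lambda_i = s_i**2 / n.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Phi = Vt[:n_components].T                # (h*w, n_components) PCA basis
    eigenvalues = s[:n_components] ** 2 / len(images)
    return mean_face, Phi, eigenvalues

def project(x, mean_face, Phi, W_y):
    """Composite projection of eqs. (6)-(8): image -> face subspace ->
    classification space."""
    y = Phi.T @ (x - mean_face)              # eq. (6)
    return W_y.T @ y                         # eqs. (7)-(8)
```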
3.1 Our experiments
All the experiments conducted here are similar to the FERET test: we have a gallery set and a probe set. In the prototyping stage, the weights that characterize the projections of the images in the gallery set are computed. In the testing stage, the weights that characterize the projections of the images in the probe set are calculated. Using these weights and the nearest-neighbor criterion, a rank ordering of all the images in the gallery set is produced for each image in the probe set. The cumulative match score in figure 2 is computed the same way as in the FERET test [9].
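The following sketch (again our own, with hypothetical names) shows how the nearest-neighbor rank ordering and the cumulative match score can be computed from projected gallery and probe vectors, following the FERET methodology [9].

```python
# Cumulative match score: the fraction of probes whose correct identity
# appears within the top r ranked gallery images, for each rank r.
import numpy as np

def cumulative_match_score(gallery_z, gallery_ids, probe_z, probe_ids,
                           max_rank=800):
    """gallery_z, probe_z: (n, k) arrays of classification-space vectors;
    gallery_ids, probe_ids: class labels. Returns CMS for ranks 1..max_rank."""
    gallery_ids = np.asarray(gallery_ids)
    hits = np.zeros(max_rank)
    for z, pid in zip(np.asarray(probe_z), np.asarray(probe_ids)):
        dists = np.linalg.norm(gallery_z - z, axis=1)   # simple Euclidean metric
        ranked = gallery_ids[np.argsort(dists)]         # gallery sorted by distance
        first = np.nonzero(ranked == pid)[0]
        if first.size and first[0] < max_rank:
            hits[first[0]:] += 1     # a hit at rank r counts for all ranks >= r
    return hits / len(probe_z)       # fraction of probes identified by each rank
```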
3.1.1 Comparison of LDA and LDA of Principal Components
To test our system, we constructed a gallery set containing 738 images, 721 from the FERET training set and 17 from the USC dataset [8]. The probe set has 115 images, of which 78 are in the training set, 18 are from the FERET dataset but not trained, and 19 are from the USC dataset. For the 78 trained images, both systems work perfectly even though most of these images do not appear in the gallery set. But for the other 18 and 19 images from the FERET and USC datasets, the performance of the two methods is quite different. Figure 2 shows the performance comparison between pure LDA with different intensity preprocessing and LDA of principal components with histogram-equalization preprocessing.
3.1.2 Sensitivity test of LDA of Principal Components

In addition to the above experiments, we also conducted a sensitivity test of our system. We took one original face image and then electronically modified it by creating occlusions, applying Gaussian blur, randomizing the pixel locations, and adding an artificial background. Figure 3 shows the various electronically-modified face images, all of which were correctly identified.
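For reference, these electronic modifications are simple to reproduce. The sketch below is our own code with guessed parameter values, not those used in the paper, and omits the artificial-background modification; it implements occlusion, Gaussian blur, and pixel-location randomization on a grayscale image array.

```python
# Our own sketch of three of the electronic modifications used in the
# sensitivity test; parameters are illustrative guesses.
import numpy as np
from scipy.ndimage import gaussian_filter

def occlude(img, top, left, height, width):
    out = img.copy()
    out[top:top + height, left:left + width] = 0     # black occluding patch
    return out

def blur(img, sigma=2.0):
    return gaussian_filter(img, sigma=sigma)         # Gaussian blur

def randomize_pixels(img, fraction=0.1, seed=0):
    rng = np.random.default_rng(seed)
    flat = img.reshape(-1).copy()
    idx = rng.choice(flat.size, int(fraction * flat.size), replace=False)
    flat[idx] = flat[rng.permutation(idx)]           # shuffle the chosen pixels
    return flat.reshape(img.shape)
```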
3.2 FERET test
Although we were not among the participants in the FERET program, we agreed to take the FERET test in September 1996 to test the efficacy of the pure LDA approach. The gallery and probe datasets had 3323 and 3816 images respectively; thus for each image in the probe set we produced a rank ordering of the 3323 images in the gallery set. A detailed description of the FERET test can be found in [9]. In March 1997, we retook the FERET test to study the effect of different intensity preprocessing on LDA and to measure the improvement due to LDA of principal components. Figure 4 shows a significant improvement of the LDA-of-principal-components approach over LDA in every category.¹ More recently, preliminary results show that our system's performance on the task of person verification is very competitive.
4 Conclusions
We have presented in this paper a face recognition system that combines PCA and LDA. The performance improvement of this method over a pure LDA based method is demonstrated through our own experiments and the FERET test. We believe that by combining PCA and LDA, using PCA to construct a task-specific subspace and then applying LDA in that subspace, other image recognition systems, such as fingerprint and optical character recognition, can be improved. We will study the subspace-LDA approach in detail and explore possible applications in future work.

¹ Even though the zero-mean-unit-variance preprocessing showed better results for the pure LDA approach than histogram equalization in the experiment reported in figure 2, the FERET test showed inferior performance. The plots here are only for the histogram-equalization preprocessing case.
Acknowledgments
We would like to thank Saad Sirohey for participating in the FERET test, and Hyeonjoon Moon for providing us with the FERET test plots.
References
[1] R. Chellappa, C.L. Wilson and S. Sirohey, "Human and Machine Recognition of Faces: A Survey," Proc. of the IEEE, Vol. 83, pp. 705-740, 1995.
[2] K. Fukunaga, Statistical Pattern Recognition, New York: Academic Press, 1989.
[3] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, pp. 72-86, 1991.
[4] B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Detection," in Proc. International Conference on Computer Vision, Boston, MA, 1995, pp. 786-793.
[5] N. Costen, I. Craw, T. Kato, G. Robertson and S. Akamatsu, "Manifold Caricatures: On the Psychological Consistency of Computer Face Recognition," in Proc. Second International Conference on Automatic Face and Gesture Recognition, Killington, Vermont, 1996, pp. 4-9.
[6] K. Etemad and R. Chellappa, "Discriminant Analysis for Recognition of Human Face Images," Journal of the Optical Society of America A, pp. 1724-1733, Aug. 1997.
[7] D.L. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval," IEEE Trans. on PAMI, Vol. 18, pp. 831-836, Aug. 1996.
[8] B.S. Manjunath, R. Chellappa and C.v.d. Malsburg, "A Feature Based Approach to Face Recognition," in Proc. of Computer Vision and Pattern Recognition, Urbana-Champaign, Illinois, 1992, pp. 373-378.
[9] P.J. Phillips, H. Moon, P. Rauss, and S.A. Rizvi, "The FERET Evaluation Methodology for Face-Recognition Algorithms," in Proc. of Computer Vision and Pattern Recognition, Puerto Rico, 1997, pp. 137-143.
[10] S.S. Wilks, Mathematical Statistics, New York: Wiley, 1963.
[Figure 1: The generalized LDA face recognition system. The diagram shows gallery and probe images passing through photometric and geometric preprocessing, a PCA projection, and an LDA projection, with matching by either a simple or a weighted Euclidean metric.]
[Figure 2: Cumulative match score versus rank (0-800); solid: PCA+LDA (HIST), dashed: LDA (ZMUV), dotted: LDA (HIST). Panel titles: "gallery trained, probe not trained" and "gallery not trained, probe not trained". (a) Performance comparison on the 19 images from the USC dataset; (b) performance comparison on the 18 images from the FERET dataset not included in the training set.]
[Figure 3: The original image and the electronically-modified images which have been correctly identified.]
[Figure 4: FERET test results from September 96 and March 97, plotted as cumulative match score versus rank (0-100) for LDA (Sep 96) and PCA_LDA (Mar 97): (a) FA vs FB, (b) FA vs FC, (c) Duplicate, (d) Duplicate (images taken at least one year apart). (Courtesy of Army Research Lab)]
[Figure 5: The first five pure LDA bases.]

[Figure 6: The first five PCA + LDA bases.]

[Figure 7: Useful eigenfaces: the average face and the first four eigenfaces; eigenfaces 15, 100, 200, 250, 300.]

[Figure 8: Suspicious eigenfaces (eigenfaces 400, 450, 1000, 2000): statistically insignificant.]