A Hybrid Color and Frequency Features Method for Face Recognition

Zhiming Liu and Chengjun Liu

Abstract—This correspondence presents a novel hybrid Color and Frequency Features (CFF) method for face recognition. The CFF method, which applies an Enhanced Fisher Model (EFM), extracts the complementary frequency features in a new hybrid color space for improving face recognition performance. The new color space, the RIQ color space, which combines the R component image of the RGB color space and the chromatic components I and Q of the YIQ color space, displays prominent capability for improving face recognition performance due to the complementary characteristics of its component images. The EFM then extracts the complementary features from the real part, the imaginary part, and the magnitude of the R image in the frequency domain. The complementary features are then fused by means of concatenation at the feature level to derive similarity scores for classification. The complementary feature extraction and feature level fusion procedure applies to the I and Q component images as well. Experiments on the Face Recognition Grand Challenge (FRGC) version 2 Experiment 4 show that i) the hybrid color space improves face recognition performance significantly, and ii) the complementary color and frequency features further improve face recognition performance.

Index Terms—Enhanced Fisher Model (EFM), Face Recognition Grand Challenge (FRGC), the RIQ color space.

I. INTRODUCTION

Face recognition has become a very active research area in pattern recognition and computer vision due to its broad applications in human-computer interaction, security, law enforcement, and entertainment [2], [3], [9], [10], [12], [13]. The recent Face Recognition Grand Challenge (FRGC) program reveals that uncontrolled illumination conditions pose grand challenges to face recognition performance [14]. Traditional face recognition methods, such as the Eigenfaces method [17] and the Fisherfaces method [1], have difficulties in tackling these grand challenge problems. Recent research reveals that color information and frequency features derived by means of the Discrete Fourier Transform (DFT) help improve face recognition performance [6], [8], [15].

This correspondence presents a novel hybrid Color and Frequency Features (CFF) method for face recognition. A new color space (RIQ), which combines the R component image of the RGB color space and the chromatic components I and Q of the YIQ color space, displays prominent capability for improving face recognition performance due to the complementary characteristics of its component images. An Enhanced Fisher Model (EFM) [11] then extracts the complementary features from the real part, the imaginary part, and the magnitude of the R image in the frequency domain. Note that in the frequency domain, a frequency set selection method is applied to derive the complementary frequency features that have different discriminating power for face recognition.

Manuscript received July 19, 2007; revised July 2, 2008. Current version published September 10, 2008. This work was supported in part by Grants 2006-IJ-CX-K033 and 2007-RG-CX-K011 awarded by the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. The opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect those of the Department of Justice. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Dan Schonfeld. The authors are with the Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102 USA (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2008.2002837

Fig. 1. Example color component images in the RGB and the YIQ color spaces.

The complementary features are then fused by means of concatenation at the feature level to derive similarity scores for classification. The complementary feature extraction and feature level fusion procedure applies to the I and Q component images as well. The similarity scores derived from the R, I, and Q images are finally fused by means of a weighted summation at the decision level for face recognition. Experimental results on the FRGC version 2 Experiment 4 show that the CFF method achieves a face verification rate (ROC III) of 80.3% at a false accept rate of 0.1%, compared with the FRGC baseline face verification rate of 11.86% at the same false accept rate.

II. HYBRID COLOR SPACE: RIQ

A color image in the RGB color space consists of the red, green, and blue component images. Other color spaces are calculated from the RGB color space by means of either linear or nonlinear transformations. The complementary characteristics of color spaces can be applied to improve face recognition performance [9], [15]. Our research reveals that fusing features across color spaces can enhance the discriminating power of the hybrid color features. As the R component image in the RGB color space is more effective than the luminance Y [15], we define a new hybrid color space, the RIQ color space, where R is from the RGB color space and I and Q are from the YIQ color space. Fig. 1 shows some example color component images in the RGB and the YIQ color spaces: the top row shows the R, G, and B component images in the RGB color space, and the bottom row displays the Y, I, and Q component images in the YIQ color space.

Our hybrid Color and Frequency Features (CFF) method extracts the complementary frequency features in the new hybrid RIQ color space for improving face recognition performance. Fig. 2 shows the system architecture of the CFF method. First, the R, I, and Q component images in the RIQ color space are derived from the RGB color space. Second, the EFM extracts the complementary frequency features using different masks (see Fig. 4) in the frequency domain from the R, I, and Q component images, respectively. Third, the complementary features are fused (by means of concatenation) at the feature level to derive similarity scores for classification. Finally, the similarity scores derived from the R, I, and Q images are fused together (through a weighted summation) at the decision level for face recognition.

III. HYBRID COLOR AND FREQUENCY FEATURES (CFF) METHOD

This section presents our color and frequency features method, or the CFF method, which derives the complementary features in the frequency domain of the component images in the hybrid color space, RIQ.
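To make the color-space construction in Section II concrete, the sketch below converts an RGB image into its R, I, and Q component images. The correspondence does not list the transform coefficients, so the standard NTSC YIQ coefficients are used here as an assumption:

```python
# A minimal sketch of the RGB -> RIQ conversion (Section II). The I and Q
# coefficients below are the standard NTSC YIQ values, assumed rather than
# taken from the paper.
import numpy as np

def rgb_to_riq(rgb):
    """Convert an H x W x 3 float RGB image to its R, I, Q component images."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Chromatic components of the NTSC YIQ color space.
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    # The hybrid RIQ space keeps R from RGB and I, Q from YIQ.
    return r, i, q
```

Only the two chromatic rows of the YIQ transform are needed here, since the hybrid space replaces the luminance Y with the R component image.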


Fig. 2. System architecture of the CFF method.

Fig. 3. Two-dimensional discrete Fourier transform of a face image: the real part (log plot), the imaginary part, and the magnitude (log plot). The gray area defines a mask, which is used to extract the face information in the Fourier domain.

Fig. 4. Frequency pattern vectors by means of frequency set selection in the hybrid color space.

A. Frequency Set Selection in the Hybrid Color Space

The Fourier transform converts an image from the spatial domain to the frequency domain, where the image is decomposed into a combination of various frequencies. Applying this technique, one can extract the salient image properties in the frequency domain that are often not available in the spatial domain. Fig. 3 shows the 2-D discrete Fourier transform of a face image: the real part (log plot), the imaginary part, and the magnitude (log plot). The last image in Fig. 3 defines a mask (the gray area), which is used to extract the face information in the Fourier domain.

Before extracting the hybrid color and frequency features, we perform data reduction by means of frequency set selection on the real part, the imaginary part, and the magnitude in the frequency domain. Fig. 3 shows a mask of size $n \times 2n$, $\mathbf{M}_{n \times 2n}$, which is defined as the gray subregion that extracts the frequency features of the real part, the imaginary part, and the magnitude from the right two quadrants. As the face images in our research have a spatial resolution of $64 \times 64$, the real part, the imaginary part, and the magnitude in the frequency domain have the same resolution of $64 \times 64$.

In the hybrid color space, the R, I, and Q component images apply different masks to extract their frequency features. In particular, Fig. 4 shows that the R component image first applies two masks, $\mathbf{M}_{8 \times 16}$ and $\mathbf{M}_{32 \times 64}$, to extract the frequency features from the real and the imaginary parts, and then utilizes the mask $\mathbf{M}_{32 \times 64}$ to extract the frequency features from the magnitude in the frequency domain. The frequency features extracted corresponding to these masks are $\mathbf{X}^{R}_{r,8\times16}$, $\mathbf{X}^{R}_{i,8\times16}$, $\mathbf{X}^{R}_{r,32\times64}$, $\mathbf{X}^{R}_{i,32\times64}$, and $\mathbf{X}^{R}_{m,32\times64}$, respectively. After reshaping and concatenating the real and imaginary features, we have three frequency pattern vectors resulting from the R component image: $\mathbf{X}^{R}_{ri,256\times1}$, $\mathbf{X}^{R}_{ri,4096\times1}$, and $\mathbf{X}^{R}_{m,2048\times1}$. Fig. 4 also reveals that the I component image first applies three masks, $\mathbf{M}_{32 \times 64}$, $\mathbf{M}_{29 \times 58}$, and $\mathbf{M}_{27 \times 54}$, to extract the frequency features from the real and the imaginary parts, and then applies the mask $\mathbf{M}_{32 \times 64}$ to extract the frequency features from the magnitude in the frequency domain. The frequency features extracted corresponding to these three masks are $\mathbf{X}^{I}_{r,32\times64}$, $\mathbf{X}^{I}_{i,32\times64}$, $\mathbf{X}^{I}_{r,29\times58}$, $\mathbf{X}^{I}_{i,29\times58}$, $\mathbf{X}^{I}_{r,27\times54}$, $\mathbf{X}^{I}_{i,27\times54}$, and $\mathbf{X}^{I}_{m,32\times64}$, respectively. After reshaping and concatenating the real and imaginary features, we have four frequency pattern vectors resulting from the I component image: $\mathbf{X}^{I}_{ri,4096\times1}$, $\mathbf{X}^{I}_{ri,3364\times1}$, $\mathbf{X}^{I}_{ri,2916\times1}$, and $\mathbf{X}^{I}_{m,2048\times1}$. Similarly, the four frequency pattern vectors resulting from the Q component image are $\mathbf{X}^{Q}_{ri,4096\times1}$, $\mathbf{X}^{Q}_{ri,3364\times1}$, $\mathbf{X}^{Q}_{ri,2916\times1}$, and $\mathbf{X}^{Q}_{m,2048\times1}$.
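To make the selection step concrete, here is a minimal sketch, assuming the mask $\mathbf{M}_{n \times 2n}$ is realized as an $n$-row by $2n$-column low-frequency window of the shifted spectrum; the paper specifies the mask graphically (Fig. 3), so the exact placement below is an approximation:

```python
# Minimal sketch of frequency set selection. The mask M_{n x 2n} is assumed
# to be an n-row by 2n-column window around the low frequencies of the
# centered spectrum; Fig. 3 defines it graphically, so this placement is an
# approximation.
import numpy as np

def frequency_features(img, n):
    """Return real, imaginary, and magnitude features under an n x 2n mask."""
    spec = np.fft.fftshift(np.fft.fft2(img))      # centered 64 x 64 spectrum
    h, w = spec.shape
    win = spec[h // 2 : h // 2 + n, w // 2 - n : w // 2 + n]  # n x 2n window
    return np.real(win).ravel(), np.imag(win).ravel(), np.abs(win).ravel()

# For the R component image (64 x 64), this reproduces the stated dimensions.
r_img = np.random.rand(64, 64)                    # stand-in R component image
re8, im8, _ = frequency_features(r_img, 8)        # 8 x 16 mask
x_r_ri_256 = np.concatenate([re8, im8])           # X^R_{ri,256x1}
re32, im32, mag32 = frequency_features(r_img, 32) # 32 x 64 mask
x_r_ri_4096 = np.concatenate([re32, im32])        # X^R_{ri,4096x1}
x_r_m_2048 = mag32                                # X^R_{m,2048x1}
```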

B. Hybrid Color and Frequency Feature Extraction

The frequency pattern vectors derived in the hybrid color space are further processed using the EFM method [11] to extract the hybrid color and frequency features. Next we briefly review the EFM method and then apply it for feature extraction. The EFM method first applies PCA to reduce the dimensionality of the input pattern vector. Let $\mathbf{X} \in \mathbb{R}^{N}$ be a random vector representing a frequency pattern vector. The new pattern vector with reduced dimensionality by means of PCA may be derived as follows:

$$\mathbf{Y} = [\phi_1\ \phi_2\ \cdots\ \phi_m]^{t}\,\mathbf{X} \qquad (1)$$

where $\phi_1, \phi_2, \ldots, \phi_m$ ($m < N$) are the eigenvectors corresponding to the largest eigenvalues of the covariance matrix of $\mathbf{X}$.
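A direct reading of (1) as code (a sketch, not the authors' implementation; the conventional mean-centering step of PCA is made explicit):

```python
# Sketch of the PCA projection in (1): project a frequency pattern vector
# onto the m leading eigenvectors of the covariance matrix estimated from
# training vectors. Centering on the training mean is conventional for PCA.
import numpy as np

def pca_project(x, train, m):
    """x: (N,) pattern vector; train: (samples, N) training matrix."""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)      # N x N covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    phi = vecs[:, ::-1][:, :m]             # [phi_1 ... phi_m], leading m
    return phi.T @ (x - mean)              # Y = [phi_1 ... phi_m]^t X
```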


TABLE I FACE VERIFICATION RATE AT 0.1% FALSE ACCEPT RATE OF THE COLOR COMPONENT IMAGES, THE YIQ AND RIQ COLOR SPACES

TABLE II FACE VERIFICATION RATES (ROC III) AT 0.1% FALSE ACCEPT RATE OF THE COLOR COMPONENT IMAGES BEFORE AND AFTER ILLUMINATION NORMALIZATION

TABLE III FACE VERIFICATION RATE AT 0.1% FALSE ACCEPT RATE OF THE FREQUENCY FEATURES, AND THE NEW CFF METHOD


Fig. 5. Hybrid color and frequency feature extraction, fusion, and classification for the R component image using the EFM method.

Fig. 6. Example FRGC images cropped to the size of 64 × 64.

The reduced pattern vector $\mathbf{Y}$, however, contains only the most expressive features, which are not suitable for pattern classification. One solution to this problem is to apply the Fisher linear discriminant, or FLD, to achieve high separability among the different pattern classes [4]. FLD derives a projection basis that maximizes the criterion $J_1 = \operatorname{tr}(S_w^{-1} S_b)$, where $S_w$ and $S_b$ are the within-class scatter matrix and the between-class scatter matrix, respectively [4]. The FLD method, however, often leads to overfitting with poor generalization performance. The EFM addresses the overfitting problem and displays enhanced generalization performance [11]. Specifically, the EFM method improves the generalization capability of the FLD method by decomposing the FLD procedure into a simultaneous diagonalization of the within- and between-class scatter matrices [11].

The simultaneous diagonalization is stepwise equivalent to two operations, as pointed out by Fukunaga [4]: whitening the within-class scatter matrix, and then diagonalizing the between-class scatter matrix of the transformed data. The stepwise operation shows that, during whitening, the eigenvalues of the within-class scatter matrix appear in the denominator. Since the small (trailing) eigenvalues tend to capture noise [11], they cause the whitening step to fit misleading variations and thus generalize poorly to new data. To achieve enhanced performance, the EFM method preserves a proper balance between the need for the selected eigenvalues to account for most of the spectral energy of the raw data (for representational adequacy), and the requirement that the eigenvalues of the within-class scatter matrix (in the reduced PCA space) not be too small (for better generalization performance) [11].
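The following sketch illustrates this simultaneous diagonalization in the reduced PCA space; it assumes the retained PCA dimension keeps the trailing within-class eigenvalues away from zero, which is exactly the balance the EFM is designed to preserve, and it is an illustration rather than the authors' implementation:

```python
# Sketch of EFM-style simultaneous diagonalization: whiten the within-class
# scatter S_w, then diagonalize the between-class scatter S_b of the whitened
# data. Y holds PCA-reduced vectors; `keep` (number of discriminant
# directions retained) is an assumed parameter.
import numpy as np

def efm_basis(Y, labels, keep):
    """Y: (samples, m) PCA-reduced vectors; labels: (samples,) class ids."""
    mean = Y.mean(axis=0)
    m = Y.shape[1]
    Sw, Sb = np.zeros((m, m)), np.zeros((m, m))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Sw += (Yc - mc).T @ (Yc - mc)                 # within-class scatter
        Sb += len(Yc) * np.outer(mc - mean, mc - mean)  # between-class scatter
    # Whitening: the trailing eigenvalues of S_w sit in the denominator, so
    # the PCA dimension m must be chosen to keep them away from zero.
    wvals, wvecs = np.linalg.eigh(Sw)
    W = wvecs @ np.diag(1.0 / np.sqrt(wvals))
    # Diagonalize the between-class scatter of the whitened data.
    bvals, bvecs = np.linalg.eigh(W.T @ Sb @ W)
    V = bvecs[:, ::-1][:, :keep]                      # leading directions
    return W @ V                                      # overall projection basis
```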


Fig. 7. FRGC version 2 Experiment 4 face recognition performance (the ROC III curves) using the hybrid color space (RIQ ROC III), the YIQ color space (YIQ ROC III), the R component image (R ROC III), the Y component image (Y ROC III), the I component image (I ROC III), and the Q component image (Q ROC III), respectively. The FRGC baseline performance (Baseline ROC III) using gray scale images is also included for comparison.

The EFM method further processes the frequency pattern vectors derived in the hybrid color space to extract the hybrid color and frequency features. Taking the R component image as an example, we next show how to extract, fuse, and classify the hybrid color and frequency features. Fig. 5 shows the hybrid color and frequency feature extraction, fusion, and classification for the R component image using the EFM method. Specifically, after the frequency set selection and feature concatenation, we have three frequency pattern vectors resulting from the R component image: $\mathbf{X}^{R}_{ri,256\times1}$, $\mathbf{X}^{R}_{ri,4096\times1}$, and $\mathbf{X}^{R}_{m,2048\times1}$. These frequency pattern vectors are then processed by the EFM to extract the low-dimensional EFM features. The three EFM feature vectors are fused by means of concatenation to form another feature vector. This fused EFM feature vector is further processed by the EFM to derive the final feature vector for the computation of the similarity scores.
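A sketch of this two-stage, feature-level fusion for the R component image follows; the efm_* arguments stand in for precomputed EFM projection bases (see the earlier sketches) and are hypothetical names, not the authors' API:

```python
# Sketch of feature-level fusion for the R component image: project each
# frequency pattern vector with its own EFM basis, concatenate the three
# low-dimensional feature vectors, and pass the result through a second EFM
# stage. All efm_* bases are hypothetical precomputed projection matrices.
import numpy as np

def fuse_r_features(x_ri_256, x_ri_4096, x_m_2048,
                    efm_1, efm_2, efm_3, efm_fused):
    f1 = efm_1.T @ x_ri_256          # low-dimensional EFM features
    f2 = efm_2.T @ x_ri_4096
    f3 = efm_3.T @ x_m_2048
    fused = np.concatenate([f1, f2, f3])   # feature-level fusion
    return efm_fused.T @ fused             # final feature vector for scoring
```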

IV. EXPERIMENTS

This section assesses the hybrid Color and Frequency Features (CFF) method on a face recognition task using the FRGC version 2 database [14]. The FRGC baseline algorithm reveals that FRGC Experiment 4, which matches a controlled single still image against an uncontrolled single still image, is the most challenging FRGC experiment [14]. We therefore choose the FRGC version 2 Experiment 4 to assess our CFF method. In particular, the Training set contains 12,776 images that are either controlled or uncontrolled, the Target set has 16,028 controlled images, and the Query set has 8,014 uncontrolled images. While the faces in the controlled images have good image resolution and good illumination, the faces in the uncontrolled images have lower image resolution and larger illumination variations. These uncontrolled factors pose grand challenges to face recognition performance.

The face images used in our experiments are normalized to $64 \times 64$ to extract the facial region, which contains only the face, so that face recognition performance is not affected by factors unrelated to the face, such as hair style. Fig. 6 shows some example FRGC images used in our experiments, already cropped to the size of $64 \times 64$. In particular, the first image is the controlled target image, and the remaining three images are uncontrolled query images.

Fig. 8. Example R, I, and Q component images and the illumination normalization results.

The face recognition performance for FRGC version 2 Experiment 4 is reported using the Receiver Operating Characteristic (ROC) curves, which plot the Face Verification Rate (FVR) versus the False Accept Rate (FAR). The ROC curves are generated by the Biometric Experimentation Environment (BEE) when a similarity matrix is provided to the system. As the similarity matrix stores the similarity score of every target versus query image pair, the size of the similarity matrix is $T \times Q$, i.e., the number of target images $T$ times the number of query images $Q$ ($T = 16{,}028$ and $Q = 8{,}014$ for FRGC version 2 Experiment 4).

The first set of our experiments assesses the face recognition performance using the R, Y, I, and Q component images, and the YIQ and the RIQ color spaces, respectively, on the FRGC version 2 Experiment 4. In particular, the EFM method first processes each individual component image to derive discriminating features. The cosine similarity measure is then applied to these features to generate a similarity matrix.
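A minimal sketch of the similarity matrix computation under the cosine measure, where the rows of the target and query matrices are the final EFM feature vectors:

```python
# Sketch of the T x Q similarity matrix: cosine similarity between every
# target feature vector and every query feature vector.
import numpy as np

def cosine_similarity_matrix(target, query):
    """target: (T, d) features; query: (Q, d) features -> (T, Q) matrix."""
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    return t @ q.T
```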


Fig. 9. FRGC version 2 Experiment 4 face recognition performance (the ROC III curves) using the CFF method (CFF method ROC III), the R (normalized) frequency features (R frequency features ROC III), the I frequency features (I frequency features ROC III), and the Q frequency features (Q frequency features ROC III), respectively. The FRGC baseline performance using gray scale images is also included for comparison.

The BEE system finally analyzes the z-score [7] normalized similarity matrix and generates three ROC curves (ROC I, ROC II, and ROC III), corresponding to the images collected within semesters, within a year, and between semesters, respectively [14]. For the YIQ and the RIQ color spaces, the three z-score normalized similarity matrices corresponding to their component images are first fused to form a new similarity matrix using the sum rule. The new similarity matrix is further normalized using the z-score normalization method and then analyzed by BEE to generate the three ROC curves.

Fig. 7 shows the face recognition performance (the ROC III curves) on FRGC version 2 Experiment 4 using the hybrid color space (RIQ ROC III), the YIQ color space (YIQ ROC III), the R component image (R ROC III), the Y component image (Y ROC III), the I component image (I ROC III), and the Q component image (Q ROC III), respectively. The FRGC baseline performance (Baseline ROC III) using gray scale images is also included for comparison. Table I lists the face verification rates corresponding to the ROC curves in Fig. 7 at 0.1% false accept rate. The table shows that 1) the R component image possesses more discriminating capability than the Y component image; 2) fusion of the individual color component images boosts the face verification performance significantly; and 3) the RIQ color space achieves better face verification performance than the YIQ color space.

The second set of our experiments assesses an illumination normalization procedure for improving face recognition performance. As illumination variations pose grand challenges to face recognition, an illumination normalization procedure comprising gamma correction, Difference of Gaussian (DoG) filtering, and contrast equalization [5], [16] is applied to the R component image to alleviate the effect of illumination variations.
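A sketch of this normalization chain is given below; the parameter values (gamma, DoG sigmas, and the contrast equalization constants) follow common choices for this kind of pipeline [16] and are assumptions rather than the paper's reported settings:

```python
# Sketch of the illumination normalization chain applied to the R component
# image: gamma correction, Difference-of-Gaussians filtering, and two-stage
# contrast equalization with a soft saturation. Parameter values are assumed.
import numpy as np
from scipy.ndimage import gaussian_filter

def illumination_normalize(img, gamma=0.2, s1=1.0, s2=2.0,
                           alpha=0.1, tau=10.0):
    x = np.power(np.clip(img, 1e-6, None), gamma)        # gamma correction
    x = gaussian_filter(x, s1) - gaussian_filter(x, s2)  # DoG band-pass
    # Contrast equalization: rescale by robust global measures of contrast.
    x = x / np.power(np.mean(np.abs(x) ** alpha), 1.0 / alpha)
    x = x / np.power(np.mean(np.minimum(tau, np.abs(x)) ** alpha), 1.0 / alpha)
    return tau * np.tanh(x / tau)                        # compress extremes
```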

This normalization procedure is not applied to the I and Q component images, because the irregular intensity values in the I and Q component images usually lead to unstable illumination normalization results, which are often detrimental to face recognition. Fig. 8 shows some examples of illumination normalization for the R, I, and Q component images. After the transformation from the RGB color space, the I component image often displays sharper contrast around the eye and mouth corner regions than the R component image does. When the DoG filtering operation is applied to these regions, a more severe "ringing" effect occurs, as shown in Fig. 8. The Q component image, on the other hand, contains much impulsive noise, which leads to very poor results after the DoG filtering operation. Table II lists the experimental results of the R, I, and Q component images using the illumination normalization procedure. These results show that the normalization procedure helps the R (but not the I or Q) component image in alleviating the effect of illumination variations.

The last set of our experiments assesses the proposed CFF method on the FRGC version 2 Experiment 4. At the decision level, rather than fusing the three similarity matrices by means of a simple summation, we take into consideration the different contributions of the three component images in the hybrid color space. In particular, we design a fusion method by applying different weights to the similarity matrices generated using the three component images. The three weights used in our experiments are defined empirically as 0.9, 0.55, and 0.95, corresponding to the R, I, and Q color component images, respectively. Table III shows the performance of the frequency features extracted from the individual component images in the RIQ hybrid color space and their fused results. $R_{\text{norm}}$ and $R_{\text{orig}}$ represent the frequency features extracted from the illumination normalized and the original R component image, respectively. $\text{CFF}_{\text{orig}}$ and $\text{CFF}_{\text{norm}}$ represent the fusion of the corresponding R frequency features with the I and Q frequency features, respectively.
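A sketch of this decision-level fusion, using the z-score normalization described earlier and the empirical weights reported above:

```python
# Sketch of decision-level fusion: z-score normalize each component's
# similarity matrix, then combine with the empirical weights from the paper
# (0.9 for R, 0.55 for I, 0.95 for Q).
import numpy as np

def zscore(s):
    return (s - s.mean()) / s.std()

def fuse_scores(sim_r, sim_i, sim_q, weights=(0.9, 0.55, 0.95)):
    """Weighted summation of z-score normalized T x Q similarity matrices."""
    fused = (weights[0] * zscore(sim_r)
             + weights[1] * zscore(sim_i)
             + weights[2] * zscore(sim_q))
    return zscore(fused)   # normalized again before BEE-style ROC analysis
```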


Tables II and III reveal that the illumination normalization helps improve face recognition performance for both the original R component image and the frequency features extracted from the R image; however, the improvement is more prominent for the former than for the latter. Fig. 9 shows the FRGC version 2 Experiment 4 face recognition performance (the ROC III curves) using the CFF method (CFF method ROC III), the R (normalized) frequency features (R frequency features ROC III), the I frequency features (I frequency features ROC III), and the Q frequency features (Q frequency features ROC III), respectively. The FRGC baseline performance using gray scale images is also included for comparison. These experimental results show that the combination of the hybrid color and frequency features by the CFF method is able to further improve face recognition performance. In particular, the CFF method achieves a face verification rate (corresponding to the ROC III curve) of 80.3% at the false accept rate of 0.1%, compared with the FRGC baseline face verification rate of 11.86% at the same false accept rate.

V. CONCLUSION

This correspondence presents a novel hybrid Color and Frequency Features (CFF) method for face recognition. Future research will consider applying kernel methods, such as the multiclass Kernel Fisher Analysis (KFA) method presented in [9], to replace the EFM method for improving face recognition performance. Note that the KFA method achieves, at 0.1% false accept rate, a 76% face verification rate without any score normalization [9] and an 84% face verification rate with z-score normalization, respectively, for the FRGC version 2 Experiment 4.¹

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions.

REFERENCES

[1] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.

¹Please see slide 15 of the presentation at the Biometric Consortium Conference 2007, September 11–13, 2007, Baltimore Convention Center, Baltimore, MD. http://www.bio-metrics.org/bc2007/presentations/Wed Sep 12/Session II/12 Liu BT DOJ.pdf

[2] J. R. Beveridge, D. Bolme, B. A. Draper, and M. Teixeira, "The CSU face identification evaluation system: Its purpose, features, and structure," Mach. Vis. Appl., vol. 16, no. 2, pp. 128–138, 2005.
[3] K. W. Bowyer, K. Chang, and P. J. Flynn, "A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition," Comput. Vis. Image Understand., vol. 101, no. 1, pp. 1–15, 2006.
[4] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York: Academic, 1990.
[5] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2002.
[6] W. Hwang, G. Park, J. Lee, and S. C. Kee, "Multiple face model of hybrid Fourier feature for large face image set," presented at the IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2006.
[7] A. K. Jain, K. Nandakumar, and A. Ross, "Score normalization in multimodal biometric systems," Pattern Recognit., vol. 38, pp. 2270–2285, 2005.
[8] V. Kumar, M. Savvides, and C. Xie, "Correlation pattern recognition for face recognition," Proc. IEEE, vol. 94, no. 11, pp. 1963–1976, Nov. 2006.
[9] C. Liu, "Capitalize on dimensionality increasing techniques for improving face recognition grand challenge performance," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 725–737, May 2006.
[10] C. Liu, "The Bayes decision rule induced similarity measures," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1086–1090, Jun. 2007.
[11] C. Liu and H. Wechsler, "Robust coding schemes for indexing and retrieval from large face databases," IEEE Trans. Image Process., vol. 9, no. 1, pp. 132–137, Jan. 2000.
[12] A. J. O'Toole, H. Abdi, F. Jiang, and P. J. Phillips, "Fusing face recognition algorithms and humans," IEEE Trans. Syst., Man, Cybern., vol. 37, no. 5, pp. 1149–1155, May 2007.
[13] A. J. O'Toole, P. J. Phillips, F. Jiang, J. Ayyad, N. Penard, and H. Abdi, "Face recognition algorithms surpass humans matching faces across changes in illumination," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 9, pp. 1642–1646, Sep. 2007.
[14] P. J. Phillips, P. J. Flynn, T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek, "Overview of the face recognition grand challenge," presented at the IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[15] P. Shih and C. Liu, "Comparative assessment of content-based face image retrieval in different color spaces," Int. J. Pattern Recognit. Artif. Intell., vol. 19, no. 7, pp. 873–893, 2005.
[16] X. Tan and B. Triggs, "Enhanced local texture feature sets for face recognition under difficult lighting conditions," presented at the IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, Oct. 2007.
[17] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
