Redundant DWT Based Translation Invariant Wavelet Feature Extraction for Face Recognition

Deqiang Li¹, Haibo Luo¹,², Zelin Shi¹
¹ Shenyang Institute of Automation, Chinese Academy of Sciences, 110015, Shenyang, P.R. China
² Graduate School of the Chinese Academy of Sciences
{lideqiang, luohb, zlshi}@sia.cn

Abstract

The Discrete Wavelet Transform (DWT) is sensitive to translations/shifts of its input signals, so its effectiveness can be negatively affected when translations exist among signals. To deal with this drawback, this paper proposes a redundant DWT (RDWT) based method that achieves image registration, translation invariant wavelet feature extraction and face recognition. We select a representative face from each person to form a reference face set and perform the DWT on it. For each test face, we perform the RDWT and compare its redundant horizontal and vertical details with the corresponding details obtained from the reference faces. The reference face that is most similar to the test face is taken as the recognized identity. Experiments on the Yaleface database demonstrate the effectiveness of our RDWT based method.

1. Introduction

The multi-resolution analysis of the DWT enables us to obtain well-localized time/frequency characteristics, such as abrupt changes, spikes, drifts and trends. Therefore, the WT has been widely used to extract features for signal classification, particularly face recognition [1,2,3,4]. Typical wavelet based feature extraction methods are Local Discriminant Bases (LDB) [1], Joint Best Basis (JBB) [2], and Fuzzy Wavelet Packet (WP) based methods [4]. Since they retain the wavelet's localized time-frequency property, they are usually good candidates for forming a reduced feature space.

Supported by National Natural Science Foundation of China 60603097.

978-1-4244-2175-6/08/$25.00 ©2008 IEEE

JBB, LDB, and Fuzzy WP based methods are effective when all of the signals are already aligned, that is, when no translation variation exists among the input signals. Such a pre-condition is hard to guarantee in real applications, because the arrival time/position of the valid information is always unpredictable. Research on translation invariant wavelet feature extraction can be mainly categorized into two classes. The first class seeks indirectly translation-insensitive wavelet features, such as local minima [5] and the mean energy [6]. The second class applies redundancy to extract translation invariant features. A typical method is the redundant DWT, introduced by Beylkin [7], Coifman [8], Liang [9], and Pesquet [10], which decomposes an individual signal to the maximal decomposition level to achieve a translation invariant wavelet representation. The advantage of the RDWT is that it obtains the L DWTs of an L-length signal x(k) by decomposing only x(k) and its one-step shift, instead of decomposing each shift of x(k) separately. Among the L DWTs, the one with the highest denoising/compression performance, as defined by a cost function, is regarded as the translation invariant wavelet representation. The RDWT has been successfully used in signal denoising [8,9] and compression [10]. The wavelet coefficient sets obtained by the RDWT are identical when two signals are merely circulant shifts of each other. Signal classification is quite different from signal denoising/compression: classification extracts translation invariant features from a group of signals, whereas denoising/compression always concerns an individual signal, so feature extraction and alignment are not involved. Existing RDWT methods mainly process the individual signal rather than a group of signals. So far, no systematic research on

RDWT in the field of signal classification has been reported. We believe that the difficulties of applying the RDWT to signal classification mainly lie in the following aspects: (1) translation invariant representation of a group of varying input signals; (2) signal or feature alignment; (3) discriminative feature extraction. Face recognition is a challenging problem because of variations in illumination intensity and direction, face pose, facial expression and, in particular, position. To ensure the effectiveness of DWT based face recognition, the wavelet features obtained from different faces must be matched (or aligned) with each other; otherwise, disordered feature matching degrades the performance of face recognition. This paper proposes a novel RDWT based translation invariant wavelet feature extraction method

which can accomplish face recognition, face registration, and translation-invariant wavelet feature extraction at the same time.
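The redundancy idea above can be illustrated with a minimal one-dimensional sketch. This is our own illustration, not the authors' code: a one-level Haar RDWT keeps the DWTs of the signal and of its one-sample circular shift, so the DWT of a circularly shifted copy of the signal appears exactly in one of the two branches. The function names are our own.

```python
import numpy as np

def haar_dwt(x):
    # one-level orthonormal Haar DWT: approximation and detail coefficients
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def rdwt_level1(x):
    # one-level RDWT: keep the DWTs of the signal and of its one-sample
    # circular shift, covering both even and odd alignments
    return haar_dwt(x), haar_dwt(np.roll(x, -1))

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 7.0])
y = np.roll(x, 1)                 # a circularly shifted copy of x

(ax, dx), _ = rdwt_level1(x)      # even branch of x
_, (ay, dy) = rdwt_level1(y)      # odd branch of the shifted copy
# the odd branch of y is the DWT of np.roll(y, -1) == x, so the shifted
# signal's redundant decomposition contains the DWT of x exactly
```

A plain (decimated) DWT of y would not contain these coefficients, which is precisely the shift sensitivity the RDWT removes.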

2. Redundant discrete wavelet transform

The RDWT decomposes the original signal and its shift-by-one along both the x and y dimensions to deal with the problem of decimation. Compared to the DWT, it provides the possibility of exploring the DWT results of an arbitrary shift of the original signal, at the price of doubling the computational cost in each dimension. The decomposition structure of the 2-D RDWT is depicted in Figure 1, where Ω_m^j is the m-th sub-space on the j-th level and S_{i,k}(Ω_m^j) is a circulant shift of Ω_m^j with an i-step shift in the x-direction and a k-step shift in the y-direction.

[Figure 1 (tree diagram): the root Ω_m^j is decomposed by four DWTs, DWT(S_{0,0}(Ω_m^j)), DWT(S_{1,0}(Ω_m^j)), DWT(S_{0,1}(Ω_m^j)) and DWT(S_{1,1}(Ω_m^j)), into the sixteen level-(j+1) sub-spaces Ω_{4m}^{j+1}, ..., Ω_{4m+15}^{j+1}; recursing on the approximations yields the level-(j+2) sub-spaces Ω_{16m}^{j+2}, ..., Ω_{16m+63}^{j+2}.]

Figure 1: 2-D RDWT decomposition structure.

It can be seen that one-level 2-D RDWT of Ω_m^j consists of four conventional DWTs on S_{0,0}(Ω_m^j), S_{1,0}(Ω_m^j), S_{0,1}(Ω_m^j) and S_{1,1}(Ω_m^j). Ω_{4m}^{j+1}, Ω_{4m+1}^{j+1}, Ω_{4m+2}^{j+1} and Ω_{4m+3}^{j+1} are the approximation, horizontal, vertical and diagonal details of DWT(S_{0,0}(Ω_m^j)), respectively. The translation relationship between the parent node and its redundant child nodes can be described as:

DWT(S_{x,y}(Ω_m^j)) = [S_{x/2, y/2}(Ω_{4m}^{j+1}) ⊕ S_{x/2, y/2}(Ω_{4m+1}^{j+1}) ⊕ S_{x/2, y/2}(Ω_{4m+2}^{j+1}) ⊕ S_{x/2, y/2}(Ω_{4m+3}^{j+1})]   (1)
for x = 0, 2, ..., 2k;  y = 0, 2, ..., 2l

DWT(S_{x,y}(Ω_m^j)) = [S_{⌊x/2⌋, y/2}(Ω_{4m+4}^{j+1}) ⊕ S_{⌊x/2⌋, y/2}(Ω_{4m+5}^{j+1}) ⊕ S_{⌊x/2⌋, y/2}(Ω_{4m+6}^{j+1}) ⊕ S_{⌊x/2⌋, y/2}(Ω_{4m+7}^{j+1})]   (2)
for x = 1, 3, ..., 2k+1;  y = 0, 2, ..., 2l

DWT(S_{x,y}(Ω_m^j)) = [S_{x/2, ⌊y/2⌋}(Ω_{4m+8}^{j+1}) ⊕ S_{x/2, ⌊y/2⌋}(Ω_{4m+9}^{j+1}) ⊕ S_{x/2, ⌊y/2⌋}(Ω_{4m+10}^{j+1}) ⊕ S_{x/2, ⌊y/2⌋}(Ω_{4m+11}^{j+1})]   (3)
for x = 0, 2, ..., 2k;  y = 1, 3, ..., 2l+1

DWT(S_{x,y}(Ω_m^j)) = [S_{⌊x/2⌋, ⌊y/2⌋}(Ω_{4m+12}^{j+1}) ⊕ S_{⌊x/2⌋, ⌊y/2⌋}(Ω_{4m+13}^{j+1}) ⊕ S_{⌊x/2⌋, ⌊y/2⌋}(Ω_{4m+14}^{j+1}) ⊕ S_{⌊x/2⌋, ⌊y/2⌋}(Ω_{4m+15}^{j+1})]   (4)
for x = 1, 3, ..., 2k+1;  y = 1, 3, ..., 2l+1

From (1) to (4), it is noted that DWT(S_{x,y}(Ω_m^j)) can be obtained simply by translating its child sub-spaces on the (j+1)-th level. To avoid an exponential increase in the number of sub-spaces in the RDWT, only the approximation node is permitted to be decomposed.
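As a rough illustration of the decomposition in Figure 1, the following sketch (our own simplification, with a Haar filter in place of a general wavelet, and our own function names) performs one level of the 2-D RDWT by taking the conventional DWTs of S_{0,0}, S_{1,0}, S_{0,1} and S_{1,1}, i.e. of the image and its one-pixel circular shifts:

```python
import numpy as np

def haar2(img):
    # one-level 2-D Haar DWT (normalized by 1/2): returns the approximation,
    # horizontal, vertical and diagonal detail sub-bands
    a = img[0::2, :] + img[1::2, :]
    d = img[0::2, :] - img[1::2, :]
    ll = (a[:, 0::2] + a[:, 1::2]) / 2   # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2   # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2   # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2   # diagonal detail
    return ll, lh, hl, hh

def rdwt2_level1(img):
    # one-level 2-D RDWT: conventional DWTs of S00, S10, S01 and S11,
    # i.e. of the image and its one-pixel circular shifts (Figure 1)
    return {s: haar2(np.roll(img, s, axis=(0, 1)))
            for s in [(0, 0), (1, 0), (0, 1), (1, 1)]}

# an even (two-pixel) shift of the input only circularly shifts the
# sub-bands of the S00 branch, consistent with Eq. (1)
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
ll0 = rdwt2_level1(img)[(0, 0)][0]
ll_shifted = haar2(np.roll(img, 2, axis=0))[0]
```

The dictionary of four DWTs is the doubled-cost-per-dimension representation described above; iterating on the four approximation bands reproduces the recursion of Figure 1.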

3. Translation-invariant feature extraction

First, we perform the DWT on a representative face selected from each class/person. For each test (unknown) face, we perform the RDWT and match its redundant horizontal and vertical details with the corresponding details obtained from all of the reference faces. According to the cross-correlation criterion, we classify the test face to the most similar reference face. This method enables us to deduce the shift step of the test face and extract its

translation-invariant wavelet features with respect to the reference face. The following two subsections explain the details of the proposed approach.

3.1. Cross correlation criterion

Cross correlation is a popular and powerful way to measure the similarity between two signals. We define a cross correlation function S_corr between the wavelet details of the test face and the reference face by summing two cross-correlations under circulant shifts along the x and y directions, and then find the optimal shift [x*, y*] that maximizes S_corr:

S_corr(HD, HD_ref, VD, VD_ref, x, y) = Corr(S_{x,y}(HD), HD_ref) + Corr(S_{x,y}(VD), VD_ref)   (5)

[x*, y*] = arg max_{x,y} S_corr(HD, HD_ref, VD, VD_ref, x, y)   (6)
for x = −τ0, −τ0+1, ..., τ0−1;  y = −τ0, −τ0+1, ..., τ0−1

where Corr(X, Y) is the traditional product correlation evaluating the similarity between two matrices X and Y, and τ0 is a positive integer defining the shift range along the x and y directions. HD and VD are the redundant horizontal and vertical details of the test face; HD_ref and VD_ref are the horizontal and vertical details of the reference face.
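Equations (5) and (6) amount to a brute-force search over circulant shifts. A short Python sketch of this search (our own code; we read Corr as a normalized product correlation, which is one plausible interpretation, and the names are hypothetical):

```python
import numpy as np

def corr(X, Y):
    # one reading of Corr(X, Y): normalized product correlation
    X = X - X.mean(); Y = Y - Y.mean()
    return float((X * Y).sum() / (np.linalg.norm(X) * np.linalg.norm(Y) + 1e-12))

def best_shift(HD, VD, HD_ref, VD_ref, tau0=4):
    # Eqs. (5)-(6): maximize the summed correlation of the circularly
    # shifted horizontal and vertical details over shifts in [-tau0, tau0-1]
    best, arg = -np.inf, (0, 0)
    for x in range(-tau0, tau0):
        for y in range(-tau0, tau0):
            s = (corr(np.roll(HD, (x, y), (0, 1)), HD_ref)
                 + corr(np.roll(VD, (x, y), (0, 1)), VD_ref))
            if s > best:
                best, arg = s, (x, y)
    return arg, best

rng = np.random.default_rng(1)
HD_ref = rng.normal(size=(16, 16))
VD_ref = rng.normal(size=(16, 16))
# details of a test face displaced by (2, -3) relative to the reference
HD = np.roll(HD_ref, (2, -3), (0, 1))
VD = np.roll(VD_ref, (2, -3), (0, 1))
shift, score = best_shift(HD, VD, HD_ref, VD_ref, tau0=4)  # shift == (-2, 3)
```

The recovered shift undoes the displacement of the test details, which is exactly the registration step used below.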

3.2. Face Recognition and Feature Extraction

Horizontal and vertical details contain texture information that is relatively geometrically stable, robust to illumination, and especially useful for image registration; therefore we adopt them in our method. The procedure of our RDWT based method is as follows.

Step 1: Select a representative face from each class as the reference face and carry out the Jmax-th level DWT on it, where Jmax is the maximal decomposition level. This generates c reference DWT trees denoted DWT(i), i = 1, 2, ..., c, where c is the number of classes.

Step 2: For a test face, perform the Jmax-th level RDWT on it as shown in Figure 1, generating a series of redundant sub-spaces Ω_n^{Jmax}, n = 0, 1, ..., 4^{Jmax+1} − 1.

Step 3: Match the shift of the redundant horizontal and vertical details S_{x,y}({Ω_{4k+1}^{Jmax}, Ω_{4k+2}^{Jmax}}) obtained from the test face with the corresponding details of the reference faces, where k ∈ [0, 4^{Jmax} − 1], x ∈ [−τ0, τ0 − 1] and y ∈ [−τ0, τ0 − 1]. Compute the cross correlation function S_corr by (5), determine the corresponding parameters [k*, x*, y*] by (6), and assign the test face to the person with the maximal cross correlation.

Step 4: The Jmax-th level translation invariant wavelet coefficients of the test face with respect to its reference face are S_{x*,y*}({Ω_{4k*}^{Jmax}, Ω_{4k*+1}^{Jmax}, Ω_{4k*+2}^{Jmax}, Ω_{4k*+3}^{Jmax}}). We can also deduce the sub-space indexes and their corresponding shift steps from level Jmax − 1 down to level 0 by bottom-up searching according to Eqs. (1)-(4), forming the translation invariant wavelet coefficients at every level.

4. Experimental Results

In this section, we use the well-known Yaleface database to test the performance of the proposed face recognition method. The Yaleface database, available at http://cvc.yale.edu/projects/yalefaces/yalefaces.html, contains 165 face images of 15 persons, with 11 images per individual in the following configurations: center-light, with/without glasses, happy, normal, left-light, right-light, sad, sleepy, surprised and wink, and, in particular, at different positions. Each image was processed as an 8-bit grayscale image of size 120×160. We select subject_01 of each person as the reference face and treat all of the faces as test faces; there are thus 15 reference faces and 165 test faces. We perform the 2-D DWT on the 15 reference faces and the 2-D RDWT on the test faces with the wavelet 'db4' and a maximal level of 3. Experiments are carried out according to the procedure described in Section 3.2. Figure 2 shows the face registration performance for person_1 obtained by the RDWT method and the DWT method. The top-left image is the reference face of person_1; the remaining images in the first and third rows are the test faces subject_02 to subject_11, and the second and fourth rows contain the corresponding registration results of the rows above them. The thin lines illustrate the alignment performance. It is clear that the shift steps of the test faces with respect to the reference face are correctly calculated. The face registration performance indirectly guarantees that the extracted wavelet features are translation invariant.

Figure 2: Face registration performance for person_1: (a) the RDWT method, (b) the DWT method.
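The recognition procedure of Section 3.2, collapsed to a single decomposition level for brevity, might be sketched as follows. This is our own illustrative code, with a Haar filter standing in for 'db4' and without the multi-level bottom-up search; the names and the normalized correlation are our assumptions.

```python
import numpy as np

def haar2(img):
    # one-level 2-D Haar DWT: approximation, horizontal, vertical, diagonal
    a = img[0::2, :] + img[1::2, :]
    d = img[0::2, :] - img[1::2, :]
    return ((a[:, 0::2] + a[:, 1::2]) / 2, (a[:, 0::2] - a[:, 1::2]) / 2,
            (d[:, 0::2] + d[:, 1::2]) / 2, (d[:, 0::2] - d[:, 1::2]) / 2)

def corr(X, Y):
    # normalized product correlation between equal-size matrices
    X = X - X.mean(); Y = Y - Y.mean()
    return float((X * Y).sum() / (np.linalg.norm(X) * np.linalg.norm(Y) + 1e-12))

def classify(test, refs, tau0=4):
    # Steps 1-3: match circularly shifted horizontal/vertical details of
    # the test face against each reference's details; keep the best class
    _, lh_t, hl_t, _ = haar2(test)
    best, label = -np.inf, None
    for name, ref in refs.items():
        _, lh_r, hl_r, _ = haar2(ref)
        for x in range(-tau0, tau0):
            for y in range(-tau0, tau0):
                s = (corr(np.roll(lh_t, (x, y), (0, 1)), lh_r)
                     + corr(np.roll(hl_t, (x, y), (0, 1)), hl_r))
                if s > best:
                    best, label = s, name
    return label
```

A displaced copy of a reference face is assigned back to that reference, since its shifted details correlate perfectly at the registering shift.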

Figure 3: Face recognition number with the RDWT method.

Figure 3 shows the face classification performance for the 15 persons. The total number of correctly recognized faces is 149, so the mean recognition rate is 90.3%. Although this rate is lower than the 99% obtained in [3], it is a preliminary result based on a simple correlation criterion; the significance of this paper lies in the RDWT based image registration and translation invariant feature extraction rather than in the recognition rate itself. We also emphasize that: (1) the selection of the reference face affects the performance of face registration and feature extraction, so reference faces should be representative, e.g. with normal facial expression and illumination; (2) as long as all of the reference faces are placed at the same position in the image, the computational cost is reduced and the performance of registration and translation-invariant feature extraction is improved.

5. Conclusions

This paper proposes an RDWT based translation-invariant wavelet feature extraction method for face recognition. The translation invariance is achieved by image registration between the redundant horizontal and vertical details of the test face and the corresponding details of the reference face. It is known that the horizontal and vertical details of an image contain edge and texture information, which is geometrically stable and helpful for image registration. The proposed method also achieves decent preliminary results on face recognition using a shift based correlation strategy between the horizontal/vertical details of the test and reference faces. Although the DWT is compact and computationally concise, it lacks stability and robustness. The RDWT allows us to obtain the DWT results of an arbitrary shift of the original signal at the price of doubling the computational cost in each dimension compared to the DWT. It recursively decomposes only the approximation nodes and requires O(L log L) operations, which is acceptable in real applications. In a sense, redundancy trades extra computation for stability and robustness.

References

[1] N. Saito, R. R. Coifman, F. B. Geshwind, F. Warner. Discriminant Feature Extraction Using Empirical Probability Density Estimation and a Local Basis Library. Pattern Recognition, 35(12):2841-2852, 2002.
[2] R. R. Coifman, M. V. Wickerhauser. Entropy-Based Algorithms for Best Basis Selection. IEEE Transactions on Information Theory, 38(2):713-718, 1992.
[3] K. C. Kwak, W. Pedrycz. Face Recognition Using Fuzzy Integral and Wavelet Decomposition Method. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34(1):1666-1675, 2004.
[4] D. Li, W. Pedrycz, N. J. Pizzi. Fuzzy and Wavelet Packet Based Feature Extraction Method and Its Application to Signal Classification. IEEE Transactions on Biomedical Engineering, 52(6):1132-1139, 2005.
[5] S. Mallat. Zero-Crossings of a Wavelet Transform. IEEE Transactions on Information Theory, 37(4):1019-1033, 1991.
[6] C.-M. Pun, M.-C. Lee. Extraction of Shift Invariant Wavelet Features for Classification of Images with Different Sizes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9):1228-1233, 2004.
[7] G. Beylkin. On the Representation of Operators in Bases of Compactly Supported Wavelets. SIAM Journal on Numerical Analysis, 29(6):1716-1740, 1992.
[8] R. R. Coifman, D. L. Donoho. Translation-Invariant De-Noising. In A. Antoniadis and G. Oppenheim, eds., Wavelets and Statistics, Lecture Notes in Statistics, Springer-Verlag, 125-150, 1995.
[9] J. Liang, T. W. Parks. A Translation-Invariant Wavelet Representation Algorithm with Applications. IEEE Transactions on Signal Processing, 44(2):225-232, 1996.
[10] J.-C. Pesquet, H. Krim, H. Carfantan. Time-Invariant Orthonormal Wavelet Representations. IEEE Transactions on Signal Processing, 44(8):1964-1970, 1996.