Face Recognition Based on LBP and Orthogonal Rank-One Tensor Projections
NuTao Tan 1,2, Lei Huang 1, ChangPing Liu 1
1 Institute of Automation, Chinese Academy of Sciences, Beijing, China
2 Graduate University of Chinese Academy of Sciences, Beijing, China
{nutao.tan, lei.huang, changping.liu}@ia.ac.cn
Abstract
In this paper, a novel framework for face recognition based on discriminatively trained orthogonal rank-one tensor projections (ORO) and the local binary pattern (LBP) is proposed. LBP is an efficient method for extracting shape and texture information and is robust to illumination and expression variations, while ORO has been successful in appearance-based face recognition by finding orthogonal rank-one tensor projections. Accordingly, we propose to reduce the dimension of LBP features with ORO. Moreover, we propose to update the k-nearest neighbors each time a new subspace is obtained in ORO. Experiments demonstrate that the modified ORO stabilizes more quickly and achieves a higher recognition rate. Finally, because the computation of LBP is simple and the compression matrices of ORO are small, the algorithm is well suited to embedded applications.
1. Introduction
Automatic face recognition has developed rapidly over the years and is now a highly active field of research, with important applications in security surveillance, access control, human-machine interaction, and a host of other domains. Among the various features for face recognition, the local binary pattern (LBP) [1,2,3] and Gabor wavelets [4] have attracted many researchers because of their insensitivity to expression, illumination, and occlusions. Subspace learning methods such as EigenFaces (PCA) [5], FisherFaces (LDA) [5], locality preserving projections (LPP) [6], and local discriminant embedding (LDE) [7] all treat image data as vectors and therefore have difficulty with high dimensionality. In recent years, tensor subspace learning has attracted many researchers [8,9,10]: image data are treated as high-order tensors, which both preserves the spatial structure of the data and decreases the number of parameters to be learned. In [10], Gang Hua et al. proposed an approach based on
978-1-4244-2175-6/08/$25.00 ©2008 IEEE
Orthogonal Rank-One (ORO) tensor projections. This algorithm outperforms other tensor methods thanks to its orthogonal projections, but it uses the gray image as the feature and therefore cannot capture statistical and texture information from the original image. Accordingly, we propose to substitute the image used in ORO with statistical features, namely the LBP histogram, which not only captures local structure but is also very simple to compute compared with Gabor wavelets. Meanwhile, based on the original ORO, we suggest updating the k-nearest neighbors (KNN) each time a subspace is obtained, and we name this modified ORO DKORO in this paper. For comparison, results of the original ORO and of LBP combined with LDA are given in the experiments. Finally, the performance of DKORO under different parameter settings is reported, and the memory consumed by the compression matrices of LDA and ORO is analyzed. The remainder of the paper is organized as follows: Sec. 2 introduces the proposed algorithm; Sec. 3 presents extensive experimental results and discussion; conclusions are drawn in Sec. 4.
2. The Proposed Algorithm
2.1 Local Binary Pattern Introduction
The original LBP operator, introduced by Ojala [1], is a powerful method of texture description. The operator labels the pixels of an image by thresholding the 3×3 neighborhood of each pixel with the center value and interpreting the result as a binary number. It was later extended to the so-called uniform patterns (ULBP) [2]. A local binary pattern is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when the binary string is considered circular; for example, 00011110 and 10000011 are uniform patterns. Recently, Ahonen et al. [3] proposed a face recognition system based on the LBP descriptor. They first divide the face image into R non-overlapping regions, then calculate
the LBP histograms {H^r | r ∈ {0,...,R−1}} from each region and concatenate them into a single spatially enhanced feature histogram that efficiently represents the face image. The histogram of a labeled image f(x,y) is defined as

  H_i^r = Σ_{(x,y)∈block_r} I(f(x,y) = i),  i = 0,...,L−1

where L is the number of different labels produced by the LBP operator and I is an indicator function. The concatenated histogram is then defined as

  H = [H_0^0 H_1^0 ... H_{L−1}^0  H_0^1 H_1^1 ... H_{L−1}^1  ...  H_{L−1}^{R−1}]

In this histogram, the labels contain information about the patterns at the pixel level, the histogram of each small region provides information at a regional level, and the concatenated histogram builds a global description of the face.

2.2 ORO Introduction
Discriminant and orthogonal rank-one tensor learning (ORO) [10] is a discriminative linear projection in tensor space. Its projections are pursued sequentially, and each takes the form of a rank-one tensor, i.e., the outer product of a set of vectors, denoted P = {p_0, p_1, ..., p_{n−1}}. Given a training set {X_i ∈ R^{m_0×m_1×...×m_{n−1}}}, i = 0,...,N−1, the objective of ORO is to learn a set of d orthonormal rank-one projections P^d = {P^(0), ..., P^(d−1)} such that, in the projective embedding space, the distances of the example pairs in S are minimized while the distances of those in D are maximized. S and D are defined as

  D = {(i,j) | i < j, l(i,j) = 0, X_i ∈ N_k(j) or X_j ∈ N_k(i)}
  S = {(i,j) | i < j, l(i,j) = 1, X_i ∈ N_k(j) or X_j ∈ N_k(i)}

where N_k(i) denotes the set of k-nearest neighbors of X_i, and l(i,j) = 1 if X_i and X_j are in the same category, l(i,j) = 0 otherwise. To achieve this, ORO maximizes a series of locally weighted discriminant cost functions. Suppose k discriminant rank-one projections, indexed from 0 to k−1, have been obtained; ORO pursues the (k+1)-th rank-one projection by solving the constrained optimization problem

  max_{P^(k)}  ( Σ_{(i,j)∈D} w_ij ||X_i : P^(k) − X_j : P^(k)||² ) / ( Σ_{(i,j)∈S} w_ij ||X_i : P^(k) − X_j : P^(k)||² )    (1)

  s.t.  P^(k) ⊥ P^(k−1), ..., P^(k) ⊥ P^(0)    (2)

where

  X : P = Σ_{i_{n−1}} ( ... ( Σ_{i_1} ( Σ_{i_0} x_{i_0 i_1 ... i_{n−1}} p_{0,i_0} ) p_{1,i_1} ) ... ) p_{n−1,i_{n−1}}

||·|| is the Euclidean distance, and w_ij is a weight assigned according to the importance of the example pair (X_i, X_j).

To make the optimization more tractable, ORO replaces the constraints of Eq. (2) with the following stronger constraints:

  ∃ j ∈ {0,...,n−1}:  p_j^(k) ⊥ p_j^(k−1), ..., p_j^(k) ⊥ p_j^(0)

The problem can then be divided into two optimization problems: a constrained optimization used to obtain p_j^(k), and an unconstrained optimization used to obtain {p_i^(k) | i = 0,...,n−1; i ≠ j} [10].

  Input:  {H_i^*}, i = 1,...,N
  Output: P^K = {p^(0), ..., p^(K−1)}
  1. Calculate S and D according to the LBP histograms.
  2. k = 0; iteratively solve the unconstrained optimization to obtain p_0^(0), p_1^(0); set k = k + 1.
  3. Update S and D according to the compressed features in the subspaces obtained so far.
  4. Randomly initialize p_0^(k), p_1^(k) as unit vectors.
     (a) Solve the constrained optimization to obtain p_0^(k) and the unconstrained optimization to obtain p_1^(k) [10].
     (b) Repeat (a) until the optimization of Eq. (1) converges, yielding P^(k).
  5. k = k + 1; if k < K, repeat steps 3 and 4; otherwise output P^K = {p^(0), ..., p^(K−1)}.
  Fig. 1. The process of DKORO.
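For a second-order tensor (a matrix), the projection X : P above collapses to the bilinear form p_0ᵀ X p_1. The following minimal sketch (our own illustration, not the authors' code) shows how a set of learned rank-one projections would map a tensor to a low-dimensional feature vector:

```python
# Sketch: rank-one tensor projection for second-order tensors.
# X : P with P = {p0, p1} is sum_{i0,i1} X[i0][i1] * p0[i0] * p1[i1] = p0^T X p1.

def rank_one_project(X, p0, p1):
    """Project matrix X onto the rank-one tensor p0 (x) p1, giving a scalar."""
    return sum(X[i][j] * p0[i] * p1[j]
               for i in range(len(p0)) for j in range(len(p1)))

def embed(X, projections):
    """d rank-one projections -> a d-dimensional embedded feature vector."""
    return [rank_one_project(X, p0, p1) for (p0, p1) in projections]

# Tiny hypothetical example: indicator vectors pick out single entries.
X = [[1.0, 2.0],
     [3.0, 4.0]]
print(rank_one_project(X, [1.0, 0.0], [0.0, 1.0]))  # picks out X[0][1] -> 2.0
```

Distances between such embedded vectors are exactly the ||X_i : P − X_j : P|| terms appearing in Eq. (1).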
2.3 Combining LBP and ORO
The concatenated histogram is a vector, which is not suitable for ORO training, so we must first re-arrange it into tensor form. Suppose the width is w and the height is h; the re-arranged tensor is

  H* = [ H_0          H_1            ...  H_{w−1}
         H_w          H_{w+1}        ...  H_{2w−1}
         ...
         H_{w(h−1)}   H_{w(h−1)+1}   ...  H_{wh−1} ]
The subscript of each element of H* equals the position of that element in the original histogram. The next step is dimension reduction by ORO. In the original ORO algorithm, the KNN computed from the original features is left unchanged in every loop iteration. This has two disadvantages. Firstly, the features used for recognition are the compressed features in the embedding space, but the KNN computed in the original space differs from that in the embedding space. Secondly, the KNN changes in the new space as the algorithm proceeds to the next iteration. Consequently, the original ORO cannot obtain an optimal result. To avoid this, we propose to re-calculate the KNN each time a subspace is obtained, so that when solving for the (k+1)-th subspace, attention is focused on the KNN within the k subspaces obtained so far, which is very useful for obtaining an optimal space. We call this new algorithm dynamic k-nearest-neighbors ORO, DKORO for short. Note that, because the number of orthogonal vectors in an m-dimensional space is at most m, the number of tensors obtained by DKORO is no larger than the height of the matrix H*. Supposing the orthogonality condition must be satisfied in the vertical direction, the process of DKORO is depicted in Fig. 1.

Table 1. The error rate (%) on the PIE and AR face databases (subscripts give the embedding dimension at which the best rate is achieved).

  Database | ORO     | LBP+LDA | LBP+ORO_4 | LBP+DKORO_4 | LBP+DKORO_8 | LBP+DKORO_16
  PIE      | 12.8_95 | 11.8_67 | 6.8_94    | 5.3_161     | 8.0_79      | 12.2_57
  AR       | 6.8_82  | 8.3_114 | 6.2_169   | 4.5_142     | 6.1_90      | 10.1_59

Fig. 2. Error rate vs. dimensionality on the PIE data set. (a) Results of different methods. (b) Results of DKORO with different alignments.
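The neighbor-update step that distinguishes DKORO from ORO (step 3 in Fig. 1) can be sketched as follows. This is our own illustration under the definitions of S and D in Sec. 2.2, not the authors' code; the point is that the pair sets are rebuilt from the compressed features after every new projection instead of being fixed in the original space:

```python
# Sketch of the DKORO neighbour update: rebuild the pair sets S and D
# from the current (compressed) features, per the definitions in Sec. 2.2.

def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_indices(feats, i, k):
    """Indices of the k nearest neighbours of sample i in feature space."""
    order = sorted((j for j in range(len(feats)) if j != i),
                   key=lambda j: squared_dist(feats[i], feats[j]))
    return set(order[:k])

def build_pair_sets(feats, labels, k):
    """S: same-label neighbour pairs; D: different-label neighbour pairs."""
    nbrs = [knn_indices(feats, i, k) for i in range(len(feats))]
    S, D = set(), set()
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            if j in nbrs[i] or i in nbrs[j]:
                (S if labels[i] == labels[j] else D).add((i, j))
    return S, D

# Toy example: two nearby same-class samples and one far-away sample.
S, D = build_pair_sets([[0.0], [0.1], [5.0]], [0, 0, 1], k=1)
print(S, D)  # the close same-class pair lands in S, the cross-class pair in D
```

In DKORO, `build_pair_sets` would be called again after each projection is learned, with `feats` replaced by the embeddings under the projections found so far; the original ORO calls it only once, on the original features.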
3. Experimental Results
The proposed algorithm was trained and tested on two standard face databases: CMU PIE and Purdue AR. On both datasets, the gray-scale face images are cropped and aligned by fixing the eye locations and then resized to 128×128; no other pre-processing is performed. Each data set is randomly split into training and testing sets. Recognition is performed with a nearest-neighbor (NN) classifier based on the Euclidean distance. Uniform LBP is used and the image is divided into 4×4 regions, so the total feature dimension is 944. We adopt three different alignments for the tensor, namely 4×236, 8×118, and 16×59, and write ORO_w and DKORO_w to denote the alignment; for example, ORO_8 denotes 8×118 and DKORO_4 denotes 4×236. We compare the results of our approach with the original ORO and with LBP combined with LDA, report the performance of DKORO under the different alignments, and finally analyze the memory consumed by the compression matrices of LDA and ORO.
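The feature dimension of 944 quoted above can be reproduced from the ULBP definition in Sec. 2.1: an 8-bit ULBP histogram has one bin per uniform pattern plus one catch-all bin for the non-uniform codes, and the image is split into 4×4 regions. A small verification sketch:

```python
# Verify the 944-dimensional feature: count 8-bit uniform patterns
# (at most two 0/1 transitions read circularly), add one catch-all bin,
# and multiply by the 4x4 = 16 regions.

def is_uniform(code, bits=8):
    """True if the circular bit string of `code` has <= 2 transitions."""
    transitions = sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1)
                      for i in range(bits))
    return transitions <= 2

n_uniform = sum(is_uniform(c) for c in range(256))
bins_per_region = n_uniform + 1            # + 1 non-uniform catch-all bin
print(n_uniform, bins_per_region * 4 * 4)  # -> 58 944
```

So each region contributes 59 bins and the concatenated histogram has 16 × 59 = 944 dimensions, matching the setup.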
3.1 Face recognition on the PIE database
The PIE dataset contains 41,368 images of 68 people. We used the images of the near-frontal poses under all illumination conditions and expressions; the eyes were located automatically by a program. Finally, a subset of 10,667 face images with 150 to 167 images per person was used. We randomly select 680 images (10 per person) for training and use the rest for testing. The results are shown in Table 1; the subscripts of the error rates indicate the dimension of the embedding space at which the best error rate is achieved. We write DKORO_4 for LBP+DKORO_4, and similarly for the other methods. From the table, DKORO_4 achieves the lowest error rate of 5.3% with 161 dimensions, followed by ORO_4 (6.8% with 94 dimensions), DKORO_8 (8.0% with 79 dimensions), LDA (11.8% with 67 dimensions), DKORO_16 (12.2% with 57 dimensions), and the original ORO (12.8% with 95 dimensions). We plot the error rate versus dimension for the different methods in Fig. 2. It is clear from Fig. 2(a) that DKORO_4 and ORO_4 outperform LDA and the original ORO at all dimensions. This indicates that the lack of statistical information in the raw image can be compensated by the LBP feature, and that ORO obtains better results than LDA. Moreover, Fig. 2(b) shows that the error rate of DKORO decreases more quickly for smaller w: as w decreases, the sum w + h increases, so the information carried by a single subspace increases.
Fig. 3. Error rate vs. dimensionality on the AR data set. (a) Results of different methods. (b) Results of DKORO with different alignments.
3.2 Face recognition on the AR database
The Purdue AR dataset contains 3,247 face images of 126 persons under different expressions, illuminations, and occlusions. In the experiment, we selected 117 persons with 14 unoccluded images each. The training set is composed of 7 randomly selected images per person (819 images in total), and the rest form the testing set. Eye positions were marked manually. The results are shown in Table 1: DKORO_4 achieves the lowest error rate, 4.5% with 142 dimensions. The error rate versus dimension for the different methods is depicted in Fig. 3(a). DKORO_4 and ORO_4 outperform LDA at all dimensions, but ORO_4 is inferior to the original ORO. Fig. 3(b) again shows that the error rate of DKORO decreases more quickly for smaller w.
3.3 Memory analysis
Suppose the dimension of the original feature is 944, that of the compressed feature is 100, the alignment of ORO is 8×118, and all data are stored as 32-bit floating-point values. Under these assumptions, the memory consumed by the compression matrices of LDA and ORO compares as follows. The dimension of each LDA projection equals that of the original feature, so the memory consumed by its compression matrix is 4 × 944 × 100 = 377,600 bytes. In contrast, each ORO projection is composed of two low-dimensional vectors, so it needs only 4 × (8 + 118) × 100 = 50,400 bytes, i.e., 13.3% of the LDA compression matrix. Thus, if face recognition is to run in an embedded application with very limited memory, reducing dimensions with ORO is the better choice.
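The arithmetic above can be checked directly; this sketch just reproduces the two byte counts and their ratio under the stated assumptions:

```python
# Reproduce the memory comparison: 32-bit floats, original feature
# dimension 944, compressed dimension 100, ORO alignment 8 x 118.
BYTES_PER_FLOAT = 4

lda_bytes = BYTES_PER_FLOAT * 944 * 100        # full projection matrix
oro_bytes = BYTES_PER_FLOAT * (8 + 118) * 100  # two short vectors per projection

print(lda_bytes, oro_bytes, round(100 * oro_bytes / lda_bytes, 1))
# -> 377600 50400 13.3
```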
4. Conclusion
A novel algorithm for face recognition based on the LBP feature and ORO has been proposed. Experiments demonstrate that it outperforms both LBP combined with LDA and the original ORO. Furthermore, we modified the original ORO by updating the k-nearest neighbors each time a subspace is obtained, and named the new algorithm DKORO. Experimental results show that DKORO stabilizes more quickly and obtains a lower error rate than the original ORO. Finally, we discussed the effect of the alignment parameter on the performance of DKORO and analyzed the difference between LDA and ORO in the memory needed by their compression matrices; the result shows that dimension reduction by ORO is well suited to embedded applications.
References
[1] T. Ojala, M. Pietikainen, D. Harwood. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 1996.
[2] T. Ojala, M. Pietikainen, T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. PAMI, 24(7):971-987, 2002.
[3] T. Ahonen, A. Hadid, M. Pietikainen. Face recognition with local binary patterns. In: Proc. 8th European Conference on Computer Vision (ECCV), 469-481, 2004.
[4] L. Wiskott, J. M. Fellous, N. Kruger, C. von der Malsburg. Face recognition by elastic bunch graph matching. PAMI, 19(7):775-779, 1997.
[5] P. N. Belhumeur, J. P. Hespanha, D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. PAMI, 19(7):711-720, 1997.
[6] D. Cai, X. He, J. Han, H. J. Zhang. Orthogonal Laplacianfaces for face recognition. IEEE Transactions on Image Processing, 15(11):3608-3614, 2006.
[7] H. T. Chen, H. W. Chang, T. L. Liu. Local discriminant embedding and its variants. CVPR, San Diego, CA, 846-853, 2005.
[8] X. He, D. Cai, P. Niyogi. Tensor subspace analysis. In: Advances in Neural Information Processing Systems, Vancouver, Canada, 2005.
[9] D. Xu, S. Lin, S. Yan, X. Tang. Rank-one projections with adaptive margins for face recognition. CVPR, New York City, NY, 175-181, 2006.
[10] G. Hua, P. A. Viola, S. M. Drucker. Face recognition using discriminatively trained orthogonal rank one tensor projections. CVPR, 2007.