Robust Sparse Coding for Face Recognition

CVPR 2011

Robust Sparse Coding for Face Recognition
Meng Yang, Lei Zhang, Jian Yang, David Zhang
Hong Kong Polytechnic Univ.

Presenter : 江振國

Outline

- Introduction
- Previous Work
- Method
  - Distribution induced weights
  - Iteratively Reweighted Sparse Coding (IRSC)
- Experimental Results
- Conclusion & Discussion

Motivation

- Sparse coding methods assume that the coding residual $\boldsymbol{e} = \boldsymbol{y} - D\boldsymbol{\alpha}$ follows a Gaussian or Laplacian distribution.
  - Gaussian distribution: $f(e) \propto \exp(-e^2/2\sigma^2)$, which corresponds to the $l_2$-norm fidelity term $\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_2^2$.
  - Laplacian distribution: $f(e) \propto \exp(-|e|/\sigma)$, which corresponds to the $l_1$-norm fidelity term $\|\boldsymbol{y} - D\boldsymbol{\alpha}\|_1$.
- This assumption may not hold when the observation $\boldsymbol{y}$ is noisy or has many outliers (disguise or occlusion).
- Idea: design the fidelity term by minimizing a function associated with the actual distribution of the coding residuals.

Related Work

Sparse Representation-based Classification (SRC)

- Given the $n_i$ training samples of the $i$-th object class, form $A_i = [\boldsymbol{v}_{i,1}, \boldsymbol{v}_{i,2}, \dots, \boldsymbol{v}_{i,n_i}] \in \mathbb{R}^{m \times n_i}$.
- Denote by $A$ the matrix of the entire training set: $A = [A_1, A_2, \dots, A_k] = [\boldsymbol{v}_{1,1}, \boldsymbol{v}_{1,2}, \dots, \boldsymbol{v}_{k,n_k}]$.
- A test sample $\boldsymbol{y} \in \mathbb{R}^m$ is coded as a sparse linear combination of the training samples, $\boldsymbol{y} \approx A\boldsymbol{\alpha}$, and assigned to the class with the smallest class-wise reconstruction residual (see the sketch below).
- "Robust Face Recognition via Sparse Representation", John Wright and Yi Ma, PAMI 2009
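A minimal sketch of the SRC decision rule, not the authors' exact solver: it uses scikit-learn's Lasso as a stand-in for the $l_1$-minimization, and the inputs A (columns = vectorized training images) and train_labels are hypothetical.

import numpy as np
from sklearn.linear_model import Lasso  # l1-regularized least squares as a stand-in solver

def src_classify(A, train_labels, y, lam=0.01):
    """Classify y by sparse coding over the whole training set A (columns = samples)."""
    train_labels = np.asarray(train_labels)
    # Sparse coding: min_alpha ||y - A alpha||_2^2 + lam * ||alpha||_1
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    solver.fit(A, y)
    alpha = solver.coef_

    # Class-wise reconstruction residuals: keep only the coefficients of class c
    classes = np.unique(train_labels)
    residuals = []
    for c in classes:
        alpha_c = np.where(train_labels == c, alpha, 0.0)
        residuals.append(np.linalg.norm(y - A @ alpha_c))
    return classes[int(np.argmin(residuals))]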

Distribution of Coding Residual

- Assume that the residuals $e_1, e_2, \dots, e_n$ are independently and identically distributed according to some probability density function (PDF) $f_{\boldsymbol{\theta}}(e_i)$, where $\boldsymbol{\theta}$ is the parameter set that characterizes the distribution.
- The likelihood of the estimator is
  $L_{\boldsymbol{\theta}}(e_1, \dots, e_n) = \prod_{i=1}^{n} f_{\boldsymbol{\theta}}(e_i)$
- The Maximum Likelihood Estimator maximizes this likelihood or, equivalently, minimizes the objective function
  $-\ln L_{\boldsymbol{\theta}} = \sum_{i=1}^{n} \rho_{\boldsymbol{\theta}}(e_i), \quad \text{where } \rho_{\boldsymbol{\theta}}(e) = -\ln f_{\boldsymbol{\theta}}(e)$
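As a quick check (an added worked example, not from the slides), plugging the two classical assumptions from the Motivation slide into $\rho_{\boldsymbol{\theta}}(e) = -\ln f_{\boldsymbol{\theta}}(e)$ recovers the usual fidelity terms:

% Gaussian residuals:  f(e) \propto exp(-e^2 / 2\sigma^2)
%   => rho(e) = e^2/(2\sigma^2) + const   => MLE objective ~ ||y - D\alpha||_2^2
% Laplacian residuals: f(e) \propto exp(-|e| / \sigma)
%   => rho(e) = |e|/\sigma + const        => MLE objective ~ ||y - D\alpha||_1
\[
\rho_{\mathrm{Gauss}}(e) = \frac{e^{2}}{2\sigma^{2}} + c_{1},
\qquad
\rho_{\mathrm{Laplace}}(e) = \frac{|e|}{\sigma} + c_{2}.
\]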

Property of the Distribution 

In general, we assume that the unknown PDF 𝑓𝜽 (𝑒 )

is  



symmetric and 𝑓 (𝑒 ) < 𝑓 (𝑒 ) if∣∣ > ∣𝑒 ∣

So (𝑒 ) has the following properties: 

  

𝜌𝜽 (0) is the global minimal of 𝜌𝜽 (𝑒𝑖 ) 𝜌 (𝑒 ) = 𝜌 (−𝑒 ) (𝑒 ) > 𝜌 (𝑒 ) if ∣𝑒𝑖∣ > ∣𝑒 ∣ Without loss of generality, set 𝜌𝜽 (0) = 0

Distribution Induced Weights

- Let $F_{\boldsymbol{\theta}}(\boldsymbol{e}) = \sum_{i=1}^{n} \rho_{\boldsymbol{\theta}}(e_i)$. Its first-order Taylor expansion in the neighborhood of $\boldsymbol{e}_0$ is
  $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e}) = F_{\boldsymbol{\theta}}(\boldsymbol{e}_0) + (\boldsymbol{e} - \boldsymbol{e}_0)^T F'_{\boldsymbol{\theta}}(\boldsymbol{e}_0) + R_1(\boldsymbol{e})$
  where $R_1(\boldsymbol{e})$ is the high-order residual term. Denoting by $\rho'_{\boldsymbol{\theta}}$ the derivative of $\rho_{\boldsymbol{\theta}}$, we have $F'_{\boldsymbol{\theta}}(\boldsymbol{e}_0) = [\rho'_{\boldsymbol{\theta}}(e_{0,1}), \dots, \rho'_{\boldsymbol{\theta}}(e_{0,n})]^T$, where $e_{0,i}$ is the $i$-th element of $\boldsymbol{e}_0$.
- In sparse coding, the fidelity term is usually expected to be strictly convex, so we approximate the residual term by a quadratic form: $R_1(\boldsymbol{e}) \approx \frac{1}{2}(\boldsymbol{e} - \boldsymbol{e}_0)^T W (\boldsymbol{e} - \boldsymbol{e}_0)$, with $W$ a diagonal matrix to be determined.

Distribution Induced Weights (2)

- Since $F_{\boldsymbol{\theta}}(\boldsymbol{e})$ reaches its minimal value (i.e., 0) at $\boldsymbol{e} = \boldsymbol{0}$, the approximation $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e})$ should also attain its minimal value at $\boldsymbol{e} = \boldsymbol{0}$:
  $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e}) = F_{\boldsymbol{\theta}}(\boldsymbol{e}_0) + (\boldsymbol{e} - \boldsymbol{e}_0)^T F'_{\boldsymbol{\theta}}(\boldsymbol{e}_0) + \frac{1}{2}(\boldsymbol{e} - \boldsymbol{e}_0)^T W (\boldsymbol{e} - \boldsymbol{e}_0) + \text{const}$
- Using $\nabla_{\boldsymbol{x}}\,(\boldsymbol{x}^T Q \boldsymbol{x}) = 2Q\boldsymbol{x}$ and setting the gradient of $\tilde{F}_{\boldsymbol{\theta}}$ to zero at $\boldsymbol{e} = \boldsymbol{0}$ gives
  $W\boldsymbol{e}_0 = F'_{\boldsymbol{\theta}}(\boldsymbol{e}_0) = [\rho'_{\boldsymbol{\theta}}(e_{0,1}), \rho'_{\boldsymbol{\theta}}(e_{0,2}), \dots, \rho'_{\boldsymbol{\theta}}(e_{0,n})]^T$
- Hence $W$ is diagonal with $W_{i,i} = \omega_{\boldsymbol{\theta}}(e_{0,i}) = \rho'_{\boldsymbol{\theta}}(e_{0,i}) / e_{0,i}$.

Distribution Induced Weights (3)

- Then
  $\tilde{F}_{\boldsymbol{\theta}}(\boldsymbol{e}) = \frac{1}{2}\boldsymbol{e}^T W \boldsymbol{e} + b$
  where $b$ is a scalar value determined by $\boldsymbol{e}_0$.
- Since $\boldsymbol{e} = \boldsymbol{y} - D\boldsymbol{\alpha}$, the RSC model can be approximated by
  $\min_{\boldsymbol{\alpha}} \; \| W^{1/2} (\boldsymbol{y} - D\boldsymbol{\alpha}) \|_2^2 \quad \text{s.t.} \quad \|\boldsymbol{\alpha}\|_1 \le \sigma$
  which is a weighted LASSO problem (see the coding-step sketch below).
- $W_{i,i}$ is the weight assigned to each pixel of the query image $\boldsymbol{y}$; the determination of the distribution $\rho_{\boldsymbol{\theta}}$ is thus transformed into the determination of the weight matrix $W$.
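A minimal sketch of one weighted-LASSO coding step. It assumes the unconstrained (penalized) form of the problem rather than the constrained form above, and reuses scikit-learn's Lasso by rescaling the rows of D and y with the square root of the weights; lam is a hypothetical regularization parameter.

import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso_step(D, y, w, lam=0.01):
    """One coding step: min_alpha ||W^{1/2}(y - D alpha)||_2^2 + lam * ||alpha||_1,
    with W = diag(w). Solved by row-rescaling and an ordinary Lasso."""
    sw = np.sqrt(w)                      # W^{1/2} as a per-pixel scale
    D_w = D * sw[:, None]                # scale each row (pixel) of the dictionary
    y_w = y * sw                         # scale each pixel of the query
    solver = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    solver.fit(D_w, y_w)
    return solver.coef_                  # sparse coding vector alpha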

Weight Function  

Intuitively, in FR the outlier pixels (e.g. occluded or corrupted pixels) should have low weight values. Considering the logistic function has properties similar to the hinge loss function in SVM, 

the weight function

where 𝜇 and 𝛿 are positive scalars. Parameter 𝜇 controls the decreasing rate from 1 to 0, and 𝛿 controls the location of demarcation point.

(Figure: curves of $y = e^{-\mu x}$ for $\mu$ = 0.07, 0.1, 0.5, illustrating how $\mu$ controls the decreasing rate.)
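A small sketch of the logistic weight function stated above, written in the numerically equivalent form 1 / (1 + exp(-z)):

import numpy as np

def logistic_weights(e, mu, delta):
    """Per-pixel weights: w_i = exp(mu*delta - mu*e_i^2) / (1 + exp(mu*delta - mu*e_i^2)).
    Pixels whose squared residual exceeds delta get weights below 0.5."""
    z = mu * (delta - e**2)
    return 1.0 / (1.0 + np.exp(-z))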

Parameter μ and δ

- When the square of the residual is larger than $\delta$, the weight value is less than 0.5.
- $\delta$ is chosen adaptively: denote $\boldsymbol{\psi} = [e_1^2, e_2^2, \dots, e_n^2]$. Sorting $\boldsymbol{\psi}$ in ascending order gives the reordered array $\boldsymbol{\psi}_a$. Let $k = \lfloor \tau n \rfloor$, where the scalar $\tau \in (0, 1]$ and $\lfloor \tau n \rfloor$ outputs the largest integer no greater than $\tau n$; then $\delta = \psi_a(k)$ (a sketch follows below).
- Parameters: $\tau$ = 0.8 without occlusion, 0.5 with occlusion; $\mu = c/\delta$ with $c$ = 8.
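A sketch of the adaptive parameter estimation described above; the defaults for tau and c follow the settings quoted on this slide.

import numpy as np

def estimate_mu_delta(e, tau=0.8, c=8.0):
    """Estimate (mu, delta) from the current residual vector e.
    delta is taken from the sorted squared residuals at position floor(tau*n); mu = c / delta."""
    psi = np.sort(e**2)                            # ascending squared residuals (psi_a)
    k = max(int(np.floor(tau * len(e))) - 1, 0)    # 0-based index of floor(tau*n)
    delta = max(psi[k], 1e-12)                     # guard against a zero residual
    mu = c / delta
    return mu, delta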

The Convergence of IRSC

- Since the original cost function of Eq. (5) is lower bounded (≥ 0), the iterative minimization procedure in IRSC will converge.
- Convergence is declared when the difference of the weights between adjacent iterations is small enough:
  $\|W^{(t)} - W^{(t-1)}\|_2 \, / \, \|W^{(t-1)}\|_2 < \gamma$
  where $\gamma$ is a small positive scalar (the loop below sketches this check).
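Putting the pieces together, a sketch of the IRSC outer loop under the same assumptions as the snippets above. The initialization of the residual with the mean of the dictionary columns is an assumption made here for simplicity, not necessarily the paper's exact initialization.

import numpy as np

def irsc(D, y, lam=0.01, gamma=1e-3, max_iter=10, tau=0.8, c=8.0):
    """Iteratively Reweighted Sparse Coding (sketch).
    Alternates weight estimation and weighted-LASSO coding until the weights stabilize."""
    e = y - D.mean(axis=1)                         # assumed initialization of the residual
    w_prev, alpha = None, None
    for _ in range(max_iter):
        mu, delta = estimate_mu_delta(e, tau, c)   # adaptive parameters (see earlier sketch)
        w = logistic_weights(e, mu, delta)         # pixel-level weights (see earlier sketch)
        if w_prev is not None and np.linalg.norm(w - w_prev) / np.linalg.norm(w_prev) < gamma:
            break                                  # weights barely changed: converged
        alpha = weighted_lasso_step(D, y, w, lam)  # weighted LASSO coding step
        e = y - D @ alpha                          # update residual
        w_prev = w
    return alpha, w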

Complexity Analysis

- Suppose the dimensionality $n$ of the face feature is fixed; the complexity of the sparse coding model then basically depends on the number of dictionary atoms $m$. The empirical complexity of commonly used $l_1$-regularized sparse coding methods is $O(m^{\varepsilon})$ with $\varepsilon \approx 1.5$ [12].
- For FR without occlusion, SRC performs sparse coding once while RSC needs several iterations (usually 2). In this case, RSC's complexity is higher than SRC's.
- For FR with occlusion or corruption, SRC needs to augment the dictionary with an identity matrix to code the occluded or corrupted pixels, so its complexity becomes $O((m+n)^{\varepsilon})$. Since $n$ is often much greater than $m$ in sparse coding based FR (e.g., $n$ = 8086, $m$ = 717 in the experiments [25]), the complexity of SRC becomes very high. The computational complexity of the proposed RSC is $O(k \cdot m^{\varepsilon})$, where $k$ is the number of iterations.
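An illustrative back-of-the-envelope comparison (added here, not from the slides), using the quoted numbers $m = 717$, $n = 8086$, $\varepsilon = 1.5$ and $k = 2$:

% SRC with the identity sub-dictionary: (m + n)^{1.5} = 8803^{1.5} \approx 8.3 \times 10^{5}
% RSC with k = 2 iterations:            2 \cdot 717^{1.5} \approx 3.8 \times 10^{4}
\[
\frac{(m+n)^{\varepsilon}}{k\, m^{\varepsilon}}
= \frac{8803^{1.5}}{2 \cdot 717^{1.5}} \approx 21,
\]
% i.e. under this empirical model, RSC is roughly an order of magnitude cheaper than SRC
% when occlusion is handled by coding over an identity sub-dictionary.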

Experimental Results 

Three face datasets:

- Extended Yale B: 2,414 frontal face images of 38 individuals under various illumination conditions.
- AR: 50 males and 50 females; a subset with only illumination and expression changes is used.
- Multi-PIE: 337 subjects captured in four sessions with simultaneous variations in pose, expression, and illumination.

Demonstrate the robustness of RSC to random pixel corruption, random block occlusion, and real disguise.

All the face images are cropped and aligned using the locations of the eyes, which are provided by the face databases. All the training samples are used as the dictionary $D$ in sparse coding.
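A minimal sketch of assembling the dictionary from the training images; the per-column unit $l_2$-norm normalization is a common convention in SRC-style methods and is assumed here rather than taken from the slides.

import numpy as np

def build_dictionary(train_images):
    """Stack vectorized training images as columns and normalize each column to unit l2-norm."""
    D = np.stack([img.ravel().astype(np.float64) for img in train_images], axis=1)
    D /= np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
    return D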

Face Recognition Without Occlusion

Table 1. Face recognition rates on the Extended Yale B database.

Table 2. Face recognition rates on the AR database.

Table 3. Face recognition rates on the Multi-PIE database ('Sim-S1' ('Sim-S3'): set with smile in Session 1 (3); 'Sur-S2' ('Sqi-S2'): set with surprise (squint) in Session 2).

Face Recognition with Occlusion

Face Recognition with Occlusion (2)

Example Results

Conclusion & Discussion

- The pixel-level importance measure makes RSC robust to various types of outliers (e.g., occlusion, corruption, expression).

Pros:
- Full story from error distribution to weight measure
- Convergence and complexity analysis for the iterative algorithm
- Complete experiments (with/without occlusion on 3 datasets)

Cons:
- Thresholding method to decide the weights
- Only compared against SRC