CVPR 2011
Robust Sparse Coding for Face Recognition
Meng Yang, Lei Zhang, Jian Yang, David Zhang
Hong Kong Polytechnic Univ.
Presenter : 江振國
Outline
Introduction
Previous Work
Method
  Distribution Induced Weights
  Iteratively Reweighted Sparse Coding
Experimental Results
Conclusion & Discussion
Motivation
In sparse coding methods, it is assumed that the coding residual e = y − Dα follows a Gaussian or Laplacian distribution.
Gaussian distribution: the fidelity term becomes the ℓ2-norm, min_α ||y − Dα||_2^2.
Laplacian distribution: the fidelity term becomes the ℓ1-norm, min_α ||y − Dα||_1.
This assumption may not hold when the observation y is noisy or has many outliers (e.g., disguise or occlusion).
Goal: design a fidelity term that minimizes a function associated with the actual distribution of the coding residuals.
Related Work
Sparse Representation-based Classification
Given the n_i training samples of the i-th object class, A_i = [v_{i,1}, v_{i,2}, …, v_{i,n_i}] in R^{m×n_i}.
Denote by A the matrix of the entire training set: A = [A_1, A_2, …, A_k] = [v_{1,1}, v_{1,2}, …, v_{k,n_k}].
A test sample y in R^m is sparsely coded over A and assigned to the class with the smallest class-wise reconstruction residual.
• "Robust Face Recognition via Sparse Representation", John Wright et al., PAMI 2009
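A minimal sketch of the SRC decision rule described above, assuming the dictionary columns are ℓ2-normalized and using scikit-learn's Lasso as a stand-in for the ℓ1-minimization solver used in the original paper; the regularization weight `lam` is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(A, labels, y, lam=0.01):
    """Sparse Representation-based Classification (SRC) sketch.

    A      : (m, N) dictionary of training samples (columns l2-normalized)
    labels : (N,)  class label of each column of A (numpy array)
    y      : (m,)  test sample
    """
    # l1-regularized coding of y over the whole training set
    coder = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    coder.fit(A, y)
    alpha = coder.coef_

    # classify by the smallest class-wise reconstruction residual
    residuals = {}
    for c in np.unique(labels):
        alpha_c = np.where(labels == c, alpha, 0.0)  # keep only class-c coefficients
        residuals[c] = np.linalg.norm(y - A @ alpha_c)
    return min(residuals, key=residuals.get)
```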
Distribution of Coding Residual
Assume that e_1, e_2, …, e_n are independently and identically distributed according to some probability density function (PDF) f_θ(e_i), where θ is the parameter set that characterizes the distribution.
The likelihood of the estimator is L_θ(e_1, …, e_n) = ∏_{i=1}^{n} f_θ(e_i).
The Maximum Likelihood Estimator aims to maximize this likelihood function or, equivalently, minimize the objective function −ln L_θ = Σ_{i=1}^{n} ρ_θ(e_i), where ρ_θ(e) = −ln f_θ(e).
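For concreteness, a worked instance of this objective under the two residual models from the Motivation slide (σ denotes the scale parameter; additive constants are dropped); this is a sketch written from the standard definitions, not a formula quoted from the slides:

```latex
% rho_theta(e) = -ln f_theta(e)
\text{Gaussian: } f_\theta(e) \propto e^{-e^2/2\sigma^2}
  \;\Rightarrow\; \rho_\theta(e) = \tfrac{e^2}{2\sigma^2}
  \;\Rightarrow\; \min_\alpha \textstyle\sum_i \rho_\theta(e_i)
  \;\equiv\; \min_\alpha \|y - D\alpha\|_2^2
\\[4pt]
\text{Laplacian: } f_\theta(e) \propto e^{-|e|/\sigma}
  \;\Rightarrow\; \rho_\theta(e) = \tfrac{|e|}{\sigma}
  \;\Rightarrow\; \min_\alpha \textstyle\sum_i \rho_\theta(e_i)
  \;\equiv\; \min_\alpha \|y - D\alpha\|_1
```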
Property of the Distribution
In general, we assume that the unknown PDF f_θ(e) is symmetric, and f_θ(e_i) < f_θ(e_j) if |e_i| > |e_j|.
So ρ_θ(e_i) has the following properties:
ρ_θ(0) is the global minimum of ρ_θ(e_i)
ρ_θ(e_i) = ρ_θ(−e_i)
ρ_θ(e_i) > ρ_θ(e_j) if |e_i| > |e_j|
Without loss of generality, set ρ_θ(0) = 0.
Distribution Induced Weights
Let F_θ(e) = Σ_{i=1}^{n} ρ_θ(e_i). Take the first-order Taylor expansion of F_θ(e) in the neighborhood of e_0:
F̃_θ(e) = F_θ(e_0) + (e − e_0)^T ρ'_θ(e_0) + R_1(e)
R_1(e) is the high-order residual term. Denote by ρ'_θ the derivative of ρ_θ; the i-th element of ρ'_θ(e_0) is ρ'_θ(e_{0,i}), where e_{0,i} is the i-th element of e_0.
In sparse coding, it is usually expected that the fidelity term is strictly convex. So we approximate the residual term as R_1(e) ≈ 0.5 (e − e_0)^T W̄ (e − e_0), where W̄ is a diagonal matrix.
Distribution Induced Weights (2)
Since F_θ(e) reaches its minimum value (i.e., 0) at e = 0, the approximated F̃_θ(e) should also have its minimum value at e = 0.
Letting F̃_θ(e) = F_θ(e_0) + (e − e_0)^T ρ'_θ(e_0) + 0.5 (e − e_0)^T W̄ (e − e_0) + constant,
with ρ'_θ(e_0) = [ρ'_θ(e_{0,1}), ρ'_θ(e_{0,2}), …, ρ'_θ(e_{0,n})]^T,
and using ∂(x^T Q x)/∂x = 2 Q x, setting the gradient of F̃_θ(e) to zero at e = 0 gives W̄ e_0 = ρ'_θ(e_0), i.e. W̄_{i,i} = ρ'_θ(e_{0,i}) / e_{0,i}.
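Restating the derivation on this slide in one place (my rearrangement of the slide's equations into consistent notation): the diagonal weight follows from forcing the approximated F̃_θ to have zero gradient at e = 0.

```latex
\tilde F_\theta(e) = F_\theta(e_0) + (e - e_0)^T \rho'_\theta(e_0)
  + \tfrac{1}{2}(e - e_0)^T \bar W (e - e_0) ,\qquad
\rho'_\theta(e_0) = \big[\rho'_\theta(e_{0,1}),\,\rho'_\theta(e_{0,2}),\,\dots,\,\rho'_\theta(e_{0,n})\big]^T
\\[4pt]
\nabla \tilde F_\theta(e)\big|_{e=0} = \rho'_\theta(e_0) - \bar W e_0 = 0
  \;\Rightarrow\; \bar W_{i,i} = \rho'_\theta(e_{0,i}) / e_{0,i}
```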
Distribution Induced Weights (3)
Then F̃_θ(e) = 0.5 ||W^{1/2} e||_2^2 + b,
where b is a scalar value determined by e_0. Since e = y − Dα, the RSC model can be approximated by
min_α ||W^{1/2}(y − Dα)||_2^2  s.t. ||α||_1 ≤ σ,
which is a weighted LASSO problem. W_{i,i} is the weight assigned to each pixel of the query image y. The determination of the distribution ρ_θ is thus transformed into the determination of the weight matrix W.
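A minimal numerical sketch of the weighted-LASSO step, assuming the diagonal of W is already available as a vector `w`; solving it via scikit-learn's Lasso on the rescaled system (W^{1/2}y, W^{1/2}D) is an illustrative choice, not the authors' solver, and `lam` replaces the constraint ||α||_1 ≤ σ with a penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso

def weighted_lasso(D, y, w, lam=0.01):
    """Sketch of min_a ||W^{1/2}(y - D a)||_2^2 + lam * ||a||_1.

    D : (n, m) dictionary, y : (n,) query image, w : (n,) diagonal of W.
    """
    sw = np.sqrt(w)
    Dw = D * sw[:, None]      # W^{1/2} D
    yw = y * sw               # W^{1/2} y
    coder = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    coder.fit(Dw, yw)
    return coder.coef_
```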
Weight Function
Intuitively, in FR the outlier pixels (e.g., occluded or corrupted pixels) should have low weight values. Considering that the logistic function has properties similar to the hinge loss function in SVM, the weight function is chosen as
ω_θ(e_i) = exp(μδ − μe_i^2) / (1 + exp(μδ − μe_i^2)),
where μ and δ are positive scalars. Parameter μ controls the decreasing rate from 1 to 0, and δ controls the location of the demarcation point.
[Plot: curves of the weight function for μ = 0.07, 0.1, and 0.5; larger μ gives a sharper drop from 1 to 0.]
Parameter μ and δ
When the square of the residual is larger than δ, the weight value is less than 0.5. δ is chosen as follows: denote ψ = [e_1^2, e_2^2, …, e_n^2]. By sorting ψ in ascending order, we get the reordered array ψ_a. Let k = ⌊τn⌋, where the scalar τ ∈ (0, 1] and ⌊τn⌋ outputs the largest integer no greater than τn; then δ = ψ_a(k).
Parameters: τ = 0.8 (without occlusion) / 0.5 (with occlusion), μ = c/δ with c = 8.
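A sketch of the weight update described on these two slides, following the logistic weight function and the δ, μ choices quoted above; the small epsilon guarding against δ = 0 is my own safeguard, not from the paper.

```python
import numpy as np

def compute_weights(residual, tau=0.8, c=8.0):
    """Logistic weights from coding residuals (one weight per pixel).

    residual : (n,) coding residual e = y - D @ alpha
    tau      : fraction used to pick the demarcation point delta
    c        : mu = c / delta
    """
    psi = residual ** 2                       # squared residuals
    psi_sorted = np.sort(psi)                 # ascending order (psi_a)
    k = int(np.floor(tau * len(psi)))         # index of demarcation point
    delta = psi_sorted[max(k - 1, 0)]         # delta = psi_a(k)
    mu = c / max(delta, 1e-8)                 # guard against delta == 0
    # logistic weight: ~1 for small residuals, ~0 for outlier pixels
    z = np.exp(mu * delta - mu * psi)
    return z / (1.0 + z)
```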
The Convergence of IRSC
Since the original cost function of Eq. (5) is lower bounded (≥ 0), the iterative minimization procedure in IRSC will converge. The convergence is achieved when the difference of the weights between adjacent iterations is small enough, i.e.
||W^{(t)} − W^{(t−1)}||_2 / ||W^{(t−1)}||_2 < γ,
where γ is a small positive scalar.
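Putting the pieces together, a sketch of the iteratively reweighted loop with the stopping rule above; `weighted_lasso` and `compute_weights` are the illustrative helpers sketched earlier, and `gamma`, `lam`, and the iteration cap are assumed values.

```python
import numpy as np

def irsc(D, y, lam=0.01, gamma=1e-3, max_iter=10):
    """Iteratively Reweighted Sparse Coding (sketch).

    D : (n, m) dictionary, y : (n,) query image.
    Returns the sparse code alpha and the final pixel weights w.
    """
    w = np.ones(len(y))                        # start with uniform weights
    for _ in range(max_iter):
        alpha = weighted_lasso(D, y, w, lam)   # weighted LASSO step
        w_new = compute_weights(y - D @ alpha) # re-estimate pixel weights
        # stop when the weights barely change between iterations
        if np.linalg.norm(w_new - w) / np.linalg.norm(w) < gamma:
            w = w_new
            break
        w = w_new
    return alpha, w
```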
Complexity Analysis
Suppose that the dimensionality n of the face feature is fixed; the complexity of a sparse coding model basically depends on the number of dictionary atoms, i.e., m. The empirical complexity of commonly used l1-regularized sparse coding methods is O(m^ε) with ε ≈ 1.5 [12].
For FR without occlusion, SRC performs sparse coding once while RSC needs several iterations (usually 2). Thus in this case RSC's complexity is higher than SRC's. For FR with occlusion or corruption, SRC needs to append an identity matrix to code the occluded or corrupted pixels, so its complexity becomes O((m + n)^ε). Since n is often much greater than m in sparse coding based FR (e.g., n = 8086, m = 717 in the experiments [25]), the complexity of SRC becomes very high. The computational complexity of the proposed RSC is O(k·m^ε), where k is the number of iterations.
Experimental Results
Three face datasets:
Extended Yale B – 2,414 frontal face images of 38 individuals.
AR – contains 50 males and 50 females; a subset with only illumination and expression changes is used.
Multi-PIE – 337 subjects captured in four sessions with simultaneous variations in pose, expression, and illumination; a subset of frontal images under different illumination conditions is used.
Demonstrate the robustness of RSC to random pixel corruption, random block occlusion, and real disguise.
All the face images are cropped and aligned by using the locations of eyes, which are provided by the face databases. All the training samples are used as the dictionary D in sparse coding.
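A small sketch of how the dictionary D could be assembled from the cropped and aligned training images; vectorizing each image and ℓ2-normalizing the columns is a common convention in SRC-style methods and is assumed here rather than quoted from the slides.

```python
import numpy as np

def build_dictionary(train_images):
    """Stack vectorized training images as columns and l2-normalize them.

    train_images : list of (h, w) arrays, already cropped and aligned.
    """
    D = np.stack([img.ravel().astype(float) for img in train_images], axis=1)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12  # unit-norm atoms
    return D
```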
Face Recognition Without Occlusion Table 1. Face recognition rates on the Extended Yale B database
Table 2. Face recognition rates on the AR database
Table 3. Face recognition rates on the Multi-PIE database. ('Sim-S1' ('Sim-S3'): set with smile in Session 1 (3); 'Sur-S2' ('Sqi-S2'): set with surprise (squint) in Session 2.)
Face Recognition with Occlusion
Face Recognition with Occlusion (2)
Example Results
Conclusion & Discussion
The pixel-level importance measure makes RSC robust to various types of outliers (e.g., occlusion, corruption, expression).
Pros:
Full story from error distribution to weight measure
Convergence and complexity analysis for the iterative algorithm
Complete experiments (with/without occlusion on 3 datasets)
Cons:
Thresholding-like method to decide the weights
Only compared to SRC