Locality Constrained Dictionary Learning for Nonlinear Dimensionality Reduction
Yin (Joe) Zhou and Kenneth Barner
Yin (Joe) Zhou (University of Delaware)
IBM T. J. Watson Seminar, April 5th, 2013
Outline
1. Introduction: Overview, NLDR and Examples, Motivations, Proposed Framework
2. Background Knowledge
3. Proposed Approach: Problem Formulation, Locality-Constrained Dictionary Learning
4. Experiments: Synthetic Datasets, Real-world Datasets
5. Conclusion and Future Work
6. Appendix
Introduction
Overview
Efficiently processing large-scale high-dimensional data is a challenging problem in machine learning. In this work, we propose a framework to accelerate nonlinear dimensionality reduction (NLDR).
Figure: Framework overview. Dictionary learning compresses the high-dimensional training data into a set of dictionary atoms D in the high-dimensional space; NLDR computes the low-dimensional embedding of the dictionary atoms; linear reconstruction then yields the low-dimensional embedding of all data in the low-dimensional space.
Experiments show that our method improves dimensionality reduction efficiency by more than two orders of magnitude.
NLDR and Examples
Nonlinear Dimensionality Reduction
Figure: NLDR maps data from the high-dimensional space to the low-dimensional space.
Nonlinear Dimensionality Reduction (NLDR) ⇐⇒ Manifold Learning
NLDR estimates the intrinsic low-dimensional manifold $\mathcal{N}$.
Representative NLDR algorithms include ISOMAP, Locally Linear Embedding (LLE), Laplacian Eigenmap, and Local Tangent Space Alignment.
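For readers unfamiliar with these methods, a minimal LLE run with an off-the-shelf implementation (my own illustration, not part of the original slides) looks roughly like this:

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Toy data: N points in a high-dimensional observation space R^m.
rng = np.random.default_rng(0)
Y = rng.normal(size=(3000, 100))   # N = 3000 samples, m = 100 features

# LLE builds a k-nearest-neighbor graph over all N samples and solves a
# sparse eigenproblem to obtain the low-dimensional embedding (here n = 2).
lle = LocallyLinearEmbedding(n_neighbors=6, n_components=2)
embedding = lle.fit_transform(Y)    # shape (3000, 2)
```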
Pose and Illumination Direction Estimation
Figure: 2D manifold obtained by ISOMAP over 698 face images of dimension 4096.
Improve face recognition accuracy
Facilitate human-computer interaction
Medical Imaging
Figure: Brain manifold. Image source: www.na-mic.org/Wiki.
Makes it easier to search and browse large databases
An effective tool in clinical diagnosis
Motivations
Nowadays, NLDR faces large-scale problems. For example, in computer vision, image databases grow rapidly in size.
Figure: (a) Caltech 256 Database (Griffin et al.) contains 30,607 images; (b) MIT SUN Database (Xiao et al.) now includes 131,072 images and is still growing.
However, applying NLDR to large-scale databases incurs exorbitant computational and memory costs.
Generally, NLDR has two steps: nearest-neighbor graph construction and partial eigenvalue decomposition. Current NLDR algorithms have O(N²) or O(N³) computational complexity and O(N²) memory complexity in the number of data points N.
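To make the memory cost concrete, here is a rough back-of-the-envelope estimate (my own illustration): a single dense N × N affinity or kernel matrix for the MIT SUN database already exceeds typical workstation memory.

```python
# Rough storage estimate for one dense N x N matrix in double precision.
N = 131_072                 # MIT SUN database size cited above
bytes_total = N * N * 8     # 8 bytes per double-precision entry
print(bytes_total / 2**30)  # ~128 GiB for a single dense N x N matrix
```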
Proposed Framework
To efficiently process large-scale databases, we propose the following framework for NLDR via learning a dictionary of landmark points.
Figure: Framework overview. In the high-dimensional space, dictionary learning reduces the training data to a set of dictionary atoms D; NLDR is applied only to the atoms to obtain their low-dimensional embedding; the low-dimensional embedding of all data is then obtained by linear reconstruction from the embedded atoms.
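A minimal end-to-end sketch of this framework (illustrative only; the helper names `learn_dictionary` and `compute_codes` are placeholders, and using LLE for the NLDR step is my assumption, not a statement of the authors' implementation):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def lcdl_pipeline(Y, learn_dictionary, compute_codes, K=500, n_components=2, tau=6):
    """Hypothetical sketch of the proposed framework.

    Y : (m, N) high-dimensional training data, columns are samples.
    learn_dictionary : callable returning D of shape (m, K); stands in
        for the LCDL step described later in the talk.
    compute_codes : callable returning X of shape (K, N), the local
        reconstruction codes (columns sum to one, at most tau nonzeros).
    """
    # 1) Dictionary learning: K << N landmark atoms on the data manifold.
    D = learn_dictionary(Y, K)

    # 2) NLDR on the atoms only, so the cost depends on K, not on N.
    nldr = LocallyLinearEmbedding(n_neighbors=tau, n_components=n_components)
    gD = nldr.fit_transform(D.T).T       # (n, K) embedding of the atoms

    # 3) Linear reconstruction: embed all N samples from the embedded atoms.
    X = compute_codes(Y, D, tau)         # (K, N)
    gY = gD @ X                          # (n, N) embedding of all data
    return D, gD, gY
```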
Background Knowledge
Basic Concepts about Manifolds
A manifold $\mathcal{M}$ of dimension $n$, or $n$-manifold, is a topological space with the following properties:
1. $\mathcal{M}$ is Hausdorff,
2. $\mathcal{M}$ is locally Euclidean of dimension $n$, and
3. $\mathcal{M}$ has a countable basis of open sets.
Figure: The observation space $\mathbb{R}^m$ containing the data manifold $\mathcal{M}$, and the low-dimensional space $\mathbb{R}^n$ containing the intrinsic manifold $\mathcal{N}$.
Manifold $\mathcal{M} \subset \mathbb{R}^m$ is our data manifold in the observation space.
Manifold $\mathcal{N}$ is unobservable and can only be estimated.
Manifold $\mathcal{M}$ is the image of the intrinsic low-dimensional manifold $\mathcal{N}$ under the mapping $f : \mathcal{N} \to \mathbb{R}^m$, where $n \ll m$.
Local geometry is preserved after mapping $f$ or $g$.
Proposed Approach
Problem Formulation
Let $Y = \{y_i\}_{i=1}^N$ be an observation set in $\mathbb{R}^m$. Suppose all $y_i$ reside on a smooth manifold $\mathcal{M} \subset \mathbb{R}^m$, which is the image of a smooth $n$-manifold $\mathcal{N}$ under $f : \mathcal{N} \to \mathbb{R}^m$.
Figure: Training data $y_i \in \mathbb{R}^m$ and learned atoms $d_j \in \mathbb{R}^m$ on $\mathcal{M}$; mapped data $g(y_i) \in \mathbb{R}^n$ and mapped atoms $g(d_j) \in \mathbb{R}^n$ on $\mathcal{N}$.
Goal: learn a codebook $D = [d_1, \ldots, d_K]$ of $K$ landmarks on $\mathcal{M}$, such that $\|g(y_i) - g(D)x_i\|_2$ is minimized for all $i = 1, \ldots, N$, where $g(D) = [g(d_1), \ldots, g(d_K)]$, $K \ll N$, and $x_i$ is a local reconstruction code for representing $g(y_i)$.
Difficulties
In practice, however, it is infeasible to recover $g$, for two reasons:
1. the sheer volume of observed data causes intractable computational complexity and memory consumption, and
2. the intrinsic manifold $\mathcal{N}$ is typically unknown.
Without knowing $g$ explicitly, minimizing $\|g(y_i) - g(D)x_i\|_2$ for all $i$ on $\mathcal{N}$ is impractical. Therefore, we need to relate the approximation problem among the latent variables (i.e., $g(y_i)$ and $g(D)$) to the approximation problem among the observed variables (i.e., $y_i$ and $D$).
Harnessing Locality Invariance
Figure: Training data $y_i \in \mathbb{R}^m$ and learned atoms $d_j \in \mathbb{R}^m$ on $\mathcal{M}$; mapped data $g(y_i) \in \mathbb{R}^n$ and mapped atoms $g(d_j) \in \mathbb{R}^n$ on $\mathcal{N}$.
The intrinsic geometric properties (i.e., $x_i$) of each neighborhood on $\mathcal{M}$ are equally valid for local patches on $\mathcal{N}$ [Roweis and Saul, Science '00].
Therefore, we can use the same local reconstruction codes $x_i$ to characterize the local geometric relationships between $g(y_i)$ and $g(D)$ on $\mathcal{N}$ as those between $y_i$ and $D$ on $\mathcal{M}$.
Locality-Constrained Dictionary Learning
Learning Theory
Lemma. Let $\mathcal{M}$, $\mathcal{N}$ and $g$ be as above. Let $U_p$ be an open subset of $\mathcal{M}$ containing $p$, such that $\forall q \in U_p$ the line segment $\overline{pq}$ remains in $U_p$. If $|\partial g^s / \partial q^t| \le c$, $1 \le s \le n$, $1 \le t \le m$, at every $q \in U_p$, then $\forall q \in U_p$:
$$\|g(q) - g(p)\|_2^2 \le m n c^2 \|q - p\|_2^2.$$
The Lemma indicates that as $U_p$ shrinks to a sufficiently small neighborhood of $p$, $m n c^2 \|q - p\|_2^2 \to 0$ and hence $\|g(q) - g(p)\|_2^2 \to 0$. We use this observation below.
Our objective is to minimize $\|g(y_i) - g(D)x_i\|_2$ for all $i$, which is equivalent to minimizing $\sum_{i=1}^N \|g(y_i) - g(D)x_i\|_2^2$. Applying the previous Lemma, we derive the following theorem.
Theorem. Let $g(y_i)$, $y_i$, $g(D)$, $D$ and $g$ be as above. Let $U_{y_i} \ni y_i$ and $U_{Dx_i} \ni Dx_i$ be open sets as in the Lemma that also satisfy $Dx_i \in U_{y_i}$ and $\{d_j \mid x_{ji} \ne 0\} \subset U_{Dx_i}$ for all $i$. If $\mathbf{1}^T x_i = 1$ and $\|x_i\|_0 = \tau$ ($\tau \ll K$) for all $i$, then the following inequality holds:
$$\sum_{i=1}^N \|g(y_i) - g(D)x_i\|_2^2 \le \alpha \sum_{i=1}^N \|y_i - Dx_i\|_2^2 + \beta \sum_{i=1}^N \sum_{j=1}^K \left[ x_{ji}^2 \|Dx_i - d_j\|_2^2 \right]$$
where $x_{ji}$ is the $j$-th element of $x_i$, $\tau \in \mathbb{Z}^+$, $\alpha = 2c_1$, $\beta = 2\tau c_2$, with $c_1 = \sup(\{|\partial g^s/\partial q^t| \mid q \in U_{y_i}, \forall i, s, t\})$ and $c_2 = \sup(\{|\partial g^s/\partial q^t| \mid q \in U_{Dx_i}, \forall i, s, t\})$. Note that $i$ exclusively indexes $y_i$ and its code $x_i$, while $j$ denotes the $j$-th element of $x_i$.
Interpretation
$$\sum_{i=1}^N \|g(y_i) - g(D)x_i\|_2^2 \le \underbrace{\alpha \sum_{i=1}^N \|y_i - Dx_i\|_2^2}_{\text{approximation error}} + \underbrace{\beta \sum_{i=1}^N \sum_{j=1}^K \left[ x_{ji}^2 \|Dx_i - d_j\|_2^2 \right]}_{\text{localization error}}$$
Figure: Training data $y_i \in \mathbb{R}^m$, learned atoms $d_j \in \mathbb{R}^m$, and reconstructed points $Dx_i \in \mathbb{R}^m$ on $\mathcal{M}$.
As all $d_j \in \{d_j \mid x_{ji} \ne 0\}$ approach $Dx_i$, and $Dx_i$ approaches $y_i$, we have
$$\beta \sum_{i=1}^N \sum_{j=1}^K \left[ x_{ji}^2 \|Dx_i - d_j\|_2^2 \right] \approx \beta \sum_{i=1}^N \sum_{j=1}^K \left[ x_{ji}^2 \|y_i - d_j\|_2^2 \right].$$
Locality-Constrained Dictionary Learning (LCDL)
Let $Y \in \mathbb{R}^{m \times N}$, $D \in \mathbb{R}^{m \times K}$, $X \in \mathbb{R}^{K \times N}$ be defined as above. We formulate the practical LCDL optimization problem as:
$$\min_{D, X} \; \|Y - DX\|_F^2 + \lambda \sum_{i=1}^N \sum_{j=1}^K \left[ x_{ji}^2 \|y_i - d_j\|_2^2 \right] + \mu \|X\|_F^2$$
$$\text{s.t.} \quad \mathbf{1}^T x_i = 1 \;\; \forall i \quad (*) \qquad\quad x_{ji} = 0 \text{ if } d_j \notin \Omega_\tau(y_i) \;\; \forall i, j \quad (**)$$
where $\Omega_\tau(y_i)$ is the $\tau$-neighborhood containing the $\tau$ nearest neighbors of $y_i$. The sum-to-one constraint $(*)$ follows from the symmetry requirement, while the locality constraint $(**)$ allows $x_i$ to characterize the intrinsic local geometry.
Minimizing the proposed LCDL problem yields a codebook of $K$ locality-preserving landmark points located on the manifold $\mathcal{M}$ in the observation space.
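To make the objective concrete, here is a small sketch that evaluates the LCDL cost for given Y, D and X (my own illustration; it assumes X already satisfies the constraints, and it is not the authors' code):

```python
import numpy as np

def lcdl_objective(Y, D, X, lam=0.1, mu=1e-4):
    """Evaluate the LCDL objective for Y (m x N), D (m x K), X (K x N).

    Assumes the columns of X sum to one and respect the locality
    constraint (nonzeros only on the tau nearest atoms of each sample).
    """
    # Approximation term ||Y - DX||_F^2
    approx = np.sum((Y - D @ X) ** 2)

    # Locality term: sum_i sum_j x_ji^2 * ||y_i - d_j||_2^2
    # dist2[j, i] = ||y_i - d_j||_2^2, shape (K, N)
    dist2 = np.sum((D[:, :, None] - Y[:, None, :]) ** 2, axis=0)
    locality = np.sum((X ** 2) * dist2)

    # Regularizer mu * ||X||_F^2
    reg = mu * np.sum(X ** 2)

    return approx + lam * locality + reg
```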
Experiments
The proposed LCDL algorithm is evaluated in two experimental scenarios:
1. Effectiveness in approximating the intrinsic manifold
2. Classification performance of the reconstructed low-dimensional manifold produced by LCDL
LCDL is compared with the state-of-the-art dictionary learning algorithm K-SVD [Aharon et al., TSP '06] and with the locality-preserving codebook learning algorithms Local Coordinate Coding (LCC) [Yu et al., NIPS '09] and Locality-constrained Linear Coding (LLC) [Wang et al., CVPR '10].
Synthetic Datasets
Experiment Setup
Three synthetic manifolds are employed: Swiss roll, Punctured sphere, and Gaussian manifold.
For each synthetic dataset, N = 3000 training samples are randomly generated.
We set K = 500, 200, and 100 for the three manifolds, respectively.
The NLDR algorithms are Hessian LLE, Laplacian Eigenmap, and LLE for the three manifolds, respectively.
We measure the root mean square error (RMSE) introduced by reconstructing the intrinsic manifold $\mathcal{N}$, i.e., $\|g(Y) - g(D)X\|_F / \sqrt{N}$, where $g(Y)$ and $g(D)$ are the low-dimensional embeddings of the training data and of the landmark points, respectively, computed via the NLDR algorithm.
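A minimal sketch of this RMSE computation (my own illustration, assuming the embeddings are stored as n × N and n × K arrays and X holds the local reconstruction codes):

```python
import numpy as np

def embedding_rmse(gY, gD, X):
    """RMSE between the direct embedding of the data and its linear
    reconstruction from the embedded dictionary atoms.

    gY : (n, N) embedding of all training data (ground truth).
    gD : (n, K) embedding of the dictionary atoms.
    X  : (K, N) local reconstruction codes.
    """
    N = gY.shape[1]
    return np.linalg.norm(gY - gD @ X, ord='fro') / np.sqrt(N)
```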
Figure: Low-dimensional embedding reconstruction comparison on the Swiss roll (1st row), Punctured sphere (2nd row) and Gaussian manifold (3rd row). Columns show the 3D manifold, the ground truth, and the embeddings reconstructed via LCDL, K-SVD, LCC, and LLC. Ground truth means the low-dimensional embedding obtained directly from all training samples. The nearest-neighbor parameter k of the NLDR algorithms is set to 6.
RMSE of the reconstructed embeddings:

Manifold             LCDL     KSVD     LCC      LLC
Swiss roll           0.0299   0.7409   0.0666   0.0535
Punctured sphere     0.0705   0.8664   0.1060   0.1743
Gaussian             0.0076   0.2774   0.0727   0.0380
Real-world Datasets
Face Recognition
Extended Yale B Database: 38 persons, 2414 frontal face images of size 32 × 32; 32 images per person are randomly selected for training and the rest for testing.
CMU PIE Database: 68 persons, 11554 frontal face images of size 32 × 32; 130 images per person are randomly selected for training and the rest for testing.
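As a small illustration of this protocol (not from the slides), the random per-person train/test split can be sketched as:

```python
import numpy as np

def split_per_class(labels, n_train, seed=0):
    """Randomly pick n_train samples per class for training, rest for testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)
```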
Experiment Setup
The goal is to examine which dictionary learning algorithm yields the most meaningful low-dimensional embedding for classification.
For all algorithms, a structured dictionary is learned as $D = [D_1 | D_2 | \ldots | D_C]$, where $D_i$ is the sub-dictionary for class $i$.
The number of atoms per class is set to 8, yielding a dictionary of 304 atoms for the Extended Yale B Database and 544 atoms for the CMU PIE Database.
All Train is the baseline method, representing the results obtained by performing LLE on the entire training set.
Random is included for comparison: it uses randomly selected training samples as the dictionary.
A nearest-neighbor classifier is employed.
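A minimal sketch of assembling the structured dictionary (illustrative; `learn_subdictionary` is a placeholder for any of the compared dictionary learning algorithms, not a function from the original work):

```python
import numpy as np

def build_structured_dictionary(Y_by_class, learn_subdictionary, atoms_per_class=8):
    """Concatenate per-class sub-dictionaries D = [D_1 | D_2 | ... | D_C].

    Y_by_class : list of (m, N_c) arrays, one per class.
    learn_subdictionary : callable returning an (m, atoms_per_class) array.
    """
    subdicts = [learn_subdictionary(Yc, atoms_per_class) for Yc in Y_by_class]
    return np.hstack(subdicts)   # shape (m, C * atoms_per_class)
```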
Face Recognition
Fix the dictionary size (304 atoms for the Extended Yale B Database and 544 atoms for the CMU PIE Database) and vary the reduced dimension.
Figure: Recognition rate (%) versus reduced dimension on (a) Extended Yale B and (b) CMU PIE, comparing LCDL, K-SVD, LCC, LLC, Random, and All Train.
Vary the dictionary size from 2 to 10 atoms per class at a fixed reduced dimension.
Figure: Recognition rate (%) versus dictionary size on (c) Extended Yale B (76 to 380 atoms) and (d) CMU PIE (136 to 680 atoms), comparing LCDL, K-SVD, LCC, and LLC.
Parameter and Performance
Vary λ among 0.001, 0.01, 0.1, and 1; vary τ among 1, 2, 3, 4, and 5.
Figure: Recognition rate (%) versus τ for each λ value on (e) Extended Yale B and (f) CMU PIE.
Computational Cost
Table: Overall time (seconds), including dictionary learning and training data embedding. Note that the time measurements may vary with implementation.

Method       Extended YaleB                  CMU PIE
             Overall Time      Speedup       Overall Time      Speedup
All Train    22.1577 s         1x            11807.3121 s      1x
K-SVD        71.2387 s         0.3x          2751.2620 s       4.3x
LCC          38.7172 s         0.6x          1299.7146 s       9.1x
LLC          11.6593 s         1.9x          69.2321 s         170.5x
LCDL         7.1001 s          3.1x          45.8025 s         257.8x
Conclusion and Future Work
Conclusions:
We show that approximating an unobservable intrinsic manifold by a few latent points residing on the manifold can be cast as a novel dictionary learning problem over the observation space.
The presented locality-constrained dictionary learning (LCDL) algorithm effectively learns a compact set of atoms consisting of locality-preserving landmark points on a nonlinear manifold.
LCDL is superior to existing dictionary learning algorithms in yielding more meaningful atoms for NLDR algorithms, with greatly reduced computational complexity.
Future work includes:
Testing on additional datasets
Incorporating a sparse outlier term to improve robustness
Extending LCDL into a discriminative dictionary learning algorithm for classification
Appendix
SHREC'11 Contest Dataset
30 classes, 600 watertight meshes; 20 non-rigid shapes per class.
Extension to 3D Object Recognition
Figure: Performance comparison on robustness against partial occlusion: recognition rate (%) versus percentage of occlusion (%), comparing Smeets et al., BoWH + SVM, D-KSVD, and the proposed method.
Proof Sketch
Denote by $Y \in \mathbb{R}^{m \times N}$ the matrix containing all $y_i$ and let $X = [x_1, \ldots, x_N] \in \mathbb{R}^{K \times N}$ be the matrix containing the $N$ local reconstruction codes. We have
$$\begin{aligned}
\sum_{i=1}^N \|g(y_i) - g(D)x_i\|_2^2 &= \|g(Y) - g(D)X\|_F^2 \\
&\overset{(a)}{=} \|g(Y) - g(DX) + g(DX) - g(D)X\|_F^2 \\
&\overset{(b)}{\le} 2\|g(Y) - g(DX)\|_F^2 + 2\|g(DX) - g(D)X\|_F^2 \\
&\overset{(c)}{=} 2 \sum_{i=1}^N \|g(y_i) - g(Dx_i)\|_2^2 + 2 \sum_{i=1}^N \|g(Dx_i) - g(D)x_i\|_2^2 \qquad (1)
\end{aligned}$$
where in (a) $g(DX) \in \mathbb{R}^{n \times N}$ is the matrix representing the image of the reconstructed signals $DX$ under $g$; (b) follows from the Cauchy-Schwarz inequality; and in (c) $g(Dx_i) \in \mathbb{R}^n$ is the $i$-th column of $g(DX)$.
Since $\mathbf{1}^T x_i = \sum_{j=1}^K x_{ji} = 1$ and $\|x_i\|_0 = \tau$ for all $i$, Eq. (1) can be written as:
$$\begin{aligned}
\sum_{i=1}^N \|g(y_i) - g(D)x_i\|_2^2 &\le 2 \sum_{i=1}^N \|g(y_i) - g(Dx_i)\|_2^2 + 2 \sum_{i=1}^N \Big\| \sum_{j=1}^K x_{ji} \big[ g(Dx_i) - g(d_j) \big] \Big\|_2^2 \\
&\le 2 \sum_{i=1}^N \|g(y_i) - g(Dx_i)\|_2^2 + 2\tau \sum_{i=1}^N \sum_{j=1}^K x_{ji}^2 \|g(Dx_i) - g(d_j)\|_2^2
\end{aligned}$$
Applying the Lemma to each $\|g(y_i) - g(Dx_i)\|_2^2$ and to each $x_{ji}^2 \|g(Dx_i) - g(d_j)\|_2^2$, there exist $c_1 = \sup(\{|\partial g^s/\partial q^t| \mid q \in U_{y_i}, \forall i, s, t\})$ and $c_2 = \sup(\{|\partial g^s/\partial q^t| \mid q \in U_{Dx_i}, \forall i, s, t\})$ such that $2\|g(y_i) - g(Dx_i)\|_2^2 \le 2c_1 \|y_i - Dx_i\|_2^2$ for all $i$, and $2\tau x_{ji}^2 \|g(Dx_i) - g(d_j)\|_2^2 \le 2\tau c_2 x_{ji}^2 \|Dx_i - d_j\|_2^2$ for all $i, j$. Letting $\alpha = 2c_1$ and $\beta = 2\tau c_2$ completes the proof.
Optimization
Block Coordinate Descent Method.
1. Repeat
2. for i = 1 to N do: compute the local reconstruction code as
$$\hat{x}_i \leftarrow \frac{(G + \lambda\,\delta(G) + \mu I)^{-1} \mathbf{1}}{\mathbf{1}^T (G + \lambda\,\delta(G) + \mu I)^{-1} \mathbf{1}}$$
   end for
3. for j = 1 to K do: update the dictionary atom as
$$d_j \leftarrow \frac{E\, x_{j*}^T + \lambda Y \alpha}{(1 + \lambda)\,(x_{j*} x_{j*}^T)}$$
   end for
4. Until convergence
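As an illustration of the code-update step, here is a sketch under the assumption that G is the Gram matrix of the τ nearest atoms centered at y_i and that δ(·) extracts the diagonal of G; both are my reading of the slide's notation, not the authors' released code:

```python
import numpy as np

def update_code(y, D_local, lam=0.1, mu=1e-4):
    """One x-update of the block coordinate descent (illustrative sketch).

    y       : (m,) sample.
    D_local : (m, tau) the tau nearest atoms Omega_tau(y).
    Assumes G is the Gram matrix of the atoms centered at the sample and
    delta(G) is its diagonal part -- assumptions, not a verified detail.
    """
    diff = D_local - y[:, None]           # center the local atoms at y
    G = diff.T @ diff                      # (tau, tau) local Gram matrix
    A = G + lam * np.diag(np.diag(G)) + mu * np.eye(G.shape[0])
    w = np.linalg.solve(A, np.ones(G.shape[0]))
    return w / w.sum()                     # enforce the sum-to-one constraint
```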