Building a Classification Cascade for Visual Identification from One Example
Andras Ferencz, with collaborators Erik Learned-Miller and Jitendra Malik
Andras Ferencz
March 18, 2005
1
Recognition Hierarchy
Things
  Categorization
    Vehicles: Buses, Cars
    Fruits: Bananas, Apples
    Mammals: People, Chimps
  Identification
    Cars: Bill's Honda, Hillary's Toyota
    People: Hillary, Bill
Identification: Are you a car expert? Are these cars the same?
Are you sure?
Challenges of Identification (1)
Differences between unique objects can be subtle
The answer is in the details... but which ones?
Crash Course on Martian Identification
Martian training set
[Figure: labeled Martian training images, one of them labeled Bob]
Test: Find Bob
[Figure: test images to be matched against Bob]
Challenges of Identification (2)
1) Differences between unique objects can be subtle - Requires careful selection of salient features while avoiding distracting ones
2) Typical to get only a single example for each class - Makes direct saliency testing for feature selection impossible
An Example
Our goal: given a single image from a known category (e.g. faces), select a sequence of informative patches (difficult!) that can be matched against a test image to make a “same” vs. “different” decision
[Figure: object image and test image, with the patches to select marked "?"]
Functional View: Categorization vs. Identification
Object Categorization:
1) (Off-Line) Training Function Tcat: class training images --> Ccat
2) (On-Line) Classifier Ccat: test image --> class label
Object Identification:
1) (Off-Line) Training Function Tid: category training images --> Hid
2) (On-Line) Identifier Generator Hid: object image --> Cid
3) (On-Line) Classifier Cid: test image --> {same, different}
Identification has 2 training steps: 1) learn the category and 2) learn the object [see EigenFaces (PCA), FisherFaces (PCA+LDA)]
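The three-stage structure of identification (Tid produces Hid, Hid produces Cid, Cid labels a test image) can be sketched as nested functions. This is only an illustration of the call structure: the helper `mean_abs_diff`, the pixel-difference "model", and the tolerance rule are hypothetical stand-ins, not the method described in this talk.

```python
# Toy illustration of Tid -> Hid -> Cid as nested closures.
# Images are lists of rows of floats (an assumption for this sketch).

def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equally sized images."""
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(abs(x - y) for x, y in pairs) / len(pairs)

def t_id(category_images):
    """Off-line training: category training images --> Hid."""
    # "Learn the category": a toy tolerance, half the mean distance
    # between distinct training images (hypothetical rule).
    dists = [mean_abs_diff(a, b)
             for i, a in enumerate(category_images)
             for b in category_images[i + 1:]]
    tolerance = 0.5 * sum(dists) / len(dists)

    def h_id(object_image):
        """On-line identifier generator: object image --> Cid."""
        def c_id(test_image):
            """On-line classifier: test image --> True for 'same'."""
            return mean_abs_diff(object_image, test_image) < tolerance
        return c_id

    return h_id
```

Usage follows the two training steps: `h_id = t_id(training_images)` is run once per category, `c_id = h_id(object_image)` once per object, and `c_id(test_image)` once per test image.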
Outline
Preprocessing: Alignment
Steps of Object Identification (reverse order!):
3) (On-Line) Classifier Cid: test image --> {same, different}
   - Match patches, compute log likelihood score from appearance difference
2) (On-Line) Identifier Generator Hid: object image --> Cid
   - Find good patches, estimate probability densities for each
1) (Off-Line) Training Function Tid: category training images --> Hid
   - Learn a function from patch position+appearance to densities
Modeling Patch Dependencies
Building the Cascade
Results
Preprocessing: Detection and Alignment
Alignment provides (1) an object-centric coordinate system and (2) help with part correspondence
[Figure: Camera 1 image --> detect --> warp]
[see Faces in the News, Berg et al.]
Classifier Cid: Test Image --> {same, different}
Classifier Model:
1) Sequence of patches, j = {1,2,...,m}
2) Distributions P(Dj|same), P(Dj|diff) for each patch j
To classify test image: For each patch j:
1) Find matching patch in test image
2) Compute appearance difference dj and log likelihood ratio score
Given: 2 distributions, P(Dj|same) and P(Dj|diff)
[Figure: densities P(D) plotted against D for patches 1, 2, 3]
Compute LLR score:
$$R = \sum_{j=1}^{m} \log \frac{P(D_j = d_j \mid \text{same})}{P(D_j = d_j \mid \text{diff})}$$
Classifier Cid (summary)
Classifier Cid needs:
1) List of patches from the object model image, j={1,...,m}
2) The densities P(Dj|same) and P(Dj|diff) for each patch j
3) Threshold for decision
Classifier Cid does:
1) Matches each patch j to test image, minimizing appearance distance: dj = 1 - NormalizedCorrelation(Obj_Patch, Test_Patch)
2) Records the minimum dj for each patch and computes the LLR
$$R = \sum_{j=1}^{m} \log \frac{P(D_j = d_j \mid \text{same})}{P(D_j = d_j \mid \text{diff})}$$
3) Compares R against the decision threshold to output same or different (a large R favors same)
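The three classifier steps (match, accumulate the LLR, threshold) can be sketched as follows. Patches are assumed to be flattened lists of pixel values, the search over test-image locations is reduced to a given list of candidate patches, and the densities are passed in as plain callables; none of these representation choices come from the talk.

```python
import math

def normalized_correlation(a, b):
    """Normalized correlation of two equal-length pixel lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A constant patch has no defined correlation; treat it as 0.
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0

def appearance_distance(obj_patch, candidates):
    """d_j: minimum of 1 - NormalizedCorrelation over candidate test patches."""
    return min(1.0 - normalized_correlation(obj_patch, c) for c in candidates)

def llr_score(distances, p_same, p_diff):
    """R = sum_j log( P(D_j=d_j|same) / P(D_j=d_j|diff) )."""
    return sum(math.log(ps(d) / pd(d))
               for d, ps, pd in zip(distances, p_same, p_diff))

def classify(distances, p_same, p_diff, threshold=0.0):
    """Large R favors 'same'; the threshold value is an assumed default."""
    r = llr_score(distances, p_same, p_diff)
    return "same" if r > threshold else "different"
```

For a quick check, any pair of positive densities works, e.g. `p_same = [lambda d: 10 * math.exp(-10 * d)]` and `p_diff = [lambda d: math.exp(-d)]`, which are placeholder exponentials rather than the densities used in the talk.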
Estimating Saliency
Assume we knew P(Dj|same), P(Dj|diff) from many same/different examples:
Saliency = Mutual Information I(Dj;C), where C = {same, diff}
[Figure: densities P(D) plotted against D for patches 1, 2, 3]
Patch 1: I(D1;C) = .39 (best)
Patch 2: I(D2;C) = .23 (good)
Patch 3: I(D3;C) = .01 (bad)
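A minimal sketch of the saliency computation, assuming the two densities have been discretized into histograms over the same bins of D and that same and different are equally likely a priori (the prior is an assumption; the slides do not state it). The result is in bits.

```python
import math

def mutual_information(p_same, p_diff, prior_same=0.5):
    """I(D;C) in bits for binned densities P(D|same), P(D|diff).

    p_same and p_diff are lists of bin probabilities, each summing to 1.
    """
    prior_diff = 1.0 - prior_same
    mi = 0.0
    for ps, pd in zip(p_same, p_diff):
        p_marginal = prior_same * ps + prior_diff * pd  # P(D = bin)
        for prior, p in ((prior_same, ps), (prior_diff, pd)):
            if p > 0:  # 0 * log(0) is taken as 0
                mi += prior * p * math.log2(p / p_marginal)
    return mi
```

Identical histograms give 0 bits (the patch tells us nothing about C), and histograms with no overlapping bins give 1 bit (the patch decides C on its own).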
Estimating P(Dj|same) and P(Dj|diff)
The problem: we don't have same/different examples for this car.
So we need to estimate P(Dj|same) and P(Dj|diff) from the single image.
[Figure: patches 1, 2, 3 marked on the single object image]
Intuition: we want P(Dj|same) and P(Dj|diff) to depend on the position and appearance of the patch j
Estimating P(Dj|same) and P(Dj|diff)
Assume functions Q from patch characteristics to probability densities P:
Qsame : Patchj --> P(Dj|same)
Qdiff : Patchj --> P(Dj|diff)
(will get back to the exact form of Q and P(Dj|C) later)
Classifier Generator Hid : object image --> Cid
Simplified algorithm for Hid, given a single object image:
1) Scan through all candidate patches (size, position, resolution). For each Patch j:
   a) Compute P(Dj|same) and P(Dj|diff) from Patchj (function Q)
   b) Compute mutual information I(Dj;C)
2) Sort j according to I(Dj;C); pick top m patches
[Figure: Patch j --> Q --> P(Dj) plotted against Dj; Mutual Information I(Dj;C) = 0.21]
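The simplified Hid algorithm above can be sketched as a score-and-sort loop. Here `q_same` and `q_diff` are caller-supplied stand-ins for the functions Q and are assumed to return binned densities; the equal prior on same/different is likewise an assumption of this sketch.

```python
import math

def mutual_information(p_same, p_diff, prior_same=0.5):
    """I(D;C) in bits for two binned densities over the same bins."""
    mi = 0.0
    for ps, pd in zip(p_same, p_diff):
        pm = prior_same * ps + (1.0 - prior_same) * pd
        for prior, p in ((prior_same, ps), (1.0 - prior_same, pd)):
            if p > 0:
                mi += prior * p * math.log2(p / pm)
    return mi

def build_identifier(patches, q_same, q_diff, m):
    """Return the m most salient patches with their estimated densities.

    Each result is (saliency, patch, p_same, p_diff), most salient first.
    """
    scored = []
    for patch in patches:                       # step 1: scan candidates
        p_same, p_diff = q_same(patch), q_diff(patch)   # step 1a
        saliency = mutual_information(p_same, p_diff)   # step 1b
        scored.append((saliency, patch, p_same, p_diff))
    scored.sort(key=lambda item: item[0], reverse=True)  # step 2: sort
    return scored[:m]
```

This version ranks each patch independently, so it ignores the dependency between nearby patches that the later slides address.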
Learning About the Category
Goal of off-line training Tid: to define Qsame and Qdiff
Given: labeled same and different image pairs
Learns: to estimate P(Dj|same) and P(Dj|diff) for any Patch j
[Figure: example image pairs labeled Same and Different]
Parameterizing Patch Characteristics and P(Dj|C)
Hyper-Features: Dimensionality Reduction for Patch
Parameterize characteristics (position+appearance) of Patch j with hyper-features Zj
Examples: x position, y position, contrast, dominant orientation, cornerness, mean intensity, ... + higher order terms derived from these (squares, cubes, cross terms)
Model P(Dj|same) and P(Dj|diff) using Gamma (Γ) distributions:
Same (C=1): $P(D_j \mid \text{same}) = \Gamma(D_j; \theta_{\text{same}})$
Different (C=0): $P(D_j \mid \text{diff}) = \Gamma(D_j; \theta_{\text{diff}})$
where $\theta = (\mu, \sigma)$, 2 degrees of freedom: mean, variance
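A Gamma density described by its mean and variance can be evaluated by converting to the usual shape/scale form (shape = mean²/variance, scale = variance/mean). The sketch below does this with the standard library only; the function names and the example parameter values are illustrative assumptions.

```python
import math

def gamma_logpdf(d, mean, variance):
    """Log density of a Gamma distribution given its mean and variance."""
    shape = mean * mean / variance
    scale = variance / mean
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))

def patch_llr(d, theta_same, theta_diff):
    """log P(D=d|same) - log P(D=d|diff), each theta = (mean, variance)."""
    return gamma_logpdf(d, *theta_same) - gamma_logpdf(d, *theta_diff)
```

With a "same" density concentrated near small distances, e.g. `theta_same = (0.2, 0.01)` and `theta_diff = (0.8, 0.04)`, a small observed distance gives a positive LLR and a large one gives a negative LLR.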
Generalized Linear Model: Z --> (µ, σ)
[Figure: example image; distribution of d vs. Y position (Z = [Y Y2 Y3]) for same and different pairs; mark color = mutual information; curves show µ and µ+σ]
Ordinary Linear Model (same and different fit separately): µ is a linear function of [ Y Y2 Y3 1 ]
Generalized Linear Model (same and different fit separately): both µ and σ are fit as functions of [ Y Y2 Y3 1 ]
Choosing an Encoding for Z (Feature Selection)
Candidates for Hyper-Features (Z): X and Y coordinates, directional filter energies, intensity, contrast, + quadratic, cubic and cross terms
[Figure: progress of Least Angle Regression (LARS) over steps, based on the Ordinary Linear Model, for the candidate hyper-features]
Feature Selection: pick top N variables, e.g. Z = [Y, Y^2, X*Y, Contrast, E^2, ...]
Train same and different GLMs by maximizing the likelihood, i.e. minimizing over the GLM coefficients:
$$\arg\min \; -\sum_{i} \log \Gamma(d_i; \theta(Z_i))$$
where $\theta(Z_i)$ denotes the Gamma parameters predicted from the hyper-features $Z_i$
Score patch i by estimating its LLR:
$$\log \frac{P(d_i \mid Z_i, \text{same})}{P(d_i \mid Z_i, \text{different})} \approx \log \frac{\Gamma(d_i; \theta_{\text{same}}(Z_i))}{\Gamma(d_i; \theta_{\text{diff}}(Z_i))}$$
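The training objective and the patch score can be sketched for the single-variable encoding Z = [Y, Y^2, Y^3, 1]. Several things here are assumptions rather than details from the talk: µ and σ are each taken to be plain linear functions of Z with no link function, σ is treated as a standard deviation, and the coefficient vectors are supplied by the caller instead of being found by an optimizer.

```python
import math

def hyper_features(y):
    """Z = [Y, Y^2, Y^3, 1] for a patch at vertical position y."""
    return [y, y ** 2, y ** 3, 1.0]

def gamma_logpdf(d, mu, sigma):
    """Log Gamma density with mean mu and standard deviation sigma."""
    shape = (mu / sigma) ** 2
    scale = sigma ** 2 / mu
    return ((shape - 1.0) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))

def predict(z, beta_mu, beta_sigma):
    """Predict (mu, sigma) as linear functions of the hyper-features z."""
    mu = sum(b * f for b, f in zip(beta_mu, z))
    sigma = sum(b * f for b, f in zip(beta_sigma, z))
    return mu, sigma

def negative_log_likelihood(samples, beta_mu, beta_sigma):
    """Training objective: -sum_i log Gamma(d_i; theta(Z_i)).

    samples is a list of (y, d) pairs from same (or different) image pairs.
    """
    total = 0.0
    for y, d in samples:
        mu, sigma = predict(hyper_features(y), beta_mu, beta_sigma)
        total -= gamma_logpdf(d, mu, sigma)
    return total

def patch_llr(y, d, glm_same, glm_diff):
    """Approximate LLR of distance d for a patch at position y.

    glm_same and glm_diff are (beta_mu, beta_sigma) coefficient pairs.
    """
    z = hyper_features(y)
    return (gamma_logpdf(d, *predict(z, *glm_same))
            - gamma_logpdf(d, *predict(z, *glm_diff)))
```

A real fit would pass `negative_log_likelihood` to a numerical optimizer and would need to keep the predicted µ and σ positive, which this sketch leaves to the caller.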
Summary
[Figure: pipeline from the object image and the test image to the observed patch distances Dj = dj]
The Problem of Dependency
How to pick the 2nd most salient patch?
[Figure: patches colored by saliency (mutual information), with the Most Salient, 2nd Most Salient, and 3rd Most Salient patches marked]
The Problem of Dependency
[Figure: two slides of examples, each with panels labeled Same and Different]
Modeling Dependency: Bivariate Gamma
[Figure: empirical joint distributions by distance, with Same and Different panels for far patches and for nearby patches]
Modeled Joint Distributions: 3-parameter Bivariate Gamma [Kibble]
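Hyper-Feature Differences to Bivariate Gamma
Distance is not the only indicator of dependence.
Model of dependence based on hyper-feature differences:
Kibble's distribution: $K(\theta_i, \theta_j, \rho_{ij})$, where $(\theta_i, \theta_j)$ (the Gamma parameters of patches i and j) define the marginals and $\rho_{ij}$ defines the correlation
Let $dZ_{ij} = Z_j - Z_i$; set $\rho_{ij} = \mathrm{sigmoid}(dZ_{ij} \cdot w)$, with $w$ a weight vector
Joint distributions of Patch i, Patch j:
$$P(D_j, D_i \mid \text{same}) = K(\theta_i^{\text{same}}, \theta_j^{\text{same}}, \rho_{ij})$$
$$P(D_j, D_i \mid \text{diff}) = K(\theta_i^{\text{diff}}, \theta_j^{\text{diff}}, \rho_{ij})$$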
Finding a Greedy Sequence of Patches
Recall that C = {Same, Different} is the decision variable
$I(D_j; C)$ is the mutual information between the j-th patch and C
The 1st most salient patch is $\arg\max_i I(D_i; C)$
The 2nd most salient patch, using the joint distribution with the 1st patch i, is $\arg\max_j \left[ I(D_j, D_i; C) - I(D_i; C) \right]$
The 3rd most salient patch is $\arg\max_j \min_i \left[ I(D_j, D_i; C) - I(D_i; C) \right]$, with i ranging over the patches already selected
* formulation similar to [Vidal-Naquet & Ullman]
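The greedy max-min rule can be sketched as a short loop. The marginal and pairwise mutual information values are assumed to be available through two caller-supplied functions, `mi` and `joint_mi` (hypothetical names); computing them from the bivariate Gamma model is outside this sketch.

```python
def greedy_sequence(patches, mi, joint_mi, m):
    """Pick up to m patches greedily by the max-min information gain.

    mi(i) returns I(D_i; C); joint_mi(j, i) returns I(D_j, D_i; C).
    """
    remaining = list(patches)
    first = max(remaining, key=mi)          # 1st: largest marginal MI
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < m:
        def gain(j):
            # Worst-case added information relative to any chosen patch.
            return min(joint_mi(j, i) - mi(i) for i in selected)
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The min over already selected patches penalizes a candidate that is redundant with even one earlier choice, so a patch with high marginal saliency can be passed over if it sits next to one already picked.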
From a Sequence to a Cascade
Defining stopping thresholds
[Figure: progress of the LLR score R against the number of patches]
$$R = \sum_{j=1}^{m} \log \frac{P(D_j = d_j \mid \text{same})}{P(D_j = d_j \mid \text{diff})}$$
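One way to turn the running LLR score into a cascade is to stop as soon as the partial sum crosses a reject or accept threshold. The slides do not give the thresholds or say whether they vary per stage, so the constant thresholds and the fallback decision rule below are assumptions made for illustration.

```python
def run_cascade(patch_llrs, reject_below, accept_above, final_threshold=0.0):
    """Accumulate per-patch LLRs in saliency order with early stopping.

    Returns (decision, patches_used, score).
    """
    score = 0.0
    used = 0
    for llr in patch_llrs:
        score += llr
        used += 1
        if score < reject_below:      # confident "different": stop early
            return "different", used, score
        if score > accept_above:      # confident "same": stop early
            return "same", used, score
    # All patches used without crossing a stopping threshold.
    decision = "same" if score > final_threshold else "different"
    return decision, used, score
```

Because `patch_llrs` can be a generator, the matching cost for later patches is only paid when the earlier ones leave the decision open.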
Top 10 Patches
Numerical Results