Building a Classification Cascade for Visual Identification from One Example

Andras Ferencz, with collaborators Erik Learned-Miller and Jitendra Malik

Andras Ferencz

March 18, 2005

1

Recognition Hierarchy

Categorization:
  Things
    Vehicles: Buses, Cars
    Fruits: Bananas, Apples
    Mammals: People, Chimps

Identification:
  Cars: Bill's Honda, Hillary's Toyota
  People: Hillary, Bill


Identification: Are you a car expert? Are these cars the same?

Are you sure?

Challenges of Identification (1): Differences between unique objects can be subtle

The answer is in the details... but which ones?


Crash Course on Martian Identification

Martian training set
Test: Find Bob (figure: candidate Martians compared against the training set; one of them is Bob)


Challenges of Identification (2)

1) Differences between unique objects can be subtle, requiring careful selection of salient features while avoiding distracting ones.

2) Typically, only a single example is available for each class, making direct saliency testing for feature selection impossible.


An Example

Our goal: given a single image from a known category (e.g., faces), select a sequence of informative patches (difficult!) that can be matched to make a "same" vs. "different" decision.


Functional View: Categorization vs. Identification

Object Categorization:
1) (Off-Line) Training Function Tcat: class training images --> Ccat
2) (On-Line) Classifier Ccat: test image --> class label

Object Identification:
1) (Off-Line) Training Function Tid: category training images --> Hid
2) (On-Line) Identifier Generator Hid: object image --> Cid
3) (On-Line) Classifier Cid: test image --> {same, different}

Identification has 2 training steps: 1) learn the category and 2) learn the object [see EigenFaces (PCA), FisherFaces (PCA+LDA)]


Outline

Preprocessing: Alignment
Steps of Object Identification (in reverse order!):
3) (On-Line) Classifier Cid: test image --> {same, different}
   - Match patches, compute log likelihood score from appearance difference
2) (On-Line) Identifier Generator Hid: object image --> Cid
   - Find good patches, estimate probability densities for each
1) (Off-Line) Training Function Tid: category training images --> Hid
   - Learn a function from patch position+appearance to densities
Modeling Patch Dependencies
Building the Cascade
Results


Preprocessing: Detection and Alignment

Alignment gives (1) an object-centric coordinate system and (2) helps part correspondence.

(Figure: camera image --> detect --> warp)

[see Faces in the News, Berg et al.]



Classifier Cid: test image --> {same, different}

Classifier model:
1) A sequence of patches, j = {1, 2, ..., m}
2) Distributions P(Dj|same), P(Dj|diff) for each patch j

To classify a test image, for each patch j:
1) Find the matching patch in the test image
2) Compute the appearance difference dj and the log likelihood ratio score

Given the two distributions P(Dj|same) and P(Dj|diff), compute the LLR score:

R = ∑_{j=1}^{m} log [ P(Dj = dj | same) / P(Dj = dj | diff) ]

(Figure: the densities P(D) for three example patches)
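The scoring step above can be sketched in a few lines. This is a minimal sketch: the per-patch density functions are passed in as plain callables, standing in for whatever fitted distributions the system provides.

```python
import math

def llr_score(distances, p_same, p_diff):
    """R = sum_j log[ P(D_j = d_j | same) / P(D_j = d_j | diff) ].

    distances: matched appearance differences d_j, one per patch.
    p_same, p_diff: per-patch density functions for the two hypotheses.
    """
    return sum(math.log(p_same[j](d) / p_diff[j](d))
               for j, d in enumerate(distances))
```

A positive R favors "same"; a negative R favors "different".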


Classifier Cid (summary)

Classifier Cid needs:
1) A list of patches from the object model image, j = {1, ..., m}
2) The densities P(Dj|same) and P(Dj|diff) for each patch j
3) A threshold θ for the decision

Classifier Cid does:
1) Matches each patch j to the test image, minimizing the appearance distance dj = 1 - NormalizedCorrelation(Obj_Patch, Test_Patch)
2) Records the minimum dj for each patch and computes the LLR

R = ∑_{j=1}^{m} log [ P(Dj = dj | same) / P(Dj = dj | diff) ]

3) If R > θ, outputs "same"; otherwise "different"
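The matching step can be sketched as follows, assuming grayscale numpy arrays. `best_match` scans every offset for simplicity; the actual system would restrict the search to a neighborhood of the patch's expected position.

```python
import numpy as np

def ncc_distance(a, b):
    """d = 1 - NormalizedCorrelation(a, b): 0 for patches identical up to gain/offset."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return 1.0 - float((a * b).mean())

def best_match(obj_patch, test_image):
    """Return the minimum distance d_j of obj_patch over all windows of test_image."""
    h, w = obj_patch.shape
    return min(ncc_distance(obj_patch, test_image[y:y + h, x:x + w])
               for y in range(test_image.shape[0] - h + 1)
               for x in range(test_image.shape[1] - w + 1))
```

Normalized correlation makes the distance invariant to additive and multiplicative intensity changes, which matters when the two images come from different cameras or lighting.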


Estimating Saliency

Assume we knew P(Dj|same) and P(Dj|diff) from many same/different examples. Then:

Saliency = Mutual Information I(Dj; C), where C = {same, diff}

(Figure: distance densities P(D) for three patches)
I(D1; C) = .39 (best)
I(D2; C) = .23 (good)
I(D3; C) = .01 (bad)
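The saliency measure can be computed directly once the distance variable is discretized into bins. A sketch, with illustrative (not learned) probabilities:

```python
import math

def mutual_information(p_same, d_given_same, d_given_diff):
    """I(D; C) in bits, with C = {same, diff} and D discretized into bins.

    p_same: prior P(C = same); d_given_*: P(D = bin | C) histograms.
    """
    mi = 0.0
    for ps, pd in zip(d_given_same, d_given_diff):
        p_d = p_same * ps + (1 - p_same) * pd        # marginal P(D = bin)
        for p_c, p_dc in ((p_same, ps), (1 - p_same, pd)):
            if p_dc > 0:
                mi += p_c * p_dc * math.log2(p_dc / p_d)
    return mi
```

Perfectly separating conditional distributions give 1 bit with an even prior; identical distributions give 0, matching the "best"/"bad" ordering on the slide.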


Estimating P(Dj|same) and P(Dj|diff)

The problem: we don't have same/different examples for this car, so we need to estimate P(Dj|same) and P(Dj|diff) from the single image.

Intuition: we want P(Dj|same) and P(Dj|diff) to depend on the position and appearance of patch j.


Estimating P(Dj|same) and P(Dj|diff) Assume functions Q from patch characteristics to probability densities P: Qsame : Patchj --> P(Dj|same) and Qdiff : Patchj --> P(Dj|diff)

(we will return to the exact form of Q and P(Dj|C) later)


Classifier Generator Hid: object image --> Cid

Simplified algorithm for Hid, given a single object image:
1) Scan through all candidate patches (size, position, resolution). For each patch j:
   a) Compute P(Dj|same) and P(Dj|diff) from Patchj (function Q)
   b) Compute the mutual information I(Dj; C)
2) Sort the patches by I(Dj; C); pick the top m

(Figure: a patch mapped through Q to densities P(Dj); mutual information I(Dj; C) = 0.21)
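The two steps above can be sketched as a scoring loop. The `q_same`, `q_diff`, and `saliency` callables are placeholders for the learned function Q and the mutual-information computation, not the paper's actual implementations.

```python
def generate_classifier(candidates, q_same, q_diff, saliency, m):
    """H_id sketch: predict densities for each candidate patch via Q,
    score it by I(D_j; C), and keep the m most salient patches."""
    scored = sorted(candidates,
                    key=lambda p: saliency(q_same(p), q_diff(p)),
                    reverse=True)
    return scored[:m]
```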



Learning About the Category

Goal of off-line training Tid: to define Qsame and Qdiff
Given: labeled same and different image pairs
Learns: to estimate P(Dj|same) and P(Dj|diff) for any patch j

(Figure: columns of Same and Different training pairs)


Parameterizing Patch Characteristics and P(Dj|C)

Hyper-Features: dimensionality reduction for patches. Parameterize the characteristics (position + appearance) of patch j with hyper-features Zj. Examples: x position, y position, contrast, dominant orientation, cornerness, mean intensity, ... plus higher-order terms derived from these (squares, cubes, cross terms).

Model P(Dj|same) and P(Dj|diff) using Gamma (Γ) distributions:

P(Dj|same) = Γ(Dj; θsame)    Same (C=1)
P(Dj|diff) = Γ(Dj; θdiff)    Different (C=0)

where θ = (μ, σ²) has 2 degrees of freedom: mean and variance.


Generalized Linear Model: Z --> θ

(Figure: example image; distribution of d vs. Y position for same and different pairs, with Z = [Y Y² Y³])

Ordinary Linear Model: μ = β · [Y Y² Y³ 1]ᵀ, fit separately to the same and different distributions.

Generalized Linear Model: μ = f(βμ · [Y Y² Y³ 1]ᵀ), σ = f(βσ · [Y Y² Y³ 1]ᵀ)

(Plots show the fitted μ and μ±σ curves; mark color indicates mutual information.)
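A sketch of the GLM mapping, assuming a log link so that the predicted Gamma parameters stay positive. The coefficient vectors are learned in the actual system; the values used in any call here are hypothetical.

```python
import math

def glm_gamma_params(y, beta_mu, beta_sigma):
    """Map hyper-features [Y, Y^2, Y^3, 1] to (mu, sigma) of a Gamma
    through a log link, one coefficient vector per parameter."""
    z = [y, y ** 2, y ** 3, 1.0]
    mu = math.exp(sum(b * v for b, v in zip(beta_mu, z)))
    sigma = math.exp(sum(b * v for b, v in zip(beta_sigma, z)))
    return mu, sigma
```

The link function is what distinguishes the GLM from the ordinary linear fit: the linear predictor can be any real number, but the exponential keeps μ and σ valid Gamma parameters.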


Choosing an Encoding for Z (Feature Selection)

Candidates for hyper-features (Z): X and Y coordinates, directional filter energies, intensity, contrast, plus quadratic, cubic and cross terms.

Feature selection: run Least Angle Regression (LARS) on an ordinary linear model over the candidate hyper-features; pick the top N variables, e.g. Z = [Y, Y^2, X*Y, Contrast, E^2 ... ]

Train the same and different GLMs by maximizing the likelihood:

argmin − ∑_i log Γ(di; θ(Zi))

Score patch i by estimating its LLR:

log [ P(di|Zi, same) / P(di|Zi, different) ] ≈ log [ Γ(di; θsame(Zi)) / Γ(di; θdiff(Zi)) ]
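The training objective can be sketched as the negative log-likelihood of the observed distances under the predicted Gamma densities. `theta_of_z` is a stand-in for the fitted GLM mapping hyper-features to (mean, variance).

```python
import math

def neg_log_likelihood(distances, zs, theta_of_z):
    """-sum_i log Gamma(d_i; theta(Z_i)): the quantity minimized over
    the GLM coefficients. theta_of_z returns (mean, variance)."""
    total = 0.0
    for d, z in zip(distances, zs):
        mean, var = theta_of_z(z)
        k, scale = mean ** 2 / var, var / mean   # shape/scale from mean/variance
        total -= ((k - 1) * math.log(d) - d / scale
                  - math.lgamma(k) - k * math.log(scale))
    return total
```

A mapping whose predicted mean matches the observed distances scores a lower (better) objective than a mismatched one.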


Summary

(Figure: patches from the object image matched to the test image, yielding observations Dj = dj)



The Problem of Dependency

How do we pick the 2nd most salient patch?

(Figure: color = saliency (mutual information); the most salient, 2nd most salient, and 3rd most salient patches are marked)


The Problem of Dependency

(Figure: Same and Different pairs illustrating dependent patch differences)


Modeling Dependency: Bivariate Gamma

Empirical joint distributions (by distance): Same and Different, for far patch pairs and for nearby patch pairs.

Modeled joint distributions: 3-parameter bivariate Gamma [Kibble]


Hyper-Feature Differences to Bivariate Gamma

Distance is not the only indicator of dependence. Model dependence based on hyper-feature differences, using Kibble's distribution K(θi, θj, ρij), where θi and θj define the marginals and ρij defines the correlation.

Let dZij = Zj − Zi; set ρij = sigmoid(dZij · ω)

Joint distributions of Patchi and Patchj:

P(Dj, Di|same) = K(θi_same, θj_same, ρij)
P(Dj, Di|diff) = K(θi_diff, θj_diff, ρij)


Finding a Greedy Sequence of Patches

Recall that C = {Same, Different} is the decision variable and I(Dj; C) is the mutual information between the j-th patch and C.

The 1st most salient patch is argmax_i I(Di; C)

The 2nd most salient patch is argmax_j [ I(Dj, Di; C) − I(Di; C) ], using the joint distribution with the patch i already chosen

The 3rd most salient patch is argmax_j min_i [ I(Dj, Di; C) − I(Di; C) ], minimizing over the patches i already chosen

* formulation similar to [Vidal-Naquet & Ullman]
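The greedy selection above can be sketched as follows, where `mi` and `joint_mi` stand in for the computed I(Di; C) and I(Dj, Di; C):

```python
def greedy_patch_order(patches, mi, joint_mi, m):
    """First pick argmax_i I(D_i; C); then repeatedly pick the patch j
    maximizing min_i [ I(D_j, D_i; C) - I(D_i; C) ] over chosen i."""
    chosen = [max(patches, key=mi)]
    remaining = [p for p in patches if p != chosen[0]]
    while remaining and len(chosen) < m:
        nxt = max(remaining,
                  key=lambda j: min(joint_mi(j, i) - mi(i) for i in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen
```

The min over already-chosen patches penalizes candidates that are redundant with any earlier pick, so an independent but individually weaker patch can beat a redundant strong one.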



From a Sequence to a Cascade

Defining stopping thresholds: track the progress of the LLR score R as patches are evaluated,

R = ∑_{j=1}^{m} log [ P(Dj = dj | same) / P(Dj = dj | diff) ]

(Figure: R vs. number of patches m, with stopping thresholds)



Top 10 Patches


Numerical Results
