From Contours to 3D Object Detection and Pose Estimation
Nadia Payet and Sinisa Todorovic
Wednesday, November 30, 11
Problem Statement
Given a single image:
1. Detect an object of interest
2. Delineate its boundaries
3. Estimate its continuous 3D pose
Prior Work
Generative models, e.g., aspect graphs: Koenderink & van Doorn 79, Kushal et al. 04, Savarese & Fei-Fei 07-09, Arie & Basri 09, Hu & Zhu 10
Discriminative models, e.g., structured prediction: Hoiem et al. 07, Su et al. ICCV 09, Ozuysal et al. 09, Liebelt & Schmid 08-10, Gu & Ren 10
Main characteristics of recent work:
• Local image features
• Sophisticated models
• 3D pose = interpolation of viewpoint classes
To Bridge the Semantic Gap...
[Figure, read bottom to top]
Recent work, typically: pixels → local features → gap → model → semantic level
Our approach: pixels → mid-level features (contours) → BoBs → model → semantic level
Prior work on contours: Lowe & Binford 85, Cyr & Kimia 04
Prior work related to BoBs: Zhu et al. 08, Zhang et al. 11
Bags of Boundaries = BoBs
If an object is present, it must lie in the spotlight of many BoBs that jointly support the occurrence hypothesis
Bags of Boundaries = BoBs
$s = V x$
$s$: histogram of boundaries, i.e., the BoB (# bins)
$V$: shape context of the boundaries, a (# bins) × (# contours) matrix
$x$: latent indicator of boundaries (# contours)
Zhu et al. 08, Zhang et al. 11
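A minimal Python sketch of the BoB equation, assuming a log-polar shape-context binning (3 radial × 8 angular bins, radius normalized to 1) and NumPy. The function names `log_polar_histogram` and `bob_descriptor` and the binning parameters are hypothetical, not taken from the paper. Each contour contributes one histogram column of V, and the latent indicator x weights how much each contour counts toward the BoB.

```python
import numpy as np

def log_polar_histogram(points, center, n_r=3, n_theta=8, r_max=1.0):
    """Shape-context-style histogram: count points in log-polar bins around center.

    Radial bin edges are r_max * [2^-(n_r-1), ..., 1/2, 1]; points beyond r_max are dropped.
    Bins are flattened as index = radial_bin * n_theta + angular_bin.
    """
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    edges = r_max * np.logspace(-(n_r - 1), 0, n_r, base=2)
    r_bin = np.searchsorted(edges, r)  # value n_r means "outside r_max"
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    keep = r_bin < n_r
    hist = np.zeros(n_r * n_theta)
    np.add.at(hist, r_bin[keep] * n_theta + t_bin[keep], 1.0)
    return hist

def bob_descriptor(contours, center, x, **bins):
    """BoB s = V x.

    V stacks one shape-context histogram per contour (#bins x #contours);
    x is the latent indicator (one weight per contour) that must be inferred.
    """
    V = np.column_stack([log_polar_histogram(c, center, **bins) for c in contours])
    return V @ np.asarray(x, dtype=float)
```

Because x is latent, s cannot simply be read off the image the way a bag-of-words histogram can: changing the inferred selection of boundaries changes the BoB.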
Bags of Boundaries vs. Bags of Words
BoBs: histogram of hidden features that must be inferred
BoWs: histogram of observable features
Approach
[Pipeline figure, built up over several slides]
input → contour extraction (Zhu et al. ICCV07) → grid of BoBs → object model → estimate of 3D pose
Later panels: input → selected boundaries and grid warping against the object model → estimate of 3D pose → output
Object Model = Shape Templates
2D probabilistic maps of shape for a set of viewpoints
Learning
[Figure: training images, image 1 ... image m, for each of view 1, view 2, view 3, ..., view n]
Table top dataset, Sun et al. 10
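One simple way to obtain a 2D probabilistic shape map per viewpoint, assumed here purely for illustration, is to average aligned binary boundary maps of that viewpoint's training images, so each pixel holds the fraction of images with a boundary there. The sketch assumes the maps are already cropped, aligned, and equally sized; `learn_shape_template` and `learn_templates` are hypothetical names, and the paper's actual learning procedure may differ.

```python
import numpy as np

def learn_shape_template(boundary_maps):
    """Per-pixel boundary probability from aligned binary maps of one viewpoint (assumption)."""
    stack = np.stack([np.asarray(m, dtype=float) for m in boundary_maps])
    return stack.mean(axis=0)  # values in [0, 1] when the inputs are binary

def learn_templates(maps_by_view):
    """Map each viewpoint label to its probabilistic shape template."""
    return {view: learn_shape_template(maps) for view, maps in maps_by_view.items()}
```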
Example Shape Templates
[Figure: example shape templates]
AUTOCAD dataset, Liebelt & Schmid 08-10
Representation of the Shape Template
Regular grid of shape-context descriptors + Affine projection matrix T
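The template representation can be sketched as a small data structure, assuming 2D grid positions in template coordinates, one shape-context histogram per grid cell, and a 2 × 3 affine matrix T applied to homogeneous coordinates. `regular_grid`, `ShapeTemplate`, and `project` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
import numpy as np

def regular_grid(width, height, step):
    """Grid positions (K, 2) covering [0, width) x [0, height) with the given step."""
    xs = np.arange(0, width, step)
    ys = np.arange(0, height, step)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)

@dataclass
class ShapeTemplate:
    positions: np.ndarray    # (K, 2) grid positions in template coordinates
    descriptors: np.ndarray  # (K, B) one shape-context histogram per grid cell
    T: np.ndarray            # (2, 3) affine projection matrix

    def project(self):
        """Apply the affine projection T to every grid position."""
        homog = np.hstack([self.positions, np.ones((len(self.positions), 1))])
        return homog @ self.T.T
```

Keeping T separate from the descriptors means the same learned grid can be warped by any affine projection during matching without recomputing the histograms.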
Inference = Matching of BoBs
The grid of image BoBs is matched against each shape template (template 1, template 2, ..., template n) under an arbitrary affine projection
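As a deliberately simplified stand-in for BoB matching, the sketch below compares image BoBs with each template's grid descriptors using the chi-square histogram distance and picks the template with the lowest nearest-neighbour cost. It ignores the affine projection and the latent boundary selection that the full formulation optimizes; the chi-square distance and the names `chi2_distances` and `best_template` are assumptions, not the paper's method.

```python
import numpy as np

def chi2_distances(H, G, eps=1e-12):
    """Pairwise chi-square distances between rows of H (N, B) and rows of G (M, B)."""
    H = np.asarray(H, dtype=float)[:, None, :]
    G = np.asarray(G, dtype=float)[None, :, :]
    return 0.5 * np.sum((H - G) ** 2 / (H + G + eps), axis=2)  # shape (N, M)

def best_template(image_bobs, templates):
    """Return (index, cost) of the template best explained by the image BoBs.

    Cost = sum over template cells of the chi-square distance to the nearest image BoB.
    """
    costs = [chi2_distances(image_bobs, t).min(axis=0).sum() for t in templates]
    i = int(np.argmin(costs))
    return i, float(costs[i])
```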
Example Problem: Object Recognition
Given a set of edges in the image, detect and localize all object instances and estimate their 3D pose (Payet & Todorovic ICCV11)
Matching Formulation
$$\min_{X,F,T}\; \operatorname{tr}\{C(X)F^{T}\} + \alpha\,\|TQF - P\| + \beta\,\|(TQF - P)W\|$$
$$\text{s.t.}\quad X \in [0,1]^{N};\quad T \in \mathcal{T};\quad F \in [0,1]^{N\times M};\quad \mathbf{1}_N^{T}F = \mathbf{1}_M^{T};\quad F\mathbf{1}_M \le \mathbf{1}_N$$
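A NumPy sketch that evaluates an objective of the form tr(C F^T) + α‖TQF − P‖ + β‖(TQF − P)W‖ for a candidate solution and checks the matching constraints on F (entries in [0, 1], each column summing to 1, each row summing to at most 1). The matrix shapes, the use of Frobenius norms, and treating C as precomputed for a fixed X are assumptions made for this illustration; `matching_objective` and `is_feasible` are hypothetical names, and the code does not solve the minimization.

```python
import numpy as np

def matching_objective(C, F, T, Q, P, W, alpha=1.0, beta=1.0):
    """Evaluate tr(C F^T) + alpha*||T Q F - P|| + beta*||(T Q F - P) W||.

    Assumed shapes: C, F: (N, M); Q: (d, N); T: (d2, d); P: (d2, M); W: (M, M).
    C stands for C(X) evaluated at a fixed X. Norms are Frobenius (assumption).
    """
    data_term = np.trace(C @ F.T)        # equals np.sum(C * F)
    residual = T @ Q @ F - P             # (d2, M) geometric misfit
    fit_term = np.linalg.norm(residual)  # Frobenius norm for 2-D arrays
    smooth_term = np.linalg.norm(residual @ W)
    return data_term + alpha * fit_term + beta * smooth_term

def is_feasible(F, tol=1e-9):
    """Check F in [0,1]^{N x M}, 1_N^T F = 1_M^T, and F 1_M <= 1_N."""
    F = np.asarray(F, dtype=float)
    in_box = np.all((F >= -tol) & (F <= 1 + tol))
    cols_sum_to_one = np.allclose(F.sum(axis=0), 1.0, atol=tol)
    rows_at_most_one = np.all(F.sum(axis=1) <= 1.0 + tol)
    return bool(in_box and cols_sum_to_one and rows_at_most_one)
```

Under these assumed shapes, the column constraint matches every template element exactly once, while the row constraint lets each image element be used at most once.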
Results: Object Detection
PASCAL VOC 2006 car dataset
Car show dataset
Results: Viewpoint Classification
3D Object dataset: Cars
Results: 3D Pose Estimation
Correct detection, localization, and pose estimation
Conclusion
• Recent work:
  • Pre-selected local features
  • Sophisticated object models and algorithms
• Our approach:
  • Mid-level features allow for:
    • Abstracting low-level features
    • Synergistic bottom-up/top-down interaction
  • Simple models and algorithms