Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery
Min Sun
Advisor: Prof. Silvio Savarese
Vision Lab

Tea Party


Tea Party: Object Detection

[Figure: tea-party scene with cups and a teapot labeled]

Tea Party: Shape is the Key
• Object 2D location
• Object 3D shape

Robotics Approach
1. Identify the object in 3D using active sensors
2. Grasp the object using motion planning
Images from Willow Garage, Inc.

Our Goals
1. Jointly detect an object
2. Infer its 3D shape
Sun et al., ECCV'10 (Crete, Greece)
[Figure: Venetian fort]

Outline
• Related Work
• Our Method
• Experiments
• Applications
• Conclusion & Future Work

Related Work: Part-based Models
• Constellation Model (Fergus et al., CVPR'03)
• Pictorial Structure (Felzenszwalb & Huttenlocher, IJCV'05)
• Deformable Part-based Model (Felzenszwalb et al., PAMI'08)
• Implicit Shape Model (Leibe et al., ECCV'04 workshop)
• Multi-View Model (Sun et al., ICCV'09; key-views 1-3)

Multi-View Model (Sun et al., ICCV'09)
• Representation: dense, multi-view generative part-based model (key-views 1-3)
• Learning: weakly supervised and incremental

Multi-View Model
[Figure: Sun et al., ICCV'09]


Related Work: Hough Voting Scheme
• Voting is a general technique in which each part votes for all hypotheses compatible with it.
• Popular for detecting parameterized shapes: Hough'59, Duda & Hart'72, Ballard'81, …
(Slide modified from S. Maji)

Hough Transform
P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959
Given a set of points, find the line or curve that best explains them. For a line y = mx + n, each image point (x, y) votes for all parameters (m, n) of lines passing through it, so a point in image space maps to a line in (m, n) Hough space.
[Figure: a point in (x, y) image space and its corresponding line in (m, n) Hough space]

Hough Transform
[Figure: several points in (x, y) image space each vote for a line in (m, n) Hough space; votes are accumulated in a grid of bins, and the bin with the most votes (11 in this example) gives the parameters of the best-fitting line]
(A code sketch of this voting procedure follows below.)
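To make the voting concrete, here is a minimal Python sketch of the line-detecting Hough transform in (m, n) space. It is not from the slides; the bin ranges, resolutions, and example points are arbitrary illustrative choices.

    import numpy as np

    def hough_lines(points, m_bins, n_bins):
        """Vote in (m, n) space for lines y = m*x + n passing through each point."""
        acc = np.zeros((len(m_bins), len(n_bins)), dtype=int)
        for x, y in points:
            for i, m in enumerate(m_bins):
                n = y - m * x                            # intercept consistent with (x, y) and slope m
                j = int(np.argmin(np.abs(n_bins - n)))   # nearest intercept bin
                acc[i, j] += 1
        i, j = np.unravel_index(np.argmax(acc), acc.shape)
        return acc, (m_bins[i], n_bins[j])               # accumulator and best (m, n)

    # Five points on y = 2x + 1; the peak bin recovers (m, n) = (2, 1).
    points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)]
    m_bins = np.linspace(-5, 5, 101)     # candidate slopes, step 0.1
    n_bins = np.linspace(-10, 10, 201)   # candidate intercepts, step 0.1
    acc, (m_best, n_best) = hough_lines(points, m_bins, n_bins)
    print(m_best, n_best)                # approximately 2.0 1.0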

Generalized Hough Transform
• What if we want to detect arbitrary shapes, defined by boundary points and a reference point?
[Dana H. Ballard, Generalizing the Hough Transform to Detect Arbitrary Shapes, 1980]
(Credit slide: K. Grauman)

Example: Circle Model
R-table indexed by gradient orientation θ, storing the displacement R = (rx, ry) from a boundary point to the reference point (the circle center):

  θ (deg)    rx     ry
  0           1      0
  45          0.7    0.7
  90          0      1
  135        -0.7    0.7
  …           …      …
  270         0     -1
  315         0.7   -0.7

Query: for each edge point P with gradient orientation θ, look up R and vote for the center C = P + R:
  P1, θ = 0    → R = [rx, ry] = [1, 0]     → C1 = P1 + R
  P2, θ = 45   → R = [rx, ry] = [0.7, 0.7] → C2 = P2 + R
  …
  Pk, θ = -180 → R = [rx, ry] = [-1, 0]    → Ck = Pk + R
(A code sketch of this voting procedure follows below.)
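A minimal Python sketch of the R-table idea behind the generalized Hough transform; the function names are illustrative, gradient orientations are assumed quantized to whole degrees, and the circle example mirrors the model above.

    import numpy as np
    from collections import defaultdict

    def build_r_table(model_boundary, reference):
        """For each model boundary point (p, theta), store the displacement
        R = reference - p, indexed by the quantized gradient orientation theta."""
        r_table = defaultdict(list)
        for (px, py), theta in model_boundary:
            r_table[int(round(theta)) % 360].append((reference[0] - px, reference[1] - py))
        return r_table

    def vote_for_reference(edge_points, r_table, accumulator_shape):
        """Each edge point P with orientation theta votes for C = P + R
        for every displacement R stored under theta."""
        acc = np.zeros(accumulator_shape, dtype=int)
        for (px, py), theta in edge_points:
            for rx, ry in r_table.get(int(round(theta)) % 360, []):
                cx, cy = int(round(px + rx)), int(round(py + ry))
                if 0 <= cx < accumulator_shape[0] and 0 <= cy < accumulator_shape[1]:
                    acc[cx, cy] += 1
        return acc  # peaks in acc are candidate reference-point locations

    # Example: a circle of radius 10 centered at (20, 20); votes peak at (20, 20).
    angles = np.arange(0, 360, 45)
    boundary = [((20 + 10 * np.cos(np.radians(a)), 20 + 10 * np.sin(np.radians(a))), a) for a in angles]
    r_table = build_r_table(boundary, reference=(20, 20))
    acc = vote_for_reference(boundary, r_table, (40, 40))
    print(np.unravel_index(np.argmax(acc), acc.shape))  # peak at (20, 20)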

Related Work: Implicit Shape Model
[Figure: training image with parts and their displacement vectors to the object center]
• Instead of indexing displacements by manually defined parts, index them by "visual codeword".
B. Leibe, A. Leonardis, and B. Schiele, Combined Object Categorization and Segmentation with an Implicit Shape Model, ECCV Workshop on Statistical Learning in Computer Vision, 2004
(Credit slide: S. Lazebnik)

Implicit Shape Model: Training
1. Build a codebook of patches around extracted interest points using clustering.
2. For each codebook entry, store all positions relative to the object center (the center is given during training).
(Credit slide: S. Lazebnik. A sketch of these two steps follows below.)
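A rough Python sketch of these two training steps. The interface (train_ism and its arguments) is hypothetical; it assumes patch descriptors and locations have already been extracted, with the object center known for each training patch, and uses k-means (via scikit-learn) as a stand-in for the clustering step.

    import numpy as np
    from collections import defaultdict
    from sklearn.cluster import KMeans

    def train_ism(descriptors, patch_locations, object_centers, num_codewords=200):
        """descriptors[i]     : feature vector of training patch i
           patch_locations[i] : (x, y) center of patch i in its training image
           object_centers[i]  : (x, y) object center of that image (given in training)
        Step 1: cluster patch descriptors into a visual codebook.
        Step 2: for each codeword, store the displacements patch -> object center."""
        codebook = KMeans(n_clusters=num_codewords, n_init=10).fit(np.asarray(descriptors))
        displacements = defaultdict(list)
        for word, (px, py), (ox, oy) in zip(codebook.labels_, patch_locations, object_centers):
            displacements[word].append((ox - px, oy - py))
        return codebook, displacements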

Implicit Shape Model: Testing
1. Given a test image, extract patches and match each to a codebook entry.
2. Cast votes for possible positions of the object center.
3. Search for maxima in the Hough voting space.
(A test-time voting sketch follows below.)
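A matching sketch of the three test-time steps, continuing the hypothetical interface from the training sketch above; patch scale is ignored for simplicity.

    import numpy as np

    def detect_ism(test_descriptors, test_locations, codebook, displacements, image_shape):
        """1. Match each test patch to its nearest codebook entry.
           2. Cast votes for possible object-center positions.
           3. Return the voting space; its maxima are detections."""
        votes = np.zeros(image_shape, dtype=float)
        words = codebook.predict(np.asarray(test_descriptors))    # step 1: codeword per patch
        for word, (px, py) in zip(words, test_locations):
            for dx, dy in displacements.get(word, []):            # step 2: vote for the center
                cx, cy = int(round(px + dx)), int(round(py + dy))
                if 0 <= cx < image_shape[0] and 0 <= cy < image_shape[1]:
                    votes[cx, cy] += 1.0 / max(len(displacements[word]), 1)
        return votes   # step 3: search for local maxima in this Hough voting space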

Outline
• Related Work
• Our Method
• Experiments
• Applications
• Conclusion & Future Work

Depth-Encoded Hough Voting
[Diagram: an image patch is matched to codebook entries; each codebook match casts votes for object hypotheses]
The detection score accumulates, over image patches and their codebook matches, the product of:
• position posterior
• voting confidence
• codeword-from-object probability
• scale prior given depth
(A simplified voting sketch follows below.)
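A highly simplified sketch of how a vote combining these factors might be accumulated. This is only an illustration of the factorization named above, not the paper's exact formulation; position_posterior, codeword_prob, and scale_prior_given_depth stand in for learned models that are assumed to exist.

    import numpy as np

    def depth_encoded_votes(patches, codebook_matches, voting_space_shape,
                            position_posterior, codeword_prob, scale_prior_given_depth):
        """Accumulate a detection score over object hypotheses x.

        patches[i]          : (l, s, d) = patch center, patch scale, (encoded) depth
        codebook_matches[i] : list of (codeword, match_weight) for patch i
        The three callables stand in for learned models:
          position_posterior(codeword, l, s, shape) -> array over hypotheses x
          codeword_prob(codeword, l)                -> scalar
          scale_prior_given_depth(s, d, l)          -> scalar
        """
        score = np.zeros(voting_space_shape, dtype=float)
        for (l, s, d), matches in zip(patches, codebook_matches):
            for codeword, match_weight in matches:
                weight = (match_weight
                          * codeword_prob(codeword, l)           # codeword-from-object probability
                          * scale_prior_given_depth(s, d, l))    # scale prior given depth
                score += weight * position_posterior(codeword, l, s, voting_space_shape)
        return score  # peaks over x are detections with depth-consistent support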

Depth-Encoded Hough Voting
• Design the scale-to-depth relation s = m(d, l), where m is a one-to-one mapping between scale s and depth d.

Object (x) and Patch (l, s): Example
• x: object location
• (l, s): patch with center location l and scale s

Inferring Depth from Scale
• Because m is one-to-one, depth can be decoded from patch scale and location: d = m⁻¹(s, l).
[Figure: mapping between patch scale s and depth d]
(A sketch of one possible mapping follows below.)
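One concrete choice for m, assumed here for illustration rather than taken from the slides, is the pinhole-camera relation in which patch scale is inversely proportional to depth; the dependence on the patch location l is dropped for simplicity, and focal_length and reference_size are hypothetical parameters.

    def scale_from_depth(d, focal_length, reference_size):
        """s = m(d, l): under a pinhole camera, a patch of physical size
        reference_size at depth d projects to scale s = f * size / d."""
        return focal_length * reference_size / d

    def depth_from_scale(s, focal_length, reference_size):
        """d = m^{-1}(s, l): invert the one-to-one mapping to recover depth."""
        return focal_length * reference_size / s

    # Example: f = 500 px, a 0.1 m patch at 2 m depth appears 25 px wide,
    # and 25 px maps back to 2 m.
    s = scale_from_depth(2.0, focal_length=500.0, reference_size=0.1)
    print(s, depth_from_scale(s, focal_length=500.0, reference_size=0.1))  # 25.0 2.0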

Inferred Depth Issue: Quantization Error
[Figure: object box and patch center]

Inferred Depth Issue: Phantom Objects
[Figure: object box and patch center]
• s = h/w, where h is the object's 2D height and w is the object-height-to-patch-scale ratio.

Issue of Depth Decoding
[Figure: object box and patch center]

Given Depth Helps Detection
• When depth is available, use the depth-to-scale mapping s = m(d, l).
[Figure: detections without depth vs. with depth]

Outline
• Related Work
• Our Method
• Experiments
• Applications
• Conclusion & Future Work

Experiments
• Table-top Dataset (new dataset proposed in Sun et al., ECCV'10)
 – 200 table-top objects with dense depths
 – 3~5 object instances, 3 object categories (mice, mugs, staplers)

Results: Table-top Objects
• Baseline: Implicit Shape Model [Leibe et al., 2004]
[Figure: detection results compared to the baseline]

Results: Table-top Objects
[Figure: qualitative results]

Experiments
• ETHZ Shape Mugs (proposed by Ferrari et al.)
• PASCAL VOC 2007 Cars

Results
• ETHZ Shape Mugs
• PASCAL VOC'07 Cars
[Figure: detection results on both datasets]

Outline
• Related Work
• Our Method
• Experiments
• Applications
• Conclusion & Future Work

Applications: 6DOF Pose Estimation & Object Pop-Up via CAD Model Registration

Results
[Figure: qualitative results]

Application: Scene Understanding
• Object detector + layout estimator
Bao, Sun, and Savarese, CVPR'10

Assumptions about Objects and Scenes
1. Objects and their supporting surfaces
2. Objects and the observer

Results
[Figure: precision-recall curves]
• 13% improvement over our original detector (41%)

Conclusion
• Joint object detection and shape recovery
• Improved detection performance given:
 -depth in training only
 -depth in both training and testing
• Applications:
 -6DOF pose estimation
 -Object pop-up
 -Scene understanding

Future Work
• Use more 3D information, such as curvature and surface normals, to improve detection
• Build a system that lets users easily generate visually pleasing object pop-ups

Vision Lab

Thank You

Acknowledgements Work partially supported by: NSF (Grant CNS 0931474) and Gigascale Systems Research Center.
