Recursive Coarse-to-Fine Localization for Fast Object Detection
Marco Pedersoli [[email protected]], Jordi Gonzàlez, Andrew D. Bagdanov, Juan José Villanueva
Universitat Autònoma de Barcelona - Centre de Visió per Computador

Motivation
Results
[Figure labels: RCFL; object model pyramid $M_d$.]
* Find a better way than sliding windows to scan an image.
RCFL versus Cascade Example: Original Image
* Based on coarse-to-fine feature representation.

Recursive Coarse-to-Fine Localization
* Two objects cannot occupy the same spatial location up to a certain margin.
* Common cascade approaches rely on thresholding and make no assumption about the search space.
* Computation time is independent of image conditions.
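The contrast between thresholding and exploiting the search-space structure can be illustrated with a minimal Python sketch. The function names, the use of non-overlapping square blocks as neighborhoods, and the toy score maps are assumptions made for illustration only; this is not the authors' implementation.

```python
def threshold_prune(scores, thr):
    """Cascade-style pruning: keep every location whose score exceeds thr."""
    return [(x, y) for y, row in enumerate(scores)
            for x, s in enumerate(row) if s > thr]


def neighborhood_max_prune(scores, size):
    """Neighborhood-max pruning: keep the best location in each size x size block.

    Assumption: neighborhoods are non-overlapping square blocks.
    """
    kept = []
    for y0 in range(0, len(scores), size):
        for x0 in range(0, len(scores[0]), size):
            block = [(scores[y][x], x, y)
                     for y in range(y0, min(y0 + size, len(scores)))
                     for x in range(x0, min(x0 + size, len(scores[0])))]
            _, bx, by = max(block)  # highest score wins, ties broken by position
            kept.append((bx, by))
    return kept


# Toy coarse-level score maps (hypothetical values).
cluttered = [[0.9, 0.8, 0.7, 0.9],
             [0.8, 0.9, 0.6, 0.8],
             [0.7, 0.9, 0.8, 0.7],
             [0.9, 0.6, 0.7, 0.9]]
plain = [[0.1, 0.0, 0.2, 0.1],
         [0.0, 0.6, 0.1, 0.0],
         [0.1, 0.0, 0.0, 0.2],
         [0.0, 0.1, 0.1, 0.0]]

# With a threshold, the number of surviving hypotheses depends on the image.
print(len(threshold_prune(cluttered, 0.5)), len(threshold_prune(plain, 0.5)))  # prints: 16 1
# With a neighborhood maximum, it depends only on the image size.
print(len(neighborhood_max_prune(cluttered, 2)), len(neighborhood_max_prune(plain, 2)))  # prints: 4 4
```

In this toy setting the thresholded hypothesis count varies from 1 to 16 with image content, while the neighborhood maximum always keeps one hypothesis per block, so the work passed on to finer levels stays constant.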
We define $R_s^d$ as the partial score for resolution level $d$ of the object model at position $(x, y)$ and scale $s$ of the pyramid of features $H$:

$$R_s^d(x, y) = \sum_{\hat{x}_d, \hat{y}_d} M_d(\hat{x}_d, \hat{y}_d) \cdot H_{s+\lambda d}\left(\hat{x}_d + \left(x - \frac{w}{2}\right) 2^d,\ \hat{y}_d + \left(y - \frac{h}{2}\right) 2^d\right).$$

The total score is the sum of the partial scores computed at the best location of each level, $\Pi_s^d(x, y)$:

$$D_s(x, y) = \sum_{\hat{d}} R_s^{\hat{d}}\left(\Pi_s^{\hat{d}}(x, y)\right).$$

[Figure: RCFL versus threshold-based cascade. Labels: Level 0 to Level 2; Neigh. max; Thr. 0 to Thr. 2; hypothesis propagation; Detections.]

INRIA pedestrian dataset:

Method    Features   Classifier  M-R   Time
HikSvm    HOG-like   HIK SVM     0.24  140.0
Shapelet  Gradients  AdaBoost    0.50   60.0
FtrMine   Haar       AdaBoost    0.34   45.0
MultiFtr  HOG+Haar   AdaBoost    0.16   18.9
HOG       HOG        lin. SVM    0.23   13.3
LatSvm    HOG        lat. SVM    0.17    6.3
Haar      Haar       AdaBoost    0.48    7.0
RCFL      HOG        lin. SVM    0.20    1.2

False Positive Per-Image in the INRIA dataset. All curves but RCFL are drawn using data provided by [3].
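The recursive localization expressed by the partial scores $R_s^d$ and the total score $D_s$ can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: features are scalars instead of $f$-dimensional HOG vectors, windows are anchored at their top-left corner rather than their center, each level is twice as fine as the previous one, the search neighborhood is a fixed square of radius 1, and all function names are hypothetical.

```python
def partial_score(model, feats, x, y):
    """Correlate one model level with a feature map, window top-left at (x, y)."""
    h, w = len(model), len(model[0])
    if x < 0 or y < 0 or y + h > len(feats) or x + w > len(feats[0]):
        return float("-inf")  # window falls outside the feature map
    return sum(model[j][i] * feats[y + j][x + i]
               for j in range(h) for i in range(w))


def best_in_neighborhood(model, feats, cx, cy, radius):
    """Return (score, x, y) of the best window within +/- radius of (cx, cy)."""
    candidates = [(partial_score(model, feats, x, y), x, y)
                  for y in range(cy - radius, cy + radius + 1)
                  for x in range(cx - radius, cx + radius + 1)]
    return max(candidates)


def rcfl_score(models, pyramid, x, y, radius=1):
    """Total score of the hypothesis started at coarse location (x, y).

    models[d] and pyramid[d] are the model and feature map at resolution
    level d (coarsest first). Returns the summed partial scores and the
    best location found at each level.
    """
    total, path = 0.0, []
    cx, cy = x, y
    for model, feats in zip(models, pyramid):
        score, bx, by = best_in_neighborhood(model, feats, cx, cy, radius)
        total += score
        path.append((bx, by))
        cx, cy = 2 * bx, 2 * by  # propagate the hypothesis to the finer level
    return total, path


# Toy two-level example (hypothetical data).
models = [[[1]], [[1, 1], [1, 1]]]
level0 = [[0, 0, 0],
          [0, 5, 0],
          [0, 0, 0]]
level1 = [[0] * 6 for _ in range(6)]
for yy in (3, 4):
    for xx in (3, 4):
        level1[yy][xx] = 1

total, path = rcfl_score(models, [level0, level1], 1, 1)
print(total, path)  # prints: 9.0 [(1, 1), (3, 3)]
```

Only the coarsest level is scanned over a whole neighborhood; each finer level is evaluated at a handful of positions around the propagated hypothesis, and no threshold is involved.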
Sliding Windows
(a) Pyramid of images $I_s$. (b) Pyramid of HOG [1] features $H_s$ computed over the pyramid of images. (c) Object model $M$ is an $h \times w$ matrix of $f$-dimensional weight vectors.

The detection score $D_s$ at location $(x, y)$ and scale $s$ is computed as:

$$D_s(x, y) = \sum_{\hat{x}, \hat{y}} M(\hat{x}, \hat{y}) \cdot H_s(\hat{x} + x - w/2,\ \hat{y} + y - h/2),$$

which is an $f$-dimensional correlation.

Learning
We minimize the regularized empirical risk:

$$\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \max\left(0,\ 1 - y_i \max_h \langle w, f(x_i, h) \rangle\right),$$

where $f(x, h)$ is a function that maps the input features $x$ to the corresponding latent variable $h$, which represents the object location at each resolution level. In our case:

$$\langle w, f(x, h) \rangle = \sum_d R_s^d(x + x_d, y + y_d).$$

The empirical risk is semiconvex in $w$ and its minimizer is found using the iterative stochastic gradient descent method proposed in [2].

RCFL versus Cascade on VOC2007:

Class    Exact  Cascade  Speed  RCFL
plane    24.1   24.1      9.3   23.6
bike     41.3   38.7      9.8   39.4
bird     11.3   12.9      9.3   12.9
boat      3.9    3.9      9.9    2.7
bottle   20.8   19.9      3.9   19.7
bus      36.8   37.3     18.1   39.2
car      35.4   35.7     13.8   34.5
cat      25.5   25.9     17.3   25.9
chair    16.0   16.0      9.5   17.0
cow      19.4   19.3     12.1   21.6
table    21.2   21.2      6.4   23.1
dog      23.0   23.0      3.3   24.1
horse    42.9   40.2     17.6   42.0
mbike    39.8   41.5     20.1   41.1
person   24.9   24.9      3.6   25.3
plant    14.6   14.6      6.4   14.2
sheep    14.3   15.1     19.0   15.8
sofa     33.0   33.2     15.0   29.6
train    22.8   23.0      9.8   22.5
tv       37.4   42.2      2.8   41.0
mean     25.4   25.6       -    25.8
speed     1.0   10.9       -    12.2

Average precision computed on positive examples of the training set of the VOC2007 database.

Contributions
* New localization algorithm for object detection that is orthogonal to cascade approaches.
* Use of the structure of the search space together with coarse-to-fine representation to greatly speed up the image scan.
* Same accuracy as brute-force sliding windows.
* Constant speed-up of 12x (in the tested configuration), independent of image conditions.
* No need for thresholds.

References
[1] N. Dalal, B. Triggs. Histograms of oriented gradients for human detection. In CVPR 2005.
[2] P. Felzenszwalb, D. McAllester, D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR 2008.
[3] P. Dollar, C. Wojek, B. Schiele, P. Perona. Pedestrian detection: A benchmark. In CVPR 2009.

Acknowledgements This work has been supported by the Spanish Research Programs Consolider-Ingenio 2010:MIPRCV (CSD200700018) and Avanza I+D ViCoMo (TSI-020400-2009-133); and by the Spanish projects TIN2009-14501-C02-01 and TIN2009-14501-C02-02.