Obstacle detection using sparse stereovision and clustering techniques
S. Kramm - A. Benrhair, LITIS - Rouen, France
IEEE INTELLIGENT VEHICLES SYMPOSIUM 2012
The problem
Target application: real-time identification of potential obstacles using embedded stereovision.
Starting point: a sparse 3D map built by matching low-level features (declivity operator, which extracts vertical contours of the image).
Input data: a 3D point cloud of points m_i = (u, v, d). (Recall that depth = k / d.)
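As a quick illustration (not from the paper), here is a minimal sketch of the disparity-to-depth relation, assuming a rectified stereo pair where the constant k is the product of baseline and focal length; the parameter values below are hypothetical.

    # Minimal sketch (not the authors' code): converting a sparse disparity point
    # m_i = (u, v, d) into metric depth, assuming a rectified stereo pair where
    # k = baseline * focal_length (both values below are hypothetical).
    def depth_from_disparity(d, baseline_m=0.30, focal_px=700.0):
        """Return depth in metres for a disparity d given in pixels."""
        if d <= 0:
            return float("inf")  # zero disparity = point at infinity
        return (baseline_m * focal_px) / d

    # Example: a point matched with a disparity of 35 px lies at
    # roughly 0.30 * 700 / 35 = 6 m from the cameras.
    print(depth_from_disparity(35.0))  # -> 6.0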
The (now classical) V-disparity approach
Mapping of the 3D points m = (u, v, d) onto a 2D plane (v, d):
Vertical obstacles appear as vertical lines.
Identification of the road plane and obstacles: Hough transform.
Points beneath the road plane can be removed.
Vertical position of obstacles given by line segments.
Horizontal position of obstacles given by other simple heuristics.
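For illustration only, a minimal sketch of how a v-disparity map could be accumulated from the sparse point cloud; the function name, the (u, v, d) tuple format and the OpenCV call mentioned in the comments are assumptions, not the authors' implementation.

    # Minimal sketch (assumed implementation): accumulate a v-disparity
    # histogram image from a sparse point cloud of (u, v, d) tuples.
    import numpy as np

    def v_disparity_map(points, img_height, d_max):
        """Each cell (v, d) counts how many 3D points share that image row and disparity."""
        vmap = np.zeros((img_height, d_max + 1), dtype=np.int32)
        for u, v, d in points:
            d = int(round(d))
            if 0 <= d <= d_max and 0 <= v < img_height:
                vmap[int(v), d] += 1
        return vmap

    # The road plane appears as an oblique line in vmap and vertical obstacles
    # as vertical segments; both can then be searched with a Hough transform
    # (e.g. cv2.HoughLinesP on a thresholded version of vmap).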
In the real world...
⇒ Hough transform fails.
Our approach
Achieve 3D localization of scene elements using multiscale disparity histograms and 2D clustering.
But first, remove the 3D points:
that are beneath the road plane (just like previous methods),
that are above an arbitrary "maximum height" plane,
that appear isolated in their neighborhood.
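Below is a minimal sketch of this pre-filtering, under the assumption that the road plane and the maximum-height plane are expressed as lines v = a·d + b in the v-disparity domain and that isolation is tested by counting neighbours; all parameter values are hypothetical.

    # Minimal sketch (assumptions noted, not the authors' code): pre-filter the
    # sparse point cloud.  The road plane and the "maximum height" plane are
    # assumed to be given as lines v = a*d + b in the v-disparity domain
    # (hypothetical parameters); image rows grow downward.
    import numpy as np

    def prefilter(points, road=(1.2, 150.0), ceiling=(1.2, 20.0),
                  min_neighbors=2, radius=5.0):
        pts = np.asarray(points, dtype=float)          # columns: u, v, d
        u, v, d = pts[:, 0], pts[:, 1], pts[:, 2]
        below_road = v > road[0] * d + road[1]         # beneath the road plane
        too_high   = v < ceiling[0] * d + ceiling[1]   # above the max-height plane
        pts = pts[~(below_road | too_high)]
        # crude isolation test: keep a point only if it has at least
        # `min_neighbors` other points within `radius` (O(n^2), sketch only)
        dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        neighbor_counts = (dists < radius).sum(axis=1) - 1
        return pts[neighbor_counts >= min_neighbors]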
Histogram mode extraction
We build histograms of the point disparities, for different scale factors s ∈ {1, 2, ..., 6}.
Each histogram is mean-filtered.
We search for modes using an auto-adaptive threshold (defined as µ + σ).
⇒ Produces a set M1 with, say, 2 + 3 + 6 + 7 + 8 + 8 = 34 values.
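A minimal sketch of this step is given below; how the scale factor s controls the histogram (here, the bin width) and the 3-tap mean filter are assumptions, while the µ + σ threshold follows the slide.

    # Minimal sketch (bin-width-per-scale is an assumption): build disparity
    # histograms at several scales, mean-filter them, and keep as "modes" the
    # bins that rise above the auto-adaptive threshold mu + sigma.
    import numpy as np

    def extract_modes(disparities, scales=range(1, 7), d_max=128):
        modes = []                                    # list of (scale, disparity) pairs -> set M1
        for s in scales:
            nbins = max(1, d_max // s)                # coarser bins at larger scales (assumption)
            hist, edges = np.histogram(disparities, bins=nbins, range=(0, d_max))
            hist = np.convolve(hist, np.ones(3) / 3, mode="same")   # mean filter
            thresh = hist.mean() + hist.std()         # mu + sigma
            for i in np.nonzero(hist > thresh)[0]:
                modes.append((s, 0.5 * (edges[i] + edges[i + 1])))  # bin center
        return modes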
Merging of modes
All the extracted modes (disparity values) are compared with those produced at the other scale factors.
If they are close, we consider that they correspond to a scene element located at the same depth, and merge them into a single disparity value.
The threshold is auto-adaptive, based on the considered disparity value.
Objective: robust detection of the depth locations where there might be something interesting in the scene.
We produce a set of disparity values M2 (with card(M2) < card(M1)).
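The following minimal sketch illustrates the merging; the exact auto-adaptive law is not given on the slide, so a threshold proportional to the considered disparity value is assumed here.

    # Minimal sketch (the threshold law is an assumption): merge modes found
    # at different scales when their disparity values are close, producing M2.
    def merge_modes(modes, rel_tol=0.1):
        """modes: list of (scale, disparity). Returns a sorted list of merged disparities."""
        values = sorted(d for _, d in modes)
        merged = []
        for d in values:
            # threshold proportional to the disparity considered (auto-adaptive)
            if merged and abs(d - merged[-1][-1]) < rel_tol * d:
                merged[-1].append(d)
            else:
                merged.append([d])
        return [sum(group) / len(group) for group in merged]   # one value per group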
Clustering
For each mode of the set, we extract from the source data the subset of 3D points that respect the constraint d_min < d < d_max, with d_min and d_max based on the considered mode's disparity value.
With this subset, we proceed to a 2D clustering step, using the DBSCAN algorithm [Ester96].
We produce a set of clusters for each disparity value of the set M2.
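A minimal sketch of this step, using scikit-learn's DBSCAN; the choice of (u, v) as the 2D clustering space, the disparity band around each mode and the eps/min_samples values are assumptions, not taken from the paper.

    # Minimal sketch: for each merged disparity value, select the points whose
    # disparity falls in a band around it and cluster them in 2D with DBSCAN.
    import numpy as np
    from sklearn.cluster import DBSCAN

    def clusters_per_mode(points, merged_modes, band=0.15, eps=8.0, min_samples=5):
        pts = np.asarray(points, dtype=float)          # columns: u, v, d
        result = {}
        for mode in merged_modes:
            d_min, d_max = (1 - band) * mode, (1 + band) * mode
            subset = pts[(pts[:, 2] > d_min) & (pts[:, 2] < d_max)]
            if len(subset) == 0:
                result[mode] = []
                continue
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(subset[:, :2])
            result[mode] = [subset[labels == k] for k in set(labels) if k != -1]
        return result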
Building the set of Scene Elements
Not all clusters necessarily correspond to something relevant.
Filtering heuristics:
remove the small ones (convex hull area),
remove the ones that have a low number of points,
remove the ones with low density.
The clusters are compared with all the others and merged, according to arbitrary thresholds, to build the final set of Scene Elements.
A Scene Element is defined by a set of 3D points and has the following attributes:
mean and standard deviation of disparity (related to depth),
area (convex hull),
density.
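A minimal sketch of how a retained cluster could be summarized into such a Scene Element, using SciPy's ConvexHull for the area attribute; the filtering threshold values are hypothetical.

    # Minimal sketch (threshold values are hypothetical): turn a cluster of
    # (u, v, d) points into a Scene Element with the attributes listed above,
    # and discard clusters that are too small, too poor in points, or too sparse.
    import numpy as np
    from dataclasses import dataclass
    from scipy.spatial import ConvexHull

    @dataclass
    class SceneElement:
        points: np.ndarray      # (u, v, d) rows
        mean_d: float           # mean disparity (related to depth)
        std_d: float            # disparity spread
        area: float             # convex hull area in the image plane
        density: float          # points per unit of hull area

    def make_scene_element(cluster, min_points=10, min_area=50.0, min_density=0.05):
        cluster = np.asarray(cluster, dtype=float)
        if len(cluster) < min_points:
            return None
        # 2D hull of the (u, v) coordinates: .volume is the area for 2D input
        # (assumes the points are not all collinear)
        area = ConvexHull(cluster[:, :2]).volume
        density = len(cluster) / area
        if area < min_area or density < min_density:
            return None
        return SceneElement(cluster, cluster[:, 2].mean(), cluster[:, 2].std(), area, density)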
Results
What's left:
Implement temporal consistency, and use it for tracking.
Find a way to replace the remaining absolute thresholds with auto-adaptive thresholds.