
A Robust Stereo Matching Method for Low Texture Stereo Images

Le Thanh SACH(1), Kiyoaki ATSUTA(2), Kazuhiko HAMAMOTO(2), and Shozo KONDO(2)
Department of Information Media Technology, Faculty of Information Science and Technology, Tokai University
(1) [email protected], (2) {atsuta,hama,kondo}@keyaki.cc.u-tokai.ac.jp

Abstract—Computing disparity images for stereo pairs of low texture images is a challenging task because matching costs inside low texture areas of the stereo pairs are nearly identical. This problem cannot be solved simply by increasing the size of aggregation windows or by using global optimization methods, e.g. dynamic programming, because those approaches also smooth depth-discontinuity boundaries. Based on the assumption that disparities of pixels in homogeneous regions are similar, this paper proposes a new method that can robustly perform stereo matching for low texture stereo images. The proposed method utilizes the edge maps computed from the stereo pairs to guide the cost aggregation process in stereo matching. By using edge maps, the proposed method achieves the effect of using aggregation windows of different shapes and sizes. Moreover, the computational complexity of the proposed method is independent of the window size, like that of the moving average aggregation method. Experimental results on both an artificial and a real stereo image sequence demonstrate that the proposed method produces more reliable disparities, with better accuracy, for low texture stereo images than the moving average method.

Index Terms—3D Reconstruction, Stereo Matching, Cost Aggregation, Moving Average Filter

I. INTRODUCTION

Stereo matching is a research field of computer vision concerned with producing 3D information from stereo images. It is useful for a broad range of applications, especially autonomous navigation applications such as Intelligent Transportation Systems (ITS) and robot vision. Stereo matching mainly performs the following tasks [1]: (a) compute pixel matching costs, (b) aggregate the matching costs inside support windows and (c) compute disparity images. In order to use stereo matching for the aforementioned applications, the trade-off between the computation time of stereo matching and the quality of disparity images must be taken into account. Several approaches to optimizing the computation time are available, e.g. decreasing the computational complexity [2] and implementing stereo matching on special hardware [3]. Unfortunately, because of the lack of texture in the stereo images captured for the aforementioned applications, the resultant disparity images contain a large number of noisy disparities. This problem is widely recognized in the literature [3], [4], [5]. The simplest solution to the low texture problem for the aforementioned applications is to use only reliable disparities. For example, sparse disparity maps computed only for pixels

having strong edges or high gradients are used to estimate the ground plane in [6], [7]. However, the total number of disparities in the sparse maps may not be sufficient for the estimation. In order to obtain a larger number of reliable disparities, the method proposed in [5] propagates disparities at strong edges into neighboring pixels of low texture regions using the so-called "Quasi-Dense" stereo matching method [8]. In fact, neither the sparse nor the quasi-dense maps can be used robustly unless there exist strong edges on the objects to be detected, e.g. lane marks or traffic signs for road detection. Dense disparity maps were used in [9], [10] for road surface estimation and for obstacle detection. Several heuristics were proposed in those works to select suitable disparities, for example heuristically designed masks and regions. However, the robustness of [9], [10] appears to rely on predefined points ("Stabilization Points" [9] or "Anchor Points" [10]) to which the road surfaces are tied. A Kalman filter was mentioned in [4] as one possible solution to the low texture problem; however, it was not implemented. In [11], the low texture problem is addressed by using a very wide aggregation window. The aggregation is performed with the moving average aggregation method [1], so the computation time is independent of the window size. However, a very wide window also smooths depth-discontinuity boundaries. Therefore, the method proposed in [11] can be used for road surface estimation, but not for on-road object detection. The size of aggregation windows should be large enough to include sufficient intensity variation for reliable matching, yet small enough to avoid disparity variation inside the windows [12]. Hence, it is reasonable to have an adaptive method for selecting the optimal aggregation window for stereo pairs. Such an adaptive method was proposed in [13]. However, its computation time is too expensive for navigation applications. Recent research on stereo matching shows that segmentation-based methods can achieve an effect similar to that of adaptive windows. In particular, the computational complexity of [14] can be made independent of the window size. In [14], the stereo images are segmented before the aggregation; during aggregation, the contribution of pixels outside the segment that contains the center of the window is reduced by a predefined factor. However, the method in [14] still spends



Fig. 1. The proposed stereo matching system.

much time segmenting the stereo images. This paper proposes a new method that uses edge maps to guide the aggregation. The advantage of the proposed method is that it does not spend much time segmenting images, yet it achieves robustness to the low texture problem like [14]. Moreover, the computational complexity of the proposed method is independent of the window size. The proposed method is evaluated on public data sets captured for autonomous navigation applications, in which the low texture problem challenges the state-of-the-art research. The experimental results show that the proposed method obtains more reliable disparities, with better accuracy, for low texture stereo images than the moving average method [1], [15].

II. PROPOSED STEREO MATCHING SYSTEM

The architecture of the proposed stereo matching system is shown in Fig. 1. Because stereo images captured under real conditions for autonomous navigation applications usually suffer from strong noise, poor contrast, and a lack of texture, several methods have been proposed to pre-process the stereo pairs before performing stereo matching. As demonstrated in [16], the Sobel filter is computationally simple yet effective for emphasizing textures in images. Therefore, it is used to pre-process the stereo pairs in our application. In Fig. 1, module Sobel Filter performs the convolution between the horizontal Sobel kernel and each RGB component of the stereo images.
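As a rough illustration only, a horizontal Sobel pass over one image channel could be written as below; the function name, the row-major 8-bit image layout, and the zeroed borders are our assumptions, not details from the paper.

    #include <string.h>

    /* Convolve one image channel with the horizontal Sobel kernel
     *   -1 0 +1
     *   -2 0 +2
     *   -1 0 +1
     * Border pixels are left at zero for simplicity. */
    void sobel_horizontal(const unsigned char *src, int *dst, int width, int height)
    {
        memset(dst, 0, sizeof(int) * (size_t)width * height);
        for (int v = 1; v < height - 1; v++) {
            for (int u = 1; u < width - 1; u++) {
                dst[v * width + u] =
                    -     src[(v - 1) * width + (u - 1)] +     src[(v - 1) * width + (u + 1)]
                    - 2 * src[ v      * width + (u - 1)] + 2 * src[ v      * width + (u + 1)]
                    -     src[(v + 1) * width + (u - 1)] +     src[(v + 1) * width + (u + 1)];
            }
        }
    }

The same pass would be applied to each of the R, G, and B components, yielding the gradient triples consumed by Pixel Matcher below.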

From the filtered images, Pixel Matcher evaluates the cost function used to compare pixels of the left image to pixels of the right image. There are several cost functions, e.g. SAD (the sum of absolute differences), STAD (the sum of truncated absolute differences), and NCC (the normalized cross correlation). Detailed evaluations of cost functions were presented in [12], [17]. A good cost function should be not only effective in reliably measuring the distance between two pixels but also simple to compute. For example, SAD is simple to compute, but it is sensitive to image noise: very high matching costs at noisy pixels disrupt the average of matching costs inside the aggregation window. STAD can reduce the impact of noisy pixels, but its truncation threshold is difficult to select. NCC overcomes the problem caused by noisy pixels, but it is time-consuming because it requires the square root function. For the above reasons, in this paper, Pixel Matcher computes a modified cosine distance defined by Eq. 1, where g_p = [g_p^r, g_p^g, g_p^b]^T and g_q = [g_q^r, g_q^g, g_q^b]^T are the Sobel filter responses for the RGB components of pixels p and q of the left and the right image, respectively, of a stereo pair. The modified cosine distance is similar to NCC, but it does not need the square root function. The output of Pixel Matcher is a 3D volume of matching costs denoted by cost^L_pix(u, v, d), where U×V is the size of the image, D is the disparity search range, u∈[0,U-1], v∈[0,V-1] and d∈[0,D-1]. cost^L_pix(u, v, d) is referred to as the "3D left-to-right pixel matching cost volume" in the following text. The terminology "left-to-right" means that one pixel of the left image is compared to D candidate correspondences in the right image. The 3D right-to-left pixel matching cost volume, i.e. cost^R_pix(u, v, d), can be obtained from the left-to-right volume straightforwardly by rearranging the matching costs: cost^R_pix(u, v, d) = cost^L_pix(u + d, v, d). In the following text, cost(u, v, d) is used to refer to cost^L_pix(u, v, d) for short.

f(p, q) = 1 − (g_p^r g_q^r + g_p^g g_q^g + g_p^b g_q^b) / ((|g_p^r| + |g_p^g| + |g_p^b|)(|g_q^r| + |g_q^g| + |g_q^b|)) ≈ 1 − cos(g_p, g_q)    (1)
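A direct C transcription of Eq. 1 might look like the sketch below; the Grad struct and the choice to return the maximal cost for all-zero gradients are our assumptions.

    #include <stdlib.h>  /* abs */

    typedef struct { int r, g, b; } Grad;  /* Sobel responses of one pixel */

    /* Modified cosine distance of Eq. 1: a dot product normalized by the
     * product of L1 norms, so no square root is needed (unlike NCC). */
    double pixel_cost(Grad gp, Grad gq)
    {
        double dot  = (double)gp.r * gq.r + (double)gp.g * gq.g + (double)gp.b * gq.b;
        double norm = (double)(abs(gp.r) + abs(gp.g) + abs(gp.b))
                    * (double)(abs(gq.r) + abs(gq.g) + abs(gq.b));
        if (norm == 0.0)
            return 1.0;  /* assumption: maximal cost when either gradient vanishes */
        return 1.0 - dot / norm;
    }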

The main difference from other stereo matching methods is that the proposed method utilizes edge maps, computed by module Edge Detector, to guide the aggregation of pixel matching costs. The aggregation is performed by module Cost Aggregation. The outputs of the aggregation are denoted by cost^L_aggr(u, v, d) and cost^R_aggr(u, v, d), the aggregated cost volumes for computing the left and the right disparity images respectively. The left and the right disparity images are computed by module WTA (Winner-Take-All). Edge Detector, Cost Aggregation and WTA are explained in detail in the following sub-sections.

A. Edge Detection

Image segments computed by a segmentation task can be used to guide the aggregation of pixel matching costs, achieving the effect of adaptive windows (the adaptiveness lies in both the size and the shape of the windows) [14].


Fig. 2. (a) Examples of segments defined by an edge row/column. (b) Examples of the window’s mid-point and the window’s mid-segment for a window in the cost row/column corresponding to the edge row/column given in (a). (c) An example of the proposed aggregation method.

The complexity of the aggregation itself is independent of the window size, so it can be performed efficiently. However, image segmentation is an expensive computational task, especially when both the left and the right images of stereo pairs must be segmented. Therefore, as mentioned in [14], an efficient segmentation method is required. This paper demonstrates that the adaptiveness can be obtained by guiding the aggregation with edge maps. The complexity of the edge-based method is still independent of the window size. Moreover, edge detection can be performed more efficiently than segmentation by using any of many available detectors, e.g. Sobel, LoG, and Canny. In the case of the moving average method [1], [15], the left and the right disparity maps of stereo pairs can be derived from the left-to-right aggregated cost volumes alone. However, in the case of the proposed method (and segmentation-based methods as well), the left-to-right aggregated cost volumes, which are guided by the left edge maps, are optimized for finding only the left disparity maps. Therefore, in order to obtain both the left and the right disparity maps, both the left and the right edge maps are needed. In this paper, instead of defining a segment as a group of pixels in a 2D homogeneous region of an image as in [14], the proposed method defines a segment as a group of consecutive pixels on a horizontal row or vertical column of the left or right edge map. As exemplified in Fig. 2(a), pixels from s to (t-1) in a row or a column of an edge map are grouped into a segment if and only if the row or the column changes value (0 → 1 or 1 → 0) from (s-1) to s and from (t-1) to t, and there is no value change from s to (t-1).
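To make the segment definition concrete, a minimal sketch (the names are ours, not the paper's) that assigns a segment label to every pixel of a binary edge row is:

    /* Label the 1D segments of a binary edge row: pixels s..t-1 share one
     * label iff the row changes value at s and at t and is constant between. */
    void label_segments(const unsigned char *edge, int *label, int n)
    {
        if (n <= 0) return;
        int id = 0;
        label[0] = 0;
        for (int i = 1; i < n; i++) {
            if (edge[i] != edge[i - 1])  /* value change 0 -> 1 or 1 -> 0 */
                id++;
            label[i] = id;
        }
    }

The same routine applies unchanged to vertical columns.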

Fig. 3. (a) The correspondence between cost rows and edge rows of the left and the right edge map. (b) The correspondence between cost columns and edge columns of the left and the right edge map.

B. Cost Aggregation

The aggregation in the proposed method is divided into two steps: a horizontal and a vertical aggregation, as shown in Fig. 1.

Fig. 4. Data structures used for the proposed aggregation method.


In the horizontal aggregation, left-to-right pixel matching costs are used to compute both the left-to-right and the right-to-left aggregated cost volumes, i.e. cost^L_haggr(u, v, d) and cost^R_haggr(u, v, d). The vertical aggregation is then performed separately on the results of the horizontal aggregation to produce cost^L_aggr(u, v, d) and cost^R_aggr(u, v, d).

1) Horizontal Aggregation: The inputs of Horizontal Aggregator are the horizontal rows of cost(u, v, d) and their corresponding rows of the left and the right edge map. The correspondence between cost rows and edge rows is shown in Fig. 3(a). For example, in order to aggregate the matching costs of horizontal row cost(u, v = vi, d = dj), i.e. row vi of the U-V slice with d = dj, horizontal rows edge^L(u, v = vi) and edge^R(u, v = vi) of the left and the right edge map are provided to the horizontal aggregator. The aggregation results are saved to cost^L_haggr(u, v = vi, d = dj) and cost^R_haggr(u, v = vi, d = dj).

The horizontal aggregation is performed from the left to the right side of each horizontal row cost(u, v = vi, d = dj). Let cost(u = r, v = vi, d = dj) be the next cost under aggregation. Then the next window occupies cost(u, v, d) from cost(u = r − W + 1, v = vi, d = dj) to cost(u = r, v = vi, d = dj), as shown in Fig. 3(a). Because of the shift of right-to-left pixel matching costs in the left-to-right 3D cost volume, the next window has corresponding windows on the left and the right edge map at different locations. These locations are edge^L(u = r − W + 1, v = vi) → edge^L(u = r, v = vi) and edge^R(u = r − W + 1 − dj, v = vi) → edge^R(u = r − dj, v = vi).

In Fig. 4, the input cost row, the input edge rows and the output cost rows are represented by arrays cost_in, l_edge, r_edge, l_cost_out and r_cost_out respectively. buffer[l] → buffer[r−1] store the elements of the current window, similar to [15]. The accumulated value (i.e. the sum) of those elements is saved to buffer[a]. buffer[r] stores the new incoming cost, i.e. cost_in[r], when the window moves to the right. A window may occupy several segments on the left and the right edge map, as shown in Fig. 2(b). In Fig. 4, these segments are indexed by l_sl → l_sr for the left edge map and by r_sl → r_sr for the right edge map. The number of pixels in each segment of the left edge map (of the right edge map) and the accumulated cost of those pixels are stored in l_scount (r_scount) and l_scost (r_scost) respectively. Array l_label (r_label) maps pixels to the segments of the left edge map (of the right edge map) to which the pixels belong. The segment that contains the middle pixel of the current window is referred to as the mid-segment hereafter.

The underlying idea of the proposed aggregation method, exemplified in Fig. 2(c), is that the contribution of segments other than the mid-segment of a window to the final aggregated cost of the window's mid-point is reduced by a factor alpha (0 ≤ alpha ≤ 1). In this paper, the final aggregated cost of the window's mid-point is composed of two components: the first is the average of matching costs inside the window's mid-segment; the second is the product of alpha and the average of matching costs outside the mid-segment.
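In formula form (our notation, not the paper's): if S_m denotes the mid-segment of the window centered at pixel m, W the window size, and c(i) the pixel matching costs, the aggregated cost is

    cost_haggr(m) = (1/|S_m|) Σ_{i ∈ S_m} c(i) + alpha · (1/(W − |S_m|)) Σ_{i ∈ window, i ∉ S_m} c(i),

which reduces to the plain moving average when alpha = 1 and to a purely segment-restricted average when alpha = 0.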

Before processing the input cost row, i.e. cost_in, one initial segment is put into l_scount and l_scost. This initialization is done as follows: (1) set l_sl = l_sr = 0, (2) set l_scost[0] to the accumulated value of the W (the window size) left border elements, (3) set l_scount[0] = W, and (4) set l_label to 0 (0: initial segment) for the W left border elements. r_scount, r_scost and r_label are initialized in the same way. For simplicity, border processing for the input cost row and the edge rows is skipped. Based on the data structures mentioned in the previous paragraph, the procedure for processing each cost of the input cost row is presented in Table I. The explanation of the procedure is given in Table II.

2) Vertical Aggregation: Vertical aggregation is performed separately on the left-to-right and the right-to-left horizontally aggregated cost volumes, i.e. cost^L_haggr(u, v, d) and cost^R_haggr(u, v, d). In order to guide the aggregation of cost column cost^L_haggr(u = ui, v, d = dj), only the corresponding column of the left edge map is used. As shown in Fig. 3(b), that edge column is edge^L(u = ui, v). Similarly, only columns of the right edge map are used to guide the aggregation of columns of cost^R_haggr(u, v, d). However, the edge column corresponding to cost column cost^R_haggr(u = ui, v, d = dj) is shifted to the left by dj pixels, i.e. edge^R(u = ui − dj, v), as shown in Fig. 3(b). The aggregation algorithm for cost columns is similar to the one used for cost rows. For example, in order to aggregate cost column cost^L_haggr(u = ui, v, d = dj) in Fig. 4, cost^L_haggr(u = ui, v, d = dj) and edge^L(u = ui, v) are assigned to cost_in and l_edge respectively. In Table I, the processing corresponding to the cost column of the right-to-left cost volume is skipped, i.e. lines (14-24), (28-30) and (36-39) are skipped. Similarly, lines (3-13), (25-27), and (32-35) are skipped when columns of cost^R_haggr(u = ui, v, d = dj) are aggregated. For vertical aggregation, the pixels of a cost column and the corresponding pixels of the edge column are aligned on the same horizontal lines. Therefore, the conditions for the aggregation window to enter (line (14) of Table I) and to leave (line (28) of Table I) a segment become "enter_segment_on_right = r_edge[r-1] ^ r_edge[r];" and "leave_segment_on_right = r_edge[r-W] ^ r_edge[r-W+1];" respectively.

C. Disparity Computation and Left-Right Validation

WTA is a simple optimization task compared to global methods. It assigns a disparity d_best to pixel p(u, v) if the aggregated matching cost associated with d_best is the minimum among the candidate costs. Based on the result of Vertical Aggregator, the left and the right disparity maps are computed by map^L(u, v) = argmin_d cost^L_aggr(u, v, d) and map^R(u, v) = argmin_d cost^R_aggr(u + d, v, d) respectively.

Left-Right validation is used to determine the object points that are seen in both images of a stereo pair. This task can also be used to reject erroneous disparities. The pixels that pass the left-right validation are the ones that satisfy the following relation: map^L(u, v) == map^R(u − map^L(u, v), v).
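A minimal sketch of the WTA step and the validation is given below; the array layout, the function names, and the use of -1 to mark rejected pixels are our assumptions, and for the right map the aggregated volume is assumed to be already resampled to the right image's coordinates according to the shifted indexing above.

    #include <stddef.h>

    /* WTA over a U*V*D aggregated volume stored as vol[(v*U + u)*D + d]. */
    void wta(const double *vol, int *map, int U, int V, int D)
    {
        for (int v = 0; v < V; v++)
            for (int u = 0; u < U; u++) {
                const double *c = &vol[((size_t)v * U + u) * D];
                int best = 0;
                for (int d = 1; d < D; d++)
                    if (c[d] < c[best])
                        best = d;
                map[v * U + u] = best;  /* disparity with the minimum cost */
            }
    }

    /* Keep map_l(u,v) only if it agrees with map_r(u - map_l(u,v), v). */
    void lr_validate(int *map_l, const int *map_r, int U, int V)
    {
        for (int v = 0; v < V; v++)
            for (int u = 0; u < U; u++) {
                int d = map_l[v * U + u];
                if (u - d < 0 || map_r[v * U + (u - d)] != d)
                    map_l[v * U + u] = -1;  /* assumption: -1 marks invalid */
            }
    }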



Fig. 5. Examples of disparity images computed from the real sequence. (a, e, i): Left images. (b, f, k): Disparity images computed by the proposed method. (c, g, l): Disparity images computed by the referred method. (d, h, m): Disparity images computed by the referred method with the cost function and the window size used for the proposed method.

TABLE III
PARAMETERS USED FOR COMPUTING DISPARITY IMAGES

The Real Image Sequence
  The proposed method:  Edge detector: LoG; Edge thresholding value: 0.002; Window size: 11 × 181; alpha: 0.2
  The referred method:  Cost function: SAD; Aggregation method: moving average; Window size: 11 × 11; Optimization method: WTA

The Artificial Image Sequence
  The proposed method:  Edge detector: LoG; Edge thresholding value: 0.002; Window size: 21 × 501; alpha: 0.2
  The referred method:  Cost function: SAD; Aggregation method: moving average; Window size: 11 × 11; Optimization method: WTA

III. EXPERIMENTAL RESULTS

The proposed stereo matching approach is evaluated with a real sequence of stereo pairs from [10] and an artificial sequence from [3]. The real sequence contains 864 stereo pairs of color images; the size of each image is 320 × 240 pixels, and the majority of the images lack texture in road areas. The artificial sequence contains 200 pairs of gray images; the size of each image is 512 × 512 pixels, and these images have more texture than those of the real sequence. The proposed method is compared to the WTA method (called "the referred method" hereafter) implemented in [1], [15]. Disparity images of the real and the artificial image sequence are computed using the parameters given in Table III. Several examples of disparity images from the real sequence are presented in Fig. 5. Because of the lack of texture at road pixels, almost all disparities in the road areas in Fig. 5(c), (g) and (l), which are computed by the referred method, are noisy.



Fig. 6. Examples of disparity images computed from the artificial sequence. (a, e): Left images. (b, f): Disparity images computed by the proposed method. (c, g): Disparity images computed by the referred method. (d, h): Disparity images computed by the referred method with the cost function and the window size used for the proposed method.

These results accord with the examples given in [10] and [11]. Although a very wide aggregation window [11] can produce reliable disparities in low texture regions (road regions), it also smooths the depth-discontinuity boundaries of on-road objects, as shown in Fig. 5(d), (h) and (m). In contrast, as shown in Fig. 5(b), (f) and (k), the proposed method produces reliable disparities in low texture areas while preserving the boundaries of on-road objects, side objects and even small electric pillars. Because there are no ground-truth disparity maps for the real sequence, the quantitative comparison for the real sequence covers only the density of valid pixels, i.e. the pixels that pass the left-right validation. As shown in Fig. 7, the number of reliable disparities produced by the proposed method is much larger than that of the referred method. Because the cost function of the referred method is the sum of absolute differences, which is sensitive to image noise, the variation of densities between consecutive frames can be very large.

Several examples of disparity images from the artificial sequence are given in Fig. 6. Because the artificial sequence has more texture than the real sequence, both the proposed and the referred method produce a large majority of reliable disparities for road pixels. However, the proposed method is better than the referred method in low texture areas, like the area of the car in Fig. 6(a); there, the proposed method produces a denser map than the referred method. As shown in Fig. 6(d) and (h), the disparity images computed with a very wide aggregation window can be used for road detection, but on-road objects cannot be detected from them.

The density and the accuracy comparisons for the artificial sequence are given in Fig. 8. For the accuracy comparison, the error is measured by the average absolute difference defined by Eq. 2, where g(u, v) is the ground truth image and d(u, v) is the disparity image computed by the proposed method or the referred method. As shown in Fig. 8, the proposed method obtains a large number of reliable disparities and its errors are smaller than those of the referred method. The errors computed for Frame 70 to Frame 90 are greater than those for the other frames because the car is leaving the field of view of the stereo cameras; this phenomenon is also mentioned in [3].

E_abs = (1 / (U ∗ V)) Σ_{v=0}^{V−1} Σ_{u=0}^{U−1} |g(u, v) − d(u, v)|    (2)
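A direct transcription of Eq. 2 could look as follows; the row-major layout and double-valued maps are our assumptions, and the paper does not state how invalid pixels are handled.

    #include <math.h>

    /* Average absolute error of Eq. 2 between ground truth g and a computed
     * disparity image d, both U columns by V rows, stored row-major. */
    double avg_abs_error(const double *g, const double *d, int U, int V)
    {
        double sum = 0.0;
        for (int v = 0; v < V; v++)
            for (int u = 0; u < U; u++)
                sum += fabs(g[v * U + u] - d[v * U + u]);
        return sum / ((double)U * V);
    }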


TABLE I
COST AGGREGATION GUIDED BY EDGE MAPS

(a) Add new cost on the rightmost side of the window:
1.  buffer[r] = cost_in[r];
2.  aggr_cost = buffer[a] + cost_in[r];
3.  enter_segment_on_left = l_edge[r-1] ^ l_edge[r];
4.  if (enter_segment_on_left) {
5.      l_sr = l_sr + 1;
6.      l_scost[l_sr] = cost_in[r];
7.      l_scount[l_sr] = 1;
8.  }
9.  else {
10.     l_scost[l_sr] += cost_in[r];
11.     l_scount[l_sr] += 1;
12. }
13. l_label[r] = l_sr;
14. enter_segment_on_right = r_edge[r-1-d] ^ r_edge[r-d];
15. if (enter_segment_on_right) {
16.     r_sr = r_sr + 1;
17.     r_scost[r_sr] = cost_in[r];
18.     r_scount[r_sr] = 1;
19. }
20. else {
21.     r_scost[r_sr] += cost_in[r];
22.     r_scount[r_sr] += 1;
23. }
24. r_label[r] = r_sr;

(b) Remove the element that the window has left:
25. leave_segment_on_left = l_edge[r-W] ^ l_edge[r-W+1];
26. if (leave_segment_on_left) l_sl = l_sl + 1;
27. else { l_scost[l_sl] -= buffer[l]; l_scount[l_sl] -= 1; }
28. leave_segment_on_right = r_edge[r-W-d] ^ r_edge[r-W+1-d];
29. if (leave_segment_on_right) r_sl = r_sl + 1;
30. else { r_scost[r_sl] -= buffer[l]; r_scount[r_sl] -= 1; }
31. buffer[l] = aggr_cost - buffer[l];

(c) Compute the average cost for the window:
32. mid = r - (W DIV 2); seg = l_label[mid];
33. if ((W - l_scount[seg]) != 0) {
34.     l_cost_out[mid] = l_scost[seg]/l_scount[seg] + alpha*(buffer[l] - l_scost[seg])/(W - l_scount[seg]);
35. } else l_cost_out[mid] = l_scost[seg]/l_scount[seg];
36. mid = r - (W DIV 2) - d; seg = r_label[mid];
37. if ((W - r_scount[seg]) != 0) {
38.     r_cost_out[mid] = r_scost[seg]/r_scount[seg] + alpha*(buffer[l] - r_scost[seg])/(W - r_scount[seg]);
39. } else r_cost_out[mid] = r_scost[seg]/r_scount[seg];

(d) Move the window to the right:
40. a++; l++; r++;

TABLE II
EXPLANATION OF THE PROPOSED AGGREGATION METHOD

(a.) Add new cost on the rightmost side of the window:
1-2.   Add the new cost to the previous accumulated value.
3-13.  If the new cost lies at the beginning pixel of a segment on the left edge map, make one new segment at (l_sr + 1), set the new segment's cost to the new cost, and set the new segment's number of pixels to 1; otherwise, accumulate the new cost into the rightmost segment indexed by l_sr and increase its total number of pixels by 1. Fill the mapping from pixel to segment for the left image.
14-24. If the new cost lies at the beginning pixel of a segment on the right edge map, make one new segment at (r_sr + 1), set the new segment's cost to the new cost, and set the new segment's number of pixels to 1; otherwise, accumulate the new cost into the rightmost segment indexed by r_sr and increase its total number of pixels by 1. Fill the mapping from pixel to segment for the right image.

(b.) Remove the element (called "L-element") that the window has left:
25-27. If the window has left the leftmost segment identified by l_sl, remove the segment by increasing l_sl by 1; otherwise, remove the L-element from the segment identified by l_sl and decrease its total number of pixels by 1.
28-30. If the window has left the leftmost segment identified by r_sl, remove the segment by increasing r_sl by 1; otherwise, remove the L-element from the segment identified by r_sl and decrease its total number of pixels by 1.
31.    Remove the L-element from the current window and save the accumulated value of the current window at buffer[l].

(c.) Compute the average cost for the window:
32-35. The aggregated cost of the current window for the left-to-right volume = average cost of the mid-segment + alpha * (average cost of the pixels outside the mid-segment).
36-39. The aggregated cost of the current window for the right-to-left volume = average cost of the mid-segment + alpha * (average cost of the pixels outside the mid-segment).

(d.) Move the window to the right:
40.    Move the window to the right by increasing a, l, and r by 1.

IV. CONCLUSIONS AND FUTURE WORK

This paper demonstrates that edge maps can be used to guide the aggregation task of stereo matching to obtain robust disparity maps for stereo pairs of low texture images. The proposed method smooths the disparity surfaces of low texture regions of the stereo pairs while preserving the depth-discontinuity boundaries of on-road objects, side objects and even very small objects like electric pillars. Moreover, the computational complexity of the proposed cost aggregation method is independent of the window size, like that of the classical moving average approach. Because edge detection can be computed more efficiently than image segmentation, the total computation time of the proposed method is expected to be smaller than that of other segmentation-based stereo matching methods. In this paper, only the quality of disparity images is analyzed. In the future, we plan to optimize the proposed aggregation method by using special hardware like [3], which was realized for the classical moving average approach. We also plan to propose a new cost function that combines the color and the gradient information of image pixels.

ACKNOWLEDGMENT

I would like to thank the Japan International Cooperation Agency (JICA) for funding my research.

REFERENCES

[1] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," International Journal of Computer Vision, vol. 47, no. 1/2/3, 2002, pp. 7–42.
[2] L.T. Sach, K. Hamamoto, K. Atsuta, and S. Kondo, "A new coarse-to-fine method for computing disparity images by sampling disparity spaces," IEEJ Trans. on Electronics, Information and Systems, vol. 129, no. 1, 2009, pp. 103–111.
[3] W. van der Mark and D. M. Gavrila, "Real-time dense stereo for intelligent vehicles," IEEE Trans. on Intelligent Transportation Systems, vol. 7, no. 1, 2006, pp. 38–50.


Fig. 7. Density comparison for disparity images computed from the real stereo image sequence.

Fig. 8. Density and accuracy comparison for disparity images computed from the artificial stereo image sequence.

[4] N. Suganuma and N. Fujiwara, "An obstacle extraction method using virtual disparity image," in Proc. of IEEE Intelligent Vehicles Symposium, 2007, pp. 456–461.
[5] N. Hautiere, R. Labayrade, M. Perrollaz, and D. Aubert, "Road scene analysis by stereovision: a robust and quasi-dense approach," in Proc. of Control, Automation, Robotics and Vision, 2006, pp. 1–6.
[6] S. Se and M. Brady, "Ground plane estimation, error analysis and applications," Robotics and Autonomous Systems, vol. 39, no. 2, 2002, pp. 59–71.
[7] R. Labayrade, D. Aubert and J.P. Tarel, "Real time obstacle detection in stereo vision on non flat road geometry through v-disparity representation," in Proc. of IEEE Intelligent Vehicle Symposium, vol. 2, 2002, pp. 646–651.
[8] M. Lhuillier and L. Quan, "Match propagation for image-based modeling and rendering," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 8, 2002, pp. 1140–1146.
[9] N. Chumerin and M. M. Van Hulle, "Ground plane estimation based on dense stereo disparity," in Proc. of Control, Automation, Robotics and Vision, IEEE, 2006, pp. 1–6.
[10] M. Zanin, "Localization of ahead vehicle with on-board stereo cameras," in Proc. of Image Analysis and Processing, IEEE, 2007, pp. 111–116.
[11] Z. Zhao, J. Katupitiya, and J. Ward, "Global correlation based ground plane estimation using v-disparity image," in Proc. of Robotics and Automation, IEEE, 2007, pp. 529–534.
[12] W. Liang, G. Mingwei, G. Minglun, and Y. Ruigang, "How far we can go with local optimization in real-time stereo matching – a performance study on different cost aggregation approaches," in Proc. of 3D Data Processing, Visualization, and Transmission, IEEE, 2006, pp. 129–136.
[13] T. Kanade and M. Okutomi, "Stereo matching algorithm with an adaptive window: theory and experiment," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 9, 1994, pp. 920–932.
[14] M. Gerrits and P. Bekaert, "Local stereo matching with segmentation-based outlier rejection," in Proc. of Computer and Robot Vision, IEEE, 2006, pp. 66–72.
[15] K. Muhlmann, D. Maier, R. Hesser, and R. Manner, "Calculating dense disparity maps from color stereo images, an efficient implementation," in Proc. of Stereo and Multi-Baseline Vision, IEEE, 2001, pp. 30–36.
[16] A. Broggi, C. Caraffi, R.I. Fedriga, and P. Grisleri, "Obstacle detection with stereo vision for off-road vehicle navigation," in Proc. of Computer Vision and Pattern Analysis, IEEE, 2005, pp. 65–72.
[17] J. Banks and P. Corke, "Quantitative evaluation of matching methods and validity measures for stereo vision," International Journal of Robotics Research, vol. 20, no. 7, 2001, pp. 512–532.
