Stereo Matching with Symmetric Cost Functions

Kuk-Jin Yoon and In So Kweon
Robotics and Computer Vision Lab., Dept. of Electrical Engineering and Computer Science, KAIST
373-1, Guseong-dong, Yuseong-gu, Daejeon, 305-701, Korea
{kjyoon, iskweon}@kaist.ac.kr

Abstract

Recently, many global stereo methods have achieved good results by modeling the disparity surface as a Markov random field (MRF) and by solving the resulting optimization problem with various techniques. However, most global methods focus mainly on how to minimize conventional cost functions efficiently, although defining the cost functions well is more important for improving performance. In this paper, we propose new symmetric cost functions for global stereo methods. We first present a symmetric data cost function for the likelihood and then propose a symmetric discontinuity cost function for the prior in the MRF model for stereo. In defining the cost functions, both the reference image and the target image are taken into account to improve performance without modeling half-occluded pixels explicitly and without using color segmentation. The performance improvement due to the proposed symmetric cost functions is verified by applying them to the belief propagation (BP) based stereo method. Experimental results for standard testbed images show that the performance of the BP-based stereo method is greatly improved by the proposed symmetric cost functions.

1. Introduction

Stereo vision has been a long-standing research topic in computer vision, and a large number of methods have been proposed to solve the stereo problem over the decades [9]. Generally, stereo methods can be roughly divided into two categories according to the disparity selection method: local methods and global methods. Local methods [4, 5, 13-15] select the disparities of image pixels locally using the winner-takes-all (WTA) strategy and are therefore generally faster than global methods. However, local methods have difficulty in dealing with image ambiguity owing to insufficient or repetitive texture.

On the other hand, global methods seek a disparity surface that minimizes a global cost function defined by making explicit smoothness assumptions. Various optimization techniques, such as the cooperative algorithm [7, 16], dynamic programming [2], non-linear diffusion [8], graph cuts [3, 5, 6], and belief propagation (BP) [10, 11], are used to obtain minimum-cost solutions. Therefore, global methods can deal with inherent image ambiguity more effectively than local methods. Many global stereo methods have recently achieved good results by modeling the disparity surface as a Markov random field (MRF) and by solving the resulting optimization problem [8, 10, 11]. They mainly focus on how to minimize conventional cost functions efficiently to improve performance. However, lower-cost solutions do not always correspond to better performance, as pointed out in [12]. Therefore, defining the cost functions to be minimized well is more important for improving performance than refining the optimization techniques. Nevertheless, there is relatively little work on defining cost functions well. In this paper, we propose new symmetric cost functions that can be used to improve the performance of global stereo methods based on the MRF model. We first present a symmetric data cost function for the likelihood and then propose a symmetric discontinuity cost function for the prior in the MRF model for stereo. We finally verify the performance improvement of stereo matching due to the proposed symmetric cost functions by applying them to the BP-based stereo method. In defining the cost functions, we take both the reference image and the target image into account, aiming to improve performance without modeling half-occluded pixels explicitly and without using color segmentation, which are themselves difficult problems. In fact, it is important to consider the reference image and the target image together to preserve depth discontinuities and to reduce the errors caused by half-occluded pixels at depth discontinuities.

Although Sun et al. [10] recently tried to consider the reference image and the target image together, they simply performed the same processes with the same cost functions after switching the reference image and the target image. Moreover, their method is computationally expensive. In contrast, both of the proposed symmetric cost functions are simple and can therefore be easily applied to global stereo methods without much modification. The remainder of this paper is organized as follows. We first briefly introduce the MRF model for stereo in Sec. 2 and then propose the symmetric cost functions in Secs. 3 and 4. Stereo matching results for standard testbed images using the proposed symmetric cost functions are given in Sec. 5, and we conclude the paper in Sec. 6.

2. MRF Model for Stereo Matching

Although global stereo methods formulate the stereo problem in various ways, the MRF formulation is the most general. Bayesian stereo matching can be formulated as a maximum a posteriori MRF (MAP-MRF) problem. Given a rectified stereo pair of images, the stereo problem can be modeled using Bayes' rule as

P(D|I) = \frac{P(I|D) P(D)}{P(I)}    (1)

where D is the smooth disparity field of the reference image and I is a pair of input stereo images (i.e., I = (I_L, I_R), where I_L is the reference image and I_R is the target image). The goal of the stereo problem is to find the disparity field D that maximizes Eq. (1) for given I as

D_{opt} = \arg\max_D P(D|I) = \arg\max_D \frac{P(I|D) P(D)}{P(I)}    (2)

Here, P(I|D) is referred to as the likelihood and P(D) is referred to as the prior.

2.1. Likelihood

When assuming that the observation follows an independent and identical distribution (i.i.d.), the likelihood P(I|D) in Eq. (1) can be expressed as

P(I|D) \propto \prod_p \exp(-\phi(p, d_p, I))    (3)

where \phi(p, d_p, I) is the cost function of pixel p with disparity d_p given observation I. Therefore, the likelihood P(I|D) is related to the data cost when pixel p has the disparity d_p with the given images.

2.2. Prior

The Markov property asserts that the probability of each site in a field depends only on its neighboring sites. By specifying the first-order neighborhood of pixel p, N(p), the prior can be expressed as

P(D) \propto \prod_p \prod_{q \in N(p)} \exp(-\psi_c(d_p, d_q))    (4)

where q is a neighboring pixel of p in N(p) and \psi_c(d_p, d_q) is the joint clique potential function of the two disparities d_p and d_q.

2.3. Global cost function

By combining Eq. (3) and Eq. (4), Eq. (1) becomes

P(D|I) \propto \prod_p \exp(-\phi(p, d_p, I)) \times \prod_p \prod_{q \in N(p)} \exp(-\psi_c(d_p, d_q))    (5)

and we can obtain the following equation by taking the negative log of Eq. (5):

-\ln P(D|I) \propto \sum_p \phi(p, d_p, I) + \sum_p \sum_{q \in N(p)} \psi_c(d_p, d_q)    (6)

As a result, maximizing Eq. (5) is equivalent to minimizing Eq. (6). Here, Eq. (6) can be rewritten in terms of cost functions as

E(D|I) = \sum_p D(p, d_p, I) + \sum_p \sum_{q \in N(p)} V(d_p, d_q)    (7)

where

E(D|I) \propto -\ln P(D|I)    (8)

D(p, d_p, I) = \phi(p, d_p, I)    (9)

V(d_p, d_q) = \psi_c(d_p, d_q)    (10)

E(D|I) is the global cost to be minimized to obtain a disparity map. D(p, d_p, I) is referred to as the data cost, which measures the cost of assigning disparity d_p to pixel p given I. On the other hand, V(d_p, d_q) is referred to as the discontinuity cost, which measures the cost of assigning disparities d_p and d_q to two neighboring pixels p and q. Our goal is then to define D(p, d_p, I) and V(d_p, d_q) well, taking both the reference image and the target image into account, so as to improve the performance of global methods without modeling half-occluded pixels explicitly and without using color segmentation.
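To make the role of Eq. (7) concrete, the sketch below evaluates the global cost for a candidate disparity field. This is only an illustration, not the authors' implementation: `data_cost` and `disc_cost` stand in for D(p, d_p, I) and V(d_p, d_q) and are assumed to be produced by the cost functions of Secs. 3 and 4.

```python
import numpy as np

def global_energy(disparity, data_cost, disc_cost):
    """Evaluate the global cost E(D|I) of Eq. (7) for a candidate disparity field.

    disparity : (H, W) integer array, the disparity field D
    data_cost : (H, W, num_disp) array with data_cost[y, x, d] = D(p, d, I)
    disc_cost : callable (d_p, d_q) -> float, the discontinuity cost V(d_p, d_q)
    """
    H, W = disparity.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Data term: sum of D(p, d_p, I) over all pixels p.
    energy = float(data_cost[ys, xs, disparity].sum())
    # Smoothness term: sum of V(d_p, d_q) over 4-connected neighbors.
    # Each neighboring pair is counted once (right and bottom neighbors),
    # which differs from the double-counted sum in Eq. (7) only by a constant factor.
    energy += sum(disc_cost(int(disparity[y, x]), int(disparity[y, x + 1]))
                  for y in range(H) for x in range(W - 1))
    energy += sum(disc_cost(int(disparity[y, x]), int(disparity[y + 1, x]))
                  for y in range(H - 1) for x in range(W))
    return energy
```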

3. Symmetric Data Cost Function for the Likelihood

Most global methods compute the data cost using an individual pixel intensity (or color) and then try to resolve image ambiguity, which arises from the ambiguous local appearance of image pixels owing to image noise and insufficient/repetitive texture, through global reasoning with a smoothness constraint. However, it is still difficult to obtain an accurate disparity map when there are severe errors in the data cost. To obtain a reliable data cost, it may be useful to use local support windows as in local methods. However, local support windows cause the foreground-fattening phenomenon, resulting in severe errors at depth discontinuities. In this work, we use the symmetric data cost function that we proposed in [15]. This method provides a reliable data cost that takes both the reference image and the target image into account, even near depth discontinuities and even when large local support windows are used.

3.1. Locally adaptive support-weight computation

In [15], the data cost is computed by using local support windows with adaptive support-weights that are computed based on the local proximity and color similarity between pixels. The support from the neighboring pixel q to the pixel under consideration p is weighted as

w(p, q) = \exp\left(-\left(\frac{\Delta c_{pq}}{\gamma_c} + \frac{\Delta g_{pq}}{\gamma_p}\right)\right)    (11)

where \Delta c_{pq} is the color difference between p and q measured in the CIELab color space and \Delta g_{pq} is the distance between p and q in the image domain. \gamma_c and \gamma_p are the parameters that control the weights.
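A minimal sketch of Eq. (11) follows. It assumes the input image has already been converted to the CIELab color space and stored as a float array; the function and argument names are illustrative.

```python
import numpy as np

def support_weight(lab_img, p, q, gamma_c, gamma_p):
    """Adaptive support-weight w(p, q) of Eq. (11).

    lab_img : (H, W, 3) float array in the CIELab color space
    p, q    : (row, col) pixel coordinates
    """
    lab_img = np.asarray(lab_img, dtype=float)
    # Color similarity: Euclidean distance between the CIELab colors of p and q.
    delta_c = np.linalg.norm(lab_img[p] - lab_img[q])
    # Proximity: Euclidean distance between p and q in the image domain.
    delta_g = np.hypot(p[0] - q[0], p[1] - q[1])
    return float(np.exp(-(delta_c / gamma_c + delta_g / gamma_p)))
```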

3.2. Symmetric data cost function

The data cost is then computed by aggregating the per-pixel raw data cost with adaptive support-weights in both support windows. In this stage, the support-weights in a reference support window and in a target support window are combined to take both images into account. This is because, when considering only the reference support window, the computed data cost may be erroneous if there are pixels from different depths or half-occluded pixels in the target support window. The combined support-weights favor points that are likely to have disparities similar to those of the center pixels in both images. The data cost of p with the disparity d_p can be expressed as

D(p, d_p, I) = \frac{\sum_{q \in W_p, \bar{q}_{d_p} \in W_{\bar{p}_{d_p}}} w_c(p, q, \bar{p}_{d_p}, \bar{q}_{d_p}) \, e(q, \bar{q}_{d_p})}{\sum_{q \in W_p, \bar{q}_{d_p} \in W_{\bar{p}_{d_p}}} w_c(p, q, \bar{p}_{d_p}, \bar{q}_{d_p})}    (12)

where

w_c(p, q, \bar{p}_{d_p}, \bar{q}_{d_p}) = w(p, q) \, w(\bar{p}_{d_p}, \bar{q}_{d_p})    (13)

[Figure 1. Support-weight computation: (a) reference windows, (b) weights in reference windows, (c) target windows, (d) weights in target windows. The center pixels marked by rectangles are the pixels under consideration. Brighter pixels have larger support-weights in (b) and (d).]

[Figure 2. Combined support-weights used for the similarity computation: (a) weights for the first row in Fig. 1, (b) weights for the second row in Fig. 1.]

\bar{p}_{d_p} and \bar{q}_{d_p} are the pixels in the target image corresponding to p and q in the reference image, respectively, when p and q have the disparity d_p. W_p and W_{\bar{p}_{d_p}} are the local support windows of p in the reference image and of \bar{p}_{d_p} in the target image, respectively. e(q, \bar{q}_{d_p}) represents the per-pixel raw data cost computed from the colors of q and \bar{q}_{d_p}. In this work, we compute e(q, \bar{q}_{d_p}) using the pixel dissimilarity proposed in [1] instead of the raw pixel difference used in [15]. It is worth noting that the form of the data cost function is symmetric: even when the reference image and the target image are switched, the form of this data cost function does not change. Figure 1 shows the results of support-weight computation for the reference and target support windows. The small rectangles in the middle indicate the pixels under consideration. The support-weights in each support window are computed independently and then combined as shown in Fig. 2. We can see that the local structures of the support windows are reflected in the combined support-weights. The data cost is then computed by Eq. (12) using the combined support-weights.
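The aggregation of Eq. (12) can be sketched as follows, reusing the support_weight helper from Sec. 3.1. This is a simplified illustration: the per-pixel cost e is written as a truncated absolute color difference rather than the sampling-insensitive dissimilarity of [1], border handling is omitted, and the truncation value is an arbitrary assumption.

```python
import numpy as np

def symmetric_data_cost(lab_ref, lab_tgt, p, d, radius, gamma_c, gamma_p,
                        trunc=40.0):
    """Symmetric data cost D(p, d_p, I) of Eq. (12) for a reference pixel
    p = (row, col) and a disparity hypothesis d (assumes interior pixels)."""
    r, c = p
    p_bar = (r, c - d)                    # corresponding pixel in the target image
    num = den = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            q = (r + dy, c + dx)          # neighbor in the reference window W_p
            q_bar = (r + dy, c + dx - d)  # its correspondence in the target window
            # Combined support-weight of Eq. (13): product of the weights
            # computed in the reference and target windows.
            w_c = (support_weight(lab_ref, p, q, gamma_c, gamma_p) *
                   support_weight(lab_tgt, p_bar, q_bar, gamma_c, gamma_p))
            # Per-pixel raw cost e(q, q_bar): truncated absolute color difference,
            # a stand-in for the Birchfield-Tomasi measure used in the paper.
            diff = (np.asarray(lab_ref[q], dtype=float)
                    - np.asarray(lab_tgt[q_bar], dtype=float))
            e = min(float(np.abs(diff).sum()), trunc)
            num += w_c * e
            den += w_c
    return num / den
```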

[Figure 3. Error in the discontinuity cost (between pa and pb) due to a half-occluded pixel when considering only the reference image. The pixels pa, pb, pc, and pd lie in the reference image; qa, qc, and qd are their correspondences in the target image, and pb is half-occluded.]

4. Symmetric Discontinuity Cost Function for the Prior

4.1. Potts model

Among the two cost functions in the MRF formulation, the discontinuity cost function between nodes, V(d_p, d_q), determines how support is aggregated from neighboring nodes. This cost function is directly related to the smoothness constraint. In most global methods, it is generally computed by using the truncated linear model or the Potts model [3], assuming piecewise constant disparities. The typical Potts model can be expressed as

V(d_p, d_q) = \begin{cases} 0 & \text{if } d_p = d_q \\ \rho(\Delta C) & \text{otherwise} \end{cases}    (14)

The function \rho(\Delta C) is defined in terms of the magnitude of the image gradient between p and q, \Delta C, as

\rho(\Delta C) = \begin{cases} P \times s & \text{if } \Delta C < T \\ s & \text{otherwise} \end{cases}    (15)

where T is a magnitude threshold and s is a penalty term for violating the smoothness constraint. P is a penalty term that increases the penalty when the gradient magnitude is small. This form of the smoothness constraint makes depth discontinuities coincide with color or intensity discontinuities.
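For reference, Eqs. (14)-(15) amount to the small function below; the default parameter values follow the setting reported in Sec. 5 (with P taken equal to P_r = P_t), and the gradient magnitude \Delta C is assumed to be computed by the caller.

```python
def potts_cost(d_p, d_q, grad_ref, P=2.0, s=1.1, T=8.0):
    """Conventional Potts discontinuity cost of Eqs. (14)-(15).

    grad_ref : magnitude of the color gradient between p and q in the
               reference image (Delta C in the text).
    """
    if d_p == d_q:
        return 0.0
    # A small gradient in the reference image strengthens the smoothness penalty.
    return P * s if grad_ref < T else s
```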

4.2. Symmetric Potts model

The problem with the conventional Potts model is that it is based only on the reference image. This may produce an erroneous discontinuity cost and result in errors at depth discontinuities. For instance, suppose that the pixels pa, pc, and pd in the reference image correspond to the pixels qa, qc, and qd in the target image, as shown in Fig. 3, and that the pixel pb in the reference image is a half-occluded pixel. In this case, although the two pixels pa and pb have the same color, the discontinuity cost between pa and pb should be ignored. In other words, when hypothesizing that pa in the reference image corresponds to qa in the target image, the discontinuity cost between pa and pb should be ignored even though they have the same color, because pb may be a half-occluded pixel under that hypothesis. To reduce the errors in the discontinuity cost owing to half-occluded pixels, it may be useful to consider half-occluded pixels formally in the MRF formulation. However, modeling the occlusion field and detecting half-occluded pixels are themselves difficult problems. To solve this problem, we propose a new symmetric Potts model by redefining Eq. (14) and Eq. (15) while taking the reference and target images into account together as

V_s(d_p, d_q) = \begin{cases} 0 & \text{if } d_p = d_q \\ \rho_{d_p}(\Delta C_r, \Delta C_t) & \text{otherwise} \end{cases}    (16)

\rho_{d_p}(\Delta C_r, \Delta C_t) = \begin{cases} P_r \times P_t \times s & \text{if } \Delta C_r < T, \Delta C_t < T \\ P_r \times s & \text{if } \Delta C_r < T, \Delta C_t \ge T \\ P_t \times s & \text{if } \Delta C_r \ge T, \Delta C_t < T \\ s & \text{otherwise} \end{cases}    (17)

Here, \Delta C_r is the magnitude of the color gradient between p and q in the reference image, and \Delta C_t is the magnitude of the color gradient between \bar{p}_{d_p} and \bar{q}_{d_p} in the target image when the disparity of p is d_p. \bar{p}_{d_p} and \bar{q}_{d_p} are the pixels in the target image corresponding to p and q in the reference image when they have the disparity d_p. P_r and P_t are penalty terms that increase the penalty when the gradient magnitude is small; in fact, P_r = P_t in the proposed model. We can see that, as with the data cost function, Eq. (17) is symmetric: when the reference image and the target image are switched, the form of this discontinuity cost function does not change. In addition, it is worth noting that V_s(d_p, d_q) depends on the actual d_p and d_q values, while V(d_p, d_q) in the conventional Potts model does not. The linear model can also be redefined in a symmetric form in the same way as the Potts model. The main advantage of the proposed symmetric discontinuity cost function is that we can improve the performance of global stereo methods at depth discontinuities without modeling half-occluded pixels explicitly and without using color segmentation. This is because the effect of half-occluded pixels can be reduced by considering both images together when computing the discontinuity cost.
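A minimal sketch of Eqs. (16)-(17) is given below. The caller is assumed to evaluate \Delta C_t between the target-image pixels \bar{p}_{d_p} and \bar{q}_{d_p} determined by the hypothesis d_p; the default parameters again follow the setting of Sec. 5.

```python
def symmetric_potts_cost(d_p, d_q, grad_ref, grad_tgt,
                         P_r=2.0, P_t=2.0, s=1.1, T=8.0):
    """Symmetric Potts discontinuity cost of Eqs. (16)-(17).

    grad_ref : color gradient magnitude between p and q in the reference image
    grad_tgt : color gradient magnitude between the corresponding pixels in the
               target image under the disparity hypothesis d_p
    """
    if d_p == d_q:
        return 0.0
    penalty = s
    if grad_ref < T:   # small gradient in the reference image
        penalty *= P_r
    if grad_tgt < T:   # small gradient in the target image
        penalty *= P_t
    return penalty
```

Because grad_tgt is looked up at the pixels implied by d_p, this cost varies with the actual disparity hypothesis, unlike the conventional Potts model.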

5. Experiments

The proposed symmetric cost functions are simple and can therefore be easily applied to global stereo methods without much modification.

[Figure 4. Matching results for the 'Tsukuba' image according to the applied symmetric cost function: (a) conventional, (b) symmetric disc. cost, (c) symmetric data cost, (d) symmetric disc. & data cost.]

To verify the performance improvement due to the proposed symmetric cost functions, we applied them to the BP-based method implemented by Tappen in [12] (the implementation code can be found at http://www.middlebury.edu/stereo/). To evaluate the performance of the stereo method, we used well-known testbed images with ground truth, which are often used for the performance comparison of two-frame stereo methods [9]. The stereo method is run with a constant parameter setting across all images (T = 8, P_r = P_t = 2, s = 1.1, \gamma_c = 4.0, \gamma_p = 15.5, window size = 31 × 31). These parameters were selected arbitrarily and are not optimal; however, this matters little because the aim of our experiments is to verify the performance improvement of the stereo method, not to obtain the best possible performance. We first applied the proposed symmetric data cost function and the proposed symmetric discontinuity cost function separately to the BP-based method to check the performance improvement due to each cost function. Fig. 4 shows the matching results for the 'Tsukuba' data set according to the applied symmetric cost function, and Fig. 5 shows the matching results for the testbed images with both of the proposed symmetric cost functions. The performance according to the applied cost functions is summarized in Table 1. The numbers in Table 1 represent the percentage of bad pixels (i.e., pixels whose absolute disparity error is greater than 1) over all pixels, pixels in untextured areas (except for the 'Map' image), and pixels near depth discontinuities.

Only non-occluded pixels are considered in performance evaluation and result display, and we ignore a border of 10 pixels (18 for the 'Tsukuba' image) when computing statistics. We can see that both of the proposed symmetric cost functions clearly improve the performance of the BP-based method for all testbed images. We then compared the performance of the BP-based method using the proposed symmetric cost functions with that of other state-of-the-art BP-based methods, as shown in Table 2, even though the run parameters in our experiments are not optimal. The performance of the BP-based method with the proposed symmetric cost functions is comparable to that of the state-of-the-art methods, even without modeling half-occluded pixels explicitly and without using color segmentation. However, the result for the 'Map' data set is worse than those of the other methods. This is because the 'Map' images are highly textured, while the proposed symmetric cost functions depend on the color (or intensity) and disparity gradients in both images.
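For reference, the constant parameter setting reported above can be bundled as a plain configuration for the cost-function sketches in Secs. 3 and 4. The dictionary and its keys are illustrative only and do not reflect the interface of the implementation in [12].

```python
# Constant parameter setting used for all testbed images (Sec. 5).
STEREO_PARAMS = {
    "T": 8,                   # gradient-magnitude threshold of Eqs. (15) and (17)
    "P_r": 2,                 # penalty for a small reference-image gradient
    "P_t": 2,                 # penalty for a small target-image gradient (P_r = P_t)
    "s": 1.1,                 # base smoothness penalty
    "gamma_c": 4.0,           # color-similarity parameter of Eq. (11)
    "gamma_p": 15.5,          # proximity parameter of Eq. (11)
    "window_size": (31, 31),  # support-window size
}
```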

6. Conclusion

In this paper, we have proposed symmetric cost functions for both the likelihood and the prior in the MRF model for stereo, aiming to improve performance without modeling half-occluded pixels explicitly and without using color segmentation. In defining the cost functions, we took both the reference image and the target image into account. We verified the performance improvement of stereo matching due to the proposed symmetric cost functions by applying them to the BP-based stereo method. Experimental results for standard testbed images show that the performance of the BP-based stereo method is greatly improved by the proposed symmetric cost functions. From the results, we can see that designing cost functions well is a good way to improve the performance of global stereo methods.

Acknowledgments

This research was supported by the Korean Ministry of Science and Technology for the National Research Laboratory Program (grant number M1-0302-00-0064) and by the Korea Industrial Technology Foundation.

Table 1. Performance according to the applied cost function (percentage of bad pixels)

                      |      Tsukuba       |      Sawtooth      |       Venus        |     Map
                      | nonocc untex  disc | nonocc untex  disc | nonocc untex  disc | nonocc  disc
conventional          |  2.76   1.62  9.97 |  7.06   2.51 20.09 |  9.23  14.07 14.51 | 46.41  58.18
symm. disc. only      |  2.52   1.45  9.69 |  5.24   0.49 18.15 |  4.72   5.37 11.63 | 43.21  54.89
symm. data only       |  1.15   0.41  6.49 |  0.75   0.16  4.18 |  0.78   0.87  2.91 |  1.10  13.58
symm. data & disc.    |  1.07   0.35  6.05 |  0.69   0.00  4.17 |  0.64   0.62  3.05 |  1.06  13.20

[Figure 5. Dense disparity maps for the 'Tsukuba', 'Sawtooth', 'Venus', and 'Map' images: (a) left images, (b) ground truth, (c) BP with symm. cost, (d) bad pixels (error > 1).]

Table 2. Performance comparison (percentage of bad pixels)

                         |      Tsukuba       |      Sawtooth      |       Venus        |     Map
                         | nonocc untex  disc | nonocc untex  disc | nonocc untex  disc | nonocc  disc
BP with symm. cost       |  1.07   0.35  6.05 |  0.69   0.00  4.17 |  0.64   0.62  3.05 |  1.06  13.20
BP [11]                  |  1.61   0.66  9.17 |  0.85   0.37  7.92 |  1.17   1.00 12.87 |  0.67   3.42
BP+segm. [11]            |  1.15   0.42  6.31 |  0.98   0.30  4.83 |  1.00   0.76  9.13 |  0.84   5.27
symm. BP (no segm.) [10] |  1.01   0.28  5.79 |  0.57   0.05  3.46 |  0.66   0.71  8.72 |  0.14   1.97
symm. BP+segm. [10]      |  0.97   0.28  5.45 |  0.19   0.00  2.09 |  0.16   0.02  2.77 |  0.16   2.20

References

[1] S. Birchfield and C. Tomasi. A pixel dissimilarity measure that is insensitive to image sampling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(4):401-406, 1998.
[2] A. F. Bobick and S. S. Intille. Large occlusion stereo. International Journal of Computer Vision, 33(3):181-200, 1999.
[3] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(11):1222-1239, 2001.
[4] T. Kanade and M. Okutomi. A stereo matching algorithm with an adaptive window: theory and experiment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(9):920-932, 1994.
[5] S. B. Kang, R. Szeliski, and J. Chai. Handling occlusions in dense multi-view stereo. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 103-110, 2001.
[6] V. Kolmogorov and R. Zabih. Computing visual correspondence with occlusions via graph cuts. In International Conference on Computer Vision, volume 2, pages 508-515, 2001.
[7] D. Marr and T. Poggio. Cooperative computation of stereo disparity. Science, 194(4262):283-287, 1976.
[8] D. Scharstein and R. Szeliski. Stereo matching with nonlinear diffusion. International Journal of Computer Vision, 28(2):155-174, 1998.
[9] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3):7-42, 2002.
[10] J. Sun, Y. Li, S. B. Kang, and H.-Y. Shum. Symmetric stereo matching for occlusion handling. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, pages 399-406, 2005.
[11] J. Sun, N.-N. Zheng, and H.-Y. Shum. Stereo matching using belief propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(7):787-800, 2003.
[12] M. F. Tappen and W. T. Freeman. Comparison of graph cuts with belief propagation for stereo, using identical MRF parameters. In International Conference on Computer Vision, volume 2, pages 900-906, 2003.
[13] O. Veksler. Stereo correspondence with compact windows via minimum ratio cycle. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12):1654-1660, 2002.
[14] O. Veksler. Fast variable window for stereo correspondence using integral images. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 556-561, 2003.
[15] K.-J. Yoon and I. S. Kweon. Adaptive support-weight approach for correspondence search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):650-656, 2006.
[16] C. Zitnick and T. Kanade. A cooperative algorithm for stereo matching and occlusion detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):675-684, 2000.