Computer Vision and Image Understanding 117 (2013) 688–704


Occlusion filling in stereo: Theory and experiments

Shafik Huq, Andreas Koschan *, Mongi Abidi

Min Kao Department of Electrical Engineering and Computer Science, The University of Tennessee at Knoxville, TN 37996, USA

Article info

Article history: Received 17 February 2011; Accepted 17 January 2013; Available online 1 February 2013

Keywords: Stereo matching; Occlusion; Probability; Image noise; Clustering

Abstract

A number of stereo matching algorithms have been developed in the last few years that also successfully detect occlusions in stereo images. These algorithms typically fall short of a systematic study of occlusions; they predominantly emphasize matching and regard occlusion filling as a secondary operation. Filling occlusions, however, is useful in many applications such as image-based rendering, where 3D models are desired to be as complete as possible. In this paper, we study occlusions in a systematic way and propose two algorithms to fill occlusions reliably by applying statistical modeling, visibility constraints, and scene constraints. We introduce a probabilistic, model-based filling order of the occluded points to maintain consistency in filling. Furthermore, we show how an ambiguity in the interpolation of the disparity value of an occluded point can safely be avoided using color homogeneity when the point's neighborhood consists of multiple scene surfaces. We perform a comparative study and show that, statistically, the new algorithms deliver good quality results compared to existing algorithms.

Published by Elsevier Inc.

1. Introduction

Stereo vision has been a subject of research for many years [1]; in stereo vision, disparities between points in a scene are estimated from two or more images of the scene. In many recent works, occlusions are detected in addition to estimating a disparity map [2–9]; in much of the recently reported work, the detected occlusions are filled by assigning a derived or estimated disparity value to the occluded points. These works regard occlusion filling as a secondary problem; therefore, a lack of systematic study of occlusion filling is noted in the stereo vision literature. Many algorithms consider occluded regions as noise during matching and thus avoid detection or filling of occlusions [10–13]; if occlusions are detected, detection is performed implicitly [14,15] or explicitly [1,3,5]. In implicit detection, occlusions are handled while matching is established. In the explicit version of occlusion detection, occlusions are first detected by performing matching both ways, i.e., left to right and right to left, followed by comparing disparity values of corresponding points from both disparity maps [3,5]. Inequality of magnitudes of the two-way disparity values for a matching point indicates that the point is occluded. In a subsequent step the detected occluded points are filled. Although algorithms for temporal occlusion filling exist, where the background model is learned from disocclusions of previous frames [16], in this paper our focus is occlusion filling in static stereo images.

* Corresponding author. E-mail address: [email protected] (A. Koschan).
1077-3142/$ - see front matter Published by Elsevier Inc. http://dx.doi.org/10.1016/j.cviu.2013.01.008

The disparity value of an occluded point is usually in agreement with the slope of the plane that fits the disparities of the point's non-occluded neighbors (this plane is called the disparity plane). Occluded regions, filled directly with neighbors' disparities, appear inconsistently as fronto-parallel shapes unless the disparity plane is fronto-parallel. In the papers of Yang et al. [5], Sun et al. [15], and Wang et al. [17], the disparity estimation of an occluded point did not consider the slope of the disparity plane. In [5] and [15], the disparity was directly assigned the disparity of the horizontally closest left non-occluded neighbor. Although disparity information was used in occlusion filling in [17], its usage was limited to depth segmentation of widely separated objects, which is not always the case in stereo images of a scene. Hosni et al. [9] assign a disparity value to an occluded point based on the minimum among the disparities of its horizontally closest left and right non-occluded points; this method is essentially the same as considering only the disparities of the left non-occluded points as in [5,15], since the left points always have smaller disparities than the right points. Occlusion filling in these ways often introduces horizontal streaks in the disparity maps. In order to smooth out the streaks, Hosni et al. [9] applied a median filter based on weights computed from the minimum geodesic distance in color. The geodesic distance between two pixels is the lowest aggregated cost along a path between the points, where each step cost is the Euclidean distance between the three color channels of two neighboring points. The filter, which is difficult to design appropriately because of its manually chosen parameters, cannot completely correct errors in the filling, but can only mitigate them up to a limit. In the paper of Yang et al. [18], disparities of occluded points were obtained using Graph-cuts [19,20]. Graph-cuts assign a number of disparity values ranging from 0 to a scene maximum for the occluded points and compute an energy cost for each assignment. The disparity value that requires the least energy is assigned to the occluded points. The algorithm continues assigning different disparity values in several consecutive iterations until no further decrement in energy is observed. Graph-cuts are known to produce stair-like effects in disparity maps, which can worsen while filling occlusions, since the cost function of Graph-cuts does not have a valid-data cost term inside occluded regions. Besides, the cut-off function for the smoothness cost is not designed to carry surface slope information further out into the occluded region. Furthermore, algorithms exist that detect and fill occlusions during the stereo matching process by using a data term that is occlusion-aware [20–22], i.e., they give a constant penalty for occluded pixels and return the matching costs otherwise. With such data terms, occlusions are filled automatically via disparity extrapolation using the smoothness term of the energy function. Among these algorithms, Kolmogorov and Zabih [20] do not consider the slope of the disparity plane in occlusion filling, while Woodford et al. [21] and Bleyer et al. [22] do consider the slope. In another implicit algorithm, Klaus et al. [14] segment disparity planes of similar color intensities iteratively using a mean shift algorithm and analysis of matching costs. In their algorithm, the disparity of an occluded point is implicitly extrapolated from the disparity plane. In this iterative scheme of the stereo matching algorithm, the current matching cost is used for segmentation of the disparity plane.
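The geodesic color distance described above can be computed with a shortest-path search over the pixel grid. The following is an illustrative sketch, not the authors' implementation from [9]; it assumes a 4-connected grid and a floating-point color image, and all names are ours:

```python
import heapq
import numpy as np

def geodesic_color_distance(image, src, dst):
    """Minimum, over 4-connected paths from src to dst, of the summed
    Euclidean color differences between neighboring pixels on the path
    (Dijkstra's algorithm on the pixel grid)."""
    rows, cols, _ = image.shape
    dist = np.full((rows, cols), np.inf)
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            return d                      # shortest path to dst finalized
        if d > dist[r, c]:
            continue                      # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = np.linalg.norm(image[r, c] - image[nr, nc])
                if d + step < dist[nr, nc]:
                    dist[nr, nc] = d + step
                    heapq.heappush(heap, (d + step, (nr, nc)))
    return dist[dst]
```

The weighted median filter of [9] would then use such distances to weight the disparity votes of nearby pixels.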
Although the slope of the disparity plane is included in the disparity estimation in [21,22,14], the occlusion filling algorithm is not independent of the matching algorithm, and therefore cannot be applied to algorithms that detect occlusions explicitly in a separate processing step after matching is performed. Min and Sohn [3] detect occlusions explicitly by cross-checking the disparity maps obtained from both-way matching. Occlusions are then filled by diffusing energies of neighboring non-occluded points. The energies are obtained from the matching process, and therefore occlusion filling is still not independent. In addition, the disparity plane of the occluded points found through this process is not guaranteed to agree with the slope of the disparity plane that contains the surrounding non-occluded points. Oh and Kuo [1] fill occlusions by applying intensity cues and interpolation to avoid slope disagreement. However, in their work ambiguity in occlusion filling is not resolved and an occlusion filling order is not maintained; we will see later that ambiguity and order are two aspects of an occlusion filling algorithm that should not be neglected. Table 1 summarizes working principles, pros, and cons of the occlusion filling algorithms mentioned above.


In this paper, we describe two new algorithms for occlusion filling. One is based on absolute color difference and weighted least squares, and the other is based on least squares with segmented points. We compare the results obtained from these two new algorithms with results obtained by applying two of the algorithms mentioned above: one is the neighbor's disparity assignment [4,5,9,15] and the other is an extension of the diffusion in intensity method [3]. All algorithms assume that occlusions are already detected. The new algorithm with the best performance takes surface slope into account when filling the occlusion using linear interpolation. The contributions of this paper include:

(1) A comprehensive study of different occlusion types and their origins in stereo images.

(2) Discussion of a number of ambiguities in occlusion filling and methods for their removal via application of various scene constraints and cues.

(3) Introduction of a specific filling order for the occluded points to achieve higher accuracy in occlusion filling; the order is determined by applying color homogeneity of an occluded point, where homogeneity is defined in a probabilistic framework to avoid manual thresholds/parameters.

We demonstrate that the accuracy of occlusion filling in stereo vision can be improved when applying a filling order in addition to ambiguity removal. These occlusion filling algorithms are independent of the stereo matching method; in order to be usable with the widest possible range of stereo matching methods, they can be applied as a post-processing step with any stereo matching algorithm that also delivers an occlusion map. In Section 2 we introduce a detailed theory of occlusions, where different kinds of occlusions are studied and the stereo scene surface arrangements responsible for the occurrence of each of these kinds are pictorially elucidated.
Section 3 introduces several pre-existing occlusion filling algorithms together with our newly developed ones. Here, we focus exclusively on filling partial occlusions, since partial occlusions are by far the dominant occlusion type. Section 4 presents experimental results, including comparisons between previous methods and our new algorithms; in our experiments, we use ground truth disparity maps and disparity maps generated by 12 different stereo matching algorithms listed on the Middlebury College stereo algorithm evaluation site. Middlebury images have ground truth disparity maps with occlusions labeled, and they are available to the computer vision community through the web [23]. In Section 5, we draw conclusions and suggest some performance improvements.

Table 1
Summary of occlusion filling algorithms.

Papers | Occlusion filling strategies | Working principles, pros, and cons
Oh and Kuo [1] | Intensity cue | Assigns disparity to the occluded point from non-occluded points that have similar color intensities. Applying interpolation, occlusion filling considers the slope of the planes belonging to the occluded point. Ambiguity in occlusion filling is not resolved and occlusion filling order is not addressed.
Min and Sohn [3] | Diffusion in intensity | Assigns disparity to the occluded point from non-occluded points with similar color intensities. Occlusion filling does not consider the slope of the planes belonging to the occluded point.
Klaus et al. [14] | Implicit filling during the matching process | Occlusion filling is done implicitly during the matching process by segmenting planes with similar color intensities. Occlusion filling of occluded planes remains undefined. The occlusion filling algorithm is not independent of matching.
Yang et al. [4,5], Sun et al. [15], Hosni et al. [9], and Wang et al. [17] | Neighbor's disparity assignment | Occlusion filling does not account for the slope of the plane that belongs to the occluded point. With this filling, disparity maps may contain visible artifacts in occluded regions (such as horizontal streaks).
Yang et al. [18] | Graph-cuts | Since occluded regions usually do not have matched points, the algorithm only has a smoothness term. The cut-off function for the smoothness term fails to carry surface slope information further out into the occluded region.


2. Occlusions in stereo vision

Regions visible in one of the two stereo images that are invisible in the other are called occlusions; this definition implies that occlusions can occur both ways, i.e., regions of the left image could be invisible in the right image and vice versa (one-way occlusions are often called half occlusions). In this paper, we distinguish two types of occlusions: (1) Occlusions that occur near the image borders are called border occlusions. (2) Occlusions that appear inside the images when two or more distinct surfaces appear as foregrounds and backgrounds in the scene are called non-border occlusions. Border occlusions in the left image occur due to the right camera missing some of the left portion of the field of view of the left camera. In the case of non-border occlusions, parts of the background near the boundary of two surfaces become invisible (see Fig. 1). Fig. 2 illustrates our definitions of border and non-border occlusions in a Middlebury test image called ‘Teddy’. Notice that those occlusions are half occlusions, i.e., they are shown only in one direction, left-to-right. Although not demonstrated by the ‘Teddy’ test image, there are right-to-left half occlusions as well. Non-border occlusions can be divided into three classes: partial, self, and total occlusions (see Fig. 3). In partial occlusions, only a part of a background surface becomes invisible to one camera. In self-occlusions (also known as limb occlusions [24]), a portion of the visible foreground surface becomes invisible to one camera due to the curvature of the surface; note that in self-occlusion, part of a continuous and curved surface is being occluded by another part of the same surface. If images of the curved surface lack gradients, the surface appears planar to binocular stereo, making detection of the occlusion impossible. In total occlusion, an isolated scene surface visible to one camera becomes entirely invisible to the other camera.

Noticeably, in total occlusion the occluded surface cannot be interpolated from non-occluded neighboring surfaces; rather, only a range or boundary value of its disparities can be obtained. A range of disparity can be obtained in the presence of two disjoint visible surfaces, one located in front of and the other behind the occluded surface. When the surface is totally occluded by only a disjoint foreground surface, we obtain only an upper bound on the disparity of the occluded surface. In this case, the least upper bound is the disparity of the foreground surface. Creation processes of the three non-border occlusion types are presented in Fig. 3. Early stereo vision algorithms, and even most of the recent ones, have not given much attention to occlusion filling. The main focus of most stereo vision algorithms is exclusively stereo matching, although occlusion-filled 3D models are required in many rendering applications. Due to the importance of occlusions in stereo vision, the Middlebury College stereo testbed, a well-known testbed for testing stereo matching algorithms, has emphasized an evaluation of matching that includes occluded pixels of a disparity map [23].

3. Theory of occlusion filling

In this paper, our focus is on filling border occlusions and partial occlusions. Partial occlusions are much more dominant in a scene than the other two kinds of non-border occlusions, self- and total occlusions. Detecting self-occlusion was addressed in the paper of Romeiro and Zickler [25] in limited cases by applying a known 3D model of the scene (human faces that suffer from self-occlusions due to rounded cheeks). In detecting and filling border occlusions and partial occlusions, we deal with any scene of unknown shape by assuming that the occluded regions, together with their neighborhoods, form planar surfaces that could be fronto-parallel or slanted with respect to the viewing stereo rig. Our assumption is based on the observation that occluded regions usually include the surface of an object along with nearby non-occluded regions of the object, and the occluded region of the object usually maintains the same slope as the neighboring, non-occluded, surface. To incorporate the slope of the plane in the estimation of the disparity of an occluded point, it is necessary to extrapolate the disparity of the occluded point from its non-occluded neighbors. In the following, we describe an algorithm referred to here as Neighbor's Disparity Assignment (NDA), along with an extension of Min and Sohn's algorithm [3], as well as two new algorithms for occlusion filling. In the NDA algorithm, a disparity is directly assigned to an occluded point by using the disparity of one of its non-occluded neighboring points. The second algorithm that we describe is called Diffusion in Intensity Space (DIS), which is an extension of the algorithm proposed by Min and Sohn in [3]. DIS assumes that the color of an occluded region is similar to the color of its non-occluded neighborhood. We further introduce two new algorithms: Weighted Least Squares (WLS) and Segmentation-based Least Squares (SLS); both least squares solutions are linear.
Depending on the location of the occluded point and the relative locations of its neighbors, we need to perform extrapolation or interpolation to account for the slope of the neighboring surface patch. Our assumption of a linear model (i.e., that the occluded and neighboring non-occluded surfaces are planar) follows from observation of the disparity map. In a small neighborhood, a disparity map can often be mistaken as non-linear for two reasons: (1) disparities of some of the neighbors obtained by the stereo matching algorithm could be inaccurate, and (2) the disparity map of a neighborhood may have stair effects due to the disparity values being discrete. A linear model enforces planarity in a neighborhood and thus attempts to correct inaccurately introduced non-linearity. In our new algorithms, a least squares estimation technique is used to estimate the parameters of the linear model. Theoretically, both interpolation and extrapolation can be applied with least squares to estimate an unknown value if it is known whether the underlying model is linear or non-linear; in our case, the model is known and is linear. Thus, the least squares estimation approach eliminates the requirement of checking whether interpolation or extrapolation needs to be applied depending on the relative locations of the non-occluded neighbors. We will use the term ‘interpolation’

Fig. 1. Origin and classification of occlusions.


Fig. 2. (a) Stereo image Teddy, (b) border occlusion (in blue) and non-border occlusion (in red) of Teddy, (c) gray-coded ground truth disparity map, and (d) color-coded ground truth disparity map. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Classification of non-border occlusions: left column – views of a 3D scene as seen by the left and right cameras; right column – cross-section of the top view of the scene and camera image planes set up.

hereafter to refer to both of them. All of the algorithms presented in this paper consider filling occlusions only in a left-to-right disparity map.

3.1. Neighbor's Disparity Assignment (NDA)

Fig. 4 shows a simple flow diagram of the occlusion filling algorithm that we call Neighbor's Disparity Assignment (NDA). In this

algorithm, the border occlusions are filled with disparities of the closest non-occluded points located horizontally to their right and the non-border occlusions are filled with disparities of the closest non-occluded points located horizontally to their left. NDA assumes that a surface patch surrounding the occluded point is fronto-parallel. We conducted experiments with NDA on Middlebury ground truth disparity/occlusion maps and found disparity assignment error rates in the occluded regions as listed in Table 2.
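As a sketch, NDA reduces to two scans per image row. The code below is illustrative, not the authors' implementation; the mask representation and function names are our assumptions:

```python
import numpy as np

def fill_nda(disparity, border_occlusion, nonborder_occlusion):
    """Neighbor's Disparity Assignment (sketch).

    Non-border occlusions take the disparity of the closest non-occluded
    point to their left; border occlusions take the disparity of the
    closest non-occluded point to their right. Masks are boolean arrays
    of the same shape as the disparity map.
    """
    filled = disparity.astype(float)
    rows, cols = filled.shape
    for r in range(rows):
        # left-to-right pass: non-border occlusions copy from the left
        for c in range(cols):
            if nonborder_occlusion[r, c] and c > 0:
                filled[r, c] = filled[r, c - 1]
        # right-to-left pass: border occlusions copy from the right
        for c in range(cols - 2, -1, -1):
            if border_occlusion[r, c]:
                filled[r, c] = filled[r, c + 1]
    return filled
```

Because each occluded pixel copies from an already-processed neighbor, a whole occluded run inherits the disparity of the closest non-occluded point, which is exactly the fronto-parallel assumption criticized above.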


Fig. 4. Flow diagram of the neighbor’s disparity assignment (NDA) approach.

Table 2
Performance (percentages of matching errors) of NDA.

Tsukuba | Venus | Cones | Teddy | Sawtooth | Map
8.0% | 5.5% | 32.1% | 24.6% | 10.2% | 4.7%

In the case of Tsukuba [23], 8.0% of occluded pixels were filled inaccurately. Filling of an occluded point is counted as inaccurate when the estimated disparity of the point differs by more than 1 from the true disparity value. Fig. 5 shows NDA occlusion filling results on the image called ‘‘Teddy’’. Occluded regions surrounded by red and blue circles in this figure are incorrectly filled.

3.2. Weighted Least Squares (WLS)

In the weighted least squares (WLS) approach (see Fig. 6 for a flow diagram), all non-occluded neighbors, as well as occluded but already filled neighbors, in a neighborhood are considered valid neighbors, and all of them serve as control points in the interpolation. Similar to NDA, the border occlusions are filled in the right-to-left direction and the non-border occlusions are filled from left to right. Some of the control points could belong to the foreground surface and thus introduce non-linearity. Yet, we apply a linear model in the interpolation due to the following cue in imaging: usually, points on the foreground of a scene have color intensities much different from those of the occluded background. Applying this cue, we set the weights so that they suppress the influence of the foreground points in the interpolation. In a weighted least squares approach, each residual error term in the aggregated residual is weighted according to this weighting scheme. Say an occluded point p_L has neighbors q_L, whose disparities l(q_L) are given, and let \hat{l}(p_L) be the disparity of p_L to be estimated by interpolation. The aggregated residual is defined as

D = \sum_{q_L \in N(p_L)} w_{q_L} \big( \hat{l}(p_L) - l(q_L) \big)^2,    (1)
where w_{q_L} = \exp(-\lambda_L |I(p_L) - I(q_L)|) is chosen as the likelihood of p_L with q_L under the assumption of an exponential distribution model of |I(p_L) - I(q_L)|. Here I(p_L) is the mean intensity of p_L and \lambda_L is the decay rate. Estimation of these two parameters is described in Section 3.4.2; for now, we assume they are known. The choice of likelihood-based weights supports the cue mentioned above, which also says that neighboring pixels with similar color intensities tend to stay on the same surface patch. A weight is large when the absolute intensity difference is small. Thus, similarity in color enforces linearity in the WLS method. D is partially differentiated with respect to the parameters of the unknown linear model in order to obtain a system of linear equations, which we solve with least squares. The model is described briefly as follows. Say that

F = \begin{bmatrix} 1 & x_1 & y_1 \\ \vdots & \vdots & \vdots \\ 1 & x_N & y_N \end{bmatrix}

is the matrix of the coordinates of all the control points (non-occluded, or occluded but filled, neighbors) and L = [l_1 \; l_2 \; \cdots \; l_N]^T is the vector of their corresponding labels. Then the linear model is

l(p_L) = a + b\,x(p_L) + c\,y(p_L),    (2)

where (x(p_L), y(p_L)) is the coordinate of p_L, and a, b, and c are the model parameters. Also, say that the weight vector corresponding to the control points matrix is w = [w_{q_{L1}} \; w_{q_{L2}} \; \cdots \; w_{q_{LN}}]^T. We compute F_W = diag(w)F and L_W = diag(w)L. If P = [a \; b \; c]^T is the parameter vector of the linear mapping,

P = \big( F_W^T F_W \big)^{-1} F_W^T L_W.    (3)

Once P is estimated, the disparity of the occluded point p_L is estimated as

\hat{l}(p_L) = [\,1 \;\; x(p_L) \;\; y(p_L)\,] \, P.    (4)
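The WLS fill of Eqs. (1)–(4) amounts to a weighted plane fit. The sketch below is illustrative rather than the authors' implementation; the `decay` parameter stands in for the decay rate whose estimation is deferred to Section 3.4.2, and the neighbor representation is our assumption:

```python
import numpy as np

def wls_fill(p_xy, p_intensity, neighbors, decay=0.1):
    """Estimate the disparity of an occluded point by a weighted
    least squares plane fit over its control points.

    neighbors: list of (x, y, disparity, intensity) tuples for the
    non-occluded (or already filled) control points around p.
    """
    xs, ys, ls, intens = (np.array(v, dtype=float) for v in zip(*neighbors))
    # likelihood-style weights: similar colors -> large weight
    w = np.exp(-decay * np.abs(p_intensity - intens))
    # design matrix for the linear model l = a + b*x + c*y (Eq. (2))
    F = np.column_stack([np.ones_like(xs), xs, ys])
    Fw = w[:, None] * F          # F_W = diag(w) F
    Lw = w * ls                  # L_W = diag(w) L
    # normal-equation solution of Eq. (3), computed stably via lstsq
    P = np.linalg.lstsq(Fw, Lw, rcond=None)[0]
    x, y = p_xy
    return np.array([1.0, x, y]) @ P          # Eq. (4)
```

When the control points lie on a plane and share the occluded point's color, the fit reproduces the plane exactly, so slanted surfaces are extended into the occlusion instead of being flattened.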

Fig. 5. Gray- and color-coded disparity maps for Teddy after filling occlusions in ground truth disparity maps with NDA.

Fig. 6. Flow diagram of the WLS approach for occlusion filling.



3.3. Diffusion in Intensity Space (DIS)

Fig. 7. Pseudo code for Diffusion in Intensity Space (DIS).

The development of this method is inspired by [3], where Min and Sohn solved stereo matching iteratively with non-linear diffusion. The authors estimated a weighted diffusion energy whose lowest value over the disparity labels indicated a good match. After detecting occlusions by cross-checking, they approximated the diffusion energy of the occluded region to determine the disparities of the occluded points. One drawback of this algorithm is that the occlusion filling is not independent of the stereo matching algorithm: the energies in the last iteration of the matching are taken as the initial diffusion energies of the occlusion filling iterations. Also, the diffusion energy estimation does not include an interpolation mechanism, i.e., the disparity plane of occluded regions is not guaranteed to be in agreement with the slope of the disparity plane of surrounding non-occluded

Fig. 8. A basic flow diagram of the new SLS occlusion filling algorithm.

Fig. 9. Occlusion when two background surfaces are present; (a) ground truth disparity map of the image Tsukuba, (b) a zoomed-in portion of the disparity map, (c) possibility of occluded point to be in one of the background surfaces.


Fig. 10. Occlusion with one background; (a) ground truth disparity of Tsukuba, (b) a zoomed-in portion of the disparity map, (c) neighborhood in occlusion created in the background by a foreground surface.

regions. In order to make this algorithm independent of the matching algorithms, we formulate a small change: we initially assign the diffusion energy E(p_L) as zero when p_L is non-occluded, while in the original algorithm the initial energies are taken from the matching step. We call the updated algorithm DIS. Diffusion energies of the border and non-border occlusion points are updated according to Eqs. (5) and (6), respectively:

E(p_L) = \min_{l_{p_L} \in \{0,\dots,l_{\max}\}} \left( \frac{1}{2\,|\{ q_L \in N(p_L) \wedge l_{q_L} = l_{p_L} \}|} \sum_{q_L \in N(p_L) \wedge l_{q_L} = l_{p_L}} \big( |I(p_L) - I(q_L)| + E(q_L) \big) \right),    (5)

E(p_L) = \min_{l_{p_L} \in \{0,\dots,l_{p_{Lf}}-2\}} \left( \frac{1}{2\,|\{ q_L \in N(p_L) \wedge l_{q_L} = l_{p_L} \}|} \sum_{q_L \in N(p_L) \wedge l_{q_L} = l_{p_L}} \big( |I(p_L) - I(q_L)| + E(q_L) \big) \right).    (6)

The updated equations basically integrate the energies of non-occluded points that have the same disparity label. The occluded point p_L is assigned the disparity that corresponds to the minimum E(p_L) estimated over all possible disparities. For non-border occlusions, the minimum E(p_L) is taken over the range from 0 to l_{p_{Lf}} - 2 (Eq. (6)). Here, l_{p_{Lf}} is the disparity of the horizontally closest non-occluded point located to the right of p_L; this non-occluded point belongs to the foreground, which has a disparity larger than the disparity of p_L. The border occlusions are filled in the right-to-left direction and the non-border occlusions are filled in the opposite direction. A pseudo code of the DIS algorithm is presented in Fig. 7.
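For a single occluded point, the DIS update of Eqs. (5) and (6) can be sketched as below. This is an illustrative re-statement, not the paper's pseudo code; the neighbor representation and all names are our assumptions, and the factor-of-two normalization follows the equations above:

```python
def dis_energy_and_label(p_intensity, neighbors, label_range):
    """Diffusion in Intensity Space update for one occluded point.

    neighbors: list of (disparity_label, intensity, energy) tuples for
    neighbors that already carry a label; per our modification, E is
    zero for non-occluded points initially, which makes the filling
    independent of the matcher. label_range: candidate disparity
    labels (0..l_max for border occlusions, bounded by the foreground
    disparity for non-border ones). Returns (best_energy, best_label).
    """
    best = (float("inf"), None)
    for label in label_range:
        same = [(i, e) for (l, i, e) in neighbors if l == label]
        if not same:
            continue  # no supporting neighbors for this candidate label
        cost = sum(abs(p_intensity - i) + e for i, e in same) / (2 * len(same))
        if cost < best[0]:
            best = (cost, label)
    return best
```

The point is assigned the label with the smallest diffused energy; its energy is then available when its own neighbors are filled.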

3.4. Segmentation-based Least Squares (SLS)

One major difference between SLS and WLS is that in SLS the control points are a subset of the non-occluded neighbors. The control points are segmented from the neighborhood by applying visibility constraints, disparity gradient constraints, and color similarity cues. Interpolation accounts for the slope of the non-occluded surfaces in the neighborhood. Our approach can be summarized by the following sequence of operations: selecting an occluded point, selecting control points from the neighborhood of the occluded point, and interpolating the disparity of the occluded point from the segmented control points (see Fig. 8 for a basic flow diagram). Selection of the control points is done from a group of potential neighbors that includes all non-occluded points in a neighborhood. A difficulty arises when the potential neighbors come from multiple background surfaces, since the occluded point belongs to only one of them, which needs to be identified (see Fig. 9). When there are narrow objects in the scene, the assumption that the disparities of occluded points are smaller than the disparities of the non-occluded points is violated (see Figs. 10–12). A filling order is determined to fill an occluded point selected from a list of all the occluded points attached to at least one non-occluded point. When an occluded point's neighborhood includes more than one surface, the occluded point may not belong to the surface formed by its non-occluded neighbors; rather, the occluded point may be located on a surface formed only by the occluded points that are yet to be filled. Therefore, occluded points that are attached to at least one non-occluded point are considered


Fig. 11. Occlusions created by narrow objects; (a) ground truth disparity map of the image Tsukuba, (b) a zoomed-in portion of the disparity map, (c) occlusion created in the background by a narrow object.

as potential points for filling in the first iteration. Among these potential points, the occluded point with the lowest homogeneity estimate (i.e., the point exhibits high homogeneity, or color similarity, with its neighbors) is filled first. Then, we segment out control points for interpolation. In the following, segmentation is described first; the filling order is described in the subsequent section.

3.4.1. Segmenting out control points

Say that p_L is the occluded point we want to fill and N(p_L) is defined as a set of non-occluded neighbors of p_L. Our goal is to find a set of control points to use in the interpolation. Say that N(p_L) is currently empty. From the definition of occlusion in stereo images we obtain the following condition for non-border occlusions: the disparity of an occluded point is less than the disparity of its horizontally closest foreground point to the right. We call this foreground point and its disparity p_{Lf} and l_{p_{Lf}}, respectively. The non-occluded neighboring points that have disparities less than l_{p_{Lf}} are added to N(p_L). If the occlusion is caused by a foreground point that is part of a narrow object, the non-occluded points on both sides (left and right) of p_L hold similar disparities. Therefore, we define a second condition based on a disparity gradient constraint. Say the horizontally closest non-occluded point on the left side is p_{Lb} and its disparity is l_{p_{Lb}}. We apply the disparity gradient constraint |l_{p_{Lb}} - l_{q_L}| \le 1 for a non-occluded neighbor q_L with disparity l_{q_L}, and include the neighboring points that satisfy the constraint in N(p_L). The combined condition is described as:

|l_{p_{Lb}} - l_{q_L}| \le 1 \;\lor\; l_{q_L} < l_{p_{Lf}}.    (7)

While N(p_L) is formed, the points q_L in N(p_L) may come from two or more background surfaces. If we observe the ground truth disparity maps of the Middlebury test images, we find that in a small neighborhood, more than three surfaces (belonging to both background and foreground) are not present. Such neighborhoods may exist, but in negligibly small numbers. Therefore, we can safely assume that N(p_L) contains points from not more than two background surfaces. These two surfaces have points with disparities in two distinct ranges, and one of these two surfaces is closer to the camera than the other. Therefore, the minimum l_min of all the disparities of the pixels in N(p_L) belongs to one surface and the maximum l_max to the other. If l_max - l_min \le 1 holds, then there are points from only one surface in N(p_L); otherwise, we determine which of the two surfaces p_L belongs to in the following way. First, the points are segmented into two groups. One of the two groups contains the points that satisfy the condition |l_max - l_{q_L}| \le 1 and the other contains the points that satisfy the condition |l_min - l_{q_L}| \le 1; here, q_L is a member of N(p_L). We find the average truncated color distance (defined in Section 3.4.2) of p_L to each group; p_L is assigned to the group with the smaller average color distance.
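The control-point selection of Eq. (7) and the two-surface disambiguation just described can be sketched as follows. This is illustrative code with scalar intensities standing in for the truncated color distance of Section 3.4.2; all names are ours, not the paper's:

```python
def select_control_points(neighbors, l_fg, l_bg):
    """Apply Eq. (7) to the non-occluded neighbors.

    neighbors: list of (disparity, color) tuples; l_fg is the disparity
    of the closest right-side foreground point (l_pLf), l_bg that of the
    closest left-side non-occluded point (l_pLb).
    """
    # keep a neighbor if it satisfies the disparity gradient constraint
    # OR the visibility constraint (disparity below the foreground's)
    return [(l, c) for (l, c) in neighbors
            if abs(l_bg - l) <= 1 or l < l_fg]

def pick_surface(candidates, p_color):
    """If the candidates span two disparity ranges, keep the group
    whose average color distance to the occluded point is smaller."""
    ls = [l for l, _ in candidates]
    lmin, lmax = min(ls), max(ls)
    if lmax - lmin <= 1:
        return candidates  # a single background surface
    near = [(l, c) for l, c in candidates if abs(lmax - l) <= 1]
    far = [(l, c) for l, c in candidates if abs(lmin - l) <= 1]
    def avg_dist(group):
        return sum(abs(p_color - c) for _, c in group) / len(group)
    return near if avg_dist(near) < avg_dist(far) else far
```

The surviving group then serves as the control points for the least squares plane fit, exactly as in WLS but without foreground contamination.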

3.4.2. Determining the filling order

The occluded point with the highest priority has the lowest homogeneity estimate. We define homogeneity in the following way: pL is homogeneous with its neighbor qL if the intensity of qL is within 3σ of the mean intensity of pL; otherwise, the neighbor is non-homogeneous. In consequence, homogeneity estimation


S. Huq et al. / Computer Vision and Image Understanding 117 (2013) 688–704

Fig. 12. Border occlusion filling needs a filling order; (a) ground truth disparity map of the image Tsukuba, (b) a zoomed-in portion of the disparity map, (c) ground truth disparity map.

Fig. 13. Detailed flow diagram of the new SLS occlusion filling algorithm.


Table 3
Percentages of errors in occlusion filling applying NDA, DIS, WLS, and SLS. Each cell shows the error percentage with the algorithm's rank for that image in parentheses.

Algorithm                                  | Tsukuba  | Venus    | Cones     | Teddy     | Sawtooth | Map      | Score
Neighbor's Disparity Assignment (NDA) [4]  | 8.0 (3)  | 5.5 (4)  | 32.1 (1)  | 24.6 (2)  | 10.2 (4) | 4.7 (4)  | 3.0
Diffusion in Intensity Space (DIS) [3]     | 3.98 (1) | 4.66 (2) | 39.29 (4) | 27.2 (4)  | 2.72 (2) | 1.75 (2) | 2.5
Weighted Least Squares (WLS)               | 8.50 (4) | 5.09 (3) | 34.95 (3) | 24.85 (3) | 7.87 (3) | 4.67 (3) | 3.17
Segmentation-based Least Squares (SLS)     | 5.49 (2) | 4.58 (1) | 34.89 (2) | 16.26 (1) | 1.74 (1) | 0.87 (1) | 1.33

Fig. 14. Gray- and color-coded disparity maps after filling occlusions in ground truth disparity maps with (a) WLS, (b) DIS, and (c) SLS. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

leads to the estimation of the variance of the mean intensity of pL, where the mean intensity is conceptually the intensity value that pL would have if pL were not corrupted by noise. The estimation is done as follows. Say that qL is the immediate right neighbor of an image point pL, where pL is any image point (in one of the stereo images) except those along the right image border. qL can be chosen from any other location (the immediate top of pL, for instance), but the relative location of qL must be the same for every pL. Then, for the variable ΔI = |I(pL) − I(qL)| we define a mixture probability model with the components

P(ΔI) = f·exp(−μΔI),  if ΔI is due to noise,
P(ΔI) = 1/N,          otherwise, i.e., ΔI is due to texture variation,

where the absolute difference ΔI is due to noise when pL and qL are in a region without texture variation, μ is a decay rate with μ⁻¹ being the variance of ΔI, and f is the normalizing factor with f = (1 − exp(−μ)) / (1 − exp(−μN)) and N = max(ΔI) + 1. Say that the probability of ΔI having the exponential distribution is α. Then, in the mixture model,

P(ΔI) = α·f·exp(−μΔI) + (1 − α)·(1/N).    (8)

This model was first used by Zhang and Seitz [26] and later by Huq et al. [27] in MRF stereo matching. For estimating μ we refer to these papers, where an expectation maximization (EM) algorithm was adopted. The variable |Ī(pL) − I(qL)|, introduced in Section 3.2, has a decay rate μL that is related to the decay rate μ of the variable |I(pL) − I(qL)| by μL² = μ, since both variables are exponentially distributed. Accordingly, homogeneity H(pL, qL) is defined by the following mixture model,

H(pL, qL) = P(|Ī(pL) − I(qL)|),  if |Ī(pL) − I(qL)| ≤ 3·μL⁻¹,
H(pL, qL) = P(3·μL⁻¹),           otherwise,    (9)



Fig. 15. Disparity maps, occlusions, and occlusion filling results on the Middlebury College test images: Map, Venus, Tsukuba, Sawtooth, Cones, and Teddy (from top to bottom). Occlusions are filled using the SLS linear interpolation model.

where Ī(pL) is the mean intensity of pL, obtained by applying a mean shift algorithm in a window surrounding pL. For an estimation of Ī(pL), first, Ī(pL) is initialized with the intensity of pL. Then, the mean shift algorithm repeatedly picks those neighbors inside the window that satisfy |Ī(pL) − I(qL)| ≤ 3·μL⁻¹ and assigns the average of the intensities of the selected neighbors to Ī(pL), until Ī(pL) converges to a fixed average. For more details about the mean shift algorithm see [28]. Homogeneity of a site pL with its neighborhood N(pL), H(pL, N(pL)), is defined as

H(pL, N(pL)) ≝ ∏_{qL ∈ N(pL)} H(pL, qL).    (10)
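The iterative mean-intensity estimate can be sketched as below; a minimal version assuming a grayscale image as a nested list. The window radius, iteration cap, and the name `mean_intensity` are assumptions for illustration.

```python
def mean_intensity(img, r, c, mu_L, win=3, iters=10):
    """Estimate the noise-free mean intensity of pixel (r, c) by repeated
    thresholded averaging: average the window neighbors whose difference
    from the current estimate is within 3/mu_L, until convergence."""
    h, w = len(img), len(img[0])
    m = float(img[r][c])                   # initialize with the pixel itself
    thr = 3.0 / mu_L
    for _ in range(iters):
        sel = [float(img[rr][cc])
               for rr in range(max(0, r - win), min(h, r + win + 1))
               for cc in range(max(0, c - win), min(w, c + win + 1))
               if abs(float(img[rr][cc]) - m) <= thr]
        new_m = sum(sel) / len(sel)
        if abs(new_m - m) < 1e-6:          # converged to a fixed average
            break
        m = new_m
    return m
```

On a flat patch with one outlier, the threshold excludes the outlier and the estimate stays at the patch intensity.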

Since we are concerned with only a point's relative homogeneity, the negative log-homogeneity,

−log H(pL, N(pL)) = A + Σ_{qL ∈ N(pL)} w(pL, qL),    (11)

is sufficient (and also computationally less expensive). In Eq. (11), A is a constant due to the normalization factor of P(·), and w(pL, qL) is a cut-off function,

w(pL, qL) = |Ī(pL) − I(qL)|,  if |Ī(pL) − I(qL)| ≤ 3·μL⁻¹,
w(pL, qL) = 3·μL⁻¹,           otherwise.    (12)

Since A does not affect a search for occluded points with the lowest homogeneity estimate, it can be ignored; −log H(pL, N(pL)) can then be replaced with Σ_{qL ∈ N(pL)} w(pL, qL). Accordingly, the highest homogeneity H(pL, N(pL)) corresponds to the smallest estimate of Σ_{qL ∈ N(pL)} w(pL, qL).

It was mentioned in Section 3.4.1 that if N(pL) has points from two surfaces, N(pL) is divided into two neighborhoods (i.e., surfaces), N1(pL) and N2(pL); note that N(pL) = N1(pL) ∪ N2(pL). One of these two neighborhoods serves as the set of control points in the interpolation of the disparity of pL. We define the average truncated color distance as

D(pL, Ni(pL)) = (1 / |Ni(pL)|) Σ_{qL ∈ Ni(pL)} w(pL, qL).    (13)

If D(pL, N1(pL)) < D(pL, N2(pL)), we pick N1(pL) as the set of control points; otherwise, we pick N2(pL). Once the control points are selected, the parameters of the planar surface comprised of the disparities of the control points are estimated in the same way as described before for the WLS-based algorithm, with the exception that all weights are set to 1.0.
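In code, Eqs. (11)-(13) reduce to sums of a truncated distance. The sketch below assumes μL and the mean intensity Ī(pL) are already estimated, and that the two candidate groups are given as lists of neighbor intensities; all names are hypothetical.

```python
def w_cutoff(mean_ip, i_q, mu_L):
    """Cut-off color distance of Eq. (12)."""
    thr = 3.0 / mu_L
    return min(abs(mean_ip - i_q), thr)

def priority_score(mean_ip, neigh_intensities, mu_L):
    """Negative log-homogeneity up to the constant A (Eq. (11));
    the occluded point with the SMALLEST score is filled first."""
    return sum(w_cutoff(mean_ip, i, mu_L) for i in neigh_intensities)

def pick_control_group(mean_ip, group1, group2, mu_L):
    """Average truncated color distance of Eq. (13): pick the background
    surface whose intensities are closer to the occluded point."""
    d1 = priority_score(mean_ip, group1, mu_L) / len(group1)
    d2 = priority_score(mean_ip, group2, mu_L) / len(group2)
    return group1 if d1 < d2 else group2
```

For example, a point with mean intensity 100 is assigned to a group of intensities near 100 rather than to a group near 150, since the latter's distances saturate at the 3/μL cut-off.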

3.4.3. Summary of the SLS algorithm

We summarize the SLS occlusion filling algorithm with a flow diagram in Fig. 13. First, we find the occluded point with the highest priority, i.e., the pL with the lowest Σ_{qL ∈ N(pL)} w(pL, qL). Then we apply the constraint |λ_pLb − λ_qL| ≤ 1 ∨ λ_qL < λ_pLf to find an initial set of control points. These control points form only one surface if λmax − λmin ≤ 1; in that case we call the background surface S0 and take all its neighbors as control points. Otherwise, we assume that there are two background surfaces. The minimum and maximum disparities of the neighbors are taken as two seed points, and a disparity gradient constraint is applied on these two seed points to extract the two background surfaces, S1 and S2. We measure the respective


Fig. 16. Color-coded disparity maps for the Middlebury test images: Tsukuba, Venus, Sawtooth, Map, Teddy, and Cones (from top to bottom); left column: test images, middle column: ground truth, and right column: our results obtained with SLS presented with color-coding scale. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

color distances of these two surfaces from the occluded point. Then, it is more likely that pL is located on the surface with the smaller color distance. The neighbors that belong to this surface are the control points that are used in the interpolation.
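The final interpolation step, a least-squares plane through the control points (with unit weights in SLS), can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def fit_disparity_plane(points):
    """Least-squares plane d = a*x + b*y + c through control points
    given as (x, y, d) triples. Returns the coefficients (a, b, c)."""
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    coeff, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
    return coeff

def interpolate_disparity(coeff, x, y):
    """Evaluate the fitted plane at the occluded point (x, y)."""
    a, b, c = coeff
    return a * x + b * y + c
```

For control points lying exactly on the plane d = 0.5x + 2, the fit is exact and the occluded point receives the extrapolated disparity of that plane.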

4. Experiments

We have described four methods in this paper (NDA, DIS, WLS, and SLS) and applied them to the Middlebury stereo images to study their relative performance. Middlebury provides ground truth disparity maps for all the test images with their occlusions filled with correct disparities (disparities in both occluded and non-occluded regions are determined using a structured light technique [29]). The test images we have used are Tsukuba, Venus, Cones, Teddy, Sawtooth, and Map. These images are also used to study the performance of stereo matching algorithms. To conduct the study, we first detect occlusions explicitly by cross-checking the left-to-right and right-to-left disparity maps. We used the stereo matching algorithm of Huq et al. [27] to obtain the disparity maps; this matching algorithm is designed to deliver left-to-right and right-to-left disparity maps, which allows explicit occlusion detection from cross-checking. Table 3 shows the performance of the four occlusion-filling algorithms applied on the test images. Each cell in the table has two numbers. The upper number is the percentage of occluded points in an image that are


Table 4
Percentages of errors of NDA, DIS, WLS, and SLS in occlusion filling for disparity maps from top stereo matching algorithms listed in the Middlebury new evaluation table. Each cell shows the error percentage with the rank in parentheses.

ADCensus (Mei et al., 2011) [30]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 13.20 (4) | 5.6 (4)  | 44.88 (1) | 45.93 (2) | 2.76
DIS | 9.3 (1)   | 4.87 (1) | 50.99 (4) | 49.56 (4) | 2.5
WLS | 12.31 (3) | 5.52 (3) | 45.00 (2) | 48.11 (3) | 2.76
SLS | 10.45 (2) | 4.94 (2) | 47.73 (3) | 41.99 (1) | 2.0

CoopRegion (Wang and Zheng, 2008) [31]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 7.71 (3)  | 5.24 (3) | 41.47 (2) | 35.49 (3) | 2.76
DIS | 7.48 (2)  | 4.69 (1) | 47.87 (4) | 29.33 (1) | 2.0
WLS | 9.92 (4)  | 6.10 (4) | 41.45 (1) | 37.46 (4) | 3.25
SLS | 6.90 (1)  | 5.12 (2) | 41.75 (3) | 31.15 (2) | 2.0

AdaptingBP (Klaus et al., 2006) [14]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 9.87 (3)  | 5.56 (4) | 41.40 (2) | 36.60 (3) | 3.0
DIS | 6.15 (1)  | 4.65 (1) | 45.88 (4) | 34.95 (2) | 2.0
WLS | 11.02 (4) | 5.27 (3) | 42.34 (3) | 40.22 (4) | 3.5
SLS | 9.74 (2)  | 4.98 (2) | 39.32 (1) | 31.61 (1) | 2.0

DoubleBP (Yang et al., 2008) [32]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 11.42 (3) | 6.42 (4) | 46.51 (3) | 44.79 (4) | 3.5
DIS | 3.94 (1)  | 4.73 (2) | 47.70 (4) | 42.35 (2) | 2.25
WLS | 11.47 (4) | 5.16 (3) | 44.82 (2) | 43.53 (3) | 3.0
SLS | 6.51 (2)  | 4.55 (1) | 44.23 (1) | 41.12 (1) | 1.25

RDP (Sun et al., 2011) [33]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 14.01 (4) | 7.26 (4) | 43.18 (3) | 42.26 (2) | 3.25
DIS | 8.10 (2)  | 6.06 (2) | 50.91 (4) | 43.96 (3) | 2.75
WLS | 10.98 (3) | 5.85 (1) | 42.77 (2) | 47.12 (4) | 3.0
SLS | 8.06 (1)  | 7.51 (3) | 39.21 (1) | 41.69 (1) | 1.33

OutlierConf (Xu and Jia, 2008) [34]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 14.52 (4) | 5.81 (4) | 45.33 (2) | 41.53 (2) | 3.0
DIS | 10.09 (1) | 4.69 (1) | 48.30 (4) | 37.19 (4) | 2.5
WLS | 12.57 (3) | 5.16 (2) | 46.80 (3) | 39.51 (3) | 2.75
SLS | 11.33 (2) | 5.77 (3) | 44.98 (1) | 35.35 (1) | 1.75

SubPixDoubleBP (Yang et al., 2007) [35]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 11.42 (3) | 6.42 (4) | 47.53 (2) | 43.62 (3) | 3.0
DIS | 3.94 (1)  | 4.73 (1) | 48.58 (4) | 41.77 (2) | 2.0
WLS | 11.73 (4) | 5.56 (3) | 45.64 (1) | 43.66 (4) | 3.0
SLS | 6.46 (2)  | 4.91 (2) | 48.36 (3) | 36.02 (1) | 2.0

SurfaceStereo (Bleyer et al., 2010) [36]
    | Tsukuba   | Venus    | Cones     | Teddy     | Score
NDA | 12.26 (4) | 5.99 (4) | 43.69 (1) | 36.22 (3) | 3.0
DIS | 9.69 (2)  | 4.44 (1) | 49.13 (4) | 32.18 (2) | 2.25
WLS | 11.69 (3) | 6.17 (3) | 43.77 (2) | 36.71 (4) | 3.0
SLS | 8.1 (1)   | 4.91 (2) | 43.81 (3) | 30.87 (1) | 1.75

Table 5
Percentages of errors of NDA, DIS, WLS, and SLS in occlusion filling for disparity maps from stereo matching algorithms listed in the Middlebury old evaluation table. Each cell shows the error percentage with the rank in parentheses.

Patch-based (Deng et al., 2005) [37]
    | Tsukuba   | Venus     | Sawtooth  | Map       | Score
NDA | 15.01 (4) | 6.53 (4)  | 15.01 (4) | 17.14 (4) | 4.0
DIS | 5.84 (1)  | 4.44 (1)  | 6.39 (2)  | 1.00 (2)  | 1.5
WLS | 12.93 (3) | 5.81 (3)  | 13.90 (3) | 3.66 (3)  | 3.0
SLS | 8.10 (2)  | 5.16 (2)  | 4.91 (1)  | 0.74 (1)  | 1.5

Graph + segm. (Bleyer and Gelautz, 2005) [38]
    | Tsukuba   | Venus     | Sawtooth  | Map       | Score
NDA | 10.67 (3) | 5.52 (4)  | 15.30 (4) | 1.65 (4)  | 2.75
DIS | 9.92 (2)  | 4.65 (1)  | 2.95 (1)  | 0.93 (3)  | 1.75
WLS | 11.91 (4) | 5.16 (3)  | 12.49 (3) | 0.77 (1)  | 3.17
SLS | 8.72 (1)  | 5.01 (2)  | 6.65 (2)  | 0.77 (1)  | 1.5

GC + mean shift (Jang et al., 2006) [39]
    | Tsukuba   | Venus     | Sawtooth  | Map       | Score
NDA | 9.65 (3)  | 26.36 (3) | 34.09 (4) | 2.43 (3)  | 3.25
DIS | 5.00 (1)  | 41.42 (4) | 24.58 (3) | 0.77 (2)  | 2.5
WLS | 10.40 (4) | 9.67 (1)  | 10.99 (2) | 2.49 (4)  | 2.75
SLS | 6.37 (2)  | 13.25 (2) | 6.15 (1)  | 0.68 (1)  | 1.5

Segm.+glob.vis. (Bleyer and Gelautz, 2004) [40]
    | Tsukuba   | Venus     | Sawtooth  | Map       | Score
NDA | 10.45 (4) | 10.97 (3) | 9.29 (4)  | 1.97 (4)  | 3.0
DIS | 6.90 (1)  | 5.45 (1)  | 1.57 (1)  | 1.26 (3)  | 1.5
WLS | 9.78 (3)  | 11.19 (4) | 7.32 (3)  | 0.84 (2)  | 3.17
SLS | 8.50 (2)  | 10.14 (2) | 1.64 (2)  | 0.93 (1)  | 1.75

incorrectly filled. The lower number indicates the rank of performance among the competing algorithms. For example, for Tsukuba, DIS performs the best and hence has performance rank 1; WLS performs worst for the same image, with a rank of 4. All algorithms are ranked in this way for each of the six images, and the ranks are averaged to obtain a score. The table shows that SLS performs the best with the lowest score, 1.33. NDA, DIS, and WLS each had performance scores of 2.5 or higher, with WLS having a score over 3. For the ground truth disparity and occlusion

map of the test image Teddy, we showed the occlusion filling results for NDA in Fig. 5. For visual comparison, Fig. 14 shows the results for the remaining algorithms; occlusion filling has improved in the encircled regions shown in Fig. 5, with SLS showing the best filling performance. Fig. 15 presents visualizations of occlusion filling results for all the images obtained by applying SLS. From left to right, the figure shows ground truth disparity maps, disparity maps of left-to-right and right-to-left matches, detected occlusions, and


Fig. 17. Occlusion filling for the first dataset of disparity maps for Tsukuba, Venus, Cones, and Teddy. From top, row 1 shows ground truth disparity map; rows 2 and 3 show disparity maps of ADCensus [30] with occlusion and after occlusion filling; rows 4 and 5 show disparity maps of CoopRegion [31] with occlusion and after occlusion filling; rows 6 and 7 show disparity maps of AdaptingBP [14] with occlusion and after occlusion filling.

disparity maps after occlusion filling. For a better visualization of the local disparity contrast after filling occlusions, Fig. 16 shows color-coded versions of the disparity maps. For the Middlebury test images, the run time of the SLS algorithm ranges from 3 to 10 min on an AMD dual-core Athlon machine; the Middlebury test images range from 384 × 288 to 450 × 375 pixels in resolution. By contrast, WLS takes less than 10 s. The experiments on ground truth data suggest that SLS is the best and DIS is the second best performer in this comparison.

The next set of experiments is designed to evaluate how well the algorithms perform on disparity maps generated by a variety of stereo matching algorithms. We use the disparity maps generated by the top stereo matching algorithms listed on the Middlebury stereo algorithm evaluation website. We collected these disparity maps from the site for two sets of data; we will call them the Middlebury new table data and the Middlebury old table data. The first dataset includes disparity maps for Tsukuba, Venus, Cones, and Teddy, and the second one includes disparity maps for Tsukuba, Venus, Sawtooth, and Map. For the first dataset, we have disparity maps from eight different algorithms. We use the same naming for the algorithms as in the Middlebury table [23]; the names of the algorithms are followed by the authors' names and year of publication in parentheses. The algorithms are ADCensus (Mei et al., 2011) [30], CoopRegion (Wang and Zheng, 2008) [31], AdaptingBP (Klaus et al., 2006) [14], DoubleBP (Yang et al., 2008) [32], RDP (Sun et al., 2011) [33], OutlierConf (Xu and Jia, 2008) [34], SubPixDoubleBP (Yang et al., 2007) [35], and SurfaceStereo (Bleyer et al., 2010) [36]. In the same way, the algorithms for the second dataset are called Patch-based (Deng et al., 2005) [37], Graph + segm. (Bleyer and Gelautz, 2005) [38], GC + mean shift (Jang et al., 2006) [39], and Segm.+glob.vis. (Bleyer and Gelautz, 2004) [40]. For the second dataset, we were able to collect consistent disparity


Fig. 18. Occlusion filling for the second dataset of disparity maps for Tsukuba, Venus, Sawtooth, and Map. From top, row 1 shows ground truth disparity map, rows 2 and 3 show disparity maps of Patch-based [37] with occlusion and after occlusion filling; rows 4 and 5 show disparity maps of Graph + segm [38] with occlusion and after occlusion filling; rows 6 and 7 show disparity maps of GC + mean shift [39] with occlusion and after occlusion filling.

maps for four algorithms. In our experiments with real (left-to-right) disparity maps, we marked the occluded points using the ground truth occlusion map; in the absence of right-to-left disparity maps, this is one way we could mark the occluded points. This procedure of marking, however, does not jeopardize our experiments, since the filling still relies on real disparity data. We have conducted experiments with disparity maps from 12 different algorithms: 8 from the Middlebury new table and 4 from the Middlebury old table. Numerical occlusion filling results from these experiments are presented in Tables 4 and 5, respectively. Among the first 8 experiments, SLS performs the best in five cases; in the remaining three cases, both SLS and DIS perform equally well and better than NDA and WLS. Fig. 17 shows visual occlusion filling results obtained from three of these experiments. Among the second 4 experiments, SLS performs the best in two cases and DIS performs the best in one case; both SLS and DIS perform equally well and better than NDA and WLS in the remaining case. Fig. 18 shows visual occlusion filling results obtained from three of these experiments. In summary, in the experiments with disparity maps created by all 12 algorithms, SLS performed the best in seven cases, performed equally with DIS in four cases, and worse than DIS in one case.

5. Conclusion

In this paper, occlusions were studied by classifying them based on their origin. Existing stereo matching algorithms were implemented to obtain disparity maps both ways (left-to-right and right-to-left matching) and detect occlusions


from cross-checking. We developed two new occlusion filling algorithms; one is built on absolute-color-difference-based weights (WLS) and the other is based on the segmentation of occluded points (SLS). We compared the performance of the two new occlusion filling algorithms with two other methods; one uses direct assignment of a neighbor's disparity (NDA) and the second is an extension of the diffusion-in-intensity method (DIS). The algorithms were evaluated using disparity maps of the Middlebury stereo test images. One evaluation was done using ground truth disparity maps and occlusions; the other used disparity maps obtained by 12 different stereo matching algorithms. In addition, a stereo matching algorithm [27] was implemented to obtain disparity maps both ways, detect occlusions from cross-checking, and fill the occlusions using one of the new algorithms. In contrast to many of the existing occlusion filling algorithms, the new algorithms introduced here are independent of the matching algorithm; hence, they can be used to fill occlusions once the occlusions are detected by any algorithm. Among the studied algorithms, Segmentation-based Least Squares (SLS) performed the best. SLS takes into account the slope of the surface that an occluded point is located on. The disparity value of the occluded point is interpolated from a set of control points selected from neighbors of the occluded point. The set of control points is extracted by applying the visibility constraint, the disparity gradient constraint, and the equality constraint. The equality constraint allows the SLS algorithm to fill occlusions created by narrow objects, although occlusions in this case do not satisfy the visibility constraint. To remove any ambiguity in selecting control points due to the presence of more than one background object, a probabilistic model of color homogeneity with statistical estimation of the constraint parameters was applied.
To propagate the disparity map consistently while filling the occlusions, a filling order was introduced; the filling order is determined from estimates of color homogeneity. We have shown that, statistically, SLS is the best performing algorithm in this comparison. Notice that we have introduced a statistical estimation of homogeneity in which we estimate the decay parameter μL; μL is needed in Eq. (12) for computing the homogeneity score. However, one can manually assign a value to this parameter and still obtain an approximate homogeneity score. Manual assignment may help in a fast implementation of the algorithm where achieving high accuracy is not critical. SLS is computationally more expensive than NDA and WLS, but faster than DIS. The run time for SLS is on the order of minutes. In general, the run time depends on the number of occluded points present in the scene; this number is determined by the contents of the scene and the baseline length of the stereo system. Most of the run time is spent on estimating the mean intensity and homogeneity of occluded points. Both estimations, however, can be done in parallel using a GPGPU (general-purpose graphics processing unit). Such GPGPU-based parallel estimation keeps the computation time of the parameter estimates within a given time limit, independent of the number of occluded points, and would cut the computation time of SLS from approximately 3 min to a time frame (seconds) that is more desirable for many applications.

Acknowledgments This work was supported in part by the University Research Program in Robotics under Grant DOE-DE-FG52-2004NA25589 and the US Air Force under Grant FA8650-10-1-5902.


References

[1] M.Z. Brown, D. Burschka, G.D. Hager, Advances in computational stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (8) (2003) 993–1008.
[2] J. Oh, C.-C. Kuo, Robust stereo matching with improved graph and surface models and occlusion handling, Journal of Visual Communication and Image Representation 21 (5–6) (2010) 404–415.
[3] D. Min, K. Sohn, Cost aggregation and occlusion handling with WLS in stereo matching, IEEE Transactions on Image Processing 17 (8) (2008) 1431–1442.
[4] Q. Yang, L. Wang, R. Yang, H. Stewénius, D. Nistér, Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (3) (2009) 492–504.
[5] Q. Yang, L. Wang, R. Yang, H. Stewénius, D. Nistér, Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling, in: CVPR (2), 2006, pp. 2347–2354.
[6] H. Ishikawa, D. Geiger, Occlusions, discontinuities, and epipolar lines in stereo, in: ECCV, 1998, pp. 232–248.
[7] P.N. Belhumeur, D. Mumford, A Bayesian treatment of the stereo correspondence problem using half-occluded regions, in: CVPR, 1992, pp. 506–512.
[8] Q. Luo, J. Zhou, S. Yu, D. Xiao, Stereo matching and occlusion detection with integrity and illusion sensitivity, Pattern Recognition Letters 24 (9–10) (2003) 1143–1149.
[9] A. Hosni, M. Bleyer, M. Gelautz, C. Rhemann, Local stereo matching using geodesic support weights, in: ICIP, 2009.
[10] J. Kim, K.M. Lee, B. Choi, S. Lee, A dense stereo matching using two-pass dynamic programming with generalized ground control points, in: CVPR (2), 2005, pp. 1075–1082.
[11] S. Huq, B. Abidi, M. Abidi, Stereo-based 3D face modeling using annealing in local energy minimization, in: 14th International Conference on Image Analysis and Processing, 2007, pp. 10–13.
[12] D. Scharstein, R. Szeliski, Stereo matching with nonlinear diffusion, International Journal of Computer Vision 28 (2) (1998) 155–174.
[13] D. Scharstein, R. Szeliski, A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, International Journal of Computer Vision 47 (1) (2002) 7–42.
[14] A. Klaus, M. Sormann, K. Karner, Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure, in: Proceedings of ICIP, 2006, pp. 15–18.
[15] J. Sun, Y. Li, S.B. Kang, H.-Y. Shum, Symmetric stereo matching for occlusion handling, in: CVPR (2), 2005, pp. 399–406.
[16] A. Criminisi, J. Shotton, A. Blake, C. Rother, P.H.S. Torr, Efficient dense stereo with occlusions for new view-synthesis by four-state dynamic programming, International Journal of Computer Vision 71 (1) (2007) 89–110.
[17] L. Wang, H. Jin, R. Yang, M. Gong, Stereoscopic inpainting: joint color and depth completion from stereo images, in: CVPR, 2008, pp. 1–8.
[18] Q. Yang, Y. Deng, X. Tang, X. Lin, Occlusion handling in stereo imaging, 2007, patented by Microsoft Corporation.
[19] V. Kolmogorov, R. Zabih, Computing visual correspondence with occlusions using graph cuts, in: ICCV (11), 2001, pp. 508–515.
[20] V. Kolmogorov, R. Zabih, Multi-camera scene reconstruction via graph cuts, in: ECCV (3), 2002, pp. 82–96.
[21] O. Woodford, P. Torr, I. Reid, A. Fitzgibbon, Global stereo reconstruction under second order smoothness priors, in: CVPR, 2008, pp. 1–8.
[22] M. Bleyer, C. Rother, P. Kohli, D. Scharstein, S. Sinha, Object stereo – joint stereo matching and object segmentation, in: CVPR, 2011, pp. 3081–3088.
[23] http://vision.middlebury.edu/stereo/eval/ (accessed in December 2011).
[24] R. Chung, R. Nevatia, Use of monocular groupings and occlusion analysis in a hierarchical stereo system, Computer Vision and Image Understanding 62 (3) (1995) 245–268.
[25] F. Romeiro, T. Zickler, Model-based stereo with occlusions, in: IEEE International Workshop on Automatic Face and Gesture Analysis (AMFG), 2007, pp. 31–45.
[26] L. Zhang, S.M. Seitz, Estimating optimal parameters for MRF stereo from a single image pair, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2) (2007) 331–342.
[27] S. Huq, A. Koschan, B. Abidi, M. Abidi, MRF stereo with statistical estimation of parameters, in: 4th International Symposium on 3D Data Processing, Visualization, and Transmission, 2008.
[28] D. Comaniciu, P. Meer, Mean shift: a robust approach towards feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (5) (2002) 603–619.
[29] D. Scharstein, R. Szeliski, High-accuracy stereo depth maps using structured light, in: CVPR (1), 2003, pp. 195–202.
[30] X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, X. Zhang, On building an accurate stereo matching system on graphics hardware, in: IEEE Workshop on GPUs for Computer Vision, 2011.
[31] Z. Wang, Z. Zheng, A region based stereo matching algorithm using cooperative optimization, in: CVPR, 2008, pp. 1–8.
[32] Q. Yang, L. Wang, R. Yang, H. Stewénius, D. Nistér, Stereo matching with color-weighted correlation, hierarchical belief propagation and occlusion handling, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (3) (2008) 492–504.
[33] X. Sun, X. Mei, S. Jiao, M. Zhou, H. Wang, Stereo matching with reliable disparity propagation, in: International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2011, pp. 132–139.
[34] L. Xu, J. Jia, Stereo matching: an outlier confidence approach, in: ECCV (IV), 2008, pp. 775–787.
[35] Q. Yang, R. Yang, J. Davis, D. Nistér, Spatial-depth super resolution for range images, in: CVPR, 2007, pp. 1–8.
[36] M. Bleyer, C. Rother, P. Kohli, Surface stereo with soft segmentation, in: CVPR, 2010, pp. 1570–1577.
[37] Y. Deng, Q. Yang, X. Lin, X. Tang, A symmetric patch-based correspondence model for occlusion handling, in: ICCV, 2005, pp. 1316–1322.
[38] M. Bleyer, M. Gelautz, Graph-based surface reconstruction from stereo pairs using image segmentation, in: SPIE (5665), 2005, pp. 288–299.
[39] J. Jang, K. Lee, S. Lee, Stereo matching using iterated graph cuts and mean shift filtering, in: 7th Asian Conference on Computer Vision (ACCV), 2006, pp. 31–40.
[40] M. Bleyer, M. Gelautz, A layered stereo algorithm using image segmentation and global visibility constraints, in: ICIP (5), 2004, pp. 2997–3000.