Blur and Contrast Invariant Fast Stereo Matching

Matteo Pedone and Janne Heikkilä
Department of Electrical and Information Engineering, University of Oulu, Finland
[email protected], [email protected]

Abstract. We propose a novel approach for estimating a depth map from a pair of rectified stereo images degraded by blur and contrast change. At each location in image space, information is encoded with a new class of descriptors that are invariant to convolution with a centrally symmetric PSF and to variations in contrast. The descriptors are based on local phase quantization; they can be computed very efficiently and encoded in a small number of bits. A simple measure for comparing two encoded templates is also introduced. Results show that the proposed method is a cheap yet effective way of estimating disparity maps from degraded images without making restrictive assumptions; these advantages make it attractive for practical applications.

1 Introduction

Stereo matching is a widely researched topic, and it still cannot be considered a solved problem. It has been addressed in a multitude of different ways, and there is currently a large gap in terms of accuracy between the state-of-the-art methods, which are usually computationally expensive, and the faster ones that are more suitable for practical applications. Good overviews, including analyses and comparisons among different methods, are given in [1,2]. Presently, increasing interest is being devoted to proposing algorithms that work robustly under non-ideal conditions, due for example to the presence of highlights or transparent objects, exposure or contrast differences, and other common optical degradations [6,7]. Our work focuses on performing stereo matching with a pair of images degraded by different amounts of blur and contrast change. Despite being an interesting and non-trivial task, little work has been done in this area at the time of writing, and most of the current approaches rely on extra information gained by estimating depth from (de)focus and integrating it with conventional stereo correspondence methods. These methods are either computationally expensive [8] or work under very restrictive assumptions [9], and they do not consider the influence of other radiometric changes besides out-of-focus blur. Concerning the scheme used for estimating the disparity map, we opted for the same strategy used by conventional area-based algorithms, since local methods are notably faster than global ones. In this sense, during the cost-aggregation step, the problem is essentially equivalent to that of matching templates in degraded images. The usual way of dealing with this issue is to resort to blur- and contrast-invariant descriptors.


Flusser et al. [3] proposed several descriptors obtained from specific combinations of higher-order central moments; they are invariant to a wide range of typical geometric and radiometric degradations. Other methods, like [5], are derived directly from properties of the Fourier transform and have been used successfully to perform blur-invariant phase correlation. Van de Weijer et al. propose color angles that are robust to blur, contrast changes and illuminant color [4], but they appear to be effective mainly for building reliable histograms for image retrieval. Considering that such invariant descriptors are rather sensitive to noise and less efficient to compute, we preferred to develop a new class of blur and contrast invariant phase-based descriptors. As we will show, these local descriptors can be computed very quickly, which we consider an important requirement.

2 Blur Robust Stereo Matching

In this section we present blur and contrast invariant descriptors based on quantized local phase. We also introduce a similarity measure between two encoded templates, discuss its limitations, and describe the approach used for estimating the disparity map.

2.1 Phase-Based Local Descriptors

Under the assumption that image noise is negligible and the blur point-spread function (PSF) is centrally symmetric, it is fairly easy to show [3,5] that, for an arbitrary phase value Φ_A(u, v) in the spectral domain, the term 2kΦ_A(u, v) (for any integer k ≥ 1) is convolution and contrast invariant: any variation in contrast affects only the magnitude spectrum, while the Fourier transform of a centrally symmetric PSF is real, so it contributes a phase of only 0 or π, which vanishes modulo 2π after multiplication by 2k. If we consider a discretized N × N image block A, a descriptor for A is naturally given by

$$G_k(A) = \{\, 2k\Phi_A(u,v) \mid 0 \le u, v \le N-1 \,\}. \tag{1}$$
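As a quick numerical illustration of this property (a sketch added here, not part of the original paper), the snippet below blurs a random block with a centrally symmetric box PSF, applies a contrast scaling, and compares the raw and doubled phase spectra; the block size, PSF and contrast factor are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
A = rng.random((N, N))                       # arbitrary N x N image block

# Centrally symmetric PSF: a 5 x 5 box blur embedded in an N x N array,
# rolled so that it is centered at the origin (circularly symmetric).
psf = np.zeros((N, N))
psf[:5, :5] = 1.0 / 25.0
psf = np.roll(psf, (-2, -2), axis=(0, 1))

# Degraded block: circular convolution with the PSF plus a contrast scaling.
B = 0.4 * np.real(np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(psf)))

phi_A = np.angle(np.fft.fft2(A))
phi_B = np.angle(np.fft.fft2(B))

# The raw phase is NOT preserved (the transfer function of the box is real
# but changes sign), while 2*phi is preserved modulo 2*pi, as stated above.
raw = np.abs(np.angle(np.exp(1j * (phi_A - phi_B)))).max()
dbl = np.abs(np.angle(np.exp(1j * 2.0 * (phi_A - phi_B)))).max()
print(raw, dbl)                              # raw is about pi, dbl is about 0
```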

However, further considerations can be made. Firstly, the stereo pair is assumed to be rectified, so corresponding pixels in the left and right images appear horizontally displaced by an amount that is inversely proportional to the z-depth of the corresponding scene point. This implies that when two templates of width W contain pixels of the same depth, by the shift theorem their phase spectra are (in this case, approximately) related according to the following equation:

$$\Phi_L(u,v) - \Phi_R(u,v) \approx \frac{2\pi u}{W}\, t \tag{2}$$

where t is the translation displacement in pixels. This observation is used by many phase-based stereo methods, which try to estimate the gradient of the phase difference between two portions of the left and right images. Moreover, it is apparent that it is not necessary to sample the whole set G_k(A): the left-hand side of (2) is always 0 when u = 0, and it remains unchanged as v varies.


Fig. 1. Components of the spectrum encoded in the descriptor for r=4, s=2 and for r=s=1

In addition, since the image function A is always real-valued, its phase spectrum is antisymmetric. Furthermore, motivated by the work of Curtis et al. [11], who demonstrated the high informativeness of the sign of the phase, we consider b-bit discretized phase values and propose the following local descriptor:

$$D^{k,b}_{r,s}(A) = \left\{\, \left\lfloor \frac{2^{b-1}}{\pi} \arg\!\left(F_A(u,v)^k\right) \right\rfloor \;\middle|\; u \in [0,r],\; v \in [-s,s],\; u + \operatorname{sgn}(v-1) \ge 0 \,\right\} \tag{3}$$

where b, r, s ≥ 0, k ∈ {1, 2}, F_A(u, v) returns the spectral component of the image A at (u, v) (Fig. 1), and the arg function returns values in the range [−π, π). Note that since |D^{k,b}_{r,s}(A)| = r(2s + 1) + s, a descriptor can be encoded using b·|D^{k,b}_{r,s}(A)| bits, and that for k = 1 the values of the descriptor are not necessarily blur invariant; however, as we will see, this particular case may turn out to be convenient in some circumstances. It is also worth mentioning that all the local descriptors of the whole image can be computed efficiently with |D^{k,b}_{r,s}| convolutions with L × L 2D filters, where L is the size in pixels of the neighborhood to be described.
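For concreteness, here is a small sketch of how (3) could be evaluated for a single L × L block using a direct FFT (the paper instead computes all descriptors with a bank of 2D filters); the floor-based quantization and the frequency indexing follow our reading of (3) and are assumptions rather than the authors' exact implementation:

```python
import numpy as np

def descriptor(block, r=2, s=1, k=1, b=2):
    """Sketch of D^{k,b}_{r,s} in (3) for one L x L block: returns the
    r(2s+1)+s quantized phase values, each representable with b bits."""
    F = np.fft.fft2(block)
    vals = []
    for u in range(0, r + 1):
        for v in range(-s, s + 1):
            # keep only the half-plane selected in (3); the complementary
            # coefficients are redundant for a real-valued block
            if u + np.sign(v - 1) < 0:
                continue
            phase = np.angle(F[u, v] ** k)               # value in (-pi, pi]
            vals.append(int(np.floor(2 ** (b - 1) / np.pi * phase)))
    return vals

L = 9
block = np.random.default_rng(1).random((L, L))
print(descriptor(block))   # 7 values for r=2, s=1; with b=2 each fits in 2 bits
```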

2.2 Similarity Measure

Once a rectangular portion of an image can be described locally with the proposed method, a fast and efficient way is still needed to compare the encoded templates and detect the right matches. Let us denote by d_A(u, v) the element of D^{k,b}_{r,s}(A) evaluated at (u, v), and let us define the function

$$f(x) = \begin{cases} x & \text{if } |x| \le M \\ 2M - x & \text{if } |x| > M \end{cases} \tag{4}$$

Considering that the values of the descriptors are essentially phase angles, using (2) with M = 2^{b−1} we have

$$\frac{2\pi k u}{L}\,|t| + 2\pi\Delta \approx f\big(\,|d_{\mathrm{Left}}(u,v) - d_{\mathrm{Right}}(u,v)|\,\big) \tag{5}$$

where the term 2πΔ accounts for the phase wrap. Ignoring the wrapping problem, we introduce the following similarity measure between two templates A and B:


$$m(A,B) = \sum_{\substack{j:\ d_A(j)\,\in\, D^{k,b}_{r,s}(A)\\ \ \ \ \, d_B(j)\,\in\, D^{k,b}_{r,s}(B)}} f\big(\,|d_A(j) - d_B(j)|\,\big) \tag{6}$$

It is worth noticing that for 1 ≤ b ≤ 2, Equation (6) reduces to

$$m(A,B) = H\big(D^{k,b}_{r,s}(A),\, D^{k,b}_{r,s}(B)\big) \tag{7}$$

where H is the Hamming distance between two strings of bits. However, when any phase unwrapping is avoided, several issues arise. The value in (5) wraps for the first time when the original phase exceeds π, specifically at

$$u = \frac{L}{2tk} \tag{8}$$

This suggests that, in order to increase the reliability of (5), k should be as small as possible and L large. This creates a problematic three-way trade-off among reliability of the similarity measure, invariance to blur, and accuracy of the disparity values, because of the well-known foreground-fattening effect that appears when L is increased [2]. However, in order to maximize the discriminative power we set k = 1, observing that the phase values of an image convolved with a centrally symmetric PSF are the same as those of the original image, as long as the magnitude of the frequency spectrum is greater than zero. This is always true for a Gaussian kernel; nevertheless, we model the PSF with a pillbox kernel of radius R, whose discrete-time Fourier transform is a two-dimensional periodic sinc function. It is possible to prove by basic calculus that the first zero of this function (which corresponds to the radius of its main lobe) is located at

$$u = \frac{L}{2R} \tag{9}$$

A totally analogous argument applies to the PSF of linear motion blur. In our context, Equation (9) tells us that the smaller the blur radius is (and, possibly, the larger L is), the more values in D^{1,b}_{r,s} can be considered blur invariant. Another immediate consequence is that L ≥ 2R must be satisfied in order to have a minimum number of useful values in D^{1,b}_{r,s}.
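A compact sketch of (4)–(7), assuming the descriptor values are stored as the b-bit signed integers produced by the quantization above (again our assumption, not the authors' exact encoding):

```python
import numpy as np

def similarity(dA, dB, b):
    """Sketch of m(A, B) in (6): wrapped absolute differences f(|dA - dB|)
    of the quantized phases, summed over the descriptor elements."""
    M = 2 ** (b - 1)
    x = np.abs(np.asarray(dA) - np.asarray(dB))
    return int(np.sum(np.where(x <= M, x, 2 * M - x)))   # f(x) as in (4)

# With b = 1 (only the sign of the phase is kept) the wrapped difference is
# always 0 or 1, so (6) counts differing bits: the Hamming distance of (7).
dA = [0, -1, 0, 0, -1, 0, -1]
dB = [0, 0, 0, -1, -1, 0, 0]
print(similarity(dA, dB, b=1))       # -> 3, the number of differing positions
```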

2.3 Disparity Map Estimation

Equation (8) tells us that, even in ideal circumstances, the proposed similarity measure can correctly detect only displacements t such that t ≤ L/2. Moreover, the behavior of m(A, B) becomes undefined when totally different templates are compared, and this is likely to generate false matches in the same scanline. Based on these two observations, we reinforce the template matching process by adopting the common strategy of shiftable windows [2]; in particular, at each spatial location we use five L/2-sized rectangular windows (in the encoded images) in which the pixel of interest anchors the window respectively to the center, top, bottom, left and right.


The similarities of corresponding windows between the left and right images are computed, and the minimum cost among the five comparisons is chosen. Finally, we use a simple one-pass dynamic-programming scanline optimization to assign disparity values respecting the ordering constraint.
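The aggregation step can be sketched as follows, assuming a precomputed cost volume cost[d, y, x] holding the per-pixel descriptor dissimilarity (6) for each candidate disparity d; the window size, the use of scipy's box filter, and the final winner-takes-all step (in place of the scanline optimization described above) are simplifications for illustration:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def aggregate_shiftable(cost, w):
    """Five shifted box windows of side w: the pixel of interest is anchored
    to the center, top, bottom, left and right of the window, and the cheapest
    of the five aggregated costs is kept (border wrap-around is ignored)."""
    box = uniform_filter(cost, size=(1, w, w), mode='nearest')
    h = w // 2
    shifts = [(0, 0), (h, 0), (-h, 0), (0, h), (0, -h)]
    shifted = [np.roll(box, (dy, dx), axis=(1, 2)) for dy, dx in shifts]
    return np.min(shifted, axis=0)

def disparity_wta(cost, w=5):
    """Winner-takes-all disparity from an aggregated cost volume cost[d, y, x]
    (the paper instead applies a one-pass scanline optimization here)."""
    return np.argmin(aggregate_shiftable(cost, w), axis=0)

# toy usage: 8 disparity hypotheses on a 40 x 60 pair of encoded images
cost = np.random.default_rng(2).random((8, 40, 60))
print(disparity_wta(cost).shape)      # -> (40, 60)
```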

3 Results

We tested the proposed algorithm against a conventional stereo method.

Fig. 2. Performance comparison between the proposed and SAD measures for the 'Cones' stereo pair (left column) and 'Tsukuba' (right column): error (% bad pixels) versus motion-blur length (pixels), Gaussian-blur σ, and motion-blur length / pillbox diameter (pixels).

Fig. 3. Sensitivity of the proposed method to the number of bits b used for phase quantization, for the 'Cones' stereo pair (left) and 'Tsukuba' (right): error (% bad pixels) versus bits used (b), for Gaussian blur with σ = 0.25, 1 and 4.

Fig. 4. Some of the degraded images used in the three experiments for the 'Tsukuba' and 'Cones' stereo pairs

For a fair comparison, we used the same scheme for disparity-map estimation but replaced (7) with a sum of absolute differences (SAD) over all three color channels. A normalized version of SAD was used to reduce the sensitivity to contrast changes where necessary. Two stereo pairs were considered (Figure 4), and for each of them one image was blurred with a different PSF at every run. Three different experiments were performed. In the first one, robustness against motion blur was tested. In the second one, Gaussian blur was considered, while in the last experiment different areas of the image were alternately degraded by contrast change and by motion or out-of-focus blur (with a pillbox PSF). Some of the degraded images that were used are shown in Figure 4. During the aggregation phase, we used support windows of the same size as the PSF. The descriptors used for the three tests were D^{2,2}_{2,1} for the first, and D^{1,2}_{2,1} for the remaining two; this way, each neighborhood could be encoded in only 16 bits. The error measure used is the one proposed in [1]: essentially the percentage of bad matching pixels in the final disparity map, where a disparity (measured in pixels) is considered bad if it differs from the ground truth by more than 1. Results are illustrated in Figure 2, and some of the computed depth maps are shown in Figure 5 for visual comparison.


Fig. 5. Resulting disparities from the third experiment with blur factor set to 11 obtained using SAD (left column) and the proposed method (middle column). Ground truth disparities (right column).

In all the cases considered, the proposed method performed better than SAD, although significant improvements in quality occurred mostly for larger amounts of blur, where our method in many cases produced error percentages between 5 and 9 percent better than SAD. We also tested the sensitivity to the number of bits used for phase quantization. In particular, three different amounts of Gaussian blur were applied to the two stereo pairs considered, and the accuracy of the final disparities was computed while letting the parameter b vary (Figure 3). It is interesting to notice that the percentage of bad pixels in the final depth maps is fairly constant, which justifies the use of the smallest values of b.
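For reference, the error measure amounts to the following one-liner (a sketch; a real evaluation would also mask occluded or unknown pixels, as in the protocol of [1]):

```python
import numpy as np

def percent_bad_pixels(disp, gt, threshold=1.0):
    """Percentage of bad matching pixels: a disparity is counted as bad when
    it differs from the ground truth by more than `threshold` pixels."""
    return 100.0 * np.mean(np.abs(disp - gt) > threshold)

# toy usage with synthetic maps
rng = np.random.default_rng(3)
gt = rng.integers(0, 16, size=(40, 60)).astype(float)
est = gt + rng.normal(0.0, 1.0, size=gt.shape)
print(round(percent_bad_pixels(est, gt), 1))
```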

4 Conclusion and Future Work

We described a novel method for computing disparity maps from stereo pairs degraded by centrally symmetric blur and contrast change. The algorithm runs fast even with a naive implementation. Each neighborhood of the image can be described efficiently with a limited number of bits, and the convolutions necessary to compute the local descriptors, as well as the whole aggregation phase, can easily be performed on a GPU, opening a concrete possibility for a real-time implementation. The method proved to be relatively robust to the considered types of degradation in comparison to conventional fast approaches; however, it should eventually be integrated into a multiscale approach in order to handle the cases in which the amount of blur is unknown. The similarity measure presently used has the limitations discussed above; we believe that further improvements in this direction could yield significant results.


References

1. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision 47(1–3), 7–42 (2002)
2. Hirschmüller, H., Scharstein, D.: Evaluation of cost functions for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, Minnesota, USA (2007)
3. Flusser, J., Suk, T.: Degraded image analysis: an invariant approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(6), 590–603 (1998)
4. van de Weijer, J., Schmid, C.: Blur robust and color constant image description. In: Proceedings of ICIP, Atlanta, USA (2006)
5. Ojansivu, V., Heikkilä, J.: Image registration using blur-invariant phase correlation. IEEE Signal Processing Letters 14(7), 449–452 (2007)
6. Ogale, A.S., Aloimonos, Y.: Robust contrast invariant stereo correspondence. In: Proc. IEEE Conf. on Robotics and Automation (ICRA) (2005)
7. Tsin, Y., Kang, S.B., Szeliski, R.: Stereo matching with linear superposition of layers. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(2), 290–301 (2006)
8. Rajagopalan, A.N., Mudenagudi, U.: Depth estimation and image restoration using defocused stereo pairs. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(11), 1521–1525 (2004)
9. Frese, C., Gheta, I.: Robust depth estimation by fusion of stereo and focus series acquired with a camera array. In: IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, pp. 234–248 (2006)
10. Wang, L., Gong, M., Yang, R.: How far can we go with local optimization in real-time stereo matching. In: Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission, pp. 129–136 (2006)
11. Curtis, S.R., Lim, J.S., Oppenheim, A.V.: Signal reconstruction from Fourier transform sign information. Technical Report 500, Research Laboratory of Electronics, Massachusetts Institute of Technology (1984)