RECONSTRUCTION OF DENSE POINT CLOUD FROM UNCALIBRATED WIDE-BASELINE IMAGES

Yanli Wan, Zhenjiang Miao, Zhen Tang
Institute of Information Science, Beijing Jiaotong University, Beijing 100044, P.R. China

ABSTRACT

This paper presents a new approach to reconstructing a 3D dense point cloud from uncalibrated wide-baseline images. It comprises three steps: acquiring quasi-dense point correspondences, recovering structure from motion, and reconstructing the 3D dense point cloud. We present a two-level propagation algorithm: the first level operates in the 2D image space and the second in the 3D scene space. We use an affine iterative model to acquire accurate quasi-dense correspondences, and an iterative optimization to recover the camera parameters and the 3D scene structure, which increases the robustness and accuracy of self-calibration. In the second-level propagation, a view selection strategy and local photometric consistency are used to minimize the effects of occlusions, highlights, and so forth. We demonstrate our algorithm with several high-quality reconstruction examples.

Index Terms— quasi-dense correspondences, self-calibration, point cloud reconstruction

1. INTRODUCTION

In recent years, the reconstruction of 3D geometry from images has been a highly active research field, and several top-performing algorithms have been proposed. However, most of them take as input a set of calibrated images captured from different viewpoints around the object. They may not be feasible for scene datasets taken with a hand-held digital camera that moves freely through the scene and zooms in and out. Our research therefore focuses on the reconstruction of a 3D dense point cloud from uncalibrated wide-baseline images, which is the key to obtaining a highly accurate 3D geometry. The main contribution of this paper is an algorithm to reconstruct a 3D dense point cloud from a set of uncalibrated wide-baseline images.
There are three modules: acquiring quasi-dense point correspondences, recovering structure from motion, and reconstructing the 3D dense point cloud. Our algorithm has the following characteristics:
• We present a two-level propagation algorithm applied in the 2D image space and in the 3D scene space respectively. The first-level propagation improves the robustness and accuracy of self-calibration by using the accurate
978-1-4244-4296-6/10/$25.00 ©2010 IEEE
quasi-dense correspondences instead of sparse feature matches. The introduction of the affine iterative model improves the accuracy of the quasi-dense correspondences. These quasi-dense correspondences are regularly distributed in the images after filtering, and the redundant quasi-dense correspondences are discarded using multi-view relations.
• An iterative optimization strategy is used to refine the camera parameters and the 3D scene structure in order to further improve the accuracy of self-calibration. The two kinds of parameters are refined separately using different methods.
• In the second-level propagation, the 3D scene-space expansion starts from the quasi-dense 3D scene structure. A view selection strategy and local photometric consistency are used to minimize the effects of occlusions, highlights, and so forth.

The rest of this paper is organized as follows. After reviewing the related work in Section 2, we introduce the first-level propagation in the 2D image space and describe how to design the resampling model and filter for acquiring regular and robust quasi-dense correspondences in Section 3. Section 4 describes how to refine the camera parameters and the 3D scene structure, and how to reconstruct the dense point cloud by the second-level propagation in the 3D scene space. Experimental results are given in Section 5. Section 6 concludes the paper.

2. RELATED WORK

In this section, we describe the work most closely related to ours, focusing on quasi-dense correspondence extraction, camera parameter optimization, and multi-view stereo reconstruction. The quasi-dense correspondence was first presented by Lhuillier and Quan [1]. They proved that quasi-dense correspondences can produce more robust and accurate reconstruction results. Different from their work, we introduce an affine iterative model in the first-level propagation, which improves the accuracy of the quasi-dense correspondences.
In addition, self-adaptive RANSAC algorithm is used to discard outliers before regular resampling in the image space. The redundant quasi-dense correspondences are further filtered with multi-view relations.
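One common way to make RANSAC self-adaptive is the standard adaptive stopping rule described by Hartley and Zisserman [12], in which the number of trials is re-estimated from the inlier ratio observed so far. The sketch below illustrates that rule only; the function name and default values are ours, not necessarily the paper's exact implementation:

```python
import math

def adaptive_ransac_trials(inlier_ratio, sample_size=8, confidence=0.99):
    """Number of RANSAC samples needed so that, with the given confidence,
    at least one minimal sample (e.g. 8 points for the fundamental matrix)
    is outlier-free."""
    # Probability that one random minimal sample contains only inliers.
    p_good = inlier_ratio ** sample_size
    if p_good <= 0.0:
        return float("inf")
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))
```

As the observed inlier ratio improves during the run, the trial budget shrinks, which is what makes the scheme adaptive: with a 50% inlier ratio and 8-point samples, over a thousand trials are needed, while an 80% ratio requires only a few dozen.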
ICASSP 2010
The SFM [2] algorithm aims to simultaneously recover the camera parameters and the 3D scene structure. Bundle adjustment is usually used in the final step of reconstruction. Since each camera has 11 degrees of freedom and each 3D space point has 3 degrees of freedom, a reconstruction involving n points over m views requires minimization over 3n + 11m parameters. As m and n increase, the bundle adjustment algorithm becomes extremely costly, and eventually infeasible. We therefore optimize the camera parameters and the 3D scene structure separately.

Multi-view stereo matching and reconstruction have received much attention recently. According to the taxonomy of Seitz et al. [3], multi-view stereo algorithms can be roughly categorized into four classes: methods that extract an optimal surface from a volume [4], surface evolution techniques [5], image-space algorithms that compute and merge a set of depth maps [6, 7], and approaches that fit a surface to a set of keypoints or seed points [8, 9]. Our algorithm belongs to the last category. The following discussion focuses on the most closely related works. Goesele et al. [6] proposed an algorithm that employs robust statistics and adaptive view selection to produce high-quality reconstruction results. Although their method gives impressive results, the processing time is long because region growing is performed with per-pixel view selection. Furukawa and Ponce [8] proposed a novel algorithm for multi-view stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images. Our work is closely related to theirs. The key difference is that their region growing starts from a sparse set of matched keypoints, while ours starts from a quasi-dense set of point correspondences. Their method works on calibrated images, while our algorithm works on uncalibrated image sets.
In addition, we combine a view selection strategy with local photometric consistency to minimize the effects of occlusions, highlights, and so forth.

3. QUASI-DENSE CORRESPONDENCES

The quasi-dense correspondences are acquired by performing the first-level propagation and resampling in the 2D image space. The main steps can be summarized as follows:
• match the SIFT [10] keypoints between each reference image R and its neighboring image set V_R, and compute the fundamental matrix using the self-adaptive RANSAC algorithm [12];
• perform the first-level propagation in the image space;
• reconstruct the quasi-dense correspondences with the resampling model, and re-estimate the fundamental matrix of each image pair using the quasi-dense correspondences;
• perform the first-level propagation and resampling again with the re-estimated fundamental matrix, in order to improve the robustness of the quasi-dense correspondences;
• filter the redundant quasi-dense correspondences using multi-view relations.
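In outline, the propagation step above is a best-first region-growing loop over a prioritized queue of matches. The sketch below is a simplified illustration, not the paper's exact algorithm: a ZNCC score drives the queue, but the per-seed affine model of Section 3.1 is replaced by pure pixel translation, and all names are ours:

```python
import heapq
import numpy as np

def zncc(w1, w2, eps=1e-9):
    """Zero-mean normalized cross-correlation of two equal-sized windows;
    invariant to a gain/offset illumination change (cf. Section 3.1)."""
    a = np.asarray(w1, dtype=float).ravel()
    b = np.asarray(w2, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def propagate(seeds, score_fn, threshold=0.8):
    """Best-first match propagation in the 2D image space.

    seeds: list of ((x, y), (x', y'), score) initial matches.
    score_fn(p, q): similarity of a candidate pair, e.g. ZNCC of windows.
    """
    heap = [(-s, p, q) for p, q, s in seeds]      # max-heap via negation
    heapq.heapify(heap)
    used_l, used_r, matches = set(), set(), {}
    while heap:
        _, p, q = heapq.heappop(heap)
        if p in used_l or q in used_r:
            continue                               # uniqueness constraint
        used_l.add(p); used_r.add(q); matches[p] = q
        # Grow into the 4-neighborhood of the accepted match.
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cp, cq = (p[0] + dx, p[1] + dy), (q[0] + dx, q[1] + dy)
            if cp not in used_l and cq not in used_r:
                s = score_fn(cp, cq)
                if s > threshold:
                    heapq.heappush(heap, (-s, cp, cq))
    return matches
```

In the full algorithm, each popped seed also carries its local affine transformation and illumination parameters, which are refined by minimizing the residual of Eq. (2) before its neighbors are pushed.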
3.1. The First Level Propagation

To acquire quasi-dense correspondences of higher accuracy, the affine iterative model is used in the first-level propagation. Every initial matching seed therefore not only gives the pixel locations {x, x'} in the image space, but also specifies the local affine transformation A_0 between the matching regions. Initial seeds are determined using the Hessian-Affine invariant region detector [11] and SIFT region descriptors. The ZNCC (Zero-mean Normalized Cross-Correlation) score is computed for every seed, and the seeds are maintained as candidates in a prioritized queue Q. In the first-level propagation, the affine transformation of the 4-neighborhood of every seed is initialized with the corresponding values of the seed. Taking the illumination effect into account, the matching between corresponding points can be represented as:

μ I_2(Ax + d) + δ = I_1(x)    (1)

where μ depends on the reflection angle of the light source, and δ depends on the camera gain. We can compute the best match by minimizing the residual:
ε = Σ_{x∈W} [(μ I_2(Ax + d) + δ) − I_1(x)]²    (2)
This function is solved via a first-order Taylor expansion at A = A_0, d = 0, μ = 1, and δ = 0.

3.2. Design of Resampling Model and Filter
The design of the resampling model and filter is very important in these steps. Each reference image R is divided into a regular square grid of β_1 × β_1 (β_1 = 16 in our experiments) pixel cells C_R(x, y). The quasi-dense correspondences are reconstructed at the center of each cell in the reference image. Their matching points are located by fitting local homographies H_R(x, y, I), where I ∈ V_R, and they are registered in the corresponding cells of V_R. These local homographies H_R(x, y, I) are robustly estimated using the self-adaptive RANSAC algorithm, and plenty of outliers are discarded before the regular resampling in the image space. Because we resample in each reference image, there are many redundant quasi-dense correspondences, and it is necessary to filter them with multi-view relations. In every cell C_R(x, y) of each R, we sort a list of the correlation scores of the correspondences, which include both quasi-dense points and keypoints. The redundant correspondences are discarded by setting a threshold η_1 (η_1 = 8 in our experiments) on the number of correspondences in C_R(x, y). In addition, each cell C_R(x, y) is subdivided into regular square
grids of β_2 × β_2 pixel sub-cells C'_R(x, y). These sub-cells C'_R(x, y) are used to support the second-level propagation.
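The per-cell filtering described above can be sketched as follows. This is a minimal illustration assuming a flat list of scored correspondences; the container layout and names are ours, while the defaults follow the paper's settings (β_1 = 16, η_1 = 8):

```python
from collections import defaultdict

def resample_grid(correspondences, cell=16, max_per_cell=8):
    """Bucket scored correspondences into cell x cell squares of the
    reference image and keep only the max_per_cell best-scoring entries
    in each square (beta_1 = 16 and eta_1 = 8 in the paper)."""
    buckets = defaultdict(list)
    for (x, y), match, score in correspondences:
        buckets[(int(x) // cell, int(y) // cell)].append((score, (x, y), match))
    kept = []
    for entries in buckets.values():
        entries.sort(key=lambda e: e[0], reverse=True)  # best score first
        kept.extend((p, m, s) for s, p, m in entries[:max_per_cell])
    return kept
```

The same bucketing, run at the finer β_2 × β_2 resolution, yields the sub-cells that seed the second-level propagation.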
4. DENSE POINT CLOUD RECONSTRUCTION

There are two steps in reconstructing the 3D dense point cloud: the quasi-dense point cloud reconstruction and the second-level propagation in the 3D scene space.

4.1. Quasi-dense Point Cloud Reconstruction

In this section, the camera parameters and the scene structure are optimized separately and alternately in an iterative loop. View selection and local photometric consistency are introduced to refine the position and direction of the quasi-dense points. In our method, the bundle adjustment algorithm is used only to optimize the camera parameters.

4.1.1. View Selection

Each 3D quasi-dense point and its normal are denoted as p(x, y, z, n_x, n_y, n_z). Every point's normal is initialized along the direction from p to O_R, where O_R is the optical center of the reference image. The set of images in which p is visible is initialized as V_R, which is found by setting a threshold α_1 on the number of correspondences between a neighboring image and the reference image. We regard V_R as a candidate set. To obtain a good reconstruction, each image should have a large baseline and a sufficient number of correspondences with the reference image. We therefore compute a score S_p for each image V as follows:

S_p(V) = Σ_{i=1}^{N} w(f_i, R, V)    (3)

w(f_i, R, V) = 1 if α_min ≤ α ≤ α_max, and w(f_i, R, V) = α / α_max otherwise,

where f_i is a correspondence between images R and V, and α is the angle between the lines from f_i in R and in V to p (α_min = 10°, α_max = 60° in our experiments). The scores S_p(V) are sorted, and the set V_p consisting of the first λ (λ = 8 in our experiments) images with the highest scores is selected. We thus obtain a set of images V_p for each quasi-dense point.

4.1.2. Photometric Consistency

For each p, we reproject its 3D location into all images in V_p and compute NCC(p, R, I) (I ∈ V_p) between an m × m window centered on p and the corresponding windows centered on its projections in each image of V_p. We expect a high NCC score for each p. However, the NCC score may be low in the presence of specular highlights, occlusions, or other factors. We therefore discard I when its NCC score is small, in order to minimize these effects:

V_p^v = {I | I ∈ V_p, NCC(p, R, I) > ξ}    (4)

where ξ (ξ = 0.6 in our experiments) is the threshold on the NCC score. The mean of the NCC scores over all images in V_p^v is the photometric consistency score:

C(p) = (Σ_{I ∈ V_p^v} NCC(p, R, I)) / |V_p^v|    (5)

We regard p as valid if the number of elements of V_p^v (i.e. |V_p^v|) is larger than ν (ν = 3 in our experiments) [8].

4.1.3. Position and Direction Optimization

We optimize p by searching for the highest photometric consistency score. For the position of each p, we march along its projected ray with step size d. The direction of its tangent plane is rotated around each of two perpendicular lines intersecting at p; the initial direction of one of the lines is parallel to the x-axis of R. The final angle is determined by re-running the algorithm with angular step size ω in the two directions. If p attains the highest photometric consistency score and is valid (i.e. |V_p^v| > ν), we save it to the corresponding sub-cells C'_I(x, y) (Section 3.2) of each I. In addition, we must determine which p can be used in the following second-level propagation. We simply keep, in each cell, the one valid p with the highest photometric consistency score.

4.2. The Second Level Propagation

The second-level propagation includes three steps: determining the neighboring cells N(p), initializing the new p, and optimizing the new p. The goal of the propagation is to find a p in every cell C'_I(x, y). We use cell continuity in the image to perform the second-level propagation. For each valid p, we determine the neighboring cells N(p) by collecting all the neighboring cells in V_p^v. If a cell already contains a valid p, it is removed from N(p). We extract the center of each cell in N(p) as a keypoint. The direction, reference image, and set of visible images of each keypoint are initialized to those of p. The position of each keypoint is estimated by the following steps. First, the corresponding cell C_R(x, y) is identified. Second, the corresponding points are located by fitting the local homographies H_R(x, y, I) (I ∈ V_p^v, Section 3.2) in the neighboring image set V_p^v. Finally, the position of p is estimated from those corresponding points.

Of course, it is not always true that a continuous cell in the image implies a continuous object surface. We therefore compute the distance from the newly estimated p to the old p, and discard the new p if the distance exceeds a threshold d. The photometric consistency scores are then computed to optimize the new p, with p again restricted to marching along its projected ray. The optimization process is similar to that of Section 4.1.
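The view-selection score of Section 4.1.1 and the NCC-based validity test of Section 4.1.2 reduce to a few lines of code. This sketch uses the paper's thresholds (α_min = 10°, α_max = 60°, ξ = 0.6, ν = 3) but our own function names and data layout:

```python
def pair_weight(alpha_deg, a_min=10.0, a_max=60.0):
    """Weight of one correspondence in Eq. (3): 1 inside the preferred
    triangulation-angle band, alpha/alpha_max otherwise."""
    return 1.0 if a_min <= alpha_deg <= a_max else alpha_deg / a_max

def view_score(angles_deg):
    """S_p(V): sum of weights over the correspondences shared by R and V."""
    return sum(pair_weight(a) for a in angles_deg)

def photometric_consistency(ncc_by_view, xi=0.6, nu=3):
    """Eqs. (4)-(5): keep views whose NCC exceeds xi, average the kept
    scores, and declare p valid when more than nu views survive."""
    kept = [s for s in ncc_by_view.values() if s > xi]
    c_p = sum(kept) / len(kept) if kept else 0.0
    return c_p, len(kept) > nu
```

To form V_p, candidate views would be sorted by `view_score` and the λ = 8 best retained; `photometric_consistency` then supplies both the score C(p) being maximized in Section 4.1.3 and the validity flag.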
5. EXPERIMENTS

Figure 1 shows example images from the bust sets [1] and some results of our experiments.

Figure 1. Top: example images from the bust sets. Middle: (a) and (b) show the SIFT keypoints; (c) and (d) show the quasi-dense correspondences after the first-level propagation and resampling. Bottom: (e) and (f) show the 3D quasi-dense points; (g) and (h) show the 3D dense point cloud acquired by performing the second-level propagation.

In this experiment, the resolution of the images is 480 × 640. The number of correspondences between adjacent images is 320. There are 98,528 quasi-dense correspondences after the first-level propagation and 2,875 after resampling. The AED (Average Epipolar Distance) after resampling is 0.3063. The number of 3D quasi-dense points ((e) and (f)) is 4,516, obtained by integrating the silhouette information. The number of 3D dense points ((g) and (h)) after the second-level propagation is 64,683.

6. CONCLUSION

This paper presents a new method to reconstruct a 3D dense point cloud from a set of uncalibrated images by performing a two-level propagation in the image space and the 3D scene space. The affine iterative model is introduced into the first-level propagation. An iterative optimization algorithm is used to recover the camera parameters and the 3D scene structure, which improves the robustness and accuracy of self-calibration. In the second-level propagation, a view selection strategy and local photometric consistency are used to minimize the effects of occlusions, highlights, and so forth. The experiments show that our algorithm obtains accurate results.

7. ACKNOWLEDGMENTS

This work is supported by the National 973 Key Research Program of China (2006CB303105), the National 863 High Technology Research and Development Program of China (2009AA01Z334), the National Natural Science Foundation of China (60973061 and 60803073), the Innovative Team Program of the Ministry of Education of China (IRT0707), the Beijing Education Commission (YB20081000401), and the Technological Innovation Fund of Excellent Doctoral Candidates of Beijing Jiaotong University (141047522).

8. REFERENCES

[1] M. Lhuillier and L. Quan, "A quasi-dense approach to surface reconstruction from uncalibrated images," IEEE Trans. PAMI, vol. 27, no. 3, pp. 418–433, 2005.
[2] N. Snavely, S. M. Seitz, and R. Szeliski, "Modeling the world from Internet photo collections," IJCV, vol. 80, no. 2, pp. 189–210, Nov. 2008.
[3] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A comparison and evaluation of multi-view stereo reconstruction algorithms," in Proc. CVPR, 2006, pp. 519–528.
[4] G. Vogiatzis, C. H. Esteban, P. H. S. Torr, and R. Cipolla, "Multi-view stereo via volumetric graph-cuts and occlusion robust photo-consistency," IEEE Trans. PAMI, 2007.
[5] P. Gargallo, E. Prados, and P. Sturm, "Minimizing the reprojection error in surface reconstruction from images," in Proc. ICCV, 2007.
[6] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S. M. Seitz, "Multi-view stereo for community photo collections," in Proc. ICCV, 2007.
[7] D. Bradley, T. Boubekeur, and W. Heidrich, "Accurate multi-view reconstruction using robust binocular stereo and surface meshing," in Proc. CVPR, 2008, pp. 1–8.
[8] Y. Furukawa and J. Ponce, "Accurate, dense, and robust multi-view stereopsis," in Proc. CVPR, 2007.
[9] P. Labatut, J.-P. Pons, and R. Keriven, "Efficient multi-view reconstruction of large-scale scenes using interest points, Delaunay triangulation and graph cuts," in Proc. ICCV, 2007.
[10] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," IJCV, vol. 60, no. 2, pp. 91–110, 2004.
[11] http://www.robots.ox.ac.uk/~vgg/research/affine/index.html
[12] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2003.