Incremental 3D Model Generation using Depth Cameras

Joshua Blackburn, Dan Kubacki, and John Stratton
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Urbana, IL 61801
{jblackb2,kubacki1,stratton}@illinois.edu

Abstract

The ability to directly acquire 3D scene information from a single viewpoint could have a large impact on many computer vision applications. Depth cameras are a new class of sensing device facilitating such data acquisition, but they are limited by occlusion, noise, and poor sampling of obliquely viewed surfaces. To extend the applicability of a single depth camera, we propose a method of retaining knowledge of surfaces from depth camera images acquired over time. This method draws on techniques from computer vision and computer graphics to reconstruct a 3D point cloud from a depth image, smooth and consolidate the point cloud into a good model, estimate surface normals of the model points, and register each new depth image to the model for tracking and model updates.

1 Introduction and motivation

A new sensing modality is coming to market that is poised to change the future of computer vision: a camera that records the depth of the scene at each pixel. Many computer vision techniques attempt to recover depth information from single- or multi-view geometry, but these are computationally expensive and prone to error in areas of low texture. A depth camera actively measures depth, so it is unaffected by low texture and can operate in real time. These improvements allow depth to be used not only to make 2D techniques such as segmentation and tracking more robust, but also to expand image processing into 3D.

One major limitation of a depth image is that it is restricted to the point of view of the camera. Instead of capturing a full 3D representation of a scene, it can only capture what is often referred to as a 2.5D representation. This limitation can be overcome with multiple depth cameras covering multiple viewpoints, but such a setup is impractical for the average user, who often has access to only one depth camera and possibly one or more color cameras.

In this project, we have taken initial steps towards a system that can “remember” 3D shape information over time and build up a rough object model in order to augment the real-time depth information coming from the camera. This additional 3D information can be used in many applications, especially when depth cameras are used in conjunction with one or more color cameras. The color cameras will be positioned at viewpoints different from the depth camera; thus, they can see parts of the scene that are not visible in the current depth image. Often these viewpoint changes are small, but it is still difficult to recover depth information for the areas unseen by the depth camera. These areas are occluded from the depth camera because they are near the 2D boundaries of the object in the depth image, where noise is significant and curvature is difficult to estimate. For objects that are moving, this problem can be solved by remembering the 3D structure of the occluded areas from a previous time when they were visible to the depth camera. This prior information can be integrated into the current depth image in order to extend the 3D information available to depth camera applications.

Figure 1: Algorithm flow diagram (Depth Camera Image → Preprocessing and 3D Projection → Depth Data Point Cloud → Registration & Alignment → Smoothing & Consolidation → Digital Model Point Cloud, with Surface Normal Estimation feeding the registration step).


2 Algorithm

Similar to [1], the goal of this algorithm is to create a single, consolidated model from depth images taken iteratively from multiple viewpoints. As in many computer vision applications, the choice of model representation depends both on the type of data provided and on the expected uses of the model. Because the input point cloud comes from a depth camera, model generation needs to be robust to noise. Specifically, an explicit or implicit surface needs to be defined prior to surface normal estimation. Because the model is created iteratively, the output needs to remain a discrete point set and needs to be uniformly sampled regardless of the amount of overlap between consecutive views.

We considered multiple algorithms to define our surface. Meshes are the most common representation of surfaces; however, they are not easily updated in an iterative manner. Most algorithms that create an implicit surface from a point cloud were not able to provide a discrete, uniformly sampled point cloud output without multiple iterations of nonlinear optimization or preexisting surface normals [2–6]. Therefore, we decoupled the problems of uniform sampling and surface estimation: we chose [7] as a way to uniformly smooth points onto a 3D surface, and we fit planes to local neighborhoods to define the surface.

Our overall approach is shown graphically in Figure 1. In the steady state, we maintain a model of unorganized 3D points representing the surface of an object. When a new frame of depth data arrives, it is first preprocessed and projected into 3D space. Currently, the only preprocessing is the removal of outliers that appear along edges. The preprocessed and 3D-projected data is then registered to the model surface using an ICP algorithm [8], with extensions and modifications to account for the incomplete surfaces represented by the model and depth data. To facilitate this registration, surface normals for each point of the model are estimated by fitting a plane to local patches. Once the registration is completed, the model is rotated and translated into the coordinate system of the new depth data. The two surfaces together are used to generate a smoothed point cloud representing both the surface recorded by the old model and the new, noisy depth data. This smoothing and consolidation step uses a weighted, locally optimal projection algorithm as outlined in [7]. The resulting point cloud is recorded as the new model representing the knowledge about the surface seen thus far. The entire process begins with no model and is initialized by consolidating and smoothing the first frame of depth data directly. A sketch of this steady-state loop is given below.
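The following outline shows how the stages fit together in code. It is a minimal sketch, assuming the stage functions (`remove_edge_pixels`, `project_to_3d`, `estimate_normals`, `icp_point_to_plane`, `wlop`) are implemented as sketched in Sections 3 through 6; all names and the NumPy framing are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subset(points, n):
    """Pick n points at random, without replacement, from an (N, 3) array."""
    idx = rng.choice(len(points), size=min(int(n), len(points)), replace=False)
    return points[idx]

def process_frame(depth_image, model):
    """One pipeline iteration (Figure 1): returns the updated model,
    an array of unorganized 3D points. `model` is None on startup."""
    # Preprocessing and 3D projection (Section 3)
    points = project_to_3d(remove_edge_pixels(depth_image))

    if model is None:
        # Startup: consolidate and smooth the first frame directly.
        return wlop(seed=random_subset(points, len(points) // 2), cloud=points)

    # Surface normal estimation on the model points (Section 4)
    normals = estimate_normals(model)

    # Registration (Section 5): (R, t) maps the model into the new data's frame
    R, t = icp_point_to_plane(points, model, normals)
    aligned = model @ R.T + t

    # Smoothing and consolidation (Section 6): seed X from both clouds
    seed = np.vstack([random_subset(aligned, 0.9 * len(aligned)),
                      random_subset(points, 0.2 * len(aligned))])
    seed = random_subset(seed, 2000)            # cap the model at 2000 points
    return wlop(seed=seed, cloud=np.vstack([aligned, points]))
```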

3 Point cloud from depth image

The first step of our algorithm is a preprocessing of the depth images to convert them to 3D points and to remove edge noise. Our test depth images come from the Stanford bunny, downloaded from [9]. The bunny was rendered in OpenGL and the z-buffer frame was captured as a depth image. Because the OpenGL z-buffer is inaccurate under perspective warping, the depth images come from an orthographic projection of the bunny onto the camera. Thus each pixel represents the z component of a 3D point.

Figure 2: Illustration of pinhole camera depth measurement. Similar triangles come from a projection of the point P onto the xz plane.

The x and y components are recovered by similar triangles, as shown in Figure 2:

$$x_i = \frac{z_i \, \Delta u_i}{f}, \qquad y_i = \frac{z_i \, \Delta v_i}{f}.$$

In general, each pixel of a depth camera gives the radial distance to an object, not the z component, but it is still possible to recover the 3D points by similar triangles. Edge noise is a characteristic of real depth data caused by the averaging of depth values along sharp edges; it produces spurious 3D points whose locations lie somewhere between the two objects. We blurred and downsampled our synthetic data so that it demonstrates the edge artifacts of real depth cameras. Due to the accumulating nature of our proposed algorithm, we can disregard 3D points that lie on strong edges, with the assumption that we will see these points in subsequent views when they are not on an edge. We handle edge noise while still in the image domain by finding strong edges and excluding those pixels from the 3D point cloud. These “cleaned” point clouds are then passed to the registration step of the algorithm.
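As a concrete sketch of this preprocessing, the following code masks strong depth edges with a gradient test and back-projects the remaining pixels using the similar-triangle relations above. The gradient threshold, the focal length, and the centered principal point are illustrative assumptions; the paper does not report these values.

```python
import numpy as np

def remove_edge_pixels(depth, grad_thresh=0.01):
    """Invalidate pixels on strong depth edges, where averaging between
    foreground and background depths creates spurious 3D points."""
    gy, gx = np.gradient(depth)                 # per-pixel depth gradients
    out = depth.astype(float).copy()
    out[np.hypot(gx, gy) > grad_thresh] = np.nan
    return out

def project_to_3d(depth, f=525.0):
    """Back-project a depth image into an (N, 3) point cloud using
    x = z*du/f and y = z*dv/f, assuming each pixel holds the z
    component and the principal point is at the image center."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    du, dv = u - w / 2.0, v - h / 2.0           # pixel offsets from center
    pts = np.stack([depth * du / f, depth * dv / f, depth], axis=-1)
    pts = pts.reshape(-1, 3)
    return pts[np.isfinite(pts[:, 2])]          # drop masked edge pixels
```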

4 Surface normal estimation

The surface normals are estimated by fitting a plane to the points in a neighborhood around each point. At each point, we find the k closest neighbors within some maximum distance ε. For simplicity, we take the centroid of these points as a point on the plane. The centroid is defined as

$$c = \frac{1}{k} \sum_{i=1}^{k} p_i.$$

The normal vector of the plane is chosen as the direction of minimum variance between the centroid and each point. This direction can be found by solving

$$\hat{n} = \operatorname*{argmin}_{n} \sum_i \left( (p_i - c)^T n \right)^2 \quad \text{subject to } \|n\| = 1.$$

The solution to this equation is the eigenvector of $\sum_i (p_i - c)(p_i - c)^T$ corresponding to the smallest eigenvalue. Other methods for estimating the surface normal were considered, such as fitting an algebraic sphere to the k closest points [4], but in our experiments these produced results similar to the above estimate, so the simpler method was chosen for this project.
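A direct implementation of this estimator might look as follows. The neighborhood size k and radius ε are placeholder values, and the brute-force distance computation is for clarity only.

```python
import numpy as np
from scipy.spatial.distance import cdist

def estimate_normals(points, k=15, eps=0.01):
    """Fit a plane to each point's neighborhood and return its normal:
    the eigenvector of the scatter matrix sum (p_i - c)(p_i - c)^T
    with the smallest eigenvalue. Rows are NaN where the neighborhood
    is too small to define a plane."""
    dists = cdist(points, points)               # brute-force, O(N^2)
    normals = np.full(points.shape, np.nan)
    for i in range(len(points)):
        idx = np.argsort(dists[i])[:k]          # k closest (includes i)
        nbrs = points[idx[dists[i, idx] < eps]] # ...within radius eps
        if len(nbrs) < 3:
            continue
        c = nbrs.mean(axis=0)                   # centroid lies on the plane
        scatter = (nbrs - c).T @ (nbrs - c)     # 3x3 scatter matrix
        evals, evecs = np.linalg.eigh(scatter)  # eigenvalues ascending
        normals[i] = evecs[:, 0]                # minimum-variance direction
    return normals
```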

5 Registration

The goal of registration is to align the previous model and the new data. In our case, the previous model is a point cloud with associated surface normals, and the new data is a point cloud without associated surface normals.

We seek to align the previous model to the new data using a rigid-body rotation and translation. To do this, we employ the Iterated Closest Point (ICP) algorithm [8, 10]. Our ICP algorithm iteratively performs the following steps until convergence:

1. Select 5% of the points in the new data set.
2. Match each point to the point in the model that is closest to it by Euclidean distance.
3. Reject pairs whose model point does not have a surface normal or whose distance is greater than 0.005.
4. Construct the point-to-plane error metric as described below.
5. Minimize the error metric.
6. Apply the optimal rotation and translation.

Most of these parameters were chosen because they gave the system the best solution in the shortest time. We defined convergence empirically as the best transform found within the first 15 iterations. The most significant choice is the use of the point-to-plane error metric [8]. This metric captures the idea that the distance between two parallel planes does not change when they translate relative to one another; the only distance that matters is the distance along the normal vectors. For matched points (p_i, q_i) in the new data and model, respectively, with the normal vector n_i associated with the model, define the rotation and translation as

$$\hat{R}, \hat{t} = \operatorname*{argmin}_{R, t} \sum_i \left[ (R p_i + t - q_i)^T n_i \right]^2.$$

This minimization is not quadratic in its arguments because R is a rotation matrix, which contains trigonometric functions. The Rodrigues angle formula provides a decomposition of the rotation matrix as

$$R = (1 - \cos\theta)\, z z^T + \cos\theta \, I + \sin\theta \, [z]_\times, \tag{1}$$

where z is a unit vector denoting the axis of rotation and θ is the angle of rotation [11]. Using the small-angle approximation, this can be written as R ≈ I + θ[z]_×. Using this decomposition and the properties of the scalar triple product, the minimization becomes

$$\hat{r}, \hat{t} = \operatorname*{argmin}_{r, t} \sum_i \left[ (p_i - q_i)^T n_i + n_i^T t + (p_i \times n_i)^T r \right]^2,$$

where r = θz. This minimization is quadratic in the unknowns and can be solved by standard least squares. The components of r can be recovered given that z is a unit vector, and the rotation matrix from the new data to the model can then be calculated from (1). In our case, we desire the rotation matrix from the model to the new data, which is R⁻¹; the corresponding translation is −R⁻¹t. The small-angle approximation is valid because in the initial iterations the error caused by incorrect matches exceeds the error caused by the approximation, and in the final iterations the angle is small [8].
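The sketch below implements one linearized solve plus the surrounding match-and-reject loop. The 5% subsampling, the 0.005 rejection distance, and the 15 iterations follow the steps above; the brute-force matching and the composition of incremental transforms are our assumptions about details the paper leaves open.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rodrigues(r):
    """Rotation matrix from r = theta * z via equation (1)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    z = r / theta
    zx = np.array([[0.0, -z[2], z[1]],          # [z]_x cross-product matrix
                   [z[2], 0.0, -z[0]],
                   [-z[1], z[0], 0.0]])
    return ((1 - np.cos(theta)) * np.outer(z, z)
            + np.cos(theta) * np.eye(3) + np.sin(theta) * zx)

def icp_step(p, q, n):
    """Solve the linearized point-to-plane problem for matched pairs
    (p_i, q_i) with model normals n_i; returns (R, t) for the new data."""
    A = np.hstack([np.cross(p, n), n])          # rows [(p_i x n_i)^T, n_i^T]
    b = -np.sum((p - q) * n, axis=1)            # -(p_i - q_i)^T n_i
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return rodrigues(x[:3]), x[3:]

def icp_point_to_plane(new_pts, model, normals, iters=15, frac=0.05,
                       max_dist=0.005, seed=0):
    """Full ICP loop; returns (R, t) mapping the model into the frame
    of the new data (the inverse of the solved data-to-model
    transform, as described in the text)."""
    rng = np.random.default_rng(seed)
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        sel = rng.choice(len(new_pts), max(6, int(frac * len(new_pts))),
                         replace=False)
        p = new_pts[sel] @ R.T + t              # apply current estimate
        nearest = np.argmin(cdist(p, model), axis=1)
        q, n = model[nearest], normals[nearest]
        keep = ((np.linalg.norm(p - q, axis=1) < max_dist)
                & ~np.isnan(n).any(axis=1))     # reject bad pairs
        if keep.sum() < 6:
            continue
        dR, dt = icp_step(p[keep], q[keep], n[keep])
        R, t = dR @ R, dR @ t + dt              # compose incremental update
    return R.T, -R.T @ t                        # invert: model -> new data
```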

6 Point cloud consolidation and smoothing

The process of consolidating the new depth data points into a good model of the surface they represent shares many attributes with previous work on generating digital surfaces from laser range scans of physical objects. However, for this work we need not generate a renderable object model with a mesh structure and oriented normals. A literature survey yielded a recent paper describing an iterative algorithm for a weighted, locally optimal projection (WLOP) of a set of points onto the implicit surface defined by a noisy point cloud [7]. This section summarizes our implementation of that algorithm, adapted to the situation of generating a single smooth surface from two other point clouds, one noisy and one previously smoothed.

Conceptually, the point consolidation algorithm of [7] creates a set of 3D points X and then iteratively updates the positions of those points to more faithfully represent a uniform sampling of the surface underlying the (potentially noisy and nonuniformly sampled) input point cloud P. The update step computes the distances from each point in X to each point in P, contributing an attracting force towards each point in P that falls off with distance, to move or keep each point in X on the implicit surface. In addition, the distances between each pair of points in X are computed, and a repulsive force (also falling off with distance) is established between them, so that the points spread evenly over the surface. Specifically, given the state of X at the current iteration, X^k with index set I, and the point cloud P with index set J, the iterative update X^{k+1} is chosen to minimize

$$\sum_{i \in I} \sum_{j \in J} \|x_i - p_j\| \, \theta(\|x_i^k - p_j\|) / v_j \;+\; \mu \sum_{i \in I} \lambda_i \sum_{i' \in I \setminus \{i\}} -\|x_i - x_{i'}^k\| \, w_{i'}^k \, \theta(\|x_i^k - x_{i'}^k\|),$$

where $\|\cdot\|$ is the L2 norm, $\theta(r) = e^{-r^2/(h/4)^2}$ is a rapidly decreasing smoothing function with support radius h, and 0 < µ ≤ 0.5 is a parameter controlling the magnitude of the repulsion “step size.” The weights

$$v_j = \sum_{j' \in J} \theta(\|p_j - p_{j'}\|) \quad \text{and} \quad w_i^k = \sum_{i' \in I} \theta(\|x_i^k - x_{i'}^k\|)$$

give data points in densely sampled regions lower attraction weights and give model points in densely modeled regions more repulsion weight, respectively. The regularization parameter

$$\lambda_i = \frac{\sum_{j \in J} \alpha_{ij}^k / v_j}{\sum_{i' \in I \setminus \{i\}} w_{i'}^k \beta_{ii'}^k}, \qquad \alpha_{ij}^k = \frac{\theta(\|x_i^k - p_j\|)}{\|x_i^k - p_j\|}, \qquad \beta_{ii'}^k = \frac{\theta(\|x_i^k - x_{i'}^k\|)}{\|x_i^k - x_{i'}^k\|}$$

is a carefully chosen balancing term that makes the iterative update formulas derived from this equation work out nicely and provably reach an O(h²) approximation of the underlying surface [12]. The minimization equation is solved for $x_i^{k+1}$ to obtain the desired update step

$$x_i^{k+1} = \sum_{j \in J} p_j \frac{\alpha_{ij}^k / v_j}{\sum_{j' \in J} (\alpha_{ij'}^k / v_{j'})} \;+\; \mu \sum_{i' \in I \setminus \{i\}} (x_i^k - x_{i'}^k) \frac{w_{i'}^k \beta_{ii'}^k}{\sum_{i'' \in I \setminus \{i\}} (w_{i''}^k \beta_{ii''}^k)}.$$

For our experiments, we set h to 0.008 and µ to 0.4, and we performed ten update iterations for every consolidation. On startup, when no previous model is present, we choose X⁰ to be 50% of the depth data points taken at random, and P to be the full set of depth data points. In the steady state, consolidation of the old model and the new depth points is done by choosing X⁰ as 90% of the old model points plus a number of new depth points equal to 20% of the number of points in the old model, and P as the combined set of all old model and new depth data points. To limit the size of the output model, we cap X at 2000 points; if its construction yields more, 2000 points are randomly selected and the rest are discarded. A sketch of the projection step follows.
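The following dense-matrix sketch implements the update rule above with the stated h, µ, and iteration count. The small eps guard against zero distances and the absence of any spatial indexing are our simplifications.

```python
import numpy as np
from scipy.spatial.distance import cdist

def wlop(seed, cloud, h=0.008, mu=0.4, iters=10, eps=1e-12):
    """Weighted, locally optimal projection [7] of a seed set X onto
    the surface implied by the (noisy) cloud P, implementing the
    update rule above directly: dense O(|X|(|P|+|X|)) per iteration."""
    theta = lambda r: np.exp(-(r / (h / 4.0)) ** 2)  # support radius h
    X, P = seed.copy(), cloud
    v = theta(cdist(P, P)).sum(axis=1)               # v_j: density of P
    for _ in range(iters):
        dXP = cdist(X, P)
        dXX = cdist(X, X)
        np.fill_diagonal(dXX, np.inf)                # exclude i' = i
        w = theta(dXX).sum(axis=1) + 1.0             # w_i: density of X (theta(0)=1 self term)
        alpha = theta(dXP) / (dXP + eps)             # attraction kernel
        beta = theta(dXX) / dXX                      # repulsion kernel (0 on diagonal)
        a = alpha / v                                # alpha_ij / v_j
        attract = (a @ P) / (a.sum(axis=1, keepdims=True) + eps)
        b = beta * w                                 # w_{i'} * beta_{ii'}
        diffs = X[:, None, :] - X[None, :, :]        # x_i - x_{i'}
        repulse = (b[:, :, None] * diffs).sum(axis=1) \
                  / (b.sum(axis=1, keepdims=True) + eps)
        X = attract + mu * repulse
    return X
```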

7 Results

We tested our algorithm using the Stanford bunny as described in Section 3. The received depth images and corresponding models, along with the preprocessed data with edge artifacts removed, are shown in Figure 3.

7.1 Noiseless data

In our first test, we used this noiseless data directly as the input to our algorithm. From the current estimate of the model, we computed the surface normals for each point, as shown in Figure 4(a). This shows that the surface normal estimation accurately recovers the surface normals of the point cloud. The ICP algorithm was used to register two point clouds with an initial misalignment of 30 degrees, as shown in Figures 4(b) and 4(c). One of the point clouds is displayed as its corresponding mesh for ease of visual comparison; however, the point cloud, not the mesh, was used in the computations. We then ran the complete algorithm with new depth images every 30 degrees around the bunny. The resulting model is shown in Figure 5. The uniform distribution of the points around the entire bunny demonstrates that the algorithm properly consolidates the various point clouds together.

Figure 3: Received depth images and corresponding models. (a) Raw depth; (b) Raw model; (c) Clean depth; (d) Clean model.

Figure 4: Verification of the surface normal estimation and ICP registration. (a) Surface normal estimation; (b) Initial ICP alignment; (c) Final ICP alignment.

Figure 5: Reconstructed model from noiseless depth images every 30 degrees. (a) Front view; (b) Back view.


Figure 6: A typical point cloud created from a single depth image before and after the addition of noise. (a) Noiseless data; (b) Noisy data.

Figure 7: Reconstructed model from noisy depth images every 30 degrees. (a) Front view; (b) Back view.

7.2 Noisy data

For the second test, we added Gaussian noise to the depth images. A point cloud created from a single noisy depth image with its edge noise removed is shown in Figure 6. The reconstructed model from depth images every 30 degrees is shown in Figure 7. The reconstruction for noisy data is nearly as good as with noiseless data.

8 Conclusion

Through this project, we demonstrated that it is possible to create a good 3D model from incremental depth data. This model overcomes the 2.5D limitation of depth cameras and opens the door to new applications that require fuller 3D knowledge. It is especially well suited to scenarios that employ one depth camera with multiple color cameras.

Future work on this project centers on three areas. The first is to ensure that the algorithm also works on real depth data; as we have already modeled both the noise and the edge-averaging effects of depth cameras, this will probably require only parameter changes. The second is to improve the speed of the algorithm, which at this point is far from real time.

The speed can be improved in two ways: through algorithmic changes, such as changing the matching algorithm in the ICP or using a space-partitioning structure such as a k-d tree (see the sketch below), and by exploiting the parallel programming techniques available on graphics processing units, since most of the algorithms used in this problem are data-parallel and would map to such architectures very well. The third area of future work is to extend the registration algorithms to work with deformable objects.
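For instance, a k-d tree from SciPy could replace the brute-force matching step of Section 5. This is a sketch of one plausible speedup, not the implementation used in this project.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_kdtree(p, model, max_dist=0.005):
    """Closest-point matching with a k-d tree: O(log n) per query
    instead of a linear scan, with the ICP rejection distance folded
    into the query. Returns model indices and a keep mask for p."""
    tree = cKDTree(model)                       # build once per model
    dist, idx = tree.query(p, distance_upper_bound=max_dist)
    keep = np.isfinite(dist)                    # unmatched queries get inf
    return idx[keep], keep
```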

References

[1] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy, “Real-time 3D model acquisition,” in SIGGRAPH, 2002.
[2] D. Levin, Mesh Independent Surface Interpolation. Springer, 2004, pp. 37–50.
[3] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C. T. Silva, “Computing and rendering point set surfaces,” IEEE Transactions on Visualization and Computer Graphics, vol. 9, no. 1, pp. 3–15, 2003.
[4] G. Guennebaud and M. Gross, “Algebraic point set surfaces,” in SIGGRAPH, 2007.
[5] N. Amenta and Y. J. Kil, “Defining point-set surfaces,” ACM Transactions on Graphics, vol. 23, no. 3, pp. 264–270, 2004.
[6] C. Oztireli, G. Guennebaud, and M. Gross, “Feature preserving point set surfaces based on non-linear kernel regression,” Computer Graphics Forum, vol. 28, no. 2, 2009. [Online]. Available: http://vcg.isti.cnr.it/Publications/2009/OGG09
[7] H. Huang, D. Li, H. Zhang, U. Ascher, and D. Cohen-Or, “Consolidation of unorganized point clouds for surface reconstruction,” ACM Transactions on Graphics, vol. 28, no. 5, p. 176, 2009.
[8] S. Rusinkiewicz, “3D scan matching and registration: Rigid body alignment,” presented at the International Conference on Computer Vision, 2005.
[9] Stanford University Computer Graphics Laboratory, “The Stanford 3D scanning repository,” http://graphics.stanford.edu/data/3Dscanrep/, May 2010.
[10] S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICP algorithm,” in 3D Digital Imaging and Modeling, 2001.
[11] D. Koks, Explorations in Mathematical Physics. Springer, 2006.
[12] Y. Lipman, D. Cohen-Or, D. Levin, and H. Tal-Ezer, “Parameterization-free projection for geometry reconstruction,” ACM Transactions on Graphics, vol. 26, no. 3, p. 22, 2007.

