A Robust Method for Shape-based 3D Model Retrieval Yi Liu, Jiantao Pu, Guyu Xin, Hongbin Zha National Laboratory on Machine Perception Peking University, Beijing 100871, P.R. China
[email protected], {pjt, zha}@cis.pku.edu.cn
Weibin Liu1, Yusuke Uehara2 Internet Application Lab., Fujitsu R&D Center Co., LTD.1 Language Processing Lab., Information Technology Media Lab., Fujitsu Laboratories LTD.2
Abstract Proliferation of 3D models necessitates developing efficient methods for indexing or retrieving the models in a large database. Many previous methods for this purpose defined functions on concentric spheres as approximation of 3D geometry for spherical harmonic transform (SHT). In this paper, we point out that this is not robust as the surface of a model may shift between different shells under perturbation, and multi-layer of surfaces may exist in one shell, making the function definition ambiguous. To solve these problems, we propose a novel method to characterize 3D shape using Delta functions. Then spherical functions are defined by sampling in the frequency domain of the Delta functions for SHT. By doing so, our method can support retrieval with controllable acuity, which benefits wider range of applications and facilitates customization to different users. Experiments have shown that our method is more robust than previous approaches.
1. Introduction 3D models now play an important part in many applications, such as computer game, computer aided design, E-business, molecular biology, cultural relic preservation, etc. These applications lead to the number of models dramatically increasing in the past decade. Utilizing the existing models can be helpful in many occasions, which requires effective methods for indexing or retrieving the models in a large database. Annotation characters in the files of some types of 3D models benefit retrieving them with the well studied keywords matching algorithms. However, for other types of models, manually noting the property for each one is impractical, and many models have peculiar shapes which are hardly to express in a few words. Hence, developing shape-based retrieval algorithms is required for fully exploiting the 3D models available. As geometric shapes are the features people mostly are concerned with when retrieving models in a database, most existing shapebased retrieval algorithms [1, 2, 3, 4, 5, 6, 7, 8, 9] were designed by parsing the geometry of the models. Our method also falls into this type. The workflow of a 3D search engine can be summarized as follows: First, as an offline step,
geometric features for each model in a database are extracted and stored as an indexed table. Second, for one query, the input model is parsed for feature extraction. Third, the feature of the input model is compared with the features in the indexed table for similarity measuring. Fourth, the return of the query can be the k-nearest models or models within a tolerance ε of the feature measuring errors with the input model. Many previous methods for 3D model retrieval define functions on concentric spheres as approximations of 3D geometry for spherical harmonic transform (SHT) [3, 4, 9, 10, 11]. Some of the methods use Principal Component Analysis (PCA) [9] for pose normalization, while others [3, 10, 11] discard this normalization phase and use a collection of energies at each frequency of the spherical harmonic series to construct shape descriptors in themselves rotation invariant. The problem with these shape descriptors is that the instability and ambiguity of the definition of spherical functions make the methods unstable. In this paper, we propose a novel method to characterize 3D shape using Delta functions. Like the ray casting method [9], a cluster of rays are cast from the centroid of the model. Delta functions are used then for representing intersections of the rays with the surfaces of the model. Fourier analysis is performed afterwards to transform the Delta functions into the frequency domain. Finally, spherical functions are defined by sampling in the frequency domain of the Delta functions for SHT. The remainder of the paper is organized as follows: After summarizing the previous approaches to 3D model retrieval in Section 2, the method of using Delta functions to represent 3D shape as well as the definition of spherical functions in our scheme are introduced in Section 3. Some experimental results are demonstrated in Section 4. Finally, we conclude this paper in Section 5.
2. Previous Work The key problem for 3D model retrieval is how to quantify the similarity between two arbitrary shapes. Criterions for designing good shape comparison algorithms are: (1) A 3D model should be considered the same up to similarity transformations including position translation, scaling and rotation. (2) For a queried model, the ranking of models in the database should be close to
human perception. Moreover, an efficient shape comparison algorithm that can be used for real-time applications should be built on a per-model basis. That is, a quantified feature vector is extracted for each model, and then the similarity measure is merely based on the feature vectors. In this scheme, direct comparison of the input model with each one in the database is avoided. In the following, we review some of the most related approaches for 3D retrieval. The topology based approach [6, 7] extract skeletons of a 3D model, and then some graph matching algorithms are used for shape comparison. Although these approaches are flexible and can be used for matching deformable objects, the time-consuming nature makes the methods not suitable for real time applications. The Directional Histogram Model [5] characterizes 3D shape using hardware accelerated methods. Parallel rays are used to estimate the thickness distribution of a model in the directions per latitude and longitude. Then SHT is performed for constructing a rotation invariant feature matrix. Shortcomings of this method are two-fold. First, it underestimates features in the concave regions and completely loses features in the inner surface because of self-occlusion. Second, computation of DHM is slow. For example, generating the coarsest version of a shape descriptor needs about 8 sec. Shape Distributions [1, 2] are much easy to compute and they are robust to perturbations and noises. The methods first randomly sample points on a mesh, and then a statistical distribution is generated by calculating the Euclidean distances of the point pairs and counting the number of line segments that fall into each distance bin. Problem with this approach is that it is not discriminative enough to differentiate shapes. The binary spherical functions [3, 4, 10] have a good tradeoff between efficiency and efficacy. For a query, the input model is first rasterized into a 2 N ×2 N ×2 N voxel grid. The value in each voxel is 1 if there is a point on the mesh within it, otherwise is 0. Then spherical functions are defined on the grid with radii 1 to N for SHT to obtain rotation invariant feature matrix. Vranic [9] proposed an improvement of the shape descriptor based on SHT [3, 4, 10]. He uses ray casting from the centroid of a model in the directions per latitude and longitude to estimate 3D shape. Then distances from the centroid to the intersection points in each concentric shell are used to define the spherical functions. In this way, some fine details can be preserved, which may be lost in the voxel grid. Other shape descriptors are mostly rotation variant, and hence a pre-pose normalization procedure is required. Examples include Shape Histograms [8], Moments [12], Extended Gaussian Images [13], Complex Extended Gaussian Images [14], Spin-Images [15], etc. Experiment
proved that the performances of these shape descriptors are not very good in general.
3. Definition of Robust Spherical Functions 3.1. Basic Idea of our Method In this section, we propose a novel representation of 3D geometry using delta functions. Similar to the raycasting method [9], a collection of rays are projected in the directions per longitude and latitude π 2π (θ , ϕ ),θ = (i + 0.5) , ϕ = ( j + 0.5) , i, j = 0,1,..., N − 1 . i
j
i
N
j
N
This is illustrated in Figure 1. If the spherical functions are defined at different radii [3, 4, 9], the surface of a model may shift between different shells under perturbation, which is demonstrated in Figure 2. On the left part, the ring identifies the surface of a spherical model. All meshes are located in the outermost shell. After SHT, the energy of the model is totally contained in the outermost spherical function. On the right part, the curve shows the surface of a sphere with tubers as noises. In the frequency domain after SHT, the outermost shell only contains a small part of the energy, while the adjacent shell with a smaller radius contains most of the energy. Since L2-norm, which calculates the squared sum of the difference of energies at each frequency, is used for measuring the dissimilarity between feature vectors, the two models are considered totally different. This is obviously different from human perception. Therefore, the definition of spherical functions in shells is not robust enough. Generally, for a normal model, its surfaces in a shell are scattered small pieces, and the sampling rays are at discrete locations. This is the reason why the approximation for each piece leads to considerable distortions. Reducing the number of shells would alleviate the instability. However, the chance of a ray intersecting the surface with multi-points in one shell increases. A tradeoff has to be made between the instability and ambiguity. Vranic [9] suggests using 8 shells for the definition of spherical functions. Moreover, if we only record rotationally invariant feature vectors of the spherical functions defined on their corresponding shells, there exist rotation freedoms which should be fixed between any two of the functions. By using our method, this problem can be solved by preserving the inter-shell information. Spherical parametrization method [16] offers a powerful tool for decomposing a genus-zero model to its spherical harmonics. However, the method can not process models with arbitrary genus automatically, and can not handle models in the common repository robustly since many of these models may have topology errors that lead to the remeshing process uncontrollable.
with meshes, with the distances to the centroid of the model being d1 , d 2 , d3 , d 4, d5 . The distribution of the intersections is represented as summation of Delta functions M f ( x) = w(d )δ ( x − d ) .
∑ i =1
Here
i
i
⎪⎧⎪ 0, x ≠ 0 ⎪ , δ(x)= ⎪⎨ ε ⎪⎪∫ δ(x)dx = 1 ⎪⎪⎩ −ε
where ε is an arbitrary positive number, and M is the number of intersections of a ray with meshes. w(di ), i = 1, 2...M , are the weights of the Delta functions. In Figure 1. The Ray-Casting Method: A collection of rays are cast out from the centroid of a model, and intersections with the surface are utilized to characterize 3D geometry.
Figure 2. Definition of Spherical Functions in Shells: Noises on the surface will make the definition of spherical functions unstable.
Figure 3. Intersections of a Ray with Meshes: Each intersection corresponds to a Delta function, the summation of these functions characterize the geometry of a 3D model at a fixed direction. 3.2. Characterizing 3D Shape Using Delta Functions We propose using Delta functions to characterize the intersections of a projected ray with meshes. As shown in Figure 3, for example, there are five intersections of a ray
our implementation, we set w(d ) to be constant 1. In this scheme, geometric information is preserved without loss.
Figure 4. Robust Definition of Delta Functions: Intersections near the centroid may shift between two opposite rays under perturbation. Combining all the intersections in one function makes the method more robust to the displacement of shape centroid.
Figure 5. User Controlled Shape Matching: The distance between the queried model and each one in the database is defined as the difference between the colored parts of their feature matrices. λ ,η are used for specifying the scope of matched parts of feature vectors. However, this representation is still not robust enough under perturbations. As illustrated in Figure 4, deciding intersections in a specified direction belonging to ray (θ , ϕ ) or to ray (π − θ , π + ϕ ) is sensitive to the shape centroid. To avoid this problem, in our implementation,
we formulate the distribution of intersections in a specified direction (θ , ϕ ) as follows: M
N
i =1
j =1
f ( x) = ∑ w(di )δ ( x − di ) + ∑ w(l j )δ ( x + l j ) Here, M is the number of intersections of the ray (θ , ϕ ) , N the intersection number of the ray (π − θ , π + ϕ ) , and di , i = 1, 2,...M and l j , j = 1, 2...N the intersections with the rays with respectively.
directions (θ , ϕ ) and (π − θ , π + ϕ ) ,
Since Delta functions are ideal abstracts in mathematics, they are hard to represent and process in computer programs. A natural idea is to generate spherical functions as parametrization of a 3D model by sampling in the frequency domain. Hence, Fourier Transforms are utilized. In mathematics, the Fourier Transform of the Delta function δ ( x − d ) is formulated as
F( δ (x - d))= e-i2π dt . Then function G(t ) which represents the spectra of f ( x ) is derived as M
N
i =1
j =1
G (t ) = F ( f ( x)) = ∑ w(di )e− i 2π di t + ∑ w(l j )e
i 2π l j t
To ensure scale invariance, we use the method presented in [1, 2] to generate randomly distributed points on meshes. Then the scale of a model L is calculated as the averaged distance of these points to shape centroid. In this scheme, sampling scope in the frequency domain is defined as 1 . t ∈ (0, ) 2π L We regularly sample S nodes in the scope of the frequency domain k , k = 1, 2,...S . tk = 2π LS
In this scheme, S spherical functions are generated for a 3D model Rk (θ , ϕ ) = G(θ ,ϕ ) (tk ) , k = 1, 2,...S . 3.4. Retrieval with Controllable Acuity After the S spherical functions are defined, SHT can be utilized for generating shape descriptors for similarity measuring. As introduced in [3], a spherical function f (θ , ϕ ) could be decomposed into series of its harmonics l
f (θ , ϕ ) = ∑ ∑ al ,mYl m (θ , φ ), l = 0 m =− l
the frequency domain. In practice, Fast SHT methods [17] are used for deriving the coefficients in the first B frequencies by sampling a 2 B × 2 B grid per latitude and longitude for a spherical function B −1 l R (θ , ϕ ) ≈ a Y m (θ , ϕ ) .
∑∑
k
k ,l , m l
l = 0 m =− l
It is proved that the summation of the squared norms of coefficients { Ak ,l } in each frequency is a rotational invariant
3.3. Sampling in Frequency Domain
∞
where {al ,m } , l = 0,1,...., m = −l , −l + 1,....l are the coefficients in
{ A } , k = 1, 2,...S , l = 0,1,...B − 1, A k ,l
k ,l
l
∑
=
m =− l
ak2,l ,m .
Hence, the dissimilarity Ds between two shapes can be defined as the L2-norm of their feature matrices 2 S B −1 . Ds = ( A − A' )
∑∑ k =1 l = 0
k ,l
k ,l
In this definition, the quantified feature for each model is compact and in itself rotation invariant. However, much useful information in the coefficients is lost. Vranic [9] proposed to use Continuous Principal Component Analysis (CPCA) to translate a 3D model to its canonical pose to ensure rotation invariance. In fact, for models with distinct reflective symmetry, using the method in [18] for pose normalization is much robust. After aligning the posture for each model, the distance between two models can be formulated as Ds =
S
B −1
2
l
∑∑ ∑ (a k =1 l = 0 m =− l
k ,l , m
− ak' ,l , m )
Note that our definition of the spherical functions {Rk (θ ,ϕ )} , k = 1, 2.., S supports retrieval at different acuity levels while most of the previous approaches did not. In some applications, only models closest to the input are desirable; while in others, models which are roughly similar to the input are acceptable. Hence, the flexibility of our method benefits different requirements of retrieval. For compactness in restoring, in our current prototype system, we only record the rotationally invariant feature matrix for each model. To obtain more flexibility for online retrieval, the sampling process in the frequency domain of function G(t ) is taken at a larger scope tk =
k , k = 1, 2,...T , T = 2 .5 S . 2π LS
In the offline procedure of feature library building, more spherical functions are defined for each model since the sampling scope is wider. After the SHT, a rotation invariant vector is generated for each spherical function. Since feature matrix is composed of these vectors as its rows, the archived feature matrices have T rows and B columns, with B being the dimension of the vectors. At the online retrieval process, a user can select the frequency scope for the definition of spherical functions
by choice. In our implementation, two parameters λ and η are introduced for facilitating his selection. 3 λ k , k = 1, 2,...η S , λ ∈ (0, ) , η ∈ (0,1] . tk = + 4π L 2π LS λ is used for control the acuity level of the shape descriptor, and η is defined for specifying the number of feature vectors used for shape comparison. Then, the distance Ds between the queried model and a model in the database is defined as the root of the summation of squared distances between the common parts of their feature matrices as shown in Figure 5. λ
Ds =
2
η
( + ) L 2π L B −1
∑λ ∑ ( A
k=
l =0
k ,l
− Ak' ,l ) .
L
Previous methods can also allow a user to specify parts of the feature vector/matrix for shape matching. However, this would only lead to the efficacy of these methods decreased. For instance, in the binary spherical function method [3, 4, 10] or the ray-casting concentric spherical function method [9], each vector in the feature matrix characterizes the geometry of a model at a fixed radius. Discarding parts of their feature matrix will make the description of 3D shape insufficient. In our scheme, all row vectors of the feature matrix characterize the global feature of 3D geometry. However, sensitivity of these feature vectors to shape variation differs much: The vectors corresponding to larger t are much sensitive and vice versa. This is intuitively true since t identifies the frequency in the Fourier analysis. Therefore, adjusting the acuity of our shape descriptor is very easy. We just need to change the value of parameter λ . Now, we give a brief mathematical analysis below: Suppose two models A, B are similar. There are
k intersections of a ray (θ , ϕ ) with the surface of each model. The distances of these intersections to the shape centroids are: {a1 , a 2 ,...a k } , {b1 , b2 ,...bk } , respectively, which satisfy the inequality ai − bi < ε , 0 < ε 1 . The scope of sampling in the frequency domain is λ λ η . t ∈ (u , v), u = , v = + L L 2π L It is not hard to derive the inequality between the two functions Ga (t ) and Gb (t ) with the assumption that all weights w(di ) are set to be a constant 1: Ga (t ) − Gb (t ) < k (ei 2πε v − 1) , t ∈ (u , v) .
Since ε 1 , the above inequality can be approximated as Ga (t ) − Gb (t ) < 2π kε v , t ∈ (u , v) .
We can see from the last inequality that the acuity of our shape descriptor is proportional to λ when the models are similar since v ∝ λ .
4. Experimental Results In this section, we demonstrate some retrieval results which are implemented on our prototype 3D retrieval system. There are 740 CAD models in our database. Most of the models are ill defined, having irregular contours and complex inner structures. Our system works by first extracting the feature for each model in the database as an offline procedure. The rays are cast from the centroid of a model at 128 ×128 directions regularly distributed per longitude and latitude. Then the same number of functions is defined to represent the intersections for each direction. Finally, spherical functions are generated by sampling in the frequency domain of the functions at k , t = k
2π LS
where k = 1, 2,...T , T = 150 , S = 60 and L is the averaged distance of the points on the facets to the model centroid. We build the feature library by recording the rotationally invariant feature matrix { Ak ,l } , k = 1, 2,...150 , l = 0,1,...63 for each model. Then a user can specify the
parameters λ and η to customize the retrieval. If we set a threshold ε as the upper-limit of the distance to a queried model, the more insensitive shape descriptor generates more candidate outputs for user selection. Therefore, we recommend that the default setting of λ be 0. At the same time, η is set to be 1. The retrieval results are shown in Table 1. Obviously, our method performs quite well at most of the cases. However, in some occasions, the models that a user expects may not rank at the top of the output list. For traditional methods, we have no choice but just to search the whole database iteratively. In our scheme, we can simply enter new values for the parameters, and try the query once again. In Figure 6, we illustrate this process of finding a tea-pot from a flat cylinder as the input model. Generally, we observed that when λ is small, the method behaves better on geometrical structures than on geometrical details; when λ is large, the method becomes more sensitive to the details. For instance, a cube is among the top matches of a sphere when λ is small, since both of them are simple shapes with only one intersection of the rays at all directions with the meshes. When λ becomes larger, the cube no longer exists in the top matches of a sphere-liked model.
Table 1. Some Retrieval Results: We randomly selects a model in the database as the query, and then our system returns a list of outputs ranking on the degree of similarity to the input. In this table, we just show the top six matches. You can see the method with the default setting works quite well on our database which consists of 740 CAD models.
Figure 6. Retrieval with More Flexibility: If a user aims at finding a tea-pot from a flat cylinder through our 3D retrieval system with the default setting λ = 0,η = 1 , and suppose he just checks the top ten matches, the first time he may fail to get what he wants. Then, he can change the parameter η = 0.5 and re-performs the query. This time he succeeds in finding what he wants.
Figure 7. Precision-recall Curves Tested for 4 Shape Descriptors on Our Database.
We compared our method with many previous ones. In our database, the method with robust definition of spherical functions (RSF) performs better than the spherical extent function (SEF) [9], concentric spherical functions (CSF) [9], and Shape Distributions [1, 2]. In the implementation of the SD method, we sampled 50,000 point pairs and chose the number of distance bins as 60. The precision-recall curves are shown in Figure 7. We manually select as much as 282 3D models from our database; each model belongs to one of 27 large categories, respectively. Models do not belong to any of these categories are excluded from the Precision-recall test. The run-time of our method is about 3 sec for each query on a P4 2.4GHZ, 256M memory PC.
5. Conclusion and Future Work In this paper, we propose using a collection of Delta functions to represent 3D shape. Then, spherical functions are defined by regularly sampling a user specified range of Fourier spectra of the Delta functions. Finally, we use spherical harmonic series to approximate the spherical functions for obtaining a rotationally invariant manner. The advantage of our method is twofold. Primarily, it avoids the instability and ambiguity in specifying the intersections of rays with meshes at different concentric shells for the definition of spherical functions. Secondly, our method supports retrieval with more flexibility. One can select a range of spectra for shape comparison. Different selection strategies can benefit a wide range of applications. Although our method is more robust than previous ones, shortcoming of the shape descriptor can still be attributed to the dependence on the model centroid. Future work includes estimating the performance of our shape descriptor on large publicly available 3D shape repositories, which contains more general 3D models than our database. Moreover, we would like to design efficient methods for 3D model retrieval which are completely independent of the position of shape centroids.
Acknowledgements We would like to thank the invaluable comments of the reviewers. This work is partially supported by the NSFC grant (NO. 60333010), the President’s foundation of Peking University for undergraduates.
References [1] R. Osada, T. Funkhouser, B. Chazelle and D. Dobkin, “Matching 3D models with shape distributions.” Proc. Int. Conf. on Shape Modeling and Applications, pp. 154-166, 2001.
[2] R. Osada, T. Funkhouser, Bernard Chazelle, and David Dobkin, “Shape distributions,” ACM Trans. on Graphics, 21(4), pp. 807-832, 2002. [3] M. Kazhdan, T. Funkhouser, and S. Rusinkiewicz, “Rotation invariant spheral harmonic representation of 3D shape descriptors,” Proc. Symp. on Geometry Processing, pp. 156-164, 2003. [4] M. Kazhdan and T. Funkhouser, “Harmonic 3D shape matching,” SIGGRAPH 2002, Technical Sketch, 2002. [5] X. Liu, R. Sun, S. Kang and H. Shum, “Directional histogram model for three-dimensional shape similarity,” Proc. IEEE. CVPR, Volume1, pp.18-20, 2003. [6] M. Hilaga, Y. Shinagawa, T. Kohmura and T. L. Kunii, “Topology matching for fully automatic similarity estimation of 3D shapes,” SIGGRAPH 2001, pp. 203–212, 2001. [7] H. Sundar, D. Silver, Gagvani and S. Dickinson, “Skeleton based shape matching and retrieval, Proc. Shape Modeling International 2003, pp. 130-143, 2003. [8] M. Ankerst, G. Kastenm¨uller, H.-P. Kriegel and T. Seidl, “3D shape histograms for similarity search and classification in spatial databases,” Proc. 6th Intl. Symp. on Advances in Spatial Databases, pp. 207–228, 1999. [9] D. V. Vranic, “An improvement of rotation invariant 3D shape descriptor based on functions on concentric spheres,” Proc. Int. Conf. on Image Processing (ICIP 2003), volume 3, pp. 757–760, 2003. [10] T. Funkhouser, P. Min, M. Kazhdan, J. Chen, A. Halderman, D. Dobkin and D. Jacobs, “A search engine for 3D models”, ACM Trans. on Graphics, 22(1), pp. 83– 105, 2003. [11] G. Burel, H. Henocq, “Three-dimensional invariants and their application to object recognition,” Signal Processing, 45(1), pp. 1-22, 1995. [12] M. Elad, A. Tal and S. Ar, “Content based retrieval of VRML objects- an iterative and interactive approach,” EG Multimedia, pp. 97–108, 2001. [13] B. Horn, “Extended Gaussian images,” Proceedings of the IEEE, 72(12), 1984, pp.1671–1686, 1984. [14] S. Kang and K. Ikeuchi, “Determining 3-D object pose using the complex extended Gaussian image,” Proc. IEEE. CVPR, pp. 580–585, 1991. [15] A. E. Johnson and M. Hebert, “Using spin-images for efficient multiple model recognition in cluttered 3-d scenes,” IEEE Trans. PAMI, 21(5), pp: 433–449, 1999. [16] E. Praun, H. Hoppe, “Spherical parametrization and Remeshing,” ACM. Trans. on Graphics, 22(3), pp. 340349, 2003. [17] D. Healy Jr., D. Rockmore, P. Kostelec and S. Moore, “FFTs for the 2-sphere - improvements and variations,” The Journal of Fourier Analysis and Applications, 9(4), pp. 341 – 385, 2003. [18] M. Kazhdan, B. Chazelle, D. Dobkin, A. Finkelstein and T. Funkhouser, “A reflective symmetry descriptor,” Proc. ECCV, pp. 642–656, 2002