A Perceptual Shape Descriptor
Nafiz Arica and Fatos T. Yarman Vural
Computer Engineering Department, Middle East Technical University, Ankara, Turkey
{nafiz,vural}@ceng.metu.edu.tr

Abstract
In this study, we represent the two-dimensional object silhouette by a one-dimensional descriptor that preserves the perceptual structure of its shape. The proposed descriptor is based on the moments of the angles between the bearings of a point on the boundary, in a set of neighborhood systems. At each point on the boundary, the angle between a pair of bearings is calculated to extract the topological information of the boundary in a given locality. The proposed method does not use any heuristic rule or empirical threshold value in the shape representation. The similarity between the patterns is measured by elastic matching of the descriptors. The proposed shape descriptor is tested on the data set of MPEG-7 Core Experiments Shape-1. The experiments show better results than the previous studies reported in the literature.
1. Introduction
Advances in MPEG-7 activities for content-based description of images have reached the point of representing shape information in a standard format, which allows searching and browsing images in a database with respect to shape. The goal of a shape descriptor is to uniquely characterize the object shape. The descriptor should be affine-invariant and insensitive to noise. It should contain sufficient information to distinguish distinct images, yet be compact enough to ignore redundancies in the shapes. Additionally, the descriptor should give results consistent with the human visual system. Therefore, it is desirable to find a shape descriptor that represents the perceptual shape information in a simple function.

Shape description methods can be divided into three main categories [1]: contour-based ([2], [3], [4]), image-based ([5], [1]) and skeleton-based [6] descriptors. In the Core Experiment CE-Shape-1 for the MPEG-7 standard, the performance of the above-referenced shape descriptors is evaluated. Among them, the descriptors proposed in [2] and [3] outperform the others in most of the experiments. In [2], the scale-space approach is applied to the shape boundary. The simplified contours are obtained by scale-space curve evolution based on contour smoothing with a Gaussian function. The scale-space image of the curvature function is then used as a hierarchical shape descriptor. In [3], the shapes are simplified by a process of curve evolution and represented in tangent space. To compute the similarity measure, the best possible correspondence of visual parts is established. However, none of the available techniques fully satisfies affine invariance, and all have limited performance under noise. Therefore, more research is required to exploit the shape information fully.

The motivation of this paper is to convert the closed boundary of an object to an open boundary by breaking it at one point and gently laying it out along a straight line without disturbing its topological properties. This requires a mapping that transforms the 2-D shape information into a 1-D single-valued function consistent with the human visual system. In other words, this function should represent all the convexities and concavities of the shape with minimal distortion. The proposed descriptor is based on the concept of bearing, which indicates the direction in which a point lies from a reference point. A pair of bearings forms a pair of arms joining at the reference point. The characteristics of each boundary point can be extracted by using the bearings in a set of neighborhood systems. The angle between each pair of bearings is taken as the random variable at each point on the boundary. Then, the moment theorem provides all the statistical information, which is used to obtain the desired function, where each valley and hill corresponds to a concave and convex visual part of the object shape, respectively. After the representation of the 2-D shape boundary as a 1-D function, elastic matching is employed for similarity measurement. The main contribution of this study is to eliminate the use of any heuristic rule or empirical threshold value in the representation of shape boundaries.
The proposed method also gives globally discriminative features to each boundary point by using all other boundary points. Another advantage of this representation is its simplicity, combined with consistency with human perception through preserving the concave and convex parts of the shapes. It provides a compact representation of shapes, which is quite stable under noise and affine transforms and invariant to the size, orientation and position of the object.

After a detailed explanation of the shape representation in section 2, the similarity measurement is described briefly in section 3. The similarity tests performed on a data set that is also used in the MPEG Core Experiments are explained in section 4. Finally, section 5 concludes the paper and gives directions for future studies.

1051-4651/02 $17.00 (c) 2002 IEEE
2. The Representation
Let the shape boundary $B = \{p_1, \dots, p_N\}$ be represented by a connected sequence of points,

$$p_i = (x_i, y_i), \quad i = 1, \dots, N \qquad (1)$$

where $N$ is the number of boundary points and $p_i = p_{i+N}$. When processing a point $p_i$ in the sequence, the algorithm takes the pair of arms constructed by the subsequent and previous points at a distance $k$ from the point under consideration. The arms at each point of the boundary can be represented by the forward ($V_{i+k}$) and backward ($V_{i-k}$) vectors for a positive integer $k$, as follows:

$$V_{i+l} = (x_{i+l} - x_i,\; y_{i+l} - y_i) = (\Delta x_{i+l},\; \Delta y_{i+l}), \quad l = \pm k \qquad (2)$$

The slope of each vector $V_{i-k}$ and $V_{i+k}$ is then calculated as

$$\theta_{V_{i+l}} = \tan^{-1} \frac{\Delta y_{i+l}}{\Delta x_{i+l}}, \quad l = \pm k \qquad (3)$$

For the point $p_i$, the angle between the forward and backward vectors with length $k$ is then computed as (see figure 1.b)

$$C_K(i) = \left( \theta_{V_{i+k}} - \theta_{V_{i-k}} + \pi \right) \qquad (4)$$

Figure 1. The angle between forward and backward vectors with k=5.

For a fixed k, the plot of $C_K(i)$ versus i represents a one-dimensional function, where i indicates the position along the boundary with respect to a starting point. In figure 2, $C_K(i)$ is plotted for a given shape with various k values. Note that the plots for small k show fine details. By increasing k, it is possible to obtain a fine-to-coarse representation of a closed boundary in one dimension. In such a representation, the major problem is to select the best k, which represents the shape information in sufficient detail to distinguish distinct shapes in large databases. It is quite clear that k defines the resolution of the representation. Therefore, the selection of k depends on the image types and the variety of the images in the database.
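The angle of equation (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' C implementation: the function name `bearing_angles` is hypothetical, `math.atan2` replaces the plain arctangent of equation (3) so that the quadrant of each vector is kept, and the result is wrapped into [0, 2π), which the paper does not state explicitly.

```python
import math

def bearing_angles(points, k):
    """Return C_K(i) for every point of a closed boundary.

    points: list of (x, y) tuples ordered along the boundary.
    k: arm length in boundary samples (positive integer).
    """
    n = len(points)
    angles = []
    for i, (x, y) in enumerate(points):
        # Indices wrap around because the boundary is closed (p_i = p_{i+N}).
        fx, fy = points[(i + k) % n]   # forward point p_{i+k}
        bx, by = points[(i - k) % n]   # backward point p_{i-k}
        # Assumption: atan2 instead of tan^-1(dy/dx), to avoid division by zero
        # and to keep the quadrant of each arm.
        theta_f = math.atan2(fy - y, fx - x)
        theta_b = math.atan2(by - y, bx - x)
        # Equation (4); the modulo wrap into [0, 2*pi) is an assumption.
        angles.append((theta_f - theta_b + math.pi) % (2 * math.pi))
    return angles
```

With this convention and a counter-clockwise boundary, a point on a straight segment yields 0 and a right-angle convex corner yields π/2. Calling the function for several values of k gives a family of curves that can be compared with the plots of figure 2.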
Figure 2. (a) A sample shape boundary and $C_K(i)$ for (b) k=N/40, (c) k=N/10, (d) k=N/4, and (e) $C_K(i)$ plots for k=N/40…N/4, from top to bottom.

At this point, we need a stable measure that provides a consistent descriptor for extracting the shape information in an image database with sufficient detail. Although it is possible to estimate an "optimum" value for k using popular training methods, this approach requires a complete data set for the training process, which poses practical difficulties. In this study, for each boundary point i, we take C(i) as a random variable and propose to calculate its moments. For this purpose, the m-th moment of the random variable C(i) is calculated from

$$E[C^m(i)] = \sum_{K} C_K^m(i)\, P_{C(i)}(C_K(i)), \quad m = 0, 1, 2, 3, \dots \qquad (5)$$
In the above formula, $E$ denotes the expected value operator and $P_{C(i)}(C_K(i))$ is the probability density function of $C_K(i)$. Note that the maximum value of K is N/2, where N represents the number of boundary points. Note also that the value of $C_K(i)$ becomes sufficiently small as K approaches N/2, and it becomes 0 at K=N/2. The moments describe the statistical behavior of the random variable C(i) at the boundary point i. Each boundary point is then represented by the vector whose components are the moments of C(i):

$$\Gamma(i) = \left( E[C^1(i)],\; E[C^2(i)],\; \dots \right) \qquad (6)$$

For simplicity, the j-th moment of C(i), $E[C^j(i)]$, will be denoted by $\Gamma_j(i)$ and referred to as the j-th component of the point descriptor throughout the paper. The representation of each boundary point by the above vector converts the 2-D shape information into a multi-valued 1-D function. In figure 3, the first three moments of the boundary points are plotted for a sample shape. For this particular example, it should be noted that the first-order moment suffices to represent the convexities and concavities of the shape. The second-order moment increases the discriminative power of the representation. The third-order moment, on the other hand, does not bring significant additional information to the representation.
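Equations (5) and (6) can be illustrated with the following Python sketch. It rests on assumptions that go beyond the text: the probability $P_{C(i)}$ is taken as uniform over K = 1, …, N/2, so that each moment reduces to a sample mean; the angle is computed with `math.atan2` and wrapped into [0, 2π); and the names `bearing_angle` and `point_descriptor` are hypothetical.

```python
import math

def bearing_angle(points, i, k):
    """C_K(i) of equation (4) for one boundary point (atan2 and wrap are assumptions)."""
    n = len(points)
    x, y = points[i]
    fx, fy = points[(i + k) % n]
    bx, by = points[(i - k) % n]
    theta_f = math.atan2(fy - y, fx - x)
    theta_b = math.atan2(by - y, bx - x)
    return (theta_f - theta_b + math.pi) % (2 * math.pi)

def point_descriptor(points, orders=(1, 2)):
    """Return, for each boundary point i, the tuple of moments Gamma_j(i), j in orders."""
    n = len(points)
    ks = range(1, n // 2 + 1)      # K runs up to its maximum value N/2
    weight = 1.0 / len(ks)         # assumption: uniform P over all K
    descriptor = []
    for i in range(n):
        angles = [bearing_angle(points, i, k) for k in ks]
        # Equation (5) with uniform weights: E[C^m(i)] = mean of C_K(i)^m over K.
        descriptor.append(tuple(weight * sum(c ** m for c in angles) for m in orders))
    return descriptor
```

Plotting the first component of each tuple against i gives one curve per shape, with no arm length k left to tune. A different choice of $P_{C(i)}$, for example a histogram estimate, would change only the weighting inside the sum.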
Figure 3. First three moments of C(i) for the shape in figure 2.a.

The shape boundary is finally represented by plotting the moments of C(i) for all the boundary points. In the proposed representation, the first moment preserves the most significant information of the object. Figure 4.c shows the correspondence of each convex part with the 2-D shape information, yielding a representation consistent with the human visual system. The representation explained above allows us to compare any shapes as long as they are specified by simple closed curves. It is invariant to the size, orientation and position of the object and stable under distortions and shear transform (see figure 4.b). Figure 4.c shows that the representation of the shape in figure 2 is very similar to that of its sheared and rescaled version.

Figure 4. (a) Convex visual parts in a sample shape, (b) its sheared and rescaled version, and (c) the correspondence of visual parts and a comparison of the two representations.

3. Similarity Measurement
Elastic matching is one of the most powerful computational approaches for measuring the similarity of shape boundaries. It is an application of dynamic programming in which the objective is to minimize the distance between two patterns by allowing deformations of the patterns. Through elastic deformation, it promises to approximate human ways of perceiving similarity and to possess a remarkable robustness to distortions. Given two patterns, the elastic matching algorithm measures the distance based on the correspondence of the items that constitute the patterns. The algorithm attempts to minimize the total cost of matching items in the two patterns. The cost of matching two items is calculated by a distance function defined in the feature space of the items. The most common distance measure is based on the Euclidean metric. Given a choice of cost function, the minimization problem can then be solved by dynamic programming. In this study, we used the elastic matching algorithm to measure the similarity between two shape descriptors, each defined by a multi-valued one-dimensional function. Since the shape boundary descriptors are ordered strings, matching means a monotone correspondence between the points of two strings, taking endpoints to endpoints.
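The matching described above can be sketched as a standard dynamic-programming recurrence. The paper does not give the exact recurrence or step constraints, so the code below assumes the classic dynamic time warping form with a Euclidean item cost; the function name `elastic_distance` is hypothetical.

```python
import math

def elastic_distance(a, b):
    """Minimum total matching cost between two descriptor sequences.

    a, b: non-empty sequences of feature tuples, e.g. (Gamma_1(i), Gamma_2(i)).
    The correspondence is monotone and maps endpoints to endpoints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = cost of the best monotone matching of a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two items in feature space.
            cost = math.sqrt(sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1])))
            # Assumed step pattern: match both items, or stretch either sequence.
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]
```

The table has (n+1)(m+1) entries, so the cost is quadratic in the number of boundary samples. Handling an unknown starting point on a closed boundary would need an extra step, such as trying cyclic shifts of one sequence, which this sketch leaves out.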
4. Experiments
The experiments are performed on the data set of MPEG Core Experiments Shape-1 part B, which is the main part of the Core Experiment CE-Shape-1 for similarity-based retrieval systems. The database contains 1400 images in 70 classes of various shapes, each of which consists of 20 images. During the experiments, each image is used as a query image, and the number of similar images (those belonging to the same class) is counted in the top 40 matches. Since the maximum number of correct matches for a single query image is 20, the maximum total number of correct matches is 28000. The experiments are performed on the 1400 test images, in the C programming language on a Unix workstation. The boundaries of objects are represented by the proposed shape descriptor as a 1-D function. 100 equally spaced boundary points are used to extract the feature vector. The C(i) values, which lie between 0 and 2π for the boundary points, are used as they are. The coded boundaries are used as shape descriptors. Then, the elastic matching algorithm is employed for similarity measurement. The proposed descriptor correctly matches 80.7% of the images. In Table 1, the proposed descriptor is compared with the recently reported results of [2], [3], [4], [5], [6] and [7]. As seen from the table, the proposed descriptor performs better than the best-performing descriptors available in the literature for the data set of MPEG CE Shape-1 part B.
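The retrieval rate described above can be computed as in the following Python sketch. It assumes a precomputed square matrix of pairwise distances and a list of class labels; the function name `retrieval_rate` and its parameters are hypothetical, and the query image is counted among its own matches, which is what makes 20 the per-query maximum.

```python
def retrieval_rate(distances, labels, top=40, class_size=20):
    """Fraction of correct matches found in the `top` nearest results.

    distances: square matrix, distances[q][j] = distance from query q to image j.
    labels: class label of each image.
    """
    n = len(labels)
    hits = 0
    for q in range(n):
        # Rank all images, including the query itself, by distance to the query.
        ranked = sorted(range(n), key=lambda j: distances[q][j])[:top]
        hits += sum(1 for j in ranked if labels[j] == labels[q])
    # Denominator: n * class_size, i.e. 1400 * 20 = 28000 for part B.
    return hits / (n * class_size)
```

With 1400 images and 20 images per class, the denominator equals the 28000 possible correct matches mentioned in the text.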
TABLE 1: BEST PERFORMANCES ON MPEG CE SHAPE-1 PART B.
Method   [7]     [3]     [2]     [5]     [4]     [6]   Our method
%        76.51   76.45   75.44   70.22   67.76   60    80.7

For this particular data set, Table 1 shows that our method performs better than the best published studies in the literature.

5. Conclusion
This study presents a robust shape descriptor for identifying similar objects in an image database. The two-dimensional object silhouette is represented by a one-dimensional function, which preserves the most significant topological properties using statistical information based on the bearings of individual points. This approach avoids the selection of a threshold value to represent the resolution of the boundary, and thus eliminates the dependency of the representation on the data set. The proposed method is based on the moments of the angles between the bearings of a point on the boundary, in a set of neighborhood systems. An elastic matching algorithm is used to measure the similarity distance. The method successfully retrieves similar shapes in an image database. It is insensitive to the choice of the initial point on the shape boundary. It is invariant to shape size and position. It is also stable under noise and shear transform.

References
[1] L. J. Latecki, R. Lakamper, U. Eckhardt, "Shape Descriptors for Nonrigid Shapes with a Single Closed Contour", Proc. IEEE Conf. CVPR, June 2000.
[2] F. Mokhtarian, S. Abbasi, J. Kittler, "Efficient and Robust Retrieval By Shape Content Through Curvature Scale Space", Image Databases and Multimedia Search, A. W. M. Smeulders and R. Jain ed., p.51-58, World Scientific Publication, 1997.
[3] L. J. Latecki, R. Lakamper, "Shape Similarity Measure Based on Correspondence of Visual Parts", IEEE Trans. PAMI, vol.22, no.10, p.1185-1190, 2000.
[4] G. Chuang, C.-C. Kuo, "Wavelet Descriptor of Planar Curves: Theory and Applications", IEEE Trans. Image Processing, vol.5, p.56-70, 1996.
[5] A. Khotanzad, Y. H. Hong, "Invariant Image Recognition By Zernike Moments", IEEE Trans. PAMI, vol.12, p.489-497, 1990.
[6] L.-J. Lin, S. Y. Kung, "Coding and Comparison of Dags as a Novel Neural Structure With Application To Online Handwritten Recognition", IEEE Trans. Signal Processing, 1996.
[7] S. Belongie, J. Malik, J. Puzicha, "Shape Matching and Object Recognition Using Shape Contexts", IEEE Trans. PAMI, vol.24, no.4, p.509-522, 2002.