Variational Shape Matching For Shape Classification and Retrieval

Kamal Nasreddine, Abdesslam Benzinou
Ecole Nationale d’Ingénieurs de Brest, laboratoire RESO - 29238 BREST (FRANCE)

Ronan Fablet
Telecom Bretagne, LabSTICC - 29238 BREST (FRANCE)

Abstract

In this paper we define a distance between shapes based on geodesics in shape space. The proposed distance, robust to outliers, uses shape matching to compare shapes locally. Multiscale analysis is introduced in order to avoid problems of local and global variabilities. The resulting similarity measure is invariant to translation, rotation and scaling independently of constraints or landmarks, but constraints can be added to the formulation when needed. An evaluation of the proposed approach is reported for shape classification and shape retrieval on a complex benchmark shape database. In both cases, the proposed approach outperforms previous work.

Key words: Shape classification, shape retrieval, contour matching, shape geodesics, multiscale analysis, robustness.
Email addresses: {nasreddine,benzinou}@enib.fr (Kamal Nasreddine, Abdesslam Benzinou),
[email protected] (Ronan Fablet)
Preprint to be submitted to Pattern Recognition Letters
November 13, 2009
1. Introduction and related work

This work is concerned with the definition of a robust distance between shapes based on shape geodesics. The proposed distance is shown to serve for shape classification and shape retrieval. Recently, object recognition has been studied extensively in computer vision and has made significant progress, but current techniques do not provide entirely satisfactory solutions [Daliri and Torre, 2008; Veltkamp and Hagedoorn, 2001].
Regarding shape analysis and classification, similarity measures may be defined from information extracted from the whole area of the object (region-based techniques) [Kim and Kim, 2000], or from features which describe only the object boundary (boundary-based techniques) [Costa and Cesar, 2001]. The latter category may also comprise skeleton description [Lin and Kung, 1997; Sebastian and Kimia, 2005]. Skeleton description of shapes is less sensitive to articulation than boundary and region descriptions, but at the cost of a higher computational complexity due to tree or graph matching [Sebastian and Kimia, 2005; Sebastian et al., 2003]. On the other hand, boundary-based object description is considered more important than region-based description because an object’s shape is mainly discriminated by its boundary. In most cases, the central part of the object contributes little to shape recognition.
The boundary-based approach described in this paper relies on a comparison between matched contours. Contour matching has already been widely applied to object recognition based on shape boundary [Diplaros and Milios, 2002]. In general, contour matching methods are divided into two major classes: those based on rigid transformations, and those based on non-rigid deformations [Veltkamp and Hagedoorn, 2001]. Methods of the first type look for optimal parameters which align feature points, assuming that the transformation is composed of translation, rotation and scaling only. They may lack accuracy. Methods based on elastic deformations rely on the minimization of some appropriate matching criterion. They may present the drawback of an asymmetric treatment of the two curves and in many cases lack rotation and scaling invariance [Veltkamp and Hagedoorn, 2001]. Moreover, existing techniques typically take advantage of constraints specific to the application or use shape landmarks. These landmarks are generally defined as points of minimal or maximal shape curvature [Del Bimbo and Pala, 1999; Super, 2006], of zero curvature [Mokhtarian and Bober, 2003], at a given distance from specific points [Zhang et al., 2003], on convex or concave segments [Diplaros and Milios, 2002], or by any other criterion suited to the shapes involved.
Shape analysis from geodesics in shape space has emerged as a powerful tool to develop geometrically invariant shape comparison methods [Younes, 2000]. Using shape geodesics, we can state contour matching as a variational non-rigid formulation ensuring a symmetric treatment of curves. The resulting similarity measure is invariant to translation, rotation and scaling independently of constraints or landmarks, but constraints can be added to the formulation when needed. This paper extends the work presented in [Younes, 2000] to the tasks of shape classification and shape retrieval.
The contributions of our work can be summarized as follows:
− Geodesics in shape space have been introduced to develop efficient shape warping methods [Younes, 2000]. We exploit the corresponding similarity measure to define a new distance for shape classification and shape retrieval. This distance, issued from a shape matching procedure based on shape geodesics, takes advantage of local shape features while ensuring invariance to geometric transformations (e.g. translation, rotation and scaling). In addition, a hierarchical approach based on the resolution of the shape sampling is considered to deal with local and global variabilities.
− As optimization method, besides dynamic programming, which is generally used to solve variational problems, we propose a new minimization technique based on an incremental iterative scheme.
− To ensure more robustness against outliers, we introduce a robust criterion as a modification of the similarity measure issued from shape geodesics. Evaluation results show that this modification ensures a faster convergence of the iterative scheme and avoids convergence to a local minimum.
− We establish the superiority of the proposed method over state-of-the-art methods already used for shape classification and shape retrieval. The test is carried out on a complex benchmark shape database, part B of the MPEG-7 Core Experiment CE-Shape-1 data set [Jeannin and Bober, 1999]. This database is the largest and the most widely tested among available test shape databases.
The remainder of this paper is organized as follows. Section 2 details the proposed framework for shape matching based on shape space, from which a robust similarity measure between two shapes is derived. For numerical implementation, Section 3 describes a new optimization technique, based on an incremental iterative scheme, alongside dynamic programming, which is classically used for this purpose. Section 4 discusses the benefit of the proposed similarity measure for shape matching performance. Sections 5 and 6 present the derived multiscale distance proposed for shape classification and shape retrieval. Finally, in Section 7 we evaluate the proposed distance in shape classification and shape retrieval experiments on part B of the MPEG-7 shape database and compare the results to other state-of-the-art schemes.
2. Proposed contour matching

In this paper a boundary-based approach is considered. The comparison between shapes is based on a similarity measure using shape geodesics. The proposed similarity measure is exploited to define a new distance between shapes for classification and retrieval purposes. A multiscale analysis is performed to take into account both local and global differences between the shapes.
2.1. Shape geodesics

There are various ways to solve the shape matching problem, and many similarity measures have been proposed in the case of planar shapes [Veltkamp, 2001]. Recently, shape geodesics have emerged as a powerful tool [Younes, 2000]. They are widely used in analyses concerned with variations and changes in the shape of organisms, for instance in morphometrics and image warping.

{Figure 1 goes here}

{Figure 2 goes here}
Geodesics in the shape space are defined as paths between two shapes (Figure 1) with respect to some metric. This metric is chosen to be invariant under a given set of transformations (e.g. translation, rotation, scaling, ...). Most often, shapes are considered as points on an infinite-dimensional Riemannian manifold and distances between shapes as the lengths of minimal geodesic paths. Retrieving the geodesic path between any two closed shapes amounts to a matching problem (Figure 2) with respect to the considered metric. Let us consider two shapes $\Gamma$ and $\tilde{\Gamma}$ locally characterized by the angle between the tangent to the curve and the horizontal axis ($\theta$ and $\tilde{\theta}$ respectively). Following [Younes, 2000], the matching problem is stated as the minimization of a shape similarity measure given by:

$$SM(\theta, \tilde{\theta}(\phi)) = 2 \arccos \int_{s \in [0,1]} \sqrt{\phi_s(s)} \, \cos\left(\frac{\theta(s) - \tilde{\theta}(\phi(s))}{2}\right) ds \tag{1}$$
where $s$ refers to the normalized curvilinear abscissa defined on $[0, 1]$, $\phi$ is a mapping function that maps the curvilinear abscissa on $\tilde{\Gamma}$ to the curvilinear abscissa on $\Gamma$, and $\phi_s = \frac{d\phi}{ds}$. The similarity measure considered here includes a measure of the difference between the two orientations $\theta$ and $\tilde{\theta}$, $\cos\left(\frac{\theta(s) - \tilde{\theta}(\phi(s))}{2}\right)$, and a term that penalizes the torsion and stretching along the curve, $\sqrt{\phi_s(s)}$.
Curve parametrization via the angle function $\theta(s)$ defined on the normalized arc-length $s$ yields a representation which complies with the expected invariance properties (translation and scaling). A translation of the curve has no effect on $\theta$, and a homothety with factor $\lambda$ has no effect on the normalized parameter $s$. Thus curves that are identical up to translation and homothety are represented by the same angle function $\theta(s)$. A rotation of angle $c$ transforms the function $\theta(s)$ into the function $\theta(s) + c$ modulo $2\pi$. To add rotation invariance, Younes proposes to minimize $SM(\theta, \tilde{\theta}(\phi))$ (Equation 1) over all choices for the origins of the curve parametrizations.
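The invariance properties stated above can be checked numerically. The following is a minimal sketch, assuming the contour is given as a closed polygon; the function name `angle_function` and the mid-interval sampling of $s$ are illustrative choices that do not come from the paper.

```python
import numpy as np

def angle_function(contour, n_samples=64):
    """Sample the tangent angle theta(s) of a closed polygon.

    contour: sequence of (x, y) vertices, first vertex not repeated.
    Returns (s, theta), with s the normalized arc-length in (0, 1).
    """
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # close the polygon
    edges = np.diff(closed, axis=0)               # one vector per edge
    lengths = np.hypot(edges[:, 0], edges[:, 1])
    # Normalized arc-length at the start of each edge (plus the end point 1).
    s_knots = np.concatenate([[0.0], np.cumsum(lengths)]) / lengths.sum()
    # Edge direction, unwrapped to avoid artificial 2*pi jumps.
    theta_edges = np.unwrap(np.arctan2(edges[:, 1], edges[:, 0]))
    # Mid-interval abscissae (an assumption of this sketch).
    s = (np.arange(n_samples) + 0.5) / n_samples
    idx = np.searchsorted(s_knots, s, side="right") - 1
    idx = np.clip(idx, 0, len(theta_edges) - 1)
    return s, theta_edges[idx]

# Usage: translation and scaling leave theta unchanged, rotation adds c.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
s, theta = angle_function(square)
moved = [(3 * x + 5, 3 * y - 2) for x, y in square]
c = 0.3
rotated = [(x * np.cos(c) - y * np.sin(c), x * np.sin(c) + y * np.cos(c))
           for x, y in square]
print(np.allclose(theta, angle_function(moved)[1]))
print(np.allclose(angle_function(rotated)[1] - theta, c))
```

Working with the unwrapped angle keeps the rotation offset constant along the curve, which is what makes the search over parametrization origins sufficient for rotation invariance.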
2.2. Robust variational formulation

Given two shapes $\Gamma$ and $\tilde{\Gamma}$ respectively encoded in $\theta(s)$ and $\tilde{\theta}(s)$, the matching problem comes down to the registration of two 1D signals. The registration consists in retrieving the transformation that best matches points of similar characteristics (Figure 2). Formally, it amounts to determining the transformation function $\phi(s)$ such that $\theta(s) = \tilde{\theta}(\phi(s))$. Here, we propose to state this problem as the minimization of an energy $E(\phi)$ involving a data-driven term, $E_D$, that evaluates the similarity between the reference and aligned signals, and a regularization term, $E_R$. The regularization term is introduced in order to obtain a smooth transformation function.

$$E(\phi) = (1 - \alpha) E_D(\phi) + \alpha E_R(\phi) \tag{2}$$

$$E_R(\phi) = \int_{s \in [0,1]} |\phi_s(s)|^2 \, ds \tag{3}$$
where $\alpha$ is a parameter that controls the regularity. By causality (the ordering of points along the curves must be preserved), the minimization of $E(\phi)$ has to be carried out under the constraint $\phi_s > 0$.

The similarity measure we propose to use is derived from the similarity measure based on shape geodesics proposed by [Younes, 2000] (Equation 1). To improve its robustness to outliers, we introduce a robust norm $\|\theta(s) - \tilde{\theta}(\phi(s))\|_\rho$ instead of the simple difference $(\theta(s) - \tilde{\theta}(\phi(s)))$. The principle is to use a function that adjusts a weight $\omega$ in order to penalize the data points with high variation compared to other points. Several forms of the robust estimator $\rho$ have been proposed [Black and Rangarajan, 1996]. We use the Leclerc estimator given by:

$$\|r\|_\rho = 1 - \exp\left(-\frac{r^2}{2\sigma^2}\right) \tag{4}$$

where $\sigma$ is the standard deviation of the data errors $r$. Using the above data-driven term in the functional $E(\phi)$ and after adding the robust estimator, the shape registration problem amounts to minimizing:

$$E(\phi) = (1 - \alpha) \arccos \int_{s \in [0,1]} \sqrt{\phi_s(s)} \, \cos\left(\frac{\|r(s)\|_\rho}{2}\right) ds + \alpha \int_{s \in [0,1]} |\phi_s(s)|^2 \, ds \tag{5}$$

where $r(s) = \theta(s) - \tilde{\theta}(\phi(s))$.

3. Numerical implementation

To minimize $E(\phi)$, two methods are considered: dynamic programming and an iterative scheme.
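Both methods need to evaluate the energy on sampled signals. The sketch below is one possible discretization of Equations 4 and 5 on a uniform grid, not the authors' implementation. The names `leclerc_norm` and `energy`, the finite-difference derivative, the linear interpolation of the second angle function and the fixed default `sigma` are all assumptions (the text takes σ as the standard deviation of the errors).

```python
import numpy as np

def leclerc_norm(r, sigma):
    """Leclerc robust norm of Equation 4: 1 - exp(-r^2 / (2 sigma^2))."""
    return 1.0 - np.exp(-np.square(r) / (2.0 * sigma ** 2))

def energy(theta, theta_tilde, phi, alpha=0.1, sigma=1.0):
    """Discretized E(phi) of Equation 5 on the grid s_i = i / n.

    theta, theta_tilde: angle functions sampled at s_i.
    phi: increasing mapping sampled at s_i, with values inside the grid range.
    sigma: scale of the robust norm (fixed here, an assumption of this sketch).
    """
    theta = np.asarray(theta, dtype=float)
    theta_tilde = np.asarray(theta_tilde, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n = len(theta)
    s = np.arange(n) / n
    phi_s = np.gradient(phi, 1.0 / n)             # finite-difference d(phi)/ds
    warped = np.interp(phi, s, theta_tilde)       # theta_tilde(phi(s))
    r = theta - warped
    integrand = np.sqrt(np.clip(phi_s, 0.0, None)) * np.cos(leclerc_norm(r, sigma) / 2.0)
    data_term = np.arccos(np.clip(integrand.mean(), -1.0, 1.0))
    regularization = np.mean(phi_s ** 2)
    return (1.0 - alpha) * data_term + alpha * regularization

# Usage: identical signals under the identity mapping leave only the
# regularization term, alpha * 1.
n = 100
s = np.arange(n) / n
theta = 2.0 * np.pi * s
print(energy(theta, theta, s, alpha=0.1))
```

With identical signals the data term vanishes, so the printed value is close to α; any mismatch between the two angle functions increases the energy.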
The dynamic programming algorithm is applied as follows. Given a discretization step and the discretized signals $\theta(s_i)_{i=1..N}$ and $\tilde{\theta}(\tilde{s}_j)_{j=1..M}$, the algorithm considers in the plane $[s_1, s_N] \times [\tilde{s}_1, \tilde{s}_M]$ the grid $G$ which contains the points $p = (x, y)$ such that either $x = s_i$ and $y \in [\tilde{s}_1, \tilde{s}_M]$, or $y = \tilde{s}_j$ and $x \in [s_1, s_N]$. We look for a continuous and increasing matching function that is linear on each portion that does not cut the grid. The value of the energy $E(\phi)$ is calculated at each point of the grid depending on the values at previous points, and the minimum is chosen. This procedure is iterated over all choices for the origins of the curves. The algorithm is described in more detail in [Trouvé and Younes, 2000].
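To illustrate the principle of accumulating costs over a grid, here is a deliberately simplified stand-in: classic dynamic time warping on two sampled angle functions, with a squared-difference local cost. It is not the grid algorithm of [Trouvé and Younes, 2000] and does not evaluate Equation 5; the function name `dp_match` and the local cost are assumptions.

```python
import numpy as np

def dp_match(theta, theta_tilde):
    """Monotone matching of two sampled signals by dynamic programming.

    Returns (total_cost, path), where path is a list of index pairs (i, j)
    that is non-decreasing in both i and j.
    """
    theta = np.asarray(theta, dtype=float)
    theta_tilde = np.asarray(theta_tilde, dtype=float)
    n, m = len(theta), len(theta_tilde)
    local = np.subtract.outer(theta, theta_tilde) ** 2
    cost = np.full((n, m), np.inf)
    cost[0, 0] = local[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Cheapest admissible predecessor (left, below, diagonal).
            prev = min(
                cost[i - 1, j] if i > 0 else np.inf,
                cost[i, j - 1] if j > 0 else np.inf,
                cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            cost[i, j] = local[i, j] + prev
    # Backtrack from the last grid point to the first one.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return cost[-1, -1], path[::-1]

# Usage: a signal and a stretched copy of it match at zero cost.
total, path = dp_match([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
print(total, path)
```

As in the procedure described above, rotation invariance would additionally require repeating the search over all choices of origin, for instance by circularly shifting one of the signals.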
Here we propose to use an incremental iterative minimization, which is shown to be computationally more efficient than dynamic programming in the case of registration without landmarks (see Section 4 for a comparison). At iteration $k$, given $\phi^k$, we solve for an incremental update $\phi^{k+1} = \phi^k + \delta\phi^k$ such that $\delta\phi^k = \arg\min_{\delta\phi} E(\phi^k + \delta\phi)$. The algorithm is initialized with the identity function, taken in turn for all choices for the origins of the curves. For each of these initializations, the algorithm iterates two steps:

1. the computation of the robust weights $\omega_i^k$ issued from the robust estimator $\rho$. For instance, the weight issued from the Leclerc estimator is $\omega_i^k = \frac{2}{\sigma^2} \exp\left(-\frac{r^2(s_i)}{\sigma^2}\right)$, where $r(s_i) = \theta(s_i) - \tilde{\theta}(\phi^k(s_i))$ and $\sigma$ is the standard deviation of the data errors $r$,

2. the estimation of $\delta\phi^k = \{\delta\phi^k(s_i)\}$ as successive solutions of the linearized minimization $\delta\phi^k = \arg\min_{\delta\phi} \sum_i E_i^k$. The key approximation of this linearization is $\tilde{\theta}(\phi^{k+1}) = \tilde{\theta}(\phi^k + \delta\phi^k) \approx \tilde{\theta}(\phi^k) + \tilde{\theta}_s(\phi^k) \cdot \delta\phi^k$. For $\alpha = 0$, the equation we obtain does not have a unique solution. The resulting $\delta\phi^k(s_i)$ for $\alpha \neq 0$ is given by:

$$
\begin{aligned}
\delta\phi^k(s_i) &= \frac{N(s_i)}{D(s_i)} \\
S(s_i) &= \sqrt{\phi^k(s_{i+1}) - \phi^k(s_{i-1})} \\
g(s_i) &= (1 - \alpha) \sin\left(\frac{\omega_i^k r(s_i)}{2}\right) \left[\tilde{\theta}(\phi^k(s_i)) - \tilde{\theta}(\phi^k(s_{i-1}))\right] \\
N(s_i) &= -S(s_i)\, g(s_i) \cos\left(\frac{\omega_i^k r(s_i)}{2}\right) + 2\alpha \left[2\phi^k(s_i) - \phi^k(s_{i-1}) - \phi^k(s_{i+1}) - \delta\phi^k(s_{i-1}) - \delta\phi^{k-1}(s_{i+1})\right] \\
D(s_i) &= \frac{1}{2} S(s_i)\, g^2(s_i) - 4\alpha
\end{aligned}
\tag{6}
$$
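The first of the two steps can be sketched in a few lines. The weight formula used below, $\omega = \frac{2}{\sigma^2}\exp(-r^2/\sigma^2)$, is our reading of the expression given in step 1 and should be treated as an assumption; the function name `leclerc_weights` and the guard against a zero standard deviation are also illustrative.

```python
import numpy as np

def leclerc_weights(r, sigma=None):
    """Robust weights for residuals r = theta(s_i) - theta_tilde(phi(s_i)).

    If sigma is None it is estimated as the standard deviation of r.
    """
    r = np.asarray(r, dtype=float)
    if sigma is None:
        sigma = r.std()
    sigma = max(float(sigma), 1e-12)   # guard added for this sketch only
    return (2.0 / sigma ** 2) * np.exp(-np.square(r) / sigma ** 2)

# Usage: the residual 2.0 is far from the others and gets a much smaller
# weight than the first three.
residuals = np.array([0.01, -0.02, 0.015, 2.0])
print(leclerc_weights(residuals))
```

Points whose residual is large relative to σ therefore contribute little to the increment computed in the second step.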
4. Shape matching performances

To study the impact of adding the robust criterion and the regularization term, we test the matching process on synthetic contours (one contour is obtained by applying a known transformation to the other one). Some examples of these synthetic shapes are given in Figure 3, together with a representation of the applied transformation function φ.

{Figure 3 goes here}
In Figure 4 we report the mean square error $MSE = E\left[|\theta - \tilde{\theta}(\phi)|^2\right]$ obtained for different values of α ∈ [0, 1]. This result is obtained with the dynamic programming algorithm. For high values of α, the regularity term is favored over the similarity measure and the alignment is attained with high values of MSE. For small values of α, the robust algorithm yields solutions with smaller errors, corresponding to $MSE_\phi = E\left[|\phi_{applied} - \phi_{estimated}|^2\right] \approx 0.001$. The gain¹ due to the robust solution is represented in Figure 4(b); this gain is maximal for α = 0 and reaches 90%. The aligned shapes given in Figures 4(c) and 4(d) show the superiority of the robust solution. The consistency of these results has been verified by testing many transformation functions with different shapes.

{Figure 4 goes here}
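The two quantities used in this comparison are simple to compute. The following sketch assumes the signals are already sampled on a common grid; the function names and the numerical values in the usage example are made up for illustration and are not results from the paper.

```python
import numpy as np

def mean_square_error(theta, theta_tilde_warped):
    """MSE between the reference signal and the aligned signal."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_tilde_warped, dtype=float)
    return float(np.mean(diff ** 2))

def robust_gain(mse_non_robust, mse_robust):
    """Relative MSE reduction, in percent, brought by the robust solution."""
    return 100.0 * (mse_non_robust - mse_robust) / mse_non_robust

# Usage with illustrative values: a tenfold MSE reduction is a 90% gain.
print(mean_square_error([0.0, 1.0, 2.0], [0.0, 1.0, 4.0]))
print(robust_gain(0.01, 0.001))
```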
Using the incremental iterative scheme, the minimization leads to the same optimum as dynamic programming except for α = 0 (Figure 5). For the iterative scheme the regularity term is necessary: α must have a nonzero value to lead to a unique solution. Experimentally, a value of α in the range [0.1, 0.2] is optimal.

{Figure 5 goes here}

{Figure 6 goes here}
In Figure 6, we report another synthetic test shape, obtained by applying an occlusion to the shape given in Figure 3(c). The results of its matching with the reference shape given in Figure 3(a) are reported in Figures 7 and 8. The robust algorithm copes better with the occlusion: it is still able to align the curves and to recover the applied transformation with minor errors. The transformation found by the non-robust algorithm (Figure 7(b)) is far from the true one (Figure 3(b)).

{Figure 7 goes here}

{Figure 8 goes here}
¹ The gain is defined as: $\frac{MSE_{NonRobust} - MSE_{Robust}}{MSE_{NonRobust}} \times 100$

The effect of the robust solution is more visible when we analyze the evolution of the algorithm through the initializations taken in turn for all choices for the origins of the curves. For initializations at points which are far from the correct solution, we have noticed that the mean square error MSE decreases through the iterations to attain the optimum with the robust algorithm, while it stays high with the non-robust algorithm. Without the robust estimator, the minimization converges to a local minimum. Hence, the robust algorithm can be carried out with only one initialization, for a single choice of the origins of the curves. For the synthetic shapes given in Figure 3, we report in Table 1 the optimal MSEs and the gain due to the robust solution for some initializations at points which are far from the correct solution, at different angles (30°, 45°, 90° and 135°). One can see that the robust algorithm always converges to the global minimum, in contrast to the non-robust one.
Finally, the iterative method is also computationally more efficient in the case of shapes without landmarks. Dynamic programming needs a relatively longer time. For example, for the synthetic contours considered here, this time reaches 9.7 times that required by the robust iterative scheme.

{Table 1 goes here}

5. Distance-based shape classification
Here, we exploit shape geodesics for shape classification and propose to compare shapes on the basis of a metric that takes shape matching into consideration. The similarity measure used in Equation 5 is taken as the cost of deformation of the aligned shape. On the basis of a general algebraic and variational framework, [Younes, 2000] has proved that the constructed cost function meets all the conditions necessary for a true distance between planar curves. Formally, the distance between two shapes $S_1$ and $S_2$ is defined as:

$$d(S_1, S_2) = E_D(S_1, S_2(\phi^*)) \quad \text{where} \quad \phi^* = \arg\min_{\phi} E(S_1, S_2, \phi) \tag{7}$$
In this work, a hierarchical characterization is obtained by combining shape matching at different sampling resolutions. Here, the scale is related to the resolution of shape sampling, as in [Attalla and Siy, 2005].
In order to avoid problems of local and global variabilities, the distance used for shape comparison is a combination of distances measured at different scales. The final distance between shapes $S_1$ and $S_2$ used for shape classification is defined as follows:

$$d(S_1, S_2) = \frac{1}{N} \sum_{k=1}^{N} d_k(S_1, S_2) \tag{8}$$

where $d_k$ is the distance defined in Equation 7 between the same shapes at the $k$-th scale and $N$ is the number of considered scales.
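Equation 8 is a plain average over scales. A minimal sketch follows, assuming a hypothetical callable `distance_at_scale(shape_a, shape_b, n_points)` that returns the single-scale distance of Equation 7 for shapes sampled with `n_points` points. The default scales are the sampling resolutions reported in Section 7.

```python
from statistics import fmean

def multiscale_distance(shape_a, shape_b, distance_at_scale,
                        scales=(32, 48, 64, 192)):
    """Average of the single-scale distances d_k over all scales (Equation 8).

    distance_at_scale is a user-supplied function (hypothetical here) that
    matches the two shapes at one sampling resolution and returns d_k.
    """
    return fmean(distance_at_scale(shape_a, shape_b, n) for n in scales)

# Usage with a dummy single-scale distance that only depends on the scale.
print(multiscale_distance("a", "b", lambda a, b, n: 1.0 / n))
```

Keeping the single-scale distance as a parameter lets the same averaging code be used with either of the two minimization schemes.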
Assuming we are provided with a set of categorized shapes $(S_l, C_l)$, where $S_l$ is the shape of the $l$-th sample in the database and $C_l$ its class, the classification of a new shape $S$ is obtained from a nearest neighbor criterion.
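The nearest neighbor criterion can be sketched as follows, assuming the database is a list of (shape, class) pairs and `distance` is any callable implementing the distance between two shapes. Both names, and the scalar stand-ins used in the example, are illustrative.

```python
def classify_nearest_neighbor(query, database, distance):
    """Return the class of the database shape closest to the query.

    database: iterable of (shape, label) pairs.
    distance: callable taking two shapes and returning a non-negative number.
    """
    _, label = min(database, key=lambda item: distance(query, item[0]))
    return label

# Usage with numbers standing in for shapes and |x - y| as the distance.
database = [(0.0, "apple"), (1.0, "apple"), (10.0, "bat")]
print(classify_nearest_neighbor(9.0, database, lambda x, y: abs(x - y)))
```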
6. Distance-based shape retrieval

In addition to shape classification performance, we also address shape retrieval [Del Bimbo and Pala, 1999]. A retrieval problem consists in determining which shapes in the considered database are the most similar to a query shape. The classification accuracy of a shape descriptor does not necessarily give a relevant indication of its retrieval efficiency [Kunttu et al., 2006]. As for classification, the distance used for shape retrieval is the distance defined in Equation 8.
7. Comparison to other schemes

To compare the proposed approach to state-of-the-art shape recognition approaches, we evaluate it in shape classification and retrieval experiments on part B of the MPEG-7 shape database [Jeannin and Bober, 1999]. This database is composed of a large number of different types of shapes: 70 classes of shapes with 20 examples of each class, for a total of 1400 shapes. The classes of shapes include natural and artificial objects. Shape recognition on this database is not simple because some classes contain outliers, i.e. samples that are visually dissimilar from other members of their own class (Figure 9). Furthermore, there are shapes that are highly similar to examples of other classes (Figure 10).

{Figure 9 goes here}

{Figure 10 goes here}
We do not discuss edge detection here; it is a standard step in image analysis. The shape outlines of the dataset are extracted automatically using the Matlab image processing toolbox².

² Website: http://www.mathworks.com/products/image/

To be invariant to flip transformations, the optimal matching between two shapes results from Equation 5, where matching costs are computed between the first shape and the second shape, flipped or not.
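Flip invariance as described above amounts to keeping the smaller of two matching costs. The sketch below assumes contours are arrays of (x, y) points and that `distance` is any shape distance. Mirroring about the vertical axis and reversing the traversal order is one possible way to build the flipped contour, chosen here for illustration.

```python
import numpy as np

def flip_contour(contour):
    """Mirror a contour about the y axis and keep its traversal orientation."""
    pts = np.asarray(contour, dtype=float)
    mirrored = pts * np.array([-1.0, 1.0])   # x -> -x
    return mirrored[::-1]                    # mirroring reverses orientation

def flip_invariant_distance(shape_a, shape_b, distance):
    """Smaller of the costs obtained with shape_b as given and flipped."""
    return min(distance(shape_a, shape_b),
               distance(shape_a, flip_contour(shape_b)))

# Usage with a naive point-wise distance, for illustration only.
def pointwise(a, b):
    return float(np.abs(np.asarray(a, float) - np.asarray(b, float)).sum())

triangle = [(0, 0), (2, 0), (0, 1)]
print(flip_invariant_distance(triangle, flip_contour(triangle), pointwise))
```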
Shape representation is given by points equally sampled along the boundary. Shape sampling at different scales with 32, 48, 64 and 192 points is considered.
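Equal sampling along the boundary can be obtained by linear interpolation in arc-length. This is a minimal sketch assuming the outline is available as a closed polygon; the function name `resample_contour` is illustrative.

```python
import numpy as np

def resample_contour(contour, n_points):
    """Return n_points points equally spaced in arc-length along a closed polygon."""
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])                   # close the polygon
    segment = np.hypot(*np.diff(closed, axis=0).T)       # edge lengths
    arc = np.concatenate([[0.0], np.cumsum(segment)])    # cumulative arc-length
    targets = np.arange(n_points) * arc[-1] / n_points   # equally spaced abscissae
    x = np.interp(targets, arc, closed[:, 0])
    y = np.interp(targets, arc, closed[:, 1])
    return np.column_stack([x, y])

# Usage: the same outline sampled at the four resolutions used in the text.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for n in (32, 48, 64, 192):
    print(n, resample_contour(square, n).shape)
```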
Classification rates are obtained with the leave-one-out method, where each shape in turn is left out of the training set and used as a query image. Retrieval accuracy is measured by the so-called Bull’s eye test [Jeannin and Bober, 1999]: for every image in the database, the top 40 most similar shapes are retrieved. At most 20 of the 40 retrieved shapes are correct hits. The retrieval accuracy is measured as the ratio of the total number of correct hits over all images to the highest possible number of hits, which is 20 × 1400.
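Both evaluation protocols can be computed from a precomputed matrix of pairwise distances. The sketch below assumes such a matrix `dist` and an array of class labels. It also assumes that the query itself counts among the retrieved shapes, which is how we read the statement that at most 20 hits are possible for classes of 20 shapes.

```python
import numpy as np

def bulls_eye_score(dist, labels, top=40, class_size=20):
    """Ratio of correct hits among the `top` nearest shapes, over all queries."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    hits = 0
    for q in range(len(labels)):
        nearest = np.argsort(dist[q], kind="stable")[:top]   # includes the query
        hits += int(np.sum(labels[nearest] == labels[q]))
    return hits / (class_size * len(labels))

def leave_one_out_accuracy(dist, labels):
    """Nearest neighbor classification rate with the query left out."""
    d = np.asarray(dist, dtype=float).copy()
    labels = np.asarray(labels)
    np.fill_diagonal(d, np.inf)               # a shape cannot match itself
    predicted = labels[np.argmin(d, axis=1)]
    return float(np.mean(predicted == labels))

# Usage on a toy set of four 1D "shapes" in two classes of two.
x = np.array([0.0, 0.1, 5.0, 5.1])
labels = np.array([0, 0, 1, 1])
dist = np.abs(x[:, None] - x[None, :])
print(bulls_eye_score(dist, labels, top=2, class_size=2))
print(leave_one_out_accuracy(dist, labels))
```

For the database used here, the defaults `top=40` and `class_size=20` give the denominator 20 × 1400 stated above.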
As mentioned in Section 4, the best shape matching in terms of mean square error is obtained for α = 0.1. The results of shape classification carried out on this database do not change significantly (±0.01%) when α is taken in the range [0.05, 0.2]. Note that the value of α intervenes in the convergence of the shape matching and not in the expression of the distance of Equation 8. In Figure 11 we report the variation of the correct shape classification rate with respect to α.

{Figure 11 goes here}

{Table 2 goes here}
The proposed approach based on shape geodesics has been compared to state-of-the-art schemes on the benchmark dataset, as reported in Table 2. The proposed approach outperforms the reported schemes with a correct classification rate of 98.86%, corresponding to a gain in correct classification rate between 0.3% and 17%. Regarding the Bull’s eye test, a score of 89.05% is reached. This is 1.35% higher than the best result reported previously. The highest scores of previous works are those of methods based on shape matching and/or hierarchical analysis; this fact supports the choices underlying the proposed approach.

In order to analyse the results presented in Table 2, we describe the methods listed in that table, specifying their similarities to and differences from the proposed method.
Zernike moments [Kim and Kim, 2000] are the most powerful moments for shape description among the region-based descriptors. They are orthogonal moments which represent the shape information optimally. However, the computation of Zernike polynomials remains difficult and complex. The shape representation is global in this approach, in the sense that each moment holds information about all shape points and the shape comparison is not spatially local.
The curvature scale space (CSS) is a boundary representation introduced in [Mokhtarian and Mackworth, 1986]. It is invariant under affine transforms. This method is based on finding inflection points on the curve at various levels of detail. The CSS representation uses multiple resolutions resulting from an iterated smoothing of the boundary. Compared to other tools, the CSS has a relatively low shape recognition accuracy and efficiency [Zhang and Lu, 2003]. Although this shape representation is local and multiscale, a key difference between this method and the one described in this paper is that the comparison between shapes in the CSS representation considers points of zero curvature only, not all points.
Many techniques based on Fourier descriptors have been proposed for shape recognition. The method proposed in [Arbter et al., 1990] transforms a parametrized boundary description into the Fourier domain to get a set of coefficients. These coefficients are normalized to eliminate dependencies on affine transformations and on the starting point. A multiscale Fourier-based approach has been proposed in [Kunttu et al., 2006] to improve the shape classification rate. Besides, elliptic Fourier descriptors [Nixon and Aguado, 2007] are among the robust boundary-based shape descriptors. Although some Fourier methods are multiscale and invariant under affine transformations, they remain global, as the corresponding descriptors are derived from a calculation including all points of the boundary, or of the entire object in the case of 2D Fourier descriptors.
Visual parts are used for shape matching in [Latecki and Lakamper, 2000]. This approach is boundary-based and uses a local representation of shapes. Here too, the comparison between shapes relies on shape matching, but the matching in this approach is a correspondence between convex/concave arcs of the studied boundaries.
Shape context [Belongie et al., 2002] was developed as a local descriptor for finding correspondences between point sets. A shape is represented by a discrete set of points sampled from the contour of the object. Given a set of points, the shape context captures the relative distribution (distance and orientation) of points in the plane relative to each point of the shape. Shape contexts have been used as attributes for a weighted bipartite matching problem. In order to improve the classification of articulated shapes, shape contexts have been modified [Ling and Jacobs, 2007] by considering the geodesic distance of the contour instead of the Euclidean distance. This object-based approach requires the definition of landmarks in the objects for the correspondence.
The inner-distance is defined as the length of the shortest path between shape landmarks. It has been used to characterize shapes in [Ling and Jacobs, 2007].
The recent technique of [Daliri and Torre, 2008] represents shapes using a string of symbols, and shape recognition is performed by operations on this string of symbols. It is a local boundary-based approach.
The shape tree approach [Felzenszwalb and Schwartz, 2007] is based on a hierarchical representation of the sampled points of the curves. A shape tree is constructed for each curve and the curves are matched by looking for a mapping from the points of one curve to the points of the other such that the shape tree of the curve is deformed as little as possible.
The fixed correspondence approach [Super, 2006] and the Racer algorithm [Super, 2003] are boundary-based methods with a local description of curves. Boundary matching is carried out in these approaches using key points: points of locally maximal or minimal curvature. These methods analyse shapes at a single scale. Chance probability functions [Super, 2006] are used to learn the classification process in order to improve recognition performance.
Wavelet transforms have been widely used in image analysis as multiscale tools [Chuang and Kuo, 1996]. For shape representation, wavelets are boundary-based local descriptors. However, they are not suitable for describing shapes because the corresponding descriptors are not rotation invariant [Yang et al., 1998].
The approach proposed in this paper is invariant to geometric transformations (translation, rotation and scaling) and exploits local shape features. In particular, high curvature points play a key role. This local setting also makes the use of landmarks simpler when needed. Landmarks are simply considered as points for which φ(s) is known. These landmarks can be detected automatically or set by experts depending on the application.
Another important property of the proposed metric, compared to others proposed for shape matching, is that it is symmetric: registering one shape on the other yields the same matching as the reverse registration. In both cases we look for the path of minimal deformation cost aligning the two shapes, which ensures a symmetric treatment of curves.
In Figure 12 we show images of some objects from different classes. These shapes are highly similar; their curvature differs at a small number of data points only. Experimentally, we notice that the use of the robust criterion leads to treating these data points as outliers. For example, if we focus on the 20 nearest neighbors of the samples of the class spoon, more than 50% are elements of the classes watch, pencil, key and bottle. However, if we use the similarity measure without the robust weights, 95% of the 20 nearest neighbors are of the same class, spoon. Using robust weights, the average retrieval accuracy is penalized by the low accuracies obtained for these 6 classes, but it remains higher than without the robust weights.

{Figure 12 goes here}
Future work will explore the combination of the proposed approach with kernel-based statistical learning. Recently, the authors of [Yang et al., 2008] proposed to combine classical metrics with learning through graph transduction. It has been shown that this approach yields significant improvements in retrieval accuracy. For example, the retrieval rate using the IDSC [Ling and Jacobs, 2007] is improved by 5.6% when combined with graph transduction learning.
Acknowledgement

The authors would like to thank Jean Le Bihan for fruitful discussions.
407
References
Arbter, K., Snyder, W., Burkhardt, H., Hirzinger, G., 1990. Application of affine-invariant Fourier descriptors to recognition of 3-d objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (7), 640–647.
Attalla, E., Siy, P., 2005. Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching. Pattern Recognition 38 (12), 2229–2241.
Belongie, S., Malik, J., Puzicha, J., 2002. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (4), 509–522.
Black, M., Rangarajan, A., 1996. On the unification of line processes, outlier rejection and robust statistics with applications in early vision. Computer Vision 19 (5), 57–92.
Chuang, G. C., Kuo, C., 1996. Wavelet descriptor of planar curves: theory and applications. IEEE Transactions on Image Processing 5 (1), 56–70.
Costa, L. F., Cesar, R. M., 2001. Shape analysis and classification, theory and practice. CRC Press, Boca Raton, Florida.
Daliri, M. R., Torre, V., 2008. Robust symbolic representation for shape recognition and retrieval. Pattern Recognition 41 (5), 1799–1815.
Del Bimbo, A., Pala, P., 1999. Shape indexing by multiscale representation. Image and Vision Computing 17 (3), 245–261.
Diplaros, A., Milios, E., 2002. Matching and retrieval of distorted and occluded shapes using dynamic programming. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (11), 1501–1516.
Direkoglu, C., Nixon, M., 2008. Shape classification using multiscale Fourier-based description in 2-d shape. In: ICSP’08: Proceedings of the 9th International Conference on Signal Processing. Vol. 1. pp. 820–823.
Felzenszwalb, P. F., Schwartz, J. D., 2007. Hierarchical matching of deformable shapes. In: CVPR’07: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1–8.
Jeannin, S., Bober, M., 1999. Description of Core Experiments for MPEG-7 Motion/Shape. MPEG7, ISO/IEC JTC1/SC29/WG11 N2690, document N2690, Seoul.
Kim, W. Y., Kim, Y. S., 2000. A region-based shape descriptor using Zernike moments. Signal Processing: Image Communication 16, 95–102.
Kunttu, I., Lepistö, L., Rauhamaa, J., Visa, A., 2006. Multiscale Fourier descriptors for defect image retrieval. Pattern Recognition Letters 27 (2), 123–132.
Latecki, L. J., 2002. Application of planar shape comparison to object retrieval in image databases. Pattern Recognition 35 (1), 15–29.
Latecki, L. J., Lakamper, R., 2000. Shape similarity measure based on correspondence of visual parts. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (10), 1185–1190.
Lin, I. J., Kung, S. Y., 1997. Coding and comparison of DAG’s as a novel neural structure with applications to on-line handwriting recognition. IEEE Transactions on Signal Processing 45 (11), 2701–2708.
Ling, H., Jacobs, D. W., 2007. Shape classification using the inner-distance. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2), 286–299.
McNeill, G., Vijayakumar, S., 2006. Hierarchical Procrustes matching for shape retrieval. In: CVPR’06: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Vol. 1. pp. 885–894.
Mokhtarian, F., Abbasi, S., Kittler, J., 1996. Efficient and robust retrieval by shape content through curvature scale space. In: Proceedings of International Workshop on Image DataBases and Multimedia Search. pp. 35–42.
Mokhtarian, F., Bober, M., 2003. Curvature scale space representation: theory, applications, and MPEG-7 standardization. Kluwer Academic Publishers, Norwell, MA, USA.
Mokhtarian, F., Mackworth, A. K., 1986. Scale-based description and recognition of planar curves and two-dimensional shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1), 34–43.
Nixon, M. S., Aguado, A., 2007. Feature extraction and image processing. Academic Press.
Sebastian, T. B., Kimia, B. B., 2005. Curves vs. skeletons in object recognition. Signal Processing 85 (2), 247–263.
Sebastian, T. B., Klein, P. N., Kimia, B. B., 2003. On aligning curves. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 (1), 116–125.
Super, B., 2003. Improving object recognition accuracy and speed through nonuniform sampling. In: SPIE’03: Proceedings of the Society of Photo-Optical Instrumentation Engineers Conference. Vol. 5267. pp. 228–239.
Super, B. J., 2006. Retrieval from shape databases using chance probability functions and fixed correspondence. Pattern Recognition and Artificial Intelligence 20 (8), 1117–1138.
Trouvé, A., Younes, L., 2000. Diffeomorphic matching problems in one dimension: Designing and minimizing matching functionals. In: ECCV ’00: Proceedings of the 6th European Conference on Computer Vision-Part I. Springer-Verlag, London, UK, pp. 573–587.
Veltkamp, R. C., 2001. Shape matching: similarity measures and algorithms. In: SMI 2001: International Conference on Shape Modeling and Applications. pp. 188–197.
Veltkamp, R. C., Hagedoorn, M., 2001. State of the art in shape matching, 87–119.
Yang, H. S., Lee, S. U., Lee, K. M., 1998. Recognition of 2d object contours using starting-point-independent wavelet coefficient matching. Visual Communication and Image Representation 9 (2), 171–181.
Yang, X., Bai, X., Latecki, L. J., Tu, Z., 2008. Improving shape retrieval by learning graph transduction. In: ECCV’08: Proceedings of the European Conference on Computer Vision. Vol. 4. pp. 788–801.
Younes, L., 2000. Optimal matching between shapes via elastic deformations. Image and Vision Computing 17 (5), 381–389.
Zhang, D., Lu, G., 2003. A comparative study of curvature scale space and Fourier descriptors for shape-based image retrieval. Visual Communication and Image Representation 14 (1), 41–60.
Zhang, J., Zhang, X., Krim, H., Walter, G., 2003. Object representation and recognition in shape spaces. Pattern Recognition 36 (5), 1143–1154.
(a) Starting curve
(b) Interpolated curves: geodesic path from 1(a) to 1(c) in the shape space.
(c) Final curve
Figure 1: Deformation path from fig. 1(a) to fig. 1(c).
(a) The mapping function for the two shape outlines Γ and Γ̃ depicted in 2(b), as a monotonic function which matches a curvilinear abscissa between 0 and 1 on Γ̃ to a curvilinear abscissa on Γ
(b) The visualisation of the mapping function φ as a 2D outline matching
Figure 2: Example of contour matching.
(a) Reference curve
(b) Applied transformation
(c) Curve to be aligned
Figure 3: Test on synthetic shapes. We have applied a known transformation (3(b)) on the shape of 3(a) to get the shape 3(c).
Table 1: Optimal MSEs obtained by the robust and the non robust algorithms, with the gain due to the robust solution, for initializations of φ far from the correct solution at different angles. This experiment is carried out on the synthetic shapes given in Figure 3. Gain = (MSE_NonRobust − MSE_Robust) / MSE_NonRobust × 100.
Angle    MSE_NonRobust    MSE_Robust    Gain
35°      0.293            0.087         70.30%
45°      8.66             0.089         98.97%
90°      0.296            0.085         71.28%
135°     1.78             0.086         95.17%
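As a quick check of the gain column, the formula in the caption of Table 1 can be evaluated directly on the tabulated MSEs. This is a minimal sketch; the function name `robust_gain` is introduced here for illustration only.

```python
def robust_gain(mse_non_robust, mse_robust):
    """Relative MSE reduction (in %) of the robust algorithm
    with respect to the non robust one."""
    return 100.0 * (mse_non_robust - mse_robust) / mse_non_robust

# (MSE_NonRobust, MSE_Robust) per initialization angle, copied from Table 1.
table_1 = {35: (0.293, 0.087), 45: (8.66, 0.089),
           90: (0.296, 0.085), 135: (1.78, 0.086)}

for angle, (non_robust, robust) in table_1.items():
    print(f"{angle:>3} deg: gain = {robust_gain(non_robust, robust):.1f}%")
```

The recomputed gains agree with the tabulated ones up to the rounding of the reported MSEs.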
(a) MSE (rad²) versus α values
(b) Gain due to the robust algorithm
(c) Aligned curve with the robust algorithm for α = 0.1
(d) Aligned curve with the non robust algorithm for α = 0.1
Figure 4: Results of shape matching on synthetic contours depicted in Figure 3 using the dynamic programming for different values of α ∈ [0, 1].
Figure 5: Results of shape matching on synthetic contours depicted in Figure 3 using the iterative scheme for different values of α ∈]0, 1]. The iterative algorithm leads to the same optimum as the dynamic programming (Figure 4(a)).
Figure 6: Test on synthetic shapes. Occluded shape obtained from the shape 3(c).
(a) Transformation found with the robust algorithm for α = 0.1
(b) Transformation found with the non robust algorithm for α = 0.1
(c) MSE versus α values
(d) Gain due to the robust algorithm
Figure 7: Results of shape matching using the iterative scheme for different values of α ∈]0, 1]. We register here the occluded shape of Figure 6 with respect to the reference 3(a).
(a) Aligned curve with the robust algorithm
(b) Aligned curve with the non robust algorithm
Figure 8: Results of shape matching. Aligned shapes by the robust and non robust algorithms; the reference shape is given in Figure 3(a) and the shape to be aligned in Figure 6.
(a) Dogs
(b) Apples
(c) Beetles
(d) Elephants
(e) Flies
(f) Hats
(g) Horses
(h) Spoons
Figure 9: Examples of shapes that are visually dissimilar from other samples of their own class.
(a) Apple/octopus
(b) Sea snake/lizzard
(c) Deer/horse
(d) Hat/device3
Figure 10: Examples of pairs of shapes from different classes that are highly similar.
Figure 11: The correct classification rate (in %) on the MPEG-7 shape database versus the values of α (α is the coefficient that controls the regularity of the solution).
(a) Watch
(b) Spoon
(c) Pencil
(d) Lmfish
(e) Key
(f) Bottle
Figure 12: Examples of shapes from different classes with highly similar curvature.
Table 2: Recognition accuracy measured as nearest neighbor classification rate and retrieval accuracy measured by the bull’s eye test on the MPEG-7 shape database.
Method                                                                              Retrieval accuracy  Classification rate
Proposed scheme                                                                     89.05%              98.86%
String of symbols [Daliri and Torre, 2008]                                          85.92%              98.57%
Zernike moments [Direkoglu and Nixon, 2008; Kim and Kim, 2000]                      70.22%              90%
Multiscale FD 2D [Direkoglu and Nixon, 2008]                                        NA                  95.5%
Elliptic FD [Direkoglu and Nixon, 2008; Nixon and Aguado, 2007]                     NA                  82%
Shape tree [Felzenszwalb and Schwartz, 2007]                                        87.7%               NA
Inner-distance shape context (IDSC) [Ling and Jacobs, 2007]                         85.40%              NA
Fixed correspondence + aggregated-pose chance probability functions [Super, 2006]   84%                 97.4%
Fixed correspondence + chance probability functions [Super, 2006]                   83.04%              97.2%
Fixed correspondence [Super, 2006]                                                  80.78%              97%
Hierarchical Procrustes matching [McNeill and Vijayakumar, 2006]                    86.35%              95.71%
Multilayer eigenvectors [Super, 2006]                                               70.33%              NA
Normalized squared distance [Super, 2003]                                           79.36%              96.9%
Racer [Super, 2003]                                                                 79.09%              96.8%
Optimized CSS [Mokhtarian and Bober, 2003]                                          81.12%              NA
Curve edit distance [Sebastian et al., 2003]                                        78.17%              NA
Shape context [Belongie et al., 2002]                                               76.51%              NA
Parts correspondence [Latecki, 2002; Latecki and Lakamper, 2000]                    76.45%              NA
Visual parts [Latecki and Lakamper, 2000]                                           76.45%              NA
Curvature Scale Space [Mokhtarian et al., 1996]                                     75.44%              NA
Wavelet [Chuang and Kuo, 1996]                                                      67.76%              NA
Skeleton DAG [Lin and Kung, 1997]                                                   60%                 NA
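The two scores reported in Table 2 can both be computed from a matrix of pairwise distances. The Python sketch below assumes the protocol commonly used with the MPEG-7 shape database (20 shapes per class, same-class shapes counted among the 40 best matches of each query, the query included) and a leave-one-out nearest neighbor rule; the function names and toy data are hypothetical, and the exact evaluation protocol used for each entry of the table may differ.

```python
import numpy as np

def bulls_eye_score(dist, labels, class_size=20):
    """Fraction of same-class shapes found among the 2 * class_size
    best matches of each query (the query itself is counted)."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    top = 2 * class_size
    hits = 0
    for i in range(len(labels)):
        nearest = np.argsort(dist[i], kind="stable")[:top]
        hits += int(np.sum(labels[nearest] == labels[i]))
    return hits / (len(labels) * class_size)

def leave_one_out_nn_rate(dist, labels):
    """Rate at which the closest other shape has the query's class."""
    dist = np.asarray(dist, dtype=float).copy()
    labels = np.asarray(labels)
    np.fill_diagonal(dist, np.inf)  # a shape may not match itself
    nearest = np.argmin(dist, axis=1)
    return float(np.mean(labels[nearest] == labels))

# Toy data (hypothetical): eight 1-D points, four classes of two.
positions = np.array([0.0, 1.0, 10.0, 11.0, 20.0, 21.0, 30.0, 31.0])
toy_labels = ["a", "a", "b", "b", "c", "c", "d", "d"]
toy_dist = np.abs(positions[:, None] - positions[None, :])
print(bulls_eye_score(toy_dist, toy_labels, class_size=2))
print(leave_one_out_nn_rate(toy_dist, toy_labels))
```

Both functions take the distance matrix as given, so any similarity measure, with or without robust weights, can be evaluated the same way.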