Pattern Analysis & Applications (1999)2:215–227 1999 Springer-Verlag London Limited
Definition and Validation of a Distance Measure Between Structural Primitives
P. Foggia1, C. Sansone1, F. Tortorella2 and M. Vento1
1 Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli 'Federico II', Napoli, Italy; 2 Dipartimento di Automazione, Elettromagnetismo, Ingegneria dell'Informazione e Matematica Industriale, Università degli Studi di Cassino, Cassino, Italy
Abstract: This paper proposes a structural description scheme using second order primitives, in particular circular arcs. In this framework, a distance measure between pairs of circular arcs and relations among them is introduced, and its main properties are discussed. The measure satisfies some perceptual criteria intended to increase its efficiency: it proved applicable in a wide class of application domains characterised by high variability in the shape of the visual patterns, where a structural approach is particularly useful. The description method together with the distance has been experimentally validated in the context of the recognition of handwritten digits coming from a standard character database.
Keywords: Character recognition; Circular arcs; Graph matching; Nearest neighbour classifier; Optical character recognition; Shape distance; Structural descriptions
Received: 20 July 1998. Received in revised form: 24 November 1998. Accepted: 4 January 1999.
1. INTRODUCTION
In most of the methods employed in a system for recognising real images, structural descriptions are used for representing the objects contained in a scene. Structural descriptions, formally defined by Shapiro and Haralick [1], assume that a complex pattern can be decomposed into simpler subpatterns, possibly in a recursive way, and then characterised (i.e. described) in terms of simple parts which are called primitives, and of their relations (see Arcelli et al [2]). This generally applies to the analysis and recognition of 2D objects, although similar approaches are employed in the recovery and recognition of 3D objects [3]. In real applications, deformations on samples due to noise or distortion may cause an input sample to be quite different from the other samples of the same class. For this reason, the recognition process cannot be carried out by conventional structural matching algorithms, even though the input pattern is deformed very slightly and can be easily recognised by a human as coming from one of the known classes. On the other hand, if a weaker matching criterion (i.e. a
criterion involving fewer constraints) is adopted, samples from different classes might be considered identical by the recognition system. In recent years, different approaches have been proposed to solve this problem. For example, in applications using Attributed Relational Graphs (ARG) for describing the visual patterns [4–7], the structural matching is generalised by defining criteria allowing recognition in presence of deformations. Most of them [1,8–10] relax the constraints on the matching by introducing a distance measure between two ARGs which allows a discrimination finer than that provided by the bare matching; in any case, the distance is evaluated under the condition that a mapping preserving some structural properties can be established between the ARGs. This hypothesis seems to be reasonable in all the application domains characterised by deformations on the structure of the graphs which are not very high. However, when the objects to be recognised have highly variable shapes, the descriptions could be so dissimilar that it is impossible to compare the structures of the obtained graphs, thus making these methods unusable. Other methods further relax the matching by determining a distance measure between structural descriptions supported by an ARG without relying on particular properties of the structure of the graphs. In particular, in Sanfeliu and Fu
[11] and Eshera and Fu [12], the distance measure is based on the evaluation of the minimum number of transformations to apply to one graph in order to obtain the other one. All the approaches mentioned, however, focus their attention only on the reduction of the computational complexity of the distance calculation. As an example, the bare definition of distance between structural descriptions given by Eshera and Fu [12] involves the computation of the sum of the distance between pairs of primitives and between pairs of relations for which a correspondence is established by a mapping between the two descriptions. Every possible mapping (or at least every mapping satisfying some suitably chosen criteria) must be considered in order to find the optimal one, i.e. the one yielding the minimum value for the distance. In the most general case, this is an NP complete process, and this implies evaluation times which are not acceptable in many applications. These methods do not address the problem of defining a distance between two given primitives and between two relations. In fact, all the cited papers assume that distances between primitives and relations have values fixed a priori in the considered application domain (no details are provided about how these parameters have been obtained), and use them to define a distance between whole objects. The definition of a distance for primitives and their relations is not, in general, an easy task; its difficulty, of course, may vary depending on the primitives used. A paper that deals with this problem is that by Arkin et al [13], in which a distance measure between polygonal curves is proposed. 
Even though these curves are attractive (because of their simplicity) and have already been used in several real applications, there are many classes of objects that cannot be adequately described by a polygonal since the large number of points required would make the successive processing phases very hard from a computational point of view. In these cases, second order curves are often used, but at present, there is no distance defined in the literature for this kind of description. In this paper, we define a distance measure for a structural description scheme using second order primitives, in particular circular arcs. These primitives are general enough to be used successfully in several applications (e.g. handwritten character recognition, symbol recognition, technical drawing interpretation, and so on), providing faithful and synthetic descriptions [14–16]. With this aim in mind, two distance functions are introduced which, respectively, measure the dissimilarity between two arcs and two relations; on the basis of these functions, a distance measure between structural descriptions is defined. The effectiveness of the distance has been experimentally evaluated using the description scheme for representing handwritten characters that are recognised by means of a nearest neighbour classifier.
2. THE PROPOSED METHOD
In defining a distance D(x,y) between structural descriptions, an obvious aim is to simulate, in some way, the human capability of perceiving the similarity between two different
objects so as to provide small (large) values when comparing alike (different) patterns. However, this property is very hard to achieve, since it involves a thorough knowledge of the perception processes performed by a human when comparing two shapes. With a more realistic approach, our aim is to propose a scheme in which the distance between structural descriptions is continuous with respect to perception: in other words, when the shape of one object is changed, the distance variation should be consistent with the perceived amount of modification. To satisfy the above requirements, the distance D(x,y) should be formulated on the basis of a careful definition of both the attributes used to characterise the primitives and the relations constituting the structural description, as well as the distance functions which, by means of the chosen attributes, measure the dissimilarity between primitives and between relations. More precisely, the attributes should comply with the shape features a human perceives as significant and be able to follow, as far as possible in a continuous way, the variations a given pattern could exhibit. A similar approach should also be adopted for the design of the distance functions: in fact, in their definition, we have to take into account the characteristics a human seems to use when judging the similarity between two patterns. In the next few paragraphs we will address these topics with reference to a structural description scheme using circular arcs as primitives.
2.1. The Description Attributes
From a geometric point of view, a circular arc can be completely identified in the (x,y) plane through five parameters (i.e. the coordinates of the centre, the radius, the starting and the ending angles). However, for our aims, it is not important to reconstruct the original shape exactly, but to select the shape attributes which, from a perceptual point of view, seem to be the most significant for distinguishing a given shape from similar ones. Furthermore, the set of chosen attributes should exhibit a certain degree of orthogonality, i.e. the characteristics described by one attribute should not affect other attributes; this is essential to enhance as much as possible the descriptive power of the attributes. Since we are not interested in the absolute position of the arcs and, as will be seen later, their relative position is implicitly described by the relation attributes, it follows that the size, the form (whether the arc is elongated or bent), and the orientation are sufficient to characterize an arc, and fulfil our goal of orthogonality. Length could be the most immediate parameter for defining the size of an arc, but often it does not reflect the real perception a human has of the actual size of an arc (see Fig. 1(a)). A more appropriate definition could be obtained by assuming the length of the longest side of the rectangle enclosing the arc as the size parameter (see Fig. 1(b)). The second parameter considered takes into account the fact that an arc can assume different configurations ranging from the straight segment to the closed circle. At first glance, the curvature radius of the arc seems to be a suitable
Fig. 1. The length is not a good parameter for describing the size of an arc: the arcs in (a) have the same length, but arc 1 looks much greater than arc 2. The length of the longest side of the enclosing rectangle (b) can be considered representative of the perceived size of the arc. In fact, although with different lengths, the arcs in (b) seem similar with respect to the size.
parameter for this aim; however, it entails certain problems, such as dependence on the size parameter and ambiguities such as those in Fig. 2, where two arcs (2 and 3), considered very different by a human, have the same curvature radius, while the opposite happens for the pair 1 and 2 that exhibit a very different curvature radius even though their shapes seem fairly similar. For this reason, the span of the arc is the parameter adopted to define the form, i.e. the measure of the angle subtending the arc itself: it allows us to estimate the similarity between arcs, without being affected by the different sizes. The span for a straight segment is assumed to be 0°. The last parameter describes the orientation of the arc by means of the angle between a reference axis and the vector orthogonal to the chord of the arc which has the same direction as the concavity (see Fig. 3). A zero orientation is assigned to a closed circle (i.e. an arc having a span equal to 360°). From now on, we will denote with size(p), span(p) and orient(p) the three parameters of a given arc p. Let us now examine the attributes to be adopted for describing the relations existing among the primitives. These attributes should allow us to highlight the features that are
Fig. 3. The orientation of an arc.
actually distinctive for the given shape, and are strongly dependent upon the way in which primitives are connected to each other. The position of the contact point, measured along each arc, was adopted as the attribute characterising the relation between two circular arcs. In most cases, this attribute allows us to discriminate different objects having the same primitives; an example of such a situation is shown in Fig. 4. Denoting with p and q the two connected circular arcs,
Fig. 2. (a) The inadequacy of the radius of curvature as form parameter: arc 2 appears to be more similar to arc 1 than to arc 3, although 2 and 3 have the same radius. (b) Definition of the span which we have chosen as the form parameter.
Fig. 4. Relevance of the contact point between primitives: an example of two different objects made of the same primitives which can be distinguished only by considering the location of the contact point.
the relation R between them may be expressed by the 4-tuple R(p,q) = (RXp, RYp, RXq, RYq) where RXi and RYi represent the coordinates of the contact point on the primitive i; these coordinates are evaluated with respect to a reference system whose axes are parallel to the x and y axes and whose origin is located in the centre of the bounding box of the primitive i (i.e. the smallest rectangle containing the primitive i and having sides parallel to the x and y axes). Moreover, if we choose as reference unit on each axis the length of the projection of the primitive on such axis, then RXi and RYi assume values in the range [−0.5, 0.5] (see Fig. 5).
2.2. Distance Between Primitives
An effective measure of the similarity between two arcs p and p′ can be obtained by evaluating the way in which we should warp p to make it identical to p′. For the sake of simplicity, we only take into account the transformations composed of subsequent variations, each involving only one of the primitive attributes. Thus, the distance DP(p,p′) between the two arcs is given by the sum of three contributions:
$$D_P(p,p') = w_d \cdot \Delta\mathrm{size} + w_s \cdot \Delta\mathrm{span} + w_o \cdot \Delta\mathrm{orient} \tag{1}$$
where Δ indicates the variation of the parameter and wd, ws and wo are, respectively, the costs of a unitary variation of the size, span and orientation. Their values are fixed with reference to the requirements of the application at hand. However, a bare computation of the expression (1) may not lead to an effective distance measure. In fact, while the
Fig. 5. Definition of relation attributes: the relation between the arcs 1 and 2 is described by the 4-tuple (RX1, RY1, RX2, RY2). RX1 and RY1 (RX2 and RY2) represent the coordinates of the contact point in the reference system relative to arc 1 (arc 2).
variation of the size can be simply evaluated as a difference between the final and the initial value, the same is not true for the span and the orientation, because of discontinuities affecting these parameters in the representation of an arc. To better address this problem, let us consider a representation for the three attributes in which the size, the span and the orientation are, respectively, mapped on the radius, latitude and longitude of a spherical coordinate system; the scale is chosen in such a way that arcs whose span equals 0° (i.e. straight segments) are placed on the equator, while a closed circle is mapped on the pole (see Fig. 6). The advantage of such a representation is that it preserves the continuity with respect to perception notwithstanding singularities of the attribute values. For example, the point on the sphere representing the arc whose span = 45° and orientation = 5° is close to the point for span = 45° and orientation = 355°, even though the numeric values of the orientation attribute are quite different; another example is that all the arcs whose span = 360° are represented by a single point on the sphere, according to the fact that for closed circles the orientation attribute becomes undefined. When computing the distance between two arcs (say A and B) in the adopted spherical coordinate system, the variation in size involves moving from the sphere with radius size(A) to the sphere with radius size(B) and concentric with the first one. The difference |size(B) − size(A)| gives a measure for this first kind of transformation. Let us now suppose that the arcs A and B whose distance is to be evaluated have the same size (i.e. their representative points belong to the same sphere): the warping of A into B can be described by means of a path (or route) connecting the two points on the sphere.
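The continuity argument above can be checked numerically. The following is only a minimal sketch: the function name `arc_to_point` is illustrative, and the linear scale latitude = span/4 is an assumption chosen so that a span of 0° falls on the equator and a span of 360° on the pole, as the text requires. The Euclidean chord length is used here only to show closeness of points; it is not the distance defined in this paper.

```python
import math

def arc_to_point(size, span_deg, orient_deg):
    """Map (size, span, orientation) to Cartesian coordinates on a sphere of radius `size`.

    Assumption: latitude = span / 4, so span 0 deg -> equator, span 360 deg -> pole.
    Longitude is taken equal to the orientation.
    """
    lat = math.radians(span_deg / 4.0)
    lon = math.radians(orient_deg)
    return (size * math.cos(lat) * math.cos(lon),
            size * math.cos(lat) * math.sin(lon),
            size * math.sin(lat))

# Same span, orientations 5 and 355 degrees: numerically far apart, close on the sphere.
near = math.dist(arc_to_point(1.0, 45.0, 5.0), arc_to_point(1.0, 45.0, 355.0))
# Closed circles collapse onto the pole whatever their orientation value.
pole = math.dist(arc_to_point(1.0, 360.0, 0.0), arc_to_point(1.0, 360.0, 123.0))
print(f"orientation 5 vs 355: chord length {near:.3f}")
print(f"two closed circles:   chord length {pole:.3f}")
```

With a unit radius the first chord is short compared with the sphere diameter of 2, while the second is zero up to rounding.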
A first possible route is obtained by modifying first of all the orientation and then the span of the first arc to match the parameters of the second arc (see Fig. 7(a)): the corresponding route runs through the hemisphere on which the starting point is placed (see Fig. 7(b)) and, for this reason, it will be called the hemispherical route from A to B. When evaluating variations in the orientation, we always choose, from among the possible difference values, the angle with the smallest absolute value between the orientations of the two arcs. In this way, if orient(B) = 355° and orient(A) = 5°, the difference orient(B) − orient(A) is not 350°, but −10°.
Fig. 6. The representation of an arc A in the spherical coordinate system adopted.
Fig. 7. The distance between two arcs A and B evaluated by following the hemispherical route: in (a) the variations to transform the arc A into B are shown. In (b) the same sequence is identified through a route on the sphere. In both cases the arc H, which represents an intermediate result of the transformations, is pointed out.
The contribution to the whole distance given by the variations in span and in orientation is:
$$D_{HEM}(A,B) = w_s \cdot |\mathrm{span}(B) - \mathrm{span}(A)| + w_o \cdot |\mathrm{orient}(B) - \mathrm{orient}(A)| \tag{2}$$
It is worth noting that the hemispherical route does not give adequate results when comparing two circular arcs with small span and opposite orientation like those in Fig. 8(a). In fact, although the arcs have equal span and are very similar from a perceptual point of view, the difference in orientation gives rise to an over-estimation of the dissimilarity between them. To find a solution to this problem, let us consider the intermediate steps of the transformation depicted in Fig. 8(a). The span is the only modified attribute which assumes decreasing values up to the point where the arc reduces to a straight segment and the span approaches 0°. To keep the continuity of our representation, we may assign a negative span value to the next step of the transformation (Fig. 8(b)) by introducing the following convention: an arc with a negative span is equivalent to the arc whose span is positive and has the same absolute value, and whose orientation is rotated by 180°. It is worth noting that as a consequence of the equivalence relation defined above, every arc is now represented by two antipodal points (see Fig. 9), thus adding a southern
Fig. 9. Extending the representation: the arc with parameters (1, 30°, 70°) has two representative points on the sphere: the point A with coordinates (1, 30°, 70°) on the northern hemisphere and the point A′ (antipodal to A) with coordinates (1, −30°, 250°) on the southern hemisphere.
Fig. 8. (a) Example of the inadequacy of the hemispherical route. Note that the orientation of the arc whose span = 0° can be indifferently assumed equal to 0° or 180°. (b) The introduction of negative span allows a continuous variation of the parameters of the arc.
hemisphere to the coordinate system. Such extension is particularly useful when facing situations like that in Fig. 10. In fact, the second representative point of the first arc (A′, span = −20°, orientation = 180°) is closer to the first representative point of the second arc (B, span = +25°, orientation = 200°), thus providing an estimate which is more consistent with the perceived difference between the arcs. For example, if the coefficients ws and wo in Eq. (2) are attributed the values 1/360 and 1/180, respectively (so as to normalise to 1 the maximum value of the corresponding terms), DHEM(A,B) equals 0.90 while DHEM(A′,B) equals 0.23. The route corresponding to this kind of transformation is called equatorial, since the path joining the considered points crosses the equator. The contribution DEQU supplied by the equatorial route
to the whole distance is given by DHEM(A′,B); in other words, for DEQU the same expression of Eq. (2) applies, provided that the representative point on the southern hemisphere (A′ in Fig. 10) is used for the first arc. Since span(A′) = −span(A) and orient(A′) = orient(A) + 180°, the equation becomes:
$$D_{EQU}(A,B) = w_s \cdot |\mathrm{span}(B) + \mathrm{span}(A)| + w_o \cdot |\mathrm{orient}(B) - (\mathrm{orient}(A) + 180^\circ)| \tag{3}$$
Another problematic case for the hemispherical route occurs when the spans of the two arcs are very close to 360°: the difference in orientation becomes negligible since the two arcs are perceived as two broken circumferences and thus considered very similar. The relative distance is measured along a polar route illustrated in Fig. 11.
Fig. 10. The distance between two arcs evaluated by following the equatorial route. In (a) the transformations applied on the arc A to match the arc B are shown, while in (b) the related route on the reference sphere is illustrated.
Fig. 11. The distance between two arcs, evaluated by following the polar route.
According to this route, the distance between the two arcs is given by the expression
$$D_{POL}(A,B) = \begin{cases} w_s \cdot (720^\circ - \mathrm{span}(A) - \mathrm{span}(B)) & \text{if } \mathrm{span}(A) > T_{POL} \text{ and } \mathrm{span}(B) > T_{POL} \\ +\infty & \text{otherwise} \end{cases} \tag{4}$$
where TPOL is a fixed threshold. At this point, we can define the distance between two arcs A and B by means of the following expression:
$$D_P(A,B) = w_d \cdot |\mathrm{size}(B) - \mathrm{size}(A)| + \min\{D_{HEM}(A,B),\ D_{EQU}(A,B),\ D_{POL}(A,B)\} \tag{5}$$
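Equations (2) to (5) can be put together in a few lines. What follows is a minimal sketch under stated assumptions, not a reference implementation: the `Arc` type and the function names are illustrative, the weights and the threshold are the values listed in Table 1, and orientation differences are reduced to the smallest absolute angle as prescribed for the hemispherical route. The example arcs reproduce the case of Fig. 10 (A with span 20° and orientation 0°, as implied by its antipodal point A′; B with span 25° and orientation 200°); their sizes are assumed equal.

```python
import math
from typing import NamedTuple

class Arc(NamedTuple):
    size: float
    span: float    # degrees, in [0, 360]
    orient: float  # degrees

# Weights and threshold as listed in Table 1.
W_D, W_S, W_O = 1.0, 1.0 / 360.0, 1.0 / 180.0
T_POL = 355.0

def angle_diff(b, a):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(b - a) % 360.0
    return min(d, 360.0 - d)

def d_hem(a, b):
    """Hemispherical route, Eq. (2)."""
    return W_S * abs(b.span - a.span) + W_O * angle_diff(b.orient, a.orient)

def d_equ(a, b):
    """Equatorial route, Eq. (3): the antipodal point of `a` is used."""
    return W_S * abs(b.span + a.span) + W_O * angle_diff(b.orient, a.orient + 180.0)

def d_pol(a, b):
    """Polar route, Eq. (4): only available for two almost closed arcs."""
    if a.span > T_POL and b.span > T_POL:
        return W_S * (720.0 - a.span - b.span)
    return math.inf

def arc_distance(a, b):
    """Distance between two arcs, Eq. (5)."""
    return W_D * abs(b.size - a.size) + min(d_hem(a, b), d_equ(a, b), d_pol(a, b))

A = Arc(size=1.0, span=20.0, orient=0.0)
B = Arc(size=1.0, span=25.0, orient=200.0)
print(f"D_HEM = {d_hem(A, B):.3f}, D_EQU = {d_equ(A, B):.3f}, D_P = {arc_distance(A, B):.3f}")
```

For this pair the equatorial route is the cheapest one, in agreement with the values of about 0.90 and 0.23 quoted for Fig. 10.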
2.3. Distance Between Relations
Let us now consider the definition of a distance function between relations, say DR. According to the considerations introduced at the beginning of Section 2, the following properties should be satisfied: DR should assume a value equal to zero if the contact points between the primitives are located in the same position and should exhibit a continuous behaviour as the positions of the contact points vary. Let us call p and q (p′ and q′) two connected primitives in the first (second) object; the distance DR between the relations r = R(p,q) = (RXp, RYp, RXq, RYq) and r′ = R(p′,q′) = (RXp′, RYp′, RXq′, RYq′) can be expressed as
$$D_R(r,r') = w_1 \cdot D_1 + w_2 \cdot D_2 \tag{6}$$
where w1, w2 ≥ 0 and w1 + w2 = 1; D1 (D2) represents the variation of the contact point relative to the primitives p and p′ (q and q′). The coefficients w1 and w2 are computed in such a way as to assign a greater weight to the variation of the contact point relative to the primitive having a larger size. This choice stems from the consideration that a variation of the contact point along the greater arc is generally perceived as a more significant change of the shape (see Fig. 12). The coefficients w1 and w2 can be expressed as follows:
$$w_1 = \frac{\mathrm{size}(p) + \mathrm{size}(p')}{\mathrm{size}(p) + \mathrm{size}(p') + \mathrm{size}(q) + \mathrm{size}(q')}, \qquad w_2 = \frac{\mathrm{size}(q) + \mathrm{size}(q')}{\mathrm{size}(p) + \mathrm{size}(p') + \mathrm{size}(q) + \mathrm{size}(q')} \tag{7}$$
Fig. 12. The perceived amount of variation of the contact point is greater for the large circle than for the small segment, even though the difference of the relation attributes is greater for the latter.
As regards D1 and D2, their expressions are
$$\begin{aligned} D_1 &= w_{1x} \cdot |RX_p - RX_{p'}| + w_{1y} \cdot |RY_p - RY_{p'}| \\ D_2 &= w_{2x} \cdot |RX_q - RX_{q'}| + w_{2y} \cdot |RY_q - RY_{q'}| \end{aligned} \tag{8}$$
where wix, wiy ≥ 0 and wix + wiy = 1; they are defined in such a way that the component corresponding to the larger bounding box dimension is given a greater weight. It should be noted that the latter two definitions are not applicable if one of the two primitives is a straight segment and is normal to one of the axes: in this case, a zero value is conventionally assigned to the coordinate of the contact point along that axis. Thus, the situation illustrated in Fig. 13 may occur: it is easy to verify that for the two patterns shown in the figure, the value of D1 should be zero, since the contact point between the primitives is the same (middle point of primitive 1 and upper endpoint of primitive 2). However, if we apply the previous definitions we find that |RX2 − RX2′| = 0.5, since RX2 = 0 and RX2′ = 0.5; this implies that D1 is different from zero.
Fig. 13. An example of the case in which the definition of D1 and D2 is inadequate.
To correctly handle this case we have chosen to neglect the variation of the contact point with respect to an axis if the bounding box dimension along that axis is small in comparison to the dimension along the other one. A simple threshold criterion would not be satisfactory as it would introduce a discontinuity in our distance. Instead, we scale the differences in Eq. (8) by a factor which continuously tends to 0 as one of the involved arcs approaches a horizontal or vertical straight line segment. From a formal point of view, if k and k′ are primitives belonging to the structural descriptions to be compared, and Bx(k), By(k), Bx(k′) and By(k′) are the dimensions of the corresponding bounding boxes, we can perform this scaling by introducing the following quantities:
$$\Delta RX(k,k') = |RX_k - RX_{k'}| \cdot \min\left\{1,\ \frac{B_x(k)}{B_y(k)},\ \frac{B_x(k')}{B_y(k')}\right\} \tag{9'}$$
$$\Delta RY(k,k') = |RY_k - RY_{k'}| \cdot \min\left\{1,\ \frac{B_y(k)}{B_x(k)},\ \frac{B_y(k')}{B_x(k')}\right\} \tag{9''}$$
Thus, we can now define D1 and D2 in the following way:
$$D_1 = w_{1x} \cdot \Delta RX(p,p') + w_{1y} \cdot \Delta RY(p,p') \tag{10'}$$
$$D_2 = w_{2x} \cdot \Delta RX(q,q') + w_{2y} \cdot \Delta RY(q,q') \tag{10''}$$
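The relation distance of Eqs. (6), (7), (9) and (10) can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the `End` record (one primitive's side of a relation) and all function names are hypothetical, the weights wix and wiy follow the bounding-box expressions given in Table 1, and a ratio with a zero denominator is treated as infinite so that the `min` in Eq. (9) ignores it. The guards for fully degenerate inputs are also an assumption.

```python
from typing import NamedTuple

class End(NamedTuple):
    """One primitive's side of a relation (illustrative data model)."""
    size: float  # size of the primitive
    bx: float    # bounding box dimension along x
    by: float    # bounding box dimension along y
    rx: float    # contact point abscissa, in [-0.5, 0.5]
    ry: float    # contact point ordinate, in [-0.5, 0.5]

def _ratio(num, den):
    # A zero denominator yields +inf, which min() then discards.
    return num / den if den > 0 else float("inf")

def delta_rx(k, k2):
    """Eq. (9'): x variation, damped for (nearly) vertical primitives."""
    return abs(k.rx - k2.rx) * min(1.0, _ratio(k.bx, k.by), _ratio(k2.bx, k2.by))

def delta_ry(k, k2):
    """Eq. (9''): y variation, damped for (nearly) horizontal primitives."""
    return abs(k.ry - k2.ry) * min(1.0, _ratio(k.by, k.bx), _ratio(k2.by, k2.bx))

def contact_variation(k, k2):
    """Eq. (10), with the axis weights taken from Table 1."""
    bx, by = k.bx + k2.bx, k.by + k2.by
    wx = bx / (bx + by) if bx + by > 0 else 0.5
    return wx * delta_rx(k, k2) + (1.0 - wx) * delta_ry(k, k2)

def relation_distance(p, q, p2, q2):
    """Eq. (6) with the size-based weights of Eq. (7); r = R(p, q), r' = R(p2, q2)."""
    total = p.size + p2.size + q.size + q2.size
    w1 = (p.size + p2.size) / total if total > 0 else 0.5
    return w1 * contact_variation(p, p2) + (1.0 - w1) * contact_variation(q, q2)

# A situation like Fig. 13: primitive 2 is a vertical segment in the first pattern
# (RX = 0 by convention) and slightly tilted in the second one (RX = 0.5).
p = End(size=2.0, bx=2.0, by=1.0, rx=0.0, ry=-0.5)
q = End(size=1.0, bx=0.0, by=1.0, rx=0.0, ry=0.5)
q_tilted = End(size=1.0, bx=0.05, by=1.0, rx=0.5, ry=0.5)
print(relation_distance(p, q, p, q_tilted))
```

Because the scaling factor is driven to zero by the vertical segment, the spurious 0.5 difference in RX no longer contributes to the distance in this example.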
2.4. Proposed Distance Measure
To employ the distance functions described so far, let us consider the problem of matching the structural descriptions of two objects, S and S′, by means of one of the algorithms cited in the introduction which can profitably use a distance between primitives and relations for reducing the complexity of the matching process. Let us suppose that during the process the algorithm has obtained a mapping M which establishes a correspondence between the nodes of S and the nodes of S′, and consequently the relations of S with the relations of S′. For the moment let us assume that the mapping is bijective, i.e. each node (relation) of S can be mapped to S′, and vice versa. Given the mapping, the primitives and the relations of S′ can be relabelled so that the primitive pi and the relation rj of S correspond, respectively, to p′i and r′j in S′. Starting from the definitions given in the previous paragraph, the distance between S and S′ under the mapping M can be defined as
$$D_M(S,S') = w_P \sum_i D_P(p_i,p_i') + w_R \sum_j D_R(r_j,r_j') \tag{11}$$
Among all the possible mappings the algorithm can choose the one which minimises Eq. (11), and define the distance between S and S′ as
$$D(S,S') = \min_M D_M(S,S') \tag{12}$$
In the above equations, wP and wR are two coefficients weighting the two contributions and whose values are fixed according to the purposes of the particular application at hand.
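For very small descriptions, Eqs. (11) and (12) can be evaluated by exhaustive enumeration. The sketch below is only meant to make the definition concrete; it is not the search strategy used in the experiments. It assumes that primitives are stored in lists, that relations are stored in dictionaries keyed by the unordered pair of node indices they connect, and that the functions `d_p` and `d_r` are supplied by the caller. All names and the scalar toy attributes are hypothetical. Mappings under which some relation of S has no counterpart in S′ are skipped, since Eq. (11) presupposes a bijective correspondence.

```python
import math
from itertools import permutations

def mapping_cost(mapping, prims_s, prims_t, rels_s, rels_t, d_p, d_r, w_p=1.0, w_r=1.0):
    """Eq. (11) for one bijective mapping; None if a relation of S has no image in S'."""
    cost = w_p * sum(d_p(p, prims_t[mapping[i]]) for i, p in enumerate(prims_s))
    for pair, rel in rels_s.items():
        image = frozenset(mapping[i] for i in pair)
        if image not in rels_t:
            return None
        cost += w_r * d_r(rel, rels_t[image])
    return cost

def description_distance(prims_s, prims_t, rels_s, rels_t, d_p, d_r, w_p=1.0, w_r=1.0):
    """Eq. (12): minimum of Eq. (11) over all bijective mappings (factorial cost)."""
    if len(prims_s) != len(prims_t) or len(rels_s) != len(rels_t):
        raise ValueError("a bijective mapping needs equally sized descriptions")
    best_cost, best_mapping = math.inf, None
    for mapping in permutations(range(len(prims_t))):
        cost = mapping_cost(mapping, prims_s, prims_t, rels_s, rels_t, d_p, d_r, w_p, w_r)
        if cost is not None and cost < best_cost:
            best_cost, best_mapping = cost, mapping
    return best_cost, best_mapping

def abs_diff(a, b):
    return abs(a - b)

# Toy data: scalar attributes stand in for arcs and contact points.
prims_s = [1.0, 2.0, 3.0]
prims_t = [3.1, 1.0, 2.2]
rels_s = {frozenset({0, 1}): 0.5, frozenset({1, 2}): -0.5}
rels_t = {frozenset({1, 2}): 0.4, frozenset({0, 2}): -0.5}
best_cost, best_mapping = description_distance(prims_s, prims_t, rels_s, rels_t, abs_diff, abs_diff)
print(best_mapping, round(best_cost, 3))
```

The enumeration grows factorially with the number of primitives, which is why a guided search is needed in practice.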
Expression (11) applies if it is possible to find a bijective mapping involving all the components of S and S′. Since this is not always true, we must also consider the primitives and the relations for which a correspondence has not been established: this task is not very simple, because the contribution given by these components should be evaluated by taking into account the applicative context we refer to. A more general formulation, which takes into account the lacking components, is given by the following expression:
$$D_M(S,S') = w_P \sum_i D_P(p_i,p_i') + w_R \sum_j D_R(r_j,r_j') + w_{LP} \left[ \sum_h L_P(p_h) + \sum_k L_P(p_k') \right] + w_{LR} \left[ \sum_m L_R(r_m) + \sum_n L_R(r_n') \right] \tag{11'}$$
where ph (pk′) and rm (rn′) are the primitives and the relations of S (S′) that have no corresponding component in the other object under the mapping M; wLP and wLR are determined in a similar way to wP and wR. LP (LR) evaluates the contribution provided by a lacking primitive (relation) as a function of its attributes. Obviously, their expressions are dependent on the specific application. In this case, the algorithm can choose which is the best matching (and thus which primitives and relations are left unmatched) by selecting the minimum value for expression (11′).
3. EXPERIMENTAL RESULTS
Experimental results refer to the problem of automatic recognition of unconstrained handwritten digits. This application domain is particularly interesting both for the high shape variability among samples belonging to a class and for the presence of samples belonging to different classes that exhibit a high shape similarity. For our experiments we used the ETL1 database [17] which contains about 14000 handwritten digits. Each sample image is digitised with a resolution of 300 dpi with 16 grey levels. Some preprocessing steps are accomplished to obtain the character structural descriptions. The character images are preliminarily submitted to a filtering and a binarisation process leading to the corresponding bit-maps whose maximum size is equal to 63 × 64 (see Fig. 14). Next, character bit maps are thinned and the skeletons obtained are approximated by sets of polygonal lines. These are further decomposed into circular arcs by means of a method described by Cordella et al [18]. The final descriptions in terms of Attributed Relational Graphs are extracted according to a method whose details can be found in Cordella et al [19,20]. The generic node k of the ARG is described by the triple (size(k), span(k), orient(k)) that characterises the circular arc to which the node refers. Similarly, the generic edge of the ARG connecting nodes n and m is described by the 4-tuple R(n,m) = (RXn, RYn, RXm, RYm) that characterises the relation between the circular arcs related to these nodes. In Fig. 15 the main phases of the description process are shown.
Fig. 14. Some examples extracted from the adopted test set.
For our experiments we used the values of the weights shown in Table 1; such values were chosen heuristically. Table 1 also contains the expressions adopted for the functions LP and LR. To evaluate the distance between the structural descriptions of two characters S and S′, we employed a simple matching based on a State Space Representation (SSR) [21], which allows us to find the mapping yielding the minimum distance as required by Eq. (12). Each state represents a partial mapping of a subset of the primitives in S onto the primitives in S′. The successors of a state s are generated by extending the mapping to a new pair of nodes. The search is performed by means of the A* algorithm (described by Nilsson [21]), which employs a heuristic function that estimates the cost needed to reach a solution from the state s so as to
Fig. 15. (a) The bit map of a handwritten digit, (b) the approximation in terms of circular arcs, (c) the attributes of each arc, (d) the attributes of the contact point.
expand the most promising states only. The function we used neglects the cost of matching the relations, and computes a lower bound for the cost of matching the primitives unmatched in s by relaxing the constraint that the mapping must be injective. Although there are other algorithms which could also be used (e.g. the IDA* algorithm [22], the SMA* [23], etc.), we chose the A* algorithm because it is well known in literature and proved adequate for the application at hand. In any case, it is worth noting that the effectiveness of the approach does not depend upon the efficiency of the matching algorithm, whose choice should be based on the particular application.

Table 1. The values of the distance parameters adopted for the experiments

| Parameter name | Equation | Parameter value |
|---|---|---|
| wd | (1), (5) | 1 |
| ws | (1)–(4) | 1/360 |
| wo | (1)–(3) | 1/180 |
| TPOL | (4) | 355° |
| w1x | (8) | (BX(p) + BX(p′)) / (BX(p) + BX(p′) + BY(p) + BY(p′)) |
| w1y | (8) | (BY(p) + BY(p′)) / (BX(p) + BX(p′) + BY(p) + BY(p′)) |
| w2x and w2y | (8) | Like w1x and w1y, using q and q′ instead of p and p′ |
| wP | (11), (11′) | 1 |
| wR | (11), (11′) | 1 |
| wLP | (11′) | 1 |
| wLR | (11′) | 0.5 |
| LP(p) | (11′) | 0.3 + size(p) |
| LR(r) | (11′) | 1 |

To experimentally validate the proposed distance measure, we used certain parameters characterising the samples with respect to the distance. If we define D*(Ci,Cj) as the mean value of the distance between samples of class Ci and their nearest neighbours belonging to class Cj, the following interesting evaluation parameters are found:
$$D_I(C_i) = D^*(C_i,C_i)$$
$$D_B(C_i) = \frac{1}{N_C - 1} \sum_{j \ne i} D^*(C_i,C_j)$$
where NC is the number of classes
$$D_N(C_i) = \min_{j \ne i} D^*(C_i,C_j)$$
DI gives an indication of the variability inside a class, while DB allows us to measure the possibility of confusion with all the other classes and DN with the nearest one. The above defined parameters were evaluated with respect to the whole database. In Table 2, the values of D* for each pair of classes are shown. It can be seen that the classes are acceptably separated from each other, and a certain degree of overlapping is present only between classes that are morphologically very similar.
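The three parameters can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names `d_star` and `separability` are hypothetical, and the toy samples are plain numbers compared by absolute difference, which stands in for the structural distance of Eq. (12).

```python
from statistics import mean

def d_star(samples, labels, dist):
    """Return {(ci, cj): D*(Ci,Cj)}, the mean distance from each sample of
    class ci to its nearest neighbour in class cj. A sample is never taken
    as its own neighbour, so every class needs at least two samples."""
    classes = sorted(set(labels))
    table = {}
    for ci in classes:
        for cj in classes:
            nearest = []
            for a, (x, lx) in enumerate(zip(samples, labels)):
                if lx != ci:
                    continue
                candidates = [dist(x, y)
                              for b, (y, ly) in enumerate(zip(samples, labels))
                              if ly == cj and b != a]
                nearest.append(min(candidates))
            table[(ci, cj)] = mean(nearest)
    return table

def separability(table, ci):
    """Return (DI, DB, DN) for class ci from a D* table."""
    classes = sorted({c for c, _ in table})
    others = [table[(ci, cj)] for cj in classes if cj != ci]
    d_i = table[(ci, ci)]            # variability inside the class
    d_b = sum(others) / len(others)  # len(others) equals NC - 1
    d_n = min(others)                # distance to the nearest other class
    return d_i, d_b, d_n

# Toy data (assumption): 1-D "samples" with an absolute-difference distance.
samples = [0.0, 1.0, 10.0, 12.0, 20.0, 23.0]
labels = ["a", "a", "b", "b", "c", "c"]
table = d_star(samples, labels, lambda x, y: abs(x - y))
print(separability(table, "a"))
```

A class is well separated when DI is small compared with both DB and DN, which is the situation the text describes for most digit classes.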
Fig. 16. Histograms of the quantities DI, DB, DN relative to each class.
Table 2. Values of D*(Ci,Cj)

Classes     0      1      2      3      4      5      6      7      8      9
0         0.054  0.617  0.572  0.559  0.516  0.598  0.501  0.571  0.436  0.456
1         0.440  0.015  0.379  0.429  0.405  0.447  0.165  0.344  0.463  0.266
2         0.633  0.405  0.150  0.523  0.444  0.506  0.518  0.552  0.511  0.566
3         0.669  0.558  0.525  0.121  0.524  0.468  0.550  0.553  0.418  0.486
4         0.690  0.573  0.487  0.621  0.122  0.477  0.499  0.626  0.448  0.541
5         0.615  0.570  0.514  0.473  0.473  0.149  0.447  0.579  0.410  0.548
6         0.423  0.532  0.455  0.489  0.481  0.297  0.066  0.563  0.290  0.426
7         0.542  0.336  0.334  0.484  0.463  0.515  0.518  0.083  0.501  0.430
8         0.666  0.678  0.569  0.515  0.544  0.563  0.516  0.684  0.116  0.551
9         0.477  0.514  0.424  0.415  0.365  0.481  0.432  0.272  0.213  0.087
Fig. 17. In (a), the distributions, given in terms of percentiles for each class, of the distances within the same class (first bar), of the distances with respect to all the other classes (second bar) and of the distances with respect to the nearest class (third bar). In (b), the correspondence between the values of the percentiles and the colours of the bars is shown. Note that, for class ‘0’, the values of the percentiles of the third bar from the 85th to the 95th are negligible and thus have not been plotted. See text for further details.
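The two kinds of bar in Fig. 17 can be read with the help of the following sketch. It is a minimal illustration, not the procedure used to draw the figure: the function names are hypothetical, the distance values are invented toy numbers, and the nearest-rank percentile convention is an assumption, since the text does not state which convention was used.

```python
def max_distance_for(distances, percent):
    """Smallest value v such that at least `percent`% of the distances
    are <= v (used for the within-class bar). `percent` is an integer."""
    ordered = sorted(distances)
    k = max(1, -(-percent * len(ordered) // 100))  # ceil(percent * n / 100)
    return ordered[k - 1]

def min_distance_for(distances, percent):
    """Largest value v such that at least `percent`% of the distances
    are >= v (used for the other two bars). `percent` is an integer."""
    ordered = sorted(distances, reverse=True)
    k = max(1, -(-percent * len(ordered) // 100))  # ceil(percent * n / 100)
    return ordered[k - 1]

# Toy nearest-neighbour distances (assumption, not data from the paper).
within = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.20, 0.30]
to_other = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]

print(max_distance_for(within, 80))    # 80% of samples lie within this distance
print(min_distance_for(to_other, 80))  # 80% of samples lie at least this far
```

Two classes overlap little when the within-class maximum for a given percentile stays below the corresponding minimum towards the other classes.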
Table 3. Classification and error rates (%). The diagonal (correct classification) entries range from 94.89 to 99.15.

Table 4. Nishida’s results (%). The diagonal entries range from 93.5 to 99.9.

The values of the parameters DI, DB, DN are presented in Fig. 16 for each class. The histograms in Fig. 16, however, only show the mean values of the distances among the classes. To obtain a more precise estimate, in Fig. 17 another diagram is presented which reports, for each class, the distributions of the distances within the same class (which DI refers to), with respect to all the other classes (which DB refers to), and with respect to the nearest class (which DN refers to). The data are given in terms of percentiles: in particular, for the
distribution of the distance within the same class, each bar refers to a specific percentile and indicates the maximum distance value for the fraction of samples given by that percentile. As an example, 80% of the samples belonging to class ‘0’ have their nearest neighbour in the same class at a distance less than or equal to 0.060, while 95% have their nearest neighbour at a distance less than or equal to 0.200. For the other two distributions the bar indicates the corresponding minimum distance value: for example, 80% of the samples of the database not belonging to class ‘0’ have their nearest neighbour in class ‘0’ at a distance greater than or equal to 0.435, while 80% of the samples of the class nearest to class ‘0’ (i.e. class ‘8’, as can be seen from Table 2) have their nearest neighbour in class ‘0’ at a distance greater than or equal to 0.294. There is no great overlap among the classes. The only exception is class ‘9’, which exhibits some overlap with class ‘8’; this is reasonable because of the similarity between the corresponding shapes.

A further evaluation of the introduced distance has been performed by employing it for a character recognition problem. In particular, we adopted a simple nearest neighbour classifier [24] using our distance; the test was carried out with a reference set of about 4600 samples (33% of the whole data set) and a test set of about 9400 samples (67% of the whole data set), disjoint from the reference set. Fig. 18 shows the classification results for each class and the overall percentage of correct classification (equal to 97.91%). Table 3 presents the confusion matrix, which reports the classification and error rates for each class; in particular, the table entry ij contains the percentage of samples of class Ci that have been assigned to class Cj. The overlap between classes ‘8’ and ‘9’, noted when discussing Fig. 17, is still present here, although it is not as critical as the histogram in Fig. 17 seems to indicate.

To verify the effectiveness of our approach, we compared our results with those presented by Nishida [25], who obtained a correct recognition rate of 98.8%. Table 4 reports the confusion matrix obtained by Nishida. A comparison between Tables 3 and 4 shows that our results are quite close to those of Nishida, which were obtained by means of a complex method specially devised for handwritten character recognition and based on a structural description containing a statistical model of the possible deformations affecting the shape of the characters.

Fig. 18. Recognition rate for each class. The solid line represents the recognition rate on the whole test set.
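The nearest neighbour experiment described in this section can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the reference and test sets are invented toy values, and the absolute difference between numbers stands in for the structural distance of Eq. (12).

```python
from collections import Counter, defaultdict

def classify_1nn(sample, reference, dist):
    """Return the label of the reference sample closest to `sample`.
    `reference` is a list of (sample, label) pairs."""
    _, label = min(reference, key=lambda ref: dist(sample, ref[0]))
    return label

def confusion_percentages(test, reference, dist):
    """Entry [ci][cj] is the percentage of test samples of class ci
    that the classifier assigns to class cj."""
    counts = defaultdict(Counter)
    for x, true_label in test:
        counts[true_label][classify_1nn(x, reference, dist)] += 1
    return {ci: {cj: 100.0 * n / sum(row.values()) for cj, n in row.items()}
            for ci, row in counts.items()}

def recognition_rate(test, reference, dist):
    """Overall percentage of correctly classified test samples."""
    hits = sum(classify_1nn(x, reference, dist) == y for x, y in test)
    return 100.0 * hits / len(test)

# Toy data (assumption): disjoint reference and test sets of 1-D samples.
reference = [(0.0, "0"), (1.0, "0"), (10.0, "1"), (11.0, "1")]
test = [(0.4, "0"), (2.0, "0"), (6.0, "0"), (9.0, "1")]
toy_dist = lambda x, y: abs(x - y)

matrix = confusion_percentages(test, reference, toy_dist)
rate = recognition_rate(test, reference, toy_dist)
print(matrix, rate)
```

Because the classifier only needs a function returning a distance between two samples, any distance measure can be plugged in without changing the rest of the procedure.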
4. CONCLUSIONS AND FUTURE WORKS

In this paper, we have proposed a structural description scheme using circular arcs as primitives. In this framework, a distance measure between pairs of circular arcs and the relations among them has been introduced, which fulfils certain perceptive criteria for increasing its efficiency. It has proved applicable in a wide class of applications characterised by high variability in the shape of visual patterns, where the structural approach is particularly useful. The description method together with the distance has been experimentally validated in the clustering and recognition of handwritten digits coming from a standard character database. The results obtained are satisfactory, taking into account that the primary goal of our experiments was not to achieve the best classification results on the chosen data set, but to assess the separability among classes that can be obtained by employing the distance introduced.
References

1. Shapiro LG, Haralick RM. Structural description and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 1981; 3: 505–519
2. Arcelli C, Cordella LP, De Floriani L. Looking for visual primitives. In: Cantoni V (ed). Human and Machine Vision. Analogies and Divergences. Plenum Press, 1994
3. Biederman I. Recognition by components – a theory of human image understanding. Psychological Review 1987; 94: 115–147
4. Eshera MA, Fu KS. An image understanding system using attributed symbolic representation and inexact graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 1986; 8: 604–617
5. Rocha J, Pavlidis T. A shape analysis model with applications to a character recognition system. IEEE Transactions on Pattern Analysis and Machine Intelligence 1994; 16: 393–404
6. Bunke H, Messmer BT. Efficient attributed graph matching and its application to image analysis. In: Braccini C, De Floriani L, Vernazza G (eds). Lecture Notes in Computer Science 974. Springer-Verlag, 1995
7. Wang YK, Fan KC, Horng JT. Genetic-based search for error-correcting graph isomorphism. IEEE Transactions on Systems, Man and Cybernetics (part B) 1997; 27 (4): 588–597
8. Shapiro LG, Haralick RM. A metric for comparing relational descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence 1985; 7: 90–94
9. Tsai WH, Fu KS. Error-correcting isomorphisms of attributed relational graphs for pattern analysis. IEEE Transactions on Systems, Man and Cybernetics 1979; 9: 757–768
10. Tsai WH, Fu KS. Subgraph error-correcting isomorphisms for syntactic pattern recognition. IEEE Transactions on Systems, Man and Cybernetics 1983; 13: 48–62
11. Sanfeliu A, Fu KS. A distance measure between Attributed Relational Graphs for pattern recognition. IEEE Transactions on Systems, Man and Cybernetics 1983; 13: 353–362
12. Eshera MA, Fu KS. A graph distance measure for image analysis. IEEE Transactions on Systems, Man and Cybernetics 1984; 14: 398–408
13. Arkin EM, Chew LP, Huttenlocher DP, Kedem K, Mitchell JSB. An efficiently computable metric for comparing polygonal shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991; 13: 209–215
14. Pei S, Horng J. Fitting digital curve using circular arcs. Pattern Recognition 1995; 28: 107–116
15. West GAW, Rosin PL. Techniques for segmenting image curves into meaningful descriptions. Pattern Recognition 1991; 24: 643–652
16. Howell GW, Fausett DW, Fausett LV. Quasi-circular splines: a shape preserving approximation. CVGIP: Graphical Models and Image Processing 1993; 55: 89–97
17. ETL database distributed by Electrotechnical Laboratory – The Japanese Technical Committee for Optical Character Recognition, during the 2nd International Conference on Document Analysis and Recognition, 1993
18. Cordella LP, Tortorella F, Vento M. Shape description through line decomposition. In: Arcelli C, Cordella LP, Sanniti di Baja G (eds). Aspects of Visual Form Processing. World Scientific, 1994
19. Cordella LP, De Stefano C, Vento M. A neural network classifier for OCR using structural descriptions. Machine Vision and Applications 1995; 8: 336–342
20. Cordella LP, De Stefano C, Tortorella F, Vento M. A method for improving classification reliability of multi-layer perceptrons. IEEE Transactions on Neural Networks 1995; 6: 1140–1147
21. Nilsson NJ. Principles of Artificial Intelligence. Springer-Verlag, 1982
22. Korf RE. Depth-first iterative-deepening: An optimal admissible tree search. Artificial Intelligence 1985; 27: 97–109
23. Russell SJ. Efficient memory-bounded search methods. Proceedings of the 10th European Conference on Artificial Intelligence, Vienna, Austria, 1992; 1–5
24. Cover TM, Hart PE. Nearest neighbour classification. IEEE Transactions on Information Theory 1967; 13: 21–27
25. Nishida H. Shape recognition by integrating structural descriptions and geometrical/statistical transforms. Computer Vision and Image Understanding 1996; 64: 248–262
Pasquale Foggia was born in Naples, Italy in 1971. He received a Laurea degree with honours in Computer Engineering from the University of Naples “Federico II” in 1995. He is currently a PhD student at the Dipartimento di Informatica e Sistemistica of the University of Naples “Federico II”. His research interests are in the fields of classification algorithms, optical character recognition, graph matching and inductive learning. Pasquale Foggia is a member of the International Association for Pattern Recognition (IAPR).
Carlo Sansone was born in Naples, Italy in 1969. He received a Laurea degree with honours in Electronic Engineering in 1993 and a PhD degree in Electronic and Computer Engineering in 1997, both from the University of Naples “Federico II”. He has been Assistant Professor of Computer Sciences and Databases at the University of Naples “Federico II” and Assistant Professor of Computer Science at the University of Cassino. His research interests are in the fields of classification algorithms, optical character recognition and neural networks theory and applications. Carlo Sansone is a member of the International Association for Pattern Recognition (IAPR).
Francesco Tortorella was born in Salerno, Italy in 1963. He received a Laurea degree with honours in Electronic Engineering in 1991 and a PhD degree in Electronic and Computer Engineering in 1995, both from the University of Naples “Federico II”, Italy. From 1995 to 1996 he was Assistant Professor of Computer Architectures at the University of Naples “Federico II”; from 1997 to 1998 he was Assistant Professor of Computer Science at the University of Cassino. In September 1998 he joined the Dipartimento di Automazione, Elettromagnetismo, Ingegneria dell’Informazione e Matematica Industriale, University of Cassino, as a researcher. His current research interests include classification algorithms, optical character recognition, map and document processing, and neural networks. Dr. Tortorella is a member of the International Association for Pattern Recognition (IAPR).
Mario Vento was born in Naples, Italy in 1960. In 1984 he received a Laurea degree with honours in Electronic Engineering, and in 1988 a PhD in Electronic and Computer Engineering, both from the University of Naples “Federico II”, Italy. Since 1989, he has been a researcher associated with the Dipartimento di Informatica e Sistemistica at the above University. Currently he is Associate Professor of Artificial Intelligence and Computer Science at the Faculty of Engineering of the University of Naples. His present research interests are in the fields of image analysis and recognition, image description and classification techniques, soft computing, machine learning and artificial intelligence. Mario Vento is a member of the International Association for Pattern Recognition (IAPR).
Correspondence and offprint requests to: Professor M. Vento, Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli ‘Federico II’, Via Claudio 21, I-80125 Napoli, Italy. Email: vento@unina.it