Image and Vision Computing 17 (1999) 701–711
Combining statistical and structural approaches for handwritten character description
Pasquale Foggia, Carlo Sansone, Francesco Tortorella, Mario Vento*
Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli “Federico II”, via Claudio 21, I-80125 Naples, Italy
Received 20 February 1998; received in revised form 2 June 1998; accepted 10 June 1998
Abstract
In this paper a new character description method, based on the combination of structural and statistical approaches, is presented. Characters are preliminarily decomposed in terms of structural primitives (circular arcs) and successively described in terms of statistical features (geometric moments). The obtained description is much more stable and yields significant improvements in classification performance: its effectiveness has been demonstrated by comparing the recognition results obtained by applying the geometric moments directly on the character bit maps and, as proposed, on the character decomposition in circular arcs. Absolute and relative performance is significant especially for particularly critical cases. Novel recurrent formulae for evaluating in a closed form the moments of objects represented in terms of circular arcs are also introduced; experimental results reveal a significant reduction of the time needed for evaluating the moments. © 1999 Elsevier Science B.V. All rights reserved.
Keywords: Optical character recognition; Geometric moments; Hybrid description methods
* Corresponding author. Tel.: +39-081-768-3606; fax: +39-081-7683186. E-mail address: [email protected] (M. Vento)
0262-8856/99/$ - see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0262-8856(98)00146-2
1. Introduction
In typical pattern recognition problems the description phase plays a fundamental role, since it defines the set of properties which are considered essential for characterizing the pattern and salient for taking a classification decision, whichever approach to the classification is adopted. In the statistical approach, the input pattern is characterized by a set of N features (e.g. a set of measurements performed on the raw data) and its description is achieved by means of a feature vector belonging to an N-dimensional space. If the features are properly chosen, feature vectors coming from objects of the same class will be close to each other in terms of geometric distance, while feature vectors belonging to different classes will be located in different regions of the feature space. In this way, recognition implies the partition of the feature space into regions, each pertaining to a single class. In this approach greater emphasis is given to the classification than to the description phase (in fact, the statistical approach is also referred to as the decision-theoretic approach). Typically, the classification stage is reduced to a problem of statistical decision theory and consequently a large variety of well documented and assessed algorithms can be employed. However, effective and established methods for finding a priori a set of good features (i.e. features able to maximize the degree of discrimination among the classes to be recognized) are generally not available. Usually, a large number of features is initially considered in order to capture all the discriminant information. The task of eliminating possible redundancies within the provisional set of features is performed by means of a feature selection process which generally employs statistical methods such as discriminant analysis [1]. In summary, the key problem of the statistical approach is the lack of a descriptive model for the patterns to be recognized, which does not allow one to determine and control the information given by each feature, so as to choose the most suitable ones. An exhaustive review of feature extraction methods especially devised for character recognition can be found elsewhere [2].
On the other hand, in the structural approach it is assumed that the pattern to be recognized can be decomposed into simpler components (called primitives), possibly in a recursive way, and then described in terms of simple appropriate attributes of the primitives and of their topological relations. In this way, the effectiveness of the description in discriminating among different classes can be perceptively appraised, to some extent. However, the obtained descriptions do not have fixed length and frequently it is not possible to establish an a priori order
for arranging the extracted primitives in the description. This characteristic strongly affects the choice of the data structure for storing the description and makes critical the classification phase. A technique for tackling such difficulties is to represent the patterns of a class as sentences belonging to a language, defined by means of a grammar. The most significant advantage of this approach, referred to as syntactic pattern recognition [3], is that most of the well known methodology of formal languages and parsing techniques can be simply reused; on the other hand, many difficulties may arise when defining the grammar, whose structure can be very complex and hard to infer from a sample set of patterns. Another way to represent structural description is given by more complex data structures such as the attributed relational graphs (ARGs) [4,5], whose nodes and branches respectively characterize, by means of a set of attributes, the primitives and their relations. ARGs have been widely adopted in structural pattern recognition, even if their description power is often paid in the classification stage with a high computational cost. In fact, the noise and the shape variations occurring in real applications generate distortions both in the structure of the graph and in its node and branch attributes, making necessary the use of complex algorithms for the inexact graph matching. In summary, both approaches present, together with some drawbacks, appealing and complementary properties, whose combination could lead to a more effective description method for most pattern recognition applications. 
In past years, possible ways of introducing statistical recognition techniques into the structural method have been identified, particularly with reference to the classification phase (see, for example, Goldfarb and Chan [6] who suggest employing a nearest neighbor classifier with structural prototypes, or Tsai [7] for a review of the stochastic grammars in the syntactic approach). Moreover, much effort has been made to include into the structural descriptions some statistically modeled information about variations of the primitives and/or the relations due to the noise or distortions, so as to make the recognition process more robust. A typical example is given by the random graphs [8], which have graph structure with randomly varying node and arc attribute values; more recently, Nishida [9] has proposed a structural description scheme which includes a model of the deformations affecting the patterns as geometric and statistical transformations. In any case, the whole approach is essentially structural: the classification is still performed through structural matching algorithms, even though it appears to be more robust with respect to instabilities occurring during the extraction of the primitives. On the contrary, much less attention has been devoted to define a way for including a structural description approach into a statistical framework, so as to employ model-based features for classification; thus, very few proposals of this kind are present in the literature. In the approach described
by Baird [10], characters are initially decomposed in terms of structural primitives, each described by some numerical parameters. The final feature vector has binary components corresponding to regions of the parameter space, which are identified in a previous clustering phase: the value of the component is set to 1 if there is at least one primitive falling in the associated region, to 0 otherwise. It is worth noting that the preliminary clustering is heavily domain dependent and thus its results can be quite dissimilar for different domains, giving rise to not homogeneous descriptions. Moreover, its computational cost is generally not negligible. In a paper by Taxt et al. [11], the description method presented produces a feature vector made of curvature measurements taken from the outer boundary of a symbol. The boundary is initially approximated by means of B-spline curves, but this is done only for removing noise: no hypothesis is made about the structure of the symbol to be recognized and thus the type and the order of the approximating spline are determined without taking into account the structural characteristics of the shape. The character description method we propose combines the desirable properties of statistical and structural approaches. The pattern to be recognized is described by means of a feature vector, so allowing the use of suitable statistical techniques in the classification phase: the relevant and original point is that the features (geometric moments) are extracted on a representation of the character coming from a suitable decomposition in terms of structural primitives (circular arcs). To this end, a preprocessing is performed which, starting from the original bit map of the pattern, leads to its structural representation in terms of circular arcs. 
The use of such primitives allows the computation of the geometric moments by means of closed recurrent formulae: numerical evaluations of the moments, which may imply a high computational cost and some degree of approximation error, are thus avoided. The obtained description is much more stable and yields significant improvements in classification performance. On one hand, in fact, the data on which feature vectors are extracted do not show the variability typical of the bit maps and so the relative distributions in the feature space are more compact and easy to model. On the other hand, possible instabilities in the character decomposition due to the noise and/or pattern distortions produce variations in the final descriptions which are not critical to be statistically modeled: such situations can thus be profitably tackled in the classification phase, without involving the problems arising in a pure structural approach. In the following sections the whole method is presented. In Section 2 the description process and the employed classification technique are outlined. The experiments performed on a standard database of handwritten characters and the relative results are described in Section 3, with a discussion in Section 4. Finally, in Section 5, some conclusions and guidelines for future work are drawn.
2. The handwritten character description method We have applied our method to the recognition of handwritten characters. This is a severe test-bed for a description method, since characters come from different writers who exhibit greatly varying drawing styles. This gives rise to a high variability within each class which is very difficult to model if a pure statistical approach is adopted. Typical statistical (quantitative) features (such as geometric moments, Fourier descriptors, etc.), directly extracted from the bit maps, are revealed to be effective for the recognition of printed characters (for which the drawing process is very stable), but generally they do not achieve very satisfactory results when applied to the recognition of handwritten characters. This is a challenging problem also for a pure structural approach. In fact, the decomposition in primitives provides an undoubtedly more stable representation of the character without any loss of information useful for the recognition, but frequently it does not completely eliminate the intrinsic variability of the original raw data and gives rise to a certain instability in the descriptions difficult to handle in the classification phase. A key point of our description method is the extraction, in a first phase, of a representation of the characters based on an effective structural decomposition. It is assumed that character images can be considered as ribbons evolving, joining and intersecting in a 2-D space (see Fig. 1(a)). In this case, the most convenient initial representation is a thin line centered within the stroke, since stroke thickness does not seem to be significant for recognition, with the exception of very special cases. The final structural representation is
made up of circular arcs approximating the obtained strokes: such a representation reabsorbs insignificant shape variability, without losing morphological and/or topological information important for the recognition. Besides their descriptive power, these primitives are particularly useful because it is possible, as we will show below, to evaluate efficiently the geometrical moments on the structural representation of the character by means of closed formulae. The final step, which computes the geometrical moments from the structural representation of the whole character, provides a pattern description which can be fed into a statistical classifier (a neural network in our case) since it has a fixed length. Other kinds of descriptions simply derived from the obtained circular arcs separately considered (such as the values of coordinates, angles and radii of the individual arcs) would give rise to a variable length feature vector not suitable for a statistical classifier. Another important advantage of this description is that it is not substantially affected by variations in the structural representation due to instabilities in the decomposition phase. In fact, since the moments are an integral measure, their values do not differ significantly if a single stroke is decomposed into two or more pieces, nor do they depend on changes of the points on the stroke chosen as the ends of each single component, as long as the set of components is a faithful approximation of the original stroke. The assumptions underlying our approach are quite general. They do not depend on the particular class of objects to be recognized: in fact, the same assumptions hold for other applications (such as recognizing symbols on topographic maps or components on electric circuits) for which the hypothesis of ribbon-like shapes is valid.
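The insensitivity of the moments to how a stroke is split can be illustrated with a small numerical sketch. It assumes that an arc is given as a hypothetical tuple (cx, cy, radius, alpha, beta) and that its moment of order (r, s) is the line integral of x^r y^s along the arc; the integral is evaluated here with a plain midpoint rule, not with the closed formulae of Section 2.2, and the names `arc_moment` and `character_moment` are illustrative only.

```python
import math

def arc_moment(r, s, cx, cy, radius, alpha, beta, steps=2000):
    """Approximate the moment of order (r, s) of one circular arc.

    Integrates x^r * y^s along the arc (centre (cx, cy), angles alpha..beta)
    with the composite midpoint rule; the arc length element is radius * d(theta).
    """
    h = (beta - alpha) / steps
    total = 0.0
    for k in range(steps):
        theta = alpha + (k + 0.5) * h
        x = cx + radius * math.cos(theta)
        y = cy + radius * math.sin(theta)
        total += x ** r * y ** s
    return total * radius * h

def character_moment(r, s, arcs):
    """Moment of a shape given as a list of arcs: per-arc moments simply add."""
    return sum(arc_moment(r, s, *arc) for arc in arcs)

# One stroke described as a single arc, and the same stroke broken in two
# at an arbitrary angle (assumed values, chosen only for illustration).
whole = [(0.5, 0.5, 0.3, 0.0, math.pi)]
split = [(0.5, 0.5, 0.3, 0.0, 1.0), (0.5, 0.5, 0.3, 1.0, math.pi)]

for r, s in [(0, 0), (1, 0), (2, 1), (3, 3)]:
    a = character_moment(r, s, whole)
    b = character_moment(r, s, split)
    print(f"M_{r}{s}: whole={a:.6f} split={b:.6f}")
```

Because the integral is additive over sub-arcs, the two descriptions agree up to quadrature error, whereas a description listing the parameters of each arc would change length and content.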
Fig. 1. (a) Bit maps of some characters; (b) output of the thinning transformation; (c) polygonal approximations; (d) decompositions in circular arcs.
2.1. Handwritten character decomposition
Character images can be considered as ribbons evolving, joining and intersecting in a 2-D space (see Fig. 1(a)). According to the semantic information held by the ribbon shape, a suitable character representation could be achieved by applying to them a thinning (skeletonizing) transformation (see Fig. 1(b)). Under certain hypotheses, the transformation achieves the goal of compressing information without loss, but its use for recognition purposes is not straightforward. It is known, in fact, that skeletonization techniques may give rise to distorted representations of the shape of the ribbons one would describe. The most important shape distortions introduced by thinning techniques arise at the junction and crossing of ribbons representing character strokes and mainly consist of spurious skeleton branches, not corresponding to actual ribbons, or of spurious inflections of the skeleton lines whose relevance depends on the relative thickness and on the angle formed by the joining strokes. However, it has been shown [12] that it is possible to correct the distortions after thinning on condition that skeleton pixels are labeled with their distance from the background. By using this information, together with information about direction of skeleton lines, the mentioned shape distortions can be reliably corrected in the large majority of cases, avoiding side effects and at a contained computational cost. To obtain this, a medial axis transformation algorithm [13,14] followed by polygonal approximation of the obtained skeleton is first applied to the character bit map; the correction procedure is then applied to the resulting polygonal line (see Fig. 1(c)). To cope with character variability and to single out the features most characteristic and invariant for members of a recognition class, circular arcs have been assumed as primitives for the structural description (see Fig. 1(d)).
In fact, for Latin handwritten characters, the circular arcs seem to have enough descriptive power to substitute curves of different shape, but having the same contextual value, without destroying really discriminant features. According to these assumptions, we use an algorithm [15,16] which decomposes the polygonal lines representing a character into circular arcs of different radii of curvature, considering straight segments as the limit case of an arc. The procedure to find the arc approximating a piece of polygonal line involves a transformation that changes a polygonal line in the (x, y) plane into a set of horizontal segments (a staircase function) in a (l, α) plane, where l is the distance of the generic point along the polygonal line from a reference point (curvilinear abscissa), and α is the angle between each segment of the polygonal line and a starting segment. This transformation reduces the problem of fitting a circular arc to a polygonal line to that of approximating a staircase function with a straight line. In order to find the straight line that best approximates a staircase in the (l, α) plane, we minimize the value of a suitably defined error parameter by using the least squares method.
2.2. The description through geometric moments
Geometric moments have been extensively employed in pattern recognition as image descriptors [17]. Several authors have proposed combinations of geometric moments that are invariant with respect to rotation [18]; other kinds of moments, based on orthonormal polynomials, have been also investigated [19,20]. In this paper we focus our attention on central geometric moments, although the method we propose could be extended to other kinds of moments. For a continuous image function $f(x,y)$, the geometric moment of order $(r,s)$ is defined as

$$M_{rs} = \iint x^r y^s f(x,y)\,dx\,dy \tag{1}$$

The quantity $r+s$ is also referred to as the order of the moment. If the image is represented by a discrete function, integrals are replaced by summations, yielding

$$M_{rs} = \sum_i \sum_j x_i^r\, y_j^s\, f(x_i, y_j) \tag{2}$$

Moments defined by Eq. (2) are not independent of translation. To obtain translation invariance, central moments are used

$$\mu_{rs} = \sum_i \sum_j (x_i - \bar{x})^r (y_j - \bar{y})^s f(x_i, y_j) \tag{3}$$

where

$$\bar{x} = \frac{M_{10}}{M_{00}} \quad\text{and}\quad \bar{y} = \frac{M_{01}}{M_{00}} \tag{4}$$

are the coordinates of the centroid of the image. Scale invariance can be obtained dividing $\mu_{rs}$ by $M_{00}^{\lambda}$ with $\lambda = (r+s)/2 + 1$. In the case of bilevel images given in terms of a set $S$ of circular arcs, Eq. (1) can be rewritten as

$$M_{rs} = \sum_{\gamma \in S} \int_{\gamma} x_\gamma(l)^r\, y_\gamma(l)^s\, dl \tag{5}$$

where each circular arc $\gamma$ is described by means of the functions $x_\gamma$ and $y_\gamma$, that express the coordinates of each point in terms of the curvilinear abscissa $l$. For circular arcs described by their centers, angles and radii, Eq. (5) becomes

$$M_{rs} = \sum_{i=1}^{N} M_{rs}^{(i)} \quad\text{where}\quad M_{rs}^{(i)} = \int_{\alpha_i}^{\beta_i} (R_i\cos\theta + \hat{x}_i)^r (R_i\sin\theta + \hat{y}_i)^s R_i\, d\theta \tag{6}$$

where $N$ is the number of arcs, $\alpha_i$ and $\beta_i$ are the start and end angles of the $i$th arc, $R_i$ is the radius and $(\hat{x}_i, \hat{y}_i)$ are the coordinates of the arc center. Eq. (6) is not valid for an arc degenerating into a straight line segment; in this case
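the corresponding term of the summation can be replaced by

$$M_{rs}^{(i)} = \int_{-L_i/2}^{L_i/2} (l\cos\theta_i + \hat{x}_i)^r (l\sin\theta_i + \hat{y}_i)^s\, dl \tag{7}$$

where $L_i$ is the length of the segment, $\theta_i$ is the angle formed by the segment with the $x$ axis and $(\hat{x}_i, \hat{y}_i)$ are the coordinates of the midpoint of the segment. Simple modifications of Eqs. (6) and (7) give the formulae to compute the central moments $\mu_{rs}$. The evaluation of the moment $M_{rs}^{(i)}$ can be performed by using the moments $\tilde{M}_{pq}^{(i)}$ computed assuming the arc center as origin of the coordinate system and with $p$ and $q$ varying in the range $0 \ldots r$ and $0 \ldots s$, respectively. In fact, taking into account the well known identity

$$(a+b)^n = \sum_{j=0}^{n} \binom{n}{j} a^j b^{n-j}$$

the moment of order $r+s$ on the generic arc can be written as

$$\begin{aligned}
M_{rs} &= \int_{\alpha}^{\beta} (R\cos\theta + \hat{x})^r (R\sin\theta + \hat{y})^s R\, d\theta \\
&= \int_{\alpha}^{\beta} \left[\sum_{p=0}^{r} \binom{r}{p} (R\cos\theta)^p\, \hat{x}^{r-p}\right] \left[\sum_{q=0}^{s} \binom{s}{q} (R\sin\theta)^q\, \hat{y}^{s-q}\right] R\, d\theta \\
&= \sum_{p=0}^{r} \sum_{q=0}^{s} \binom{r}{p} \binom{s}{q} \hat{x}^{r-p} \hat{y}^{s-q} \int_{\alpha}^{\beta} (R\cos\theta)^p (R\sin\theta)^q R\, d\theta \\
&= \sum_{p=0}^{r} \sum_{q=0}^{s} \binom{r}{p} \binom{s}{q} \hat{x}^{r-p} \hat{y}^{s-q}\, \tilde{M}_{pq}
\end{aligned} \tag{8}$$

For computing $\tilde{M}_{pq}$, a recurrence relation can be established by means of the application of the following formula [21]:

$$\int \cos^p\theta\, \sin^q\theta\, d\theta = \frac{\cos^{p-1}\theta\, \sin^{q-1}\theta}{p+q} \left(\sin^2\theta - \frac{p-1}{p+q-2}\right) + \frac{(p-1)(q-1)}{(p+q)(p+q-2)} \int \cos^{p-2}\theta\, \sin^{q-2}\theta\, d\theta$$

As a consequence, the definition of the moment $\tilde{M}_{pq}$ in Eq. (8) becomes:

$$\tilde{M}_{pq} = R^{p+q+1} \left[F_{pq}(\beta) - F_{pq}(\alpha)\right] + R^4\, \frac{(p-1)(q-1)}{(p+q)(p+q-2)}\, \tilde{M}_{p-2,q-2} \tag{9}$$

where

$$F_{pq}(\theta) = \frac{\cos^{p-1}\theta\, \sin^{q-1}\theta}{p+q} \left(\sin^2\theta - \frac{p-1}{p+q-2}\right)$$

It is worth noting that Eq. (9) is applicable when $p \ge 2$ and $q \ge 2$. The expressions giving the moments $\tilde{M}_{p1}$ and $\tilde{M}_{1q}$ are easily obtained:

$$\tilde{M}_{p1} = \frac{R^{p+2}}{p+1} \left[\cos^{p+1}\alpha - \cos^{p+1}\beta\right] \tag{10}$$

$$\tilde{M}_{1q} = \frac{R^{q+2}}{q+1} \left[\sin^{q+1}\beta - \sin^{q+1}\alpha\right] \tag{11}$$

For $\tilde{M}_{p0}$ and $\tilde{M}_{0q}$ the following formulae [21] can be used:

$$\int \sin^p\theta\, d\theta = -\frac{\sin^{p-1}\theta\, \cos\theta}{p} + \frac{p-1}{p} \int \sin^{p-2}\theta\, d\theta$$

$$\int \cos^q\theta\, d\theta = \frac{\cos^{q-1}\theta\, \sin\theta}{q} + \frac{q-1}{q} \int \cos^{q-2}\theta\, d\theta$$

In these cases, the obtained expressions are, respectively:

$$\tilde{M}_{p0} = R^{p+1} \left[V_{p0}(\beta) - V_{p0}(\alpha)\right] + R^2\, \frac{p-1}{p}\, \tilde{M}_{p-2,0} \tag{12}$$

$$\tilde{M}_{0q} = R^{q+1} \left[V_{0q}(\alpha) - V_{0q}(\beta)\right] + R^2\, \frac{q-1}{q}\, \tilde{M}_{0,q-2} \tag{13}$$

where

$$V_{p0}(\theta) = \frac{\cos^{p-1}\theta\, \sin\theta}{p}, \qquad V_{0q}(\theta) = \frac{\sin^{q-1}\theta\, \cos\theta}{q}$$

As an example, in Table 1 the expressions relative to the moments up to the fourth order are given. For computing the coordinates of the centroid of the image, we can apply Eqs. (4) and (8), thus obtaining

$$\bar{x} = \frac{\sum_{i=1}^{N} \left(\hat{x}_i\, \tilde{M}_{00}^{(i)} + \tilde{M}_{10}^{(i)}\right)}{\sum_{i=1}^{N} \tilde{M}_{00}^{(i)}}, \qquad \bar{y} = \frac{\sum_{i=1}^{N} \left(\hat{y}_i\, \tilde{M}_{00}^{(i)} + \tilde{M}_{01}^{(i)}\right)}{\sum_{i=1}^{N} \tilde{M}_{00}^{(i)}} \tag{14}$$

At this point we can refer to Eq. (8) once again for evaluating the central moments $\mu_{rs}$:

$$\mu_{rs} = \sum_{i=1}^{N} \sum_{p=0}^{r} \sum_{q=0}^{s} \binom{r}{p} \binom{s}{q} (\hat{x}_i - \bar{x})^{r-p} (\hat{y}_i - \bar{y})^{s-q}\, \tilde{M}_{pq}^{(i)} \tag{15}$$

From the operative point of view, all the terms necessary for computing $\mu_{rs}$ are easily obtained by means of Eqs. (9)–(13), once the parameters $(R_i, \alpha_i, \beta_i)$ have been derived from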
each of the $N$ arcs into which the image has been decomposed. Eq. (8) can also be used, with small changes, if the arc degenerates into a straight line segment; in this case $\hat{x}$ and $\hat{y}$ are the coordinates of the center of the segment, and

$$\begin{aligned}
\tilde{M}_{pq} &= \int_{-L/2}^{L/2} (l\cos\theta)^p (l\sin\theta)^q\, dl = \cos^p\theta\, \sin^q\theta \int_{-L/2}^{L/2} l^{p+q}\, dl \\
&= \frac{\cos^p\theta\, \sin^q\theta}{p+q+1} \left[\left(\frac{L}{2}\right)^{p+q+1} - \left(-\frac{L}{2}\right)^{p+q+1}\right] \\
&= \begin{cases} \dfrac{2\cos^p\theta\, \sin^q\theta}{p+q+1} \left(\dfrac{L}{2}\right)^{p+q+1} & \text{for } p+q \text{ even} \\[2ex] 0 & \text{for } p+q \text{ odd} \end{cases}
\end{aligned} \tag{16}$$

Table 1
Expressions of the moments $\tilde{M}_{pq}$ evaluated up to the fourth order
p+q   Expression
0     $\tilde{M}_{00} = R(\beta - \alpha)$
1     $\tilde{M}_{10} = R^2(\sin\beta - \sin\alpha)$
      $\tilde{M}_{01} = R^2(\cos\alpha - \cos\beta)$
2     $\tilde{M}_{20} = R^3\,\dfrac{\sin 2\beta - \sin 2\alpha}{4} + \dfrac{R^2}{2}\tilde{M}_{00}$
      $\tilde{M}_{11} = \dfrac{R^3}{2}\left(\sin^2\beta - \sin^2\alpha\right)$
      $\tilde{M}_{02} = R^3\,\dfrac{\sin 2\alpha - \sin 2\beta}{4} + \dfrac{R^2}{2}\tilde{M}_{00}$
3     $\tilde{M}_{30} = \dfrac{R^4}{3}\left(\cos^2\beta\,\sin\beta - \cos^2\alpha\,\sin\alpha\right) + \dfrac{2R^2}{3}\tilde{M}_{10}$
      $\tilde{M}_{21} = \dfrac{R^4}{3}\left(\cos^3\alpha - \cos^3\beta\right)$
      $\tilde{M}_{12} = \dfrac{R^4}{3}\left(\sin^3\beta - \sin^3\alpha\right)$
      $\tilde{M}_{03} = \dfrac{R^4}{3}\left(\sin^2\alpha\,\cos\alpha - \sin^2\beta\,\cos\beta\right) + \dfrac{2R^2}{3}\tilde{M}_{01}$
4     $\tilde{M}_{40} = \dfrac{R^5}{4}\left(\cos^3\beta\,\sin\beta - \cos^3\alpha\,\sin\alpha\right) + \dfrac{3R^2}{4}\tilde{M}_{20}$
      $\tilde{M}_{31} = \dfrac{R^5}{4}\left(\cos^4\alpha - \cos^4\beta\right)$
      $\tilde{M}_{22} = R^5\,\dfrac{\sin 4\alpha - \sin 4\beta}{32} + \dfrac{R^4}{8}\tilde{M}_{00}$
      $\tilde{M}_{13} = \dfrac{R^5}{4}\left(\sin^4\beta - \sin^4\alpha\right)$
      $\tilde{M}_{04} = \dfrac{R^5}{4}\left(\sin^3\alpha\,\cos\alpha - \sin^3\beta\,\cos\beta\right) + \dfrac{3R^2}{4}\tilde{M}_{02}$

2.3. The coding
Once the moments have been evaluated, to obtain a feature vector as unaffected as possible by changes in the scale and in the position of the character, a normalization is needed, according to the formulae presented in the previous subsection which evaluate the coordinates of the centroid and the normalization coefficient for the scale invariance. Since the formulae employ the values of the zero- and first-order moments, such values are no longer included in the feature vector. As a consequence, while the total number of the moments up to the $n$th order is

$$\frac{(n+1)(n+2)}{2}$$

the components of the corresponding feature vector will be

$$\frac{(n+1)(n+2)}{2} - 3$$

Another point to take into account is that the magnitude of the moments decreases as the order increases, since the coordinates of the points on the arcs are normalized to the range [0,1]. This situation could cause a reduction of the performance of some classifiers (such as the neural one used in the experiments described in the next section), which might consider as less significant the components with a smaller magnitude, and lose in this way discriminant information contained in higher-order moments. To overcome this problem, it is advisable to scale each component by the mean absolute value of that component over a training set. This kind of normalization proved to be very effective in a preliminary experimental phase.

3. Experimental results
The proposed method has been tested using a set of handwritten digits from the ETL Database [22]. In particular, we used a set of 1000 randomly selected characters for training a neural network classifier, a set (training test set) of 1000 characters to decide when the training phase had to be stopped in order to prevent the overtraining phenomenon [23], and a test set composed of 5000 characters, disjoint from the former two sets, to evaluate the performance of the method (see Fig. 2).
Fig. 2. Some samples of the considered test set.
The classifier adopted for testing the
proposed method is a multi-layer perceptron [24] using a sigmoidal activation function. In particular, a three-layer fully connected network has been used, because it is known that this number of layers allows one to build decision regions of any shape (but not multiply connected) as required for facing the large majority of classification problems [25]. The output layer is made of 10 neurons, each representing one of the classes to be recognized. The number of hidden neurons was set to 30. The network was trained with the standard back-propagation algorithm [24], using a constant learning rate η, fixed to 0.5. Two classifiers based on the same neural architecture have been used for evaluating the recognition performance of the moments evaluated on the bit maps (hereinafter denoted by BMM) and of the moments evaluated on the structural decompositions (SDM from now on). To avoid any bias in the final results, we started the training with the same initial weights in both cases. Tables 2 and 3 respectively report the recognition rates obtained by using BMM and SDM, for different values of the maximum moment order. In particular we compare the results for moments up to third, fifth and seventh order. Results obtained with higher-order moments are not reported, as they did not give a significant contribution to the recognition rate. It is worth noting that the obtained results are not outstanding; on the other hand, they are not low either, considering the quality of the characters in the database. Other authors have obtained better results [9] by employing sophisticated classification systems and training schemes tailored to the particular application.
Table 2
Recognition rate (%) obtained with BMM
                     Up to third order   Up to fifth order   Up to seventh order
Training set         86.9                95.6                97.1
Training test set    76.7                86.5                89.0
Test set             79.9                89.2                89.7
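The classifier configuration described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight initialisation range, the squared-error criterion, the on-line (per-sample) updates and all names (`ThreeLayerPerceptron`, `feature_count`) are hypothetical choices. Only the layer sizes (30 hidden and 10 output neurons), the sigmoidal activation and the learning rate of 0.5 come from the text; the input size follows from the feature count of Section 2.3 for moments up to seventh order.

```python
import math
import random

def feature_count(n):
    """Number of features for moments up to order n: (n+1)(n+2)/2 - 3."""
    return (n + 1) * (n + 2) // 2 - 3

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ThreeLayerPerceptron:
    """Fully connected input -> hidden -> output network with sigmoid units."""

    def __init__(self, n_in, n_hidden=30, n_out=10, learning_rate=0.5, seed=0):
        rng = random.Random(seed)  # assumed initialisation: uniform in [-0.5, 0.5]
        def init(rows, cols):
            # The last column of each row holds the bias weight.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols + 1)]
                    for _ in range(rows)]
        self.w_hidden = init(n_hidden, n_in)
        self.w_out = init(n_out, n_hidden)
        self.lr = learning_rate

    @staticmethod
    def _layer(weights, inputs):
        ext = list(inputs) + [1.0]
        return [sigmoid(sum(w * x for w, x in zip(row, ext))) for row in weights]

    def forward(self, x):
        hidden = self._layer(self.w_hidden, x)
        return hidden, self._layer(self.w_out, hidden)

    def train_step(self, x, target):
        """One back-propagation update on a single sample (squared error)."""
        hidden, out = self.forward(x)
        delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
        delta_hid = [h * (1.0 - h) * sum(d * self.w_out[k][j]
                                         for k, d in enumerate(delta_out))
                     for j, h in enumerate(hidden)]
        for k, d in enumerate(delta_out):
            for j, h in enumerate(hidden + [1.0]):
                self.w_out[k][j] += self.lr * d * h
        for j, d in enumerate(delta_hid):
            for i, xi in enumerate(list(x) + [1.0]):
                self.w_hidden[j][i] += self.lr * d * xi

    def classify(self, x):
        _, out = self.forward(x)
        return max(range(len(out)), key=out.__getitem__)

# Toy usage with a made-up feature vector standing in for normalised moments.
net = ThreeLayerPerceptron(n_in=feature_count(7))
sample = [0.1 * (i % 5) for i in range(feature_count(7))]
target = [1.0 if k == 7 else 0.0 for k in range(10)]
for _ in range(200):
    net.train_step(sample, target)
print(net.classify(sample))
```

In the experiments the stopping point is decided on the training test set; that loop is omitted here because the text does not specify the stopping criterion.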
Table 3
Recognition rate (%) obtained with SDM
                     Up to third order   Up to fifth order   Up to seventh order
Training set         93.3                98.1                98.8
Training test set    82.4                87.4                93.0
Test set             84.6                89.6                95.3
The primary goal of our experiments, instead, was not the achievement of the best classification performance on the chosen data set, but the assessment of the performance improvement due exclusively to the introduction of a structural approach into a statistical framework for the character classification. For this reason, we have employed a simple, perhaps not optimal, classifier without an extensive tuning of the classifier parameters. Our only concern has been to provide a fair comparison between the proposed hybrid description scheme and the purely statistical one chosen as reference.
As evident from the tables and from the diagrams in Fig. 3, the proposed method leads to an appreciable improvement of the classification rate in all the considered tests. In particular, for moments up to seventh order, which give the best performance in both approaches, the improvement in the recognition rate on the test set is about 6%. This behavior confirms our hypothesis that a description method based on a combination of structural and statistical approaches leads to feature vectors more stable with respect to shape variations. Considering the recognition rate for each class can highlight further differences in performance between the two methods. The results reported in Fig. 4 show that with SDM there is an overall improvement, which is very significant with some characters (‘7’, ‘8’, ‘9’): in these cases, because of the large variability in character bit maps, the morphological differences among samples of different classes become smaller. Global measurements, such as BMM, fail to catch these differences, so causing a considerable decrease of the recognition rate. In these situations, structural decompositions are still able to extract primitives in a stable way, so preserving most of the discriminant information. There are three cases (‘1’, ‘2’, ‘6’), however, in which BMM works slightly better. Other useful data are presented in Tables 4 and 5, which show the confusion matrices for the compared description methods, using moments up to seventh order. The element tij of the table represents the percentage of samples belonging to the ith class which has been wrongly assigned to the class j. Such results further confirm that the combination of statistical and structural approaches effectively helps in improving the classification performance.
This is particularly evident when considering pairs of classes difficult to distinguish by means of a pure statistical method: see, for example, the case of ‘7’ and ‘9’, which are strongly confused when using moments on bit maps, while there is very little confusion with moments on circular arcs. Other examples of this type are given by the pairs (‘0’, ‘8’) and (‘5’, ‘8’). Finally, Fig. 5 summarizes some overall differences between the two methods. About 86% of the samples are correctly recognized by both methods and only a small part (about 1%) remains unrecognized. It is worth noting that the percentage of samples correctly classified by only one of the two considered methods is significantly higher in the case of SDM. Our method, in fact, allows one to recognize 9.1% of samples missed by BMM. However, the actual overall gain is only 5.6% because SDM fails on 3.5% of characters recognized by BMM. Such cases, already highlighted in the discussion regarding Fig. 4, are mainly generated by an intrinsic weakness of
Fig. 3. Recognition rates versus the maximum moment order, collected for each considered set.
Fig. 4. Recognition rates on the test set. Moments up to seventh order have been used. Table 4 Confusion matrix for BMM evaluated on the test set 0 0 1 2 3 4 5 6 7 8 9
1
2
3
–
4
5
6
7
8
9
1.2
3.0
1.4
4.6
2.0
1.0 1.2
1.0
– 2.2
1.2 –
1.4 6.0 2.0 6.0
– 1.8 0.6 3.0
0.8
1.0 – 2.2 1.0
1.0 – 2.2 1.4
1.6 6.0
– 3.4
0.2 1.0 1.0
0.8 1.4
8.0
– 1.0 11.6
1.2 1.0
0.8 –
14 1.0 –
Table 5 Confusion matrix for SDM evaluated on the test set 0 0 1 2 3 4 5 6 7 8 9
1
–
0.8 1.2 2.6 0.8
– 2.0 1.0
2
3
4
5
2.6
1.2
2.0
–
1.0 – 1.2
2.8
3.2 0.6 2.0 2.6 1.6 –
0.8
1.2
– 1.2 1.0 0.8
6
7
8
9
0.8 0.8 1.6 2.8 –
1.4 0.2
2.4 1.0 –
1.2 0.2
– 0.4
0.2 –
Fig. 5. Percentages of characters recognized by each of the two considered methods, by both and by neither of them.
structural approaches. In fact, even though the use of a structural approach generally helps considerably in absorbing shape variability, the structural decomposition may actually amplify the shape deformations, thus producing an unfaithful description, when the pattern does not satisfy the hypotheses of the structural method. In particular, characters misclassified by SDM generally lie outside the hypothesis of the ribbon-like shape model (e.g. broken or too thick characters, characters with filled holes, etc.). In these cases, skeletons may exhibit distortions which cannot easily be recovered by subsequent processing.
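The confusion-matrix entries t_ij used in Tables 4 and 5 (the percentage of samples of class i wrongly assigned to class j) can be computed as in the following minimal sketch. The function name, the toy labels and the use of `None` for the diagonal are illustrative assumptions and are not taken from the experiments reported here.

```python
def confusion_percentages(true_labels, predicted_labels, n_classes=10):
    """Return t[i][j]: percentage of class-i samples assigned to class j.

    Diagonal entries are None, mirroring the dash shown in the tables.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1

    table = []
    for i, row in enumerate(counts):
        total = sum(row)  # number of samples whose true class is i
        out = []
        for j, c in enumerate(row):
            if i == j:
                out.append(None)
            elif total == 0:
                out.append(0.0)  # class absent from the data
            else:
                out.append(100.0 * c / total)
        table.append(out)
    return table


# Toy data (hypothetical, not from the paper): three classes, ten samples.
true_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 0, 1, 1, 1, 2, 0, 2]
toy_table = confusion_percentages(true_labels, predicted, n_classes=3)
```

Normalizing each row by the number of samples of the true class makes the entries comparable across classes even when the classes are not equally represented.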
4. Discussion
The hybrid structural/statistical shape description method we have presented is based on two assumptions: (1) it is possible to model the shape to be described as a ribbon and thus its skeleton can be adopted as a faithful, compressed version from which to start for obtaining the description; (2) circular arcs are primitives suitable for a significant
structural decomposition of the shape. The first hypothesis is not strictly necessary: the approach is applicable if a different characteristic such as the contour is used for describing the shape, provided that it is still possible to obtain a decomposition in terms of circular arcs. These primitives have been chosen because there are many application contexts (besides handwritten character recognition, other possible applications are the recognition of symbols on topographic maps, the recognition of components in electronic circuits, etc.) in which their descriptive power is enough to substitute curves of different shape without losing any morphological and/or topological information necessary for identifying the object. The aim of our proposal was not to derive a fast method for the computation of geometric moments (there are several algorithms [26–30] which allow one to speed up this task), and thus a precise and extended analysis of the attainable performance is beyond the scope of this paper. Nevertheless, a rough comparison between the times needed to compute the two kinds of descriptions considered shows a clear improvement when using SDM. In Figs. 6 and 7 the times are plotted as a function of the number of pixels in the pattern: in both cases the curves are approximately linear, but with much higher values for BMM. It is worth noting that, for SDM, most of the time is spent on the preprocessing, and that the evaluation of the moments by means of the recurrence relations is very inexpensive and almost constant with respect to the number of pixels of the pattern (see Fig. 8). Since the aim of the experiments was to verify the performance of our method in terms of description effectiveness, no great care has been taken in optimizing the speed of the preprocessing steps. Therefore, faster algorithms could be used for some of them, thus attaining shorter times for the whole description evaluation. 
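To illustrate why the cost of a description computed directly on the bit map grows with the number of pixels, the following sketch evaluates raw geometric moments by direct summation. It assumes the standard definition m_pq = sum of x^p y^q over the foreground pixels, with x as the column index and y as the row index; the function name and the toy bit map are hypothetical, and any normalization applied in the experiments is not reproduced here.

```python
def raw_moments(bitmap, max_order=7):
    """Raw geometric moments m[(p, q)] for all p + q <= max_order.

    bitmap is a list of rows; a truthy value marks a foreground pixel.
    Assumed convention: x = column index, y = row index.
    """
    orders = [(p, q)
              for p in range(max_order + 1)
              for q in range(max_order + 1 - p)]
    moments = {pq: 0 for pq in orders}

    for y, row in enumerate(bitmap):
        for x, value in enumerate(row):
            if not value:
                continue
            # Each foreground pixel contributes once to every moment,
            # so the work is proportional to the number of such pixels.
            x_pows = [x ** p for p in range(max_order + 1)]
            y_pows = [y ** q for q in range(max_order + 1)]
            for p, q in orders:
                moments[(p, q)] += x_pows[p] * y_pows[q]
    return moments


# Toy bit map (hypothetical): four foreground pixels.
toy_bitmap = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
toy_moments = raw_moments(toy_bitmap)
```

With moments up to seventh order there are 36 pairs (p, q), so every foreground pixel costs a fixed amount of work in this sketch. The faster algorithms cited as [26–30] avoid this per-pixel summation and are not shown.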
Finally, it is worth noting that, as is shown in Fig. 5, the set of characters which are not recognized using either of the
Fig. 6. Times (in ms) for computing the BMM descriptions versus the number of pixels of the character bit maps.
Fig. 7. Times (in ms) for computing the SDM descriptions versus the number of pixels of the character bit maps.
Fig. 8. Times (in ms) for computing the SDM starting from the structural representations of the characters.
two description schemes amounts to just 1.2% of the whole test set. This suggests that a hypothetical recognition rate of 98.8% could be achieved if it were possible to decide, for each character, which of the two descriptions is more suitable for its recognition. To this end, a promising approach would be to adopt a multi-expert system for the classification phase.
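The overall gain and the hypothetical 98.8% rate follow from simple bookkeeping on the Fig. 5 breakdown, which can be sketched as follows. The helper names and the toy correctness flags are illustrative assumptions; only the three percentages passed to `net_gain_and_oracle` are taken from the text.

```python
def agreement_breakdown(correct_a, correct_b):
    """Percentages of samples recognized by both, only A, only B, neither."""
    n = len(correct_a)
    if n == 0 or n != len(correct_b):
        raise ValueError("need two non-empty flag lists of equal length")
    both = only_a = only_b = neither = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            both += 1
        elif a:
            only_a += 1
        elif b:
            only_b += 1
        else:
            neither += 1
    scale = 100.0 / n
    return {
        "both": both * scale,
        "only_a": only_a * scale,
        "only_b": only_b * scale,
        "neither": neither * scale,
    }


def net_gain_and_oracle(only_a, only_b, neither):
    """Net gain of A over B, and the rate of an ideal per-sample selector."""
    return only_a - only_b, 100.0 - neither


# Percentages quoted in the text, with A = SDM and B = BMM.
gain, oracle = net_gain_and_oracle(only_a=9.1, only_b=3.5, neither=1.2)

# Toy flags (hypothetical): True means the sample was classified correctly.
sdm_ok = [True, True, True, False, True, False, True, True, True, True]
bmm_ok = [True, False, True, True, True, False, True, True, False, True]
toy = agreement_breakdown(sdm_ok, bmm_ok)
```

The second value is an upper bound: it assumes a selector that always picks a description that succeeds whenever at least one of the two does, which a real multi-expert system can only approximate.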
5. Conclusions and future work
In this paper a description method based on the combination of structural and statistical approaches is presented. The method describes a character by geometrical moments evaluated on a structural representation of the character in terms of circular arc primitives. The results of the method have been compared with those obtained by evaluating the geometrical moments directly on the character bit maps and reveal a significant improvement in the recognition rate. Future investigations will be oriented to generalize the method to other kinds of moments, and to adopt a multi-expert system for exploiting the advantages of both the description schemes.
References
[1] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed., Academic Press, New York, 1990.
[2] Ø.D. Trier, A.K. Jain, T. Taxt, Feature extraction methods for character recognition - a survey, Pattern Recognition 29 (1996) 641–662.
[3] K.S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York, 1974.
[4] M.A. Eshera, K.S. Fu, An image understanding system using attributed symbolic representation and inexact graph matching, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1986) 604–617.
[5] L.G. Shapiro, R.M. Haralick, Structural description and inexact matching, IEEE Trans. Pattern Anal. Mach. Intell. 3 (1981) 505–519.
[6] L. Goldfarb, T.Y.T. Chan, On a new unified approach to pattern recognition, Proc. 7th Int. Conf. on Pattern Recognition, Montreal, Canada, 1984, pp. 705–708.
[7] W.H. Tsai, Combining statistical and structural methods, in: H. Bunke, A. Sanfeliu (Eds.), Syntactic and Structural Pattern Recognition - Theory and Applications, World Scientific, Singapore, 1990, pp. 349–366.
[8] A.K.C. Wong, J. Costant, M.L. You, Random graphs, in: H. Bunke, A. Sanfeliu (Eds.), Syntactic and Structural Pattern Recognition - Theory and Applications, World Scientific, Singapore, 1990, pp. 197–236.
[9] H. Nishida, Shape recognition by integrating structural descriptions and geometrical/statistical transforms, Computer Vision and Image Understanding 64 (1996) 248–262.
[10] H.S. Baird, Feature identification for hybrid structural/statistical pattern classification, Comput. Vision Graph. Image Process. 42 (1988) 318–333.
[11] T. Taxt, J.B. Ólafsdóttir, M. Dæhlen, Recognition of handwritten symbols, Pattern Recognition 23 (1990) 1155–1166.
[12] G. Boccignone, A. Chianese, L.P. Cordella, A. Marcelli, Using skeletons for OCR, in: V. Cantoni, L.P. Cordella, S. Levialdi, G. Sanniti di Baja (Eds.), Image Analysis and Processing, World Scientific, Singapore, 1990, pp. 275–282.
[13] C. Arcelli, G. Sanniti di Baja, A thinning algorithm based on prominence detection, Pattern Recognition 13 (1981) 225–235.
[14] C. Arcelli, L.P. Cordella, S. Levialdi, From local maxima to connected skeleton, IEEE Trans. Pattern Anal. Mach. Intell. 3 (1981) 134–143.
[15] A. Chianese, L.P. Cordella, M. De Santo, M. Vento, Decomposition of ribbon-like shapes, Proc. 6th Scandinavian Conf. on Image Analysis, Oulu, Finland, 1989, pp. 416–423.
[16] L.P. Cordella, F. Tortorella, M. Vento, Shape description through line decomposition, in: C. Arcelli, L.P. Cordella, G. Sanniti di Baja (Eds.), Aspects of Visual Form Processing, World Scientific, Singapore, 1994, pp. 129–138.
[17] A. Rosenfeld, A.C. Kak, Digital Picture Processing, vol. 2, Academic Press, New York, 1982.
[18] M. Hu, Visual pattern recognition by moment invariants, IRE Trans. Information Theory 8 (1962) 179–187.
[19] S.X. Liao, M. Pawlak, On image analysis by moments, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 254–266.
[20] C.H. Teh, R.T. Chin, On image analysis by the methods of moments, IEEE Trans. Pattern Anal. Mach. Intell. 10 (1988) 496–513.
[21] I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series and Products, 4th ed., Academic Press, New York, 1965.
[22] ETL-1 character database, collected by the Technical Committee for OCR at the Japan Electronic Industry Development Association and distributed by the Electrotechnical Laboratory.
[23] R. Hecht-Nielsen, Neurocomputing, Addison Wesley, Reading, MA, 1990, Chap. 5.
[24] D.E. Rumelhart, J.L. McClelland, Parallel Distributed Processing - Explorations in the Microstructure of Cognition, vol. 1, MIT Press, Cambridge, MA, 1986.
[25] R.P. Lippmann, An introduction to computing with neural nets, IEEE ASSP Magazine 4 (1987) 4–22.
[26] M. Dai, P. Baylou, M. Najim, An efficient algorithm for computation of shape moments from run-length codes or chain codes, Pattern Recognition 25 (1992) 1119–1128.
[27] M. Hatamian, A real-time two dimensional moment generating algorithm and its single chip implementation, IEEE Trans. ASSP 34 (1986) 546–553.
[28] B.C. Li, J. Shen, Fast computation of moment invariants, Pattern Recognition 24 (1991) 807–813.
[29] W. Philips, A new fast algorithm for moment computation, Pattern Recognition 26 (1993) 1619–1621.
[30] L. Yang, F. Albregtsen, Fast and exact computation of cartesian geometric moments using discrete Green’s theorem, Pattern Recognition 29 (1996) 1061–1073.