Image and Vision Computing 21 (2003) 879–889 www.elsevier.com/locate/imavis
A Hidden Markov Model object recognition technique for incomplete and distorted corner sequences

C. de Trazegnies*, C. Urdiales, A. Bandera, F. Sandoval

Dpto. Tecnología Electrónica, ETSI Telecomunicación, Universidad de Málaga, Campus de Teatinos, s/n 29071 Málaga, Spain

Accepted 20 March 2003
Abstract

This paper presents a new technique for planar object recognition based on Hidden Markov Models. First, the contour of the object is processed to extract a sequence of high curvature points. These points are extracted from a new adaptively estimated curvature function which is resistant against noise and transformations. Each corner is characterized by its subtended angle and its distance to the next corner. Then, corner sequences are analyzed by using HMMs. The method has been successfully tested on different databases. Its main advantage is that it can deal with incomplete and distorted corner sequences. © 2003 Elsevier B.V. All rights reserved.

Keywords: 2D object recognition; Hidden Markov Models
* Corresponding author. Tel.: +34-9-95-213-7174. E-mail address: [email protected] (C. de Trazegnies).
0262-8856/03/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/S0262-8856(03)00074-X

1. Introduction

Planar object recognition has been a main concern in the field of computer vision over the last decade, with a wide range of applications such as handwriting recognition, vehicle plate number recognition, trademark logo identification or automatic reading of serial numbers on packing material [20]. Given an input bitmap B_{c,s} consisting of m × n pixels and a fixed, finite set of objects C, the planar object recognition problem consists of determining which object in C is equivalent to B_{c,s}. It must be noted that B_{c,s} is usually distorted by the addition of noise, blurring, illumination changes, occlusion, sampling error and geometric distortions. Besides, B_{c,s} can also appear at different scales.

Basically, the recognition process consists of analyzing either B_{c,s} [1,9], a linear or non-linear transformation of B_{c,s} [6,8,26] or a set of structural features of B_{c,s} [21,27,30] to recognize the input object. The correlation or template matching of the input bitmap against the defined set is a straightforward approach to recognition and, under certain circumstances, it can be reliable. After each object is isolated, the system tries to match it against a set of predefined templates. Unfortunately, this process is typically slow, so input bitmaps are usually preprocessed by means of analytical transformations to reduce the data volume to be analyzed and to improve the separability of the objects. Unfortunately, any condition that causes an object to vary from the standard, like noise, perspective, shadows, size or slant, is likely to confuse the engine and return a questionable result. Also, for methods working with a complete grayscale bitmap, it must be granted that B_{c,s} contains a single object plus background. This cannot be achieved if close objects present overlapping bounding boxes [32].

Structural analysis relies on a decision tree to assess the geometric features of the object contour after it has been segmented. The chosen features should be as resistant as possible against noise, transformations and distortions affecting the input bitmap. These methods include, for example, spline curve approximations, contour profiles, skeleton and graph descriptors [16,22,31] or edge based graph matching [15]. Structural techniques are reported to be more tolerant against transformations and distortions. However, structural features relying on the object global shape are not reliable when those shapes are partially occluded or distorted by dark shadows or spots, especially when the size of the segmented shape changes [32]. In these cases, recognition must rely on local features instead. Typical local object features are the number of loops, T-joints, X-joints or relevant points [21,25,30], which can be detected on a given object even if part of it is significantly distorted. When several local features are used, the information they yield is typically combined in a statistical way to obtain the probability of being a given object. Thus, if some of these features are missed, the object can still be recognized. Particularly, relevant points have been widely used in this type of application because (i) they significantly reduce the number of contour points while maintaining a sufficiently accurate shape description; (ii) relevant point detection is a well-established area in computer vision and many detection algorithms are available; (iii) these points can be correctly detected even if part of the object is heavily distorted; (iv) relevant points are representative for all types of planar objects; and (v) relevant point information can be combined in a straightforward way using well-known probabilistic methods. The main problem of relevant point based recognition is that these points may shift their position because of noise, distortions and transformations. Even though probabilistic methods may deal with moderate changes in a point sequence, these methods are likely to fail for significantly distorted sequences.

This paper presents a structural technique for planar object recognition which relies on Hidden Markov Models (HMMs) to recognize a sequence of relevant points belonging to a given object. The most relevant feature of the proposed method is that it is very resistant against noise, transformations and distortions. To this purpose, a new method to extract relevant points from the curvature of the contour of an object in a stable way is presented in Section 2.1. Then, the resulting sequences of points are characterized as described in Section 2.2 and analyzed by means of HMMs as explained in Section 3.
Experiments and results for different databases including objects subjected to different distortions are presented in Section 4. Finally, conclusions are given in Section 5.
2. Shape representation

Relevant point detection methods can be broadly divided into two classes. The first group deals with a complete bitmap by evaluating its color gradients [18]. However, these methods are usually computationally expensive and sensitive to noise and illumination changes. Alternatively, other methods focus on extracting and processing the contour of an object to calculate its relevant points. These detection methods either rely on placing segmentation points by minimizing some error norm [23] or on identifying perceptually important points [28]. For rigid polygonal shapes, both techniques accurately represent a given object. However, the first group of techniques provides different results in terms of the number of points and their position in the presence of noise and orientation changes. Similarly, results are also unstable against small
deformations and transformations when shapes are characterized by slowly varying curvature.

2.1. Relevant point detection

Most contour based relevant point detection methods search either for high curvature points (HCPs) or for curvature zero-crossing points (ZCPs). HCP based methods rely on detecting the corners on the contour of a shape. Basically, these methods differ in the way they represent the input contour. A common approach is to calculate the curvature function (CF) of the shape contour, because corners can then be detected in a fast and easy way by simply thresholding the CF [4]. The main problem of this approach is that CFs are typically noisy and consequently need to be filtered. Thus, most CF calculation methods implicitly or explicitly filter the function at a fixed cut-off frequency [2,5]. Since HCPs appear at different scales, some of these corners may be lost after filtering the CF. To solve this problem, curvature scale space (CSS) based methods [19] rely on detecting curvature ZCPs over a wide range of increasingly filtered contours. The evolution of these points provides information about the nature of the contour. However, planar curves smoothed with a gaussian kernel suffer from shrinkage [17]. Hence, tracking and localization of HCPs become quite difficult for increasingly filtered contours. Wavelet transform modulus maxima (WTMM) based methods are reported to be more effective in Ref. [11], because the exact location of the relevant points is determined with high precision by tracking the WTMM through the different decomposition levels. Also, WTMM based methods are faster than CSS based ones because only a few dyadic scales are used, instead of working with the full continuous scale decomposition. The main problem of WTMM based methods is that all scales need to be compared in order to determine the similarity between two objects. Also, feature vectors become very large because all detected relevant points need to be stored.
In order to detect a correct and stable sequence of HCPs over a contour, Bandera et al. [7] propose an adaptively estimated curvature function (AECF). The AECF represents every contour point by the angle subtended between the two contour flat segments that are considered to be adjacent to such point. The main advantage of the AECF is that it correctly represents relevant points which appear at different natural scales in the processed contour. However, in order to obtain those relevant points, it is necessary to threshold the AECF at a fixed value. Although noise is adaptively removed by the AECF algorithm, it is unavoidable that some noise remains in the resulting CF. A fixed threshold cannot distinguish between peaks corresponding to real corners and peaks corresponding to residual noise, and a further filtering process would be non-adaptive and hence could lead to false HCP detection. In the present work, we propose a variation on the AECF method [7]. The new method represents every contour point by the local contour curvature value at that point. Every contour point corresponds to the angle
subtended between two contour flat segments adjacent to such point. This angle is calculated by integrating the CF between the two consecutive curvature ZCPs around the selected point. Relevant points are the peaks of the CF. The main advantage of this method is that the inverse algorithm can be applied to the resulting CF to recover the original contour, where the eventual noise has been adaptively removed. If additional filtering is necessary, the CF of progressively filtered contours can be calculated in an iterative way. The proposed method consists of the following stages:

• Contour encoding by means of an incremental chain code. The incremental chain code associated to a given pixel n is a vector (Δx(n), Δy(n)) which presents the difference in x and y between points n and n + 1 of the contour.

• For every point n, calculation of the maximum contour length k(n) free of discontinuities around n. The value of k for a given pixel n, k(n), is calculated by comparing the Euclidean distance from pixel n − k(n) to pixel n + k(n), ||n − k(n), n + k(n)||₂, to the real length of the contour between both pixels, l_max(k(n)). Both distances tend to be equal in the absence of corners, even for noisy contours. Otherwise, ||n − k(n), n + k(n)||₂ is significantly shorter than l_max(k(n)). Thus, k(n) is the largest value that satisfies:

  ||n − k(n), n + k(n)||₂ ≥ l_max(k(n)) − U_k    (1)

U_k being a constant value that depends on the noise level tolerated by the detector. If U_k is large, k(n) tends to be large and some contour spikes might be softened, but if it is low, the resulting CF is very noisy. Fortunately, it is very easy to choose a suitable U_k; in most cases, U_k = 0.4 works correctly [7].

• Calculation of the incremental adaptive chain code (Δx(n)_k, Δy(n)_k) associated to n. This new vector shows the variation in x and y between contour pixels n − k(n) and n + k(n) and is equal to:

  Δx(n)_k = Σ_{j=n−k(n)}^{n+k(n)} Δx(j),    Δy(n)_k = Σ_{j=n−k(n)}^{n+k(n)} Δy(j)    (2)
• Calculation of the slope of the curve at every point n. We consider that the slope at point n can be approximated by the angle between the segment (n − k(n), n + k(n)) and the vertical axis. This angle is equal to:

  Ang(n) = arctan(Δx(n)_k / Δy(n)_k)    (3)

• Calculation of the curvature at every point n. The curvature at every point n can be defined as the slope variation with respect to n, d(Ang(n))/dn. This value can be approximated by the incremental ratio Δ(Ang(n))/Δn or, locally, by Ang(n + 1) − Ang(n). This final step represents the main difference with respect to the AECF method presented in Ref. [7].

We assume that the contour of an object x is represented by a sequence of N corners C_x = {C_1^x, C_2^x, …, C_N^x}, each one of them located at a peak or local extreme of the AECF.

In order to test the performance of corner detection methods, Sarkar [29] proposed two widely accepted evaluation coefficients: the compression ratio (CR) and the integral square error (ISE). CR is the ratio between the number of contour pixels and the number of detected corners, and ISE evaluates the distance between the real contour and the polygonal approximation obtained from the detected corners. Most times, CR and ISE present opposite trends, because high compression rates tend to distort contours. Hence, Rosin [28] proposed a merit value M which combines a measure of efficiency E and a measure of fidelity F:

  M = √(F × E) = √((ISE_opt × CR_opt) / (ISE_approx × CR_approx)) × 100    (4)

ISE_opt, ISE_approx, CR_opt and CR_approx being the approximation errors and the CRs of an optimal method and of the method under evaluation, respectively. Table 1 shows the results of different detection techniques over the contours of a set of different planar objects. Two examples of shapes used for evaluation
Table 1
Performance of different corner detection methods

                                    Fig. 1                       Fig. 2
Corner detection method      CR       ISE      Merit      CR       ISE      Merit
Dynamic programming [23]    27.62    590.19    100       33.18    284.89    100
Split and merge [4]         27.62   1494.45     62.84    22.81    381.15    104
Adaptive method [7]         27.62   1544.36     61.82    36.50    467.82     74.40
Local histograms [5]        27.62   1817.68     56.98    36.50    479.79     73.46
Extended chain code [2]     27.62   1774.68     56.98    33.18    558.37     71.42
Gaussian filter [19]        27.62   1897.90     55.76    33.18    567.71     70.84
Proposed method             27.62   1532.47     62.00    33.18    347.41     90.56
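As a concrete illustration, the detection stages above (Eqs. (1)-(3)) can be sketched in Python. This is a simplified sketch assuming a closed contour sampled at one-pixel steps; the function and parameter names (`aecf_curvature`, `u_k`, `k_max`) are illustrative and do not come from the authors' implementation:

```python
import numpy as np

def aecf_curvature(contour, u_k=0.4, k_max=20):
    """Approximate the adaptive CF of a closed contour given as an
    (N, 2) array of pixel coordinates; sketch of the stages of Section 2.1."""
    n_pts = len(contour)
    # Incremental chain code: (dx(n), dy(n)) between points n and n+1.
    delta = np.roll(contour, -1, axis=0) - contour
    step_len = np.hypot(delta[:, 0], delta[:, 1])

    angles = np.zeros(n_pts)
    for n in range(n_pts):
        # Largest k(n) such that the chord from n-k(n) to n+k(n) stays
        # within u_k of the real contour length between both pixels (Eq. (1)).
        k_n = 1
        for k in range(1, k_max + 1):
            a = contour[(n - k) % n_pts]
            b = contour[(n + k) % n_pts]
            chord = np.hypot(*(b - a))
            real = sum(step_len[(n - k + j) % n_pts] for j in range(2 * k))
            if chord >= real - u_k:
                k_n = k
            else:
                break
        # Incremental adaptive chain code (Eq. (2)): summed increments.
        dx_k = sum(delta[(n - k_n + j) % n_pts, 0] for j in range(2 * k_n))
        dy_k = sum(delta[(n - k_n + j) % n_pts, 1] for j in range(2 * k_n))
        # Slope at n: angle against the vertical axis (Eq. (3)).
        angles[n] = np.arctan2(dx_k, dy_k)
    # Curvature: local slope variation Ang(n+1) - Ang(n); the sketch
    # ignores the single wrap-around step of the closed contour.
    return np.diff(np.unwrap(angles))
```

On a noise-free square the resulting CF is zero along the edges and shows peaks only around the four corners, which is the behavior the AECF is designed to produce.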
Fig. 1. Corners detected by using (a) dynamic programming [23]; (b) split and merge [4]; (c) adaptive method [7]; (d) local histograms method [5]; (e) extended chain code [2]; (f) gaussian filter [19]; (g) proposed method.
are presented in Figs. 1 and 2. The position of the corners detected with the methods under evaluation is marked on the shape contours. The object in Fig. 1 presents corners at different natural scales, all of them important for object characterization. The object in Fig. 2 presents all corners at a similar scale, but it is corrupted with strong noise that introduces curvature peaks at different natural scales. In the case of the object in Fig. 1, all methods were forced to detect the same number of corners by adjusting their parameters. Consequently, CR is constant and the performance of the corner detection methods can be evaluated by comparing the value of the ISE coefficient. In the case of the object in Fig. 2, the noise produces a different number of false detections for each method. Therefore, the performance of each method must be evaluated by means of its merit coefficient. The best detectors according to these coefficients are those based on optimal polygonal approximations, as expected. However, as aforementioned, noise and transformations provoke serious shifting of the detected corners, so that the merit coefficient of the polygonal approximation remains optimal while the position of
the corners might be significantly shifted. Hence, the polygonal approximation method is not appropriate for shape recognition. The second best method is the local histograms method, which yields a very high signal to noise ratio. Unfortunately, even though it locates corners correctly, it is difficult to find a set of parameters that properly characterize the detected corners. Hence, only corner location information is available for further shape recognition methods, and corner location is not robust against scale transformations of the original shape. It can be appreciated that the proposed method performs clearly better than the rest. Besides, corners are correctly characterized in this case because the proposed adaptive CF preserves angle information. To prove that, Fig. 3 shows an example of a contour recovered from a non-adaptively built CF (Fig. 3b) and from its adaptively built CF (Fig. 3c). The recovery is calculated by inverting the curvature calculation: every point on the recovered contour is located with respect to its preceding neighbor at a distance of one pixel and with a relative angle determined by the CF. It can be appreciated that the original contour in Fig. 3a can be
Fig. 2. Corners detected by using (a) dynamic programming [23]; (b) split and merge [4]; (c) adaptive method [7]; (d) local histograms method [5]; (e) extended chain code [2]; (f) gaussian filter [19]; (g) proposed method.
correctly recovered from the AECF. In Fig. 3b, the loss of information suffered by the non-adaptive CF distorts the recovered contour.

Fig. 3. (a) Original shape; (b) shape reconstructed from a non-adaptively built CF; (c) shape reconstructed from an adaptively built CF.

2.2. Corner characterization

To characterize a sequence of N corners C_x = {C_1^x, C_2^x, …, C_N^x}, each corner can be described by different features, the simplest one being its (x, y) coordinates referred to the bitmap. However, these features are not invariant against simple geometric transformations. Corners are typically characterized by their position with respect to a reference point. Thus, in our case a given corner i is characterized by the parameters shown in Fig. 4. These parameters are (i) the corner subtended angle (Cφ_i^x), which is equal to the integral of the CF over the contour length between the two successive CF ZCPs ahead of and behind corner C_i^x; and (ii) the contour length (Cr_i^x) between C_(i−1)^x and C_(i+1)^x. It must be noted that these parameters are invariant to translation and rotation. Besides, if the contour length is normalized, the corner parameters are also scale invariant. If CFs were calculated in a non-adaptive way using a constant k-slope [2], angle information would not be reliable and additional features, typically the corner position, would be required. Usually, the corner position is referred to the centroid of the object to gain stability against corner shifting [10,33]. However, such a feature is sensitive to distortions affecting the position of the centroid, like spots or lines adjacent to a shape. The main advantage of our shape feature extraction method is that only two features are required for each corner in the contour. Those features are resistant against geometric transformations, noise and distortions.

3. Matching algorithm
Fig. 4. Corner characterization parameters.
Let x = [x_0, x_1, …, x_N] be a feature vector with dimension N and Q = {q_0, q_1, …, q_Q} be a set of Q classes. The simplest approach to assess whether the object related to vector x belongs to a given class q_i is to calculate the distance between x and the prototype vector of class q_i according to a chosen metric. However, this approach is only valid if all input vectors present the same dimension. Clearly, for any corner based recognition method, vectors may present different dimensions even for distorted versions of the same object. To overcome this problem, some methods rely on corner based shape morphing or graph matching [3,15,30]. If results are not satisfactory, processes are performed in an iterative way to take into account possible missing points. However, recursive processes tend to be computationally expensive and, in case of severe distortion, they might not converge. In order to obtain more reliable results, the probability of a given object of belonging to class q_i when it yields the observed vector x is classically obtained by using the Bayes rule. However, simple Bayesian processes do not take advantage of sequentiality information. Instead, they can only provide the probability of a set of corners, which belongs to an observed shape, of being similar to another set of corners, which belongs to a stored pattern. The problem of sequential structure recognition is a typical application of Markov Model (MM) procedures. Particularly, HMMs have been applied to planar shape recognition based on contours. He and Kundu [13] report using continuous density HMMs to classify planar shapes. They segment closed shapes and exploit relations between consecutive segments for classification. The algorithm was reported to tolerate shape contour perturbation and some occlusion. However, it tended to be computationally expensive for long contours.
Hornegger and Niemann [14] rely on HMMs to recognize contours by using parts of the polygonal approximation of the whole contour. However, they do not provide results for complex or distorted shapes, so it is difficult to know how the method behaves in these cases, especially since polygonal approximation methods tend to be sensitive to transformations and noise [4]. In the present work, the recognition of planar shapes is based on the sequentiality of a set of corners of the observed shape. The available sequence of corners is of a variable length. If every corner could be classified as belonging to a unique class, then the recognition problem could be solved using classical MMs. However, a given corner of a shape could be similar to several prototypes and it is not always possible to include it in a unique class. Hence, we use a recognition method based on HMMs.
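As a minimal illustration of the variable-length problem discussed above, consider two corner sequences describing the same shape before and after distortion. The numbers are hypothetical, not taken from the databases of Section 4:

```python
import math

# Each corner is characterized as in Section 2.2: its subtended angle
# Cphi (radians) and the contour length Cr between its two neighbors.
square = [(math.pi / 2, 10.0)] * 4

# A distorted version of the same shape: noise introduced a spurious
# fifth corner, so the observation sequence has a different length.
distorted = [(1.52, 9.8), (1.60, 10.3), (0.35, 4.1), (1.55, 6.2), (1.49, 9.7)]

def normalize(corners):
    """Normalize contour lengths so the description is scale invariant."""
    total = sum(r for _, r in corners)
    return [(phi, r / total) for phi, r in corners]

# A fixed-dimension distance between the two descriptions is undefined:
# the vectors hold 4 and 5 corners, which is precisely why a sequence
# model is used instead of a plain metric.
assert len(normalize(square)) != len(normalize(distorted))
```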
3.1. HMM construction

The choice of appropriate parameters to define the HMM is an important task for the reliability of the recognition process. In our case, we define the main HMM parameters as follows. Our hidden states H = {H^1, H^2, …, H^M} are classes of corners. These hidden states do not correspond to a particular template, but rather to a particular class of corner which may appear in different templates of the database. To obtain the hidden states, we represent all the corners from all templates in the database in polar coordinates, the modulus and phase of a corner C_i^p in template p being equal to Cr_i^p and Cφ_i^p, respectively. Then, we perform a K-means clustering [12] of all corners into M classes according to Cr_i^p and Cφ_i^p. It has been observed that the corner modulus is more stable against noise or deformations than the corner phase. Therefore, we define a clustering distance that weights the modulus and phase in a separate way:

  D²(C_i^p, C_j^q) = ||C_i^p, C_j^q||₂² + (cos²θ / sin²θ)(Cr_i^p − Cr_j^q)²    (5)

C_i^p being the ith corner of template p and C_j^q the jth corner of template q, D(C_i^p, C_j^q) the proposed distance and ||C_i^p, C_j^q||₂ the Euclidean distance between C_i^p and C_j^q. The relative weights of modulus and phase can be controlled by heuristically fixing the value of angle θ. The M prototypes {H^1, H^2, …, H^M} of the classes are the hidden states of the model. The number of classes is determined by setting the cluster radius. For a low radius value, the number of classes is large and the system is less tolerant to corner distortion or deformation. A large radius causes a large number of corners to be classified into every cluster, thus reducing the reliability of the recognition system.

• An initial probability distribution P^p = (π_1, π_2, …, π_i, …, π_M). Each coefficient π_i of the initial probability distribution vector represents the probability of occurrence of the hidden state H^i at the initial corner C_1^p of template p. The initial probability vector P^p for each template p is calculated as the a priori probability of finding the different corners of such a template at the first observed corner.

• A transition matrix A^p is calculated for every template p in the database by means of the Baum-Welch algorithm [24]. The Baum-Welch algorithm, derived from the expectation-maximization algorithm, is a local optimization method. Hence, the initialization of iterative parameters is critical for the performance of the system. The choice of the initial system parameters determines (i) the number of iterations needed to converge to a stable solution and (ii) the tendency to converge to an optimal or to a second-order maximum. We initialize the transition matrix A^p by evaluating the number of transitions between different corners at every template p.

• An observation probability distribution matrix B^p, whose dimension is M × N, N being the number of corners of template p. Each coefficient B^p_ij of the matrix B^p is equal to the probability of corner C_i^p in template p of being at hidden state H^j. To calculate a given coefficient B^p_ij, a gaussian probability distribution related to the distance D between the observed corner C_i^p and the hidden state H^j could be used. However, every observed corner might be near several hidden states, each one of them belonging to several templates, so the system might give unreliable results. In order to limit the number of coefficients B^p_ij that present high values, the matrix B^p is defined as follows:

  B^p_ij = (1/√(2πσ)) exp(−(1/√(2πσ)) k_ij²)    (6)
k_ij being an index ranging from 0 to N − 1 which ranks the hidden states H according to the distance D between corner C_i^p and the prototype of class j, H^j, and σ being the standard deviation of the gaussian distribution. This σ is calculated so that only the two classes closest to the studied corner contribute to the distribution in a significant way. By defining the coefficients this way, the process is more resistant to potential corner shifting and loss due to noise and transformations.

Models are constructed off-line a single time for the working database. As previously commented, the chosen training procedure is the Baum-Welch algorithm [24]. Each model can be optimized by applying the Baum-Welch algorithm to a single template or to a set of distorted samples of the corresponding object in the database. Our goal is to test the recognition rate based on the similarity of a distorted pattern with the stored template; therefore, in the present work a single training template is used for each object. The training process is performed until the probability P(C_p|Q_p) of template p of being recognized as the model Q_p under training achieves a maximum. The Baum-Welch algorithm does not assure convergence to a global maximum [24]; in general, it converges to a local maximum. Thus, as mentioned before, the choice of appropriate initialization parameters is critical.

3.2. Query comparison

When an observation consisting of a sequence of N corners C_x = {C_1^x, C_2^x, …, C_N^x} for an unknown input object x is acquired, its corresponding B^x matrix is calculated by the same procedure as the matrices B^p. Each coefficient B^x_ij provides the probability of corner C_i^x of the observed sequence C_x of being at hidden state H^j. As we cannot directly observe the hidden states that correspond to
our observed sequence C_x, we must suppose that the sequence C_x can be generated by any possible sequence of hidden states S = {S_1, S_2, …, S_N}. Every state S_i of S belongs to the hidden state set H. Thus, there are M^N possible sequences S. The probability P(C_x, S|p) of the observed sequence C_x of being similar to a given template p with a given hidden state sequence S is:

  P(C_x, S|p) = P(C_x|S) P(S|p)
              = π_{S_1} B^x_{C_1^x S_1} A^x_{S_1 S_2} B^x_{C_2^x S_2} A^x_{S_2 S_3} ··· B^x_{C_(N−1)^x S_(N−1)} A^x_{S_(N−1) S_N} B^x_{C_N^x S_N}    (7)

Thus, the probability P(C_x|p) of the observed sequence C_x of being similar to template p, independently of which hidden state sequence has been generated, is equal to the sum of P(C_x, S|p) over all possible sequences S:

  P(C_x|p) = Σ_{all S} P(C_x|S) P(S|p)    (8)

This calculation is usually performed in an iterative way to keep a bounded computational load. In our case, the calculation is performed by using the Forward-Backward procedure [24]. It must be noted that each time a corner appears or disappears, the probabilities of being equal to any template in the database significantly decrease. However, as long as false detections do not make the object more similar to a wrong template than to the correct one, the probability for the correct match is still considerably higher than that of the rest.
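The iterative evaluation of Eq. (8) can be sketched as the standard forward recursion, shown here with small hypothetical model matrices rather than the trained ones:

```python
import numpy as np

def sequence_probability(pi, A, B_x):
    """Forward procedure: evaluates P(C_x | p) of Eq. (8) in O(N*M^2)
    instead of summing over all M^N hidden state sequences.
    pi: (M,) initial distribution; A: (M, M) transition matrix;
    B_x: (N, M) observation matrix, B_x[n, j] = P(corner n | state j)."""
    alpha = pi * B_x[0]                  # alpha_1(j) = pi_j * B^x_1j
    for n in range(1, len(B_x)):
        alpha = (alpha @ A) * B_x[n]     # propagate one corner at a time
    return float(alpha.sum())            # sum over the final hidden states

# Toy model with M = 2 hidden corner classes and N = 3 observed corners.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B_x = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.5, 0.5]])
p = sequence_probability(pi, A, B_x)
```

Each recursion step folds one factor of Eq. (7) into the running sum, so the result equals the explicit sum over all M^N state sequences.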
4. Experimental results

The proposed system was tested on three different databases:

• A reduced geometric database, including six shapes and 78 deformed versions of the templates. The deformations were created using image processing software, and include horizontal and vertical perspectives, as well as shear, pinch and spherical projection effects.

• An artificially created character database whose models present no segmentation errors. The database includes 26 templates and 338 deformed characters with the same types of deformation as in the case of the geometric database.

• A character database extracted from license plates. It includes 23 character templates together with 667 distorted samples. The distorted shapes have been obtained from real vehicle plates, thus presenting noise, shadows, perspective deformations and segmentation errors. We included this database in order to test the performance of the proposed method with images obtained in a real environment.
There is not much work on HMMs applied to corner sequences, probably because of the instability of corner detection against noise, transformations and distortions. Thus, it is not easy to find a benchmark database to test this kind of algorithm. However, the authors of Ref. [10] kindly provided the geometric and the artificial character databases they use, so that we could compare our results to theirs in equal conditions. As in their work, results are presented according to the retrieval rank. The retrieval rank is defined as the average of the retrieval index, the retrieval index being the order position occupied by the stored model which correctly corresponds to the input object. Thus, the closer the retrieval rank is to 1, the better.

For each one of these three databases, a set of HMMs was created. The different content of each database determines the set of hidden states created by the corner clustering process described in Section 3.1. The cluster radius was chosen so as to minimize the retrieval rank, and it is equal to 0.075 for all three databases (Fig. 5). With this radius, the number of hidden states for both the first and the second databases was equal to 12, while the third presented 16 hidden states. The number of hidden states is similar for all three databases because it only depends on the corner representation and the clustering radius.

In general, the performance of the system was satisfactory, not only because most objects were correctly identified but also because errors were reasonable according to the system specifications. These errors were mostly due to the appearance of false corners induced by distortions. If those corners appear at positions where other objects present real corners, and the rest of the corner sequence is similar for two otherwise different objects, the objects are likely to be confused.
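For clarity, the retrieval rank used in this comparison can be computed as follows. The labels in the example are hypothetical, not the actual experimental data:

```python
def retrieval_rank(results):
    """Average retrieval index over a set of queries. `results` holds
    (correct_template, ranked_templates) pairs, where ranked_templates
    lists the database models from most to least probable match."""
    indices = [ranked.index(correct) + 1 for correct, ranked in results]
    return sum(indices) / len(indices)

# Two queries: the first is retrieved at position 1, the second at 2,
# so the retrieval rank is 1.5 (a perfect system scores 1.0).
rank = retrieval_rank([("E", ["E", "F", "L"]),
                       ("F", ["E", "F", "L"])])
```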
Fig. 5. Evolution of the retrieval rank of the third database with the clustering radius for different phase weights.

Even though the proposed corner detection technique is very resistant against noise and distortion, if the original image shows a shadow or a spurious spot in contact with the observed shape, the segmented shape may present new corners, which are detected by the system. This problem is intrinsic to the method: any improvement of the detection method should still detect every corner presented by the observed shape, independently of whether it belongs to the original object. Fig. 6 presents some correctly recognized planar objects belonging to the two character databases. Fig. 6a shows some examples extracted from the artificially created character database; the first column in Fig. 6a shows the prototypes used to create the HMMs. As can be appreciated in the figure, results were mostly correct except when distortions were excessive. Fig. 6b shows some of the characters extracted from real license plates in order to test the resistance of the method against spots touching or partially occluding the shape, noise and segmentation errors. As in the previous case, the first set of characters presents the ideal prototypes used to create the HMMs. In this case, most characters were correctly recognized as well, despite their usually damaged contours.

Fig. 6. Correctly recognized patterns: (a) artificial character database; (b) license plate database.

Fig. 7 shows an example of how the probability of being a given object evolves as more corners of the object are studied. Fig. 7a shows the prototype of character E used to build its HMM. Fig. 7b shows different E characters captured from real license plates, presenting different levels of noise and distortion. Fig. 7c shows how the probability of being the original E evolves for the characters in Fig. 7b with each detected corner. It can be observed that most letters are immediately recognized as Es. However, the second corner of the final one is softened and, consequently, distorted. The system recovers a little after detecting the third corner, but then a new unexpected corner appears and the probability of being an E decreases again. Nevertheless, after the fifth observation the character is correctly recognized.

Fig. 7. Recognition of distorted objects: (a) prototype and corners; (b) input characters and corners; (c) evolution of the probability of being the prototype with each detected corner.

Fig. 8 shows an example of wrong recognition. The input object (Fig. 8a) is too softened and the corner on the right side of the K is missed. Thus, the object presents a corner sequence more similar to that of an N than to that of a K. Hence, after four observations the system believes that the object is an N. Even if all corners were analyzed, the probability of being an N would still be higher than that of being a K, because the rest of the sequence is very similar. It can be observed that if two corner sequences are similar except for a few corners, and those corners are missed, the system is likely to malfunction.

Fig. 8. Wrong recognition of a distorted object: (a) input character and corners; (b) K and N prototypes and corners; (c) evolution of the probability of being each of the prototypes with each detected corner.

Fig. 9 shows a set of deformed input shapes and the list of the six most similar shapes according to the similarity values returned by the proposed algorithm. It can be observed that the system can even deal with moderate overlapping.

Fig. 9. Query results for the geometric database: (a) query shapes; (b) list of most similar shapes ordered by similarity values.

The retrieval ranks for the first two databases, over the various shapes and deformations, are listed in Tables 2 and 3, respectively, both for the proposed method and for the method in Ref. [10]. The average retrieval ranks were almost equal for both methods. The geometrical database gives a retrieval rank of 1.29 with the proposed method and 1.32 with the method in Ref. [10]. The artificial character database gives a retrieval rank of 1.47 for both methods. It can be observed that the results of the proposed algorithm are only slightly better than those in Ref. [10] when using their databases. However, our system presents a clear systematic advantage when compared to theirs. In Ref. [10], corners are characterized by four parameters, two of them referred to the centroid of the object. When the shape of the object is strongly distorted by shadows or overlapping, its centroid may shift significantly, so parameters referred to the centroid position would present clearly wrong values. In our case, only two parameters are used to characterize each corner and neither of them is referred to the centroid. Consequently, our method can deal with nonlinear deformations and overlapping characters, like the damaged A in Fig. 6b.

Table 2
Performance of the method in Ref. [10] and of the proposed recognition method for the geometrical database

Shape       [10]    Proposed method
Tree        1.29    1.00
Cross       1.14    1.29
Ellipse     1.00    1.07
Rectangle   1.57    1.50
Star        1.00    1.00
Triangle    1.93    1.86
Average     1.32    1.29

Table 3
Performance of the method in Ref. [10] and of the proposed recognition method for the artificial character database

Shape   [10]    Proposed method
A       1.00    1.08
B       3.54    1.58
C       1.54    2.00
D       1.85    1.83
E       1.85    1.00
F       1.61    1.33
G       1.08    1.08
H       1.00    1.00
I       1.39    2.08
J       1.00    1.08
K       1.08    1.58
L       1.08    1.00
M       1.54    1.42
N       1.16    1.25
O       2.30    1.00
P       1.00    1.00
Q       2.15    1.92
R       1.61    1.00
S       1.23    1.42
T       1.08    1.08
U       1.08    1.92
V       1.16    1.33
W       1.85    1.25
X       1.23    1.50
Y       1.23    2.67
Z       1.23    2.92
Average 1.47    1.47

In Table 4 the retrieval rank for the license plate character database is presented. In this case the method is tested against real noise and distortion, which typically affect the shape centroid position and the stability of the corner detection. It can be observed that even in this case the retrieval rank keeps a low value, even lower than that of the artificial character database.

Table 4
Performance of the proposed recognition method for the license plate character database

Shape   Proposed method
A       1.20
B       1.15
C       2.45
D       1.35
E       1.00
F       1.00
G       1.90
H       1.00
J       1.40
K       1.05
L       1.30
M       1.30
N       1.30
P       1.20
R       1.00
S       1.45
T       1.50
U       2.25
V       1.10
W       2.60
X       1.10
Y       1.15
Z       1.00
Average 1.38

In order to test the performance of the proposed method under severe conditions, some handwritten characters, similar to those from vehicle plates, were presented to the system. In order to properly recognize numbers, 10 number templates were added to the model. The result is shown in Fig. 10. Fig. 10a shows the original picture, containing handwritten characters. Fig. 10b presents the observed characters, once segmented, and the position of their detected corners. In Fig. 10c, the prototypes of the results and the position of their corners are presented.

Fig. 10. (a) Original picture; (b) characters extracted from the picture and their detected corners; (c) prototypes of the recognition results and their corners.
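As a rough illustration of the two-parameter, centroid-free corner description used above, the following sketch characterizes each corner by its subtended angle and its distance to the next corner, normalized by the total corner-to-corner length so the description does not change with scale. The paper's AECF-based corner detection is not reproduced here; the function names and the normalization choice are ours:

```python
import math

def corner_descriptors(corners):
    """Describe each corner of a closed contour by two centroid-free features:
    the angle subtended at the corner and the normalized distance to the
    next corner. `corners` is a list of (x, y) points in contour order.
    """
    n = len(corners)
    # Total corner-to-corner length, used to normalize distances (scale invariance).
    total = sum(math.dist(corners[i], corners[(i + 1) % n]) for i in range(n))
    feats = []
    for i in range(n):
        p_prev, p, p_next = corners[i - 1], corners[i], corners[(i + 1) % n]
        # Angle between the two edges meeting at the corner, folded into [0, pi].
        a1 = math.atan2(p_prev[1] - p[1], p_prev[0] - p[0])
        a2 = math.atan2(p_next[1] - p[1], p_next[0] - p[0])
        angle = abs(a1 - a2) % (2 * math.pi)
        if angle > math.pi:
            angle = 2 * math.pi - angle
        feats.append((angle, math.dist(p, p_next) / total))
    return feats

# A unit square: every corner subtends 90 degrees and each side
# accounts for 1/4 of the total contour length.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for angle, d in corner_descriptors(square):
    print(round(math.degrees(angle)), round(d, 2))
```

Because neither feature references the centroid, a shadow or overlapping spot that shifts the centroid leaves the descriptors of the unaffected corners unchanged, which is the systematic advantage over Ref. [10] noted above.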
It can be observed that the stability of the corner detection method, regardless of how the characters were extracted, allows successful recognition. Finally, it must be noted that the efficiency of the proposed method is not limited to character recognition applications. Character sets were chosen in the present work as test databases in order to prove the good performance of the system when applied to complex shapes subjected to severe distortions. Hence, the proposed method can be successfully applied to generic planar shape recognition.
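The probability curves of Figs. 7 and 8 correspond to evaluating each model incrementally as corners are detected, which can be done with the standard HMM forward algorithm [24]. The sketch below illustrates this incremental evaluation; the two-state model parameters are invented for illustration and are not taken from the paper:

```python
import numpy as np

def forward_evolution(pi, A, B, observations):
    """Incremental HMM forward algorithm: after each observed symbol,
    return the likelihood of the partial observation sequence so far.

    pi: initial state distribution, shape (N,)
    A:  state transition matrix, shape (N, N)
    B:  emission matrix, shape (N, M) for M discrete observation symbols
    """
    alpha = pi * B[:, observations[0]]
    likelihoods = [alpha.sum()]
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
        likelihoods.append(alpha.sum())
    return likelihoods

# Toy two-state model: the likelihood of a corner sequence drops more
# sharply when an unexpected corner class (symbol 2) is observed,
# mimicking the behavior seen in Fig. 7.
pi = np.array([0.9, 0.1])
A = np.array([[0.8, 0.2],
              [0.3, 0.7]])
B = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
print(forward_evolution(pi, A, B, [0, 0, 2, 0]))
```

In the recognition setting, one such likelihood curve is maintained per stored model, and the model whose curve remains highest after all corners have been observed is reported, which is exactly how the K/N confusion of Fig. 8 arises when a discriminative corner is missed.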
5. Conclusions

A new planar object recognition method based on HMMs has been presented in this paper. The proposed method is resistant against translation, rotation, scaling and noise. Shapes are characterized by sequences of high curvature points (HCPs), which are extracted from a new adaptively estimated curvature function (AECF). This detection method is valid for HCPs at different natural scales. Each HCP is characterized by only two features, its subtended angle and the distance between the previous and the following HCPs, which are resistant against geometric transformations. Besides, these features are not referred to any given point, such as the centroid, so they are also resistant against distortions that change the global shape of the object. Finally, HCP sequences are analyzed by means of HMMs, providing resistance against corner shifting and loss. Hence, the algorithm is also robust to severe non-rigid distortions and occlusions.
Acknowledgements This work has been partially supported by the Spanish Ministerio de Ciencia y Tecnologia (MCYT), project No. TIC2001-1758. We would also like to thank Mr Chang and Mr Chen, from Yuan-Ze University (Republic of China), for providing some of the databases used in this work.
References

[1] K. Aas, L. Eikvil, T. Andersen, Text recognition from grey level images using hidden Markov models, Lecture Notes in Computer Science 970 (1995) 503–508.
[2] G. Agam, I. Dinstein, Geometric separation of partially overlapping nonrigid objects applied to automatic chromosome classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (11) (1997) 1212–1222.
[3] Y. Amit, D. Geman, K. Wilder, Joint induction of shape features and tree classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997) 1300–1306.
[4] N. Ansari, E. Delp, On detecting dominant points, Pattern Recognition 24 (5) (1991) 441–451.
[5] F. Arrebola, P. Camacho, A. Bandera, F. Sandoval, Corner detection and curve representation by circular histograms of contour chain code, Electronics Letters 35 (13) (1999) 1065–1067.
[6] R.R. Bailey, M. Srinath, Orthogonal moment features for use with parametric and non-parametric classifiers, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (4) (1996) 369–398.
[7] A. Bandera, C. Urdiales, F. Arrebola, F. Sandoval, Corner detection by means of adaptively estimated curvature function, Electronics Letters 36 (2) (2000) 124–126.
[8] S.O. Belkasim, M. Shridhar, A. Ahmadi, Pattern recognition with moment invariants: a comparative study and new results, Pattern Recognition 24 (1991) 1117–1138.
[9] M. Bokser, Omnidocument technologies, Proceedings of the IEEE 80 (1992) 1066–1078.
[10] F.S. Chang, S.Y. Chen, Deformed shape retrieval based on Markov model, Electronics Letters 36 (2) (2000) 126–127.
[11] F.A. Cheikh, A. Quddus, M. Gabbouj, Shape recognition based on wavelet transform modulus maxima, Proceedings of the VII International Conference on Electronics, Circuits and Systems (ICECS 2000), Beirut, Lebanon, 2000, pp. 461–464.
[12] J.A. Hartigan, A k-means clustering algorithm, Applied Statistics 28 (1979) 100–108.
[13] Y. He, A. Kundu, 2-D shape classification using hidden Markov models, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (11) (1991) 1172–1184.
[14] J. Hornegger, H. Niemann, Optimization problems in statistical object recognition, in: M. Pelillo, E.R. Hancock (Eds.), Proceedings of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Lecture Notes in Computer Science, vol. 1223, Springer, Heidelberg, 1997, pp. 311–326.
[15] B. Huet, E.R. Hancock, Shape recognition from large image libraries by inexact graph matching, Pattern Recognition Letters 20 (1999) 1259–1269.
[16] F. Kimura, M. Shridhar, Handwritten numerical recognition based on multiple algorithms, Pattern Recognition 24 (10) (1991) 969–983.
[17] T. Lindeberg, Scale-space for discrete signals, IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (3) (1990) 234–254.
[18] R. Liu, C. Chu, Y. Hsueh, A modified morphological corner detector, Pattern Recognition Letters 19 (1998) 279–286.
[19] F. Mokhtarian, A. Mackworth, Scale-based description and recognition of planar curves and two-dimensional shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (1) (1986) 34–43.
[20] S. Mori, C.Y. Suen, K. Yamamoto, Historical review of OCR research and development, Proceedings of the IEEE 80 (1992) 1029–1058.
[21] J. Park, V. Govindaraju, S.N. Srihari, OCR in a hierarchical feature space, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (4) (2000) 400–407.
[22] T. Pavlidis, A vectorizer and feature extractor for document recognition, Computer Vision, Graphics and Image Processing 35 (1986) 111–127.
[23] J.Y. Pérez, E. Vidal, Optimum polygonal approximation of digitized curves, Pattern Recognition Letters 15 (1994) 743–750.
[24] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
[25] S.R. Ramesh, A generalized character recognition algorithm: a graphical approach, Pattern Recognition 22 (4) (1989) 347–350.
[26] T.H. Reiss, The revised fundamental theorem of moment invariants, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (1991) 830–834.
[27] J. Rocha, T. Pavlidis, A shape analysis model with applications to a character recognition system, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (4) (1994) 393–405.
[28] P.L. Rosin, Augmenting corner descriptors, Graphical Models and Image Processing 58 (3) (1996) 286–294.
[29] D. Sarkar, A simple algorithm for detection of significant vertices for polygonal approximation of chain-coded curves, Pattern Recognition Letters 14 (1993) 959–964.
[30] S. Singh, Shape detection using gradient features for handwritten character recognition, Proceedings of the 13th International Conference on Pattern Recognition (ICPR'96), Vienna, Austria, vol. III, 1996, pp. 145–149.
[31] T. Taxt, J.B. Olafsdottir, M. Daehlen, Recognition of handwritten symbols, Pattern Recognition 23 (11) (1990) 1155–1166.
[32] O.D. Trier, A.K. Jain, T. Taxt, Feature extraction methods for character recognition: a survey, Pattern Recognition 29 (1996) 641–662.
[33] P. Zhu, P.M. Chirlian, On critical point detection of digital shapes, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (8) (1995) 737–748.