Directional Pattern Matching for Character Recognition Revisited

Hiromichi Fujisawa and Cheng-Lin Liu
Central Research Laboratory, Hitachi, Ltd.
1-280 Higashi-koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
E-mail: {fujisawa, liucl}@crl.hitachi.co.jp

Abstract

Directional features have been used successfully for the recognition of both machine-printed and handwritten Kanji characters over the last decade. This paper attempts to explain why directional features are effective. First, the advances in directional features and related methods are briefly reviewed. Then the properties that the similarity measure should satisfy are discussed, and simulation experiments on directional pattern matching are conducted to validate these properties. This analysis is expected to inspire the design of new and more effective features.
1. Introduction

Back in the 1980s, we were still wondering how complex Kanji (Chinese) characters, with more than 3,000 classes, could be recognized efficiently. In those years, many intuitive features were invented, often without discussion of the properties of the similarity measure [1]. These features include the projection profile, peripheral profile, local profile, stroke extraction with relaxation matching, and so on. However, none of them survived to become a major approach to Kanji recognition. The so-called directional feature appeared as early as the end of the 1970s [2,3]. The work by Yasuda and Fujisawa dealt with handwritten digit recognition, and for this reason it was often overlooked when Kanji recognition methods were discussed. This work presented two major points that are not intuitive. The first is that the blurring operation is essential for feature extraction. It had already been shown by Iijima that the blurring operation could benefit recognition by pattern matching [4]. What is not intuitive here is that the blurring parameter can be larger than one would expect. The second important point in the paper of Yasuda and Fujisawa [3] is that decomposing character images into separate image patterns according to local stroke direction is so effective that it raises the optimum blurring parameter considerably. Larger blurring means more absorption of minor details and minor changes. What is not intuitive here is again the blurring parameter
whose optimum value is surprisingly large. This means that the directional feature patterns, which are two-dimensional gray-scale images, can be sampled into a smaller mesh size. The method presented in [3] used a thinning procedure to extract the local stroke direction, which might cause unwanted distortion in the extracted skeleton. An improved method introduced spatial differentiation to extract the local gradient direction [2]. By spatial differentiation, the same method can be applied to gray-scale images as well as to binary ones. The directional feature has been applied successfully to the recognition of both machine-printed and handwritten Kanji characters over the last decade. Despite this great success, however, it has not been well explained why the directional feature is effective. Evidence for the plausibility of directional features comes from physiology: cells in the visual cortex of animals are selective to the orientation of perceived objects [5]. While this clue inspired the application of directional features to machine vision problems, this paper shows the effectiveness of directional features from the viewpoint of pattern matching, namely that the similarity measure of directional patterns exhibits some desirable properties. In the following, we briefly review the advances in directional features and related methods. Then, to validate the effectiveness of directional pattern matching, we present simulation results on synthesized patterns showing that the similarity measure satisfies some desirable properties.
2. Directional Features and Relatives

The general routine of directional feature extraction is as follows. After normalizing the character image to a standard size, directional planes are generated, each recording the local stroke components in a specific direction or orientation. To reduce the dimensionality of the features, each directional plane is partitioned into a number of blocks, say 8 by 8, and the feature strengths within each block are averaged. To smooth the block boundaries, the directional planes are better reduced by low-pass filtering and down-sampling [6]. The low-pass
filtering without down-sampling is what was called blurring in [3]. Keeping the original resolution is actually equivalent to over-sampling in the filtered space. Over-sampling is beneficial to recognition with simple classifiers such as template/pattern matching. For sophisticated classifiers, down-sampling brings little loss of recognition accuracy. In decomposing the character image into directional planes, there are several ways to determine the local stroke direction: skeleton direction, stroke segment direction, contour chaincode, gradient, etc. Rather than extracting the local stroke direction from the skeleton after thinning, Yamashita et al. directly decomposed the stroke segments into directional patterns [7]. In this way, the strength of the directional feature is sensitive to the stroke width. On binary images, the local stroke direction can be extracted from the stroke contour as chaincode directions. Because the thickness of the contour is constant (one pixel), the directional feature extracted this way is less sensitive to the stroke width. Since character recognition experiments are mostly based on binary images, this strategy has been widely adopted [8,9]. For gray-scale images (either scanned in gray scale or converted from binary ones by filtering), the local stroke direction can be extracted using gradient operators. The Kirsch operators can be used to extract directional features in 4 orientations or 8 directions straightforwardly [10]. The Kirsch masks of different directions, however, are not orthogonal. On the other hand, the Sobel operators are more often used to calculate the gradient components in two (horizontal and vertical) directions [2]. The two components are synthesized to determine the (arbitrary) gradient direction and magnitude.
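For illustration, the gradient-based decomposition described above can be sketched in a few lines of Python. This is only a minimal sketch, not the implementation used in the cited works; the plain "valid"-mode correlation and the eight-sector quantization are illustrative choices.

```python
import numpy as np

# Sobel masks for the horizontal (x) and vertical (y) gradient components
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2_valid(img, kernel):
    """Plain 'valid'-mode correlation (sufficient for illustration)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def direction_planes(img, n_dirs=8):
    """Decompose an image into n_dirs planes by quantized gradient direction.

    Each pixel's gradient magnitude is assigned to the plane of its
    quantized gradient direction; flat regions contribute nothing.
    """
    gx = conv2_valid(img, SOBEL_X)
    gy = conv2_valid(img, SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)  # angle in (-pi, pi]
    # Quantize the direction into n_dirs equal sectors
    idx = np.floor((direction + np.pi) / (2 * np.pi) * n_dirs).astype(int) % n_dirs
    planes = np.zeros((n_dirs,) + magnitude.shape)
    for d in range(n_dirs):
        mask = (idx == d) & (magnitude > 0)
        planes[d][mask] = magnitude[mask]
    return planes
```

A vertical step edge, for example, produces a nonzero response in only one of the eight planes, which is exactly the direction selectivity that the feature exploits.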
For decomposing the character image into directional patterns, the gradient direction can be partitioned into discrete regions [11], or the gradient vector can be decomposed into components in specific directions [12]. The superiority of directional features has been confirmed in comparisons of character recognition algorithms [13,14]. Some modifications and extensions of directional features have been proposed. Hamanaka et al. proposed normalization-cooperated directional feature extraction; by skipping the normalization procedure, the computation cost is greatly reduced [15]. Teow and Loe used sophisticated convolution masks to extract stroke directions and stroke ends from character images [16]. Their stroke direction masks are similar to gradient operators. Supplementing directional features with stroke-end features or other structural features (convexity/concavity, curvature, etc.) helps further improve the accuracy of character recognition [17,18]. Viewing the gradient or direction masks as the impulse response functions of spatial filters, directional feature extraction is actually a band-pass spatial filtering
procedure. In this view, the directional feature is closely related to other spatial filtering methods, particularly the Gabor filter. Being selective to orientation and spatial frequency, the Gabor filter has been widely used in computer vision and, recently, in character recognition as well [19-21]. In the following, we show that the frequency response function of gradient masks for directional feature extraction is very similar to that of the Gabor filter. The gradient masks shown in Fig. 1 are designed for the purpose of analyzing the frequency responses of directional edge detectors. The upper two masks are the Sobel operators, and the lower two are given for analyzing the edge detectors in diagonal directions, though they are not actually used in directional feature extraction. The masks are centered in a 64x64 plane and the frequency responses are computed by the discrete Fourier transform (DFT). The frequency responses of the four masks in Fig. 1 are shown in Fig. 2. They show that the edge detection masks behave very well as band-pass filters selective to orientation. Moreover, the frequency response areas resemble Gaussian distributions, so the edge detection masks act like Gabor filters.
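This frequency-response analysis can be reproduced directly. The sketch below (an illustration of the procedure, with the mask position in the plane chosen for convenience) embeds a Sobel mask in a 64x64 zero plane and takes its 2-D DFT; because the mask coefficients sum to zero, the DC response vanishes, and the response concentrates along the frequency axis orthogonal to the strokes the mask detects.

```python
import numpy as np

# Sobel mask for the horizontal gradient (responds to vertical strokes)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Embed the 3x3 mask in a 64x64 zero plane; the position only affects
# the phase of the DFT, not the magnitude response.
plane = np.zeros((64, 64))
plane[:3, :3] = sobel_x

# Magnitude of the frequency response via the 2-D discrete Fourier transform
response = np.abs(np.fft.fft2(plane))

# Band-pass: no DC response.  Orientation selective: strong response along
# the horizontal-frequency axis, none along the vertical-frequency axis.
assert response[0, 0] < 1e-9
assert response[16, 0] < 1e-9 < response[0, 16]
```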
Fig. 1. Gradient masks of four directions
Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003) 0-7695-1960-1/03 $17.00 © 2003 IEEE
Fig. 2. Frequency responses of gradient masks
3. Similarity Properties of Directional Features
Due to the large number of classes, Kanji recognition used to be accomplished with a simple classifier, typically template or pattern matching. The extraction of discriminative features is crucial to the recognition performance of pattern matching. Though sophisticated classifiers are now available for Kanji recognition, pattern matching still survives for, e.g., coarse classification, and the evaluation of features with pattern matching makes sense for various classifiers. In this section, we analyze the properties of the similarity measure in directional pattern matching. To a certain extent, these properties explain the effectiveness of directional features in character recognition.
3.1. Desirable similarity properties

Very few technical papers have discussed the desirable properties of a similarity measure. Yamada discussed the property of "continuity" in the superimposition method, i.e., template matching [22]. Here we argue that a desirable similarity measure should have the following properties:

Continuity: when patterns are given small shape variations, the change of the similarity measure should be small. Such variations can be parallel translation, stroke thickness changes, and angular (directional) changes.

Direction selectivity: for character recognition, the similarity should decrease with the increasing angle between strokes. In particular, two strokes that are perpendicular to each other should have a very small similarity.

Width insensitivity: the similarity should be insensitive to changes of stroke width when other differences (direction, position, etc.) are small.
3.2. Simulation results

We conduct experiments on simulated stroke patterns to investigate the similarity properties under varying direction, stroke width, and position. The similarities of directional features are compared with those of image patterns. In the experiments, each pattern is stored in a 64x64 binary image. Five types of features are used for comparing patterns. First, the 64x64 = 4096 pixel values are arranged into a feature vector, which we call Binary. The binary image is also reduced to a 16x16 mesh by block averaging and by Gaussian filtering (blurring with down-sampling). The resulting 256-dimensional feature vectors
are called Block and Blur, respectively. For directional pattern matching, the binary image is decomposed into four orientation planes according to the local chaincode direction of the contour. Each orientation plane is reduced to 8x8 by block averaging or Gaussian filtering. The values of the four planes are concatenated into a feature vector, called 4-Block and 4-Blur, respectively. The blurring parameter of the Gaussian filter is determined according to the sampling interval [6]. Representing two patterns as feature vectors x1 and x2, the similarity is calculated as the correlation coefficient:

    Simi(x1, x2) = <x1, x2> / (||x1|| · ||x2||)
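This measure is straightforward to implement; a minimal sketch (operating on raveled feature vectors):

```python
import numpy as np

def similarity(x1, x2):
    """Correlation-type similarity: <x1, x2> / (||x1|| * ||x2||)."""
    x1 = np.asarray(x1, dtype=float).ravel()
    x2 = np.asarray(x2, dtype=float).ravel()
    return float(np.dot(x1, x2) / (np.linalg.norm(x1) * np.linalg.norm(x2)))
```

Identical patterns give a similarity of 1, and feature vectors with no overlapping nonzero components give 0; since all feature values here are nonnegative, the measure lies in [0, 1].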
In the first experiment, we generate strokes with variable direction but fixed location, length, and thickness. The stroke length and thickness are 33 and 7 pixels, respectively. The stroke direction ranges from 0 (horizontal) to 90 degrees (vertical). Some stroke images are shown in Fig. 3. The similarity is calculated between each pattern and the horizontal pattern. The similarities over varying directions are shown in Fig. 4.
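Such stroke images can be synthesized, for example, by thresholding the signed coordinates along and across the stroke axis. This is a sketch of one plausible generation procedure; the paper does not specify the exact one, and the centering convention below is an assumption.

```python
import numpy as np

def stroke_image(angle_deg, length=33, thickness=7, size=64):
    """Binary image of a straight, centered stroke at the given angle."""
    c = size / 2.0                              # assumed image center
    th = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    # Rotate pixel coordinates into the stroke's frame
    u = (xs - c) * np.cos(th) + (ys - c) * np.sin(th)    # along the stroke
    v = -(xs - c) * np.sin(th) + (ys - c) * np.cos(th)   # across the stroke
    return ((np.abs(u) <= length / 2.0) &
            (np.abs(v) <= thickness / 2.0)).astype(float)
```

With these defaults a horizontal stroke covers 33 x 7 pixels, and the 90-degree stroke is the transpose of the horizontal one; feeding such pairs to the similarity measure reproduces curves of the kind shown in Fig. 4.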
Fig. 3. Synthesized strokes in varying directions

In the second experiment, we compare stroke patterns of varying width. The two strokes to compare are centered at the same location. One stroke pattern has a thickness of 7 pixels, and the thickness of the other ranges from 1 to 15. The similarities between two stroke patterns of the same direction are shown in Fig. 5, and those between two perpendicular stroke patterns are shown in Fig. 6. In the third experiment, the similarities between stroke patterns under varying translation are calculated. The stroke direction (horizontal) and thickness (7 pixels) are fixed, and one stroke is translated perpendicular to the stroke direction by a variable distance (from 0 to 20). The similarities are shown in Fig. 7. In Fig. 4, the similarity oscillates around the angle of 45 degrees due to the discrete nature of the images. Except for this, the similarity measure approximately changes
continuously with the angle between the two strokes. We can see that for both binary image matching and directional pattern matching, the similarity decreases with increasing angle. The directional features (4-Block and 4-Blur) show the desirable property that when the angle is greater than 45 degrees, the similarity nearly vanishes, while for the binary image pattern, the similarity is appreciable even for perpendicular strokes. As to the effect of dimensionality reduction, it is evident that either block averaging or blurring improves the similarity. For the directional feature, the blurring operation gives a higher similarity than block averaging when the angle is small.
Fig. 5 shows that the similarity of binary image pattern matching without dimensionality reduction decreases rapidly with increasing difference in stroke width. This effect is alleviated by block averaging and blurring, whereas the similarity of directional pattern matching is much less sensitive to stroke width. In the figure, the similarity of 4-Block drops abruptly when one stroke pattern has a one-pixel width (width difference -6). This is because for a one-pixel-wide stroke, the two contour edges totally overlap, so the feature values change abruptly. Fig. 6 shows that when the two strokes are perpendicular, the similarity of directional features remains very small.
Fig. 4. Similarities of varying stroke directions
Fig. 6. Similarities for varying stroke widths between two perpendicular patterns
Fig. 5. Similarities for varying stroke widths between two patterns of the same direction
Fig. 7. Similarities of variable translations between two parallel strokes
Fig. 7 shows that the similarity between two stroke patterns monotonically decreases with the increasing distance between them. Directional feature extraction improves the similarity compared with binary pattern matching, and the blurring operation makes the similarity change more smoothly. From the results of Figs. 4-7, we can conclude that directional feature matching is more selective to stroke direction and less sensitive to stroke width and translation than image pattern matching, while blurring makes the similarity change more smoothly than block averaging.
4. Conclusion

This paper briefly reviews the advances in directional features and related methods for character recognition, and experimentally analyzes the similarity measure of directional pattern matching. The desirable properties of directional features, and their successful applications in character recognition, are mainly due to the directional selectivity of the feature detectors. To improve the performance of character recognition, more informative and discriminative features such as curvature and large-scale stroke segments can be detected using more complex masks or synthesized from the directional maps.
References

[1] M. Umeda, "Advances in Recognition Methods for Handwritten Kanji Characters," Special Issue on Character Recognition and Document Understanding, IEICE Trans. Information & Systems, Vol. E-79, No. 5, pp. 401-410, 1996.
[2] H. Fujisawa and O. Kunisaki, "Method of Pattern Recognition," Japanese Patent 1,520,768, granted 1989, filed 1979.
[3] M. Yasuda and H. Fujisawa, "An Improved Correlation Method for Character Recognition," Systems, Computers, and Controls, Vol. 10, No. 2, pp. 29-38, 1979 (translated from Trans. IEICE Japan, Vol. 62-D, No. 3, pp. 217-224, 1979).
[4] T. Iijima, H. Genchi, and K. Mori, "A Theoretical Study of the Pattern Identification by Matching Method," Proc. First USA-JAPAN Computer Conference, 1972.
[5] D. H. Hubel and T. N. Wiesel, "Functional Architecture of Macaque Monkey Visual Cortex," Proc. Royal Society London B, Vol. 198, pp. 1-59, 1977.
[6] C.-L. Liu, Y.-J. Liu, and R.-W. Dai, "Preprocessing and Statistical/Structural Feature Extraction for Handwritten Numeral Recognition," Progress of Handwriting Recognition, A. C. Downton and S. Impedovo (Eds.), World Scientific, 1997, pp. 161-168.
[7] Y. Yamashita, K. Higuchi, Y. Yamada, and Y. Haga, "Classification of Handprinted Kanji Characters by the Structured Segment Matching Method," Pattern Recognition Letters, Vol. 1, pp. 475-479, 1983.
[8] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, "Modified Quadratic Discriminant Functions and the Application to Chinese Character Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 9, No. 1, pp. 149-153, 1987.
[9] F. Kimura, T. Wakabayashi, S. Tsuruoka, and Y. Miyake, "Improvement of Handwritten Japanese Character Recognition Using Weighted Direction Code Histogram," Pattern Recognition, Vol. 30, No. 8, pp. 1329-1337, 1997.
[10] S.-W. Lee, "Multilayer Cluster Neural Network for Totally Unconstrained Handwritten Numeral Recognition," Neural Networks, Vol. 8, No. 5, pp. 783-792, 1995.
[11] G. Srikantan, S. W. Lam, and S. N. Srihari, "Gradient-Based Contour Encoder for Character Recognition," Pattern Recognition, Vol. 29, No. 7, pp. 1147-1160, 1996.
[12] A. Kawamura, et al., "On-line Recognition of Freely Handwritten Japanese Characters Using Directional Feature Densities," Proc. 11th ICPR, The Hague, 1992, Vol. II, pp. 183-186.
[13] D. S. Lee and S. N. Srihari, "Handprinted Digit Recognition: A Comparison of Algorithms," Proc. 3rd IWFHR, Buffalo, 1993, pp. 153-162.
[14] C.-L. Liu, K. Nakashima, H. Sako, and H. Fujisawa, "Handwritten Digit Recognition Using State-of-the-Art Techniques," Proc. 8th IWFHR, Niagara-on-the-Lake, Canada, 2002, pp. 320-325.
[15] M. Hamanaka, K. Yamada, and J. Tsukumo, "Normalization-Cooperated Feature Extraction Method for Handprinted Kanji Character Recognition," Proc. 3rd IWFHR, Buffalo, 1993, pp. 343-348.
[16] L.-N. Teow and K.-F. Loe, "Robust Vision-Based Features and Classification Schemes for Off-Line Handwritten Digit Recognition," Pattern Recognition, Vol. 35, No. 11, pp. 2355-2364, 2002.
[17] J. T. Favata, G. Srikantan, and S. N. Srihari, "Handprinted Character/Digit Recognition Using a Multiple Feature/Resolution Philosophy," Proc. 4th IWFHR, Taipei, 1994, pp. 57-66.
[18] M. Shi, Y. Fujisawa, T. Wakabayashi, and F. Kimura, "Handwritten Numeral Recognition Using Gradient and Curvature of Gray Scale Image," Pattern Recognition, Vol. 35, No. 10, pp. 2051-2059, 2002.
[19] A. Shustorovich, "A Subspace Projection Approach to Feature Extraction: The Two-Dimensional Gabor Transform for Character Recognition," Neural Networks, Vol. 7, No. 8, pp. 1295-1301, 1994.
[20] Y. Hamamoto, S. Uchimura, K. Masamizu, and S. Tomita, "Recognition of Handprinted Chinese Characters Using Gabor Features," Proc. 3rd ICDAR, Montreal, 1995, pp. 819-823.
[21] X. Wang, X. Ding, and C. Liu, "Optimized Gabor Filter Based Feature Extraction for Character Recognition," Proc. 16th ICPR, Quebec, Canada, 2002, Vol. 4, pp. 223-226.
[22] H. Yamada, "Continuous Nonlinearity in Character Recognition," Special Issue on Character Recognition and Document Understanding, IEICE Trans. Information & Systems, Vol. E-79, No. 5, pp. 423-428, 1996.