Recognition of English Multi-oriented Characters
U. Pal, F. Kimura*, K. Roy and T. Pal
CVPR Unit, Indian Statistical Institute, 203 B. T. Road, Kolkata-108, INDIA
*Faculty of Engineering, Mie University, 1577 Kurimamachiya-cho, TSU, Mie 514-8504, Japan
Email: [email protected]

Abstract
In some printed artistic documents, text lines may be curved in shape, so the characters of a single line may be multi-oriented. To handle such artistic documents, this paper presents a scheme for the recognition of multi-oriented and multi-sized English characters. The features used are invariant to character orientation and are computed from the angular information of the border points of the characters. A modified quadratic discriminant function (MQDF) is used for recognition. We tested the proposed scheme on a dataset of 18232 characters and obtained 98.34% accuracy.

1. Introduction
In many documents, the text lines of a single page may not be parallel to each other: the lines may have different orientations, or they may be curved in shape. Examples of two artistic documents are shown in Fig.1. In this paper, we present a system for the recognition of multi-oriented and multi-sized English characters.

(a) (b) Fig.1. Examples of artistic documents collected from (a) a newspaper and (b) the ICPR-2002 call for papers.

There are a few pieces of published work on artistic document recognition [1-5]. Adam et al. [1] used the Fourier-Mellin transform for multi-oriented symbol and character recognition in engineering drawings. Some approaches to handling multi-oriented characters rely on character realignment [4]: based on the type of text (horizontal, vertical, curved, inclined, etc.), the characters of a text line are first realigned horizontally, and conventional OCR techniques are then applied to the realigned text. The main drawback of these methods is the distortion introduced by realigning curved text. A parametric eigenspace method was used by Hase et al. [2] for rotated and/or inclined character recognition. Xie and Kobayashi [5] proposed a system for multi-oriented English numeral recognition based on angular patterns. Recently, Pal et al. [8] proposed a method for isolated English character recognition based on zoning and contour distance features.

To handle multi-oriented characters, the features used in this paper are mainly based on the angular information of the external and internal border points of the characters. To make the features rotation invariant, we compute the frequencies of the different angles formed by three consecutive border points. Only seven different angles can occur, and the frequencies of these seven angles remain similar even if the character is rotated by any angle in any direction. For feature detection, a character is first divided into four concentric rings centred on its CG (centre of gravity), with the radii of the rings in arithmetic progression. Next, each ring is divided into four equal blocks. A character thus has sixteen blocks, and the frequencies of the seven angles in each block are used as features, giving 16×7 = 112 features per character. These 112-dimensional feature vectors are fed to the modified quadratic discriminant function for recognition.

0-7695-2521-0/06/$20.00 (c) 2006 IEEE

2. Data collection and Preprocessing
For the experiments, 18232 characters were collected, mainly from journals, magazines, newspapers, advertisements and computer printouts. We used a flatbed scanner for digitization. The digitized images are in gray tone at 300 dpi and stored in TIF format. We used a histogram-based global binarization algorithm [7] to convert them into two-tone (0 and 1) images, where '1' represents an object point and '0' a background point. A digitized image may contain spurious noise points and irregularities on the character boundaries, which have undesired effects on the system; to remove such noise points we used a method discussed in [7].

Documents printed in the Times New Roman and Arial fonts are considered here, with both uppercase and lowercase letters. Since we consider uppercase and lowercase letters, there should be 52 classes (26 uppercase and 26 lowercase), but because of shape similarity we have only 37 classes. Since we consider arbitrary rotation (any angle up to 360 degrees), the characters 'p' and 'd' are treated as the same class: rotating 'd' by 180 degrees yields the same shape as 'p'. Similar situations occur for 'q' and 'b', 'n' and 'u', etc. Lowercase 'o' and uppercase 'O' are similar in shape, as are lowercase 'l' and uppercase 'I'. Some more similar pairs are 'C' and 'c', 'N' and 'z', and 'z' and 'Z'.
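The class merging described above can be sketched as a simple label map. Note this is only an illustration: the paper names these pairs but does not publish the full 37-class table, so the exact mapping below is our hypothetical reconstruction.

```python
# Hypothetical sketch of the 52-letter -> 37-class merging (Section 2).
# Only the pairs explicitly named in the paper are included here.
MERGES = {
    # same shape after a 180-degree rotation
    "d": "p", "b": "q", "u": "n",
    # near-identical glyph shapes
    "O": "o", "I": "l", "C": "c", "N": "z", "Z": "z",
}

def merge_label(ch: str) -> str:
    """Map a raw letter label to its merged class label."""
    return MERGES.get(ch, ch)
```

Characters not listed keep their own class, so e.g. `merge_label("a")` returns `"a"` unchanged.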

3. Feature extraction
Given a two-tone image, we first find the contour points of the image by the following algorithm. For each object point in the image, consider the 3 x 3 window centred on it. If any one of the four neighboring points (as shown in Fig.2) is a background point, then this object point (P) is considered a border point. Both external and internal border points are used for feature extraction.
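A minimal sketch of this border-detection step (function name ours; it assumes the two-tone numpy image convention of Section 2, with pixels outside the image treated as background):

```python
import numpy as np

def border_points(img: np.ndarray):
    """Return (row, col) of object pixels with at least one 4-neighbour
    background pixel. img: 2-D array with 1 = object, 0 = background.
    Internal borders (around holes) are found too, since hole pixels
    are background."""
    pts = []
    h, w = img.shape
    for r in range(h):
        for c in range(w):
            if img[r, c] != 1:
                continue
            # 4-neighbourhood of P; out-of-image pixels count as background
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or img[rr, cc] == 0:
                    pts.append((r, c))
                    break
    return pts
```

For a solid 3x3 block, only the eight outer pixels are border points; punching a hole in a 5x5 block additionally marks the hole's four 4-neighbours as (internal) border points.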

Fig.2. The four neighboring points. P denotes the point of interest and X denotes the neighboring points.

The features used are mainly based on the angular information of the external and internal border points of the characters. We compute the frequencies of the angles made by three consecutive border points of the characters. Three consecutive points define two angles (one on the background side and one on the foreground side); the angle on the background side is considered here (as shown in Fig.3). Only 7 different angles can arise from three consecutive border points: 360°, 315°, 270°, 225°, 180°, 135° and 90°. These seven angles are shown in Fig.3. For a character, the frequency of each angle remains the same even if the character is rotated by any angle in any direction.

For feature detection, a character is divided into four concentric rings (circles) with the centre of gravity (CG) of the character as the centre. The radii of the rings are in arithmetic progression. For a character, let the radii (outer to inner) of these rings be R1, R2, R3 and R4, respectively, with R1 - R2 = R2 - R3 = R3 - R4. Here R1 is the distance of the furthest border point of the character from its CG. See Fig.4, where the four rings and their radii are shown on the character 'k'.

For segmentation of the rings into four equal parts, we find the border point having the maximum distance from the CG of the character. Let this border point be P1 and the CG be C1 (see Fig.4(a)). We draw a line through P1 and C1 and call it the reference line (see Fig.4(b)). We also draw another line perpendicular to P1C1 passing through the CG. These two mutually perpendicular lines divide each ring into four equal segments. The segment that starts from P1 is segment number 1 (S1); moving anti-clockwise from it, the other consecutive segments are segment number 2 (S2), segment number 3 (S3) and segment number 4 (S4). See Fig.4(b) for the different segments of the rings.

Thus for a character we get four rings, and each ring is divided into four equal blocks. As a result, a character has sixteen blocks, and the frequencies of the seven angles in each block are used as the features. These sixteen blocks and the frequencies of the seven angles in each block are shown in Fig.5 for two different samples of the English character 'k'. From Fig.5 it can be seen that the histograms of the angle values of the two samples are very similar, although one sample is a normal 'k' and the other is a rotated 'k'.

Fig.3. Examples of the seven different angles obtained from border points. (Here gray points are object (foreground) points, border points are marked in deep gray, and white pixels denote background.)
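The angle computation can be sketched as follows. This is our illustrative reconstruction, not the authors' code: it assumes the border points form an ordered, closed, 8-connected contour traversed clockwise in image coordinates (row index growing downward); a counter-clockwise traversal would produce the complementary foreground-side angle instead.

```python
import math
from collections import Counter

SEVEN_ANGLES = (90, 135, 180, 225, 270, 315, 360)

def background_angle(p1, p2, p3):
    """Background-side angle at p2 formed by three consecutive border
    points of an 8-connected contour, quantized to a multiple of 45
    degrees (assumes clockwise traversal in image coordinates)."""
    a_in = math.atan2(p1[0] - p2[0], p1[1] - p2[1])   # direction p2 -> p1
    a_out = math.atan2(p3[0] - p2[0], p3[1] - p2[1])  # direction p2 -> p3
    deg = math.degrees(a_out - a_in) % 360.0
    if deg == 0.0:        # p1 == p3: a one-pixel spur, full turn
        deg = 360.0
    return int(round(deg / 45.0)) * 45

def angle_frequencies(contour):
    """Frequencies of the seven angles along an ordered closed contour."""
    n = len(contour)
    counts = Counter(
        background_angle(contour[i - 1], contour[i], contour[(i + 1) % n])
        for i in range(n)
    )
    return [counts.get(a, 0) for a in SEVEN_ANGLES]
```

A straight run of three collinear points always yields 180 degrees regardless of orientation, which is the rotation-invariance property the features rely on; reversing the traversal direction at a corner swaps an angle for its 360-degree complement.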

(a) (b) Fig.4. Examples of the different (a) rings and (b) segments.

The sixteen blocks are numbered in a fixed order, starting from S1.


Starting from S1 and moving anti-clockwise, the blocks obtained from the outer ring (R1) are designated the 1st, 2nd, 3rd and 4th blocks. Similarly, starting from S1 and moving anti-clockwise, the blocks obtained from ring R2 are designated the 5th, 6th, 7th and 8th blocks. The other blocks are numbered in the same way.
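Putting the rings, segments and angle histograms together gives the 112-dimensional feature vector. The sketch below is ours and makes several simplifying assumptions: the CG is approximated from the border points alone, the rings have equal width (R1 = maximum distance, R4 = R1/4), and the direction of segment numbering is illustrative.

```python
import math
import numpy as np

def block_features(points, angles):
    """112-dim feature sketch: 4 rings x 4 segments x 7 angle bins.

    points: list of (row, col) border points; angles: the background-side
    angle (90..360 in 45-degree steps) at each point. The outer ring comes
    first in the flattened vector, matching the paper's block numbering."""
    ANGLE_BINS = {a: i for i, a in enumerate((90, 135, 180, 225, 270, 315, 360))}
    pts = np.asarray(points, dtype=float)
    cg = pts.mean(axis=0)                       # CG approximated from border points
    d = np.hypot(*(pts - cg).T)                 # distance of each point from CG
    r1 = d.max()                                # outer radius R1
    p1 = pts[d.argmax()]                        # furthest point P1
    ref = math.atan2(p1[0] - cg[0], p1[1] - cg[1])   # reference line CG -> P1
    feat = np.zeros((4, 4, 7))
    for (y, x), dist, ang in zip(pts, d, angles):
        inner = min(3, int(dist / (r1 / 4 + 1e-9)))  # 0 = innermost .. 3 = outermost
        theta = (math.atan2(y - cg[0], x - cg[1]) - ref) % (2 * math.pi)
        seg = min(3, int(theta / (math.pi / 2)))     # quadrant from reference line
        feat[3 - inner, seg, ANGLE_BINS[ang]] += 1   # outer ring first
    m = feat.max()
    return (feat / m if m else feat).flatten()       # size-normalised, in [0, 1]
```

The final division by the maximum bin count is the size normalisation described later in Section 3, bringing all feature values into [0, 1].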

(a) (b) Fig.5. Extracted features for (a) a normal 'k' and (b) a rotated 'k'. The angular frequencies of the corresponding blocks of (a) and (b) show similar behavior, although one is computed from a normal 'k' and the other from a rotated 'k'.

To get size-independent features, we normalize them. For normalization, we compute the maximum value of the angle frequencies over all 16 blocks and divide each feature by this maximum, so that the feature values lie between 0 and 1.

Sometimes a character may have two or more border points at the longest distance from its CG. In this case we compute the features considering each of these points, so we may get two or more feature vectors for some characters.

A digital effect may occur when a straight line is rotated by an angle other than 90 degrees or its multiples, and our feature values may differ because of it. For example, see Fig.6(a), where a straight portion of the character 'L' is shown by a dotted box. In this straight portion, only the 180-degree value is obtained by our feature computation method. When the character is rotated by 30 degrees clockwise (as shown in Fig.6(b)), we get feature values for 180 degrees as well as 180±45 degrees from the same portion, because of the digital effect of such a rotation. To get uniform values, we detect the digital straight line portions of the border [7] and assume only the 180-degree value in these portions during feature extraction.

4. Recognition
Recognition of characters is carried out using a modified quadratic discriminant function (MQDF) [6] as follows:

g(X) = (N + N_0 - n - 1)\,\ln\!\left[1 + \frac{1}{N_0\sigma^2}\left\{\|X - M\|^2 - \sum_{i=1}^{k}\frac{\lambda_i}{\lambda_i + \frac{N_0}{N}\sigma^2}\,\{\Phi_i^{T}(X - M)\}^2\right\}\right] + \sum_{i=1}^{k}\ln\!\left(\lambda_i + \frac{N_0}{N}\sigma^2\right) \qquad (1)

where X is the feature vector of an input character; M is the mean vector of the samples; Φi is the ith eigenvector of the sample covariance matrix; λi is the ith eigenvalue of the sample covariance matrix; n is the feature size; σ² is the initial estimate of a variance; N is the number of learning samples; and N0 is a confidence constant for σ, selected experimentally. We do not use all the eigenvalues and their respective eigenvectors for classification: we sort the eigenvalues in descending order and use the first 40 eigenvalues and their respective eigenvectors.
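Eq.(1) can be written down directly in numpy. The sketch below uses our own variable names and treats g(X) as a distance-like quantity, with the input assigned to the class of minimum g(X), as is usual for MQDF-based classifiers [6].

```python
import numpy as np

def mqdf_score(x, mean, eigvals, eigvecs, sigma2, n_train, n0, k=40):
    """MQDF g(X) of Eq.(1) for one class (sketch; variable names ours).

    eigvals/eigvecs: eigenvalues (sorted descending) and eigenvectors of
    the class sample covariance matrix; sigma2 = initial variance estimate
    sigma^2; n_train = N; n0 = N0. Smaller g(X) means a better match."""
    d = x - mean
    lam = eigvals[:k]
    proj2 = (eigvecs[:, :k].T @ d) ** 2            # {Phi_i^T (X - M)}^2
    shrink = lam + (n0 / n_train) * sigma2         # lambda_i + (N0/N) sigma^2
    n = x.size                                     # feature size (112 here)
    quad = (np.dot(d, d) - np.sum(lam / shrink * proj2)) / (n0 * sigma2)
    return (n_train + n0 - n - 1) * np.log1p(quad) + np.sum(np.log(shrink))

# A character is assigned to the class with minimum g(X) over all 37 classes.
```

With a whitened toy covariance (identity eigenvectors, unit eigenvalues), the score at the class mean reduces to the log-determinant-like term alone and grows as X moves away from M, which is the behavior the classifier exploits.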

(a) (b) Fig.6. Examples of a digital straight line. A digital straight portion is shown by a dotted box on the character 'L': (a) the straight portion is vertical; (b) the straight portion is slanted.

5. Results and discussion
We used a dataset of 18232 characters for the experiment, containing both uppercase and lowercase letters of the Times New Roman and Arial fonts. Characters of variable size and different orientations were used. We tested our results using a 5-fold cross-validation technique: the dataset was divided into 5 parts, the system was trained on 4 parts and tested on the remaining part. On average, 98.34% accuracy was obtained from the proposed scheme. We also noticed


that 99.23% recognition accuracy was obtained when the first two top choices of the output were considered. Recognition results for different numbers of choices are shown in Table 1. Most of the errors occurred due to similar shape structures. Examples of some misrecognized character pairs with their error rates are shown in Table 2. From the table it can be seen that the highest error (0.39%) comes from the character pair 'e' and 'c'. This is because some samples of 'e' take on a shape similar to 'c' after binarization; for example, the character 'e' shown in Fig.7 looks similar to 'c'. The second highest error (0.10%) comes from the character pair 'X' and 'K'.

Table 1: Recognition results for different numbers of choices.
Number of choices from top    Accuracy
Only 1 choice                 98.34%
Only first 2 choices          99.23%
Only first 3 choices          99.59%
Only first 4 choices          99.73%

Characters with font sizes between 10 and 28 point were considered in the experiments. We noticed that better results were obtained for larger characters: 99.32% accuracy was obtained when documents printed in 16-point or larger fonts were considered. We also noticed that uppercase characters gave better results than lowercase characters. The advantage of the proposed recognition method is that it does not depend on the size or rotation of a character, and it also works for poor-quality data.

Table 2: Examples of some erroneous results, in decreasing order of error rate.
Character    Misrecognized as    Error rate
e            c                   0.39%
X            K                   0.10%
z            s                   0.08%
l            x                   0.03%
t            r                   0.02%

Fig.7. An example of a binary image of the character 'e'.

Comparison of results: To get an idea of the comparative performance, we compared our results with some existing techniques. Xie and Kobayashi [5] applied their method only to the 10 numerals and obtained 97% accuracy. Adam et al. [1] obtained 97.5% accuracy on English characters. Previously, we used an NN classifier on the same feature set discussed in this paper and obtained 97.14% average accuracy; in the present experiment, the MQDF-based classifier gives a 1.2% better result than the NN classifier.

Drawback of the proposed approach: The main drawback of the proposed approach is that it cannot distinguish 'b' from 'q', 'p' from 'd', 'n' from 'u', etc. This is a consequence of using rotation-invariant features: as mentioned earlier, rotating the character 'b' by 180 degrees yields the character 'q', and the same is true for 'p' and 'd', 'n' and 'u', etc. We plan to use the contextual information of the other characters in a word to recognize such characters. Note that if character rotation is restricted to at most 180 degrees, our method can recognize such characters correctly.

6. Conclusion
We have presented an MQDF-based scheme for the recognition of multi-oriented and multi-sized isolated English characters. The features used are mainly based on the angular information of the border points of the characters. The main advantage of the proposed recognition method is that it does not depend on the size or rotation of a character. At present, 98.34% accuracy is obtained from the proposed system.

7. References
[1] S. Adam, J. M. Ogier, C. Carlon, R. Mullot, J. Labiche and J. Gardes, "Symbol and character recognition: application to engineering drawings", Int. Journal of Document Analysis and Recognition, vol. 3, 2000, pp. 89-101.
[2] H. Hase, T. Shinokawa, M. Yoneda and C. Y. Suen, "Recognition of rotated characters by eigen-space", Proc. 7th ICDAR, 2003, pp. 731-735.
[3] H. Hase, M. Yoneda, T. Shinokawa and C. Y. Suen, "Alignment of free layout colour texts for character recognition", Proc. 6th ICDAR, 2001, pp. 932-936.
[4] S. X. Liao and M. Pawlak, "On image analysis by moments", IEEE Trans. on PAMI, vol. 18, 1996, pp. 254-266.
[5] Q. Xie and A. Kobayashi, "A construction of pattern recognition system invariant of translation, scale-change and rotation transformation of pattern", Trans. of the Society of Instrument and Control Engineers, vol. 27, 1991, pp. 1167-1174.
[6] F. Kimura and M. Shridhar, "Handwritten numeral recognition based on multiple algorithms", Pattern Recognition, vol. 24, 1991, pp. 969-983.
[7] B. B. Chaudhuri and U. Pal, "A complete printed Bangla OCR system", Pattern Recognition, vol. 31, 1998, pp. 531-549.
[8] U. Pal, P. Dey, N. Tripathy and A. Dutta Choudhury, "Recognition of multi-oriented and multi-sized English characters", Proc. Int. Conference on Knowledge Based Computing, 2004, pp. 358-367.
