Pattern Recognition 34 (2001) 2515-2522
Slant estimation algorithm for OCR systems
E. Kavallieratou*, N. Fakotakis, G. Kokkinakis
Wire Communications Laboratory, Electrical and Computer Engineering, University of Patras, 26500 Patras, Greece
Received 13 December 1999; accepted 10 October 2000
Abstract
A slant removal algorithm is presented, based on the vertical projection profile of word images and the Wigner-Ville distribution. The slant correction does not affect the connectivity of the word, and the resulting words look natural. The algorithm was evaluated by both subjective and objective means. It has been tested on English and Modern Greek samples from more than 500 writers, taken from the IAM-DB and GRUHD databases. The results are natural and almost always improved with respect to the original image, even in the case of variant-slanted writing. The performance of an existing character recognition system increased by up to 9% for the same data, while the training time cost was significantly reduced. Due to its simplicity, the algorithm can easily be incorporated into any optical character recognition system. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
Keywords: Slant estimation; Script writing; Wigner-Ville distribution; Character recognition
1. Introduction
Handwritten text is usually characterized by slanted characters, which slope either from right to left or from left to right. Moreover, different slants may appear not only within a text but also within a single word. Some examples illustrating these cases are shown in Fig. 1. Slanted characters are a common feature of any natural language with a Latin-style alphabet (e.g., English, Modern Greek, etc.). For example, the percentage of slanted writing in the IAM-DB [1] database of English reaches 77%, while the corresponding percentage in GRUHD, a database of Modern Greek comprising 1000 writers [2], approaches 59%. These rates were obtained by manually counting the forms with apparent slant to the left or to the right, as judged by a human. Furthermore, slanted characters appear in both hand-printed and cursive writing.
* Corresponding author. Tel.: +30-61-991722; fax: +30-61991855. E-mail address:
[email protected] (E. Kavallieratou).
Consequently, a robust optical character recognition (OCR) system has to be able to cope with slanted characters. Watanabe [3] conducted comparative experiments showing that slant normalization minimizes the recognition error. More specifically, the two most important problems that slanted characters cause for an OCR system are the following:
• If a character segmentation procedure that produces vertical segment boundaries (e.g., based on histograms) is applied to slanted words, it may produce defectively segmented characters as well as noisy segments.
• Both the computational cost of the training procedure and the accuracy of the recognition stage are negatively affected. For a single character, the amount of training data required to cover as many slant angles as possible is substantial. Moreover, classifying a given character into the correct class is much harder, since the set of possible classes is now larger.
Therefore, the majority of recent OCR systems contain a preprocessing stage dealing with slant correction [4-7].
0031-3203/01/$20.00 © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(00)00153-9
E. Kavallieratou et al. / Pattern Recognition 34 (2001) 2515-2522
Fig. 1. Examples of slanted words: (a) a word slanted to the right, (b) a word slanted to the left, (c) a variant-slanted word.
This stage is usually located before the segmentation module, if one exists, or just before the recognition stage otherwise. The most commonly used method for slant estimation is the calculation of the average angle of near-vertical strokes [4-6]. This approach requires the detection of the edges of the characters, and its accuracy depends on the particular characters included in the word. Shridar [7] presents two more methods for slant estimation and correction. The first uses the vertical projection profile, while the second uses the chain code of the entire border pixels.

Vinciarelli [8] presents a technique based on a function $S(\alpha)$ that provides a measure of the absence of slant across the word. The calculation of this function relies on the vertical density histogram, which is easier to obtain than the direction of the strokes. For each angle $\alpha$ in an interval $[-15^\circ, 15^\circ]$, a shear transform $t_\alpha$ is applied and the following histogram is calculated:

$$H_\alpha(m) = \frac{h_\alpha(m)}{\Delta y(m)}, \qquad 0 \le m < nCol,$$

where $h_\alpha(m)$ is the value of the vertical density histogram of the image shear transformed by the angle $\alpha$, $\Delta y(m)$ is the difference between the maximum and minimum y coordinates of the foreground pixels in column $m$, and $nCol$ is the number of columns in the image. The number of foreground pixels of each column is thus divided by the distance between the highest and the lowest pixel, giving $H_\alpha(m) = 1$ if the column contains a continuous stroke and $H_\alpha(m) \in [0, 1]$ otherwise. Then, the following quantity is computed:

$$S(\alpha) = \sum_{\{i \,:\, H_\alpha(i) = 1\}} h_\alpha(i)^2.$$

The value of $\alpha$ for which $S(\alpha)$ is maximum is taken as the slant estimate, and the corresponding shear transform, applied to the desloped original image, gives the deslanted image. Although the shearing transformation procedure is time consuming, this method was reported to decrease the error rate by 12% compared to the method proposed by Bozinovic [9].

However, the evaluation of slant correction approaches is difficult, since the slant may vary even within a single word. Additionally, in the relevant literature a slant correction procedure is rarely evaluated separately, so comparative results cannot be given.

In this paper a new method for slant estimation is presented, based on a combination of the projection profile technique and the Wigner-Ville distribution. Moreover, it uses a simple and fast shearing transformation technique. Our approach is character independent and can easily be adapted to the requirements of any OCR system. Its performance is measured in terms of the improvement it brings to the results of a character recognition system. In the next section the Wigner-Ville distribution (WVD) is briefly presented. The algorithm is described in Section 3. Finally, experimental results are given in Section 4, and the conclusions drawn from this study are presented in Section 5.
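For reference, the histogram-based measure of Vinciarelli [8] reviewed in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of [8]: the function names (`shear_rows`, `slant_measure`, `estimate_slant`) are hypothetical, the image is assumed to be a binary NumPy array with foreground pixels equal to 1, the shear is done by whole-pixel row shifts, and the stroke extent is counted inclusively so that a continuous stroke gives a ratio of exactly 1.

```python
import numpy as np

def shear_rows(img, alpha_deg):
    """Shear a binary image horizontally; the bottom row stays in place."""
    h, w = img.shape
    t = np.tan(np.radians(alpha_deg))
    pad = int(np.ceil(abs(t) * (h - 1)))          # room for the largest shift
    out = np.zeros((h, w + 2 * pad), dtype=img.dtype)
    for r in range(h):
        shift = int(round(float(t) * (h - 1 - r)))  # higher rows move further
        out[r, pad + shift:pad + shift + w] = img[r]
    return out

def slant_measure(img, alpha_deg):
    """S(alpha): sum of squared column counts over columns holding one continuous stroke."""
    sheared = shear_rows(img, alpha_deg)
    total = 0
    for column in sheared.T:
        ys = np.flatnonzero(column)
        if ys.size == 0:
            continue                               # empty column, nothing to add
        extent = ys[-1] - ys[0] + 1                # inclusive extent (assumption)
        if ys.size == extent:                      # ratio equals 1: continuous stroke
            total += int(ys.size) ** 2
    return total

def estimate_slant(img, angles_deg):
    """Return the candidate shear angle (degrees) that maximizes S(alpha)."""
    return max(angles_deg, key=lambda a: slant_measure(img, a))
```

With this sign convention the returned value is the shear that removes the slant, so a word slanted by +20° yields an estimate of -20°; calling `shear_rows(img, estimate_slant(img, range(-15, 16)))` would then give the deslanted image.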
2. Wigner-Ville distribution
An important topic in signal processing is non-stationary signals, that is, signals whose characteristics vary with time, in contrast to stationary signals, whose characteristics are time-independent. To achieve a better representation of non-stationary signals, joint time-frequency distributions are used. A first class of time-frequency representations is the atomic decompositions, or linear time-frequency representations. These decompose the signal on a basis of elementary signals (the atoms), which have to be well localized in both time and frequency. However, there is a trade-off between time and frequency resolution, since the decomposition is achieved by windowing the signal: good time resolution requires a short window, whereas good frequency resolution requires a long window. This is a consequence of the time-frequency resolution relation given by the Heisenberg-Gabor inequality [10]. In contrast, the energy distributions, another class of time-frequency representations, distribute the energy of the signal over two description variables: time and frequency. The starting point is that, since the energy of a signal x can be deduced from the squared modulus of either the signal or its Fourier transform,
$$E_x = \int_{-\infty}^{+\infty} |x(t)|^2 \, dt = \int_{-\infty}^{+\infty} |X(f)|^2 \, df, \tag{1}$$
we can interpret $|x(t)|^2$ and $|X(f)|^2$ as energy densities in time and in frequency, respectively. It is then natural to look for a joint time and frequency energy density $\rho_x(t, f)$ such that

$$E_x = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \rho_x(t, f) \, dt \, df, \tag{2}$$

which is an intermediary situation between those described by Eq. (1). As the energy is a quadratic function of the signal, the time-frequency energy distributions will be, in general, quadratic representations. A very well-known representative of the energy distributions, and a member of Cohen's class [11], is the Wigner-Ville distribution:
$$W(t, f) = \int_{-\infty}^{+\infty} z(t + \tau/2) \, z^{*}(t - \tau/2) \, e^{-j 2 \pi f \tau} \, d\tau, \tag{3}$$

where $z(t)$ is the analytic signal associated with the signal $s(t)$. The WVD can also be expressed as a function of the spectrum $Z(f)$ of the signal under analysis, as follows:

$$W(t, f) = \int_{-\infty}^{+\infty} Z(f + u/2) \, Z^{*}(f - u/2) \, e^{j 2 \pi u t} \, du. \tag{4}$$

The WVD is a particularly popular distribution due to the large number of desirable mathematical properties it satisfies. Claasen and Mecklenbrauker [10] proved the uniqueness of the Wigner-Ville distribution "in the sense that it is that single energy distribution that possesses all the stated desirable properties". This justifies the numerous applications of the WVD in pattern recognition [12,13], signal synthesis [14], seismic signal processing [15], optics [16], etc.

3. The algorithm
The vertical projection profile of a non-slanted word presents the deepest dips between the characters, even if they are connected, and the highest peaks at the main body of the characters. The latter is much more evident when ascenders and descenders are included. In the case of slanted words, the otherwise vertical strokes of the characters now cover the inter-character gaps. Hence, the dips of the histogram are less deep, while the peaks are smoother. In other words, the alternations between dips and peaks of the vertical projection profile of a word are more intense when the word is non-slanted than when it is slanted. In Fig. 2 the vertical projection profiles for different slants of the same word are shown. These slanted words were produced automatically by applying the technique described below. The proposed algorithm for slant estimation and correction of a given word consists of six steps:

1. The word image is artificially slanted to both left and right by different slant angles. The maximum slant angle is approximately 45°, and the slant angle step depends on the height of the word image, as described below.
2. For each of the resulting word images, the vertical histogram is calculated.
3. The WVD is calculated for all the above histograms.
4. The curves of maximum intensity of the WVDs are extracted.
5. The curve of maximum intensity with the greatest peak, corresponding to the histogram with the most intense alternations, is selected. In Fig. 3 several curves of maximum intensity are shown.
6. The corresponding word image is selected as the least slanted word.
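Steps 2 to 5 listed above can be illustrated with a small NumPy sketch. It is a hedged reading of the text rather than the authors' implementation: the discretization of the WVD, the FFT-based analytic signal, the absence of any windowing or normalization, and the interpretation of the "curve of maximum intensity" as the maximum of the WVD over frequency at each horizontal position are all assumptions, and the function names (`vertical_profile`, `analytic_signal`, `wigner_ville`, `alternation_score`) are hypothetical.

```python
import numpy as np

def vertical_profile(img):
    """Number of foreground pixels in every column of a binary word image."""
    return (np.asarray(img) > 0).sum(axis=0)

def analytic_signal(x):
    """Analytic signal built in the frequency domain (negative frequencies zeroed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def wigner_ville(x):
    """Discrete WVD of a 1-D signal: rows are frequency bins, columns are positions."""
    z = analytic_signal(x)
    n = z.size
    wvd = np.zeros((n, n))
    for t in range(n):
        max_lag = min(t, n - 1 - t)                # lags that stay inside the signal
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        wvd[:, t] = np.fft.fft(kernel).real        # Hermitian kernel, so the FFT is real
    return wvd

def alternation_score(profile):
    """Peak of the curve of maximum intensity (assumed: max over frequency, then over position)."""
    curve = wigner_ville(profile).max(axis=0)
    return float(curve.max())
```

Under these assumptions, step 5 amounts to computing `alternation_score(vertical_profile(candidate))` for every artificially slanted candidate image and keeping the candidate with the highest score.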
In order to slant a word image to the right or to the left, we follow the procedure below. The word image is segmented into equally wide horizontal zones. The lowest zone is considered to be the base. The zone above the base is shifted one pixel to the right or to the left. The next zone (if it exists) is shifted two pixels in the same direction, and so on. Thus, each pixel $p(x, y)$ of the image is shifted to the point $(x', y')$:

$$x' = x + i, \qquad -Z < i < Z, \qquad 0 < Z \le h \tag{5}$$

and

$$y' = y, \tag{6}$$

where successive zones are identified by successive ordinals, $Z$ is the number of zones and $h$ is the height of the image in pixels. The corresponding slant angle $\theta$ is given by

$$\tan\theta = \frac{Z - 1}{h}. \tag{7}$$

The more zones there are, the greater the slant angle. The maximum slant angle corresponds to one-pixel-wide zones (i.e., when the number of zones equals the height $h$ of the word image in pixels). In this case, the highest zone is shifted by $h - 1$ pixels and the corresponding maximum slant angle can be calculated as

$$\tan\theta = \frac{h - 1}{h} \approx 1 \;\Rightarrow\; \theta \approx 45^\circ. \tag{8}$$
To illustrate the above procedure, the gradual slanting of a vertical stroke is shown (enlarged) in Fig. 4, while Fig. 5 shows the maximum slant of a non-slanted word to the left and to the right. Notice that the words produced by this technique look very natural (as can be seen in Fig. 2). A maximum slant angle of ±45° covers the vast majority of handwriting. Shifting each zone by more than one pixel would increase the maximum slant angle. However, it could cause undesirable disconnections between adjacent zones, producing an unnatural outcome.
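The zone-shifting procedure and the angle of Eq. (7) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the names `slant_by_zones` and `slant_angle_deg` are hypothetical, the image is assumed to be a NumPy array whose row 0 is the top of the word, and when the height is not divisible by the number of zones the zones are only approximately equal in width.

```python
import math
import numpy as np

def slant_by_zones(img, zones, direction=1):
    """Slant an image by shifting horizontal zones one extra pixel per zone.

    zones     -- number of horizontal zones Z, with 1 <= Z <= image height
    direction -- +1 shifts upper zones to the right, -1 to the left
    """
    h, w = img.shape
    if not 1 <= zones <= h:
        raise ValueError("zones must be between 1 and the image height")
    max_shift = zones - 1                          # shift of the highest zone
    out = np.zeros((h, w + max_shift), dtype=img.dtype)
    for row in range(h):
        height_above_base = h - 1 - row            # 0 for the bottom row
        zone = height_above_base * zones // h      # base zone is 0 and is not shifted
        shift = zone if direction > 0 else max_shift - zone
        out[row, shift:shift + w] = img[row]
    return out

def slant_angle_deg(zones, h):
    """Slant angle of Eq. (7), tan(theta) = (Z - 1) / h, in degrees."""
    return math.degrees(math.atan((zones - 1) / h))
```

For a word image of height 101 pixels, `slant_angle_deg(101, 101)` is about 44.7°, which is the "approximately 45°" maximum of Eq. (8), and `slant_angle_deg(1, 101)` is 0°. Because each zone moves by exactly one pixel more than the zone below it, adjacent zones always stay connected.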
Fig. 2. Vertical projection profiles of a word with various slants. For each slant, the slant angle and the number of horizontal zones are as follows: (a) -45°, 101; (b) -30°, 31; (c) -15°, 27; (d) 0°, 1; (e) 15°, 27; (f) 30°, 31; (g) 45°, 101.
Fig. 3. Curves of maximum intensity that correspond to the histograms of Fig. 2.
Fig. 4. The gradual slanting of a vertical stroke in enlargement.
Fig. 5. The maximum slant of a word to left and right.
4. Experimental results
The presented slant removal algorithm has been tested on a collection of word images taken from the IAM-DB [1] and the GRUHD [2] databases, comprising English and Modern Greek unconstrained handwriting, respectively. In more detail, more than 1500 word images were used, taken from approximately 500 different, randomly selected writers. The word segmentation was performed automatically by an OCR preprocessing system [18]. As already mentioned, the evaluation of a slant removal method is difficult, since the selection of the most appropriate result very often depends on subjective judgement (especially when dealing with variant-slanted words). Moreover, a slant removal algorithm can
Fig. 7. The system into which the proposed algorithm has been incorporated.
Fig. 6. Some experimental results (left column: original word images; right column: corrected word images).
indirectly be evaluated by taking into account the improvement it provides to an existing OCR system. Nevertheless, the application of our method to the above-mentioned data produced very satisfactory results. Some examples are shown in Fig. 6. In general, it is clear that even in the hardest cases the produced word is considerably improved as regards its processing in further stages (e.g., character segmentation, character recognition, etc.). It is worth noting that words that are already non-slanted are not negatively affected by this algorithm, whether the characters are connected or not. Ascenders and descenders, if they exist, play an important role in slant estimation, since they make the alternations of peaks and dips in the histogram more evident. However, the proposed method can also handle words without any ascender or descender. Regarding variant-slanted words, the slant of the majority of the vertical strokes in the word is the most likely to determine the final slant angle, since it generates more peaks in the histogram.

The presented algorithm has been incorporated into the character recognition system shown in Fig. 7, which aims at the automatic processing of document images. Apart from slant removal, the system includes six other main modules, namely skew angle estimation and correction, printed-handwritten text discrimination, line segmentation, word segmentation, character segmentation, and recognition, which implement existing as well as novel algorithms. The proposed technique could be applied to words or directly to the text line images resulting from line segmentation. However, the uneven valleys of the vertical histogram, i.e. wide valleys between words and narrow valleys between characters, could give confusing results in some cases. For this reason we preferred to use parts of the text lines instead.

Assuming that every page corresponds to only one writer, a single slant angle is estimated per page. The longer and more solid the text parts, the more accurate the estimation procedure will be, since they carry more information and a richer histogram. In order to select these parts, the valleys of the vertical histogram of a text line that are wider than a threshold are considered to be the boundaries between parts. The 10 longest of the resulting parts were selected. As a threshold, 1/10 of the line height was used; however, any threshold small enough to detect the valleys between words, or smaller, is appropriate. The resulting parts are usually entire words or parts of words. The slant estimation technique is applied to them, and the average of the estimated slants is considered to be the slant of the page. Fig. 8 shows an IAM-DB document image as inserted into the system and after the slant correction.

Fig. 9 presents the dependence of the recognition accuracy on the number of training samples per character, without and with the application of the slant removal algorithm, respectively. The results were obtained from tests applied to 200 forms (100 IAM-DB forms and 100 GRUHD forms). In the case of IAM-DB the system was trained with samples taken from the NIST [17] database, while for GRUHD, training sets from the same database, different from the testing sets, were used. Compared with the initial recognition system (i.e., when no slant correction was performed), the training data required to achieve similar performance were reduced by more than one-third.
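The part-selection rule described above (valleys wider than one-tenth of the line height act as boundaries, and the ten longest parts are kept) can be sketched as follows. This is an illustrative sketch only: the function name `select_parts` and its parameters are hypothetical, a valley is assumed to be a run of completely empty columns, and the text line is assumed to be a binary NumPy array.

```python
import numpy as np

def select_parts(line_img, max_parts=10, gap_ratio=0.1):
    """Split a text-line image at wide empty valleys and keep the longest parts.

    Returns a list of (start, end) column ranges, end exclusive, longest first.
    """
    h, w = line_img.shape
    profile = (line_img > 0).sum(axis=0)           # vertical histogram of the line
    min_gap = gap_ratio * h                        # valleys wider than this split parts
    parts = []
    start = None                                   # first column of the current part
    gap = 0                                        # current run of empty columns
    for col in range(w):
        if profile[col] > 0:
            if start is None:
                start = col
            gap = 0
        elif start is not None:
            gap += 1
            if gap > min_gap:                      # valley is wide enough: close the part
                parts.append((start, col - gap + 1))
                start = None
    if start is not None:                          # part still open at the right edge
        parts.append((start, w - gap))
    parts.sort(key=lambda p: p[1] - p[0], reverse=True)
    return parts[:max_parts]
```

The page slant would then be the average of the slant angles estimated on the sub-images `line_img[:, start:end]` for the selected parts, gathered over the text lines of the page.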
The computational cost of using the proposed technique depends strongly on the size of the document.
Fig. 8. An example of an IAM-DB document image (a) as inserted into the system and (b) after slant removal.
Fig. 9. Recognition accuracy versus the number of training samples per character, (a) before and (b) after the incorporation of the slant removal algorithm.
As an indication, the whole slant removal procedure for the document of Fig. 8 requires 29.516 s on a Pentium III at 300 MHz, of which the slant estimation part alone takes 22.72 s.
5. Conclusions
In this paper, we presented an algorithm for slant removal. In contrast to current techniques, our method is character independent, since it is based on the intense alternations of the vertical histogram, which indicate vertically oriented characters, rather than on detecting the near-vertical strokes that may be included in the word. The WVD was used to estimate the slant angle, which can range between -45° and +45° relative to the original position. The algorithm was evaluated by both subjective and objective means. First, the algorithm was applied to isolated word image samples from both English and Modern Greek databases. The results are natural and almost always improved with respect to the original image, even in the case of variant-slanted writing. Then the algorithm was applied to entire document images by incorporating it into an OCR system. The performance of the character recognition system increased by up to 9% for the same data, while the training time cost was significantly reduced. Almost any OCR system can benefit from our algorithm, since it has a low computational cost and is easily adapted. Cases where characters within a single word have to be corrected by different slant angles cannot be handled by our approach, since it is based on the dominant angle. We are currently working on improving the accuracy of the results.
References
[1] U. Marti, H. Bunke, A full English sentence database for off-line handwriting recognition, Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR'99, Bangalore, 1999, pp. 705-708.
[2] E. Kavallieratou, N. Liolios, E. Koutsogiorgos, N. Fakotakis, G. Kokkinakis, The GRUHD database of Modern Greek Unconstrained Handwriting, LREC2000, Athens, vol. 3, 1999, pp. 1755-1759.
[3] M. Watanabe, Y. Hamammoto, T. Yasuda, S. Tomita, Normalization techniques of handwritten numerals for Gabor filters, Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, IEEE, Los Alamitos, CA, vol. 1, 1997, pp. 303-307.
[4] G. Kim, V. Govindaraju, Efficient chain-code-based image manipulation for handwritten word recognition, Proceedings of SPIE - The International Society for Optical Engineering, Bellingham, WA, USA, vol. 2660, 1996, pp. 262-272.
[5] S. Knerr, E. Augustin, O. Baret, D. Price, Hidden Markov model based word recognition and its application to legal amount reading on French checks, Comput. Vision Image Understanding 70 (3) (1998) 404-419.
[6] A.W. Senior, A.J. Robinson, An off-line cursive handwriting recognition system, IEEE Trans. Pattern Anal. Mach. Intell. 20 (3) (1998) 309-321.
[7] M. Shridar, F. Kimura, Handwritten address interpretation using word recognition with and without lexicon, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Piscataway, NJ, USA, vol. 3, 1995, pp. 2341-2346.
[8] A. Vinciarelli, J. Luettin, Off-line cursive script recognition based on continuous density HMM, Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, Amsterdam, September 2000.
[9] R.M. Bozinovic, S.N. Srihari, Off-line cursive script word recognition, IEEE Trans. PAMI 11 (1) (1989) 68-83.
[10] T.A. Claasen, W.F. Mecklenbrauker, The Wigner distribution: a tool for time-frequency signal analysis, Philips J. Res. 35 (Parts 1-3) (1980) 217-250, 276-300, and 372-389.
[11] L. Cohen, Generalized phase-space distribution functions, J. Math. Phys. 7 (1966) 781-786.
[12] B. Boashash, B. Lovell, L. White, Time frequency analysis and pattern recognition using singular value decomposition of the Wigner-Ville distribution, Advanced Algorithms and Architecture for Signal Processing, Proc. SPIE 828 (1987) 104-114.
[13] G. Cristobal, J. Bescos, J. Santamaria, Application of Wigner distribution for image representation and analysis, Proceedings of the IEEE Eighth International Conference on Pattern Recognition, 1986, pp. 998-1000.
[14] K.B. Yu, S. Cheng, Signal synthesis from Wigner distribution, Proceedings of the IEEE ICASSP 85, 1985, pp. 1037-1040.
[15] P. Boles, B. Boashash, The cross Wigner-Ville distribution: a two dimensional analysis method for the processing of vibroseis seismic signals, Proceedings of the IEEE ICASSP 87, 1988, pp. 904-907.
[16] O. Kenny, B. Boashash, An optical signal processing for time-frequency signal analysis using the Wigner-Ville distribution, Journal of Electrical Electronic Engineering, Australia, 1988, pp. 152-158.
[17] R. Wilkinson, J. Geist, S. Janet, P. Grother, C. Burges, R. Creecy, B. Hammond, J. Hull, N. Larsen, T. Vogl, C. Wilson, The First Census Optical Character Recognition Systems Conference, NISTIR 4912, The U.S. Bureau of Census and the National Institute of Standards and Technology, Gaithersburg, MD, 1992.
[18] E. Kavallieratou, N. Fakotakis, G. Kokkinakis, An Integrated System for Handwritten Document Image Processing, under review.
About the Author: ERGINA KAVALLIERATOU was born in Kefalonia, Greece, in 1973. She received the Diploma in Electrical and Computer Engineering in 1996 from the Polytechnic School of the University of Patras. She received her Ph.D. in Handwritten Optical Character Recognition and Document Image Processing from the same department. During the academic year 1997-1998 she was a member of the Signals, Systems and Radiocommunications Laboratory of the Dept. of Telecommunications Engineering of the Polytechnic School of Madrid, working on Image Processing. Her research interests include Optical Character Recognition, Document Image Analysis and Image Processing.

About the Author: GEORGE KOKKINAKIS was born in Chios, Greece. He received the Diploma in Electrical Engineering (Dipl.-Ing.) in 1961, the Doctor's Degree in Engineering (Dr.-Ing.) in 1966 and the Diploma in Engineering Economics (Dipl.-Wirt.-Ing.) in 1967, all from the Technical University of Munich (Technische Hochschule München). Since 1969 he has been with the Department of Electrical Engineering (Electrical and Computer Engineering since 1995) at the University of Patras, where he organized and directs the Wire Communications Laboratory (WCL). His current research and development activity includes, apart from telecommunications subjects, the analysis, synthesis, recognition and linguistic processing of the Greek language. He has published several books on Telecommunications and Electrotechnology and over 250 technical papers, articles and reports on Speech and Language Technology and on Telecommunications. Dr. Kokkinakis is a senior member of IEEE and a member of the Technical Chamber of Greece (TEE), the VDE (Verein Deutscher Elektrotechniker), ISCA (International Speech Communication Association), EURASIP (European Association for Signal Processing), the Linguistics Society of America (LSA) and GORS (Greek Operations Research Society). Since 1997 he has been a member of the board of ISCA.

About the Author: DR. NIKOS FAKOTAKIS received the B.Sc. degree in Electronics from the University of London (UK) in 1978, the M.Sc. degree in Electronics from the University of Wales (UK), and the Ph.D. degree in Speech Processing from the University of Patras (Greece) in 1986. From 1986 to 1992 he was a lecturer in the Electrical and Computer Engineering Dept. of the University of Patras, from 1992 to 1999 Assistant Professor, and since 2000 he has been Associate Professor in the area of Speech and Natural Language Processing and Head of the Speech and Language Processing Group at the Wire Communications Laboratory. Dr. Fakotakis is the author of over 100 publications in the area of Speech and Natural Language Engineering. His current research interests include Speech Recognition/Understanding, Speaker Recognition, Spoken Dialogue Processing, Natural Language Processing and Optical Character Recognition. Dr. Fakotakis is a member of the Executive Board of ELSNET (European Language and Speech Network of Excellence) and Editor-in-Chief of the "European Student Journal on Language and Speech", WEB-SLS. He is also a member of IEEE, TEE, EURASIP and ISCA.