A Formalization of On-line Handwritten Japanese Text Recognition free from Line Direction Constraint

Masaki Nakagawa, Bilan Zhu and Motoki Onuma
Graduate School of Technology, Tokyo University of Agriculture and Technology, 2-24-16 Naka-cho, Koganei-shi, Tokyo, 184-8588 Japan
[email protected]

Abstract
This paper presents a formalization of on-line, writing-box free, line-direction free handwritten Japanese text recognition and its effect. By normalizing character orientation, even text of arbitrary character orientation can be recognized. The method evaluates a likelihood composed of character segmentation, character recognition, character pattern structure and context. The likelihood of character pattern structure considers the plausible height, width and gaps within a character pattern that appear in Chinese characters composed of multiple radicals (subpatterns). We show how the newly modeled factors in the likelihood affect the overall recognition rate.
1. Introduction
Demand to remove writing constraints from on-line handwriting recognition is growing, since people can write more freely on the enlarged surfaces of tablet PCs and electronic whiteboards and in new paper-based handwriting environments such as the Anoto pen, e-pen and so on. On such surfaces, Asian people whose written languages are of Chinese origin often write text horizontally, vertically or even slantingly in a mixed way.

The research started from horizontal text recognition without character writing boxes, which are common on PDAs to avoid the segmentation problem. Murase et al. made an initial attempt by applying DP-matching to find the best interpretation of a character pattern sequence [1]. However, the likelihood of segmentation was not considered and the likelihood of context (as Japanese text) was used only for verification. Fukushima et al. considered the likelihood based on character segmentation, shape (recognition), context, and character size [2]. Incorporating character size into the likelihood makes it possible to evaluate Japanese text better than the earlier techniques [3, 4], because such text includes character patterns of various sizes ranging from simple numerals to complex Chinese characters consisting of multiple radicals, thereby raising the recognition rate. Senda et al. published a similar approach to the above method and formulated the problem as a search for the most probable interpretation of character segmentation, recognition and context, but they did not deal with character size variations [5].

Most of the previous publications and systems have assumed only horizontal lines of text, while we have been trying to remove every writing constraint from on-line text input. We proposed a method to recognize mixtures of horizontal, vertical and slanted lines of text assuming normal character orientation [6], and a revised method for arbitrary character orientation [7].

This paper presents a formalization of on-line, writing-box free, line-direction free handwritten Japanese text recognition and the effect of the newly introduced factors in the likelihood evaluation. According to the formalization, we improved the overall recognition rate. Section 2 defines line direction and character orientation. Section 3 describes character orientation normalization, which frees the method even from the character orientation constraint. Section 4 presents line direction quantization. Section 5 presents the probabilistic model and Section 6 presents the effect. Section 7 concludes this paper.
2. Line direction and character orientation
Here, we define some terms. A stroke is a series of pen-tip coordinates sampled from pen-down to pen-up. Character orientation specifies the direction of a character from its top to its bottom, while line direction designates the writing direction of a sequence of characters until it changes (Fig. 1). Although line direction matches its everyday meaning, character orientation might be the opposite of what one expects. We define them in this way because they are then consistent with the pen-tip movement direction when writing Japanese characters. A text line is a piece of text delimited by new-lines and large spaces, and it is further divided into text line elements at the points where the writing direction changes. Each text line element has its own line direction (Fig. 2). The line direction and the character orientation are independent.
0-7695-2128-2/04 $20.00 (C) 2004 IEEE
Fig. 1. Line direction and character orientation.
Fig. 2. Text line element and line direction.
3. Character Orientation Normalization
By rotating a text line element so that its character orientation is normal, we only have to consider arbitrariness in text line direction. The method finds two perpendicular peaks in the distribution of pen trajectory directions of a text line element, as shown in Fig. 3, and rotates the element so that the two peaks point rightward and downward. This works because Japanese character patterns are mainly made by rightward and downward pen movements. When a text line element contains only a few characters, however, the character orientation may not be determined uniquely. In that case, several candidate orientations are assumed and the candidate producing the highest total recognition score is selected. Once character orientation is normalized, we can build a recognition model that assumes characters are normally placed [7].
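The peak search described above can be illustrated with a minimal sketch. It is not the implementation of [7]: the bin count, the length weighting, the screen coordinate convention (y grows downward, so "rightward" is 0 degrees and "downward" is 90 degrees) and all function names are assumptions made for illustration. The sketch also returns only the single best orientation, whereas the text keeps several candidates when the estimate is uncertain.

```python
import math

def direction_histogram(strokes, n_bins=36):
    """Length-weighted histogram of pen movement directions within strokes."""
    hist = [0.0] * n_bins
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            dx, dy = x1 - x0, y1 - y0
            length = math.hypot(dx, dy)
            if length == 0.0:
                continue  # repeated sample point, no direction
            angle = math.atan2(dy, dx) % (2 * math.pi)
            # Assign to the nearest bin centre (bin b is centred at b * 360 / n_bins degrees).
            b = int(angle / (2 * math.pi) * n_bins + 0.5) % n_bins
            hist[b] += length
    return hist

def estimate_orientation(strokes, n_bins=36):
    """Angle (radians) of the 'rightward' peak: the bin b that maximizes
    hist[b] + hist[b + 90 degrees], i.e. two perpendicular peaks."""
    if n_bins % 4 != 0:
        raise ValueError("n_bins must be a multiple of 4")
    hist = direction_histogram(strokes, n_bins)
    quarter = n_bins // 4
    best = max(range(n_bins), key=lambda b: hist[b] + hist[(b + quarter) % n_bins])
    return best * 2 * math.pi / n_bins

def rotate_strokes(strokes, angle):
    """Rotate all points by -angle so that the estimated peak points rightward."""
    c, s = math.cos(-angle), math.sin(-angle)
    return [[(x * c - y * s, x * s + y * c) for x, y in stroke] for stroke in strokes]

# Toy input (made up): one rightward and one downward stroke, tilted by 30 degrees.
upright = [[(0.0, 0.0), (10.0, 0.0)], [(5.0, -5.0), (5.0, 5.0)]]
tilted = rotate_strokes(upright, -math.pi / 6)
theta = estimate_orientation(tilted)
normalized = rotate_strokes(tilted, theta)
print(math.degrees(theta))
```

When only one direction carries weight in the histogram, several bins tie and the estimate is ambiguous, which corresponds to the case where candidate orientations have to be compared by total recognition score.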
Fig. 3. Two peaks in pen movement direction (labels: i. displacement to down right; ii. displacement to right).

4. Line Direction Quantization
After character orientation normalization, the line direction of a text line element is quantized into 4 directions. The quantization can be finer, but 4-directional quantization is adequate and effective to prevent the text line element from being segmented excessively. For example, if one is writing text horizontally from left to right, then a large pen movement between strokes (off-stroke) to the right is a good cue for segmentation, but a leftward stroke (on-stroke) or off-stroke is useful to merge the preceding strokes crossed by the on/off-stroke. In general, when we know the line direction, off-strokes in the line direction are employed for segmentation, while on/off-strokes in the opposite direction are used to merge the preceding strokes. As a result, the number of segmentation hypotheses decreases, which speeds up text recognition and increases the recognition rate.

Consequently, a text line element is hypothetically segmented as shown in Fig. 4 according to its quantized direction. For rightward or leftward (horizontal) direction, a text line element is segmented vertically while producing bounding boxes bi of character candidates and gaps gi between them along the text line. For downward or upward (vertical) direction, it is segmented horizontally. Since a character pattern of Chinese origin is often made up of several radicals, segmentation may also divide a character pattern into components.

Fig. 4. Segmentation of various text line elements (panels: rightward; slightly down and rightward; downward; slightly left and downward).

5. Model of Recognition
The probability that a given pattern X is segmented as a segmentation S, that is, a sequence of character pattern structures and gaps S = s1 g1 s2 g2 … si gi … sm gm, and recognized as a character sequence C = C1 C2 … Ci … Cm is defined as the conditional probability P(C, S | X). Here si is the i-th character pattern structure, which is bounded by a box bi of height hi and width wi and includes quasi gaps qik (k = 0, 1, 2, …), while gi is the i-th true gap. The probability is transformed as follows:

$$P(C, S \mid X) = \frac{P(C) \cdot P(X, S \mid C)}{P(X)} \quad \text{… eq. (1)}$$
The goal is to find the segmentation S and character sequence C that maximize P(C, S | X) among the candidate segmentations, as shown in Fig. 5, and among the candidate character sequences. Since P(X) is the probability that a pattern X occurs regardless of S and C, it does not affect the maximization and we ignore it. Hereafter, we consider P(C) and P(X, S | C).

Fig. 5. Different segmentations for the same pattern.

5.1 Probability P(C)
In eq. (1), P(C) is the probability that a character sequence C occurs. Assuming a 1st-order Markov chain, P(C) is transformed, with Ci denoting the i-th character in C, as follows:

$$P(C) = \prod_{i=1}^{m} P(C_i \mid C_{i-1}) \quad \text{… eq. (2)}$$

m: the number of characters in C.
P(Ci | Ci-1): the probability that a character Ci-1 is succeeded by Ci (bi-gram probability).
C0: the state before the first character occurs.
P(C1 | C0): the probability that a character C1 occurs at the top of text.
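As an illustration of eq. (2), the following minimal sketch accumulates the bi-gram probabilities in the log domain. The `START` marker standing for C0, the dictionary layout of the bi-gram table, the `floor` value for unseen pairs and the toy probabilities are all assumptions for this example and do not come from the paper.

```python
import math

START = "<s>"  # hypothetical marker for C0, the state before the first character

def bigram_log_prob(chars, bigram, floor=1e-6):
    """log P(C) = sum over i of log P(C_i | C_{i-1}), following eq. (2).

    `bigram` maps (previous, current) to a probability. Unseen pairs fall
    back to `floor`, an illustrative smoothing choice, not the paper's.
    """
    total = 0.0
    prev = START
    for c in chars:
        total += math.log(bigram.get((prev, c), floor))
        prev = c
    return total

# Toy table with made-up probabilities, for illustration only.
toy_bigram = {(START, "東"): 0.01, ("東", "京"): 0.5, ("京", "都"): 0.2}
likely = bigram_log_prob("東京", toy_bigram)
unlikely = bigram_log_prob("京東", toy_bigram)
print(likely, unlikely)
```

Working with log-probabilities avoids numerical underflow for long character sequences and matches the additive form used later in the total evaluation function.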
5.2 Probability P(X, S | C)
In eq. (1), P(X, S | C) is the probability that a character sequence C is written as a segmentation S = s1 g1 s2 g2 … si gi … sm gm and a character pattern sequence X = X1 X2 … Xi … Xm, where Xi denotes a stroke sequence Xi = xi1 xi2 … xik within si. Using Bayes' theorem:

$$P(X, S \mid C) = P(X \mid S, C) \cdot P(S \mid C) \quad \text{… eq. (3)}$$

The terms P(S | C) and P(X | S, C) are the probability that a character sequence C is written so as to be segmented as S and the probability that a character sequence C segmented as S produces a character pattern sequence X, respectively.

5.3 Probability P(S | C)
This is the probability that a character sequence C is written so as to be segmented as S. We assume that the probability that a character Ci is written in a structure si depends only on the character Ci and the average size C̄ of the character sequence C. We also assume that the probability that a gap gi occurs between the characters Ci and Ci+1 depends only on the characters Ci and Ci+1 and on the average size C̄ of the character sequence C:

$$P(S \mid C) \cong \prod_{i=1}^{m} P(s_i \mid C_i, \bar{C}) \cdot P(g_i \mid C_i, C_{i+1}, \bar{C}) \quad \text{… eq. (4)}$$

where the term with Cm+1 is ignored. If we assume that the scale of si and gi is proportional to the average size C̄, we can scale them by C̄. Then, P(si | Ci, C̄) and P(gi | Ci, Ci+1, C̄) are replaced by P(si / C̄ | Ci) and P(gi / C̄ | Ci, Ci+1), respectively.

P(si / C̄ | Ci) is the probability that a character Ci is written in a structure si whose height is hi / C̄, whose width is wi / C̄ and which includes quasi gaps qik / C̄ (k = 0, 1, 2, …). The character pattern structure is an extension of the character size in [2]. The simplest approximation is to assume a constant probability regardless of Ci. The second simplest way is to classify characters into several groups Gi and apply distinct probabilities P(si / C̄ | Gi). Groups can be formed for numerals, alphabetic characters, simple Kanji characters composed of a single radical, those composed of left and right radicals, those composed of top and bottom radicals and so on. On the other hand, P(gi / C̄ | Ci, Ci+1) can be assumed constant regardless of Ci and Ci+1, or can be approximated by distinct probabilities depending on the group Gi that includes Ci and the group Gi+1 that includes Ci+1.

5.4 Probability P(X | S, C)
This is the probability that a character sequence C segmented as S produces a pattern X, and it is approximated as:

$$P(X \mid S, C) \cong \prod_{i=1}^{m} P(X_i \mid s_i, C_i) \quad \text{… eq. (5)}$$

The probability P(Xi | si, Ci) is the probability that each character Ci is written in a structure si and represented by the stroke sequence Xi. We approximate this by the score of character recognition. Assume that we use a total of N learning patterns and that nc(k) denotes the number of learning patterns correctly recognized with the score of the top candidate being k. The following p(i) is a monotonic function of i ranging from 0 up to less than 1 and is considered a good approximation of P(Xi | si, Ci).

$$p(i) = \frac{1}{N} \sum_{k=0}^{i} n_c(k) \quad \text{… eq. (6)}$$
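Eq. (6) is a cumulative count normalized by N, which a short sketch can make concrete. The sketch assumes that recognition scores are integers 0, 1, …, K so that they can index a list; the function name and the counts are made up for illustration.

```python
from itertools import accumulate

def score_to_probability(correct_counts, n_total):
    """Table of p(i) = (1/N) * sum_{k=0..i} n_c(k), following eq. (6).

    correct_counts[k] is n_c(k): the number of learning patterns that were
    recognized correctly with top-candidate score k (integer scores assumed).
    n_total is N, the total number of learning patterns.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return [cum / n_total for cum in accumulate(correct_counts)]

# Made-up counts: 80 of 100 learning patterns are recognized correctly overall.
counts = [0, 2, 5, 13, 60]
table = score_to_probability(counts, n_total=100)
print(table)
```

Because the counts are non-negative the table never decreases, and it stays below 1 as long as some learning patterns are misrecognized, which agrees with the stated range of p(i).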
5.5 Total evaluation function
If we summarize the above transformations and approximations:

$$P(C) \cdot P(X, S \mid C) \cong \prod_{i=1}^{m} P(C_i \mid C_{i-1}) \times \prod_{i=1}^{m} P(X_i \mid s_i, C_i) \cdot P(s_i / \bar{C} \mid C_i) \cdot P(g_i / \bar{C} \mid C_i, C_{i+1}) \quad \text{… eq. (7)}$$

Then, by taking the logarithm of both sides:

$$\log \big( P(C) \cdot P(X, S \mid C) \big) \cong \sum_{i=1}^{m} \log P(C_i \mid C_{i-1}) + \sum_{i=1}^{m} \big( \log P(X_i \mid s_i, C_i) + \log P(s_i / \bar{C} \mid C_i) + \log P(g_i / \bar{C} \mid C_i, C_{i+1}) \big) \quad \text{… eq. (8)}$$
Therefore, this evaluation function considers context likelihood in terms of bi-gram, character recognition, character pattern structure likelihood and gap likelihood.
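To make the role of each term concrete, here is a minimal sketch that scores candidate interpretations with the sum of the four log-likelihoods and picks the best one. The `Candidate` record, the `<s>` start marker, the `floor` for unseen bi-grams, the plain enumeration of candidate paths and every probability value are assumptions for illustration; how the per-character probabilities are estimated and how candidates are searched are outside this sketch.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    char: str             # hypothesized character C_i
    p_recognition: float  # stands in for P(X_i | s_i, C_i)
    p_structure: float    # stands in for P(s_i / C-bar | C_i)
    p_gap: float          # stands in for P(g_i / C-bar | C_i, C_{i+1}); 1.0 where the term is ignored

def path_log_score(path, bigram, start="<s>", floor=1e-6):
    """Sum of context, recognition, structure and gap log-likelihoods.

    All probabilities are assumed to be strictly positive.
    """
    score = 0.0
    prev = start
    for cand in path:
        score += math.log(bigram.get((prev, cand.char), floor))  # context term
        score += math.log(cand.p_recognition)
        score += math.log(cand.p_structure)
        score += math.log(cand.p_gap)
        prev = cand.char
    return score

def best_path(paths, bigram):
    """Return the candidate interpretation with the highest total score."""
    return max(paths, key=lambda p: path_log_score(p, bigram))

# Two made-up interpretations of the same ink: one character, or two narrow ones.
bigram = {("<s>", "明"): 0.01, ("<s>", "日"): 0.02, ("日", "月"): 0.05}
as_one = [Candidate("明", 0.6, 0.5, 1.0)]
as_two = [Candidate("日", 0.7, 0.1, 0.05), Candidate("月", 0.7, 0.1, 1.0)]
winner = best_path([as_one, as_two], bigram)
print("".join(c.char for c in winner))
```

In this toy case the low structure and gap likelihoods of the two-character reading outweigh its slightly better recognition scores, so the single-character reading wins.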
6. Evaluation
We obtained the parameters for character size likelihood and for quasi gaps within character patterns from our on-line handwritten character pattern databases kuchibue_d and nakayosi_t [8]. For testing, we collected 136 test patterns consisting of 1,385 characters from 17 people. We define the character recognition rate (crr) as the number of correctly recognized characters divided by the total number of character patterns, and the segmentation rate (sr) as follows:

$$sr = 1 - \frac{\text{number of incorrect segmentations}}{\text{number of characters}} \quad \text{… eq. (9)}$$
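The two measures can be written as a small sketch. The function names and the counts in the example are hypothetical and are not the counts behind the reported results.

```python
def character_recognition_rate(n_correct, n_characters):
    """crr: correctly recognized characters over all character patterns."""
    if n_characters <= 0:
        raise ValueError("n_characters must be positive")
    return n_correct / n_characters

def segmentation_rate(n_incorrect_segmentations, n_characters):
    """sr = 1 - (incorrect segmentations / characters), following eq. (9)."""
    if n_characters <= 0:
        raise ValueError("n_characters must be positive")
    return 1.0 - n_incorrect_segmentations / n_characters

# Hypothetical counts for a small test set of 200 characters.
crr = character_recognition_rate(n_correct=170, n_characters=200)
sr = segmentation_rate(n_incorrect_segmentations=10, n_characters=200)
print(f"crr = {crr:.2%}, sr = {sr:.2%}")
```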
Table 1 shows the effect of evaluating character pattern structure likelihood and gap likelihood. With these likelihoods, the character recognition rate rises from 76.17 % to 81.95 % and the segmentation rate rises from 92.42 % to 95.81 %.

Table 1. Effect of evaluating character pattern structure likelihood and gap likelihood.

evaluation function                                       character rec. rate (crr)   segmentation rate (sr)
without character pattern structure and gap likelihood    76.17 %                     92.42 %
with character pattern structure and gap likelihood       81.95 %                     95.81 %
The effect of considering quasi gaps is seen in Fig. 6. Without this factor, two characters written close together were incorrectly recognized as a single character. With it, however, they are recognized correctly.
7. Conclusion
This paper presented a formalization of writing-box free, line-direction free handwritten Japanese text recognition and its effect. By normalizing character orientation, even text of arbitrary character orientation can be recognized. We showed that the newly employed factors in the likelihood are effective for the overall character recognition rate and segmentation rate. More extensive evaluation of the effect of each factor in the evaluation function remains to be made.

8. Acknowledgement
This research is being supported by the Strategic Information and Communication R&D Promotion Scheme under the Ministry of Public Management, Home Affairs, Post and Telecommunications.
9. References
[1] H. Murase, T. Wakahara and M. Umeda: “Online Writing-Box Free Character String Recognition by Candidate Character Lattice Method”, Trans. of IEICE Japan, Vol. J68-D No. 4, pp. 765-772 (1985) (in Japanese).
[2] T. Fukushima and M. Nakagawa: “On-line Writing-box-free Recognition of Handwritten Japanese Text Considering Character Size Variations,” Proc. 15th ICPR, Barcelona, Vol. 2, pp. 359-363 (2000.9).
[3] M. Okamoto, H. Yamamoto, T. Yoshikawa and H. Horii: “Online Character Segmentation Method by Means of Physical Features”, Technical Report of IEICE Japan, PRU95-13 (1995) (in Japanese).
[4] H. Aizawa, T. Wakahara and K. Odaka: “Real-Time Handwritten Character String Segmentation Using Multiple Stroke Features,” Trans. of IEICE Japan, Vol. J80-D-II No. 5, pp. 1178-1185 (1997) (in Japanese).
[5] S. Senda and K. Yamada: “A Maximum-Likelihood Approach to Segmentation-Based Recognition of Unconstrained Handwriting Text,” Proc. 6th ICDAR, Seattle, pp. 184-188 (2001.9).
[6] Y. Inamura, T. Fukushima and M. Nakagawa: “An On-line Writing-box-free and Writing-direction Free Recognition System for Handwritten Japanese Text,” IEICE Technical Report, PRMU, Vol. 100-37, No. 135, pp. 17-24 (2000.6) (in Japanese).
[7] M. Nakagawa and M. Onuma: “On-line Handwritten Japanese Text Recognition Free from Constrains on Line Direction and Character Orientation,” Proc. 7th ICDAR, Edinburgh, pp. 519-523 (2003.8).
[8] http://www.tuat.ac.jp/~nakagawa/ipdb/
Fig. 6. Effect of evaluating quasi gaps.