P1: SAD Applied Intelligence
KL574-05-Lee
April 8, 1998
14:20
Applied Intelligence 8, 269–285 (1998)
© 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.
Handwritten Chinese Character Recognition Based on Primitive and Fuzzy Features via the SEART Neural Net Model
HAHN-MING LEE, CHUNG-CHIEH SHEU AND JYH-MING CHEN
Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Abstract. A handwritten Chinese character recognition method based on primitive and compound fuzzy features using the SEART neural network model is proposed. The primitive features are extracted from local and global views. Since handwritten Chinese characters vary a great deal, fuzzy set theory is used to extract the compound features from a structural view. We combine the two categories of features and use a fast classifier, the Supervised Extended ART (SEART) neural network model, to recognize handwritten Chinese characters. The SEART classifier performs well, is fast, and has good generalization and exception-handling abilities in complex problems. Using fuzzy set theory in feature extraction and a neural network model as the classifier helps reduce the effects of distortion, noise, and variation. Despite imperfect thinning, an average recognition rate of 90.24% was obtained on the 605 test character categories. The database used is CCL/HCCR3 (provided by CCL, ITRI, Taiwan). The experiment not only confirms the feasibility of the proposed system, but also suggests that applying fuzzy set theory and neural networks to handwritten Chinese character recognition is an efficient and promising approach.

Keywords: handwritten Chinese character recognition, neural network model, fuzzy set theory, primitive features, fuzzy compound features
1. Introduction
Recognition of handwritten Chinese characters (HCCR) is useful in two applications: (1) automatic input for Chinese electronic data processing (EDP), and (2) compression of Chinese documents. HCCR applications have since spread to office automation, computer-aided instruction, and other areas. For these reasons, much attention has been focused on HCCR since 1980 [13]. Handwritten Chinese characters are characterized by a large character set (at least 5401 characters in daily use), highly complex structures, and widely varying writing styles. To address these characteristics, we propose an HCCR method based on primitive and fuzzy compound features via the SEART neural net model [19].

Neural network models are inspired by the biological neurons of the human brain. Because humans have a powerful ability to recognize patterns, some researchers believe that neural network models are appropriate for Chinese character recognition. The main difference between neural networks and other classification methods is that neural networks do not need a large prototype database in advance, nor do they need to match an input character against template characters. SEART (Supervised Extended ART) is a neural network model that incorporates a supervised mechanism into the extended unsupervised ART [19]. It uses a learning theory called the Nested Generalized Exemplar (NGE) theory [22]. The extended ART subsystem, named EART, quantizes the input vectors. A category node in this subsystem represents a cluster of input instances and corresponds geometrically to a hyperbox in feature space. Instances in the same hyperbox belong to the same class, and instances of different classes within one cluster can be distinguished by nested hyperboxes [22]. A layer of Grossberg's Outstars [1] is used to learn the
desired associations between clusters and the corresponding classes. At any time, a training instance may or may not have a desired output. When the desired output is presented, the model learns the desired association; otherwise, it makes a prediction and generalizes based on the learned associations. SEART is also capable of incremental learning, and it handles multiclass and nonconvex classification well.

The feature extractor is the kernel of an OCR system and strongly influences the recognition results. Features can generally be divided into three categories: global features, local features, and structural features [13]. Global features are obtained by global transformations of the whole character image, such as the Walsh, Fourier, and Karhunen–Loève (K-L) transforms [7]. They tolerate the noise and distortion of handwritten characters, but because details and the relations between parts are ignored, they lower the recognition rate for similar characters. Local features are derived from transformations at different positions of the image; cellular and polygonal approximations are typical local features [20]. Structural features include feature points, strokes, stroke sequences, etc. Because strokes are the main elements of Chinese characters, they can be used to identify handwritten Chinese characters effectively. However, it is difficult to extract perfect structural features because of writing variations. In [27], the desirable properties of features were found to be: (1) insensitivity to variation in shape, (2) typicality and presence in many characters, and (3) ease of detection. In short, global and local features tolerate noise and are easily detected, but they are not invariant to variations in shape. Structural features adapt to such variations; however, they are sensitive to noise. Since the writing styles of handwritten characters vary a great deal in stroke length, stroke type, and stroke distance, these attributes are inherently fuzzy.
Based on the structures of Chinese characters, we choose horizontal, vertical, left-slanting, and right-slanting strokes as primitive strokes. In our method, we employ both primitive features and compound fuzzy features to achieve stability as well as adaptability. The rest of this paper is organized as follows. Section 2 describes our proposed architecture. The SEART neural network model is introduced in Section 3. The experimental results are given in Section 4, and Section 5 discusses the advantages and limitations of our method. Finally, we give concluding remarks.
2. System Architecture
Figure 1. The system architecture.
The architecture of our proposed system is shown in Fig. 1. It consists of three stages. The first stage accepts the input image and performs normalization, thinning, and division into blocks. The second stage performs feature extraction. Two kinds of features are used in our system: primitive features and compound fuzzy features. A 3 × 3 window is used to extract the primitive features, while the compound fuzzy features are based on strokes. Basically, the primitive features are statistical features and the compound fuzzy features are structural features; both kinds are very important in Chinese character recognition. Finally, the third stage uses a neural network classifier named Supervised Extended ART (SEART) [19] to recognize handwritten Chinese characters.
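As a rough illustration of how the two feature families could feed the classifier, the sketch below concatenates four feature groups into one input vector. The group sizes follow Sections 2.2 and 2.3 (72 direction counts, 18 + 18 stroke densities, 16 stroke-type degrees, 12 stroke-relation degrees, 136 values in all); the concatenation order, the group names, and the function `assemble_feature_vector` are hypothetical choices made for this sketch, not details given in the paper.

```python
from typing import Dict, List, Sequence

# Feature-group sizes taken from Sections 2.2 and 2.3; the names and the
# order of concatenation are assumptions made for this sketch.
GROUP_SIZES: Dict[str, int] = {
    "direction_counts": 72,   # 8 direction categories x 9 dividing blocks
    "stroke_density": 36,     # 18 horizontal + 18 vertical bold/thin counts
    "stroke_types": 16,       # 4 fuzzy stroke types x 4 regions
    "stroke_relations": 12,   # 3 fuzzy stroke relations x 4 regions
}


def assemble_feature_vector(groups: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate the normalized feature groups into one classifier input."""
    vector: List[float] = []
    for name, size in GROUP_SIZES.items():
        values = list(groups[name])
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        if any(not 0.0 <= v <= 1.0 for v in values):
            raise ValueError(f"{name}: values must be normalized to [0, 1]")
        vector.extend(values)
    return vector


if __name__ == "__main__":
    dummy = {name: [0.0] * size for name, size in GROUP_SIZES.items()}
    print(len(assemble_feature_vector(dummy)))
```

The range check mirrors the paper's statement that every feature group is normalized to [0, 1] before classification.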
2.1. Preprocessor
First, we normalize the input bitmaps to the same size, for example, 72 × 72. A thinning algorithm [6] is then used to remove the width of the strokes and obtain the skeleton of the input character, which makes the system tolerant of the stroke-width variations caused by different writing tools. Since thinning is not within the scope of this paper, we do not discuss it further. Figure 2 shows the 9 dividing blocks used. In Wang's four-corner method [25], a character is represented by its four corners. This method can successfully separate Chinese characters into different categories, and the modified method presented in [3] has also shown good results. This is why we emphasize the peripheral parts in the 9 dividing blocks. In addition, the central part and the periphery overlap slightly (by 5 image pixels in our experiments) to tolerate variations in writing position.
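The exact block geometry is given only in Fig. 2, which is not reproduced here. The following sketch therefore assumes a plain 3 × 3 grid over the 72 × 72 bitmap in which only the central block is enlarged by the 5-pixel overlap; the function name `dividing_blocks` and the (top, left, bottom, right) box convention are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open pixel ranges


def dividing_blocks(size: int = 72, overlap: int = 5) -> List[Box]:
    """Hypothetical 3 x 3 layout: equal cells, central cell grown by `overlap` pixels."""
    step = size // 3
    blocks: List[Box] = []
    for r in range(3):
        for c in range(3):
            top, left = r * step, c * step
            bottom, right = top + step, left + step
            if (r, c) == (1, 1):  # the central block overlaps the periphery
                top, left = top - overlap, left - overlap
                bottom, right = bottom + overlap, right + overlap
            blocks.append((top, left, bottom, right))
    return blocks
```

With the defaults, the central block covers rows and columns 19 to 52, while each peripheral block keeps its 24 × 24 extent.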
Figure 2. The 9 dividing blocks used for the input character in our system.
2.2. Primitive Features
Two types of primitive features are used in our system. For the first set, we use the 12 templates of Fukushima's Neocognitron [10], which represent 8 categories of stroke direction, and count the occurrences of each of the 8 categories in each shaded dividing block of Fig. 2. For the second set, we compute the distribution of thin and bold strokes. These two types of primitive features are detailed in the following subsections. 2.2.1. First Set of Primitive Features. There are 12 templates in the S1 layer of Fukushima's Neocognitron [10], which are classified into 8 categories. Figure 3 shows the 12 templates and the corresponding
Figure 3. 12 simple template patterns and 8 corresponding categories in Neocognitron.
8 categories. We adopt them as our first set of primitive features and also use them to extract the strokes for the compound fuzzy features. A 3 × 3 window is moved over the input character from left to right and top to bottom, and at each position the window is matched against the 12 templates. If one of the 12 templates matches, the center of the window is assigned the associated category value, 1–8. If the window matches more than one of the 12 template patterns, the center is marked as a feature point, denoted by “$”. For example, Fig. 4 shows the result of applying the matching and assignment operations to the Chinese character “ ”. The 8 categories can be treated as 8 stroke directions. Furthermore, template S6 may be a variation of template S5 or S7; in this sense, the first set of primitive features is tolerant of local variations. For each dividing block, we count the occurrences of each of the 8 categories and then normalize the counts to [0, 1]. The normalized values form the first set of primitive features, giving a total of 72 (8 categories × 9 blocks) primitive features in our system. We use the example “ ” and show its first set of primitive features in Table 1. The first column of the matrix denotes the 9 dividing blocks which we adopt. The second row indicates the 12 templates, and their 8 categories are given in the first row. Each element of the matrix is the number of occurrences of the template pattern in the corresponding block.

2.2.2. Second Set of Primitive Features. The second set of primitive features consists of stroke densities. We scan the character horizontally, row by row, and count the strokes in each row. Each stroke is classified as either a bold stroke or a thin stroke: in a bold stroke, the number of continuous pixels is at least 4, and in a thin stroke, it is less than 4. In our experiments, we tried thresholds of 3 to 6 continuous pixels for a bold stroke and chose 4, since it led to the best results. We then accumulate the numbers of bold and thin strokes every 8 rows to reduce the input dimension, giving 9 groups for the 72 scanned rows. Each group has 2 features: the number of bold strokes and the number of thin strokes. These 18 numbers are normalized to [0, 1] and used as the horizontal features of our second set of primitive features. Similarly, we obtain 18 more features by vertical scanning. Figure 5 shows the numbers of the two categories of strokes for the character “ ”.
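The stroke-density scan of Section 2.2.2 can be sketched as follows. It counts runs of consecutive black pixels in each scan line of a binary 72 × 72 bitmap, treats a run as bold when it is at least 4 pixels long (following the captions of Fig. 5), and accumulates the counts over bands of 8 lines. The paper does not say how the counts are normalized to [0, 1], so dividing each 18-value group by its largest count is an assumption, as are all function names.

```python
from typing import List, Sequence

Bitmap = Sequence[Sequence[int]]  # 1 = black (stroke) pixel, 0 = background


def run_lengths(line: Sequence[int]) -> List[int]:
    """Lengths of the maximal runs of consecutive black pixels in one scan line."""
    runs: List[int] = []
    current = 0
    for pixel in line:
        if pixel:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def band_counts(lines: Bitmap, band: int = 8, bold_min: int = 4) -> List[float]:
    """[bold, thin] stroke counts for every `band` scan lines, scaled to [0, 1]."""
    counts: List[int] = []
    for start in range(0, len(lines), band):
        bold = thin = 0
        for line in lines[start:start + band]:
            for length in run_lengths(line):
                if length >= bold_min:
                    bold += 1
                else:
                    thin += 1
        counts.extend([bold, thin])
    peak = max(counts, default=0) or 1  # assumed normalization: divide by the largest count
    return [c / peak for c in counts]


def second_primitive_features(img: Bitmap) -> List[float]:
    """18 horizontal-scan features followed by 18 vertical-scan features."""
    horizontal = band_counts(img)
    columns = [list(col) for col in zip(*img)]  # transpose: columns become scan lines
    vertical = band_counts(columns)
    return horizontal + vertical
```

For a blank 72 × 72 bitmap containing a single 16-pixel horizontal stroke, the horizontal scan reports one bold stroke in the band that contains it, while the vertical scan reports 16 one-pixel (thin) runs spread over the column bands the stroke crosses.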
Table 1. The numbers of the 8 categories (12 templates) in the 9 dividing blocks for the example “ ”. Each element of the matrix indicates the number of occurrences of the template pattern in the corresponding block.

Dividing  Category
block     1     2     3     4     5     6     7     8
1         10    0     7     12    4     0     0     3
2         0     2     14    22    5     0     0     6
3         21    4     0     2     16    2     0     1
4         19    8     0     2     21    2     0     4
5         16    9     0     3     27    3     1     6
6         8     5     0     3     22    3     1     3
7         0     2     7     10    1     4     10    10
8         0     0     0     0     0     4     10    7
9         16    4     6     18    23    1     2     7
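Table 1 lists raw counts per block; the counting and normalization step of Section 2.2.1 can be sketched as below. The sketch takes the matched skeleton as text rows in the style of Fig. 4 ('1' to '8' for the matched category, '$' for feature points, any other character for background) plus a list of block rectangles. Feature points are not counted, and dividing by the largest count is only an assumed way of normalizing to [0, 1], since the paper does not specify one; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open pixel ranges


def first_primitive_features(labels: Sequence[str], blocks: Sequence[Box]) -> List[float]:
    """Per-block counts of direction categories 1-8, scaled to [0, 1].

    `labels` holds the matched skeleton as text rows, as in Fig. 4: '1'-'8' is the
    matched category, '$' a feature point, any other character background.
    """
    counts: List[int] = []
    for top, left, bottom, right in blocks:
        per_category = [0] * 8
        for row in labels[top:bottom]:
            for ch in row[left:right]:
                if ch in "12345678":  # feature points ('$') are not counted
                    per_category[int(ch) - 1] += 1
        counts.extend(per_category)
    peak = max(counts, default=0) or 1  # assumed normalization: divide by the largest count
    return [c / peak for c in counts]
```

With the nine dividing blocks this yields the 72 values of the first set of primitive features; the order within the vector (block by block, category 1 to 8) is likewise an assumption.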
2.3. Compound Fuzzy Features
The primitive features are statistical, whereas the compound fuzzy features, i.e., stroke types and stroke relations, are regarded as structural features. Because Chinese characters consist of horizontal, vertical, right-slanting, and left-slanting strokes, we use these four stroke types as our primitive strokes to represent Chinese characters. Furthermore, it is very difficult to extract complex stroke relations, such as “ ”, so we select only the “×, >, d ” stroke relations mentioned in [25]. The advantages of the proposed method are easy stroke extraction and low time complexity. Figure 6 shows the four stroke types and three stroke relations that we use. Fuzzy membership functions are used in structural feature extraction to reduce the effects of noise and variation. Since Chinese characters are based on strokes, stroke extraction is essential; it is described in the next subsection.

2.3.1. Stroke Extraction Strategy. Since it is difficult to extract strokes directly, we perform stroke extraction in two stages: segment extraction and segment combination. After thinning, a stroke may be divided into several segments, and a segment in turn consists of clusters in several consecutive rows. In the first stage, we extract the segments, i.e., the parts of a stroke, using a tracing algorithm. In the second stage, we combine the extracted segments into strokes by means of fuzzy combination rules. To speed up this process, template matching and segment extraction are carried out simultaneously. We define the continuous pixels in a row as one cluster. From Fig. 4, we can observe that (1) pixels in the same
Figure 4. An example “ ”: the result of template matching. Each black pixel is assigned a value of 1 ∼ 8 according to the matched template. The “$” pixels are the feature points.
cluster belong to the same segment; (2) the values of pixels in the same segment do not vary much; and (3) two neighboring pixels in different rows belong to the same segment if the difference of their feature values, defined in Section 2.2.1, is not greater than 1. These observations are useful for extracting segments. We trace the neighboring clusters of two neighboring rows until (1) the “$” symbol is met, or (2) the difference of the values of the two neighboring end pixels of the clusters is greater than 1. When the symbol “$” is met during tracing, there are at least two possible forks to choose from. The decision is not made in the tracing stage; after segment extraction, the combination stage decides whether, and with which other segments, each segment is combined. Segment extraction does not stop until all the pixels have been traced. An example of extracted segments and the flowchart of the tracing algorithm are shown in Figs. 7 and 8, respectively. Next, we combine these segments into strokes by means of combination rules [18]. For example, the character “ ” consists of three segments (i.e., “ ”), which can be combined into two strokes (i.e., “ ”) based on the slopes of the segments and the distance between any two strokes. The extracted strokes of the example “ ” are shown in Fig. 9.
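The tracing rules of Section 2.3.1 can be sketched with a union-find over row clusters: a cluster is a maximal run of direction-labelled pixels in one row, “$” feature points break runs and are never linked (the fork decision is left to the combination stage), and two 8-connected clusters in adjacent rows join the same segment only when their closest pixels differ in direction value by at most 1. This is a simplified, hypothetical rendering of the flowchart in Fig. 8, not the authors' implementation. In particular, comparing the closest pair of pixels stands in for the paper's "neighboring end pixels", direction values are compared linearly (categories 8 and 1 are not treated as adjacent), and the combination rules of [18] are not modelled.

```python
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[int, int, int]  # (row, first column, last column), inclusive
DIRECTIONS = set("12345678")    # direction categories from template matching


def row_clusters(labels: Sequence[str]) -> List[Cluster]:
    """Maximal runs of direction-labelled pixels in each row; '$' and background end a run."""
    clusters: List[Cluster] = []
    for r, row in enumerate(labels):
        c = 0
        while c < len(row):
            if row[c] in DIRECTIONS:
                start = c
                while c + 1 < len(row) and row[c + 1] in DIRECTIONS:
                    c += 1
                clusters.append((r, start, c))
            c += 1
    return clusters


def extract_segments(labels: Sequence[str], max_diff: int = 1) -> List[List[Cluster]]:
    """Group row clusters into segments by tracing between neighbouring rows."""
    clusters = row_clusters(labels)
    parent = list(range(len(clusters)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    by_row: Dict[int, List[int]] = {}
    for idx, (r, _, _) in enumerate(clusters):
        by_row.setdefault(r, []).append(idx)

    for idx, (r, a, b) in enumerate(clusters):
        for jdx in by_row.get(r - 1, []):
            _, c, d = clusters[jdx]
            if c > b + 1 or d < a - 1:
                continue  # the two clusters are not 8-connected
            jl = min(max(c, a), b)   # pixel of the lower cluster closest to the upper one
            ju = min(max(jl, c), d)  # nearest pixel of the upper cluster
            if abs(int(labels[r][jl]) - int(labels[r - 1][ju])) <= max_diff:
                parent[find(idx)] = find(jdx)  # same segment: tracing continues
            # otherwise the direction changes too much and tracing stops here

    segments: Dict[int, List[Cluster]] = {}
    for idx, cluster in enumerate(clusters):
        segments.setdefault(find(idx), []).append(cluster)
    return list(segments.values())
```

For the rows "5...1", "5...1", "$...8", the sketch returns three segments: the two vertical runs in the first two rows, plus the lone "8" pixel, which is split off because its direction differs from the "1" above it by more than 1.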
Figure 5a. An example “ ” after horizontal scanning. The character was scanned horizontally, and the number of strokes was counted every 8 rows. A bold stroke means that the number of continuous pixels in a row was not less than 4; a thin stroke means that it was less than 4.
Figure 5b. An example “ ” after vertical scanning. The character was scanned vertically, and the number of strokes was counted every 8 columns. A bold stroke means that the number of continuous pixels in a column was not less than 4; a thin stroke means that it was less than 4.
Figure 6. The stroke types and stroke relations used in our system.

2.3.2. Compound Fuzzy Feature Extractor. Kumamoto et al. [17] proposed a precandidate selection method using global features. Since the reduced character set is still large, it is reasonable to include other kinds of features. The importance of structural features was explained in [27]: structural features help to distinguish between similar characters. For these reasons, we employ structural features as our compound features. Because handwritten Chinese characters vary, an extracted stroke may be treated as either a horizontal stroke or a right-slanting stroke [18]. Similarly, the relation between two extracted strokes is fuzzy. To increase the separability among different character categories, we divide the writing square into four regions, upper left, upper right, lower left, and lower right, as shown in Fig. 10. It is often unclear to which of the four regions an extracted stroke belongs. For these reasons, we apply the fuzzy concept in HCCR. We define the membership functions for the four primitive strokes (horizontal, vertical, right-slanting, and left-slanting) as follows, where θ(x) denotes the angle between stroke x and the horizontal line:

(1) Horizontal:
$$\mu_{\widetilde{HL}}(x)=\begin{cases}1-\left|\dfrac{\theta(x)}{45^\circ}\right|, & |\theta(x)|<45^\circ,\\[4pt] 0, & \text{otherwise}.\end{cases}$$

(2) Vertical:
$$\mu_{\widetilde{VL}}(x)=\begin{cases}1-\left|\dfrac{\theta(x)-90^\circ}{45^\circ}\right|, & 45^\circ\le|\theta(x)|\le 135^\circ,\\[4pt] 0, & \text{otherwise}.\end{cases}$$
Figure 7. The result of the extracted segments for an example “ ”. The pixels marked by the same character are in the same segment. There are 11 extracted segments, marked as a ∼ 1, in this example.
Figure 8. The flowchart of the segment extracting algorithm.
Figure 9. The result of the extracted strokes for an example “ ”. The segments are combined into 8 strokes, marked as A ∼ H.
(3) Left-slanting:
$$\mu_{\widetilde{LS}}(x)=\begin{cases}1-\left|\dfrac{\theta(x)-135^\circ}{45^\circ}\right|, & 90^\circ<\theta(x)\le 180^\circ,\\[4pt] 0, & \text{otherwise}.\end{cases}$$

(4) Right-slanting:
$$\mu_{\widetilde{RS}}(x)=\begin{cases}1-\left|\dfrac{\theta(x)-45^\circ}{45^\circ}\right|, & 0^\circ<\theta(x)\le 90^\circ,\\[4pt] 0, & \text{otherwise},\end{cases}$$

where, as before, θ(x) denotes the angle between stroke x and the horizontal line. According to the membership functions defined above, we calculate the degree of each stroke type for each extracted stroke and then distribute these degrees to
Figure 10. The four regions of the writing square and their corresponding numbers.
the four regions shown in Fig. 10. The four-corner method has good separability for the Chinese character set, and because our features are based on it, we divide the writing square into four regions. The distance between a stroke's midpoint and each of the four apexes (corners) of the writing square is used to decide to which region the stroke belongs. The region degrees 1.0, 0.5, 0.25, and 0.125 are assigned to the four regions in order of increasing distance: the smaller the distance, the higher the region degree of the stroke. For example, if the stroke's midpoint is closest to the upper left apex, the stroke has the maximal membership degree, 1.0, for the upper left region; i.e., the stroke most likely belongs to region 1. Although we could assign region degrees based on the exact distance between a stroke's midpoint and each of the four apexes, it is more appropriate to assign the four
fixed degrees (i.e., 1.0, 0.5, 0.25, and 0.125) to tolerate variations in position. For each extracted stroke, we multiply each of the four stroke type degrees by each of the four region degrees. Table 2 lists the resulting membership degrees for all the extracted strokes in an example, “ ”. In Table 2, the numbers 1–4 indicate the four regions (1 for upper left, 2 for upper right, 3 for lower left, and 4 for lower right), and A to H are the eight extracted strokes of Fig. 9. For example, stroke A has a right-slanting type degree of 0.35 in region 1. The elements of each column of Table 2 are accumulated into total degrees. The accumulated value denotes the density of the stroke type in the corresponding region; e.g., region 2 has a total degree of 1.79 for the horizontal stroke type, the highest density of horizontal strokes among the four regions. The 16 accumulated degrees are normalized to [0, 1] and used as the first set of compound features.

The stroke relations between pairs of strokes form the second set of compound features. For any two extracted strokes, we calculate the degrees of the three relations by means of the membership functions defined below. The region degrees 1.0, 0.5, 0.25, and 0.125 are again assigned to the four regions, this time according to the distances between the intersection point of the two strokes and the four apexes. Similarly, we multiply the three relation degrees by the region degrees, and for each region we accumulate the three relation membership degrees. Finally, all 12 degrees are normalized to [0, 1] and used as the second set of compound features. These features indicate the densities of the three
Table 2. The membership degrees of the stroke types for the extracted strokes in the example “ ”. The numbers 1–4 indicate the four regions of the writing square. The total degrees are normalized to [0, 1] and used as the first set of compound features.

Stroke  Horizontal              Right-slanting          Vertical                Left-slanting
        1     2     3     4     1     2     3     4     1     2     3     4     1     2     3     4
A       0.00  0.00  0.00  0.00  0.35  0.08  0.17  0.04  0.72  0.18  0.36  0.09  0.00  0.00  0.00  0.00
B       0.46  0.93  0.12  0.23  0.04  0.08  0.01  0.02  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
C       0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.20  0.83  0.10  0.41  0.05  0.21  0.02  0.10
D       0.94  0.47  0.23  0.12  0.08  0.04  0.02  0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
E       0.00  0.00  0.00  0.00  0.36  0.09  0.72  0.18  0.18  0.04  0.36  0.09  0.00  0.00  0.00  0.00
F       0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.23  0.93  0.12  0.46  0.02  0.09  0.01  0.04
G       0.09  0.37  0.18  0.75  0.04  0.16  0.08  0.31  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
H       0.05  0.02  0.19  0.09  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.22  0.11  0.87  0.43
Total   1.54  1.79  0.72  1.19  0.87  0.45  1.00  0.56  1.33  1.98  0.94  1.05  0.29  0.41  0.90  0.57
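different stroke relations in a specific region. We define the membership functions for the three stroke relations, CROSS, CORNER, and T-TYPE, as follows:

(1) CROSS:
$$\mu_{\widetilde{\mathrm{CROSS}}}(S_1,S_2)=\begin{cases}1-\dfrac{\operatorname{dist}(I,C_1)}{\operatorname{len}(S_1)}-\dfrac{\operatorname{dist}(I,C_2)}{\operatorname{len}(S_2)}, & \text{intersection point is in } S_1 \text{ and } S_2,\\[4pt] 0, & \text{otherwise}.\end{cases}$$

(2) CORNER: if 50° < δ(S1, S2) < 130°,
$$\mu_{\widetilde{\mathrm{CORNER}}}(S_1,S_2)=\begin{cases}\dfrac{\operatorname{dist}(I,C_1)}{\operatorname{len}(S_1)}+\dfrac{\operatorname{dist}(I,C_2)}{\operatorname{len}(S_2)}, & \text{intersection point is in } S_1,\\[4pt] \max\left\{0,\ 1-\dfrac{\min(\operatorname{dist}(I,B_1),\operatorname{dist}(I,E_1))}{\operatorname{len}(S_1)}-\dfrac{\min(\operatorname{dist}(I,B_2),\operatorname{dist}(I,E_2))}{\operatorname{len}(S_2)}\right\}, & \text{otherwise};\end{cases}$$
else $\mu_{\widetilde{\mathrm{CORNER}}}(S_1,S_2)=0$.

(3) T-TYPE: if 50° < δ(S1, S2) < 130°,
$$\mu_{\widetilde{\text{T-TYPE}}}(S_1,S_2)=\begin{cases}\max\left\{0,\ 1-\dfrac{\operatorname{dist}(I,C_1)}{\operatorname{len}(S_1)}-\dfrac{\min(\operatorname{dist}(I,B_2),\operatorname{dist}(I,E_2))}{\operatorname{len}(S_2)},\ 1-\dfrac{\operatorname{dist}(I,C_2)}{\operatorname{len}(S_2)}-\dfrac{\min(\operatorname{dist}(I,B_1),\operatorname{dist}(I,E_1))}{\operatorname{len}(S_1)}\right\}, & \text{intersection point is in } S_1 \text{ or } S_2,\\[4pt] 0, & \text{otherwise};\end{cases}$$
else $\mu_{\widetilde{\text{T-TYPE}}}(S_1,S_2)=0$.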
Here dist(a, b) denotes the distance between points a and b, and len(s) denotes the length of stroke s. I is the intersection point of strokes S1 and S2; B1 and B2 denote the start points of S1 and S2, C1 and C2 their central points, and E1 and E2 their end points; δ(S1, S2) denotes the angle between strokes S1 and S2. “+” and “×” are typical CROSS patterns. The shorter the distance between the intersection point and the centers of the strokes, the higher the CROSS membership degree. If the intersection point is in neither S1 nor S2, such as “ ” or “ ”, the CROSS membership degree of S1 and S2 is zero. “⊥”, “>”, “a”, and “`” are typical T-TYPE patterns. If the angle between two strokes is not in the range 50°–130°, the pattern falls outside the tolerated variation of a T-TYPE, so it is reasonable to assign a membership degree of zero. This angle range was determined experimentally. For example, “>” is one of the typical T-TYPE patterns, and its T-TYPE membership degree is 1. “ ” and “ ” are variations of T-TYPE, and according to the definition of the T-TYPE membership function, their membership degrees are less than 1. The first term, 1 − dist(I, C1)/len(S1), means that the greater the distance between the intersection point and the center of the stroke, the lower the T-TYPE membership degree. The second term is based on the shortest distance between the intersection point and the end points of the other stroke, so the T-TYPE membership degree decreases as that distance increases. If the intersection point is in neither S1 nor S2, such as “ ”, the T-TYPE membership degree is obviously zero. “ ”, “ ”, “ ”, and “ ” are typical CORNER patterns. If the angle between two strokes is too large (>130°) or too small (