A Thinning Algorithm for Digital Figures of Characters
Michio SHIMIZU, Hiroshi FUKUDA*, Gisaku NAKAMURA**
Nagano Prefectural College, University of Shizuoka*, Tokai University**
E-mail:
[email protected],
[email protected]*
Abstract
Thinning is a useful preprocessing step in image processing. Various algorithms have been proposed to produce the skeleton of a digital binary pattern. However, they have undesirable properties: segments tend to shrink or vanish, beards (spurious branches) appear, and the skeleton warps where segments intersect. In this paper, we propose a parallel Hilditch algorithm that yields more stable output. In particular, we introduce two kinds of masks that are effective in thinning digital figures of characters. We then evaluate the performance of our scheme by thinning 432 character patterns taken from an actual font. The skeletons obtained with our method show the best quality among the major thinning algorithms compared.
1. Introduction
Graphic data captured by an image scanner or a similar device and processed by a computer consist mainly of letters, map information, design drawings, and the like. Thinning is often an important step in processing such graphics. For example, a letter is input as an image whose strokes have a certain width, and to lighten the computational burden of character recognition it is desirable to preprocess it into a line drawing that is as thin as possible. The technique of making a line thin without losing the information in the original is called thinning.
Various algorithms have been proposed to produce the skeleton of a digital binary pattern [2]-[5]. However, they have undesirable properties: segments tend to shrink or vanish, beards appear, and the skeleton warps where segments intersect. Their effectiveness may also depend on the objects being processed. Thinning has not been clearly defined, so quality criteria tend to be subjective, and experimental validation on concrete processing objects cannot be said to have been carried out fully.
In this paper, a new thinning algorithm for digital figures of characters is proposed. The classic Hilditch algorithm [2] is parallelized, and two kinds of masks for performing local operations are introduced. We then evaluate the performance of our scheme by thinning 432 character patterns taken from an actual font. The skeletons obtained with our method show the best quality among the major thinning algorithms compared.
2. Definition
First, we define the terms needed to describe the algorithms. A processing object is a binary graphic that takes the value 0 or 1 at each point of a plane square lattice. Thinning reduces the line width of this binary graphic to 1, and the graphic of width 1 generated by thinning is called a core line or skeleton. Figure 1 shows the 8 neighbors of the objective point (pixel 0) on a 3 x 3 pattern. Among these, pixels 1, 3, 5, and 7 are called the 4 neighbors. Each of the 8 neighbors takes the value 0 or 1, and these values are denoted by $x_k$ (k = 1, 2, ..., 8).
4 5 6
3 0 7
2 1 8
Figure 1. 8 neighbors

Now, the value of the objective point is set to 1; in the figures, 1 is shown in gray and 0 in white. If two pixels adjoin each other in the vertical, horizontal, or diagonal direction, they are said to be connected. Figure 2 shows an example of pixel connection.

Figure 2. Connection of pixels

The connectivity number $N_c$ is defined by the equation

$$N_c = \sum_{k \in \{1,3,5,7\}} \bar{x}_k \left( 1 - \bar{x}_{k+1}\, \bar{x}_{k+2} \right),$$

where $\bar{x}_k = 1 - x_k$, and $k + 2 = 1$ for $k = 7$ [6]. The connectivity number takes the value 0, 1, 2, 3, or 4. In particular, a point with $N_c = 1$ is called a boundary point, and in general it is a candidate for deletion. However, a boundary point whose 8 neighbors sum to one or less is called an endpoint and is excluded from deletion. If a graphic contains no boundary points other than endpoints, it is called a complete 8 connection or a complete 8 connected core line. A complete 8 connected core line is generated by the Hilditch algorithm or by the parallel Hilditch algorithm described in the next section. The core line finally generated by our algorithm, however, is called an incomplete 8 connection. This is defined by the intersection number

$$d = \sum_{k=1}^{8} \left| x_{k+1} - x_k \right|,$$

where $x_9 = x_1$. $d$ expresses the number of changes between 0 and 1 in the pattern of the 8 neighbors. Since $d$ takes the even values 2, 4, 6, and 8, $d' = d/2$ is used instead of $d$. If every pixel of the core line satisfies ($d' \neq 1$) or ($d' = 1$ and $\sum x_k \le 1$), the core line is said to be an incomplete 8 connection.

3. Parallel Hilditch algorithm
A thinning algorithm consists of a repeated process that shaves the boundary and a stop test that decides whether a core line has been generated. The repeated shaving process is roughly divided into a sequential model and a parallel model, depending on the timing of deletion. For the sequential model, the raster-scan type, which scans the pixels one line at a time from the upper left toward the lower right, is common. Although the algorithm is simple, it suffers from an asymmetry that depends on the scanning direction. A parallel model processes all pixels simultaneously and has no asymmetry problem, but it needs a device that keeps a line of width 2 from being erased. For this reason, each repetition is often divided further into several cycles.
The classical Hilditch algorithm is a fundamental sequential algorithm that uses the connectivity number [2]. In this paper, in order to introduce the masks described later, the Hilditch algorithm is converted to a parallel type, and this forms the kernel of our algorithm. The operation consists of a preprocessing stage and a scan stage. In the preprocessing stage, boundary points with connectivity number $N_c = 1$ are searched for. The scan stage is divided into 4 cycles, as illustrated in Figure 3. In each cycle only those pixels whose neighbor $x_c$ is 0 are checked, and a pixel is removed if the following two conditions are satisfied:

(1) $\sum_{k=1}^{8} x_k > 1$ : Not end point,
(2) $N_c = 1$ : Boundary point.

The algorithm stops when there is no point left to delete. The flow of the parallel Hilditch algorithm is shown in Figure 4, where point 2 denotes a candidate point for deletion and point 3 a deletable point.

Figure 3. Four cycles (cycle 1, cycle 2, cycle 3, cycle 4)

Figure 4. Parallel Hilditch algorithm
Input of binary pattern
→ all point 1 → point 2
→ Boundary points except edge points among 2 → point 3
→ Deletion of point 3 by 4 cycles
→ Satisfy stop condition? (N: repeat; Y: end)
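The definitions of Section 2 and the four-cycle kernel of Section 3 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation. All function names are hypothetical, and several details are assumptions that the text does not fix: image rows grow downward (so that x1 is the pixel below the objective point in Figure 1), pixels outside the image count as 0, the four cycles test the 4 neighbors in the order x5, x1, x3, x7, and within a cycle all decisions are made on the image as it was at the start of that cycle.

```python
# Neighbour offsets (row, col) for x1..x8, following Figure 1:
#   4 5 6
#   3 0 7
#   2 1 8
# Assumption: rows grow downward, so x1 is the pixel below the centre.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]


def neighbours(img, r, c):
    """Return [x1, ..., x8] around (r, c); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    return [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in OFFSETS]


def connectivity_number(x):
    """N_c = sum over k in {1,3,5,7} of xbar_k * (1 - xbar_{k+1} * xbar_{k+2})."""
    xb = [1 - v for v in x]  # xbar_k = 1 - x_k, stored 0-based
    return sum(xb[k] * (1 - xb[k + 1] * xb[(k + 2) % 8]) for k in (0, 2, 4, 6))


def intersection_number(x):
    """d = sum over k of |x_{k+1} - x_k|, with x_9 = x_1."""
    return sum(abs(x[(k + 1) % 8] - x[k]) for k in range(8))


def is_incomplete_8_connected(img):
    """Check (d' != 1) or (d' = 1 and sum x_k <= 1) for every pixel with value 1."""
    for r in range(len(img)):
        for c in range(len(img[0])):
            if img[r][c] == 1:
                x = neighbours(img, r, c)
                if intersection_number(x) // 2 == 1 and sum(x) > 1:
                    return False
    return True


def parallel_hilditch(img, cycle_order=(5, 1, 3, 7)):
    """Thin a 0/1 image given as a list of lists; returns a new image.

    cycle_order is an assumed order of the four cycles (indices of x_c).
    """
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        for c_idx in cycle_order:
            doomed = []
            for r in range(len(img)):
                for c in range(len(img[0])):
                    if img[r][c] != 1:
                        continue
                    x = neighbours(img, r, c)
                    # Only pixels whose neighbour x_c is 0 are checked;
                    # condition (1): not an end point, condition (2): N_c = 1.
                    if (x[c_idx - 1] == 0 and sum(x) > 1
                            and connectivity_number(x) == 1):
                        doomed.append((r, c))
            for r, c in doomed:  # delete all marked pixels together
                img[r][c] = 0
            changed = changed or bool(doomed)
    return img


if __name__ == "__main__":
    bar = [[0] * 7, [0, 1, 1, 1, 1, 1, 0], [0, 1, 1, 1, 1, 1, 0], [0] * 7]
    for row in parallel_hilditch(bar):
        print("".join(".#"[v] for v in row))
```

On the two-pixel-wide bar in the usage example, the first cycle removes the upper row and the remaining row survives, because its interior pixels have $N_c = 2$ and its two ends are endpoints. With a different cycle order the surviving row may differ.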
4. Masks
Parallelization makes it possible to introduce two kinds of masks for processing exceptional points. Masks have already been applied in some other parallel thinning algorithms. In this study, various masks were added to the parallel Hilditch algorithm for character graphics, and their effectiveness was examined. Because the processing time of the algorithm is affected by the number and size of the masks, suitable masks need to be chosen. As a result, we judged it effective to add the following internal point masks and cross point masks. The number of masks is 48, as shown below. The notation in the masks is as follows:
●: objective point
+: point whose value is greater than or equal to one
-: point whose value is zero
△, ▲: each of the two points is +
space: arbitrary point

Figure 5. Exceptional point: (a), (b)

First, in order to protect the corners of characters such as "L" or "V", the internal point masks are introduced. The gray part of Figure 5(a) shows the candidate points for deletion found by the preprocessing; it does not contain the black point in the corner. However, this point will cause a warp in the corner in successive processing. Therefore, the internal point mask of character L detects this pixel and adds it to the candidate points for deletion. The internal point masks of character L are 4 masks: Figure 6(a) and its rotations by π/2, π, and 3π/2. For the corner of character V, which has a smaller angle than L, 24 internal point masks are prepared: Figure 6(b), (c), and (d), their rotations by π/2, π, and 3π/2, and their mirror images. Here, ★ denotes an internal point of character L. The total number of internal point masks is thus 28.
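The mask counts follow from symmetry: a mask that coincides with one of its own mirror images has only 4 distinct rotations, while a fully asymmetric mask yields 8 variants (4 rotations times 2 mirror images), so three such masks give 24. The Python sketch below illustrates how the variants could be generated and matched. The two 3 x 3 masks in it are hypothetical examples, not the masks of Figure 6, the objective point ● is written as "o", the △ and ▲ symbols are not handled, and pixels outside the image are assumed to be 0.

```python
def rotate90(mask):
    """Rotate a square mask (tuple of equal-length strings) by pi/2."""
    return tuple("".join(col) for col in zip(*mask[::-1]))


def mirror(mask):
    """Left-right mirror image of a mask."""
    return tuple(row[::-1] for row in mask)


def variants(mask, with_mirror=True):
    """Distinct masks obtained by rotations of 0, pi/2, pi, 3pi/2 (and mirrors)."""
    out = []
    m = tuple(mask)
    for _ in range(4):
        for cand in ((m, mirror(m)) if with_mirror else (m,)):
            if cand not in out:
                out.append(cand)
        m = rotate90(m)
    return out


def matches(mask, img, r, c):
    """True if the mask, centred on pixel (r, c), fits the 0/1 image.

    '+' needs a value >= 1, '-' needs 0, ' ' accepts anything,
    and 'o' (the objective point) needs a value >= 1.
    """
    half = len(mask) // 2
    h, w = len(img), len(img[0])
    for i, row in enumerate(mask):
        for j, ch in enumerate(row):
            rr, cc = r + i - half, c + j - half
            v = img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0
            if ch in "+o" and v < 1:
                return False
            if ch == "-" and v != 0:
                return False
    return True


# Hypothetical masks for illustration only (not taken from the paper).
SYMMETRIC = ("-+ ",
             "-o+",
             "---")   # coincides with a mirror image of itself: 4 variants
ASYMMETRIC = ("-+ ",
              "-o-",
              "---")  # no such symmetry: 8 variants

if __name__ == "__main__":
    print(len(variants(SYMMETRIC)), len(variants(ASYMMETRIC)))
    corner = [[0, 1, 0],
              [0, 1, 1],
              [0, 0, 0]]
    print(any(matches(m, corner, 1, 1) for m in variants(SYMMETRIC)))
```

Generating the variants once and reusing them keeps the per-pixel cost proportional to the number of distinct masks, which is the quantity the text identifies as affecting processing time.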
Figure 6. Internal point mask: (a) char. L, (b) char. V(1), (c) char. V(2), (d) char. V(3)

Second, we introduce the cross point masks, which are used in the final step of producing a core line. Figure 5(b) illustrates a crossing point at the center of a "T" figure. This point would be removed because $N_c = 1$, but visually it may be preferable to keep it. The cross point mask therefore marks such a point as a non-deletable point. There are two kinds of masks, for character L and for character T. The cross point masks of character L are 8 masks, consisting of Figure 7(a) and its rotations by π/2, π, and 3π/2. For the intersection of character T, 12 cross point masks are prepared: 4 masks consisting of Figure 7(b) and its rotations by π/2, π, and 3π/2, and 8 masks consisting of Figure 7(c), its rotations by π/2, π, and 3π/2, and their mirror images. The total number of cross point masks is therefore 20.
The classical Hilditch algorithm and our parallel Hilditch algorithm described in Section 3 produce a complete 8 connected core line, but once masks are introduced, an incomplete 8 connected core line is produced. The flow of our parallel Hilditch algorithm with masks is shown in Figure 8. This is an extension of Figure 4, in which point 1 denotes a permanently fixed point.

Figure 7. Cross point mask: (a) char. L, (b) char. T(1), (c) char. T(2)

5. Evaluation
To judge the performance of our algorithm, we compared it with other major algorithms. The algorithms, character font, evaluation items, and evaluation methods of the experiment are described below.
① Algorithms: Four existing algorithms, Hilditch [2] (abbreviated HD), Deutsch [3] (DA), Tamura [4] (TA), and Tsuruoka [5] (TS), together with our Parallel Hilditch kernel (PH) and Parallel Hilditch with masks (PHM), for a total of 6 algorithms. The parallel Hilditch algorithm is included as a reference in order to clarify the effectiveness of the masks.
Figure 8. Parallel Hilditch algorithm with mask
Input of binary pattern
→ all point 1 → point 2
→ Boundary points except edge points among 2 → point 3
→ Points searched by internal point mask among 2 → point 3
→ Points searched by cross point mask among 2 or 3 → point 1
→ Deletion of point 3 by 4 cycles
→ Satisfy stop condition? (N: repeat; Y: end)

② Character font: MS Gothic font in 3 sizes (18 points, 36 points, and 72 points). Each size consists of 144 characters: the alphabet (uppercase and lowercase), katakana, and hiragana. The total number of characters is therefore 432.
③ Evaluation items: Seven quality evaluation items [4] for the core line are used: deviation from the central position, shrinking of segments, appearance of beards, warp in the intersection part of character L, of character T, and of character +, and others.
④ Evaluation methods: It is impossible to evaluate the quality of core lines quantitatively. We therefore estimate their quality by visual sensory evaluation on a 5-level scale: 1 (good), 2 (fairly good), 3 (average), 4 (fairly bad), and 5 (bad). For the 432 characters and 6 algorithms, the same evaluators perform the evaluation twice in order to investigate the fluctuation of the subjective judgment.
A system that handles bitmap data of a character font was built on Windows, and the thinning experiment was performed with it. Figure 9 shows the results of thinning "ア" with the 6 algorithms. In this example, shrinking is seen for TA, beards for HD, DA, and TS, and a warp at the intersection for PH.

Figure 9. Results produced through thinning. Upper: HD, DA, TA. Lower: TS, PH, PHM.

The results of the experiment are shown in Table 1. The unit of processing time is msec. There is little difference between the results of the two experiments.
The main results are:
(a) The Hilditch algorithm tends to produce beards, and the parallel Hilditch algorithm tends to produce a warp in the intersection part.
(b) Skeletons produced by the parallel Hilditch algorithm with masks showed the best quality for most of the items mentioned above, although the processing time increased slightly.
These results show the effectiveness of this algorithm, which corrects beards through parallelization and corrects the warp in the intersection part through masks. The processing time increases slightly because the masks are extended to 5 x 5, so time is spent on pattern matching between a mask and a part of the letter graphic.
Table 1. Evaluations for thinning algorithms

method  property                   average evaluation  frequency (bad)  processing time (msec)
HD      beard                      2.530               13               0.11
DA      multiple in crossing point 2.370               21               0.11
TA      shrinking                  2.640               28               0.17
TS      many beards                3.167               76               0.17
PH      warp in crossing point     2.465               13               0.17
PHM     increase processing time   1.954               9                0.39
6. Conclusion
We have studied a thinning algorithm with the aim of producing visually excellent thin letters. In particular, various masks were introduced and their effects were examined. As a result, we were able to settle on a small number of masks that can be expected to work well in combination. Moreover, the core line obtained with masks satisfies the property of incomplete 8 connection, which indicates a visually excellent core line. Our algorithm may be applicable not only to letter graphics but also to general binary image patterns. However, for the thinning of two lines crossing at an angle, as in the letter "X", the handling of the intersection part still leaves room for improvement.
References
[1] A. Rosenfeld, "Connectivity in digital pictures", J. ACM, 17, 1, pp.146-160 (1970).
[2] C. J. Hilditch, "Linear skeletons from square cupboards", Machine Intelligence 4, (Edinburgh Univ. Press), pp.403-420 (1969).
[3] E. S. Deutsch, "Thinning algorithms on rectangular, hexagonal, and triangular arrays", C. ACM, vol.15, no.9, pp.827-837 (1972).
[4] H. Tamura, "A comparison of line thinning algorithms from digital geometry viewpoint", Proceedings of 4th Int. Joint Conf. on Pattern Recognition, pp.715-719 (1978).
[5] S. Tsuruoka, F. Kimura, M. Yoshimura, Y. Miyake, "A thinning algorithm for digital binary pictures", PRL 78-47, pp.41-49 (1978).
[6] S. Yokoi, J. Toriwaki, T. Fukumura, "Topological properties in digitized binary pictures", ibid., 56-D, 11, pp.662-669 (1973).