A Technique for Fuzzy Document Binarization

Nikos Papamarkos
Department of Electrical & Computer Engineering
Democritus University of Thrace, 67100 Xanthi, Greece
email: [email protected]

Abstract
This paper proposes a new method for the fuzzy binarization of digital documents. The proposed approach achieves binarization using both the image gray-levels and additional local spatial features. Both the gray-level and the local feature values feed a Kohonen Self-Organized Feature Map (SOFM) neural network classifier. After training, the neurons of the output competition layer of the SOFM define two bilevel classes. Using the content of these classes, fuzzy membership functions are obtained, which are then used with the Fuzzy C-means (FCM) algorithm in order to reduce the character-blurring problem. The method is suitable for the binarization of blurred documents and can easily be modified to accommodate any type of spatial characteristics.

Keywords
Binarization, Thresholding, Self-Organized Neural Networks, Fuzzy Logic.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DocEng'01, November 9-10, 2001, Atlanta, Georgia, USA. Copyright 2001 ACM 1-58113-432-0/01/0011...$5.00.

1. INTRODUCTION
A document image contains text, symbols and graphics. In many practical applications, we only need to recognize the text content of the document in binary form. It is therefore sufficient to convert the document to a binary format, which is more efficient to transmit and process than the original gray-scale image. For many years, document binarization was based on the standard bilevel techniques, also called global thresholding algorithms [4,5,6,7,8,9]. These statistical methods are suitable for converting a gray-scale image to a binary one, but not for complex documents, and even less so for degraded documents. For these special cases it is important to take into account the nature of document images. Based on this assumption, specialized binarization techniques have been developed for document images. Strouthopoulos and Papamarkos [18] considered the problem of gray-level reduction of complex documents by using a combination of a page layout analysis technique [10] and a Kohonen SOFM neural network [12,13]. This technique is suitable for clean documents but does not work well on degraded documents. Other document-directed binarization techniques have been developed that focus on the binarization of degraded documents [14,15,16,17]. In [15], O'Gorman proposes an algorithm based on local connectivity information. Other techniques are based on gradient and edge information [16,19], stroke analysis [20] and adaptive logical techniques [14,17]. A good analysis of these techniques, with comparative results, is given in [14]. The above techniques give good results in many cases. However, they are complex and require extensive analysis of the document structure, and for these reasons they are time consuming. In many applications, such as fax transmission and document retrieval systems, we mainly want the text to be as readable as possible.

The proposed technique for the binarization of mixed-type documents takes advantage of both gray-scale and spatial information. Specifically, the binarization technique is based on a hybrid neuro-fuzzy system [21], which consists of a Kohonen SOFM and an FCM classifier [13,22]. According to the proposed technique, the gray-level value of each pixel is related to suitable local spatial features extracted from its neighboring region. Thus, the one-dimensional histogram clustering approach of the multithresholding techniques is converted to a multi-dimensional feature clustering technique. The gray-level value of each pixel is considered as the first feature. The feature set is completed by additional spatial features, which are extracted from neighboring pixels. These features are associated with spatial image characteristics that emphasize the shapes of the characters. After training, the two output neurons of the competition layer of the SOFM define the two dominant classes. Using the content of these classes, fuzzy membership functions are then automatically obtained and used in the FCM classifier. The purpose of the FCM classifier is to reduce character blurring [22]. In order to reduce the storage requirements and the computation time, the training set can be a representative sample of the image pixels. In our approach, the sub-sampling is performed via a fractal scanning technique based on the Hilbert space-filling curve [1].

The proposed method was tested with a variety of documents. In addition, this paper presents characteristic examples for various document types. The experimental and comparative results confirm the effectiveness of the proposed method.
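The Hilbert-curve sub-sampling mentioned in the introduction can be illustrated with a minimal sketch. The paper does not give its scanning code, so the function names `hilbert_d2xy` and `hilbert_sample`, the `step` parameter, and the handling of non-square images (curve points that fall outside the image are skipped) are all assumptions made for illustration. The index-to-coordinate mapping is the standard iterative Hilbert construction.

```python
import numpy as np

def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_sample(image, step=4):
    """Return (row, col) indices of every `step`-th pixel met along the curve.

    Assumption: the curve covers the smallest power-of-two square containing
    the image, and curve points outside the image are skipped.
    """
    h, w = image.shape
    order = max(1, (max(h, w) - 1).bit_length())
    coords = []
    visited = 0
    for d in range(4 ** order):
        x, y = hilbert_d2xy(order, d)
        if y < h and x < w:
            if visited % step == 0:
                coords.append((y, x))
            visited += 1
    return np.array(coords)
```

Because consecutive curve positions are always adjacent pixels, taking every `step`-th position spreads the training samples over the whole page while keeping neighboring samples spatially close.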

2. DESCRIPTION OF THE METHOD
A digital gray-scale document I(i, j), i = 1, ..., n, j = 1, ..., m, can be considered as an image consisting of n x m pixels, where each pixel is a point in the gray-scale space. Usually, the total number of gray-levels is restricted to 256, i.e., I(i, j) ∈ [0, 255]. The document gray-scale reduction problem can be considered as the problem of best transforming the original gray-level image into a new one with only J gray-levels, such that the final image not only approximates the principal gray-level values but also preserves the principal information of the document, which is mainly associated with the text. A special case of gray-scale reduction is the binarization process, where the final image has only two gray-scale values, usually 0 for foreground and 255 for background pixels. An effective approach is to treat this as a clustering problem and to solve it using a suitable self-organized neural network and the FCM algorithm. The proposed technique, which can be used for the binarization of degraded documents, is analyzed below step by step.

2.1. Document Binarization
The proposed document binarization method consists of the following steps:
(a) Document binarization using a Kohonen SOFM neural network with suitable spatial features.
(b) Fuzzy membership extraction.
(c) Binarization using the FCM algorithm.
The complete analysis of the above stages follows.

2.1.1 Initial Document Binarization
In order to take advantage of the document texture, we must consider not only the image gray-scale values but also suitable spatial features, because most of the information in a document is associated with the character forms. For this reason, local characteristics must be extracted and used in the binarization process. If these local characteristics are considered as local spatial features, then binarization can be defined as a procedure that produces a binary document by taking into account suitable spatial characteristics in addition to the gray-scale values. The problem can now be viewed as a clustering problem, and its solution can be obtained by using an appropriate classifier. If k spatial features are used, then we have a clustering problem with k+1 features. The Kohonen SOFM used in this stage is shown in Fig. 1. The k+1 features feed the neural network, and the output consists of only two neurons. The spatial features must be selected so that the document text areas are enhanced. From our experience, we have found that edge extraction masks in combination with local mean or min values provide a good combination for the spatial feature set. In particular, the Laplacian feature values obtained by the 3x3 mask

$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix} \qquad (1)$$

together with the min value in the 3x3 neighboring region of the sampled pixels, form a suitable set of spatial features for document binarization. To speed up the entire process and to reduce memory requirements, a fractal scanning sub-sampling technique based on the Hilbert curve is adopted [1] (Fig. 2). The important feature of the Hilbert curve is that it scans neighboring entries of the image continuously [2].

Figure 1. The Kohonen SOFM classifier.

Fig. 3(a) shows the original badly illuminated document. Fig. 3(b) depicts the binary image obtained by the method of Otsu [6]. Fig. 3(c) shows the binary image obtained by the method of Papamarkos and Atsalakis [3]. As can be observed, both techniques produce a dark area in the left part of the document. Finally, Fig. 3(d) shows the binarization results obtained in the first stage of the proposed algorithm, where the Laplacian and min additional spatial features are applied. As can be observed, the binary document obtained is better and does not have any dark area.

Figure 3.
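The first stage (gray level, Laplacian and 3x3 minimum features feeding a two-neuron SOFM) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: with only two output neurons the sketch uses plain winner-take-all updates without a neighborhood function, the features are divided by their standard deviation to equalise their ranges, the learning rate decays linearly from a hypothetical `lr0`, and a simple stride over the flattened image stands in for the Hilbert scan. All function names are hypothetical.

```python
import numpy as np

def neighborhoods(img):
    """Stack the nine 3x3-shifted copies of img (edge-replicated) along axis 0."""
    p = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    return np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])

def feature_vectors(img):
    """Per-pixel features: gray level, response of mask (1), and 3x3 minimum."""
    nb = neighborhoods(img)
    gray = img.astype(float)
    laplacian = nb.sum(axis=0) - 9.0 * gray   # eight neighbours minus 8 * centre
    local_min = nb.min(axis=0)
    return np.stack([gray, laplacian, local_min], axis=-1).reshape(-1, 3)

def train_two_neuron_sofm(samples, epochs=20, lr0=0.1, seed=0):
    """Winner-take-all competitive learning with two output neurons (assumption)."""
    rng = np.random.default_rng(seed)
    # Assumption: start from the darkest and the brightest training sample.
    w = np.stack([samples[samples[:, 0].argmin()],
                  samples[samples[:, 0].argmax()]]).astype(float)
    total = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(samples)):
            lr = lr0 * (1.0 - step / total)   # linearly decaying learning rate
            winner = np.argmin(((w - samples[idx]) ** 2).sum(axis=1))
            w[winner] += lr * (samples[idx] - w[winner])
            step += 1
    return w

def initial_binarization(img, sample_step=4):
    """Return the binary image (0 = foreground, 255 = background) and class centres."""
    f = feature_vectors(img)
    scale = f.std(axis=0) + 1e-9              # assumption: equalise feature ranges
    f = f / scale
    w = train_two_neuron_sofm(f[::sample_step])   # stride stands in for Hilbert scan
    d = ((f[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1).reshape(img.shape)
    dark = int(np.argmin(w[:, 0]))            # neuron with the lower gray weight
    binary = np.where(labels == dark, 0, 255).astype(np.uint8)
    return binary, w * scale
```

The returned class centres are expressed in the original feature units, so they can seed the fuzzy stages that follow.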

2.1.2 Fuzzy membership extraction
To improve the binarization results by reducing the blurring effect, we propose a new fuzzy procedure that can classify fuzzy pixels into foreground or background pixels. This technique requires a fuzzy membership extraction procedure that determines the membership functions of the two classes defined by the two output neurons of the SOFM. Referring to the above example, each feature vector, which consists of three independent features, is classified into one of the two output classes. Let $d_{ij}$, i = 1, ..., n x m, j = 1, 2, be the Euclidean distances of the feature vectors from the centers of the two classes. The membership functions for the two classes are obtained through the following relations

[...]

$$u_{2j} = \frac{1}{1 + |d_{j2} - \ldots| / \ldots}, \quad j = 1, \ldots, n \qquad (3)$$

2.1.3 Final Binarization Via the FCM Algorithm
After the determination of the membership functions, the clustering problem is converted into a fuzzy clustering problem. Its solution can be obtained by using the well-known FCM algorithm [22], with the membership values obtained by the method described above used as initial values. The FCM used in our approach is described as follows.

Let $f_1, f_2, \ldots, f_n$ be the n feature vectors, $V = \{v_1, v_2\}$ the cluster centers, and $U = \{u_{ij}\}$ the 2 x n matrix of the membership values, which satisfy the conditions

$$0 \le u_{ij} \le 1, \quad i = 1, 2 \ \text{and} \ j = 1, \ldots, n, \qquad \sum_{i=1}^{2} u_{ij} = 1 \qquad (5)$$

[...] The iterative procedure stops when there is no significant change in the objective function.

The FCM algorithm classifies the document pixels into two classes, one for the background pixels and the other for the foreground pixels. The important effect of using the FCM is the reduction of document blurring. Fig. 4(a) depicts the document obtained after the application of the FCM algorithm. Comparing Fig. 3(d) and Fig. 4(a), it can be observed that the document of Fig. 4(a) is sharper than the document produced by the SOFM. As a final stage, some simple post-processing tasks can be applied. For example, a simple 3x3 mean filter followed by a global binarization via the SOFM can be applied (Fig. 4(b)).

Figure 4. (a) Document after the application of the FCM. (b) Final document after post-processing.
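A two-class FCM refinement of this kind could look roughly like the sketch below. Since the membership relations of Section 2.1.2 are not fully legible in this copy, the sketch substitutes the standard FCM membership formula, computed from the distances to the two SOFM class centers, for the initial memberships; this is a swapped-in technique, not the paper's own relations. The fuzzifier `m = 2`, the tolerance `tol` and the function names are assumptions.

```python
import numpy as np

def memberships_from_centers(features, centers, m=2.0):
    """Standard FCM memberships u[i, j] of feature vector j in class i."""
    d = np.linalg.norm(features[None, :, :] - centers[:, None, :], axis=2)
    d = np.maximum(d, 1e-12)                      # guard against zero distance
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)   # each column sums to 1

def fcm(features, init_centers, m=2.0, tol=1e-5, max_iter=100):
    """Two-class fuzzy C-means started from given centers (e.g. SOFM weights)."""
    u = memberships_from_centers(features, init_centers, m)
    prev_obj = np.inf
    centers = np.asarray(init_centers, dtype=float)
    for _ in range(max_iter):
        um = u ** m
        centers = (um @ features) / um.sum(axis=1, keepdims=True)
        d2 = ((features[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        obj = float((um * d2).sum())              # FCM objective function
        if abs(prev_obj - obj) < tol:             # no significant change: stop
            break
        prev_obj = obj
        u = memberships_from_centers(features, centers, m)
    return u, centers
```

A binary image would then follow from `u.argmax(axis=0)` reshaped to the image size, with the class whose center has the lower gray level mapped to 0 and the other to 255.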

3. Example
In this experiment we apply the proposed method to the very poor quality gray-scale document of Fig. 5(a). This document has many line graphics and is badly illuminated. The binarization of this document using any statistical technique leads to unacceptable results. Fig. 5(b) shows the binarization results obtained by the SOFM without using any spatial feature. Fig. 5(c) shows the results obtained by the proposed method, where the SOFM uses the Laplacian mask of equation (1). Fig. 5(d) depicts the final image obtained if we apply, as post-processing operations, the mean 3x3 mask followed by global binarization via the SOFM. The binarization results can probably be improved further, but this requires the application of a time-consuming document structure analysis technique.

Figure 5. (a) Original document, (b) binarization using the SOFM, (c) binarization using the proposed technique and (d) document after post-processing.

4. Conclusions
This paper proposes a fuzzy technique for the binarization of badly illuminated documents. The method is based on a hybrid neuro-fuzzy classifier that consists of a Kohonen SOFM and an FCM. The feature vector consists of the image gray-scale values and additional spatial features that emphasize the text components. The proposed technique is simple and unsupervised, and in most cases leads to desirable binarization results. In the case of very poor quality documents, however, the proposed technique can be used as a pre-processing procedure that helps another structural binarization technique to achieve better binarization results.

5. References
[1] H. Sagan, Space-Filling Curves, Springer-Verlag, New York, 1994.
[2] K.L. Chung, Y.H. Tsai and F.C. Hu, “Space-filling approach for fast window query on compressed images”, IEEE Trans. on Image Processing, vol. 9, no. 12, pp. 2109-2116, 2000.
[3] N. Papamarkos and A. Atsalakis, “Gray-level reduction using local spatial features”, Computer Vision and Image Understanding, CVIU-78, pp. 336-350, 2000.
[4] J. Kittler and J. Illingworth, “Minimum error thresholding”, Pattern Recognition, vol. 19, pp. 41-47, 1986.
[5] S. S. Reddi, S. F. Rudin and H. R. Keshavan, “An optimal multiple threshold scheme for image segmentation”, IEEE Trans. on Systems, Man and Cybernetics, vol. 14, no. 4, pp. 661-665, 1984.
[6] N. Otsu, “A threshold selection method from gray-level histograms”, IEEE Trans. on Systems, Man and Cybernetics, vol. 9, no. 1, pp. 62-69, 1979.
[7] J. N. Kapur, P. K. Sahoo and A. K. Wong, “A new method for gray-level picture thresholding using the entropy of the histogram”, Computer Vision, Graphics and Image Processing, vol. 29, pp. 273-285, 1985.
[8] N. Papamarkos and B. Gatos, “A new approach for multithreshold selection”, Computer Vision, Graphics and Image Processing-Graphical Models and Image Processing, vol. 56, no. 5, pp. 357-370, 1994.
[9] P.K. Sahoo, S. Soltani, and A. K. C. Wong, “A survey of thresholding techniques”, Computer Vision, Graphics and Image Processing, vol. 41, pp. 233-260, 1988.
[10] C. Strouthopoulos and N. Papamarkos, “Text identification for image analysis using a neural network”, Image and Vision Computing-Special Issue on Image Processing and Multimedia Environments, vol. 16, pp. 879-896, 1998.
[11] N. Papamarkos, C. Strouthopoulos and I. Andreadis, “Multithresholding of color and gray-level images through a neural network technique”, Image and Vision Computing, vol. 18, pp. 213-222, 2000.
[12] T. Kohonen, Self-Organizing Maps, Springer-Verlag, New York, 1997.
[13] S. Haykin, Neural Networks: A Comprehensive Foundation, MacMillan College Publishing Company, New York, 1994.
[14] Y. Yang and H. Yan, “An adaptive logical method for binarization of degraded document images”, Pattern Recognition, vol. 33, no. 5, pp. 787-807, 2000.
[15] L. O’Gorman, “Binarization and multithresholding of document images using connectivity”, CVGIP: Graphical Models and Image Processing, vol. 56, no. 6, pp. 494-506, 1994.
[16] J.R. Parker, “Gray level thresholding in badly illuminated images”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 8, pp. 813-819, 1991.
[17] J. Sauvola and M. Pietikäinen, “Adaptive document image binarization”, Pattern Recognition, vol. 33, no. 2, pp. 225-236, 2000.
[18] C. Strouthopoulos and N. Papamarkos, “Multithresholding of mixed type documents”, Engineering Applications of Artificial Intelligence, vol. 13, no. 3, pp. 323-343, 2000.
[19] O.D. Trier and T. Taxt, “Improvement of ‘integrated function algorithm’ for binarization of document images”, Pattern Recognition Letters, vol. 16, no. 3, pp. 277-283, 1995.
[20] Y. Liu and S.N. Srihari, “Document image binarization based on texture features”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 5, pp. 540-544, 1997.
[21] D. Nauck, F. Klawonn and R. Kruse, Neuro-Fuzzy Systems, John Wiley & Sons, 1997.
[22] Z. Chi, H. Yan and T. Pham, Fuzzy Algorithms: With Applications to Image Processing and Pattern Recognition, World Scientific, 1996.
[23] L.K. Huang and M.J. Wang, “Image thresholding by minimizing the measure of fuzziness”, Pattern Recognition, vol. 28, pp. 41-51, 1995.