Image Enhancement of Historical Documents Using Directional Wavelet

Qian Wang (1), Tao Xia (2), Chew Lim Tan (1), Lida Li (1)

(1) School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543
[email protected], [email protected], [email protected]

(2) Centre for Wavelets, Approximation and Information Processing, Department of Mathematics, National University of Singapore, 2 Science Drive 2, Singapore 117543
[email protected]

Abstract - This paper proposes a novel algorithm to clean up a large collection of historical handwritten documents kept in the National Archives of Singapore. Due to the seepage of ink over long periods of storage, the front page of each document has been severely marred by the writing on the reverse side. Earlier attempts matched the two sides of a page to identify the offending strokes originating from the back so as to eliminate them with the aid of a wavelet transform. Perfect matching, however, is difficult because of document skew, differing resolutions, reverse sides inadvertently skipped during image capture, and warped pages. A new approach is now proposed that does away with double-sided mapping by using a directional wavelet transform, which distinguishes the foreground strokes from the reverse-side strokes much better than the conventional wavelet transform does. Experiments have shown that the method significantly enhances the readability of each document after the directional wavelet operation, without the need for mapping with the reverse side.

Keywords: Document image analysis, directional wavelet transform, thresholding
AMS Subject Classification: 65T60, 68U10

1. Introduction

The National Archives of Singapore keeps a large collection of double-sided handwritten historical documents. Due to the seepage of ink over long periods of storage, the front page of each document has been severely marred by the reverse-side writing. As the original copies of these historical documents are carefully preserved and not available for public reading, photocopying of these documents for public access makes them even more difficult to read (see Figure 1). There is thus a need to enhance these images by removing the interfering strokes.

When we were first approached by the National Archives of Singapore, we looked for existing methods to deal with this problem. In the process, we found several automatic thresholding algorithms by Negishi et al. [1] for extracting character bodies from noisy backgrounds. These algorithms deal with terribly dirty and considerably large images, and with cases where the gray levels of the character parts overlap with those of the background. We later found Liang and Ahmadi's morphological approach [2] to extracting text strings from regular periodic overlapping text/background images. Further searching turned up two binarization algorithms, by Chang et al. [3] and by Jang and Hong [4], using histogram-based edge detection and thin-line modeling, respectively. Finally, Don's method [5] utilizes noise attribute features based on a simple noise model to overcome the difficulty that some objects do not form prominent peaks in the histogram.

Figure 1. Two sample historical document images: (a) and (b).

Appealing as they are, the above methods cannot be applied directly to our problem, mainly because the interfering strokes appear in varying intensities relative to the original foreground strokes in different documents. In certain cases, the edges of the foreground strokes are more prominent than the interfering strokes; image (a) in Figure 1 presents such a case. On the other hand, the interfering strokes sometimes look even darker than the foreground strokes, as we can observe in image (b) of Figure 1. An entirely different approach therefore has to be adopted for these historical documents.

In our earlier work [6][7], we introduced methods for matching both sides of a page to identify the offending strokes originating from the back so as to eliminate such strokes from the front. These methods are based on the observation that an interfering stroke cannot be stronger than its originating stroke, because not all of the ink seeps through the page. In [6], a point-pattern-matching algorithm retrieves the correspondence between the two sides. Once the matching pixels are found, the intensity difference determines which pixel originates from the reverse side and should be removed. The adopted algorithm is tolerant to varying transformations due to image distortion. On the other hand, as the matching proceeds pixel by pixel, the method is not efficient enough for real-time applications. We further proposed a wavelet transform [7] to strengthen the foreground strokes and weaken the interfering strokes. That method first manually matches both sides of a document page; the foreground and interfering strokes are then identified roughly, which guides the revision of the wavelet coefficients during the wavelet transform. The wavelet transform is performed iteratively to improve the readability of the document image step by step.

Perfect mapping of strokes from both sides, however, is difficult for reasons such as (1) different document skews and resolutions during image capture of the two sides, (2) a reverse page being inadvertently skipped during scanning, and (3) warped surfaces caused by placing the thick bound volume of documents on the scanner's glass plate. It is with these considerations that another approach, without the need for the reverse page, is proposed in this paper. It is observed that the writing style of these documents slants from lower-left to upper-right. In contrast, by the mirror-image effect, the interfering strokes originating from the reverse side slant from upper-left to lower-right. We would therefore like to take advantage of this distinguishing feature to separate the foreground strokes from the interfering strokes using techniques such as wavelets. However, the conventional wavelet transform for images highlights and separates horizontal and vertical edges into different wavelet frequency bands [8] and is therefore not suitable for our application. To exploit the directional property of the strokes in these documents, we develop a directional wavelet transform for 2-D images that separates the foreground strokes and the interference mainly into different wavelet frequency bands. A theoretical analysis shows that it can be implemented by orientation-filtering operations with conventional filters. In Section 2, we elaborate the directional wavelet based method. Section 3 presents the experimental results of the proposed system, followed by the conclusion in Section 4.

2. Proposed Method

The writing style of the document determines the differences between the orientations of the foreground and the interfering strokes: the foreground and interfering strokes slant along the 45° and 135° directions, respectively. A two-dimensional wavelet transform is known to extract the spatial information contained in an image, more precisely its horizontal, vertical and diagonal components. In our application, however, we need to differentiate the 45° and 135° components. One way is to rotate the document image 45° clockwise, so that the foreground strokes become horizontal and the interfering strokes become parallel to the vertical axis. But this approach introduces additional storage and computational cost, and the rotation operation itself distorts the image. A more straightforward method is to use a directional wavelet transform in which the wavelet filters are convolved along the 45° and 135° directions instead of horizontally and vertically, so that the foreground strokes are captured in one component while the interfering strokes contribute to the other (a minimal sketch of such diagonal filtering follows).
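To make the diagonal filtering step concrete, the sketch below convolves a 1-D filter along the two diagonal directions of an image grid without downsampling. It is a minimal illustration of the idea, not the authors' implementation; the circular boundary handling and the helper name `diagonal_convolve` are our own assumptions.

```python
import numpy as np

def diagonal_convolve(img, filt, direction):
    """Correlate a 1-D filter along image diagonals (no downsampling).

    direction=+1 runs the filter along the 45-degree diagonal
    (lower-left to upper-right); direction=-1 runs it along the
    135-degree diagonal. Wrap-around borders are assumed for brevity.
    """
    out = np.zeros(img.shape, dtype=float)
    offsets = np.arange(len(filt)) - len(filt) // 2
    for tap, o in zip(filt, offsets):
        # Moving o steps along the diagonal changes the column by +o and
        # the row by -o (45 degrees) or +o (135 degrees).
        out += tap * np.roll(img, shift=(o * direction, -o), axis=(0, 1))
    return out
```

Applying the low-pass filter along one diagonal and the high-pass filter along the other then plays the role of the detail components that separate the two stroke families, as formalized in Section 2.1.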


2.1. Directional Wavelet Transform

Let $f(x,y)$ denote a 2-D continuous signal. The directional wavelet transform of $f(x,y)$ is defined as

$$
\begin{aligned}
(C_j f)(m,n) &= \langle f(x,y),\ \Phi_{j,m,n}(x,y) \rangle, \\
(D^k_j f)(m,n) &= \langle f(x,y),\ \Psi^k_{j,m,n}(x,y) \rangle, \quad k = 1,2,3,
\end{aligned}
\qquad (m,n) \in \mathbb{Z}^2, \tag{1}
$$

where $j$ is the scale index, $(C_j f)(m,n)$ are the wavelet approximation coefficients of $f(x,y)$ at scale $2^j$, and $(D^k_j f)(m,n)$ $(k = 1,2,3)$ are the corresponding wavelet detail coefficients. For simplicity, we write $C_j, D^1_j, D^2_j, D^3_j$ for $C_j f, D^1_j f, D^2_j f, D^3_j f$.

In the conventional wavelet transform, the 2-D scaling and wavelet functions are constructed as tensor products of 1-D scaling and wavelet functions in two separable variables [8]. The proposed 2-D directional wavelet transform for images is instead constructed as

$$
\begin{aligned}
\Phi_{j,m,n}(x,y) &= \varphi_{j,m}(x_1)\,\varphi_{j,n}(y_1), \\
\Psi^1_{j,m,n}(x,y) &= \psi_{j,m}(x_1)\,\varphi_{j,n}(y_1), \\
\Psi^2_{j,m,n}(x,y) &= \varphi_{j,m}(x_1)\,\psi_{j,n}(y_1), \\
\Psi^3_{j,m,n}(x,y) &= \psi_{j,m}(x_1)\,\psi_{j,n}(y_1),
\end{aligned} \tag{2}
$$

where $\varphi_{j,m}(x) = \frac{1}{\sqrt{2^j}}\,\varphi\!\left(\frac{x}{2^j} - m\right)$, $\psi_{j,m}(x) = \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{x}{2^j} - m\right)$, $\varphi, \psi$ are conventional 1-D scaling and wavelet functions, and

$$
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix}, \qquad
A = c \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.
$$

In this paper we use the parameters $c = \sqrt{2}$, $\theta = \pi/4$ for the document image analysis application.
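For concreteness, substituting these parameter values into $A$ (a direct instantiation of (2), matching the matrix used in the appendix) gives

$$
A = \sqrt{2}\begin{pmatrix} \cos\frac{\pi}{4} & \sin\frac{\pi}{4} \\ -\sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \qquad x_1 = x + y, \quad y_1 = -x + y,
$$

so the tensor-product filters in (2) act along the two diagonals of the image rather than along its rows and columns.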


It is well known that the wavelet transform is translation variant. Translation invariance, however, is desired in some applications, such as signal denoising and our document image analysis. A translation-invariant wavelet transform has been proposed for signal applications [9]; it is equivalent to the wavelet transform without downsampling, in other words a frame in some sense. The following theorem gives the translation-invariant directional wavelet transform.

Theorem: With the orthogonal/biorthogonal 2-D directional wavelets defined in (1) and (2), the directional wavelet decomposition and reconstruction are computed as in (3) and (4), where $C_0(m,n)$ is the approximation coefficient at scale 0; inserting $2^j - 1$ zeros between the samples of a filter $k(n)$ yields the dilated filter $k_j(n)$; $\{h(n)\}_n, \{g(n)\}_n$ are the analysis filters associated with the scaling and wavelet functions, respectively; and $\{\tilde h(n)\}_n, \{\tilde g(n)\}_n$ are the synthesis filters associated with the 1-D scaling and wavelet functions. The identities $\tilde h(n) = h(n)$, $\tilde g(n) = g(n)$ hold for orthogonal wavelets but not for biorthogonal wavelets. The proof is given in the appendix.

$$
\begin{aligned}
C_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k-l,\ n+k+l)\, h_j(k)\, h_j(l), \\
D^1_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k-l,\ n+k+l)\, h_j(k)\, g_j(l), \\
D^2_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k-l,\ n+k+l)\, g_j(k)\, h_j(l), \\
D^3_{j+1}(m,n) &= \sum_l \sum_k C_j(m+k-l,\ n+k+l)\, g_j(k)\, g_j(l),
\end{aligned}
\qquad j \in \mathbb{Z},\ (m,n) \in \mathbb{Z}^2. \tag{3}
$$

$$
\begin{aligned}
C_j(m,n) = \frac{1}{4}\Big(
&\sum_l \sum_k C_{j+1}(m-k+l,\ n-k-l)\, \tilde h_j(k)\, \tilde h_j(l) \\
{}+{} &\sum_l \sum_k D^1_{j+1}(m-k+l,\ n-k-l)\, \tilde h_j(k)\, \tilde g_j(l) \\
{}+{} &\sum_l \sum_k D^2_{j+1}(m-k+l,\ n-k-l)\, \tilde g_j(k)\, \tilde h_j(l) \\
{}+{} &\sum_l \sum_k D^3_{j+1}(m-k+l,\ n-k-l)\, \tilde g_j(k)\, \tilde g_j(l)
\Big),
\qquad j \in \mathbb{Z},\ (m,n) \in \mathbb{Z}^2. \tag{4}
$$
In the application, the original image is regarded as $\{C_0(m,n)\}_{(m,n)\in\mathbb{Z}^2}$. It is obvious that (3) and (4) can be implemented by convolving conventional wavelet filters along the 45° and 135° directions. A three-level wavelet decomposition generates the wavelet and approximation coefficients

$$
Wf(m,n) = \{\,C_3(m,n),\ D^k_j(m,n),\ j = 1,2,3;\ k = 1,2,3\,\}. \tag{5}
$$

The results of the first decomposition level are shown in Figure 2.

Figure 2. Directional wavelet decomposition results: (a1)(b1) $C_1(m,n)$; (a2)(b2) $D^1_1(m,n)$; (a3)(b3) $D^2_1(m,n)$; (a4)(b4) $D^3_1(m,n)$.

As we can observe from the images above, the foreground and background strokes with diagonal orientations are distinct and highlighted in the images of $D^1_j(m,n)$ and $D^2_j(m,n)$ respectively, so the directional wavelet transform output is suitable for further image processing operations on the document image [7]. An alternative is to enhance $D^1_j(m,n)$ and smear $D^2_j(m,n)$ as follows:

$$
\tilde D^k_j(m,n) = e^k_j\, D^k_j(m,n), \qquad j = 1,2,3, \tag{6}
$$

where the gains $e^k_j$ $(e^1_j > 1,\ 0 < e^2_j < 1,\ j = 1,2,3)$ are set empirically (a sketch follows below).
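A minimal sketch of the coefficient adjustment in (6). The gain values are purely illustrative stand-ins for the empirically chosen $e^k_j$, and leaving the $D^3$ bands untouched is our assumption, since the text constrains only $e^1_j$ and $e^2_j$.

```python
# Illustrative gains only: the paper sets e_j^k empirically.
E1, E2 = 2.0, 0.3  # e_j^1 > 1 strengthens D1; 0 < e_j^2 < 1 smears D2

def enhance(coeffs):
    """Apply (6) to per-level coefficient dicts such as those produced by
    a three-level decomposition: coeffs[j] = {"C": ..., "D1": ..., ...}."""
    for j in range(3):
        coeffs[j]["D1"] = E1 * coeffs[j]["D1"]  # boost the foreground band
        coeffs[j]["D2"] = E2 * coeffs[j]["D2"]  # attenuate the interference band
        # D3 is left unchanged here (an assumption; see the lead-in).
    return coeffs
```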

The inverse directional wavelet transform (4) is then used to reconstruct the image from the processed wavelet coefficients together with the approximation coefficients, $\{\,C_3(m,n),\ \tilde D^1_j(m,n),\ \tilde D^2_j(m,n),\ D^3_j(m,n),\ j = 1,2,3\,\}$. Daubechies [10] proved that there is no symmetric orthogonal wavelet except the Haar wavelet, which is why biorthogonal wavelets are popular in image compression and image processing applications; in this paper we use biorthogonal wavelets with (5,3) taps. The directional wavelet transform can strengthen the foreground strokes and weaken the interference in the reconstructed image, as we can see from Figure 3.

Figure 3. Final results of directional wavelet reconstruction: (a) and (b).

It is noticed that the reconstructed foreground strokes are much darker than the interference. Unwanted noise can therefore easily be removed after binarization (see Figure 4).

Figure 4. Reconstructed images after binarization: (a) and (b).

2.2. Image Recovery

The directional wavelet transform produces a clean output; however, some foreground strokes become broken when interfering strokes that intersected them are removed. Moreover, small pieces of strokes may have orientations very different from the majority, and the system removes them together with the interference. To deal with this problem, the reconstructed image now serves as a set of loci for recovering streaks of the gray-level image from the original document: the neighboring pixels within a 7×7 window centered on each edge pixel are recovered. This may be pictured as follows: while tracing along the edges, a small 7×7 window is opened up to view the original document image (see Figure 5 and the sketch that follows). The window size is based on the average stroke width in the documents, and the window also provides a reasonably defined neighborhood for adaptive thresholding. Through this recovery, isolated or broken foreground strokes are fully restored. Figure 6 shows the restored foreground-stroke images. To improve the final appearance, the restored images are binarized using Niblack's method [11], and the resulting images are shown in Figure 7.

Figure 5. Recovery of the foreground stroke images from the original document image while tracing along edges (7×7 window).
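A sketch of the windowed recovery just described, assuming the kept strokes are available as a boolean mask; the dilation-based formulation (rather than explicit edge tracing) and the helper names are ours.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def recover_strokes(original, kept_strokes, win=7):
    """Copy gray-level streaks from the original image around kept strokes.

    original: gray-level document image (uint8). kept_strokes: boolean
    mask of stroke/edge pixels surviving the directional-wavelet stage.
    Dilating the mask with a win x win structuring element marks every
    pixel within the window of an edge pixel (win=7 follows the average
    stroke width cited above); those pixels are taken from the original
    image and everything else is set to white.
    """
    near = binary_dilation(kept_strokes, structure=np.ones((win, win), bool))
    recovered = np.full_like(original, 255)
    recovered[near] = original[near]
    return recovered
```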

Figure 6. Restored foreground text images: (a) and (b).
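Niblack's method thresholds each pixel against local statistics, $T = m + k\,s$, with local mean $m$ and standard deviation $s$. The sketch below uses conventional defaults for the window size and $k$; the paper does not state its parameter values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(img, win=15, k=-0.2):
    """Niblack local thresholding [11]: keep a pixel as ink (0) when it is
    darker than mean + k * std over a win x win neighborhood. win and k
    here are common defaults, not values taken from the paper."""
    img = img.astype(float)
    mean = uniform_filter(img, win)
    sq_mean = uniform_filter(img * img, win)
    std = np.sqrt(np.maximum(sq_mean - mean**2, 0.0))
    return np.where(img > mean + k * std, 255, 0).astype(np.uint8)
```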

Figure 7. Final binarized output: (a) and (b).

3. Experimental Results and Discussion

The performance of our approach has been evaluated on the scanned images of historical handwritten documents from the National Archives of Singapore. About 200 images have been tested and found to produce readable outputs. For illustration purposes, we chose 50 images containing serious interference and manually counted the following numbers of words: words in the original documents, words that are fully restored, words or parts of words from the interference, and words that are impaired by the interference. Two evaluation metrics, precision and recall [12], defined below, are used to measure the performance of the system:

$$
\text{Precision} = \frac{\text{No. of words correctly detected}}{\text{Total no. of words detected}}, \tag{7}
$$

$$
\text{Recall} = \frac{\text{No. of words correctly detected}}{\text{Total no. of words in the document}}. \tag{8}
$$

In equations (7) and (8), the total number of words detected refers to the words appearing in the final output, some of which are the original words on the front side and some of which come from the reverse side (interfering words). The total number of words in the document refers to all the original words on the front side of a document image. If a word on the front side is lost or not recovered properly in the resultant image, the whole word is considered lost and is not counted. If parts of a word from the reverse side appear, the total number of words detected is increased by 1. Precision shows how well the system removes the interfering strokes, while recall indicates how well the system restores the front page to its original state.

The evaluation of the 50 images is shown in Table 1, and the first image with its final binarized result is shown in Figure 8. Table 1 shows a high average precision and recall of 87.5% and 92.4%, respectively. By enhancing and smearing the wavelet coefficients sufficiently, almost all the original foreground strokes can be detected. However, in the image recovery process, although the interfering strokes have already been removed, bits and pieces of interfering strokes can still fall into the 7×7 window and remain as interference in the foreground. On the other hand, a few strong interfering strokes are erroneously regarded as foreground strokes. These have prevented the system from achieving perfect recall and precision.
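For clarity, the two metrics reduce to simple ratios of the manual counts. The counts below are hypothetical, chosen only to land near the figures reported for image 1 in Table 1.

```python
def precision_recall(total_words, restored_words, interfering_words):
    """Precision (7) and recall (8) from the manual word counts.

    total_words: words on the original front side; restored_words:
    front-side words fully recovered; interfering_words: reverse-side
    words or word fragments appearing in the output (each fragment
    counts as one detection).
    """
    detected = restored_words + interfering_words
    return restored_words / detected, restored_words / total_words

# Hypothetical counts: 177 of 188 words restored plus 12 interfering
# detections give precision ~0.937 and recall ~0.941, close to image 1.
print(precision_recall(188, 177, 12))
```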

Figure 8. Result of interference removal for an entire document page: (a) original image; (b) final binarized result.

Table 1. Evaluation of image restoration results (Words = total no. of words; Prec = precision in %; Rec = recall in %)

Img  Words  Prec   Rec  | Img  Words  Prec   Rec  | Img  Words  Prec   Rec  | Img  Words  Prec   Rec  | Img  Words  Prec   Rec
  1   188   93.7  94.1  |  11   113   91.3  92.9  |  21   109   85.1  89.0  |  31   157   89.9  96.2  |  41   130   88.8  97.9
  2   204   96.6  97.1  |  12   124   92.3  96.8  |  22   124   92.3  96.8  |  32   159   75.0  84.9  |  42   135   84.4  88.1
  3   162   67.8  72.2  |  13   180   87.5  93.3  |  23   127   82.0  89.8  |  33    91   77.9  81.3  |  43   124   89.4  95.2
  4   166   98.8 100.0  |  14   173   89.7  96.0  |  24   139   77.6  87.1  |  34    78   79.1  87.1  |  44   149   92.3  96.6
  5   247   90.8  91.9  |  15   157   89.9  96.1  |  25   173   89.7  96.0  |  35    98   87.0  96.0  |  45   127   94.7  97.6
  6   146   96.0  98.0  |  16   150   86.7  91.3  |  26   180   87.5  93.3  |  36    48   84.0  87.5  |  46    40   95.1  97.5
  7   186   90.0  92.0  |  17    99   85.0  91.9  |  27   211   85.6  92.9  |  37   147   80.1  87.8  |  47    66   83.6  92.4
  8   187   86.8  88.2  |  18   113   91.3  92.9  |  28   175   86.7  93.1  |  38   109   86.2  97.2  |  48   117   84.9  91.5
  9   194   96.4  96.9  |  19   120   92.0  95.8  |  29    95   74.3  82.1  |  39   113   92.2  94.7  |  49   142   77.5  87.3
 10   172   93.7  95.3  |  20   115   93.2  94.8  |  30   150   86.7  91.3  |  40   199   87.7  89.9  |  50    88   89.5  96.6

Average: 140 words per image, precision 87.5%, recall 92.4%.

4. Conclusions and Future Work

In this paper we introduced a new method based on a directional wavelet transform to remove interference appearing in historical handwritten documents. The new algorithm improves the appearance of the original documents significantly. We are currently developing a more flexible method that works with strokes of arbitrary orientation θ and with a more general affine transform A in (2), beyond a constant multiple of a unitary matrix. This will allow the directional wavelet to be applied to other historical documents with different stroke orientations.


Acknowledgement

This research is supported by a joint grant R-252-000-071-112/303 provided by the Agency for Science, Technology and Research (A*STAR) and the Ministry of Education, Singapore. We would like to thank the National Archives of Singapore for permission to use their archival documents.

References

[1] H. Negishi, J. Kato, H. Hase and T. Watanabe, Character extraction from noisy background for an automatic reference system, Proc. of 5th International Conference on Document Analysis and Recognition, India, 1999, pp. 143-146.
[2] S. Liang and M. Ahmadi, A morphological approach to text string extraction from regular periodic overlapping text/background images, CVGIP: Graphical Models and Image Processing, Vol. 56, No. 5, Sep. 1994, pp. 402-413.
[3] M. S. Chang, S. M. Kang, W. S. Rho, H. G. Kim and D. J. Kim, Improved binarization algorithm for document image by histogram and edge detection, Proc. of 3rd International Conference on Document Analysis and Recognition, Canada, 1995, pp. 636-639.
[4] J. H. Jang and K. S. Hong, Binarization of noisy gray-scale character images by thin line modeling, Pattern Recognition, Vol. 32, 1999, pp. 743-752.
[5] H. S. Don, A noise attribute thresholding method for document image binarization, Proc. of 3rd International Conference on Document Analysis and Recognition, Canada, 1995, pp. 231-234.
[6] Q. Wang and C. L. Tan, Matching of double-sided document images to remove interference, Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2001), Hawaii, USA, 8-14 Dec. 2001.
[7] C. L. Tan, R. Cao and P. Shen, Restoration of archival documents using a wavelet technique, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 10, October 2002, pp. 1399-1404.
[8] S. Mallat, A wavelet tour of signal processing, Academic Press, 1998.
[9] R. R. Coifman and D. Donoho, Translation invariant de-noising, Technical Report 475, Dept. of Statistics, Stanford University, May 1995.
[10] I. Daubechies, Ten lectures on wavelets, SIAM, Philadelphia, PA, 1992.
[11] W. Niblack, An introduction to digital image processing, Prentice Hall, Englewood Cliffs, NJ, 1986, pp. 115-116.
[12] M. Junker, R. Hoch and A. Dengel, On the evaluation of document analysis components by recall, precision, and accuracy, Proc. of 5th International Conference on Document Analysis and Recognition, India, 1999, pp. 713-716.


Appendix

Proof. We prove only the first equation of (3); all the other equations in (3) and (4) can be proved in the same way. We first prove the following equation:

$$
C_{j+1}(m,n) = \sum_l \sum_k \Big\langle f(x,y),\ \frac{1}{2^j}\,\varphi\!\Big(\frac{x+y}{2^j} - (m+k+l)\Big)\,\varphi\!\Big(\frac{-x+y}{2^j} - (n+k-l)\Big) \Big\rangle\, h(k)\, h(l). \tag{9}
$$

According to the definition,

$$
C_{j+1}(m,n) = \big\langle f(x,y),\ \varphi_{j+1,m}(x_1)\,\varphi_{j+1,n}(y_1) \big\rangle
= \Big\langle f(x,y),\ \frac{1}{2^{j+1}}\,\varphi\!\Big(\frac{x+y}{2^{j+1}} - m\Big)\,\varphi\!\Big(\frac{-x+y}{2^{j+1}} - n\Big) \Big\rangle.
$$

We prove that (9) holds in two cases, according to the parity of $m+n$.

1) If $\frac{m+n}{2} \in \mathbb{Z}$, then $\frac{m-n}{2} \in \mathbb{Z}$.

We transform the $xy$-plane by the affine transform $A_1: \begin{pmatrix} x' \\ y' \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix}$, where $A = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$ is the matrix in (2) with $c = \sqrt{2}$, $\theta = \pi/4$. Then

$$
f(x,y) \xrightarrow{A_1} g_1(x',y'),
$$
$$
\frac{1}{2^j}\,\varphi\big(2^{-j}(x+y) - m\big)\,\varphi\big(2^{-j}(-x+y) - n\big) \xrightarrow{A_1} \frac{1}{2^j}\,\varphi\Big(2^{-j}x' - \frac{m-n}{2}\Big)\,\varphi\Big(2^{-j}y' - \frac{m+n}{2}\Big),
$$

and thus

$$
C_{j+1}(m,n) = \Big\langle g_1(x',y'),\ \frac{1}{2^j}\,\varphi\Big(2^{-j}x' - \frac{m-n}{2}\Big)\,\varphi\Big(2^{-j}y' - \frac{m+n}{2}\Big) \Big\rangle.
$$

In the new coordinate system, the conventional 2-D wavelet transform $(\tilde C_j f)(m,n),\ (\tilde D^k_j f)(m,n),\ j \in \mathbb{Z},\ k = 1,2,3$, can be defined as in (1) with the 2-D wavelets and scaling function

$$
\begin{aligned}
\Phi_{j,m,n}(x,y) &= \varphi_{j,m}(x)\,\varphi_{j,n}(y), \\
\Psi^1_{j,m,n}(x,y) &= \psi_{j,m}(x)\,\varphi_{j,n}(y), \\
\Psi^2_{j,m,n}(x,y) &= \varphi_{j,m}(x)\,\psi_{j,n}(y), \\
\Psi^3_{j,m,n}(x,y) &= \psi_{j,m}(x)\,\psi_{j,n}(y).
\end{aligned} \tag{2'}
$$

A simple extension of Mallat's work [8] to the 2-D case leads to

$$
\begin{aligned}
C_{j+1}(m,n) &= (\tilde C_{j+1}\, g_1)\Big(\frac{m-n}{2},\ \frac{m+n}{2}\Big) \\
&= \sum_l \sum_k (\tilde C_j\, g_1)\Big(\frac{m-n}{2} + l,\ \frac{m+n}{2} + k\Big)\, h(k)\, h(l) \\
&= \sum_l \sum_k \Big\langle g_1(x,y),\ \varphi_{j,\frac{m-n}{2}+l}(x)\,\varphi_{j,\frac{m+n}{2}+k}(y) \Big\rangle\, h(k)\, h(l) \\
&= \sum_l \sum_k \Big\langle g_1(x,y),\ \frac{1}{2^j}\,\varphi\Big(\frac{x}{2^j} - \big(\tfrac{m-n}{2} + l\big)\Big)\,\varphi\Big(\frac{y}{2^j} - \big(\tfrac{m+n}{2} + k\big)\Big) \Big\rangle\, h(k)\, h(l).
\end{aligned}
$$

Applying the inverse affine transform $A_1^{-1}$ then gives

$$
C_{j+1}(m,n) = \sum_l \sum_k \Big\langle f(x,y),\ \frac{1}{2^j}\,\varphi\!\Big(\frac{x+y}{2^j} - (m+k+l)\Big)\,\varphi\!\Big(\frac{-x+y}{2^j} - (n+k-l)\Big) \Big\rangle\, h(k)\, h(l).
$$

2) If $\frac{m+n}{2} \notin \mathbb{Z}$, then $\frac{m \pm n + 1}{2} \in \mathbb{Z}$.

We transform the $xy$-plane by the affine transform $A_2: \begin{pmatrix} x' \\ y' \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, with $A = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$. Then

$$
f(x,y) \xrightarrow{A_2} g_2(x',y'),
$$
$$
\frac{1}{2^j}\,\varphi\big(2^{-j}(x+y) - m\big)\,\varphi\big(2^{-j}(-x+y) - n\big) \xrightarrow{A_2} \frac{1}{2^j}\,\varphi\Big(2^{-j}x' - \frac{m-n+1}{2}\Big)\,\varphi\Big(2^{-j}y' - \frac{m+n+1}{2}\Big),
$$

and thus

$$
C_{j+1}(m,n) = \Big\langle g_2(x',y'),\ \frac{1}{2^j}\,\varphi\Big(2^{-j}x' - \frac{m-n+1}{2}\Big)\,\varphi\Big(2^{-j}y' - \frac{m+n+1}{2}\Big) \Big\rangle.
$$

Similarly to case 1), applying the conventional 2-D wavelet transform defined by (1) and (2'),

$$
\begin{aligned}
C_{j+1}(m,n) &= (\tilde C_{j+1}\, g_2)\Big(\frac{m-n+1}{2},\ \frac{m+n+1}{2}\Big) \\
&= \sum_l \sum_k (\tilde C_j\, g_2)\Big(\frac{m-n+1}{2} + l,\ \frac{m+n+1}{2} + k\Big)\, h(k)\, h(l) \\
&= \sum_l \sum_k \Big\langle g_2(x,y),\ \varphi_{j,\frac{m-n+1}{2}+l}(x)\,\varphi_{j,\frac{m+n+1}{2}+k}(y) \Big\rangle\, h(k)\, h(l) \\
&= \sum_l \sum_k \Big\langle g_2(x,y),\ \frac{1}{2^j}\,\varphi\Big(\frac{x}{2^j} - \big(\tfrac{m-n+1}{2} + l\big)\Big)\,\varphi\Big(\frac{y}{2^j} - \big(\tfrac{m+n+1}{2} + k\big)\Big) \Big\rangle\, h(k)\, h(l),
\end{aligned}
$$

and applying the inverse affine transform $A_2^{-1}$ gives

$$
C_{j+1}(m,n) = \sum_l \sum_k \Big\langle f(x,y),\ \frac{1}{2^j}\,\varphi\!\Big(\frac{x+y}{2^j} - (m+k+l)\Big)\,\varphi\!\Big(\frac{-x+y}{2^j} - (n+k-l)\Big) \Big\rangle\, h(k)\, h(l).
$$

So (9) holds in both cases. From (9) we have

$$
\begin{aligned}
C_{j+1}(m,n) &= \sum_l \sum_k \Big\langle f(x,y),\ \frac{1}{2^j}\,\varphi\!\Big(\frac{x_1}{2^j} - (m+k+l)\Big)\,\varphi\!\Big(\frac{y_1}{2^j} - (n+k-l)\Big) \Big\rangle\, h(k)\, h(l) \\
&= \sum_l \sum_k \big\langle f(x,y),\ \Phi_{j,\,m+k+l,\,n+k-l} \big\rangle\, h(k)\, h(l) \\
&= \sum_l \sum_k C_j(m+k+l,\ n+k-l)\, h(k)\, h(l).
\end{aligned}
$$

That is the first equation of (3). This completes the proof.