Estimation of 3D Shape of Warped Document Surface for Image Restoration
Zheng Zhang, Chew Lim Tan, Liying Fan
School of Computing, National University of Singapore
3 Science Drive 2, Singapore 117543
Email: {zhangz, tancl, fanly}@comp.nus.edu.sg
Abstract
When pages are scanned from a thick, bound book, two sources of distortion appear in the document images: 1) shade along the book spine, and 2) warping of the book surface in the shaded area. In this paper, we propose a fast method to estimate the 3D shape of the book surface. Based on this shape, we remove the shade and correct the warping to restore the document images. Experiments show that the photometric and geometric distortions are mostly removed. OCR tests on the original and restored document images are also presented.
1. Introduction
Document images scanned from a thick, bound book suffer from two sources of distortion caused by the curved book surface: 1) shade along the book spine, and 2) warping of the book surface in the shaded area. In this paper, we present a fast restoration method that estimates the 3D shape of the book surface from the shading information.

In recent years, several related techniques have been reported in the literature. Pilu [1] presents a novel method based on the physical modeling of paper deformation with an applicable surface. Since this method represents the applicable surface as a polygonal mesh, the text in the corrected experimental results is not legible even to human eyes. Brown [2] proposes a general deskewing algorithm for arbitrarily warped documents based on their 3D shape. It requires a special active-lighting setup to obtain structural information, so it cannot be applied to ordinary scanner or camera images. Cao [3] introduces a method to rectify the warping of a bound document image captured by a camera. He builds a general cylindrical model and uses the skeletons of horizontal text lines in the image to estimate the model parameters; this method fails when the page contains few or no text lines. Wada [4] develops a sophisticated model incorporating interreflections (increased illumination on one part of the book caused by secondary reflections from another). However, interreflections have only a small effect on the estimated shape of the book surface, while their computational cost is high even with the tessellation method he proposes. The above methods are still far from providing a practical solution.

In this paper, we estimate the book surface shape by re-modeling the approach of [4]. We then restore the document image by removing the shade and correcting the warping in the shaded area. As a measure of the success of the restoration, we present an experiment showing the improvement in OCR results.
2. Book surface shape estimation
In this section, we first specify our problem by describing the practical conditions of scanning a book surface with an image scanner. We then state the assumptions in our model and formulate the problem. Finally, we provide the solution to the problem.

2.1. Practical scanning conditions
The structure of the scanner and its coordinate system is shown in Figure 1. The image scanner consists of a light source L, a linear CCD sensor C, a mirror R, and a lens E. The sensor C takes a 1D image $P(x_i)$ along the scanning line S and moves together with L, R, and E. The sequence of $P(x_i)$ forms a 2D $M \times N$ image $P(x_i, y_j)$. A typical scanned grayscale document image is shown in Figure 2.

2.2. Problem formulation
The following assumptions are made in our model:
• The book surface has a smooth cross-section shape on the y-z plane.
• The book spine (x-axis) is parallel to the scanning light.
• The book surface is Lambertian, i.e., there are no specular reflections and the brightness of the reflected light is uniform in all directions. This is quite reasonable, as most paper is nearly Lambertian.
The problem is formulated by considering the following factors: 1) a proximal and moving light source, 2) Lambertian reflection, and 3) nonuniform albedo. We first consider an ideal shape-from-shading problem, which satisfies 1) a distant and fixed light source, 2) Lambertian reflection, and 3) uniform
albedo. The problem under these ideal conditions can be formulated as:

$I_o(x) = I_s \cdot k \cdot \cos\varphi(x)$ (1)

where $x$ denotes a 2D point in the scanned image, $I_o(x)$ the reflected light intensity observed at $x$, $I_s$ the illuminant intensity, $k$ the albedo of the surface, and $\varphi(x)$ the angle between the light source direction and the surface normal at the 3D point on the book surface corresponding to $x$. With a proximal and moving light source, the illuminant intensity is no longer constant over the object surface; the illuminant intensity at a point $x$ is now a function of the location of $x$ and that of the light source corresponding to $x$. We formulate the problem as follows:

$I_o(x) = I_s(s(x), l(x)) \cdot k \cdot \cos\varphi(x)$ (2)
where $s(x)$ and $l(x)$ denote the 3D point on the book surface and the light source location corresponding to $x$, respectively. Finally, we formulate our problem by incorporating the Lambertian reflection and nonuniform albedo distribution characteristics into Equation (2):

$I_o(x) = I_s(s(x), l(x)) \cdot k(s(x)) \cdot \cos\varphi(x)$ (3)

where $k(s(x))$ denotes the albedo at $s(x)$ and $\cos\varphi(x)$ the reflectance property. Note that our assumptions reduce the 3D shape reconstruction problem to a 2D cross-section shape reconstruction problem. By the coordinate system in Figure 1 and Equation (3), the relationship between the image intensity (pixel value) and the reflected light intensity is represented as follows:

$P(x_i, y_j) = \alpha \cdot I_o(x_i, y_j) + \beta = \alpha \cdot I_s(z(y_j)) \cdot k(x_i, y_j) \cdot \cos\varphi(y_j) + \beta$ (4)

where
• $P(x_i, y_j)$: the image intensity at $(x_i, y_j)$ in the observed image, i.e. the scanned document image.
• $\alpha, \beta$: the gain and bias of the photoelectric transformation in the image scanner, respectively.
• $z(y_j)$: the distance between the scanning plane and the book surface, i.e. $z(y_j)$ is the practical representation of $s$. $z(y_j)$ is represented as follows:

$z(y_j) = \sum_{y_k = y_{N-1}}^{y_j} \tan\theta(y_k)$ (5)

• $I_s(z(y_j))$: the illuminant intensity distribution on the y-z plane when taking the 1D image at $y_j$, i.e. $I_s(z(y_j))$ is the practical representation of $I_s(s(p), l(p))$. Based on the directional linear light source model, $I_s(z(y_j))$ is represented as follows:

$I_s(z(y_j)) = \frac{I_D(\psi)}{(y_j - (y_j - d_1))^2 + (z(y_j) - (-d_2))^2} = \frac{I_D(\psi)}{d_1^2 + (z(y_j) + d_2)^2} = \frac{I_D(\psi)}{(z(y_j) + d_2)^2(\tan^2\psi + 1)}$ (6)

where $(y_j, z(y_j))$ and $(y_j - d_1, -d_2)$ denote the locations of the 3D point and the linear light source on the y-z plane respectively, $\psi$ the angle between the light source direction and the normal to the scanning plane, and $I_D(\psi)$ the directional distribution of the illuminant intensity; the last equality in Equation (6) uses $\tan\psi = d_1 / (z(y_j) + d_2)$.
• $k(x_i, y_j)$: the albedo of the 3D point corresponding to $(x_i, y_j)$.
• $\cos\varphi(y_j)$: the Lambertian reflection property. $\varphi(y_j)$ is the angle between the light source direction and the surface normal at $y_j$.

The parameters $\alpha$, $\beta$, $d_2$, $\psi$, and $I_D(\psi)$ are estimated a priori using calibrated images.

Figure 1. The practical scanning conditions (L: light source, R: mirror, E: lens, C: linear CCD sensor, V: vertical plane, S: scanning line).

Figure 2. A scanned grayscale document image.
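To make the imaging model concrete, the following is a minimal Python sketch of the forward model in Equations (4)-(6). It assumes the calibrated constants (alpha, beta, d1, d2, I_D_psi) are already known; all variable and function names here are illustrative, not from the paper.

import numpy as np

def illuminant_intensity(z, d1, d2, I_D_psi):
    # Equation (6): illuminant intensity at height z above the scanning
    # plane; the linear light source sits at offset (d1, -d2).
    return I_D_psi / (d1**2 + (z + d2)**2)

def forward_model(k, z, phi, alpha, beta, d1, d2, I_D_psi):
    # Equation (4): predicted pixel value P(x_i, y_j) from the albedo k,
    # per-column surface height z(y_j), and surface angle phi(y_j).
    I_s = illuminant_intensity(z, d1, d2, I_D_psi)  # proximal moving light
    I_o = I_s * k * np.cos(phi)                     # Lambertian reflection
    return alpha * I_o + beta                       # photoelectric transform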
2.3. Solution of the problem
Our goal is to restore the document image by 1) removing the shade and 2) correcting the warping in the shaded area. We need the albedo distribution $k(x_i, y_j)$ for 1) and the shape of the book surface $z(y_j)$ for 2). Our shape estimation begins by finding the maximum intensity value in each column (each y value). These values are assumed to be the intensities of the white background of the book, since each column can be expected to contain at least some white. Because the book surface is assumed to be Lambertian, the angle $\varphi$ between the surface normal and the light source direction can be obtained as:

$\varphi(y_j) = c \cdot \arccos(P(y_j) / P_{max})$ (7)

where $P(y_j)$ is the maximum intensity of column $y_j$, $P_{max}$ is the maximum intensity over all columns, and $c$ is a constant related to the scanner. From Figure 1, the angle $\theta$ between the surface normal and the horizontal line can be calculated from $\varphi$ and $\psi$ (the angle between the perpendicular and the light source direction). The book surface shape $z(y_j)$ is then calculated by Equation (5). Figure 3 shows the estimated cross-section shape on the y-z plane of the book surface in Figure 1. Compared with [5], this is a quick estimation of the warped surface (the total runtime is given in Section 4, Table 1), but it is sufficient for the image restoration task. The albedo distribution is recovered by substituting $z(y_j)$ and $\varphi(y_j)$ into Equation (4), and is represented as:

$k(x_i, y_j) = \frac{P(x_i, y_j) - \beta}{\alpha \cdot I_s(z(y_j)) \cdot \cos\varphi(y_j)}$ (8)

Figure 3. Estimated cross-section shape on the y-z plane of the book surface in Figure 1 (axes in mm).
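The estimation above can be sketched in a few lines of Python for a grayscale image img of shape (M, N). This is a minimal illustration under the paper's assumptions: the relation theta = phi - psi used below is our simplifying assumption for the geometry of Figure 1, which the text does not spell out, and the direction of accumulation in Equation (5) follows the summation limits given above.

import numpy as np

def estimate_shape_and_albedo(img, c, psi, alpha, beta, d1, d2, I_D_psi):
    P_col = img.max(axis=0).astype(float)   # background white of each column
    P_max = P_col.max()
    phi = c * np.arccos(P_col / P_max)      # Equation (7)
    theta = phi - psi                       # assumed relation between angles
    # Equation (5): accumulate tan(theta) from column N-1 back toward column j
    z = np.cumsum(np.tan(theta)[::-1])[::-1]
    I_s = I_D_psi / (d1**2 + (z + d2)**2)   # Equation (6)
    # Equation (8): recover the albedo distribution
    k = (img - beta) / (alpha * I_s * np.cos(phi))
    return z, phi, theta, k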
3. Restoration of document image

3.1. Removing the shade
If the physical document page were flat during scanning, then by Equation (4) the optimal image intensity $P^*(x_i, y_j)$ for the point $(x_i, y_j)$ would be:

$P^*(x_i, y_j) = \alpha \cdot k(x_i, y_j) \cdot I_s(0) \cdot \cos\psi + \beta$ (9)

where $k(x_i, y_j)$ is calculated by Equation (8), $I_s(0)$ is calculated by Equation (6), and $\alpha$, $\beta$, $\psi$ are known constants. We therefore recalculate the image intensity of each pixel by Equation (9).
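A minimal sketch of this re-rendering step, reusing the hypothetical helper names and constants from the previous sketches; $I_s(0)$ is Equation (6) evaluated at z = 0.

import numpy as np

def remove_shade(k, alpha, beta, psi, d1, d2, I_D_psi):
    # Equation (9): re-render each pixel as if the page were flat (z = 0)
    I_s0 = I_D_psi / (d1**2 + d2**2)        # Equation (6) at z = 0
    P_star = alpha * k * I_s0 * np.cos(psi) + beta
    return np.clip(P_star, 0, 255)          # keep values in 8-bit range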
3.2. Correcting the warping
Taking the image generated by Equation (9) as the input, we correct the warping based on the shape $z(y_j)$ of the book surface. Since the sensor picks up a 1D projection for each column, the distortion along the x-axis is due to perspective projection, while the distortion along the y-axis is due to orthogonal projection only.

3.2.1. Correction along x-axis. Figure 4 shows a slice of the x-z plane at $y_j$. By the coordinate system in Figure 1, $AO = z(y_j)$, $OD' = 2 \times OC' = x_{M-1}$, and the focal length $f = CC'$. We remove the x-axis distortion by regenerating the image intensity $P^{**}(x_i, y_j)$ for each pixel on $y_j$ from Equation (9) (stretching A'B' to the length of AB; from the perspective equations, this stretching proves to be uniform for each $y_j$ and proportional to the z value):

$P^{**}(x_i, y_j) = \begin{cases} P^*(x_r, y_j) & \text{if } x_r \text{ is an integer} \\ P^*(\mathrm{ceil}(x_r), y_j) \cdot (x_r - \mathrm{floor}(x_r)) + P^*(\mathrm{floor}(x_r), y_j) \cdot (\mathrm{ceil}(x_r) - x_r) & \text{otherwise} \end{cases}$ (10)

where
• $P^*(x, y)$: the pixel value regenerated from Equation (9).
• $x_r$: the relative location on A'B' corresponding to $x_i$, represented as follows:

$x_r = x_i \cdot \frac{A'B'}{AB} + OA' = x_i \cdot \frac{f}{z(y_j) + f} + \frac{\frac{1}{2} z(y_j) \cdot x_{M-1}}{z(y_j) + f}$ (11)

• $\mathrm{floor}(x_r)$: returns the largest integer less than or equal to $x_r$.
• $\mathrm{ceil}(x_r)$: returns the smallest integer greater than or equal to $x_r$.

Figure 4. Perspective projection on a slice of the x-z plane.
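A minimal sketch of the x-axis correction, assuming the shade-free image P_star from Equation (9), the estimated heights z(y_j), and the focal length f (in pixel units) are available; the clipping at the image border is our addition for safety.

import numpy as np

def correct_x(P_star, z, f):
    # Equations (10)-(11): undo the perspective foreshortening along the
    # x-axis by resampling each column according to its height z(y_j).
    M, N = P_star.shape
    out = np.empty((M, N))
    xi = np.arange(M)
    x_M1 = M - 1
    for j in range(N):
        # Equation (11): source location x_r on the image-plane segment A'B'
        xr = xi * f / (z[j] + f) + 0.5 * z[j] * x_M1 / (z[j] + f)
        lo = np.clip(np.floor(xr).astype(int), 0, M - 1)
        hi = np.clip(np.ceil(xr).astype(int), 0, M - 1)
        w = xr - np.floor(xr)
        # Equation (10): linear interpolation between neighbouring pixels
        # (when x_r is an integer, w = 0 and the pixel is copied directly)
        out[:, j] = P_star[hi, j] * w + P_star[lo, j] * (1 - w)
    return out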
3.2.2. Correction along y-axis. The distortion along the y-axis is due to orthogonal projection. We first compute the additional width $w$ from Equation (12) (the true width of the page is then $N + w$), and then stretch the observed image to its true width using an interpolation similar to that of the last section:

$w = \sum_{j=0}^{N-1} \left( \frac{1}{\cos\theta(y_j)} - 1 \right)$ (12)

Figure 5 shows the final restored image. The readability of the book surface is drastically improved by the image restoration. This result demonstrates that the estimated shape is sufficient for the image restoration task.
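A minimal sketch of the y-axis correction under the same assumptions: each observed column j covers 1/cos(theta(y_j)) units of true page width, so the cumulative arc length gives the target position of every column, and the image is stretched to width N + w with the same interpolation idea as in Equation (10).

import numpy as np

def correct_y(img, theta):
    M, N = img.shape
    arc = np.cumsum(1.0 / np.cos(theta))    # true page position of column j
    arc -= arc[0]
    width = int(np.ceil(arc[-1])) + 1       # true width = N + w, Equation (12)
    # For each target column, find the fractional source column to sample
    src = np.interp(np.arange(width), arc, np.arange(N))
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, N - 1)
    w = src - lo
    return img[:, hi] * w + img[:, lo] * (1 - w)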
4. OCR results comparison
Our OCR tests were performed twice, before and after the image restoration respectively. Precision and recall as defined in [6] are used as the comparison metrics. We carried out our tests on 30 document images. Due to space constraints, Table 1 shows the precision and recall for 5 sample document images and the average over all 30 document images. In addition, the runtime of the image restoration on a Pentium IV 2.6 GHz PC is given. The average runtime of the image restoration process is on the order of a few seconds (at a resolution of 300 ppi), which is tolerable in comparison with the normal scanning time. Compared with [5], the runtime is reduced by nearly 50%.
Figure 5. Restored document image.

Table 1. OCR precision and recall on 5 document images and the average over all 30 (P: precision in %, R: recall in %).

             Original image    Restored image    Restoration
             P      R          P      R          runtime (sec)
Doc 1        85.2   84.0       95.8   94.1       2.31
Doc 2        82.1   81.6       94.1   93.2       2.19
Doc 3        85.7   82.4       94.6   91.7       2.14
Doc 4        85.6   83.6       94.4   91.6       2.56
Doc 5        83.1   82.6       94.9   93.5       2.94
Ave of 30    84.2   82.9       94.7   93.5       2.43
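For reference, a minimal sketch of word-level precision and recall; the exact matching procedure of [6] may differ, so this is only an illustrative approximation based on a multiset intersection of words.

from collections import Counter

def precision_recall(ocr_words, truth_words):
    # Matched words = multiset intersection of OCR output and ground truth
    ocr, truth = Counter(ocr_words), Counter(truth_words)
    matched = sum((ocr & truth).values())
    precision = 100.0 * matched / max(len(ocr_words), 1)
    recall = 100.0 * matched / max(len(truth_words), 1)
    return precision, recall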
5. Conclusion and future work
In this paper, we address the distortions in document images scanned from thick, bound books. We restore the document images based on a quick estimation of the book surface shape. The experiments show that the photometric and geometric distortions are mostly removed, and that the OCR results improve markedly. To increase the practical value of the method, our future work will look into relaxing the assumptions stated in Section 2.
Acknowledgement
This research is supported by the Agency for Science, Technology and Research and the Singapore Ministry of Education under grant R252-000-071-112/303.
References
[1] M. Pilu, "Undoing Page Curl Distortion Using Applicable Surfaces", Computer Vision and Pattern Recognition Conference, Vol. 1, pp. 67-72, December 2001.
[2] M.S. Brown and W.B. Seales, "Document Restoration Using 3D Shape: A General Deskewing Algorithm for Arbitrarily Warped Documents", International Conference on Computer Vision, Vol. 2, pp. 367-374, July 2001.
[3] H. Cao, X. Ding, and C. Liu, "A Cylindrical Model to Rectify the Bound Document Image", International Conference on Computer Vision, Vol. 2, pp. 228-233, October 2003.
[4] T. Wada, H. Ukida, and T. Matsuyama, "Shape from Shading with Interreflections under a Proximal Light Source: Distortion-Free Copying of an Unfolded Book", International Journal of Computer Vision, 24(2), pp. 125-135, 1997.
[5] Z. Zhang, C.L. Tan, and L.Y. Fan, "Restoration of Curved Document Images through 3D Shape Modeling", International Conference on Computer Vision and Pattern Recognition, 2004.
[6] M. Junker, R. Hoch, and A. Dengel, "On the Evaluation of Document Analysis Components by Recall, Precision and Accuracy", International Conference on Document Analysis and Recognition, India, pp. 713-716, 1999.