Warped Document Image Restoration Using Shape-from-Shading and Physically-Based Modeling
Li Zhang and Chew-Lim Tan
School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543
Abstract
With the pervasive use of handheld digital devices such as camera phones and PDAs, people have started to capture images as a way of recording information. However, due to the non-planar geometric shapes of some documents, such as thick bound book pages and rolled scripts, the captured images are often warped. This causes problems for many document analysis and retrieval tasks. This paper proposes a de-warping approach that first reconstructs the surface using a Shape-from-Shading (SFS) technique and then flattens the warped surface using a physically-based deformable model. The SFS method is based on the viscosity framework and considers perspective projection under an oblique light source. The recovered surface is represented as a triangular mesh and is restored to its planar state through a numerical integration process. We have tested the proposed approach on both synthetic and real images. The synthetic images are mainly used to evaluate the SFS method, and the results are compared with those of a prior SFS method. The surfaces reconstructed from real document images are also compared with the actual shapes captured using a 3D scanner, and the restored images are compared with those obtained from range-scanned surface shapes.
1. Introduction
With the wide availability of high resolution digital cameras, camera imaging has become a new way of digitizing physical documents. This is especially true for ancient manuscripts that are too fragile to be handled without special care: it is often hard to make such bumpy document surfaces adhere to a flatbed scanning plane without physically damaging the manuscripts. Current digitization efforts therefore tend to rely on high resolution digital cameras, or even special 3D scanners that capture both 2D texture and 3D structure information in one go. Besides digitization, people have also started to capture document images as an alternative to daily note-taking. Because camera photos can be snapped quickly of documents wherever and however they appear, many of the resulting images are warped, containing both geometric and photometric distortions caused by the non-planar shapes of the documents and by the pinhole-camera imaging process. Some examples of warped document images are shown in Figure 1. It is thus necessary to correct these distortions and produce a flattened rendition of the warped document, both for better human perception and for further document analysis and retrieval tasks.
Figure 1. Examples of warped document images.
The existing document image de-warping approaches can generally be divided into two categories: 1) restoration approaches based on 2D image processing; and 2) restoration approaches based on 3D shape recovery and modeling. The first category makes use of 2D information in the document image to find distortion parameters [1] or to approximate the 2D transformations [2, 3] using image processing techniques in order to perform restoration. These approaches usually rely on the content of the document, such as identifiable reference points, text lines, document boundaries, etc. Moreover, the warping curvature estimated from 2D information alone does not reflect the actual 3D warping very accurately, which also limits the restoration results. The second category, on the other hand, is less content dependent, in the sense that it makes use of shape information that is either captured directly using special setups [4, 5] or reconstructed from 2D images [6, 7]. The disadvantage of using special setups lies in the complexity and high cost of the equipment, while the reconstruction-based approaches are often restricted to a particular class of surfaces, such as cylinders.
In this paper, we propose a de-warping approach based on 3D shape recovery and restoration with a physically-based deformable model. We first reconstruct the 3D surface shape from the shading variations extracted from the initial 2D image. We then create a triangular mesh from the recovered surface and model it as a particle system. Next, the deformed shape is restored by pushing the particles of the 3D mesh down onto a plane through a numerical integration process. The texture associated with the original deformed surface is restored accordingly by the flattening process. To evaluate the proposed approach, we conducted experiments on both synthetic and real images. The surfaces reconstructed from synthetic images are compared with those produced by an existing method and show better performance. The surfaces reconstructed from real document images are also compared with the actual surface shapes captured using a 3D scanner and are fairly similar. Moreover, we compare the restored document images obtained using the range-scanned surface and the reconstructed surface, and report OCR improvements in word/character precision and recall.
2. Shading Extraction
In order to apply the SFS technique to recover the surface shape, we first need to extract the shading image. Here we use an adaptive thresholding [9] and interpolation method. First, we identify the text and graphics regions in the image through adaptive thresholding and fill them with white. The intensity gradients in the non-white regions of the resulting image are then mostly caused by pure shading variations, with no reflectance changes due to different colors. Next, by interpolating the filled white regions from the neighboring non-white pixels and applying some smoothing operators, we obtain a smooth shading image that can be used in the SFS phase. Examples of shading images obtained from input warped images are shown in Figure 2.
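As an illustration, a minimal OpenCV/NumPy sketch of this shading-extraction step is given below, assuming a grayscale input image. Inpainting is used here as a stand-in for the interpolation step described above, and the block size, offset and blur radius are illustrative values rather than the parameters used in our experiments.

    import cv2
    import numpy as np

    def extract_shading(gray, block_size=35, offset=15, blur_ksize=21):
        """Rough shading extraction: mask out text/graphics, then fill the
        masked pixels from the surrounding paper and smooth the result."""
        # 1. Adaptive thresholding marks dark content (text, line art) as 0.
        binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY, block_size, offset)
        text_mask = (binary == 0).astype(np.uint8)      # 1 where content was detected

        # 2. Slightly dilate the mask so character edges are fully covered.
        text_mask = cv2.dilate(text_mask, np.ones((3, 3), np.uint8), iterations=2)

        # 3. Fill the masked regions from neighbouring non-text pixels
        #    (inpainting stands in for the interpolation step of the paper).
        shading = cv2.inpaint(gray, text_mask, 5, cv2.INPAINT_TELEA)

        # 4. Smooth to keep only the slowly varying shading component.
        return cv2.GaussianBlur(shading, (blur_ksize, blur_ksize), 0)

    # Usage: shading = extract_shading(cv2.imread("warped_page.png", cv2.IMREAD_GRAYSCALE))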
Figure 2. (a)(c) Initial warped images; (b)(d) Extracted shading images.
3. Surface Reconstruction Using SFS
SFS is a classic computer vision problem originated by Horn [10] and has been studied continuously over the years, with methods based on variational minimization [11], control theory, local analysis, etc. [12]. Here we make use of a modified perspective SFS technique based on the fast marching method described by Kimmel and Sethian [13] and Tankus et al. [14]. As shown by Prados and Faugeras [15], an SFS formulation that accounts for the distance attenuation of a close light source is well posed and its PDE admits a unique viscosity solution.
3.1. Problem Formulation
Our document imaging system and the associated notation are illustrated in Figure 3. The surface shape is represented by the depth value $z(x, y)$ of a set of surface points $(x, y, z(x, y))$. The projection of the surface point $(x, y, z(x, y))$ onto the image plane $\Omega$ is denoted by $(u, v)$; by definition, $z(x, y) \equiv z(u, v)$. The real-world coordinates of the image point $(u, v)$ are $(u', v', f)$, where $f$ is the focal length of the camera, $u' = u - u_0$ and $v' = v - v_0$. The focal length $f$ and the principal point $(u_0, v_0)$ are obtained through a simple camera calibration process [16]. Letting $\hat{z}(u, v) = z(u, v)/f$, the document surface can be represented, following a scheme similar to that of [17], as
$$S = \{\hat{z}(u, v) \cdot (u', v', f) : (u, v) \in \Omega\}. \quad (1)$$
Assuming the illumination is a known close point light source located at $(\alpha, \beta, \gamma)$, the illumination direction at each surface point $x$ can be written, up to normalization, as
$$L(x) = \big(\alpha - \hat{z}(u, v)\, u',\; \beta - \hat{z}(u, v)\, v',\; \gamma - \hat{z}(u, v)\, f\big). \quad (2)$$
The surface normal at each point $x$ can be derived as
$$N(x) = \big(f p,\; f q,\; -1 - u' p - v' q\big), \quad (3)$$
where $p = \partial \ln z / \partial u$ and $q = \partial \ln z / \partial v$. Based on Eqs. (2) and (3) and the assumption of Lambertian reflection, we derive the image irradiance equation from Lambert's cosine law as
$$I(u, v) = \frac{(\alpha f - \gamma u')\, p + (\beta f - \gamma v')\, q + \hat{z} f - \gamma}{a \cdot r^2 \cdot \|N(x)\| \cdot \|L(x)\|}, \quad (4)$$
where $r = \sqrt{(\alpha - \hat{z} u')^2 + (\beta - \hat{z} v')^2 + (\gamma - \hat{z} f)^2}$ is the distance from the point light source to the surface point and $a$ is a constant that accounts for the distance attenuation of the close point light source.
Figure 3. Imaging system and the SFS formulation.
3.2. Equation Solving Using Fast Marching
To solve the PDE in Eq. (4), we build on the fast marching algorithm presented by Tankus et al. [14], which uses an iterative scheme to solve the perspective SFS problem with a distant point light source. In contrast, our problem involves a close point light source for which the distance attenuation must be considered. Based on [13], the image irradiance equation for the orthographic case with vertical illumination $L = (0, 0, -1)$ can be written as an Eikonal equation:
$$p^2 + q^2 = 1/I(u, v)^2 - 1. \quad (5)$$
Similarly, our image irradiance equation Eq. (4) can be simplified into the form
$$p^2 A_1 + q^2 B_1 = F, \quad (6)$$
where $A_1$ and $B_1$ are non-negative and independent of $p$, $q$ and $\hat{z}$, while $F$ depends on all of $p$, $q$ and $\hat{z}$. More specifically, following [14], Eq. (4) can be transformed into
$$p^2 A_1 + q^2 B_1 = p^2 A_2 + q^2 B_2 - 2pqC - 2pD - 2qE - F_1, \quad (7)$$
where
$$\begin{aligned}
A_1 &= a^2 I^2 (f^2 + u'^2), \qquad & B_1 &= a^2 I^2 (f^2 + v'^2),\\
A_2 &= (\alpha f - \gamma u')^2 / r^4, \qquad & B_2 &= (\beta f - \gamma v')^2 / r^4,\\
C &= a^2 I^2 u' v' - (\alpha f - \gamma u')(\beta f - \gamma v') / r^4, &&\\
D &= a^2 I^2 u' - (\hat{z} f - \gamma)(\alpha f - \gamma u') / r^4, &&\\
E &= a^2 I^2 v' - (\hat{z} f - \gamma)(\beta f - \gamma v') / r^4, &&\\
F_1 &= a^2 I^2 - (\hat{z} f - \gamma)^2 / r^4.
\end{aligned} \quad (8)$$
To solve Eq. (7), an iterative scheme can be used: the value of $F$ in Eq. (6) is first estimated from the values of $p$, $q$ and $\hat{z}$ at the $(n-1)$th iteration, and the new values of $\hat{z}$ for the $n$th iteration are then computed using the numerical approximation scheme given in [14]. The approximations for $p$ and $q$ are
$$p \equiv \max\!\left\{\frac{\hat{z}(i, j) - \hat{z}(i, j-1)}{\Delta u},\; -\frac{\hat{z}(i, j+1) - \hat{z}(i, j)}{\Delta u},\; 0\right\}, \qquad
q \equiv \max\!\left\{\frac{\hat{z}(i, j) - \hat{z}(i-1, j)}{\Delta v},\; -\frac{\hat{z}(i+1, j) - \hat{z}(i, j)}{\Delta v},\; 0\right\}.$$
As shown by Rouy and Tourin [18], this numerical approximation yields the viscosity solution of the orthographic SFS problem. Similarly, the solution of Eq. (7) can be derived as
$$\hat{z}^{n+1} =
\begin{cases}
\hat{z}_1^n + \sqrt{F^n / A_1}, & \text{if } \hat{z}_2^n - \hat{z}_1^n > \sqrt{F^n / A_1},\\
\hat{z}_2^n + \sqrt{F^n / B_1}, & \text{if } \hat{z}_1^n - \hat{z}_2^n > \sqrt{F^n / B_1},\\
\dfrac{\hat{z}_1^n A_1 + \hat{z}_2^n B_1 + \Delta F}{A_1 + B_1}, & \text{otherwise},
\end{cases} \quad (9)$$
where $\hat{z}^{n+1}$ denotes the depth value at the $(n+1)$th iteration, $\Delta F = \sqrt{(A_1 + B_1) F^n - A_1 B_1 (\hat{z}_1^n - \hat{z}_2^n)^2}$, $\hat{z}_1 = \min\{\hat{z}_{i, j-1}, \hat{z}_{i, j+1}\}$ and $\hat{z}_2 = \min\{\hat{z}_{i-1, j}, \hat{z}_{i+1, j}\}$. To initialize the iterative process, we use the depth values obtained from the orthographic SFS problem with an oblique light source. The adaptation from a vertical to an oblique light source can be viewed as transforming the surface from the original coordinate system to the light source coordinate system, in which the light source is located at $(0, 0, -1)$. Once the surface is reconstructed in the light source coordinate system, it is transformed back to the original coordinate system.
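To make the update in Eq. (9) concrete, the following NumPy sketch performs one full sweep of the grid with that update, assuming the coefficient maps A1 and B1 and the right-hand side F have already been evaluated from the previous iterate as in Eq. (8). It is a simplified Gauss-Seidel-style illustration, not the fast marching implementation itself, which visits pixels in order of increasing depth; boundary handling is also omitted.

    import numpy as np

    def sweep_update(z, A1, B1, F):
        """One in-place sweep of the depth update of Eq. (9).
        z, A1, B1, F are 2-D arrays of the same shape; A1, B1 > 0."""
        H, W = z.shape
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                z1 = min(z[i, j - 1], z[i, j + 1])   # neighbour minimum along u
                z2 = min(z[i - 1, j], z[i + 1, j])   # neighbour minimum along v
                a, b = A1[i, j], B1[i, j]
                f = max(F[i, j], 0.0)                # clamp negative right-hand sides
                if z2 - z1 > np.sqrt(f / a):
                    z_new = z1 + np.sqrt(f / a)      # only the u-direction is active
                elif z1 - z2 > np.sqrt(f / b):
                    z_new = z2 + np.sqrt(f / b)      # only the v-direction is active
                else:
                    disc = (a + b) * f - a * b * (z1 - z2) ** 2
                    z_new = (z1 * a + z2 * b + np.sqrt(max(disc, 0.0))) / (a + b)
                z[i, j] = z_new
        return z

In the full algorithm this sweep sits inside the outer iteration, which re-estimates F from the updated p, q and depth values before the next pass.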
4. Physically-based Restoration
Once a set of 3D surface points is obtained, we can easily construct a triangular mesh to represent the shape of the warped surface, and the initial 2D image can be registered to the 3D mesh through a texture mapping process. We then make use of a physically-based modeling approach to restore the warped surface. Physically-based modeling [19] is often applied in computer graphics to simulate dynamic deformations of physical materials such as cloth and garments, by applying forces that change their geometric structure in a simulated environment. In contrast, here we start with a deformed mesh and try to restore its planar state by reversing the deformation steps. The de-warping process should obey a set of material constraints that simulate the physical process of flattening.
We model the 3D mesh as a particle system governed by the second-order Newtonian equation $f = ma$, where $f$ is the force, $m$ is the mass of a particle and $a$ is its acceleration. Considering the document as a rigid object, the 3D shape deformation has the property that any change in shape preserves the distances between points on the surface. To model this, we simulate each edge of the mesh as a stick. A stick can be regarded as an infinitely stiff spring, so that it restores instantly from its deformed state to its rest length. We simulate a stick by constraining the Euclidean distance between the two particles connected by an edge to a fixed value, defined as the initial distance between that pair of particles on the deformed mesh. During the simulation, the particles are pushed or pulled to maintain this distance. The numerical simulation is performed using Verlet integration [20], which achieves better numerical stability and efficiency than Euler integration. Figure 4 shows six frames of a typical flattening process. For details of the method, refer to our earlier paper [21].
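The flattening step can be summarized by the short sketch below, which combines position Verlet integration with a few relaxation passes over the edge-length ("stick") constraints. The constant downward force, step size and iteration counts are illustrative values rather than the exact parameters of our implementation, and the target plane z = 0 is modelled simply as a hard floor.

    import numpy as np

    def flatten(pos, edges, rest_len, steps=200, dt=0.05, relax_iters=10):
        """pos: (N, 3) particle positions of the reconstructed mesh.
        edges: (E, 2) index pairs of mesh edges; rest_len: (E,) edge lengths
        measured on the deformed mesh (the stick lengths to preserve)."""
        prev = pos.copy()                      # previous positions for Verlet integration
        gravity = np.array([0.0, 0.0, -1.0])   # constant force pushing particles onto z = 0

        for _ in range(steps):
            # Position Verlet step: x' = 2x - x_prev + a * dt^2 (unit mass).
            new = 2.0 * pos - prev + gravity * dt * dt
            prev, pos = pos, new
            pos[:, 2] = np.maximum(pos[:, 2], 0.0)    # the plane acts as a hard floor

            # Enforce the stick constraints: restore each edge to its rest length.
            for _ in range(relax_iters):
                d = pos[edges[:, 1]] - pos[edges[:, 0]]
                length = np.linalg.norm(d, axis=1, keepdims=True)
                corr = 0.5 * (length - rest_len[:, None]) * d / np.maximum(length, 1e-12)
                np.add.at(pos, edges[:, 0], corr)     # move both endpoints half-way
                np.add.at(pos, edges[:, 1], -corr)
                pos[:, 2] = np.maximum(pos[:, 2], 0.0)
        return pos

    # rest_len is taken from the deformed mesh itself, e.g.:
    # rest_len = np.linalg.norm(pos[edges[:, 1]] - pos[edges[:, 0]], axis=1)

Because Verlet integration stores positions rather than velocities, the hard constraint projections can be applied directly to the positions without destabilizing the integration, which is the main reason it is preferred over Euler integration here.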
Figure 4. Six frames of the flattening simulation.
5. Experimental Results We have tested the proposed approach on both synthetic shading images and real document images. The experiments on synthetic shading images show that the modified SFS algorithm is able to reconstruct various warped surfaces effectively. In addition, real images taken from pages in conference proceedings are tested and the restored images are compared with those obtained from range-scanned surface shapes. The results are encouraging.
5.1. Experiments on Synthetic Images
To generate a synthetic shading image, we first create a warped surface with the function $\hat{z}(x, y) = 3x^3 + 100$, shown in Figure 5(a). The shading image is generated using Eq. (4) with the light source given as $L = (1, 0, -1)$, the focal length set to $f = 200$ and $a = 1$; the camera is assumed to be directly above the center of the surface (a sketch of this synthesis step is given at the end of this subsection). The reconstructed surface produced by our method and its overlay with the original surface demonstrate a close reconstruction. Owing to the fast convergence, we show the result after only 2 iterations.
Figure 5. (a) Synthetic surface $\hat{z}(x, y) = 3x^3 + 100$; (b) Shading image under perspective projection with $L = (1, 0, -1)$; (c) Reconstructed surface from (b); (d) Real surface (green) vs. reconstructed surface (red).
We also compare our results with those presented by Tankus et al. [8] on the perspective reconstruction of the surface $\hat{z}(x, y) = 2\cos\!\big(\sqrt{x^2 + (y - 2)^2}\big) + 100$. Figure 6(a) and (b) show the original surface and the corresponding shading image produced under perspective projection with a vertical light source $L = (0, 0, -1)$. Figure 6(c) and (d) overlay our reconstructed surface and the surface reconstructed by Tankus et al. on the original surface. As can be seen, our method recovers the original surface more closely, with fewer abrupt errors.
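As an illustration of how such synthetic inputs can be produced, the NumPy sketch below evaluates the irradiance model of Eq. (4) directly on a depth map defined over the pixel grid. The grid size, the placement of the point light along the (1, 0, -1) direction and the cubic depth profile are illustrative choices for this example, not the exact settings used to generate Figures 5 and 6.

    import numpy as np

    def render_shading(z_hat, f, u0, v0, light, a=1.0):
        """Evaluate the irradiance model of Eq. (4) for a depth map z_hat(u, v).
        z_hat: 2-D array of scaled depths; light = (alpha, beta, gamma)."""
        alpha, beta, gamma = light
        H, W = z_hat.shape
        v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        up, vp = u - u0, v - v0                       # u' = u - u0, v' = v - v0

        # p = d(ln z)/du, q = d(ln z)/dv (ln z and ln z_hat differ by a constant).
        q, p = np.gradient(np.log(z_hat))             # np.gradient: d/d(row) first, d/d(col) second

        # Surface normal N (Eq. 3) and unnormalized light vector L (Eq. 2).
        N = np.stack([f * p, f * q, -1.0 - up * p - vp * q], axis=-1)
        L = np.stack([alpha - z_hat * up, beta - z_hat * vp, gamma - z_hat * f], axis=-1)
        r = np.linalg.norm(L, axis=-1)                # distance from the light to the surface point

        cos_theta = np.sum(N * L, axis=-1) / (np.linalg.norm(N, axis=-1) * r)
        I = np.clip(cos_theta, 0.0, None) / (a * r ** 2)   # Lambert's law with 1/r^2 attenuation
        return I / I.max()                                  # normalize to [0, 1] for display

    # Example: a cubic depth profile loosely following Section 5.1 (illustrative scaling only).
    H, W, f = 256, 256, 200.0
    x = np.linspace(-1.0, 1.0, W)
    z_hat = np.tile(3.0 * x ** 3 + 100.0, (H, 1)) / f
    img = render_shading(z_hat, f, u0=W / 2, v0=H / 2, light=(50.0, 0.0, -50.0))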
Figure 6. (a) Original surface; (b) Shading image under perspective projection with $L = (0, 0, -1)$; (c) Recovered surface (red) vs. original surface (green) by our method; (d) Recovered surface (red) vs. original surface (green) by Tankus et al.
5.2. Experiments on Real Images
First, warped document images are taken using a normal digital camera, an Olympus C7070 ($f = 518$). These initial images are pre-processed to obtain the shading image using the method described in Section 2. Here we assume no shadowing or inter-reflection, and that the paper is thick enough to avoid show-through from the content on its reverse side. Proper gamma correction is also applied to the shading image when necessary.
Next, the SFS technique is applied to the shading image to reconstruct the surface shape. We first use an orthographic fast marching SFS algorithm to provide an initial estimate of the surface depth. Since close light sources such as a camera flash exhibit noticeable distance attenuation, we fix the distance between the camera and the document surface at 40 cm for the iterative perspective SFS computation. Once the shape is recovered, we construct a triangular mesh from the set of 3D surface points. Finally, the 3D mesh is modelled as a particle system and flattened through a numerical simulation process. Figure 7 shows an example of restoring a page image captured from a thick bound conference proceedings (1280 × 960). Due to the unremoved shading, the restored image may still appear warped; removing this shading distortion to produce better visualization results is left for future work.
Figure 7. (a) Initial warped page image; (b) Extracted shading image; (c) Reconstructed surface (triangular mesh); (d) Restored image.
To show how well the surface shape is recovered using the SFS technique, we compare it with the real shape captured using a 3D scanner. For comparison purposes, the 2D image used to recover the shape is captured with the 3D scanner itself, which has a relatively low resolution of 640 × 480. In practice, normal digital cameras capture images of much higher resolution, so the restored images are better suited for OCR. This is one of the advantages of the proposed method: it can recover shapes from high resolution images without being limited by low resolution 3D scanners.
Figure 8. (a) Initial warped image; (b) Mesh captured using the 3D scanner; (c) Mesh reconstructed using our SFS technique; (d) Restored image using the range-scanned mesh; (e) Restored image using the reconstructed mesh.
Figure 8(b) and (c) show a comparison of the range-scanned mesh and the reconstructed mesh for the image in Figure 8(a). As we can see, the reconstructed mesh is quite close to the range-scanned mesh, although it does not reproduce the actual depth in the spine area. This is because the dark shading in the steep spine region provides too little variation for an accurate shape reconstruction. However, such areas usually contain no text and are therefore not significant for our problem. Finally, Figure 8(d) and (e) show the restored images based on the real range-scanned shape and on our reconstructed shape, respectively. The results based on the reconstructed shape are fairly satisfactory.
Table 1. Comparison of OCR results on the original image and the restored images in Figure 8.

Images         $P_w$     $R_w$     $P_c$     $R_c$
Figure 8(a)    74.6%     88.4%     83.4%     88.6%
Figure 8(d)    86.1%     95.7%     91.2%     97.4%
Figure 8(e)    85.3%     94.6%     89.9%     96.8%
The OCR results on both the original image and the restored images, shown in Table 1, provide further verification, where $P_w$, $R_w$, $P_c$ and $R_c$ denote word precision, word recall, character precision and character recall, respectively. Precision and recall are clearly improved on the restored images compared with the original image, while the performance on the two restored images is similar.
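For reference, a count-based way to compute such precision and recall figures from an OCR transcript and its ground truth is sketched below; this is one common scoring convention and not necessarily the exact evaluation protocol used to produce Table 1.

    from collections import Counter

    def precision_recall(ocr_tokens, truth_tokens):
        """Count-based precision/recall between OCR output and ground truth.
        Works for word lists (word-level) or character lists (character-level)."""
        ocr, truth = Counter(ocr_tokens), Counter(truth_tokens)
        matched = sum((ocr & truth).values())          # multiset intersection = correctly recognized tokens
        precision = matched / max(sum(ocr.values()), 1)
        recall = matched / max(sum(truth.values()), 1)
        return precision, recall

    # Word-level:      precision_recall(ocr_text.split(), truth_text.split())
    # Character-level: precision_recall(list(ocr_text), list(truth_text))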
6. Summary and Conclusions
In this paper, we have presented a framework for restoring warped document images captured with ordinary handheld cameras. We first use a perspective SFS method to recover the surface shape from the shading image extracted from the original 2D image. Next, we construct a triangular mesh from the recovered shape and represent it with a particle-based deformable model. Finally, the reconstructed mesh, together with its texture mapping, is flattened to produce the restored 2D image. Experiments on both synthetic and real images show encouraging results. The framework performs restoration from a single 2D image without the need for any special 3D setup, so high resolution images can be used for subsequent OCR without being constrained by low resolution, and expensive, 3D scanners. Moreover, it does not depend on content information, as many other 2D-based approaches do. With this framework, we can produce better pre-processed images to facilitate further document analysis tasks such as segmentation, script identification and document retrieval. Future work includes handling inter-reflections, removing the shading distortion and, where possible, estimating the lighting direction.
Acknowledgments
This research is supported in part by National University of Singapore URC grant R252-000-202-112 and Agency for Science, Technology and Research (A*STAR) grant 042101-0085.
References
[1] H. Baird. "Document image defect models and their uses." Int. Conf. on Document Analysis and Recognition, pages 730–734, 1993.
[2] Y. Y. Tang and C. Y. Suen. "Image transformation approach to nonlinear shape restoration." IEEE Trans. on Systems, Man, and Cybernetics, 23(1):155–171, 1993.
[3] Y. Weng and Q. Zhu. "Nonlinear shape restoration for document images." IEEE Int. Conf. on Computer Vision and Pattern Recognition, pages 568–573, 1996.
[4] M. Pilu. "Undoing page curl distortion using applicable surfaces." IEEE Int. Conf. on Computer Vision and Pattern Recognition, 1:67–72, 2001.
[5] M. S. Brown and W. B. Seales. "Image restoration of arbitrarily warped documents." IEEE Trans. on Pattern Analysis and Machine Intelligence, 26(10):1295–1306, 2004.
[6] H. Cao, X. Ding, and C. Liu. "A cylindrical model to rectify the bound document image." Int. Conf. on Computer Vision, 2:228–233, 2003.
[7] T. Wada, H. Ukida, and T. Matsuyama. "Shape from shading with interreflections under a proximal light source: distortion-free copying of an unfolded book." Int. Journal of Computer Vision, 24(2):125–135, 1997.
[8] A. Tankus, N. Sochen, and Y. Yeshurun. "A new perspective on shape-from-shading." IEEE Int. Conf. on Computer Vision, 2:862–869, 2003.
[9] B. Fisher, S. Perkins, A. Walker, and E. Wolfart. "Adaptive thresholding." In Hypermedia Image Processing Reference. Department of Artificial Intelligence, University of Edinburgh, 1994.
[10] B. K. P. Horn. "Obtaining shape from shading information." In The Psychology of Computer Vision (P. H. Winston, ed.). New York: McGraw-Hill, 1975.
[11] K. Ikeuchi and B. K. P. Horn. "Numerical shape from shading and occluding boundaries." Artificial Intelligence, 17:141–184, 1981.
[12] R. Zhang, P. S. Tsai, J. E. Cryer, and M. Shah. "Shape from shading: a survey." IEEE Trans. on Pattern Analysis and Machine Intelligence, 21(8):690–706, 1999.
[13] R. Kimmel and J. A. Sethian. "Optimal algorithm for shape from shading and path planning." Journal of Mathematical Imaging and Vision, 14(3):237–244, 2001.
[14] A. Tankus, N. Sochen, and Y. Yeshurun. "Perspective shape-from-shading by fast marching." IEEE Int. Conf. on Computer Vision and Pattern Recognition, 1:43–49, 2004.
[15] E. Prados and O. Faugeras. "Shape from shading: a well-posed problem?" IEEE Int. Conf. on Computer Vision and Pattern Recognition, 2:870–877, 2005.
[16] Z. Y. Zhang. "Flexible camera calibration by viewing a plane from unknown orientations." IEEE Int. Conf. on Computer Vision, pages 666–673, 1999.
[17] E. Prados and O. Faugeras. "Perspective shape from shading and viscosity solutions." IEEE Int. Conf. on Computer Vision, 2:826–831, 2003.
[18] E. Rouy and A. Tourin. "A viscosity solutions approach to shape-from-shading." SIAM Journal on Numerical Analysis, 29(3):867–884, 1992.
[19] D. Baraff. "Rigid body simulation." In An Introduction to Physically-Based Modeling, SIGGRAPH Course Notes, 1994.
[20] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press, 1986.
[21] K. B. Chua, L. Zhang, Y. Zhang, and C. L. Tan. "A fast and stable approach for restoration of warped document images." Int. Conf. on Document Analysis and Recognition, pages 384–388, 2005.