
A Neural Network Scheme for Transparent Surface Modelling∗

Mohamad Ivan Fanany†, Imaging Science and Engineering Lab., Tokyo Institute of Technology
Itsuo Kumazawa‡, Imaging Science and Engineering Lab., Tokyo Institute of Technology
Kiichi Kobayashi§, Advanced Contents Creation Lab., NHK Engineering Service Inc.

Abstract

This paper presents a new neural network (NN) scheme for recovering three-dimensional (3D) transparent surfaces. We view transparent surface modeling not as a separate problem, but as an extension of opaque surface modeling. The main insight of this work is that we simulate transparency not only to generate visually realistic images, but also to recover the object's shape. We incorporate a formulation of transparent surface modeling, based on a ray-tracing framework, into our NN. We compare this ray-tracing method with a texture-mapping method that simultaneously maps silhouette images and smooth-shaded images (obtained from our NN) and textured images (obtained from the teacher image) onto an initial 3D model. By minimizing the error between the output images of our NN and the teacher images, observed in multiple views, we refine the vertex positions of the initial 3D model. We show that our NN can refine an initial 3D model obtained from polarization images and converge to a more accurate surface.

CR Categories: I.3.5 [Computational Geometry and Object Modeling]: Physically based modeling—Algorithm

Keywords: transparent surface, 3D modeling, neural network

1 Introduction

Transparency is a ubiquitous and attractive natural phenomenon to be modelled and rendered. This is because there are many potential advantages in using transparency to simultaneously depict multiple superimposed layers of information. In addition, the driving applications for modeling transparent surfaces are immense, ranging from computer-aided manufacturing to 3D object recognition and 3D object modeling. Recently, medical radiation therapy for the control or cure of cancer has also taken advantage of modeling the transparent 3D distribution of radiation dose. Such transparent surface modeling enables physicians to easily distinguish cancerous tissue from normal tissue and to avoid complications after radiation [Interrante et al. 1997]. Despite recent advances in opaque surface modeling, transparent surface modeling has received relatively little attention. Unfortunately, many successful methods aimed at opaque surfaces fail to deal with transparent surfaces. This is because the perception of a transparent surface is a hard vision problem: a transparent surface lacks body reflection and has only little surface reflection.

∗ This work is supported by The Japan Society for the Promotion of Science.
† e-mail: [email protected]
‡ e-mail: [email protected]
§ e-mail: [email protected]




In addition, it suffers greatly from inter-reflection [Saito et al. 1999]. A scene of transparent surfaces, even very simple and regular ones, lacks naturally-occurring shape cues. The only potential sources of surface shape information are specular highlights, environmental reflections, and refractive distortion, whereas surface depth information is almost completely unavailable [Nakayama et al. 1990].

Only recently have some prospective techniques for modeling transparent surfaces emerged. We categorize these techniques into two groups as follows. The first group elaborates as many surface-related features as possible to explicitly define the surface shape and pose. For example, [Murase 1992] recovers the shape of a water pool's surface from the way it refracts the texture of the bottom of the pool, and [Hata et al. 1996] projects a light stripe onto transparent objects and recovers the surface shape using a genetic algorithm. More recently, polarization has gained popularity in dealing with transparent surfaces due to its effectiveness and simplicity in detecting and measuring specular or surface reflection [Wolff 1987]. But polarization-based techniques face two difficult problems: the lack-of-surface-reflection problem and the ambiguity problem. The first problem is addressed by introducing whole-surface lighting (a photometric sampler) [Saito et al. 1999] or edge lighting (multiple parallel linear lights) [Fanany et al. 2004] for rotationally symmetric objects. The second problem is addressed by introducing additional information such as thermal radiation [Miyazaki et al. 2002], a new view image [Miyazaki et al. 2003], trinocular stereo [Rahmann and Canterakis 2001], or the object's symmetry [Fanany et al. 2004].

The second group elaborates ways to synthesize a realistic image of a transparent object without using any 3D shape information of the surface. For example, [Zongker et al. 1999] and [Chuang et al. 2000] propose a method called environment matting for capturing the optical behavior of a transparent surface from known and controlled backgrounds, for rendering and compositing purposes. [Wexler et al. 2002] extends this idea to obtain the environment matting model from uncontrolled backgrounds. [Matusik et al. 2002] use environment mattes obtained from multiple viewpoints to create novel views by interpolation. [Szeliski et al. 2000] separates the overlapped image of glass plates into two images: a reflected image and a transmitted image.

The first group of techniques relies heavily on real images and is aimed mainly at accurate 3D modeling, whereas the second group relies heavily on synthesized graphical images and is aimed mainly at realistic 2D representation. We believe that the ability to represent realistic synthetic images is beneficial, not only visually, but also for understanding 3D shape. In this paper, we pursue an integrated framework which enables the use of both synthesized graphical images and real images to infer the 3D shape of transparent surfaces. We view it as an extension of our previous work on neural networks for opaque surface modelling [Fanany and Kumazawa 2004b; Fanany and Kumazawa 2004a]. The neural network provides an analytic mapping between the vertex positions of a 3D model and the pixel values inside the projection images of this model.
By minimizing the error between the synthesized projection images of this NN and the teacher images observed in multiple views, we can analytically refine the vertex positions of the initial 3D model using error back-propagation learning. The main contribution of this paper is a new analytic relation between the vertex positions of the 3D model and the pixel values inside the projection image of the model, used to represent a transparent surface.

The organization of this paper is as follows. In Section 2, we explain the analytic mapping from 3D polyhedron vertices to the pixel values of synthesized transparent surface images based on ray tracing. In Section 3, we explain the experimental setup used to acquire the teacher images. In Section 4, we explain the creation of the initial 3D shape. In Section 5, we explain how texture mapping can be used instead of ray tracing to generate synthesized transparent images. In Section 6, we present some experimental results. Finally, in Section 7, we conclude the paper with discussions and future plans.

2 Problem Formulation

In this section, we establish a relation between the 3D vertices of a triangle, V_k (k = 0, 1, 2), and the value of a pixel f(x, y) inside the projection image of this triangle. In computer graphics, this relation is known as the rendering problem. Here, however, our genuine interest is not only to render the triangle but to actually 'learn' (modify) the triangle's vertices based on the error between the pixel values of its projection image and a given teacher image. For that purpose, we devise an analytic relation between these two variables so that the vertex positions can be learned through error back-propagation. In our framework, the rendering problem is the forward mapping process, which is then followed by back-propagation learning.

2.1 Opaque Surface Mapping and Learning

If the triangle is opaque, changes in vertex position give rise to a different surface normal N, which in turn gives rise to a different pixel value F(x, y) due to a given light source pointing in direction L and ambient/diffuse light A spread through the scene. We may write this relation as

\[ \{V_0, V_1, V_2\} \Longleftrightarrow N \Longleftrightarrow \rho\lambda\,(N \cdot L) + A \Longleftrightarrow F(x, y), \tag{1} \]

where ρ is the surface reflectance and λ is the intensity of the illuminant.

In the forward mapping process, we first feed the triangle vertex positions V_k into our NN. The NN then uses three sigmoid gates, which mimic AND-gate functions, to decide whether the pixel under observation lies inside the triangle. If it does, the NN assigns the value of another sigmoid unit placed at its output, i.e., f(x, y), as the value of that pixel. If the sigmoid gain is set sufficiently high, this produces a nearly flat intensity surface. The f(x, y) is then superimposed with ρλ(N · L) + A to give F(x, y). A smooth-shaded representation (Gouraud shading) of F(x, y), i.e., S(x, y), can be added to give more flexibility and stability during learning [Fanany et al. 2002].

In the backward learning process, we measure the error E = (F(x, y) − G(x, y))^2 + (S(x, y) − G(x, y))^2, where G(x, y) is the pixel value of the teacher image, and back-propagate it to update V_k as

\[ V_k^{m} = V_k^{m-1} - \varsigma\,\frac{\partial E}{\partial V_k} + \mu\,\Delta V_k^{m-1} \qquad (k = 0, 1, 2), \tag{2} \]

where ς is the learning rate and µ is the momentum constant. The complete derivation of ∂E/∂V_k can be found in [Fanany and Kumazawa 2004b; Fanany and Kumazawa 2004a].
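To make the forward and backward passes concrete, the following minimal NumPy sketch illustrates a soft inside-triangle test built from three sigmoid gates and the vertex update of Equation 2. The function names, the signed-edge-distance formulation, and the gain value are illustrative assumptions rather than our exact network.

import numpy as np

def sigmoid(z, gain=50.0):
    # A high gain approximates a hard step, giving a near-flat intensity inside the triangle.
    return 1.0 / (1.0 + np.exp(-gain * z))

def soft_inside(p, v0, v1, v2, gain=50.0):
    # Differentiable test that pixel p lies inside the projected triangle (v0, v1, v2);
    # each sigmoid acts as an AND-gate term on one signed edge distance.
    def edge(a, b):
        # Positive when p lies on the left side of edge a->b (counter-clockwise triangle).
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (sigmoid(edge(v0, v1), gain) *
            sigmoid(edge(v1, v2), gain) *
            sigmoid(edge(v2, v0), gain))

def update_vertex(v_prev, grad_E, delta_prev, learning_rate=1e-9, momentum=0.9):
    # Gradient-plus-momentum update of Eq. (2):
    # V_k^m = V_k^{m-1} - learning_rate * dE/dV_k + momentum * delta_V_k^{m-1}
    delta = -learning_rate * grad_E + momentum * delta_prev
    return v_prev + delta, delta

# Example: membership of pixel (2, 1) in a counter-clockwise triangle (close to 1.0).
inside = soft_inside((2.0, 1.0), (0.0, 0.0), (4.0, 0.0), (0.0, 4.0))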

2.2 Transparent Surface Mapping and Learning

If the triangle is transparent, changes in vertex position also give rise to a different surface normal N, which in turn gives rise to a different pixel value F(x, y) due to the reflection R and transmission T of the light ray at that pixel. We may write this relation as

\[ \{V_0, V_1, V_2\} \Longleftrightarrow N \Longleftrightarrow R + T \Longleftrightarrow F(x, y). \tag{3} \]

The reflected and transmitted ray directions are

\[ R = u - 2\,(u \cdot N)\,N, \tag{4} \]

\[ T = \frac{\eta_i}{\eta_r}\,u - \left(\cos\theta_r - \frac{\eta_i}{\eta_r}\cos\theta_i\right) N, \tag{5} \]

\[ \cos\theta_r = \sqrt{1 - \left(\frac{\eta_i}{\eta_r}\right)^2 \left(1 - \cos^2\theta_i\right)}, \tag{6} \]

where u is the incoming ray direction as viewed from the camera center, cos θi = −û · N̂ and cos θr = −T̂ · N̂, and ηi and ηr are respectively the refraction indices of the incident and refracting materials [Hearn and Baker 1998].

In the forward mapping process, the computation up to producing f(x, y) is the same as in the opaque surface mapping. However, we now map R + T instead of ρλ(N · L) + A to give F(x, y). In the backward learning process, we measure the error E = ((R(x, y) + T(x, y)) − G(x, y))^2, where G(x, y) is the pixel value of the teacher image, and back-propagate it to update V_k using Equation 2. This time, when we compute ∂E/∂V_k, we also have to include ∂E/∂(R + T).
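A minimal NumPy sketch of Equations (4)-(6) is given below. It assumes unit-length direction and normal vectors; the function names and the handling of total internal reflection are illustrative assumptions.

import numpy as np

def reflect(u, n):
    # Eq. (4): mirror the incoming direction u about the unit normal n.
    return u - 2.0 * np.dot(u, n) * n

def refract(u, n, eta_i, eta_r):
    # Eqs. (5)-(6): refraction of unit direction u at a surface with unit normal n.
    eta = eta_i / eta_r
    cos_i = -np.dot(u, n)                       # cos(theta_i) = -u . n
    k = 1.0 - eta**2 * (1.0 - cos_i**2)         # radicand of Eq. (6)
    if k < 0.0:
        return None                             # total internal reflection: no transmitted ray
    cos_r = np.sqrt(k)                          # Eq. (6)
    return eta * u - (cos_r - eta * cos_i) * n  # Eq. (5)

# Example: normal incidence from air (1.0) into glass (1.5).
u = np.array([0.0, 0.0, -1.0])   # ray travelling toward the surface
n = np.array([0.0, 0.0, 1.0])    # surface normal facing the camera
r = reflect(u, n)                # -> [0, 0, 1]
t = refract(u, n, 1.0, 1.5)      # -> [0, 0, -1], straight-through transmission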

3 Image Acquisition

To supply teacher images, we acquired nine images of a coca-cola bottle filled with water, viewed from different viewpoints using a high resolution (HDTV) camera. The viewpoint changes include horizontal rotations of the object, which is placed on a turntable (−30°, 0°, +30°), and vertical rotations of the camera (−10°, 0°, +10°). We want to analyze how a regular pattern, such as a checkerboard, appears on the front surface of the transparent object when the pattern is placed behind the object. At each viewpoint we take an image of the object with the background and also an image of the background only, as shown in Figure 1. In this setting, we have to pull the object image from its background [Smith and Blinn 1996]. For this purpose, we subtract the blue-screen matte background from the image for each view. There is some color spill, i.e., reflection of the backlight on the foreground object at grazing angles due to the Fresnel effect; we simply remove it manually. We do not need to develop a fully automatic process for this, as done by [Matusik et al. 2002], because we have a 3D model that gives the opacity hull for any view. In [Matusik et al. 2002], the opacity hull construction had to be fully automatic to allow the interpolation of view images, due to the absence of a 3D model. As for the light source positions, we used two point light sources, at the left and the right of the camera. In illuminating the object we heuristically tried to reduce shadows and specular reflections; this setting is aimed at viewing the background through the bottle as clearly as possible. The focus of the camera was set between the object and the background, so both were adequately in focus. We also drew several marks on the object with a marker pen to be used as corresponding feature points for camera calibration with a self-calibration method [Faugeras et al. 1992].
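As an illustrative sketch of the matte-pulling step, a per-pixel difference between the object image and the background-only image can produce a binary object mask; the function name and threshold value are assumptions, not our exact procedure.

import numpy as np

def pull_object(fg, bg, threshold=25.0):
    # Return a binary object mask and the pulled object image.
    # fg: object-with-background image, bg: background-only image (H x W x 3 arrays).
    diff = np.linalg.norm(fg.astype(np.float32) - bg.astype(np.float32), axis=-1)
    mask = (diff > threshold).astype(np.float32)   # pixels differing from the background
    return mask, fg * mask[..., None]

# Usage (per view): mask, pulled = pull_object(object_image, background_image);
# remaining color spill at grazing angles is removed manually afterwards.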

Figure 1: A teacher image with background (a) and the background image (b).

Figure 2: Ray traced model images: (a) triangle, (b) icosahedron, (c) coca-cola bottle; (d) texture mapped model.

4 Initial 3D Shape Creation

It is desirable to start the NN learning from a closed initial 3D shape model. Creating such a model, even with help from an expert user or artist, is difficult and inefficient. Fortunately, we have previously developed a method to reconstruct the same object from single-view polarization images [Fanany et al. 2004]. This method selects accurate reference points near the occluding boundary of a rotationally symmetric object and rotates those reference points to produce the shape of the object. The selection process is performed using an induction decision tree, which simultaneously uses the object's symmetry, the Brewster angle, and the degree of polarization. This method works well when a sufficient degree of polarization is available. But, according to our experiments, it is very difficult to deliver a sufficient degree of polarization to all the boundary points, especially when the occluding boundary profile is complicated, as is the profile of the coca-cola bottle. Some points on the boundary may escape the light source and fail to give high specular reflection, i.e., a sufficient degree of polarization.
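The initial model is thus essentially a surface of revolution built from the selected reference points. A minimal sketch of this rotation step is given below; the profile values and function names are illustrative assumptions, and the decision-tree selection itself is not shown.

import numpy as np

def revolve_profile(profile_rz, n_slices=36):
    # Rotate a 2D boundary profile of (radius, height) pairs about the vertical axis
    # to produce the vertices of an initial, rotationally symmetric 3D model.
    angles = np.linspace(0.0, 2.0 * np.pi, n_slices, endpoint=False)
    vertices = []
    for r, z in profile_rz:
        for a in angles:
            vertices.append((r * np.cos(a), r * np.sin(a), z))
    return np.asarray(vertices)

# Example: reference points (radius, height) selected near the occluding boundary
# (hypothetical bottle-like profile).
profile = [(3.0, 0.0), (3.2, 2.0), (2.0, 5.0), (1.2, 8.0)]
initial_vertices = revolve_profile(profile)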

5 Ray Tracing vs. Texture Mapping

Ray tracing provides a framework for physically accurate rendering that can faithfully depict realistic visual scenes by simulating a wide range of physical phenomena of light rays, such as refraction, reflection, shadows, and polarization [Wolff and Kurlander 1990]. However, the basic ray tracing algorithm is limited to sharp reflections, sharp refractions, and sharp shadows [Cook et al. 1988]. Physically accurate rendering algorithms are difficult to implement and time-consuming to compute. Texture mapping, on the other hand, was originally intended more as a means of improving rendering efficiency than as a device for improving the comprehensibility of surface shape, although it can be argued that the best texture mapping methods serve both purposes well [Interrante et al. 1997]. In this paper, we consider both approaches for representing the generated 3D model, which is immersed in the teacher's background image.

As for ray tracing, our method has both merits and demerits in its implementation. The merits include practicality, in that we do not need to explicitly compute ray-model intersections, because they are computed intrinsically by our NN. The demerits include the impracticality of providing accurate light source positions, expensive pixel-by-pixel computation, and the loss of surface characteristics that are not simulated, such as self-shadows. As for texture mapping, our method also has both merits and demerits. The merits include readily available pixel colors to be mapped and well preserved surface characteristics, as our eyes can see them. The demerits include much unavoidable noise, such as specularities.

For experimental comparison, we constructed a cheap ray tracing procedure to save computation time. This ray tracing only traces the light transmitted by the surface that is visible from the camera center. It also does not trace the light back to the light sources, but only until it intersects the background image. Such cheap ray tracing works well for a simple shape such as a triangle, but becomes more erroneous as the shape becomes more complicated, such as an icosahedron or the initial 3D bottle shape, as shown in Figure 2. These ray traced model images (Figure 2(a-c)) are compared to the texture mapped model image in Figure 2(d).

Instead of tracing the light through the scene, which is prohibitively expensive, we also investigate another method based on texture mapping. We simultaneously use flat shaded, smooth shaded, and texture images. We consider the flat shaded image as representing the silhouette shape, the smooth shaded image as representing the opaque shape, and the texture as representing the transparent shape. The formulation is as follows. If we used only the texture, then in many places in the projection image the error E = ((R(x, y) + T(x, y)) − G(x, y))^2 would be zero, because R(x, y) + T(x, y) is equal to G(x, y). To alleviate this problem, we include F(x, y) and S(x, y) in the error calculation, so it is measured as E = (F(x, y) − G(x, y))^2 + (S(x, y) − G(x, y))^2 + ((G(x, y) + F(x, y) + S(x, y))/3 − G(x, y))^2. The last term serves as a regularization factor. Here F(x, y) is the flat shaded output and S(x, y) is the smooth shaded output; both are shown in Figure 3. Even though the underlying formulation is the ray tracing formulation, we believe that texture mapping can in principle also be used.
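A minimal NumPy sketch of this combined error is given below; the function name and the array-based formulation are illustrative assumptions.

import numpy as np

def combined_error(F, S, G):
    # Per-pixel error used with the texture-mapping variant:
    # flat-shaded term, smooth-shaded term, and a regularizing average term.
    # F: flat shaded output, S: smooth shaded output, G: teacher image (same-shape arrays).
    flat_term = (F - G) ** 2
    smooth_term = (S - G) ** 2
    # The mapped texture reproduces G, so the third term averages G, F and S against G.
    reg_term = ((G + F + S) / 3.0 - G) ** 2
    return np.sum(flat_term + smooth_term + reg_term)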

6 Results

We performed two experiments for the reconstruction of the coca-cola bottle. First, we compared the ray tracing method and the texture mapping method (which simultaneously uses flat shaded, smooth shaded, and texture mapped images). Second, we compared the results of plain learning and Simulated Annealing (SA) [Fanany and Kumazawa 2003] within the ray tracing method. We set a fixed learning rate (1.0E-9) for both experiments. For the first experiment, we ran the training for 500 epochs; the results after 0, 50, 100, and 500 epochs are shown in Figure 4. The error profiles of both methods are shown in Figures 5 and 6. For the second experiment, we used the same learning rate of 1.0E-9 and performed simulated annealing (SA) with an initial temperature of 1.0E7, a cooling rate of 0.99, and a momentum of 0.9. The error profile is shown in Figure 7. We also measured the computation time (for 100 epochs); the two experiments are summarized in Table 1.

From the first experiment, we observed that the ray tracing method gives a lower error and a visually better result than the texture mapping method, but requires more computation time. From the error profiles, we also observed that ray tracing learning was more stable than texture mapping learning. From the second experiment, we observed that the annealing method improves the convergence rate of the ray tracing method. It is important to note that, since ours is a multiple-view image system, a sufficient number of view images is necessary to obtain a good overall reconstruction. Furthermore, learning across all views should be balanced: over-training on some specific views may cause under-training on the unlearned views.

Figure 3: An image of flat shaded model (a) and an image of smooth shaded model (b).

Table 1. Results summary.

                        Ray Tracing   Texture Mapping   Ray Tracing (SA)
Lowest relative error   0.2097        0.4105            0.1884
Computation time        31.74 s       21.81 s           31.92 s
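A minimal sketch of a geometric cooling schedule with these settings is given below; the Metropolis-style acceptance rule and the names used are illustrative assumptions rather than the exact procedure of [Fanany and Kumazawa 2003].

import numpy as np

rng = np.random.default_rng(0)

def accept(error_new, error_old, temperature):
    # Always accept improvements; accept a worse set of vertices with a
    # probability that shrinks as the temperature cools.
    if error_new <= error_old:
        return True
    return rng.random() < np.exp((error_old - error_new) / temperature)

# Geometric cooling with the settings reported above.
temperature = 1.0e7      # initial temperature used in the experiment
cooling_rate = 0.99      # cooling rate used in the experiment
for epoch in range(100):
    # ... one back-propagation epoch would run here, proposing new vertices ...
    temperature *= cooling_rate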

Figure 4: The reconstruction results after (a) 0 epochs, (b) 50 epochs, (c) 100 epochs, and (d) 500 epochs. The upper figures are the ray tracing method results, while the lower figures are the texture mapping results.

Figure 5: The relative error plot of the ray tracing method.

Figure 6: The relative error plot of the texture mapping method.

Figure 7: The relative error plot of the ray tracing method using plain learning and simulated-annealing learning.

7 Conclusion

In this paper, we presented a new analytic mapping between 3D vertex positions and the pixel values of synthesized transparent images. Such an analytic mapping enables the modification, or learning, of the 3D vertices based on the error between pixel values in the teacher images and in the synthesized images. We formulated a ray tracing learning method within our NN. We also formulated a texture mapping learning method that can be used in our NN instead of the ray tracing method. Experimentally, the ray tracing method gave a lower error and a better result than the texture mapping method, but required additional computation. In addition, the ray tracing method was also more stable than the texture mapping method. We believe that our method will further open ways for a practical integration of computer vision and computer graphics through neural network learning. In particular, we hope that our method will be beneficial not only for synthesizing and rendering realistic transparent surfaces but also for modeling 3D transparent surfaces from 2D images. To improve the results, further modifications of the algorithm and an increased number of views are still necessary.

References

CHUANG, Y.-Y., ZONGKER, D. E., HINDORFF, J., CURLESS, B., SALESIN, D., AND SZELISKI, R. 2000. Environment matting extensions: towards higher accuracy and real-time capture. In SIGGRAPH, 121–130.


COOK, R. L., PORTER, T., AND CARPENTER, L. 1988. Distributed ray tracing. 139–147.

FANANY, M. I., AND KUMAZAWA, I. 2003. SA-optimized multiple view smooth polyhedron representation NN. In Discovery Science, 306–310.

FANANY, M. I., AND KUMAZAWA, I. 2004. Multiple-view shape extraction from shading as local regression by analytic NN scheme. Mathematical and Computer Modelling 40, 9-10, 959–975.



FANANY, M. I., AND KUMAZAWA, I. 2004. A neural network for recovering 3D shape from erroneous and few depth maps of shaded images. Pattern Recognition Letters 25, 4, 377–389.

FANANY, M. I., OHNO, M., AND KUMAZAWA, I. 2002. A scheme for reconstructing face from shading using smooth projected polygon representation NN. In Proc. of the IEEE ICIP (volume II), 305–308.

FANANY, M. I., KOBAYASHI, K., AND KUMAZAWA, I. 2004. A combinatorial transparent surface modeling from polarization images. In IWCIA, 65–76.

FAUGERAS, O. D., LUONG, Q.-T., AND MAYBANK, S. J. 1992. Camera self-calibration: Theory and experiments. In European Conference on Computer Vision, 321–334.

HATA, S., SAITOH, Y., KUMAMURA, S., AND KAIDA, K. 1996. Shape extraction of transparent object using genetic algorithm. In ICPR96, D93.6.

HEARN, D., AND BAKER, M. 1998. Computer Graphics: C Version. Prentice Hall, Upper Saddle River, NJ.

INTERRANTE, V., FUCHS, H., AND PIZER, S. 1997. Conveying the 3D shape of transparent surfaces via texture. Tech. Rep. TR97-27.

MATUSIK, W., PFISTER, H., ZIEGLER, R., NGAN, A., AND MCMILLAN, L. 2002. Acquisition and rendering of transparent and refractive objects. In Rendering Techniques, 267–278.


FANANY, M. I., AND K UMAZAWA , I. 2004. A neural network for recovering 3d shape from erroneous and few depth maps of shaded images. Pattern Recognition Letters 25, 4, 377–389. FANANY, M. I., O HNO , M., AND K UMAZAWA , I. 2002. A Scheme for Reconstructing Face from Shading using Smooth Projected Polygon Representation NN. In Proc. of the IEEE ICIP (volume II), 305–308. FANANY, M. I., KOBAYASHI , K., AND K UMAZAWA , I. 2004. A combinatorial transparent surface modeling from polarization images. In IWCIA, 65–76. FAUGERAS , O. D., L UONG , Q.-T., AND M AYBANK , S. J. 1992. Camera self-calibration: Theory and experiments. In European Conference on Computer Vision, 321–334. H ATA , S., S AITOH , Y., K UMAMURA , S., AND K AIDA , K. 1996. Shape extraction of transparent object using genetic algorithm. In ICPR96, D93.6. H EARN , D., AND BAKER , M. 1998. Computer graphics: C version, prentice hall, upper saddle river. In NJ 98, 1997. I NTERRANTE , V., F UCHS , H., AND P IZER , S. 1997. Conveying the 3D shape of transparent surfaces via texture. Tech. Rep. TR97-27. M ATUSIK , W., P FISTER , H., Z IEGLER , R., N GAN , A., AND M C M ILLAN , L. 2002. Acquisition and rendering of transparent and refractive objects. In Rendering Techniques, 267–278.


RAHMANN, S., AND CANTERAKIS, N. 2001. Reconstruction of specular surfaces using polarization imaging. In CVPR01, I:149–155.

SAITO, M., KASHIWAGI, H., SATO, Y., AND IKEUCHI, K. 1999. Measurement of surface orientations of transparent objects using polarization in highlight. In CVPR, 1381–.

SMITH, A. R., AND BLINN, J. F. 1996. Blue screen matting. In SIGGRAPH '96: Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM Press, New York, NY, USA, 259–268.

SZELISKI, R., AVIDAN, S., AND ANANDAN, P. 2000. Layer extraction from multiple images containing reflections and transparency. In CVPR, 1246–.

WEXLER, Y., FITZGIBBON, A. W., AND ZISSERMAN, A. 2002. Image-based environment matting. In Rendering Techniques, 279–290.

WOLFF, L. B., AND KURLANDER, D. J. 1990. Ray tracing with polarization parameters. IEEE Comput. Graph. Appl. 10, 6, 44–55.

WOLFF, L. B. 1987. Shape from polarization images. In CVWS87, 79–85.

ZONGKER, D. E., WERNER, D. M., CURLESS, B., AND SALESIN, D. 1999. Environment matting and compositing. In SIGGRAPH, 205–214.