Closed-Form Attitude Determination Under Spectrally Varying Illumination

Mark S. Drew
School of Computing Science
Simon Fraser University
Vancouver, B.C., Canada V5A 1S6
e-mail: [email protected]

Leonid L. Kontsevich
Smith–Kettlewell Eye Research Institute
2232 Webster St.
San Francisco, CA 94115
e-mail: [email protected]

Centre for Systems Science/Laboratory for Computer and Communications Research
Technical Report CSS/LCCR TR 94–02

Abstract
Interreflected light is known to account for a substantial amount of total illumination content in a closed environment. But often such light is colored, leading to a light environment at any point on a surface which changes spectrally depending on the orientation of the surface normal at that point. A similar effect occurs when several sources of light are present, e.g. artificial light combined with daylighting. For such area sources, provided the equivalent lighting matrix arises from lights that are linearly independent in space as well as in color space, the RGB color is linearly related to the surface normal in a Lambertian model. While surface normals lie on the Gaussian sphere, colors lie on an ellipsoid in color space. However, not all points see all lights. Therefore there are many outliers not on the ellipsoidal shell. A robust regression method can, however, reliably find the correct ellipsoid. Nothing need be known about the light environment, and surface normals can thus be recovered from colors. However, such an algorithm recovers normals only up to an overall orthogonal transformation. This rotation can nevertheless be recovered by imposing an integrability condition. Here, we show how to use a smoothness criterion to solve in closed form for the correct rotation and thus automatically determine object attitude. The method is applied to synthetic and real images. We show that a well-known test image was probably taken under spectrally varying illumination. © 1994 M.S. Drew and L.L. Kontsevich
1   Introduction
Consider a single, uniformly colored Lambertian surface illuminated by a set of distant lights that vary spectrally with orientation from the surface. If the lights were separated spectrally then the situation would be exactly equivalent to photometric stereo. However, for non-structured lights, with non-separated chromatic properties, the problem is different. For in this case the colors of the lights are important and not merely a convenience for separating what are effectively three graylevel images. Within the photometric stereo paradigm, Woodham et al. [1] pointed out that, given three graylevel images, it is still possible to recover surface orientation up to a rotation if the surface is assumed to be Lambertian even if the light positions are not known a priori. For because of the linear relationship between the surface normal and the 3-tuple of measured graylevels, the Gaussian sphere of normals is transformed into an ellipsoid. Petrov [2, 3, 4, 5] considered a collection of chromatic lights or extended sources with chromatic properties that create a light environment that varies spectrally with orientation. Such an environment arises in many situations. E.g., in an office with overhead lighting as well as daylighting, lights with quite different spectral characteristics impinge from different directions on each surface point. As well, interreflected light is an important contributor to the total intensity and such light is typically colored. Petrov first described the linear shading model set out below. In it, just as in photometric stereo the surface normal is related linearly to a 3-tuple, but in this case that vector is simply the color. His work is an extension to the case of three or more illumination sources of the work of Nikolaev [6, 7]. Nikolaev had proposed a color segmentation scheme based on the normal to a plane in color space spanned by colors formed by a pair of light sources reflected from a colored object (see [8]). 
For more than two lights, Petrov’s model decomposes an arbitrary illumination environment into three basic light sources. In [9], Kontsevich et al. put forward a segmentation scheme based on Petrov’s model, reporting work published earlier in Russian [10]. In this scheme, local regions are assumed to obey the linear model and are grown outward until a tolerance is reached for the current model parameters. The linear model says that color and surface normal are related by a linear transformation. Therefore if one knows the model parameters, surface orientation is recoverable from color. However, the recovery is only up to an unknown overall rotation, as in [1]. Ref. [9] provides a solution to this problem: by applying an integrability condition it is possible to recover the surface normals not only up to an overall unknown rotation, but also to calculate that rotation. In [11], Brill also considered the linear relationship between color and orientation for Lambertian pixels. Invertibility of the linear relationship is dependent on the transformation being of full rank, i.e., 3. In general, one may define the rank of a given surface region as the rank of the 3 × N set of RGB responses for the surface [11]. In [12] Brill pointed out that, if one somehow knew the values of the normals at three pixels, and used Petrov’s model, one could determine surface normals with no arbitrary rotation coming into play. Here, we adopt Brill’s definition of rank. However, it is important to realize that within the linear color model a problem arises from real, non-convex or convex surfaces: the rank may be 3 in several surface patches while the RGB’s still arise from different sets of lights in each patch. This is because each surface patch sees a different set of illuminants due to self-shadowing from some of the lights. I.e., if a patch is beyond the occluding boundary from a light, it no longer sees that light. 
If there were only three lights, then that partially hidden surface patch would have only rank 2. On the other hand, with a larger collection of lights the colors for a patch may still be a full-rank (i.e., rank 3) transformation away from the surface normals. The task of a recovery algorithm is to decide which rank-3 matrix is most appropriate.
In [13] Petrov and Kontsevich provide a complete classification of surface patches based on the rank of the illuminating lights and on the rank of the illuminated surface. In [14], Drew applied the statistical analysis of [1] to single color images. In [15, 16], Drew examined how the model behaves when specularities are added. The linear relationship between normal and color is broken for pixels with specular content. However, one can still recover orientation for non-specular pixels by finding the non-specular ellipsoid in color space by using a robust regression. At the same time, specular pixels are identified. In that work [16], two types of outliers to the ellipsoidal shell were identified: specularities are one type, and the other type comes from pixels that see only a subset of the illuminating lights. In order to find that rank–3 ellipsoid corresponding to the greatest number of image points, it is most accurate to employ a robust regression technique. Here we use a Least Median of Squares (LMS) regression [17]. In situations in which at least half the data lie on the same color ellipsoid, the LMS method will reliably choose the correct surface, whereas the Least Squares (LS) method can easily fail, particularly when there are many high-intensity off-shell pixels [16]. The LMS method provides excellent resilience to noise and can be used for hypothesis testing on how well the model is obeyed and for constructing confidence limits on each coefficient in the regression. As well, the LMS method provides a non–ad hoc method for determining which pixels are outliers. Here we use the method to determine the best linear matrix relationship between normals and colors, and then apply robust outlier detection to effectively segment the image into those pixels that correspond, or do not correspond, to the most important rank–3 segment.
Once surface orientation is known for a majority of surface points, a coupled depth/slope interpolation method can be used to derive a complete depth map. However, a problem with the color ellipsoid method is that it returns surface normals only up to a single overall rotation. This problem was addressed in principle in [9] by imposing an integrability condition to register recovered surface normals with the camera plane. But in that work, the correct rotation was approximated by an iterative scheme based on random steps in the 3–dimensional space of all possible rotations. The main aim of the present paper is to provide an approximation based on closed-form methods. Determination of the rotation amounts to determination of attitude for the imaged object. Here, recovery of the correct rotation is carried out in two steps. In the first step, the very complex integrability condition is replaced by a smoothness criterion. Optimization with respect to this smoothness criterion results in Euler equations that take the form of an eigenvector problem and are thus easily solved in closed form. This first step reliably finds that rotation direction that maximizes the components of all the surface normals in the direction toward the camera. Once this direction is known, the full integrability condition can be optimized simply by solving for the angle giving the best rotation in the plane perpendicular to that direction. In Section 2 the linear relationship between normals and colors for Lambertian surfaces under orthography is recapitulated. We show how mutual illumination can be responsible for creating a spectrally varying illumination environment. In Section 3 we set out the optimization problem for determining a closed-form solution for determining attitude. Section 4 investigates how well the method does for both synthetic and real images. 
As a test that can be compared to other depth-recovery methods, we consider a standard test image in computer vision, taken from a commercial photograph. We show that, with high probability, this image was actually taken under spectrally varying illumination; to some degree, this finding calls into question previous results in shape-from-shading that make use of this test image as if it had been taken under a single illuminant. Section 5 sets out conclusions and possible extensions.
2   Linear Shading Model
Consider the RGB camera responses ρ for a given pixel. This color vector arises from the filtering effect of three camera system sensor response functions Q(λ) on the incoming color signal, the product of illumination E(λ) and surface spectral reflectance S(λ). In a Lambertian model, for a uniform surface under distant lighting and distant viewing the shading simply equals the cosine of the angle between the (normalized) light source direction a and the (normalized) surface normal n, which is a^T n (where T means transpose). This leads to a color vector

ρ = ∫ E(λ) S(λ) Q(λ) (a^T n) dλ .   (1)

Now consider an area source such that the light spectrum varies with direction from the surface. E.g., an ordinary luminaire can have spectral characteristics that vary across the glass globe [18], or mutual illumination can provide such a lighting environment [19, 20]. Let us approximate an area source by a collection of L discrete sources. To maintain the linear relationship between n and ρ in eqn. (1), lighting conditions have to be such that the surface is effectively illuminated by a distant hemisphere of illumination. In this case, we must replace (1) by a sum:

ρ = Σ_{i=1..L} ∫ E_i(λ) S(λ) Q(λ) (a_i^T n) dλ .   (2)
Each illuminant produces a different color when reflected from the surface. From (2), this color is

b_i ≡ ∫ E_i(λ) S(λ) Q(λ) dλ .   (3)
Then we can rewrite eqn. (2) as

ρ = B A n ,   (4)

where we stack all the directions a_i row-wise into a matrix A and group the strength–direction color space vectors b_i column-wise into a matrix B:

A = (a_1, a_2, . . . , a_L)^T ,   B = (b_1, b_2, . . . , b_L) .   (5)

Here, B is a 3 × L matrix and A is L × 3. Let F ≡ B A, so that (4) becomes simply

ρ = F n .   (6)
This is a linear model relating surface normal to color. The model breaks down when not every illuminant can be seen from a surface area. This is the case when the surface is in shadow or is self-shadowed. To solve this equation for the variable of interest, the surface normal n, the matrix F must be invertible. For this to be the case, neither matrix B nor matrix A can be rank-reduced. I.e., neither can have rank less than 3. This means that all the light source directions A must not be coplanar (or, as a special case, collinear) in space; and similarly the collection of reflected colors B must be 3-dimensional in color space. If the matrix F is invertible, the surface normal n is linearly related to RGB color ρ. However, since eqn. (6) involves a 3 × 3 matrix multiplication, one can only expect to recover the matrix F and the vectors n up to an overall orthogonal transformation R, since such a transformation could be inserted after F and the inverse R^T before n. To determine the rotation, additional knowledge must be injected into the model.
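The construction of F = BA and the recovery n = F⁻¹ρ can be sketched numerically; all lights and colors below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# L = 5 distant lights: directions a_i (approximately unit rows of A) and
# reflected colors b_i (columns of B).  All numbers are hypothetical.
A = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8],
              [-0.5, 0.5, 0.707],
              [0.3, -0.4, 0.866]])           # L x 3 light directions
B = rng.uniform(0.2, 1.0, size=(3, 5))       # 3 x L reflected colors

F = B @ A                                    # 3 x 3 lighting matrix, eqns (4)-(6)
assert np.linalg.matrix_rank(F) == 3         # invertibility needs rank 3

n = np.array([0.0, 0.0, 1.0])                # a surface normal facing the camera
rho = F @ n                                  # color predicted by the model
n_rec = np.linalg.solve(F, rho)              # recover the normal, eqn (7)
assert np.allclose(n_rec, n)
```

Degenerate configurations (coplanar directions in A, or colors B spanning only a plane in color space) would make F rank-deficient and the solve ill-posed, as the text notes.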
2.1   Deriving rotated normals
Let the inverse of F be denoted G. Then

n = G ρ .   (7)

Since n is unit length, we have

1 ≡ n^T n = ρ^T G^T G ρ ≡ ρ^T C ρ .   (8)

Since C is G^T G, it is 3 × 3 and symmetric positive definite. Therefore it consists of 6 independent elements. If we can find C we have determined the 9 elements of F up to the 3 degrees of freedom corresponding to an unknown orthogonal transformation. The quadratic form (8) comprises an ellipsoid centered on the origin, fixed by 6 parameters; it can be written

c11 ρ1² + c22 ρ2² + c33 ρ3² + 2c12 ρ1 ρ2 + 2c13 ρ1 ρ3 + 2c23 ρ2 ρ3 = 1   (9)

in terms of the measured values of ρ = (ρ1, ρ2, ρ3)^T for each pixel. Since we have N such equations, one for each pixel, a regression can be used to solve for the matrix entries c_ij. We cast (9) into the form of a standard linear matrix equation by collecting observations in an N × 6 matrix M, if there are N pixels. For each pixel, let M have columns (ρ1², ρ2², ρ3², 2ρ1ρ2, 2ρ1ρ3, 2ρ2ρ3). Denote by z the best approximation to the matrix elements c_ij, where z is a 6-component object z = (c11, c22, c33, c12, c13, c23)^T. Then eqn. (9) reads

M z ≃ 1 ,   (10)

with M of size N × 6, z of size 6 × 1, and the right-hand side of size N × 1. We find the best hyperplane z by using the robust LMS regression. Once one has z and hence C, the matrix G is determined up to a rotation. Any root of C will do for G, therefore, and here we use the polar decomposition method of [21], which is based on the Singular Value Decomposition (SVD).
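The regression of eqn. (10) can be sketched as follows. For brevity this sketch uses plain least squares on noiseless synthetic data in place of the robust LMS fit, and a symmetric square root in place of the SVD-based polar decomposition of [21]; the matrix F is made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground-truth F; colors of unit normals lie on the ellipsoid
# rho^T C rho = 1 with C = (F F^T)^{-1}.
F = np.array([[2.0, 0.3, 0.1],
              [0.2, 1.5, 0.4],
              [0.1, 0.2, 1.8]])
n = rng.normal(size=(500, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)      # unit normals
rho = n @ F.T                                      # colors, rho = F n

# Design matrix M of eqn. (10), one row per pixel.
r1, r2, r3 = rho[:, 0], rho[:, 1], rho[:, 2]
M = np.column_stack([r1**2, r2**2, r3**2, 2*r1*r2, 2*r1*r3, 2*r2*r3])

# Plain least squares stands in here for the robust LMS fit of the paper.
zvec, *_ = np.linalg.lstsq(M, np.ones(len(M)), rcond=None)
c11, c22, c33, c12, c13, c23 = zvec
C = np.array([[c11, c12, c13],
              [c12, c22, c23],
              [c13, c23, c33]])

# Any root G with G^T G = C will do; the symmetric square root is one choice.
w, V = np.linalg.eigh(C)
G = V @ np.diag(np.sqrt(w)) @ V.T
assert np.allclose(G.T @ G, C, atol=1e-8)
assert np.allclose(C, np.linalg.inv(F @ F.T), atol=1e-6)
```

With real data containing off-shell pixels, the LS solve above would be pulled off the correct ellipsoid; that is exactly why the paper uses LMS.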
2.2   Color space ellipsoid; outlier detection
Consider Fig. 1(a), a depth map of a plaster bust of Mozart produced by a laser range finder.¹ It is straightforward to shade this depth data using a Lambertian model. Let us synthesize a color image by placing several accurate natural lights around the surface. Here, we used the surface reflectance function of “light skin” [22], and used standard illuminants A, B, C, D65, and a fluorescent tube as lighting sources [23]. These represent incandescent and fluorescent lights, and several phases of sunlight and skylight. For sensor response curves we used those for a Sony DXC151 camera with infrared filter. The result is shown in Fig. 1(b). Here, we did not include self-shadowing arising from a surface location having its line of sight to a light blocked by the surface itself. Instead, we simply used the inner products of surface normals with each light direction to obtain a shading factor, and set negative shading to zero. Plotting the resulting RGB values in color space, one finds that the preponderance of pixels do indeed lie on an ellipsoid in color space. Fig. 1(c) shows the ellipsoidal shell found by the LMS regression.

¹ The laser range data for the bust of Mozart is due to Fridtjof Stein of the USC Institute for Robotics and Intelligent Systems.
The method of [17] for outlier detection is as follows: consider the robust dispersion estimate

s0 = f (1 + 5/(N − 6)) √( med_i r_i² )

where r_i is the residual for the ith case, r = ρ^T C ρ − 1, and, in terms of the probability function Φ, f = 1/Φ⁻¹(0.75) ≃ 1.4826. Then an RGB point is accepted as corresponding to the model if |r_i/s0| ≤ 2.5; else the point is an outlier and is rejected. Typically, a Reweighted Least Squares regression is then carried out, using only the accepted points. This allows confidence limits to be established on coefficients using standard techniques. Outliers arise because many surface points are outside the occluding boundary for a particular light, and thus have negative shading (set to zero). These points do not lie on the ellipsoid. Because they see less light, these points tend to have small RGB values. The method works best, then, when most points see most lights. This situation would obtain when lights are not close to the horizon and the surface is fairly flat, so that there is less chance of a surface normal being perpendicular to a light.
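The outlier test follows directly from the formulas above; a small sketch on toy data (the cutoff 2.5 and the constant 1.4826 are as given in the text):

```python
import numpy as np

def reject_outliers(rho, C):
    """Flag colors off the ellipsoid rho^T C rho = 1 using the robust
    dispersion s0 = 1.4826 (1 + 5/(N - 6)) sqrt(med_i r_i^2); keep |r/s0| <= 2.5."""
    r = np.einsum('ij,jk,ik->i', rho, C, rho) - 1.0     # residuals r_i
    N = len(r)
    s0 = 1.4826 * (1.0 + 5.0 / (N - 6)) * np.sqrt(np.median(r**2))
    return np.abs(r / s0) <= 2.5                        # True = accepted

# Toy check: noisy on-shell points plus dark off-shell "pixels".
rng = np.random.default_rng(2)
C = np.eye(3)
on = rng.normal(size=(200, 3))
on /= np.linalg.norm(on, axis=1, keepdims=True)         # exactly on the unit shell
on += 0.01 * rng.normal(size=on.shape)                  # small sensor noise
off = 0.2 * np.abs(rng.normal(size=(20, 3)))            # low-intensity outliers
mask = reject_outliers(np.vstack([on, off]), C)
assert mask[:200].mean() > 0.9 and not mask[200:].any()
```

Using the median of the squared residuals makes the scale estimate insensitive to the off-shell minority, which is the point of the LMS machinery.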
2.3   Rotation matrix: integrability condition
So far we have an estimate Ĝ of G = F⁻¹. However, because of the unknown rotation we recover rotated versions of the correct surface normals n. Call these rotated estimates N. Then, defining the rotation matrix R via

N = Ĝ ρ = Ĝ F n ≡ R n ,   (11)

our estimate of the unrotated normal is

n = R⁻¹ N = R^T N .   (12)
It is possible to recover the rotation matrix R by considering an integrability condition on the normals [9]. The two-dimensional partial derivatives p(x, y) and q(x, y) effectively constitute a set of homogeneous coordinates for n: p = −n1/n3, q = −n2/n3. In order for these derivatives to be derivable from the same depth z, they must satisfy p_y − q_x = 0. Since we use masks to approximate derivatives, this condition holds exactly everywhere for any image if p and q are derived from a z. In [9], rotations are chosen using descent with random steps and the matrix R is selected that satisfies

min_{θ, φ, ψ} ∫∫ [−(n1/n3)_y + (n2/n3)_x]² dx dy ,   R = R_θ R_φ R_ψ ,   (13)

symbolizing a sum over samples by an integral and bearing in mind that the correct n is given in terms of the recovered N by eqn. (12). Unless such a rotation is carried out, p_y = q_x is not satisfied. To understand this result, consider an arbitrary surface z = f(x, y). Form the derivatives p and q, and from them the normals n = (−p, −q, 1)/√(1 + p² + q²). Now rotate these normals into new vectors N. The new estimates of derivatives, P and Q say, are not determined from a ‘potential’, z, and thus do not obey P_y = Q_x.
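The residual p_y − q_x is easy to check numerically. The sketch below uses simple forward differences in place of the paper's derivative masks, on a hypothetical quadratic surface; normals derived from a depth map pass the test, while a rotated copy of the same normals fails:

```python
import numpy as np

def integrability_residual(n):
    """|p_y - q_x| for a normal field n (H x W x 3), with p = -n1/n3 and
    q = -n2/n3; forward differences stand in for derivative masks."""
    p = -n[..., 0] / n[..., 2]
    q = -n[..., 1] / n[..., 2]
    py = np.diff(p, axis=0)[:, :-1]    # d/dy via row differences
    qx = np.diff(q, axis=1)[:-1, :]    # d/dx via column differences
    return np.abs(py - qx)

# Normals derived from an actual depth map z = f(x, y) integrate: residual ~ 0.
y, x = np.mgrid[0:32, 0:32] / 31.0
z = x**2 + x * y                       # hypothetical smooth surface
p, q = np.gradient(z, axis=1), np.gradient(z, axis=0)
n = np.dstack([-p, -q, np.ones_like(z)])
n /= np.linalg.norm(n, axis=2, keepdims=True)
assert integrability_residual(n).max() < 1e-6

# A rotated copy of the same normals generally breaks integrability.
R = np.array([[0.8, 0.0, 0.6], [0.0, 1.0, 0.0], [-0.6, 0.0, 0.8]])
N = n @ R.T
assert integrability_residual(N).max() > 1e-5
```

This is exactly the effect (13) exploits: only the correct de-rotation restores p_y = q_x.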
Finding the rotation R effectively aligns the object axes with the camera axes and thus determines the pose of the object. Instead of minimizing (13) by random assignments of the rotation matrix we would like to find a closed-form solution. In Section 3 we show how to replace (13) with a smoothness condition which may be used to form Euler equations that can be solved for R.
2.4   Depth recovery
The LMS regression will discard outliers. These correspond to pixels in the image; therefore not every pixel will have a recovered normal vector. Hence we are faced with reconstructing depth from p and q values that have lacunæ in the image. Harris’ coupled depth/slope method [24] can be used in such a circumstance to interpolate p,q, and z values over outlier gaps.
2.5   Mutual illumination and spectrally varying illumination
Here we show that mutual illumination can contribute to a spectrally varying hemisphere of radiation impinging on a given surface region. The form factor f is the fraction of radiation leaving an infinitesimal area dA1 that reaches a finite area A2 [25, 19]. This is linearly related to the fraction of light leaving the finite area that impinges on the infinitesimal area via a reciprocity relation. Thus the diffuse light reaching dA1 from A2 is proportional to f times the radiosity (radiation leaving) of surface A2. The fraction f is given by

f = ∫ [cos θ1(x, y) cos θ2(x, y) / (π d²(x, y))] dA2(y)

where x is the position of dA1, y ranges over the area A2, and d is the distance between dA1 and the integration point y on the finite area A2; also, θ1(x, y) is the angle between the normal n to dA1 and the vector d to y, and θ2(x, y) is the angle between the local normal n(y) at y and the vector −d from y to x. For a particular dA1 at x, we can drop the x notation. Then replacing cos θ1 by a dot product, for that small region at x with normal n the form factor can be written

f = n · l ,   with   l = (1/π) ∫ d̂(y) [cos θ2(y) / d²(y)] dA2(y)

where d̂(y) is the normalized vector from x to y. Therefore, mutual illumination can act as a secondary source of illumination that can be thought of as a distant source, provided the region of interest is small compared to the interreflecting patch. But of course this condition obtains for each local point x of the surface of interest. It should be noted that this interpretation of the mutual illumination as a distant source holds only in a 1–bounce model [19].
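The identity f = n · l can be checked by numerical integration over a hypothetical emitting patch (midpoint rule; the geometry is made up for illustration):

```python
import numpy as np

# Midpoint-rule check that f = n . l for a receiver dA1 at the origin with
# normal n, and a hypothetical unit-square emitter A2 at height h (facing down).
h, m = 2.0, 200
u = (np.arange(m) + 0.5) / m
X, Y = np.meshgrid(u, u)
pts = np.stack([X.ravel(), Y.ravel(), np.full(m * m, h)], axis=1)  # points y on A2
dA = 1.0 / (m * m)

n = np.array([0.0, 0.0, 1.0])          # normal of dA1
n2 = np.array([0.0, 0.0, -1.0])        # normal of the emitting square

d_vec = pts                             # vector from x = origin to y
d = np.linalg.norm(d_vec, axis=1)
d_hat = d_vec / d[:, None]
cos1 = d_hat @ n                        # cos theta_1 at the receiver
cos2 = -(d_hat @ n2)                    # cos theta_2 at the emitter

# Direct form factor: f = integral of cos th1 cos th2 / (pi d^2) dA2.
f_direct = np.sum(cos1 * cos2 / (np.pi * d**2)) * dA

# Equivalent distant-source form: l = (1/pi) integral of d_hat cos th2 / d^2 dA2.
l = (d_hat * (cos2 / d**2)[:, None]).sum(axis=0) * dA / np.pi
assert abs(f_direct - n @ l) < 1e-9
```

The vector l plays the role of an equivalent distant source direction (times strength) for the interreflecting patch, as the text argues.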
3   Optimization Equations for Rotation Matrix

3.1   Smoothness criterion
The integrability condition of Section 2 is a smoothness condition in another guise. Consider a roof edge within the domain of infinite precision. Since this function is not differentiable it is not smooth. As a consequence its derivatives do not obey the integrability condition. Several smoothness criteria have been explored in computer vision. In [26], Horn and Brooks study functionals that contain a smoothness term based on derivatives of the normals themselves, (l_x² + l_y² + m_x² + m_y²), where l = n1, m = n2. Here, we need to consider three dimensions, and extend their smoothness criterion to include o = n3 as follows:

min I1 = ∫∫ (‖n_x‖² + ‖n_y‖²) dx dy
       = ∫∫ [(l_x)² + (l_y)² + (m_x)² + (m_y)² + (o_x)² + (o_y)²] dx dy   (14)
The minimization is over all rotations taking the normals N , recovered using the LMS method, into estimates of correctly aligned normals n = R T N . Only pixels identified as non-outliers by the regression are included in the minimization. We can simplify the notation for the rotation matrix by considering rows r 1 , r 2 , r 3 of the matrix R T , such that l = r 1 · N , m = r 2 · N , o = r 3 · N . Then to ensure an orthonormal set r i we must include six Lagrange multiplier terms in (14): I2 =
3 3 X X i=1 j≥i
Λij (r i · r j − δij )
(14′ )
with δ_ij the Kronecker delta. We form the Euler equations by taking derivatives with respect to r_i. E.g., taking ∂/∂r_11 we have

∫∫ [(∂N1/∂x)(N_x · r_1) + (∂N1/∂y)(N_y · r_1)] dx dy + Λ_11 r_11 + (Λ_12 r_21 + Λ_13 r_31)/2 = 0 .   (15)

We can collect all such equations together and rewrite them in terms of the derivatives of the vector N, as follows (letting Λ_ji ≡ Λ_ij):

∫∫ [N_x (N_x · r_i) + N_y (N_y · r_i)] dx dy + Λ_ii r_i + Σ_{j≠i} Λ_ij r_j / 2 = 0 ,   i = 1..3   (16)
Eqn. (16) can be rewritten in the form of an eigenvalue problem for each of the three vectors r by making use of the orthogonality property of rotations. Taking partial derivatives of (14′) with respect to the variables Λ, we retrieve the orthonormality of the vectors r_i. Denote by S the matrix with elements S_kj = ∫∫ [(N_x)_k (N_x)_j + (N_y)_k (N_y)_j] dx dy. Now suppose r_i is an eigenvector of S. Then premultiplying (16) by each r_i we find that the off-diagonal Lagrange multipliers Λ_ij are zero. Making use of this, we find
that for each rotation vector r_i we arrive at the same Euler equation:

∫∫ [N_x (N_x · r_i) + N_y (N_y · r_i)] dx dy + Λ_ii r_i = 0 ,   i = 1..3   (17)
Eqn.(17) has the form S r + Λr = 0 where S is symmetric and therefore has real eigenvalues and is diagonalized by the three orthonormal vectors r. Instead of (14), the most direct minimization would be to simply minimize the integrability condition of Section 2 over all r i , and in fact it is straightforward to write down a minimization akin to (14) and (14′ ) for the integral of all squared py − qx residuals. However, the resulting equations are intractable. The advantage of eqns.(14),(14′ ) is that they lead to a simple closed-form solution. The disadvantage is that they should be followed by another minimization step perpendicular to the r 3 direction. For the meaning of (14) is that, in a least squares sense, all variation in n is constrained to be in the z direction. I.e., the rotation found will lead to a set of normals maximally oriented toward the camera. Thus the optimization will find the best r 3 , which produces n3 = r 3 · N , since it will tend to maximize n3 . However, if the object happens to have some off-camera-axis surface orientation which dominates, the reconstruction may fail. In particular, it is clear that (14) cannot work for a flat surface, since in that case the derivatives in (17) are all zero.
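The eigenvector problem (17) amounts to diagonalizing the 3 × 3 scatter matrix S of the derivatives of N. A minimal sketch, with the double integral replaced by a sum of finite differences, plus a check on the hemisphere case discussed in Section 3.2 (the rotation and grid are made up):

```python
import numpy as np

def smoothness_rotation_axis(N_field):
    """Candidate rotation rows from eqn. (17): eigenvectors of the matrix
    S_kj = sum over pixels of (N_x)_k (N_x)_j + (N_y)_k (N_y)_j,
    with the double integral replaced by finite differences."""
    Nx = np.diff(N_field, axis=1)[:-1].reshape(-1, 3)     # d/dx of the normals
    Ny = np.diff(N_field, axis=0)[:, :-1].reshape(-1, 3)  # d/dy
    S = Nx.T @ Nx + Ny.T @ Ny
    w, V = np.linalg.eigh(S)            # eigenvalues in ascending order
    return w, V.T                       # rows of V.T are the candidate r_i

# Check on a hemisphere patch facing the camera: the normal of the unit
# sphere at (x, y, z) is just (x, y, z).  Rotate the normals by a known R
# and recover (up to sign) its third column r_3.
y, x = (np.mgrid[0:41, 0:41] - 20) * 0.03        # |x|, |y| <= 0.6
z = np.sqrt(1.0 - x**2 - y**2)
n_field = np.dstack([x, y, z])

c, s = np.cos(0.4), np.sin(0.4)
R_true = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # rotation about y
N_field = n_field @ R_true.T                            # N = R n, eqn (11)

w, cand = smoothness_rotation_axis(N_field)
r3 = cand[0]      # for this symmetric patch the distinct (smallest) eigenvalue is r_3
assert abs(r3 @ R_true[:, 2]) > 0.999
```

For this symmetric patch the scatter matrix of the true normals is diagonal, so the rotated eigenvectors come out exactly; the all-positive-projection test of Section 3.3 is the general way to pick out r_3.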
3.2   Operation of the smoothness condition
To understand (17), consider how the minimization behaves on a known, but rotated, surface. Suppose we start with known actual normals n over some region, but apply a rotation R^T, with columns r_i^T, to them to form N’s. Then to see how the algorithm does we can form the matrix S and diagonalize it, hoping to recover the r_i^T from its eigenvectors. For this known case we have observed N’s with components N_i = r_i^T · n, so

S_ij = ∫∫ [(r_i^T · n_x)(r_j^T · n_x) + (r_i^T · n_y)(r_j^T · n_y)] dx dy

where the r_i^T’s are the ones we put in. We expect to diagonalize and find that we can undo the effect of R^T by rotating back by R, with columns r_i. The effect of S on r_j is

S r_j = Σ_{i=1..3} s_ij r_i

where s_kj ≡ ∫∫ [(n_x)_k (n_x)_j + (n_y)_k (n_y)_j] dx dy is given in terms of the correct n’s. It is easy to show that the eigenvectors of S are in fact combinations of the r_i weighted by the eigenvectors of the matrix with elements s_ij. Now, the closer s13 and s23 tend to zero, the closer r_3 gets to an eigenvector of S, i.e., the more the third eigenvector of the s_ij matrix tends to (0, 0, 1)^T. The value s13 will be small if the covariance of the gradient of n1 with the gradient of n3 is less than the variance of the gradient of n3 (and similarly for n2); this is the case for symmetric objects facing the camera. When these conditions do not hold then the minimization (14) will still try to enforce them. As an example, if the surface is a hemisphere facing the camera, and we rotate its normals by R^T, then (17) recovers R exactly. If instead we image only one octant of the sphere then (17) finds an r_3 close to the correct one that still minimizes (14). The values of the de-rotated components n3 are all positive and are much closer to the correct ones than the N3’s of the object rotated by R^T. If a biased object, such as a sphere octant, is on a black background then the algorithm’s accuracy in recovering r_3 improves. The reason is that the derivatives at the object edge help to align with the camera plane since the background’s vanishing p and q produce an n facing the camera.
3.3   Selection of third eigenvector
Since eqn. (17) is the same for all three r_i, how can we decide which eigenvector is the correct one to identify with r_3? Fortunately, the answer is simple: the eigenvector that produces values of r_i · N that are all positive (or all negative, in which case we change the sign of r_3) must be r_3, since n3 must be positive. Each of the other two results, r_1 · N and r_2 · N, gives values that range over positive and negative numbers. This incidentally defines a useful test on whether the method can successfully be applied to a given image: if n3 has more than just a few negative values then there is an error.
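The selection rule can be written directly (a small helper with a hypothetical name):

```python
import numpy as np

def pick_r3(candidates, N_flat):
    """Select the eigenvector whose projections r . N all share one sign,
    flipping its sign if needed so that n3 = r3 . N > 0 (hypothetical helper)."""
    for r in candidates:
        proj = N_flat @ r
        if np.all(proj > 0):
            return r
        if np.all(proj < 0):
            return -r
    return None   # more than a few negative n3 values would signal an error

# Toy check: normals facing the camera (n3 > 0) single out the third axis.
rng = np.random.default_rng(3)
n = rng.normal(size=(100, 3))
n[:, 2] = np.abs(n[:, 2]) + 0.1
n /= np.linalg.norm(n, axis=1, keepdims=True)
assert np.allclose(pick_r3(np.eye(3), n), [0.0, 0.0, 1.0])
```

In practice a small tolerance on "all positive" would implement the paper's allowance for "just a few" negative values.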
3.4   Second minimization
Eqn. (17) produces a best r_3; the other vectors in the rotation matrix, r_1 and r_2, lie in the plane perpendicular to r_3. To find them, it is simple to substitute r_3 into a full p_y − q_x minimization, since only a single angle in the r_1, r_2 plane remains to be found. Suppose we label the three eigenvectors found so far r_3, u_1, u_2. Let the correct rotation vectors be given in terms of u_1, u_2 by (r_1, r_2)^T = R_α (u_1, u_2)^T, where R_α is a two-dimensional rotation by an unknown angle α. Now minimize the original integrability condition over α:

min_α ∫∫ [ (r_1 · N / r_3 · N)_y − (r_2 · N / r_3 · N)_x ]² dx dy ,   (r_1, r_2)^T = R_α (u_1, u_2)^T   (18)
In the Appendix, we show how taking the derivative with respect to α results in a simple equation of the form

a sin²α + b sin α cos α + c = 0 .   (19)

Solution of this implicit equation in α is trivial by efficient methods. One question remains, however, to fix the assignment of all three r_i: since it is practical to have an algorithm bracket the zero of eqn. (19) over a fixed range of possible values of α, 0° to 90° say, which eigenvector of (17) should be set to u_1 (the second vector will be the cross-product of that vector with r_3)? Again the answer is simple: repeat the minimization twice, setting u_1 to each of the candidate vectors in turn, and select that vector which produces an α value giving the smaller minimization integral (18).
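Rather than solving (19) analytically, the sketch below simply scans α and takes the minimizer of the integral (18); forward differences stand in for derivative masks, and the surface and angle are made up:

```python
import numpy as np

def integrability_cost(alpha, u1, u2, r3, N_field):
    """Integral (18) for a trial in-plane angle alpha (hypothetical helper)."""
    c, s = np.cos(alpha), np.sin(alpha)
    r1, r2 = c * u1 + s * u2, -s * u1 + c * u2
    n3 = N_field @ r3
    p = -(N_field @ r1) / n3
    q = -(N_field @ r2) / n3
    py = np.diff(p, axis=0)[:, :-1]
    qx = np.diff(q, axis=1)[:-1, :]
    return np.sum((py - qx) ** 2)

# Recover a known in-plane angle on a synthetic integrable surface.
yy, xx = np.mgrid[0:32, 0:32] / 31.0
zz = xx**2 + xx * yy
p0, q0 = np.gradient(zz, axis=1), np.gradient(zz, axis=0)
n = np.dstack([-p0, -q0, np.ones_like(zz)])
n /= np.linalg.norm(n, axis=2, keepdims=True)

alpha0 = 0.7
c0, s0 = np.cos(alpha0), np.sin(alpha0)
Rz = np.array([[c0, -s0, 0], [s0, c0, 0], [0, 0, 1]])
N_field = n @ Rz.T                          # rotate normals about the viewing axis

u1, u2, r3 = np.eye(3)
alphas = np.linspace(0.0, np.pi, 721)
costs = [integrability_cost(a, u1, u2, r3, N_field) for a in alphas]
assert abs(alphas[np.argmin(costs)] - alpha0) < 0.01
```

A brute-force scan is crude but illustrates that the cost in (18) has a clear minimum at the correct in-plane angle; the paper's root-bracketing of (19) finds the same angle far more efficiently.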
3.5   Other minimization schemes
We show below that in fact the algorithm set out above does quite well in recovering correct normals and the absolute attitude of the object. Nevertheless we point out that the accuracy of the present algorithm could be improved at the price of more complexity. We could take the values of r_i recovered so far as starting values in an iteration scheme using a closer approximation to the full integrability minimization than eqn. (14).
The difficulty with the full minimization is that n3 = r 3 ·N appears in the denominator. One could take the tack suggested in [26] and simply consider the numerator only. For the present problem this amounts to ignoring a term (n3 )4 in the denominator of the minimization integrand. Then including Lagrange multiplier terms as in (14′ ), the Euler equations still turn out to be quite complex. However, if one assumes r 1 , r 2 to be fixed then the equations again take the form of an eigenvector equation for r 3 . This suggests solving in turn for r 3 , and then the set r 1 , r 2 as in the last section, and iterating. We have not tried this scheme, however.
3.6   Ambient light
If an ambient light term is considered important, it is straightforward to include it in the LMS regression. For if we replace eqn. (6) by ρ = F n + α, with α an ambient light term, then we arrive at a regression in 9 unknowns D and β, with D = C/(1 − α^T C α) and β = D α, as follows: ρ^T D ρ − 2 ρ^T β = 1. Once we solve for D and β (by regression), we arrive at C and the ambient term α. In Section 4, we examine the behavior of the algorithm on the synthetic image of Fig. 1 as well as on real imagery and show that the algorithm performs well. No ambient term was found necessary.
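The 9-unknown ambient regression can be checked on synthetic data. The unpacking used below, α = D⁻¹β and C = D/(1 + βᵀD⁻¹β), is an assumption derived from D = C/(1 − αᵀCα) and β = Dα; plain least squares again stands in for LMS, and all numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(4)

# Model with ambient term: rho = F n + alpha_amb (hypothetical values).
F = np.array([[2.0, 0.3, 0.1], [0.2, 1.5, 0.4], [0.1, 0.2, 1.8]])
alpha_amb = np.array([0.1, 0.05, 0.2])
n = rng.normal(size=(800, 3))
n[:, 2] = np.abs(n[:, 2])
n /= np.linalg.norm(n, axis=1, keepdims=True)
rho = n @ F.T + alpha_amb

# Regression rho^T D rho - 2 rho^T beta = 1 in 9 unknowns (6 for D, 3 for beta).
r1, r2, r3 = rho.T
M = np.column_stack([r1**2, r2**2, r3**2, 2*r1*r2, 2*r1*r3, 2*r2*r3,
                     -2*r1, -2*r2, -2*r3])
sol, *_ = np.linalg.lstsq(M, np.ones(len(M)), rcond=None)
D = np.array([[sol[0], sol[3], sol[4]],
              [sol[3], sol[1], sol[5]],
              [sol[4], sol[5], sol[2]]])
beta = sol[6:]

# Unpack: alpha = D^{-1} beta, then C = D / (1 + beta^T D^{-1} beta).
alpha_rec = np.linalg.solve(D, beta)
C = D / (1.0 + beta @ alpha_rec)
assert np.allclose(alpha_rec, alpha_amb, atol=1e-6)
assert np.allclose(C, np.linalg.inv(F @ F.T), atol=1e-6)
```

The recovered C matches (F Fᵀ)⁻¹, the same matrix the ambient-free regression of eqn. (10) would produce on data with no ambient term.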
4   Test Images and Discussion

4.1   Synthetic image
Consider the ellipsoid in color space found by the LMS regression; for the input image of Fig. 1(b) the best surface is shown in Fig. 1(c). The robust outlier detection method identifies pixels in the original image that must be discarded as not belonging to the largest patch. Fig. 1(d) shows those points accepted by the regression. Normals facing away from one or more lights are discarded. Therefore the regression forms an automatic segmentation mechanism for the most important segment, with no region growing necessary. Eqn. (7) yields an estimate of the rotated normal N, before the optimization steps of Section 3. To show that integrability is indeed a problem throughout the image and is not simply a boundary effect, Fig. 1(e) shows the absolute value of p_y − q_x for the recovered normals N. The second minimization step of Section 3, that over the angle α in the plane perpendicular to r_3, requires solution of eqn. (19). For the Mozart image, this equation looks like Fig. 1(f), and its solution is simple. Once an estimated rotation matrix has been applied to form n = R^T N we might expect all the de-rotated normals to be in error with respect to the correct values by some specific rotation angle. However, here some near-outlier pixels are accepted by the regression as being sufficiently close to the regression surface. Fig. 1(g) shows the values of the errors in the recovered normals in the image, not including any outlier pixels, compared to the correct values. The median value of the angular error is 15.6°. To see that this is quite a small error in practice, consider the image of Fig. 2(a). Here we show the original depth map, shaded from the right. Now applying the LMS recovery mask of Fig. 1(d) to this correct image results in Fig. 2(b). This is the best we can hope to do in recovering the surface.²
Now in Fig.2(c) we show the result of shading the recovered p and q; the results are very close to those in the correct image, Fig.2(b). (Footnote: We found that coupled depth/slope surface reconstruction did not work well in this case, with large gaps between pixels with known p and q; adding natural boundary conditions may improve depth/slope surface recovery [27]. Here we present a shaded image derived from p and q rather than a depth map.)
4.2 Real image
Fig.3(a) shows an (in)famous image [28], ‘Lenna’, used as a standard test for shape-from-shading algorithms. Consider the fairly uniform region identified by the box in the image. Plotting the RGB values of this subimage in color space in Fig.3(b) clearly shows an ellipsoidal cluster, as well as lower-intensity outlier values. An LMS fit shows that the data does indeed belong to an ellipsoidal shell, to a high degree of probability: the robust version of the coefficient of determination R², equal to 1 minus the square of the median of the absolute residuals, is here 0.99954. All ellipsoid coefficients are statistically significant. Outliers are identified in white in Fig.3(c); these are generally lower-intensity pixels.

The fact that an ellipsoid fits so well is striking. Clearly, this commercial image was in all likelihood taken under spectrally varying illumination. Considering the techniques of professional photographers, this should not be too startling in hindsight. However, this finding does to some extent call into question results in shape-from-shading that make use of this image to prove their worth.

To further show that the data falls on an ellipsoid, consider Fig.3(d). For any pixel, the quantity ρ^T C ρ ≡ κ determines how close the point ρ is to the ellipsoid. For a value κ ≠ 1, one can interpret that point as lying on an ellipsoid larger than the ellipsoid ρ^T C ρ = 1 by a factor √κ; that is, κ gives the ratio of the squares of the semi-axes of the particular ellipsoid on which ρ lies to those of the model ellipsoid. Fig.3(d) shows the histogram of this value. We see that for the Lenna image, κ is close to 1. The outliers detected lie in steeper-normal areas of the region used in the regression, viz. near the eye and nose.

An important feature of eqn.(7) is that it is not guaranteed to produce normalized normal vectors.
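As a small illustration (our names, not the authors' code), the per-pixel diagnostic κ = ρ^T C ρ is a single vectorized expression:

```python
import numpy as np

def kappa_values(rho, C):
    """kappa = rho^T C rho for each pixel color rho, shape (N, 3).

    kappa ~ 1 means the color lies on the model ellipsoid; otherwise the
    point lies on a concentric ellipsoid scaled by sqrt(kappa).
    """
    return np.einsum('ni,ij,nj->n', rho, C, rho)
```

Note that if, as in the linear model, the recovered normal is n = F⁻¹ρ with C = F⁻ᵀF⁻¹, then ‖n‖² = ρᵀCρ = κ, so the length check of Fig.3(e) is simply the square root of this same diagnostic.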
Applied to pixels far off the ellipsoid, for example, the color-to-normal relationship would produce wrong, unnormalized normals. It is therefore a useful check to determine whether the non-outlier points do indeed produce normal vectors of length 1. Fig.3(e) shows that for the Lenna image the recovered normals are close to length 1; this lends further weight to the interpretation of the RGB values as coming from a spectrally varying light environment. Fig.3(f) shows the recovered p and q for the cheek region, shaded with illumination from the left, in front of, and below the face. Here, Harris’ coupled depth/slope method [24] has been used to interpolate the p and q values across outliers.

Once the matrix C has been recovered, the same matrix and LMS outlier detection scheme can be applied non-locally to a larger region of the image. Let us again transform ρ to n and detect outliers to the ellipsoid found in Fig.3(b), but apply the same C to the larger image area in Fig.3(g). The resulting recovered p and q, shaded with light as in Fig.3(f), are shown in Fig.3(h). For non-outlier pixels the recovery seems reasonable, and outlier pixels are identified as not part of the main segment of the image.
5 Conclusions
We have shown that it is possible to recover surface orientation by exploiting the linear relationship between color and surface normal that obtains under spectrally varying illumination. Moreover, the surface normal rotation left ambiguous by a regression in color space is also recoverable using an integrability condition.

The chief obstacle to a closed-form solution for the rotation is the intractable nature of the problem when written in terms of the original integrability condition. Here we remedy this difficulty by substituting a smoothness condition. The resulting Euler equations reduce to a simple eigenvalue problem, which generates the best transformation rotating the surface normals toward the camera direction. A straightforward optimization of the full integrability condition in the plane perpendicular to this direction completes the recovery of the surface normal rotation.

As well as recovering the rotation, the present method identifies those pixels belonging to the non-outlier segment. Outlier regions can be rank 3 and yet not belong to this segment, if the region sees a reduced but still linearly independent set of lights; or they may be of lower rank and properly belong to the main segment. For general surfaces, rank-1 patches correspond to the usual shape-from-shading problem. An open question remains as to how one could knit together solutions for the rank-3, rank-2, and rank-1 patches, particularly for non-connected regions. We shall pursue these ideas in future work.

The possibility of degeneracy in the data is an important concern. It is unlikely that the matrix A of light source directions will be rank-reduced if interreflections are present. As for matrix B, the synthetic image presented here used ordinary lights and produced a B with full rank (i.e., effective rank calculated using the SVD). The Lenna image also had rank-3 ρ’s. Nevertheless, if the colors were even more highly desaturated than those in the Mozart image, they could be close to coplanar and the method would fail; however, the failure would be easily detected and signaled.
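The effective-rank test via the SVD mentioned above can be sketched as follows (the tolerance is our choice, not from the paper):

```python
import numpy as np

def effective_rank(M, rel_tol=1e-2):
    """Number of singular values above rel_tol times the largest.

    A color matrix whose effective rank drops below 3 signals
    near-coplanar colors, in which case the method should fail loudly
    rather than return unreliable normals.
    """
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > rel_tol * sv[0]))
```

Running this on the matrix of observed colors (or on the lighting matrix) gives the degeneracy signal: 3 means proceed, 2 or 1 means the colors are too desaturated for the full method.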
One main strength of the method is its simplicity as well as the fact that no structured lighting is required: nothing need be known about the illumination environment. As well, the method has the laudable feature of identifying those pixels for which its underlying model does not hold.
Acknowledgement

One of the authors (M.D.) is indebted to Dr. Peter Rousseeuw for making available his LMS software.
Appendix

Substituting r1 = cos α u1 + sin α u2, r2 = −sin α u1 + cos α u2 into (18) and taking the derivative with respect to α yields

∫∫ [ (−sin α u_y + cos α v_y + cos α u_x + sin α v_x)/w
     + ((sin α u − cos α v) w_y + (−cos α u − sin α v) w_x)/w² ]
 × [ (cos α u_y + sin α v_y + sin α u_x − cos α v_x)/w
     + ((−cos α u − sin α v) w_y + (−sin α u + cos α v) w_x)/w² ] dx dy = 0,

where u = u1 · N, v = u2 · N, w = r3 · N. This equation has the form

(ac − bd) sin²α + (ad + bc) sin α cos α + bd = 0,

which reduces to eqn.(19).
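A numerical stand-in for solving an equation of this trigonometric form (the coefficients a, b, c, d below are illustrative, not derived from the text) is to scan α over [0, π) for a sign change and then bisect, much as the simple root of Fig.1(f) suggests:

```python
import math

def solve_alpha(a, b, c, d, steps=360):
    """Find a root of (ac - bd) sin^2(x) + (ad + bc) sin(x) cos(x) + bd = 0
    on [0, pi) by scanning for a sign change, then bisecting."""
    def f(al):
        s, co = math.sin(al), math.cos(al)
        return (a*c - b*d) * s * s + (a*d + b*c) * s * co + b*d
    xs = [i * math.pi / steps for i in range(steps + 1)]
    for x0, x1 in zip(xs, xs[1:]):
        if f(x0) == 0.0:
            return x0
        if f(x0) * f(x1) < 0:          # bracketed a root; bisect it
            lo, hi = x0, x1
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
    return None                        # no sign change found
```

In the closed-form development of the paper this step is algebraic; the sketch is only meant to show that, even done numerically, the one-dimensional search is trivial.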
References

[1] R. J. Woodham, Y. Iwahori, and R. A. Barman. Photometric stereo: Lambertian reflectance and light sources with unknown direction and strength. Technical Report TR 91-18, University of British Columbia Department of Computing Science, 1991.
[2] A. P. Petrov. Light, color, and shape. In E. P. Velikhov, editor, Cognitive Processes and Their Simulation, pages 350–358, 1987. In Russian.
[3] A. P. Petrov. Color and Grassman–Cayley coordinates of shape. In B. E. Rogowitz, M. H. Brill, and J. P. Allebach, editors, Human Vision, Visual Processing and Digital Display II, volume 1453, pages 342–352. SPIE, 1991.
[4] A. P. Petrov. On obtaining shape from color shading. Color Research and Application, 18:375–379, 1993.
[5] A. P. Petrov. Surface color and color constancy. Color Research and Application, 18:236–240, 1993.
[6] P. P. Nikolaev. Some algorithms for surface color recognition. In M. S. Smirnov, editor, Simulation of Learning and Behavior, pages 121–151, 1975. In Russian.
[7] P. P. Nikolaev. Monocular colour discrimination of volumetric objects in different conditions of illumination. Biofizika, 33:140–144, 1988. In Russian; translated as Biophysics 33:148–153, 1988.
[8] M. H. Brill. Object-based segmentation and color recognition in multispectral images. In Image Understanding and the Man–Machine Interface II, volume 1076, pages 97–103. SPIE, 1989.
[9] L. L. Kontsevich, A. P. Petrov, and I. S. Vergelskaya. Reconstruction of shape from shading in color images. J. Opt. Soc. Am. A, 11:1047–1052, 1994.
[10] L. L. Kontsevich. Computation of surface shape for a convex object, uniformly colored and illuminated by chromatic light sources, from its two-dimensional projection. In Data Processing in Information Systems, pages 16–19, 1986. In Russian.
[11] M. H. Brill. Image segmentation by object color: a unifying framework and connection to color constancy. J. Opt. Soc. Am. A, 7:2041–2047, 1990.
[12] M. H. Brill. Photometric models in multispectral machine vision. In B. E. Rogowitz, M. H. Brill, and J. P. Allebach, editors, Human Vision, Visual Processing and Digital Display II, volume 1453, pages 369–380. SPIE, 1991.
[13] A. P. Petrov and L. L. Kontsevich. Properties of color images of surfaces under multiple illuminants. J. Opt. Soc. Am. A, 11:2745–2749, 1994.
[14] M. S. Drew. Shape from color. Technical Report CSS/LCCR TR 92–07, Simon Fraser University School of Computing Science, 1992. Available by ftp from fas.sfu.ca as /pub/cs/techreports/1992/CSS–LCCR92–07.ps.Z.
[15] M. S. Drew. Shape and specularity from color. Technical Report CSS/LCCR TR 93–01, Simon Fraser University School of Computing Science, 1993. Available by ftp/gopher from fas.sfu.ca as /pub/cs/techreports/1993/CSS–LCCR93–01.ps.Z.
[16] M. S. Drew. Robust specularity detection from a single multi-illuminant color image. CVGIP: Image Understanding, 59:320–327, 1994.
[17] P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. Wiley, 1987.
[18] J. E. Kaufman. IES Lighting Handbook. Illuminating Engineering Society, 4th edition, 1966.
[19] M. S. Drew and B. V. Funt. Calculating surface reflectance using a single-bounce model of mutual reflection. In Proceedings: International Conference on Computer Vision, Osaka, Dec. 4–7, pages 393–399. IEEE, 1990.
[20] B. V. Funt and M. S. Drew. Color space analysis of mutual illumination. IEEE Trans. Patt. Anal. and Mach. Intell., 15:1319–1326, 1993. Reprinted in: Physics-Based Vision: Principles and Practice, Vol. 2, eds. G. E. Healey, S. A. Shafer, and L. B. Wolff, Jones and Bartlett, Boston, 1992, page 385.
[21] B. K. P. Horn. Closed-form solution of absolute orientation using orthogonal matrices. J. Opt. Soc. Am. A, 5:1127–1135, 1988.
[22] C. S. McCamy, H. Marcus, and J. G. Davidson. A color-rendition chart. J. App. Photog. Eng., 2:95–99, 1976.
[23] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulas. Wiley, New York, 2nd edition, 1982.
[24] J. G. Harris. A new approach to surface reconstruction: the coupled depth/slope model. In Proc. First Int. Conf. on Comp. Vision, pages 277–283, 1987.
[25] C. M. Goral, K. E. Torrance, D. P. Greenberg, and B. Battaile. Modeling the interaction of light between diffuse surfaces. Computer Graphics, 18:213–222, 1984.
[26] B. K. P. Horn and M. J. Brooks. The variational approach to shape from shading. Comp. Vision, Graphics, and Image Proc., 33:174–208, 1986.
[27] Y. Li. Surface reconstruction by coupled depth/slope model with natural boundary conditions. Technical Report TR-91-24, Dept. of Computing Science, University of British Columbia, Vancouver, BC, 1991.
[28] ‘Lenna’. Playboy, Feb. 1972.
Figure 1: (a): Depth for plaster bust of Mozart. (b): Color image from shading with 5 natural lights. (c): Ellipsoid in color space; fitted ellipsoid shown with △. (d): Mask for non-outliers produced by LMS regression. (e): Integrability condition values over image. (f): Function of α to solve in minimization perpendicular to rotated r3 . (g): Histogram of angular errors for recovered normals.
Figure 2: (a): Shading of actual Mozart bust using a single illuminant. (b): As in (a), but multiplied by the mask in Fig.1(d). This is the best shaded depth image that is possible. (c): Recovered p and q, shaded as above. The result is close to the best possible image, (b).
Figure 3: (a): Input color image. Regress on RGB’s in boxed region. (b): Color space points and outliers. (c): Outlier mask. (d): Values κ are mostly ∼ 1. (e): Recovered normal vectors are approximately normalized. (f): Recovered p and q shaded with a single illuminant. (g): Larger area of input. (h): Recovered p and q shaded with single illuminant.