On Illumination Invariance in Color Object Recognition

Mark S. Drew, Jie Wei, and Ze-Nian Li
School of Computing Science, Simon Fraser University, Vancouver, B.C., Canada V5A 1S6
email: {mark, jiew, [email protected]

Keywords: Object Recognition, Image Library Indexing, Illumination Invariance, Wavelet-Based Compression, DCT, Color

© M.S. Drew, J. Wei, and Z. Li 1997

Abstract

Several color object recognition methods that are based on image retrieval algorithms attempt to discount changes of illumination in order to increase performance when test image illumination conditions differ from those that obtained when the image database was created. Here we investigate under what general conditions illumination change can be described using a simple linear transform among RGB channels, for a multi-colored object, and adduce a different underlying principle than that usually suggested. The resulting new method, the Linear Color algorithm, is more accurately illuminant-invariant than previous methods. An implementation of the method uses a combination of wavelet compression and DCT transform to fully exploit the technique of low-pass filtering for efficiency. Results are very encouraging, with substantially better performance than other methods tested. The method is also fast, in that the indexing process is entirely carried out in the compressed domain and uses a feature vector of only 63 integers.

1 Introduction

Several color object recognition schemes exist that purport to take illumination change into account in an invariant fashion. In a sense, object recognition may be considered a special application within the general rubric of image retrieval: an object recognition system should certainly have the capability of correctly retrieving images of objects from an image database when there is no alteration in the object's pose, position, scale, or illumination environment, and the methods cited below have this characteristic.

Nevertheless, while producing good results in many cases, we show below that some of these methods actually produce poor results when confronted by illumination change. In this paper we address the problem of illumination change in the setting of very general lighting environments. We argue that often an approximately linear relationship exists between the image of an object under one set of lights and the image under a different set. This is a new result in that previously such a relationship was used only for singly-colored surfaces [1], whereas here we argue that the result can in some situations be carried over to the case of general, multi-colored objects. We also demonstrate that a linear relationship between pixel colors under a change of illumination can serve as a useful method even when the assumptions made break down in general. We denote the image retrieval method based on this observation as the Linear Color method. Using this method, we find substantially better performance than with previous methods under conditions that tend to isolate a pure lighting change. As well, we provide an efficient implementation of the method, making use of both wavelet-based reduction and the Discrete Cosine Transform (DCT) to realize a good low-pass filtering.

The simplest previous method for illuminant-invariant recognition is the Color Angles method of Finlayson et al., who propose indexing on six numbers corresponding to the "angles" between image channels [2]. That method consists of the following: normalize each mean-subtracted color channel R, G, B to length 1, and then take as indexing numbers the three angles formed from the inverse cosine of the products R·G, R·B, G·B. Along with these, append a second set of angles derived in exactly the same way from the edge image of the smoothed color image, ∇²G ∗ ρ, where ρ = (R, G, B). The idea is that, if camera sensors are sufficiently narrowband, these angles are invariant to color shifts in the illuminant, because in that situation illumination change corresponds to a simple scaling of the color channels (the "coefficient rule"). In the limit of Dirac delta function sensors, this approximation would hold identically. If sensors are not "sharp" enough, then provided one knows the camera sensor curves one can carry out a "spectral sharpening" operation [3] to bring the color angle approximation more in line with actual conditions [2]. Thus illumination change amounts to a diagonal matrix transform among color channels.
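To make the indexing step concrete, the following minimal sketch (our own code and variable names, not the implementation of [2]) computes the first three color angles; the second set would be obtained in exactly the same way from the edge image of the smoothed color image.

    import numpy as np

    def color_angles(rgb):
        """Three inter-channel angles for an H x W x 3 image: mean-subtract
        each channel, normalize it to unit length, and take the angle
        between each pair of channels (our reading of the Color Angles
        index of [2])."""
        chans = [rgb[..., k].astype(float).ravel() for k in range(3)]
        chans = [c - c.mean() for c in chans]                      # mean-subtract
        chans = [c / (np.linalg.norm(c) + 1e-12) for c in chans]   # unit length
        r, g, b = chans
        return np.array([np.arccos(np.clip(np.dot(r, g), -1.0, 1.0)),
                         np.arccos(np.clip(np.dot(r, b), -1.0, 1.0)),
                         np.arccos(np.clip(np.dot(g, b), -1.0, 1.0))])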

With respect to illumination change, one important idea has been that of modeling illumination change by means of a linear mapping of color channels, within either a color image itself or a feature space derived from a color image (see, e.g., [4, 5]). That is, one would like to model illumination change by a simple 3 × 3 matrix transformation M from the RGB planes of a database image to those of a test image. If illumination change is such that a diagonal model is insufficient to capture the full characteristics of the lighting, then without knowledge of the sensor curves we need to deal with all nine elements of M. In that case there is no need to normalize the color bands, since a good matrix M will take care of scaling for us. Thus the color angle idea can be summarized as a use of a covariance matrix (plus the addition of an edge image covariance). As such, its closest relative is the method of Healey and Wang [4], which has been applied to color textures but in fact works reasonably well on the object recognition tests performed below.

The color angle method has distinct advantages over [4], however, in that it is invariant to 2D rotations and translations; but here we are mostly concerned with the behavior of methods with respect to illuminant change, and we try to isolate that behavior by ruling out rotations, change of scale, etc. These considerations could possibly be applied as preprocessing steps.

Healey and Wang's method [4] consists of considering a set of correlations R_jk(n, m), j, k = 1..3 (i.e., covariance terms for mean-subtracted image channels), with shifts n, m of up to 16 pixels. In practice, they use n = 0..16 and m = −16..16, so for each pair of channels jk they use 561 correlations. There are 6 distinct color channel products (RR, RG, etc.) for each set of shifts, so an image feature vector has 561 × 6 = 3366 elements. For matching, an error measure is taken to be the distance between a new image's feature vector and the result of writing that vector in terms of the eigenvectors of each model image's set of correlations. Let us refer to this method as the Correlation Method.
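A direct, unoptimized sketch of such a feature vector, under our reading of [4] and with our own function names and normalization, is given below; the eigenvector-based matching step of [4] is not reproduced here.

    import numpy as np

    def shifted_corr(a, b, n, m):
        """Mean product of channel a and channel b shifted by (n, m) pixels
        (our normalization choice)."""
        H, W = a.shape
        a_part = a[max(0, n):H + min(0, n), max(0, m):W + min(0, m)]
        b_part = b[max(0, -n):H + min(0, -n), max(0, -m):W + min(0, -m)]
        return float((a_part * b_part).mean())

    def correlation_features(rgb, max_shift=16):
        """Shifted channel correlations: mean-subtracted channels, shifts
        n = 0..16, m = -16..16, and the 6 distinct channel pairs, giving
        17 * 33 * 6 = 3366 numbers."""
        chans = [rgb[..., k].astype(float) for k in range(3)]
        chans = [c - c.mean() for c in chans]
        pairs = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
        feats = [shifted_corr(chans[j], chans[k], n, m)
                 for (j, k) in pairs
                 for n in range(0, max_shift + 1)
                 for m in range(-max_shift, max_shift + 1)]
        return np.array(feats)   # length 3366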

In comparison, other illumination-invariant matching schemes include the color-constant-color-indexing method [6], which uses the counts in 4096 histogram bins for a color edge image, and the method of Healey and Slater [5], which uses moments of color histograms and makes the assumption that local image patches can be modeled as textured, flat planes.

Justification for the linear mapping notion has been by reference to the idea that surface reflectance can be well represented by a finite-dimensional model of dimension n, where n is the same as the number of camera sensors [4, 5, 2]. Here, n is taken to be 3. However, although low-dimensional surface reflectance can be justified in the context of the Maloney-Wandell model [7], this assumption can sometimes be inaccurate [8]. Instead, we would like to argue that in fact the idea of a linear mapping can be fairly well justified, but on entirely different grounds: the grounds that, outside of a laboratory setting, lighting is necessarily complex, with many lights of varying strengths and colors at play, along with colored interreflections. Such a lighting environment means that "the" illumination must itself be apprehended by means of a 3 × 3 matrix. Then the result that lighting change is described by means of a linear transformation necessarily follows for a singly-colored surface, and in some cases for a many-colored surface.

In Section 2, we set out the justification for these remarks, and show how they impact recent work on illumination-invariant object recognition. In Section 3 we detail a new method for deriving the correct image transformation, based on low-pass filtering by a wavelet-based reduction of image size followed by a DCT transformation. In Section 4 we test the method on sets of data that approximate elimination of every change except change in illumination, and compare the results with some other methods. The results are quite favorable. Section 5 contains a general assessment of the method.

2 Illumination Change

2.1 Justification for linear model: simple illumination

Consider the finite-dimensional-model justification for a linear map relating two images differing by an illumination change. Suppose the surface corresponding to pixel x has surface spectral reflectance function S^x(λ) (and no specularity or shadows). Then if one can find a set of s vectors that form a good basis for reflectances, say a set of Singular Value Decomposition vectors S_j(λ), j = 1..s, then one can approximate S^x(λ) via

    S^x(\lambda) \simeq \sum_{j=1}^{s} \omega_j^x S_j(\lambda)    (1)

at position x, with weights ω_j^x. If camera sensors consist of a set of three functions q(λ) and the light spectral power distribution is given by E(λ), then in a Lambertian model and under conditions of orthography the RGB triple ρ^x at location x on a flat surface is given by

    \rho^x = (a \cdot n) \int E(\lambda) \sum_{j=1}^{3} \omega_j^x S_j(\lambda) \, q(\lambda) \, d\lambda \;\equiv\; M \omega^x    (2)

with a the normalized light source direction and n the surface normal. Integration is over the visible spectrum. Clearly, an illumination change E → E′, a → a′ and tilt of the plane n → n′ corresponds simply to a change M → M′, so that illumination change implies a linear transform from ρ to ρ′, assuming an invertible matrix M. If there are many differently colored patches x on the flat surface, the same situation holds, with one matrix M the same for all x, since only the illuminant changes in matrix M. However, for non-flat surfaces there is another way of looking at the linear map for changing illumination, provided illumination is complex, as described next.

2.2 Complex illumination

Suppose there are many illumination sources, i = 1..L say, where L could be very large. Then, without assuming any finite-dimensional model, suppose we consider for the moment a single Lambertian surface, and assume the surface is illuminated by several, possibly very many, colored lights of unknown strength, direction, and color. An image of a uniformly colored surface under several colored lights, say room light plus daylight (plus interreflections, perhaps), under conditions of orthography (distant lighting and distant viewing) exhibits pixel RGB values that lie on a surface in color space, rather than on a ray. If the lighting collection, considered as a matrix indexed by color and by spatial direction, has rank 3, then colors lie on an ellipsoid; if the rank is 2, then colors make up a planar, filled ellipse. Such a situation is by no means unlikely: in [9] it is claimed that rank-2 images are the rule rather than the exception in ordinary lighting situations, and examples are given there (e.g., an image taken near a window, and with a room light on). As well, rank-3 lighting can be easily achieved in a controlled setting. Now we have to add up the contributions of all the lights that contribute to the color ρ^x at location x:

    \rho^x = \sum_{i=1}^{L} \int (a_i^T n^x) \, E_i(\lambda) \, S^x(\lambda) \, q(\lambda) \, d\lambda \;\equiv\; F n^x    (3)

where n^x is the surface normal for that pixel and light E_i has normalized spatial direction a_i. The above form is due to Petrov et al. [10, 11]. It says that color is linearly related to surface normal.

We can restate eq.(3) by defining the color of light i reflected from the surface,

    b_i \equiv \int E_i(\lambda) S(\lambda) q(\lambda) \, d\lambda ,    (4)

so that eq.(3) becomes

    \rho^x = B A n^x = F n^x ,    (5)

with L × 3 direction matrix A, 3 × L color-strength matrix B, and 3 × 3 color-strength-direction matrix F. The simple form (5) is due to Drew [1, 12]. Now under an illumination change E → E′, a → a′, the reflected color for light i becomes

    b'_i = \int E'_i(\lambda) S(\lambda) q(\lambda) \, d\lambda    (6)

so that we observe color

    \rho'^x = F' n^x = F' (F^{-1} F) n^x = M \rho^x ,    (7)

defining a linear transform M between colors under differing illuminants. In [13] this idea was used to match color face images, assuming faces to be all one color.

So far, we have considered a singly-colored surface. But we argue below that the above linear transform carries over for multi-colored surfaces in some situations, provided lights and surface conjoin in an approximate multiplicative model to form observed colors.
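As a quick numeric sanity check on eq.(7), the following sketch (purely illustrative: the number of lights, their directions, colors, and the surface normals are all invented) builds F = BA for two lighting conditions and confirms that pixel colors of a singly-colored Lambertian surface are related by the single 3 × 3 matrix M = F′F⁻¹.

    import numpy as np

    rng = np.random.default_rng(0)

    L = 5                                     # number of lights (invented)
    A, A2 = rng.normal(size=(L, 3)), rng.normal(size=(L, 3))   # rows a_i^T
    B, B2 = rng.uniform(size=(3, L)), rng.uniform(size=(3, L)) # colors b_i

    F, F2 = B @ A, B2 @ A2                    # eq.(5): rho = F n
    M = F2 @ np.linalg.inv(F)                 # eq.(7): 3 x 3 illuminant map

    n = rng.normal(size=(3, 100))             # 100 surface normals (pixels)
    rho, rho2 = F @ n, F2 @ n                 # colors under the two conditions

    print(np.allclose(rho2, M @ rho))         # True: rho' = M rho exactly here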

2.3 Factor model of color: justification for linear model under complex illumination

Borges [14] carefully considered errors resulting from the computer graphics method of creating reflected colors by simply multiplying light RGB by surface RGB as under white light. He showed, unsurprisingly, that if illumination is white enough then this simple "factor model" of color is a good approximation. The approximation works fairly well for light that is not very "white" as well. For convenience, he assumed that sensor curves q sum to unity.

Here, let us include possibly unknown camera sensor curve sums explicitly as three scalings γ_k, k = 1..3. Then under a factor model, the reflected color of light i is given by

    b_{ik} \simeq s_k \, e_{ik} / \gamma_k , \quad k = 1..3    (8)

where s_k is the surface color under equi-energy white light,

    s_k = \int S(\lambda) Q_k(\lambda) \, d\lambda    (9)

and e_{ik} is the color of the i-th illuminant,

    e_{ik} = \int E_i(\lambda) Q_k(\lambda) \, d\lambda .    (10)

The camera scaling term is

    \gamma_k = \int Q_k(\lambda) \, d\lambda .    (11)

Substituting, eq.(3) becomes

    \rho_k = \sum_{i=1}^{L} (a_i \cdot n) \, e_{ik} s_k / \gamma_k = s_k \sum_j E_{kj} n_j , \quad \text{where} \quad E_{kj} = \sum_{i=1}^{L} e_{ik} a_{ij} / \gamma_k    (12)

and the above holds for pixels in any color patch, with appropriately substituted color s^x_k, provided lighting is the same for every pixel. Note that the lighting matrix E is the same for all pixels.

This says that F for every patch is composed of one single E, times a diagonal matrix for that patch formed from diag(s_k). Denote the diagonal matrix for surface color by S^x = diag(s^x_k). Thus, under an illumination change E → E′, the new color observed is

    \rho'^x_k = s^x_k \sum_j E'_{kj} n^x_j = s^x_k \sum_j \left[ (E' E^{-1}) E \right]_{kj} n^x_j = s^x_k \sum_l \left( E' E^{-1} \right)_{kl} (s^x_l)^{-1} \rho^x_l ,

or

    \rho'^x = S^x M (S^x)^{-1} \rho^x    (13)

where M is a 3 × 3 matrix, and is the same matrix for every pixel.

Now it is of interest to determine under what conditions matrix M is diagonally dominant, because in those cases matrix S approximately commutes with M and, even for multi-colored objects, one could model illumination change via a linear transform ρ′ ≈ Mρ. This situation obtains when impinging lights are each mostly of one color, R, G, or B, and the matrix A has components mostly in the z-direction. But the latter situation is necessary in any event for the method to work: unless lighting directions are fairly near to the camera axis, there will be too many object points that are shaded from some of the lights. In that case each object point will sum up its own version of matrix E, rather than one which includes contributions from all lights.

For flat surfaces that consist of dimension-3 surface reflectances, as in eq.(2), the single-matrix equation ρ′ ≈ Mρ again holds even for complex lighting (cf. [5]):

    \rho^x = \sum_{i=1}^{L} (n \cdot a_i) \int E_i(\lambda) \sum_{j=1}^{3} \omega_j^x S_j(\lambda) \, q(\lambda) \, d\lambda \;\equiv\; M \omega^x    (20)

Even if special lighting conditions do not apply, it is straightforward to demonstrate that the expectation value of (ρ′ − Mρ) is zero if ρ characterizes a "gray-world" assumption (i.e., ρ is drawn from a random-uniform probability density) and obeys the factor model, if we take M to be the best least-squares fit to ρ′ ≈ Mρ. Of course, if only a single light is present (or matrix E is rank-reduced) then the inverse in (13) will not exist. Nevertheless we show below that an indexing scheme based on a linear transformation between images still works well.
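A toy computation (with invented numbers) illustrates the diagonal-dominance point: when M is nearly diagonal, the per-patch similarity transform S M S⁻¹ of eq.(13) stays close to M itself, so one linear map can serve across differently colored patches; when M has large off-diagonal terms, it does not.

    import numpy as np

    S = np.diag([0.9, 0.4, 0.2])                  # diag(s_k): one surface color

    def similarity_error(M):
        """Relative difference between S M S^-1 (eq. 13) and M itself."""
        SMS = S @ M @ np.linalg.inv(S)
        return np.linalg.norm(SMS - M) / np.linalg.norm(M)

    M_dd = np.array([[1.00, 0.05, 0.02],          # diagonally dominant map
                     [0.03, 0.90, 0.04],
                     [0.02, 0.05, 1.10]])
    M_off = np.array([[0.30, 0.80, 0.10],         # strongly off-diagonal map
                      [0.70, 0.20, 0.60],
                      [0.10, 0.50, 0.40]])

    print(similarity_error(M_dd))    # small: S nearly commutes with M
    print(similarity_error(M_off))   # large: per-patch transform differs from M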

2.4 Transformation of covariance

So far, we have shown that under the assumption of the factor model, illuminant change can in many cases be expressed by a linear transform of the color image. If we wish to use the above covariance-based methods, we should also prove that covariance matrices transform linearly under illuminant change under the factor model assumption. In [4] this is proved using the finite-dimensional-model assumption, but here we can prove it via ρ′ ≈ Mρ. Let us define the (non-mean-subtracted) covariance matrix

    C_{km} \equiv \sum_{a} \rho^a_k \, \rho^a_m ,    (14)

summed over a = 1..N pixels. Since it is a rank-2 tensor, it must transform as

    C' = M C M^T .    (15)

Thus covariance does indeed transform linearly under illuminant change.

The role of covariance in indexing into a database of images is as follows. Suppose we wish to find the best least-squares version of M for the transform between a database image ρ and a new, target image ρ′. It is convenient to treat the whole image at once, so now let ρ denote a 3 × N array of color pixel values for an N = n²-pixel image: treat the image as three stretched-out long vectors. Then the Linear Color model is still expressed as ρ′ = Mρ, now treated as a matrix equation. We observe that the best least-squares solution of the minimization

    \min_M \| \rho' - M \rho \|^2    (16)

obeys the normal equation

    M = (\rho' \rho^T)(\rho \rho^T)^{-1} .    (17)

We could now base an indexing scheme solely on eq.(17), but it would be a costly one, being based on whole images. Nevertheless such a scheme is useful for benchmarking an appropriate, faster implementation, set out below in §3. To differentiate the result (17) from a more efficient implementation, let us refer to a matching scheme based on (17) as Linear Color-A, as opposed to a faster implementation, Linear Color-B, given in §3. The matrices M, (ρ′ρ^T), and (ρρ^T)^{-1} in eq.(17) are each 3 × 3. In terms of matrix M, the best transform of color image ρ is a distance

    dist = \| \rho' - M \rho \|    (18)

away from a target image ρ′. The best image in a database of color images ρ will be that with the least distance from the new image ρ′. Now if we indeed have ρ′ ≈ Mρ, then substituting the best least-squares solution for M yields the fact that the squared distance error measure is given in terms of a difference of covariances:

    dist^2 = \mathrm{Tr}\left\{ C' - M C M^T \right\} .    (19)

The most direct method of finding matrix M would be a least-squares map of one image to the other, via eq.(17). Of course, this is quite inefficient, since forming each of the matrices (ρ′ρ^T) and (ρρ^T) involves 9N² multiplies (for an N-pixel image), but if the images themselves could be accurately stored in a compressed format, then eq.(17) could be applied efficiently. In the next section we show that if we store a representation of each image that is first reduced in size by a wavelet transformation, and then further reduced by going to the frequency domain and discarding higher-frequency DCT coefficients, one can indeed apply eq.(17) accurately and efficiently to retrieve matrix M and hence derive a simple indexing method that is invariant under illuminant change. In the following method, the number of parameters in the feature vector is only 63 integers.
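For reference, eqs.(16)-(19) translate directly into a few lines of code; the sketch below (our own code, not the authors' implementation) treats each image as a 3 × N matrix of pixel values.

    import numpy as np

    def best_map(rho_model, rho_test):
        """Least-squares 3 x 3 map M with rho_test ~= M rho_model, eq.(17).
        Each argument is a 3 x N matrix (image flattened to three long rows)."""
        return (rho_test @ rho_model.T) @ np.linalg.inv(rho_model @ rho_model.T)

    def match_distance(rho_model, rho_test):
        """Residual distance of eq.(18) after the best illuminant map."""
        M = best_map(rho_model, rho_test)
        return np.linalg.norm(rho_test - M @ rho_model)

    def retrieve(models, rho_test):
        """Linear Color-A retrieval: index of the model image with the
        smallest residual distance to the test image."""
        dists = [match_distance(rho_m, rho_test) for rho_m in models]
        return int(np.argmin(dists))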


3 Implementation: Wavelet-DCT Compression

Applying eq.(17) directly to full-size color images would be highly computationally intensive; we would like to recover an accurate approximation of matrix M without sacrificing efficiency. Since the underlying principle in the present approach is eq.(7), i.e., a linear relationship between an image ρ and an image ρ′ under a new illuminant, we can approximate M most accurately by applying only linear operations to both sides of (7) while compressing the images. Therefore, here we first apply a linear low-pass filter to both images, resulting in new images ρ_l and ρ′_l. These satisfy eq.(7) if the original images do.

To best approximate matrix M, the low-pass filtered images should approximate the original ones as closely as possible, yet be of lower resolution. The scaling function of biorthonormal wavelets, as a symmetrical low-pass filter, can be exploited to that end. Basically, scaling functions with more "taps" use polynomials of higher order to approximate the original function (the number of taps is the number of nonzero coefficients). Our main concern is to capture the most detail but at lower resolution, and after some experiments we arrived at a good balance between efficiency and precision by using the symmetrical 9-tap filter. The 1D mask of the separable 2D scaling function is:

    {0.037829, −0.023849, −0.110624, 0.377403, 0.852699, 0.377403, −0.110624, −0.023849, 0.037829}
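One plausible way to realize this reduction step is sketched below (our own code: the exact boundary handling, decimation scheme, and normalization used in the paper are not specified, so these are assumptions). It filters each channel separably with the 9-tap mask and drops every other row and column, repeating until the channel is 16 × 16.

    import numpy as np
    from scipy.ndimage import convolve1d

    # 9-tap scaling-function mask quoted in the text
    MASK = np.array([0.037829, -0.023849, -0.110624, 0.377403, 0.852699,
                     0.377403, -0.110624, -0.023849, 0.037829])
    MASK = MASK / MASK.sum()   # our choice: unit DC gain (the raw mask sums to ~sqrt(2))

    def reduce_channel(chan, target=16):
        """Repeatedly low-pass filter (separably) and subsample by 2 until the
        channel is target x target.  Assumes a square power-of-two input."""
        chan = chan.astype(float)
        while chan.shape[0] > target:
            chan = convolve1d(chan, MASK, axis=0, mode='reflect')
            chan = convolve1d(chan, MASK, axis=1, mode='reflect')
            chan = chan[::2, ::2]      # decimate rows and columns
        return chan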

After applying the scaling function several times to the original images, assuming for simplicity square images of side 2^n, we obtain size 16 × 16 lower-resolution images ρ_l and ρ′_l. Next, let us consider the values in every row of ρ_l and ρ′_l as the values of respective random variables v_i and v′_i, where i ranges from 1 to 3. Hence each v′_i is a linear combination of the v_j's. For example, if the first row of M is (p, q, r), then

    v'_1 = p v_1 + q v_2 + r v_3 .    (19)

Now consider the DCT: if we denote the result of v transformed via a DCT by \hat{v}, then since the DCT is linear we also have

    \hat{v}'_1 = p \hat{v}_1 + q \hat{v}_2 + r \hat{v}_3 .    (20)

That is, after applying the DCT the values of corresponding frequencies still satisfy the original linear equation, and eq.(7) thus still holds after the DCT. Since the lower frequencies in the DCT capture most of the energy of an image, after applying the DCT we can retain just the lower-frequency coefficients to estimate M with fairly good accuracy, a very effective and efficient way of realizing a further low-pass filtering. By trial and error we found that using only 21 coefficients for each band worked well, these being the first 21 numbers in the upper left corner of the DCT coefficient matrix, as shown in Fig. 1. (Instead of using a conventional 8 × 8 window for the DCT, a 16 × 16 window is adopted. As a result, a finer resolution, twice as high as with 8 × 8, in the spatial frequency domain is realized. Since the low-pass filtering after the DCT can only retain a limited number of coefficients for efficiency, the net effect of having a larger 16 × 16 window is that a more detailed parameterized description at the lower end of the spectrum is facilitated. This is beneficial when very low-resolution wavelet images are used for matching in our method, Linear Color-B.)

Figure 1: Arrangement of DCT coefficients.

We then truncate to the integer part of each coefficient. Denote by ρ_d the 3 × 21 matrix whose rows are comprised of the 21 integers (derived from DCT coefficients) in the order illustrated in Fig. 1, for each of the three bands. Then, finally, if the two original images satisfy eq.(7) we have approximately

    \rho'_d = M \rho_d .    (21)

Hence, an approximation of Linear Color-A, the whole-image least-squares solution (17) for matrix M, is simply

    M_d = (\rho'_d \rho_d^T)(\rho_d \rho_d^T)^{-1} .    (22)

Denote a scheme based on low-pass frequency vectors ρ_d and eq.(22) as Linear Color-B. Populating the database, then, consists of calculating off-line the 63 integers ρ_d, viewed as indices for each model image. For an image query, first the 63 integers for the query image are computed, thus obtaining ρ′_d; then for every model image, eq.(22) is used to estimate M and thereby obtain the distance. The model image minimizing the distance is taken to be a match for the query image. Note from eq.(22) that only reduced, DCT-transformed, quantized images are used: no inverse transforms are necessary, and the indexing process is entirely carried out in the compressed domain.
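Putting the pieces together, a sketch of the Linear Color-B feature extraction and distance might look as follows (our own code; in particular, the zigzag ordering below is our construction and may differ in detail from the arrangement in Fig. 1).

    import numpy as np
    from scipy.fft import dctn

    def zigzag_indices(size=16, keep=21):
        """First `keep` (row, col) positions of a size x size block, ordered
        by increasing diagonal from the top-left corner (our own ordering)."""
        order = sorted(((r, c) for r in range(size) for c in range(size)),
                       key=lambda rc: (rc[0] + rc[1],
                                       rc[1] if (rc[0] + rc[1]) % 2 else rc[0]))
        return order[:keep]

    def linear_color_b_features(reduced_rgb):
        """63-integer feature: for each band of a 16 x 16 x 3 reduced image,
        take a 16 x 16 DCT and keep 21 low-frequency coefficients, truncated
        to integers, giving a 3 x 21 matrix rho_d."""
        idx = zigzag_indices()
        rows = []
        for k in range(3):
            coeffs = dctn(reduced_rgb[..., k], norm='ortho')
            rows.append([int(coeffs[r, c]) for r, c in idx])
        return np.array(rows, dtype=float)            # 3 x 21

    def linear_color_b_distance(rho_d_model, rho_d_test):
        """Estimate M_d from eq.(22) and return the residual distance."""
        M = (rho_d_test @ rho_d_model.T) @ np.linalg.inv(rho_d_model @ rho_d_model.T)
        return np.linalg.norm(rho_d_test - M @ rho_d_model)

Database population would call linear_color_b_features off-line for every model image; a query computes its own 63 integers and evaluates linear_color_b_distance against each stored feature matrix, taking the minimizing model as the match.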

4 Experimental Results

4.1 Plastic busts

Since we wish to focus on image retrieval under illuminant change, we use sets of registered (i.e., aligned) images, imaged under different lights. Consider Fig. 2(a): this is a set of 256 × 256 images of 8 plastic busts taken under illumination provided by several colored 100-watt incandescent bulbs. Lighting consists of three bulbs: green from the left, blue from the right, and red from in front and below. Images were taken with a Sony 3-CCD DXC-930 color camera and a Parallax 24-bit frame grabber card attached to a Sun Sparc LX. (These images were provided by Janet Dueck [15], and we gratefully acknowledge their use.) For each model image, the image rank, in Petrov's sense, is 3. This could be important in that the least-squares solution (17) uses the inverse of the correlation matrix of the model image, which is rank-3 if the image is rank-3. For multi-colored objects this is likely no restriction, but for a surface with essentially one surface color, such as the set of plastic busts (and faces are also mostly one color), rank-3 lighting could be beneficial.

As an initial justification for viewing the tested methods as object recognition algorithms, we first ascertained that, for each of the tested algorithms, every model image was correctly retrieved from the database of model images, using the models themselves as test images. The set of new, testing images consists of three additional images of each bust, taken under three different lighting conditions: illuminated from the left with white light, illuminated from the right with white light, and illuminated by reddish fluorescent room light. Thus there are 8 database images and 24 test images. The results of running an implementation of each of the following algorithms are shown in Table 1: the Color Angles algorithm [2], the Correlation Method [4], Linear Color-A based on eq.(17), and Linear Color-B, based on 3 × 21 matrices ρ_d and eq.(22). Given are matching ranks, with rank 1 being a correct match.

Table 1: Busts: Matches of 24 test images for database of 8 model images

  Algorithm             Rank 1   Rank 2   Rank 3   Rank >3
  Color Angles             3        3        3       15
  Correlation Method      17        4        3        0
  Linear Color-A          24        0        0        0
  Linear Color-B          22        0        1        1

Since all methods correctly find the model images amongst the set of model images, they are all reasonable candidates for an object recognition method. However, evidently the Color Angles method fares worst amongst the tested methods, whereas the Correlation Method does reasonably well.

The present Linear Color method does best, with Linear Color-B providing a good approximation of the accuracy of Linear Color-A.

While the recognition rate is clearly important, it is also important to have a figure of merit for how well an algorithm performs on near-misses. Therefore we used the following scheme (cf. [15]): for each of the 24 test image runs, i.e., for each test image ρ′, divide the distance ‖ρ′ − Mρ‖ from a transformed model image Mρ by the distance ‖ρ′ − Mρ₀‖ of the test image from the correct model image ρ₀ multiplied by its own best-fit M. Then all distances are measured with respect to a unit error distance corresponding to correct recognition. Let μ be this normalized distance: μ = ‖ρ′ − Mρ‖ / ‖ρ′ − Mρ₀‖. Distances μ less than unity to images other than ρ₀ (i.e., to incorrect model images) represent a failure of the recognition system, in that the incorrect model would be chosen since the distance to it would be less than that to the correct model image.

For each of the 24 tests, the most interesting figure of merit is thus the minimum of the distances μ for wrong matches. If this minimum is less than unity, the algorithm has failed in at least one recognition. Let us call the (transformed) incorrect model image with the minimum distance from the test image the "next-best" case. To capture statistics for all 24 tests, then, we consider the distribution of the "next-best" μ values. For each of the 24 tests let us also consider the median of the 7 distances to incorrect match images, and the distribution of these medians. Fig. 2(b) shows the mean of next-best ratios, and the median and range of distances to incorrect matches. Mean values and standard deviations of the next-best and median distributions are given in Table 2. The best performance is indicated by a clear separation between the correct and incorrect images retrieved from the database.

Table 2: Busts: Mean and standard deviation for distance ratios

  Algorithm             Next-Best Mean   Next-Best SD   Median Mean   Median SD
  Color Angles               0.82            0.14           0.99         0.17
  Correlation Method         2.87            3.80           7.02         7.25
  Linear Color-A             4.18            2.23           5.73         3.20
  Linear Color-B             3.23            2.34           7.41         6.64

Generally, the higher the mean and median values in Table 2, the better (i.e., the farther from confusion with the correct match, which has a distance ratio of 1). If values are too close to unity, or if the range bars cross the distance = 1 line in Fig. 2(b), then the algorithm displays some faulty matching. We see that for the plastic bust images, the values in Table 2 and Fig. 2(b) indicate that Linear Color-A performs best in this regard, followed by Linear Color-B. The Correlation Method has quite similar performance for this set of data (but has substantially poorer performance than the present method on the data set in §4.2). The Color Angles method appears not to work on this data set, in our implementation. This is likely due to the fact that the surfaces imaged do not have many colors and thus do not produce a distinctive "signature" vector in each color band.
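The normalized-distance figure of merit is simple to compute from the per-model residual distances already produced during retrieval; a small sketch (hypothetical helper names) is:

    import numpy as np

    def distance_ratios(dists, correct_index):
        """Normalized distances mu = dist(model) / dist(correct model) for one
        test image, given the residual distances to every model image."""
        return np.asarray(dists, dtype=float) / dists[correct_index]

    def next_best_and_median(dists, correct_index):
        """Figure-of-merit pair used in the tables: the smallest and the median
        ratio over the incorrect models.  A next-best value below 1 means the
        test image would be mismatched."""
        mu = distance_ratios(dists, correct_index)
        wrong = np.delete(mu, correct_index)
        return wrong.min(), float(np.median(wrong))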

4.2 Faces

To further test the algorithms, again using images that are more or less registered, consider the set of 512 × 512 images of 15 faces in the model database of Fig. 3(a). Three images of 15 people, under three different lighting conditions each, were taken [15]. Again, it was desired to create rank-3 database images, but this was found difficult for faces with dark skin. Therefore, to make up a model database, the red band of each of the three images for each person was used, and those bands were colored red, green, and blue. Thus the model database is fairly registered with the test set, yet consists of different images. We find that the model data set consists of rank-3 color images. Test images consist of the original 45 images acquired.

Finlayson et al. [13] found that using color eigenfaces it was possible to correctly retrieve images from this data set. Here, we again find that each of the methods tested finds each face in the model set if the test faces are also drawn from the model set. However, only the Linear Color method demonstrates perfect accuracy in finding test faces amongst the database faces. In fact, while Healey and Wang's method performs quite adequately, although not as well as the present method, the color angles method performs poorly. Results for the tested algorithms are shown in Table 3.

Table 3: Faces: Matches of 45 test images for database of 15 model images

  Algorithm             Rank 1   Rank 2   Rank 3   Rank >3
  Color Angles              3        3        3       36
  Correlation Method       33        6        2        4
  Linear Color-A           45        0        0        0
  Linear Color-B           45        0        0        0

Again, ranges of next-best case and median distance ratios are calculated; these are shown in Fig. 3(b) and also in Table 4. For these tests, the present method far surpasses the other practical algorithms tested.

Table 4: Faces: Mean and standard deviation for distance ratios

  Algorithm             Next-Best Mean   Next-Best SD   Median Mean   Median SD
  Color Angles               0.73            0.16           1.07         0.23
  Correlation Method         2.88            2.68           8.50         8.06
  Linear Color-A             9.07            2.99          14.68         5.84
  Linear Color-B             5.45            3.02          16.43         8.15

4.3 Textures

As another test, consider the model database of ten color texture images in Fig. 4 [16]. As in [2], test images are created by modeling illumination change by means of imaging these textures through three separate colored filters. Thus there are a total of 30 test images under different illumination, for a model database of 10 images. Narrowband filters should favor the Color Angles method, and we find that this is indeed the case. In Tables 5 and 6, results are shown for the four algorithms tested. Instead of graphical display, results for the range of values of incorrect-image distance ratios are shown in Table 7.

Table 5: Textures: Matches of 30 test images for database of 10 model images

  Algorithm             Rank 1   Rank 2   Rank 3   Rank >3
  Color Angles             30        0        0        0
  Correlation Method       30        0        0        0
  Linear Color-A           30        0        0        0
  Linear Color-B           30        0        0        0

Table 6: Textures: Mean and standard deviation for distance ratios

  Algorithm             Next-Best Mean   Next-Best SD   Median Mean   Median SD
  Color Angles               13.4           18.3           45.6        73.4
  Correlation Method        115.2          342.1          221.6       608.0
  Linear Color-A            102.7          100.1          140.7       118.2
  Linear Color-B            103.3          253.8          190.5       405.0

Table 7: Textures: Range of incorrect distance ratios

  Algorithm              Min       Max
  Color Angles           1.09      83.6
  Correlation Method     1.15    1772.3
  Linear Color-A        22.0      434.8
  Linear Color-B         2.09    1312.2

As can be seen, again method Linear Color-B approximates well to using whole images. For the texture data set, the Correlation Method in fact has excellent performance, better than the present method in terms of mean next-best values, although both methods are very similar in performance. All methods successfully retrieve every texture correctly. Nonetheless, the Linear Color model does best in that the smallest value for the next-best case distance ratio is 2.09 for Linear Color-B, compared to 1.15 for the Correlation Method and 1.09 for the Color Angles method. That is, in the worst case the present method has about half the chance of making an incorrect image retrieval compared to the other studied methods.

5 Conclusion

In this paper we have used a linear transform approach to illumination change within more general illumination environments than previously studied. The wavelet-DCT based algorithm allows a fast implementation of the method. Experimental results demonstrate that the present Linear Color method shows strength in resistance to change of illumination. In comparison, the Correlation Method performs quite well, although not as well as the Linear Color method, for the tests reported here. The Color Angles method seems to rely on having a richness of color in each image to reliably index into a database, and because some of the images used here are rather poor in the number of distinct colors, that method does not fare well. Applying a sensor-curve "sharpening" transform would likely not improve matters for the cases tested here, in that the camera sensor curves are already quite narrowband. However, unlike the proposed Linear Color method, the Color Angles method is not sensitive to certain object transformations such as 2D translations and rotations.

As well as underlining the importance of illumination change in object recognition methods, this report serves to indirectly verify the suitability of the factor model and of a linear transform under change of illumination. Further, our results would seem to indicate that a careful image compression scheme can be carried out without giving up the accuracy of a correct illumination-change-compensating transformation.

References

[1] M.S. Drew. Robust specularity detection from a single multi-illuminant color image. CVGIP: Image Understanding, 59:320-327, 1994.
[2] G.D. Finlayson, S.S. Chatterjee, and B.V. Funt. Color angular indexing. In ECCV96, pages II:16-27, 1996.
[3] G.D. Finlayson, M.S. Drew, and B.V. Funt. Spectral sharpening: sensor transformations for improved color constancy. J. Opt. Soc. Am. A, 11(5):1553-1563, May 1994.
[4] G. Healey and L. Wang. The illumination-invariant recognition of color. In ICCV95, volume 12, pages 128-133, 1995.
[5] D. Slater and G. Healey. Combining color and geometric information for the illumination-invariant recognition of 3-D objects. In ICCV95, pages 563-568, 1995.
[6] B.V. Funt and G.D. Finlayson. Color constant color indexing. IEEE PAMI, 17:522-529, 1995.
[7] L.T. Maloney. Evaluation of linear models of surface spectral reflectance with small numbers of parameters. J. Opt. Soc. Am. A, 3:1673-1683, 1986.
[8] J.P.S. Parkkinen, J. Hallikainen, and T. Jaaskelainen. Characteristic spectra of Munsell colors. J. Opt. Soc. Am. A, 6:318-322, 1989.
[9] M.S. Drew. Reduction of rank-reduced orientation-from-color problem with many unknown lights to two-image known-illuminant photometric stereo. In International Symposium on Computer Vision, Coral Gables, FL, Nov. 21-23, pages 419-424. IEEE, 1995.
[10] A.P. Petrov. On obtaining shape from color shading. Color Research and Application, 18:375-379, 1993.
[11] L.L. Kontsevich, A.P. Petrov, and I.S. Vergelskaya. Reconstruction of shape from shading in color images. J. Opt. Soc. Am. A, 11:1047-1052, 1994.
[12] M.S. Drew. Direct solution of orientation-from-color problem using a modification of Pentland's light source direction estimator. Computer Vision and Image Understanding, 64:286-299, 1996.
[13] G.D. Finlayson, J. Dueck, B.V. Funt, and M.S. Drew. Color eigenfaces. In IEEE 3rd International Workshop on Image and Signal Processing, 1996.
[14] C.F. Borges. Trichromatic approximation method for surface illumination. J. Opt. Soc. Am. A, 8:1319-1323, 1991.
[15] Janet Dueck. Color and face recognition. Master's thesis, School of Computing Science, Simon Fraser University, 1995.
[16] G. Healey and L. Wang. Illumination-invariant recognition of texture in color images. J. Opt. Soc. Am. A, 12:1877-1883, 1995.


Figure 2: (a): Plastic busts, illuminated by colored lights (histogram-equalized for display and printed in black and white). (b): Logarithmic scale: mean for the next-best case and median case distance ratios, where a ratio of 1 (log = 0) is the distance from the test to the correct model image (balls); and range of all distance ratios (lines). The best algorithm has the rightmost values.

Figure 3: (a): Faces, illuminated by colored lights (histogram-equalized for display and printed in black and white). (b): Logarithmic scale: mean for the next-best case and median case distance ratios, where a ratio of 1 (log = 0) is the distance from the test to the correct model image (balls); and range of all distance ratios (lines).

Figure 4: Color textures.
