Pattern Recognition 34 (2001) 1189}1197
Color indexing using chromatic invariant Ji Yeun Kim *, Chang Yeong Kim , Yang Seck Seo , In So Kweon Signal Processing Laboratory, Samsung Advanced Institute of Technology, P.O. Box 111, Suwon 440!600, South Korea Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 207-43 Cheongryangri-Dong, Dongdaemoon!Gu, Seoul, South Korea Received 16 December 1998; received in revised form 6 May 1999; accepted 13 April 2000
Abstract We present an e$cient indexing/matching algorithm that is independent of the changes in the illuminant color and the geometric conditions for 3-D object with multiple colors. The color contents of an object can be represented by the peak coordinates in the chromaticity histogram space corresponding to the distinct colors in an image. The visible color areas and their relative sizes of the histograms may change with viewing conditions, but the coordinates of local maxima remain stable. However, a change in illumination color results in a deformation of the chromaticity distribution so as to degrade the performance of color recognition. In order to discount lighting change, we de"ne a chromatic invariant that normalizes the chromaticities of the histogram peaks by the norm of each channel. Therefore, the normalized coordinates of the peaks are stable to the changes in illumination color, scaling, rotation, partial occlusion, viewing direction, and deformation. Test results on a database of diverse images show that the chromatic invariant yields excellent recognition rate even when the illuminant color and geometric conditions vary substantially. 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Keywords: Color; Chromatic invariant; Indexing; Matching; Peak normalization; Recognition
1. Introduction Three-dimensional object indexing/matching is one of the fundamental problems in computer vision and image retrieval system. Most of the traditional recognition algorithms use geometric features from the single-channel intensity images such as lines, curves and vertices. Recently, it has been reported that color also plays an important role in computer vision as a feature for recognition [1}8]. Swain and Ballad [1] proposed the method, called color indexing, that identi"es objects using color histogram intersection. The color axes used for the histogram were the three opponent colors derived from R(red), G(green) and B(blue): RG"R!G, B>"2;B!R!G and =B"R#G#B. Each
* Corresponding author. Tel.: #82-331-280-8163; fax: #82331-280-9207. E-mail address:
[email protected] (J.Y. Kim).
color channel is split into 16 intervals giving 16;16;16"4096 bins. Color indexing is stable in that variations such as a change in orientation, a shift in viewing position, partial occlusion, a change in the background or a change in shape. Although it turned out to be useful for recognition without geometric information, the performance of the algorithm degrades signi"cantly when the color and the intensity of the illuminant are changed. Funt and Finlayson [2] developed an algorithm, called color constant color indexing (CCCI), which matches histograms of color ratio of the RGB tuples to accumulate the length of the color boundary between regions. Under the assumption of the coe$cient model of sensor responses, color ratios of RGB values of adjacent pixels are relatively stable to illumination changes. Although the performance of this method is far better than color indexing in the presence of the illumination change, there are some signi"cant problems. At low-intensity regions of the image, ratios are sensitive to the noise. In homogeneous regions, pixel ratios are nearly 1, and little information about surface color is preserved because it
0031-3203/01/$20.00 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S 0 0 3 1 - 3 2 0 3 ( 0 0 ) 0 0 0 6 6 - 2
1190
J.Y. Kim et al. / Pattern Recognition 34 (2001) 1189}1197
uses color information only where surface color varies. Healey et al. [3] proposed the global color constancy algorithm using the a$ne property of the color distribution change due to illumination color. Although they showed that high-order distribution moments are illuminant invariant, this high-order information is not very stable in the presence of noise. This approach produced more accurate classi"cation results than those of color indexing or CCCI on a database of nine objects. However, they did not consider the geometric e!ects for 3-D object and color distribution change due to illumination pose. Matas et al. [4] presented an e$cient representation for objects with multiple colors- the color adjacency graph (CAG)- and showed excellent results for non-rigid three-dimensional objects under varied viewing conditions. They used the standard chromaticity coordinate, de"ned as the pair r"R/(R#G#B), g"G/(R#G#B). In this approach, the input images are assumed to be processed by a color constancy algorithm in order to remove the e!ect of varying illuminant color. Despite signi"cant progress in color constancy, [9}16] even for fairly simple worlds, the general color constancy algorithm is yet unsolved. In order to use the colors recorded in an image as the recognition cue for 3-D objects, the measured colors must be stable to di!erent illuminant color and lighting geometry. The color distribution of 3-D object with varying surface normals depends on not only the product of spectral re#ectance and the spectral power distribution of illuminant but also the geometric properties related to the pose of the object, the illuminants and the viewing direction. Therefore, changes in illuminant color alter the color distribution of the object. The problem of color constancy for discounting e!ect of the illuminant color has been a major topic of many researches in psychology and computer vision. In this paper, we propose the chromatic invariant that is independent of the illuminant color, shading, scaling, viewing direction, partial occlusion, and deformation. We transform the RGB colors into 2D perspective chromaticity space that is commonly used in color vision as r"R/G, b"B/G.
(1)
In this perspective chromaticity space (r, b) the geometric e!ect of 3-D object due to shading disappears and the vector (r, b, 1) has the same direction as (R, G, B) but a di!erent length. However, the standard chromaticity space (r, g) cannot remove dependency on both lighting geometry and illuminant color. The histogram in the perspective chromaticity space forms clusters according to the number of distinct colors in the image. The coordinates of the local maxima (peaks) of the clusters are independent of the scaling and viewpoints. Since the perspective chromaticity can discount shading e!ect, the transformation from one illuminant to another is linear even for 3-D object. The normalization of the histogram
peaks by the norm of each chromaticity channel is an e!ective method to discount the e!ect of the illuminant color change under the narrow band assumption. We have tested our algorithm by simulation and diverse real images under various illumination color and viewing conditions.
2. Representing color images under illumination change In this section, we discuss a transformation relating two images di!ering by an illumination change. The dichromatic model [17}20] is generally accepted for the good approximation of the physical re#ection phenomena for inhomogeneous dielectric materials. The main hypothesis of the dichromatic model is that the total radiance from the surface of an object consists of two independent components; the light re#ected at the interface of air (specular component) and the light from sub-surface re#ection (di!use component). Specular component almost always covers only a fractional part of a 3-D object with varying surface normal. And very often the high intensity of specular pixels saturates the sensor making color analysis meaningless unless the illumination can be controlled in real camera. In this paper, we neglect the specular component for dielectric materials. The sensor response of di!use component is the product of the spectral power distribution of illuminant with the spectral re#ectance of the surface. At each location (x, y) of a 3-D object, sensor response of a color imaging system is obtained by spectral integration process as follows:
E()S(x, y, )Q () d, 1)k)n, (2) I H where (x, y) is the sensor response of the kth channel, I E() is the spectral power distribution of the illumination that is constant across the scene, S(x, y, ) is the surface re#ectance function at location (x, y), Q () is the spectral I response of the kth sensor, and is the wavelength in the range of visible light. The geometric scaling factor m(x, y) is determined by the dot-product term e(x, y) ) n(x, y). Here, n(x, y) is the unit vector corresponding to the surface normal at (x, y) and e(x, y) is the illumination direction vector. Surface re#ectance function S(x, y, ) can be approximated by "nite-dimensional linear model [9}11] as (x, y)"m(x, y) I
L S(x, y, ) (x, y)S (), (3) G G G where the S () is a basis function and (x, y) is an G G n- component column vector of weighting coe$cients. The basic idea of this model is to describe the surface re#ectance and illuminant through a weighted sum of a "xed set of basis functions. This assumption has been used in color vision and much of work is concentrated
J.Y. Kim et al. / Pattern Recognition 34 (2001) 1189}1197
upon the number of basis functions necessary to reproduce accurately the re#ectance of materials. Maloney showed that approximately 99% of the variance in a set of wide variety of re#ectance functions could be captured by using three basis functions [11,12]. Let (x, y)" [ (x, y), (x, y),2, (x, y)]2 be the column vector of L sensor responses and let (x, y)"[ (x, y), (x, y), 2, L (x, y)]2 be the column vector of coe$cients. Using Eq. (3) it follows that Eq. (2) can be rewritten as (x, y)"m(x, y)A(x, y),
(4)
where A is the n;n lighting matrix with elements
A " IG
E()S ()Q () d. (5) G I H Consider two color images taken from an object under di!erent illumination conditions (e.g., color, intensity, and direction). The "rst image is obtained using illumination E() and the second image is obtained using illumination EI () under di!erent illumination pose. Then the two images are described by (x, y)"m(x, y)A(x, y), (x, y)"m (x, y)A (x, y).
(6)
If the matrices A and A corresponding to E() and EI () are nonsingular, a change in illumination color and pose results in the following transformation of sensor responses:
and T"A A\(n;n matrix).
(7)
Since we use RGB color camera system, the sensor responses are given by "[R, G, B]2 and the number of basis functions in the linear model equal to the number of class of sensor.
3. Color invariant algorithm 3.1. Photometric invariant If the spectral response of the RGB sensor Q () is I narrow enough to be approximated as sampling a single frequency, Q ()"q ( )(! ), then the sensor reI I I I sponses can be written as (x, y)+m(x, y)S(x, y, )E( )q ( ), k"R, G, B, (8) I I I I I where q ( ) is the channel sensitivity and is the peak I I I of the spectral sensitivity of the ith channel. This assumption is not true in general, however it has been proved reasonable for color constancy algorithm [2]. The relationship between the sensor responses (x, y) and (x, y) can be modeled by diagonal matrix transformation as (x, y)"g(x, y)D(x, y),
where the diagonal elements of matrix D are given by d "EI ( )/E( ), d "EI ( )/E( ), d "EI ( )/E( ). 0 0 0 % % % (10) Diagonal transformation has been extensively investigated by many researchers [16] and used successfully for color constancy algorithm. If the object has 2-D planar surface, the geometric factor is constant over the surface because of the "xed surface normal vectors, therefore, a linear relationship between (x, y) and (x, y) holds. In the case of 3-D object with varying surface normals, Eqs. (7) and (9) show that the linear relationship does not hold because of the di!erent geometric factors due to shading e!ect. Hence, we transform the sensor responses into 2-D perspective chromaticity space in order to eliminate the geometric factor and discount the illumination color. In the case of homogeneous region of 3-D object, since surface re#ectance function S(x, y, ) is not dependent on I the location, it can be written as S( ). When the sensor I response under canonical illuminant is represented by (x, y)"[R(x, y), G(x, y), B(x, y)]2, the chromaticity values are de"ned as
R(x, y) m(x, y)S( )E( )q ( ) 0 0 0 0 " G(x, y) m(x, y)S( )E( )q ( ) % % % %
r"
S( )E( )q ( ) 0 0 0 , " 0 S( )E( )q ( ) % % % %
m (x, y) (x, y)"g(x, y)T(x, y) where g(x, y)" m(x, y)
(9)
1191
b"
(11)
B(x, y) m(x, y)S( )E( )q ( ) " G(x, y) m(x, y)S( )E( )q ( ) % % % %
S( )E( )q ( ) " . S( )E( )q ( ) % % % %
(12)
It is clear that the geometric factor disappears and that chromaticity is independent of photometric angle at each pixel (x, y). And chromaticity space has the property of scale invariance that change in illuminant intensity does not vary the chromaticity value. Let us consider two patches P1 and P2 both having the same color, but being subjected to di!erent illuminant intensities (E(), sE()). The sensor responses for the "rst and second patches are [R, G, B] , [sR, sG, sB], respectively. The scale s is a constant due to the intensity change. The chromaticity values of the two patches are calculated as R sR r " " "r , G sG
B sB b " " "b . G sG
(13)
This is equivalent to stating that if the spectral power distribution of illuminant is not changed, then the chromaticity values are independent of illuminant intensity. Combining Eqs. (9), (11) and (12), the relationship between the chromaticities under di!erent illumination
1192
J.Y. Kim et al. / Pattern Recognition 34 (2001) 1189}1197
condition are given by linear transformation as
RI (x, y) g(x, y)d R(x, y) d R(x, y) 0 " " 0 GI (x, y) g(x, y)d G(x, y) d G(x, y) % % d " 0 r, (14) d % g(x, y)d R(x, y) BI (x, y) d B(x, y) " " bI " g(x, y)d G(x, y) GI (x, y) d G(x, y) % % d " b, (15) d % where (r , bI ) is the chromaticity value under di!erent illumination condition. Eqs. (14) and (15) show that, as the illumination condition changes, the geometric factor g(x, y) due to surface normals disappears and the chromaticity values in each channel are scaled by the ratio of diagonal elements. Therefore, the linear relationship between (r , bI ) and (r, b) holds regardless of surface normal variations. r "
3.2. Color histogram and peak detection A common representation of color distribution is a three-dimensional RGB histogram. However, the anal-
ysis of color clusters for 3-D object in the RGB color space is complex because of the shading e!ect as explained in Section 3.1. Figs. 1(b) and (c) show the clusters of the color object in the RGB color space and chromaticity space, respectively. Clusters in the threedimensional space form linear shape and may change their shape depending on the illumination color and pose. And formulation of color invariant in RGB space may not be easily achieved because of the nonlinear relationship due to the geometric factor. It is clear from Eqs. (11)}(13) that the chromaticity values are independent of geometric factor and illuminant intensity. Theoretically, if there is no noise in the camera system, each cluster in this space forms at a single point. Even though di!erences in lighting color produce a deformation of chromaticity distribution, the color constant descriptor can be easily achieved by normalization process. The histogram in the chromaticity space forms clusters corresponding to the distinct colors in the image and the peak locations of the clusters are not a!ected by changes of scaling and viewing direction, unlike histogram bin counts. Various authors proposed for the separation of unimodal cluster in computer vision community [21,22]. We utilize the graph theoretical clustering (GTC) method [22] to "nd the locations of local maxima in the chromaticity histogram space. In "rst step, we compute
Fig. 1. Color distributions: (a) four-colored object; (b) RGB color distribution; (c) chromaticity distribution; (d) detected peaks in the chromaticity space.
J.Y. Kim et al. / Pattern Recognition 34 (2001) 1189}1197
the chromaticity histogram of an image. As the dimensionality becomes larger, the discrimination power improves, but the time complexity increases and the noise e!ects may become more pronounced. Each chromaticity coordinate is divided into 40 sections, for a total of 1600 bins. Next, we can obtain the coordinates of the local maxima in each cluster using GTC method. Fig. 1(d) shows the detected peaks of the clusters. We set the threshold for minimal intensity in order to exclude the dark regions that do not contain useful color information (R, G, B(30). Real cameras have only a limited dynamic range to sense the brightness of the incoming light, so we discard the saturated pixel values over the clipping threshold (R, G, B'240). 3.3. Peaks normalization and illuminant invariant Even though the locations of the maximal peaks of an image are independent of the geometric conditions, they are still a!ected by the changing illuminant color as explained in Eqs. (14) and (15). We propose color constant descriptor that is not only independent of the geometric changes but also discounting the e!ect of the illuminant color. Let the chromaticity coordinates of the maximal peaks under canonical illuminant be r "[r , r , r ,2, r ]R, b "[b , b , b ,2, b ]R and N L N L under another illuminant be r "[r , r , r ,2, r ]R, N L bI "[bI , bI , bI ,2, bI ]R, where n is the number of the N L peaks corresponding the distinct color regions. From Eqs. (14) and (15), illumination change can be approximately described as a scalar multiplication by a di!erent value for each coordinate therefore the corresponding vectors r and r , b and bI are in the same direction but N N N N of di!erent length, respectively. We de"ne the chromatic invariants that normalize the chromaticities of the peaks by the norm of each channel as follows: r (d /d )r 0 % G " G " G r ((d /d ) L r) N 0 % H H r r G " " G " , (16) G ( L r) r H N H bI (d /d )b % G
I " G " G bI ((d /d ) L b) N % H H b b G " " G " . (17) G ( L b) b N H H Because the normalized peaks under two di!erent illuminants are independent of the shading, illuminant conditions (color, intensity), viewing direction, partial occlusion and scaling, we use the chromatic invariants as feature for color object recognition. Normalization process in the standard chromaticity coordinate (r g) can not discount the illumination components (d , d , d ), 0 % since the relationship between the chromaticities under
1193
di!erent illumination colors are given by RI (x, y) RI (x, y)#GI (x, y)#BI (x, y) d R(x, y) 0 " , d R(x, y)#d G(x, y)#d B(x, y) 0 %
(18)
GI (x, y) RI (x, y)#GI (x, y)#BI (x, y) d G(x, y) % " . d R(x, y)#d G(x, y)#d B(x, y) 0 %
(19)
Also if we normalize the color values in the RGB space instead of the perspective chromaticity space, then the geometric factors at each location (x, y) can not be canceled by normalization process. 3.4. Peak matching We adopt the matching algorithm proposed in Ref. [8], however do not use the cluster's relative pixel frequency because the population of color clusters in the histogram space is dependent on the viewpoint, partial occlusion and deformation of the object. Assume that the image M in the database has m peaks and image Q in query has n peaks and it is not necessary to have the same number of the peaks. We need to compute the permutation function P for Q , which maps every cluster i of Q to the closest cluster P(i) of image M. The procedure is outlined as follows: (1) Compute the permutation function. (a) Calculate the distance matrix for the chromatic invariants of the image M and Q. ="[w] , i"1,2, m, j"1,2, n, GH where w "(( ! )#( ! I ). GH G H G H
(20)
(b) Search for the minimum element w in matrix W. VW (c) The permutation function P(x)"y. (d) Delete the row x and column y (do not change the index number of the matrix W). (e) If W is empty then stop else go to step b). (2) Calculate the similarity for matching
KL D (M, Q)" (( ! )#( ! I )). P@ O .O O .O O
(21)
The time complexity of the matching process is given by O(mn) and the actual matching time taken is not high since the number of distinct color region is usually small.
1194
J.Y. Kim et al. / Pattern Recognition 34 (2001) 1189}1197
4. Experimental results 4.1. Simulation on synthetic images To estimate the invariant characteristic for change in illumination color, we began with the 2-D synthetic images using various measured spectra. Because these images do not contain noise from the camera system, we can evaluate the performance of this algorithm without the confounding factors such as specular component. We used 18 surface re#ectances that included in the Macbeth color checker [23,24] except for the neutral colors. The color checker has 24 colored squares on it. The patches are made of a matte paint applied to smooth paperand 12 di!erent illuminants were used; these are 4 phases daylight (D50, D65, D75 and D100), the CIE standard illuminant A, two kinds of the halogen lamp, and CMC color "lters (C110, C136, C138, C202, C203) multiplied by spectral power distribution of D50. The sensor response function of SONY XC-711 CCD camera was also measured by narrow band "lters with 10 nm bandwidth. The responses of sensor were calculated by using Eq. (2) with "xed geometric factor. For each illuminant, 18 colors were generated. We obtained 216 di!erent colors to calculate the cluster convergence characteristics by mean cluster variance (MCV). A cluster variance (CV) can be obtained by the following formula at each cluster, and the MCV is the average of these cluster variances. , C