Illumination Invariant Texture Retrieval

Michal Haindl and Pavel Vácha
Institute of Information Theory and Automation, Academy of Sciences CR, 182 08 Prague, Czech Republic
{haindl,vacha}@utia.cas.cz
Abstract

Two fast illumination invariant image retrieval methods for scenes comprising textured objects under variable illumination are introduced. Both methods are based on a texture gradient modelled by an efficient set of random field models. We developed illumination-insensitive measures for textured image representation and compared them favorably with steerable pyramid and Gabor features in illumination invariant BTF texture recognition.
1. Introduction

Content-based image retrieval systems typically query image databases using colour and textural features. Optimal robust features should be geometrically and illumination invariant. Although image retrieval has been an active research area for many years, this difficult problem is still far from being solved. Simpler methods based only on colour features achieve illumination invariance by normalizing colour bands or by using colour ratio histograms. However, colour-based methods rarely perform sufficiently well in natural visual scenes because they cannot detect similar objects in different locations, illuminations or backgrounds. Textures are important clues for identifying the objects present in a visual scene. Unfortunately, the appearance of natural rough textures is highly illumination dependent. As a consequence, most recent rough-texture-based classification or segmentation methods require multiple training images captured under a full variety of possible illumination conditions for each class. Such learning is obviously clumsy and very often even impossible when the required measurements are not available. The authors of [4] allow a single training image per class, but they require surfaces of uniform albedo, smooth and shallow relief, illumination sufficiently far from the texture macro-normal and, most seriously, knowledge of the illumination direction for all involved (trained as well as tested) textures. Although it was demonstrated in [7], [3] that for an object with Lambertian reflectance there are no discriminative functions that are invariant to illumination, the article [3] empirically verified that the direction of the image gradient is reasonably insensitive to changes in illumination direction. Our proposed methods build on these results by introducing a simple parametric measure robust to illumination changes. We present two methods that require neither mutual texture registration nor knowledge of the illumination direction. They can be applied to textured object retrieval when only a single illumination training image is available for each class.

2. Texture Representation
We use the gray-scale approximation of coloured bidirectional texture function (BTF) textures. Although we neglect spectral information, this representation is still sufficient for the retrieval application while it simultaneously speeds up the proposed methods. Both our methods, using either the causal simultaneous autoregressive model (CAR) or the Gaussian Markov random field (GMRF) texture representation, utilize a multiscale decomposition based on the Gaussian pyramid. The Gaussian pyramid is a sequence of images $Y^{(k)}$ in which each image is a low-pass downsampled version of its predecessor. The weighting function (FIR generating kernel) is chosen subject to the separability, normalization, symmetry and equal contribution constraints (for details see [2]). This multiscale approach allows us to use both models with a smaller contextual neighbourhood and consequently also with smaller and more robust parameter sets. We assume the single-scale factor texture gradient

$Y_r^{(k)} = \left[ \frac{\partial Y^{(k)}}{\partial r_1}, \frac{\partial Y^{(k)}}{\partial r_2} \right]^T$

of the scene to be locally modelled using either the CAR or the GMRF model, respectively. Both proposed methods use the corresponding random field model to estimate the factor gradient model parameters.
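As an illustration only, the following minimal sketch (our naming, not from the paper) builds the Gaussian pyramid with the usual a = 0.4 generating kernel of [2] and the per-level gradient fields:

    import numpy as np

    # 5-tap FIR generating kernel satisfying the separability, normalization,
    # symmetry and equal-contribution constraints of [2] (a = 0.4).
    KERNEL = np.array([0.05, 0.25, 0.4, 0.25, 0.05])

    def _blur(img, kernel=KERNEL):
        # Separable low-pass convolution: filter columns, then rows.
        img = np.apply_along_axis(np.convolve, 0, img, kernel, mode='same')
        return np.apply_along_axis(np.convolve, 1, img, kernel, mode='same')

    def gaussian_pyramid(y, levels=4):
        """Sequence Y^(k): each level is a low-pass downsampled predecessor."""
        pyramid = [y]
        for _ in range(levels - 1):
            y = _blur(y)[::2, ::2]        # low-pass filter, then 2x subsample
            pyramid.append(y)
        return pyramid

    def gradient_factors(pyramid):
        """Per-level gradient [dY/dr1, dY/dr2]^T as an (H, W, 2) array."""
        return [np.stack(np.gradient(level), axis=-1) for level in pyramid]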
2.1. CAR Factor Model

We assume the texture gradient components of a single resolution factor to be locally modelled by an adaptive CAR model in some chosen direction:

$Y_r = \gamma Z_r + \epsilon_r ,$   (1)
where $\gamma = [A_1, \ldots, A_\eta]$ is the $2 \times 2\eta$ unknown parameter matrix with diagonal matrices $A_i$ and $\eta = \text{card}(I_r)$. We denote the $2\eta \times 1$ data vector $Z_r = [Y_{r-i}^T : \forall i \in I_r]^T$ with a multi-index $r = (r_1, r_2)$ and similarly the multi-indices $i, t$; $r_1$ is the row and $r_2$ the column index, respectively. The multi-index changes according to the chosen direction of movement on the image plane, e.g., $t-1 = (t_1, t_2-1)$, $t-2 = (t_1, t_2-2), \ldots$. $I_r$ is some contextual causal or unilateral neighbour index shift set. The white noise vector $\epsilon_r$ has zero mean and a constant but unknown covariance matrix $\Omega$. We further assume uncorrelated noise vector components, i.e., $E\{\epsilon_{r,1}\,\epsilon_{r,2}\} = 0 \;\; \forall r$, and the probability density of $\epsilon_r$ to be normal, independent of previous data and identical for every position $r$. The task consists in finding the conditional parameter density $p(\gamma \mid Y^{(t-1)})$ given the known process history $Y^{(t-1)} = \{Y_{t-1}, Y_{t-2}, \ldots, Y_1, Z_t, Z_{t-1}, \ldots, Z_1\}$ and taking its conditional mean as the gradient representation. Assuming normality of the white noise component $\epsilon_t$, conditional independence between pixels and the normal-Wishart parameter prior, we have shown ([6]) that the conditional mean value is $E[\gamma \mid Y^{(t-1)}] = \hat{\gamma}_{t-1}$, where

$\hat{\gamma}_{t-1}^T = V_{zz(t-1)}^{-1} V_{zy(t-1)} .$   (2)

The following notation is used in (2):

$V_{t-1} = \tilde{V}_{t-1} + V_0 , \qquad \tilde{V}_{t-1} = \begin{pmatrix} \tilde{V}_{yy(t-1)} & \tilde{V}_{zy(t-1)}^T \\ \tilde{V}_{zy(t-1)} & \tilde{V}_{zz(t-1)} \end{pmatrix} , \qquad \tilde{V}_{xw(t-1)} = \alpha \tilde{V}_{xw(t-2)} + X_{t-1} W_{t-1}^T ,$

and $V_0$ is a positive definite matrix. We assume slowly changing parameters; consequently these equations were modified using a constant exponential "forgetting factor" $\alpha$ to allow parameter adaptation. It is easy to check (see [6]) also the validity of the following recursive parameter estimator:

$\hat{\gamma}_t^T = \hat{\gamma}_{t-1}^T + \frac{V_{zz(t-1)}^{-1} Z_t (Y_t - \hat{\gamma}_{t-1} Z_t)^T}{\alpha^2 + Z_t^T V_{zz(t-1)}^{-1} Z_t} .$
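A compact sketch of estimating $\hat{\gamma}$ for one pyramid level under (2), with the exponential forgetting folded into the data accumulation; this is our simplification (it accumulates the statistics and solves once rather than updating $\hat{\gamma}$ recursively), and all identifiers are ours:

    import numpy as np

    def car_estimate(grad, shifts, alpha=0.99, v0_scale=1.0):
        """Estimate gamma_hat (eq. (2)) for one pyramid level.

        grad   : (H, W, 2) gradient field Y_r from the Gaussian pyramid
        shifts : causal neighbour index shift set I_r, e.g. [(0, 1), (1, 0)]
        alpha  : exponential "forgetting factor"
        """
        dim = 2 * len(shifts)                 # length of the data vector Z_r
        Vzz = v0_scale * np.eye(dim)          # positive definite prior V_0
        Vzy = np.zeros((dim, 2))
        h, w = grad.shape[:2]
        m = max(max(abs(i), abs(j)) for i, j in shifts)
        for r1 in range(m, h - m):            # raster-scan movement direction
            for r2 in range(m, w - m):
                Z = np.concatenate([grad[r1 - i, r2 - j] for i, j in shifts])
                Y = grad[r1, r2]
                Vzz = alpha * Vzz + np.outer(Z, Z)   # forget old data, accumulate
                Vzy = alpha * Vzy + np.outer(Z, Y)
        return np.linalg.solve(Vzz, Vzy).T    # gamma_hat, shape (2, 2*eta)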
The optimal contextual neighbourhood $I_r$ can be found analytically by maximizing the posterior probability [6]:

$p(M_j \mid Y^{(t-1)}) = k \, \Gamma\!\left( \frac{\psi(t-1) - \eta + 2}{2} \right) |V_{j,zz(t-1)}|^{-\frac{1}{2}} \, \lambda_{j,t-1}^{-\frac{\psi(t-1)-\eta+2}{2}} ,$   (3)

where $k$ is a common constant. All statistics related to a model $M_j$ (3)-(5) are computed using the contextual neighbourhood $I_r^j$. The solution of (3) uses the following notation:

$\psi(t) = \alpha^2 \psi(t-1) + 1 ,$   (4)

$\lambda_{t-1} = V_{yy(t-1)} - V_{zy(t-1)}^T V_{zz(t-1)}^{-1} V_{zy(t-1)} .$   (5)

The determinant $|V_{zz(t)}|$ as well as $\lambda_t$ can be evaluated recursively ([6]) too. For the numerical realization of the model statistics (2)-(5) see the discussion in [6]. The texture gradient for each BTF texture resolution level $k$ is represented by the parametric matrix $\hat{\gamma}^{(k)}$. These parametric estimates are combined into the resulting parametric matrix

$\Theta = [\hat{\gamma}^{(k)} : \forall k] .$   (6)

This matrix contains the final estimates of the BTF multiresolution gradient CAR model parameters.
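When comparing candidate neighbourhoods by (3), working with logarithms avoids overflow; a sketch, assuming the statistics $\psi$, $\lambda$ and $V_{zz}$ of each model $M_j$ are already at hand:

    import numpy as np
    from scipy.special import gammaln

    def log_posterior(Vzz, lam, psi, eta):
        """log p(M_j | Y^(t-1)) up to the common constant log k, eq. (3).

        lam is the statistic (5); for the bivariate gradient it is a 2x2
        matrix, so its determinant is used here (an assumption of this
        sketch)."""
        nu = (psi - eta + 2) / 2.0
        _, logdet_vzz = np.linalg.slogdet(Vzz)
        _, logdet_lam = np.linalg.slogdet(np.atleast_2d(lam))
        return gammaln(nu) - 0.5 * logdet_vzz - nu * logdet_lam

    # The candidate neighbourhood I_r^j maximizing this value is selected.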
2.2. GMRF Factor Model

Alternatively, we assume that the texture gradient components of a single resolution factor are locally modelled using a Gaussian Markov random field (GMRF) model. This model is obtained if the local conditional density of the MRF model is Gaussian:

$p(Y_r \mid Y_s \; \forall s \in I_r) = \frac{1}{2\pi |\Sigma|^{\frac{1}{2}}} \exp\left\{ -\frac{1}{2} (Y_r - \gamma Z_r)^T \Sigma^{-1} (Y_r - \gamma Z_r) \right\} ,$

where $I_r$ is non-causal and symmetrical. An optimal neighbourhood is detected using the correlation method [5], favoring neighbour locations corresponding to large correlations over those with small correlations. The GMRF model for centered vectors $Y_r$ can also be expressed in the matrix form (1), but we assume the following driving noise correlation structure (diagonal $\Sigma$):

$E\{\epsilon_{r,l}\, \epsilon_{r-s,j}\} = \begin{cases} \sigma_j^2 & \text{if } s = (0,0) \text{ and } l = j, \\ -\sigma_j^2 a_s^j & \text{if } s \in I_r^j \text{ and } l = j, \\ 0 & \text{otherwise,} \end{cases}$
and $\sigma_j$, $a_s^j \; \forall s \in I_r^j$ are unknown parameters. Parameter estimation of the GMRF model is complicated because either the Bayesian or the ML estimate requires an iterative minimization of a nonlinear function. Therefore we use the pseudo-likelihood estimator, which is computationally simple although not efficient. The pseudo-likelihood estimate of the $a_s^j$ parameters, evaluated over an image index lattice $I$, has the form

$\hat{\gamma}_r^j = [a_s^j : \forall s \in I_r^j] = \left[ \sum_{\forall s \in I} Z_s^T Z_s \right]^{-1} \sum_{\forall s \in I} Z_s^T Y_s ,$

where $j = 1, 2$ and $Z_s = [Y_{s+t} : \forall t \in I_s]$. The feature vector is again (6).
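The estimate above is a linear least-squares problem; a minimal sketch for a single gradient component $j$, with an illustrative (hypothetical) symmetric shift set:

    import numpy as np

    # Illustrative non-causal, symmetric neighbour shift set I_r^j.
    SHIFTS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def gmrf_estimate(y, shifts=SHIFTS):
        """Pseudo-likelihood estimate [a_s^j : s in I_r^j] for a scalar field y."""
        m = max(max(abs(i), abs(j)) for i, j in shifts)
        rows, targets = [], []
        for r1 in range(m, y.shape[0] - m):
            for r2 in range(m, y.shape[1] - m):
                rows.append([y[r1 + i, r2 + j] for i, j in shifts])  # Z_s
                targets.append(y[r1, r2])                            # Y_s
        Z, t = np.asarray(rows), np.asarray(targets)
        # Solve (sum Z_s^T Z_s) a = sum Z_s^T Y_s in the least-squares sense.
        return np.linalg.lstsq(Z, t, rcond=None)[0]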
2.3. Gabor Features

The Gabor filters [1], [10] can be considered as orientation- and scale-tunable edge and line (bar) detectors, and the statistics of Gabor filter responses in a given region are used to characterize the underlying texture information. The features are derived from a two-dimensional Gabor function $g(r) : \mathbb{R}^2 \rightarrow \mathbb{C}$, recalled below.
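Following the standard definition of [8] (rewritten here, as an adaptation of ours, in the $r = (r_1, r_2)$ notation of this paper), the Gabor mother function is

$g(r) = \frac{1}{2\pi \sigma_{r_1} \sigma_{r_2}} \exp\left[ -\frac{1}{2} \left( \frac{r_1^2}{\sigma_{r_1}^2} + \frac{r_2^2}{\sigma_{r_2}^2} \right) + 2\pi \mathrm{j} W r_1 \right] ,$

where $W$ is the modulation frequency; the filter bank is generated by appropriate dilations and rotations of $g(r)$ [8].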
illum. angle   Gabor   Steerable   CAR    GMRF
...            75.2    49.2        73.3   80.5
75°            24.4    27.4        67.2   69.0
aver.          64.9    50.2        73.9   79.8

Table 1. Classification performance comparison in [%] for the BTF test set.
method      P(correct)   rr_88   rr_100
Gabor          0.71       0.70    0.72
Steerable      0.77       0.75    0.77
CAR            0.81       0.80    0.82
GMRF           0.85       0.84    0.85

Table 2. Estimated probability of correct classification and recall rate (rr_n) for n textures retrieved.
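Assuming $rr_n$ in Tab. 2 denotes the usual recall within the top $n$ retrieved textures (our reading of the caption), it can be computed per query as:

    def recall_at_n(ranked_labels, query_label, n, n_relevant):
        """Fraction of same-class textures found among the top-n retrieved."""
        hits = sum(1 for label in ranked_labels[:n] if label == query_label)
        return hits / n_relevant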
Tab. 1 shows both presented methods to be more robust to illumination changes than the Gabor or steerable pyramid based features. Tab. 2 demonstrates the performance estimated from 105 test sets with a randomly selected per-class training illumination angle, and the retrieval recall rate averaged over all possible textures. The average improvement over both alternatives is between 4 and 14 %. The presented algorithms correctly found most of the required BTF textures illuminated from angles different from the single per-class training texture, while being approximately two times faster than the fastest alternative. The GMRF model performs slightly better on our data than the CAR model.

5. Conclusions

We proposed two novel fast and accurate illumination invariant textured image retrieval methods based on texture gradient modelling. The gradient is modelled using either an adaptive simultaneous regression model or the GMRF model. Both models use spatial correlations from neighbouring data. A parallel implementation of both algorithms is straightforward; e.g., in the CAR model every image row and column can be processed independently by its dedicated processor. The preliminary test results of the presented algorithms are encouraging: the proposed methods were mostly able to find the correct texture class irrespective of its illumination for all our experimental BTF textures, and they outperformed the alternative methods. Additional advantages are that they do not require mutual registration of textures, which is a serious problem in itself, and that they are faster and more robust to extreme illumination angle differences than the alternatives. The proposed methods are fast and numerically robust, so they can be used in an on-line image retrieval system. However, further work is still needed to optimize the random field models used, to incorporate spectral information and to test the performance on noisy texture data.

Acknowledgements

This research was supported by the EC project no. FP6-507752 MUSCLE, grants No. A2075302 and 1ET400750407 of the Grant Agency of the Academy of Sciences CR, and partially by the MŠMT grant 1M0572 DAR.

References
[1] A. Bovik. Analysis of multichannel narrow-band filters for image texture segmentation. IEEE Trans. on Signal Processing, 39(9):2025–2043, 1991.
[2] P. Burt and E. Adelson. A multiresolution spline with application to image mosaics. ACM Trans. Graphics, 2(4):217–236, 1983.
[3] H. Chen, P. Belhumeur, and D. Jacobs. In search of illumination invariants. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 254–261. IEEE, June 2000.
[4] O. Drbohlav and M. Chantler. Illumination-invariant texture classification using single training images. In Texture 2005, pages 31–36, Heriot-Watt University, October 2005.
[5] M. Haindl and V. Havlíček. Prototype Implementation of the Texture Analysis Objects. Technical Report 1939, ÚTIA AV ČR, Praha, 1997.
[6] M. Haindl and S. Šimberová. Theory & Applications of Image Analysis, chapter A Multispectral Image Line Reconstruction Method, pages 306–315. World Scientific Publishing Co., Singapore, 1992.
[7] D. Jacobs, P. Belhumeur, and R. Basri. Comparing images under variable illumination. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 610–617. IEEE, June 1998.
[8] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Trans. on Pattern Analysis and Machine Intelligence, 18(8):837–842, 1996.
[9] J. Meseth, G. Müller, and R. Klein. Preserving realism in real-time rendering. In OpenGL Symposium, pages 89–96. Eurographics Association, Switzerland, April 2003.
[10] T. Randen and J. H. Husoy. Filtering for texture classification: A comparative study. IEEE Trans. on Pattern Analysis and Machine Intelligence, 21(4):291–310, April 1999.
[11] E. Simoncelli and J. Portilla. Texture characterization via joint statistics of wavelet coefficient magnitudes. In Fifth IEEE Int'l Conf. on Image Processing, volume I, Chicago, 1998. IEEE Computer Society.