Learning from Negative Example in Relevance Feedback for Content-Based Image Retrieval(*)

M. L. Kherfi(1), D. Ziou(1) and A. Bernardi(2)

(1) CoRIMedia, Faculté des sciences, Université de Sherbrooke, Sherbrooke, Qc, Canada J1K 2R1. Email: {kherfi, ziou}@dmi.usherb.ca
(2) Laboratoires universitaires Bell, 2020, rue Université, 25e étage, Montréal, Qc, Canada H3A 2A5. Email: [email protected]

(*) The completion of this research was made possible thanks to NSERC and Bell Canada's support through its Bell University Laboratories R&D program.
Abstract

In this paper, we address some issues related to the combination of positive and negative examples to perform more efficient image retrieval. We analyze the relevance of negative example and how it can be interpreted. We then propose a new relevance feedback model that integrates both positive and negative examples: first, a query is formulated using positive example; then, negative example is used to refine the system's response. Mathematically, relevance feedback is formulated as an optimization of the intra and inter variances of positive and negative examples.
1. Introduction

In content-based image retrieval, relevance feedback (RF) is a powerful tool that allows users to express their needs. Rather than asking users to specify a weighted combination of features such as color, shape and texture, it is easier for them to specify these features implicitly via example images. The user can choose images that will participate in the query and weight them according to their resemblance to the sought images. The results of the query can then be refined repeatedly by specifying more relevant images. The goal of RF is to learn from user interaction in order to define the best similarity measures [12], to identify the target images and to select the most representative features. Although many studies have focused on how to learn from user interaction [6,8,9,12], very few have considered the relevance of negative example. We believe that negative example can be highly useful for query refinement, since it makes it possible to determine which features the user does not want, and hence to discard images exhibiting them. In [5], the authors show that when only positive feedback is used, the major improvement occurs at the first feedback step, whereas with both positive and negative feedback the improvement is noticeable for the first four steps, where the results continuously get better. In this paper, we give interpretations of negative example and show how it can be combined with positive example to achieve search refinement. The paper is organized as follows: in Section 2, we examine some existing RF models. Two main complementary goals motivate our work. The first one, developed in Section 3, concerns how to learn from the contrast between positive and negative examples in order to identify important and unimportant features. The second goal, developed in Section 4, is to apply the results of Section 3 to define a new RF model that supports both positive and negative feedback. Implementation details and experimental results are given in Section 5.
2. Related work

Relevance feedback using positive example has been considered by many authors [9,11,12]. We will limit ourselves to the two following models, which attempt to guess the ideal point the user is looking for and to identify the most important features. In [9], Ishikawa et al. minimize the ellipsoid distance function:

$$D = \sum_{n=1}^{N} \pi_n (\vec{x}_n - \vec{q})^T W (\vec{x}_n - \vec{q}) \qquad (1)$$
where $\vec{x}_n$ is the feature vector of the $n$th image, $N$ is the number of example images, $\pi_n$ is the relevance degree that the user assigns to the $n$th image, $\vec{q}$ is the ideal query, and $W$ is a computed dispersion matrix. In [11], Rui et al. decompose each image into a set of feature vectors. They propose to minimize the global dispersion of the example images:

$$J = \sum_{n=1}^{N} \pi_n \sum_{i=1}^{I} u_i (\vec{x}_{ni} - \vec{q}_i)^T W_i (\vec{x}_{ni} - \vec{q}_i) \qquad (2)$$
where $I$ is the number of features in each image, $\vec{x}_{ni}$ is the $i$th feature vector of the $n$th example image, and $\vec{q}_i$ is the $i$th feature of the ideal query. To each feature $i$, a scalar weight $u_i$ and a matrix $W_i$ are automatically associated in order to give more weight to concentrated features. We show in Section 3 that this model does not support negative example. Few authors have considered negative example. We can distinguish two families of models. In the first family, negative example images are chosen automatically from the database. In [5], some of the images that are non-relevant to the previous query are considered as negative example. In [7], the authors consider that the image database can be partitioned into two classes: relevant images, which constitute positive example, and non-relevant images, which constitute negative example. In the second family, the user is asked to specify negative example images. In [6], the authors use a Bayesian model in which they penalize the classes under which negative example scores well. Picard et al. [8] choose the image sets which most efficiently describe positive example, under the condition that they do not describe negative example well. Belkin et al. [4] propose to enhance all the features of positive example, as well as the features of negative example that do not appear in positive example. In the next section, we will show that this approach can lead to an ambiguity and mislead the retrieval process.
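To make these objective functions concrete, here is a minimal numpy sketch of the ellipsoid dispersion of Equation (1); the toy data are invented and the identity matrix stands in for a learned $W$. Equation (2) applies the same quadratic form once per feature and combines the results with the weights $u_i$.

```python
import numpy as np

def ellipsoid_dispersion(X, pi, q, W):
    """Weighted ellipsoid dispersion of Eq. (1):
    D = sum_n pi_n (x_n - q)^T W (x_n - q)."""
    diffs = X - q                                 # (N, d) deviations from the ideal query
    return float(np.sum(pi * np.einsum('nd,de,ne->n', diffs, W, diffs)))

# Toy data: N = 3 example images, d = 2 features per image.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]])
pi = np.array([1.0, 0.5, 0.8])                    # user-assigned relevance degrees
q = np.average(X, axis=0, weights=pi)             # weighted mean minimizes D for fixed W
W = np.eye(2)                                     # identity in place of a learned matrix
print(ellipsoid_dispersion(X, pi, q, W))
```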
3. How to consider negative example

In knowledge discovery in databases, most learning-from-examples algorithms use positive example to perform generalization and negative example to perform specialization [2]. These algorithms try to extract decision rules, including characteristic rules and discrimination rules. A characteristic rule is an assertion that characterizes a concept satisfied by all or most of the target (the retrieved objects). A discrimination rule is an assertion that discriminates a concept of the target from the rest of the database [2]. This can be applied to image retrieval in the manner described hereafter. Characteristic rules can be extracted from positive example images by identifying all the relevant features. More importance must be given to such features in the retrieval process, and images exhibiting them should be retrieved. On the other hand, discrimination rules can be extracted from the contrast between positive and negative examples. Important features that are not common to the two classes are good discriminators, and hence must be given more importance; however, common features (those present in both positive and negative examples) are not good discriminators, and must be penalized.
Applying this principle as it stands can lead to an ambiguity: relevant features shared by positive and negative example are simultaneously important (since they appear in positive example) and unimportant (since they are bad discriminators). To illustrate, suppose the user gives an image of a blue car as positive example and an image of a red car as negative example. Then, according to the above principle, the feature "Blue" is important, as it is a good discriminator. The feature "Red" is also a good discriminator, but on the opposite side, i.e., images exhibiting it should be discarded. What about the feature Shape (Car)? It is important since it is present in positive example, but also unimportant since it is a bad discriminator. To overcome this ambiguity, one might compromise by assigning average importance to such a feature; however, this is not a good solution since it preserves the ambiguity. In our example, the user may be interested in cars; by assigning less importance to this feature, images of objects other than cars might be retrieved. We hence propose to perform a sequential refinement in two steps:
- In the first step, the user can choose only positive example. This allows the system to determine all the characteristic features that every image must possess to be returned as an answer at this step.
- In the second step, refinement is performed only on the images returned in the first step (rather than all the database images). Here, the user can specify negative example images, the positive example images having already been specified in the first step.
Restricting the set of images ensures that we do not neglect important features of positive example, since they have already been considered in the first step. The second step, applied to the resulting image set (which can be seen as a more homogeneous data set than the entire database), serves to refine the query by determining three kinds of features:
- Desired features, which appear only in positive example. These must be enhanced, and images highlighting them should be favored.
- Undesired features, which appear in negative but not in positive example. These must also be enhanced, but images highlighting them should be discarded.
- Common features, which appear in both positive and negative examples. These are bad discriminators and must be neglected by giving them little weight.
Returning to the above example, the user can give an image of a blue car in the first step; hence, there is a high probability that all the images returned at this step will contain cars. In the second step, the query is refined on the basis of color by giving the red car as negative example (the blue car is retained as positive example). Here, the discriminating features, i.e., "Red" and "Blue", are enhanced, while the common feature "Car" is neglected. This can be done without the risk of inhibiting the user's need to retrieve a car, since this need has already been considered in the first step.
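As a rough illustration of this two-step protocol (a simplification, not the model of Section 4), the following sketch uses an invented toy database and a plain weighted-distance ranking; the `rank` helper, the weighting heuristics and all the indices are ours:

```python
import numpy as np

def rank(images, weights, query):
    """Order images by weighted squared distance to the query (closest first)."""
    d = np.sum(weights * (images - query) ** 2, axis=1)
    return np.argsort(d)

rng = np.random.default_rng(0)
db = rng.random((1000, 8))                  # toy database: 1000 images, 8 scalar features

# Step 1: positive example only -> characteristic features.
positives = db[[3, 17, 42]]                 # images the user marked as positive
q = positives.mean(axis=0)
w = 1.0 / (positives.var(axis=0) + 1e-9)    # concentrated features get more weight
step1 = rank(db, w, q)[:100]                # answer set of the first step

# Step 2: negative example, restricted to the step-1 results.
negatives = db[[step1[5]]]                  # a returned image the user rejects
alpha = (q - negatives.mean(axis=0)) ** 2   # discriminating features get more weight
step2 = step1[rank(db[step1], w * alpha, q)]
print(step2[:9])                            # refined answers, all drawn from step 1
```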
4. Our model

As explained in the previous section, in the first step only positive example images participate in the query. Relevance feedback in this step can be performed by minimizing the global dispersion $J$ of Equation (2), as in [11]. In the second step, both positive and negative example images participate in the refinement. We show in this section that minimizing the global dispersion of the images (including positive and negative examples) does not lead to the desired results, and we then propose a new model that supports both positive and negative refinement. When we have both positive and negative examples, the global dispersion of Equation (2) can be written as:
$$J = \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\vec{x}_{ni}^k - \vec{q}_i)^T W_i (\vec{x}_{ni}^k - \vec{q}_i) \qquad (3)$$
where $k=1$ for positive example and $k=2$ for negative example. In [10], Rui et al. propose to allocate negative relevance degrees to negative example images and to compute the parameters $u_i$, $W_i$, and $\vec{q}_i$ which minimize the global dispersion. Let us analyze the consequences of such an approach. If we separate the positive example from the negative example in Equation (3), and use negative values for the relevance degrees of negative example images, then we can write the global dispersion as:
$$J = \sum_{i=1}^{I} u_i \sum_{n=1}^{N_1} \pi_n^1 (\vec{x}_{ni}^1 - \vec{q}_i)^T W_i (\vec{x}_{ni}^1 - \vec{q}_i) - \sum_{i=1}^{I} u_i \sum_{n=1}^{N_2} |\pi_n^2| (\vec{x}_{ni}^2 - \vec{q}_i)^T W_i (\vec{x}_{ni}^2 - \vec{q}_i)$$
where $|\pi_n^2|$ is the absolute value of $\pi_n^2$. The last equation shows that the global dispersion is nothing but the dispersion of positive example minus the dispersion of negative example. Hence, by minimizing the global dispersion, even if the model of [10] moves the global query average $\vec{q}$ towards positive example and away from negative example, two problems emerge. First, minimizing the global dispersion leads to minimizing the dispersion of positive example with respect to the global query average $\vec{q}$ rather than the positive example average $\bar{x}^1$. This does not give an optimal minimization of the positive example dispersion, and hence the relevant features of positive example are not given enough importance. Second, and this is the major problem, minimizing the global dispersion leads to maximizing the dispersion of negative example. This implies that the relevant features of negative example are neglected, so the retrieval system will not be able to discard the undesired images. See Figure 1 for an illustration.
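The sign problem can be checked numerically. In the toy sketch below (invented data, with $W_i = I$ and $u_i = 1$ for simplicity), spreading the negative example images strictly lowers the global objective, so a minimizer is rewarded for scattering them instead of modeling them tightly:

```python
import numpy as np

def global_J(pos, neg, q):
    """Global dispersion with relevance degrees +1 for positive and -1 for
    negative example (W = I, u = 1 for simplicity)."""
    jp = np.sum((pos - q) ** 2)               # positive example dispersion around q
    jn = np.sum((neg - q) ** 2)               # negative example dispersion around q
    return jp - jn                            # the split derived above

rng = np.random.default_rng(1)
pos, neg = rng.random((4, 2)), rng.random((3, 2))
q = pos.mean(axis=0)
scattered = neg.mean(axis=0) + 10 * (neg - neg.mean(axis=0))   # same mean, 10x spread
print(global_J(pos, scattered, q) < global_J(pos, neg, q))     # True: J "improves"
```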
In order to give our formulation of the problem, let us introduce, for each feature, the average of positive example $\bar{x}_i^1$ and the average of negative example $\bar{x}_i^2$ into the global dispersion $J$ of Equation (3). We can write it as:

$$J = \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k \left[ (\vec{x}_{ni}^k - \bar{x}_i^k) + (\bar{x}_i^k - \vec{q}_i) \right]^T W_i \left[ (\vec{x}_{ni}^k - \bar{x}_i^k) + (\bar{x}_i^k - \vec{q}_i) \right]$$

or also

$$J = \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\vec{x}_{ni}^k - \bar{x}_i^k)^T W_i (\vec{x}_{ni}^k - \bar{x}_i^k) + \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\vec{x}_{ni}^k - \bar{x}_i^k)^T W_i (\bar{x}_i^k - \vec{q}_i)$$
$$+ \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\bar{x}_i^k - \vec{q}_i)^T W_i (\vec{x}_{ni}^k - \bar{x}_i^k) + \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\bar{x}_i^k - \vec{q}_i)^T W_i (\bar{x}_i^k - \vec{q}_i)$$

Now, using the fact that $\bar{x}_i^k = \left( \sum_{n=1}^{N_k} \pi_n^k \vec{x}_{ni}^k \right) / \sum_{n=1}^{N_k} \pi_n^k$, we can easily show that the second and third terms of $J$ in the last equation are zero. Thus

$$J = \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\vec{x}_{ni}^k - \bar{x}_i^k)^T W_i (\vec{x}_{ni}^k - \bar{x}_i^k) + \sum_{i=1}^{I} u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\bar{x}_i^k - \vec{q}_i)^T W_i (\bar{x}_i^k - \vec{q}_i) = A + R$$

The term $A$ expresses the internal dispersion of positive example, i.e., how close the positive example images are to each other, added to the internal dispersion of negative example, i.e., how close the negative example images are to each other. The term $R$ expresses the discrimination between the two sets, i.e., how far positive example is from negative example.
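This decomposition is easy to check numerically; the minimal sketch below (toy data, a single feature with $u_i = 1$) verifies that the cross terms vanish and $J = A + R$:

```python
import numpy as np

rng = np.random.default_rng(2)
W = np.eye(2)
q = rng.random(2)                                 # current ideal query
sets = [rng.random((4, 2)), rng.random((3, 2))]   # k=1 positive, k=2 negative images
weights = [rng.random(4), rng.random(3)]          # relevance degrees pi_n^k

J = A = R = 0.0
for X, pi in zip(sets, weights):
    xbar = np.average(X, axis=0, weights=pi)      # weighted class mean x_bar^k
    for x, p in zip(X, pi):
        J += p * (x - q) @ W @ (x - q)            # global dispersion, Eq. (3)
        A += p * (x - xbar) @ W @ (x - xbar)      # intra (within-set) part
        R += p * (xbar - q) @ W @ (xbar - q)      # inter (set-to-query) part

print(np.isclose(J, A + R))                       # True: the cross terms cancel
```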
[Figure 1. The effect of Rui's model on the dispersions (positive example around $\bar{x}^1$, negative example around $\bar{x}^2$, query $\vec{q}$).]

[Figure 2. The effect of our model on the dispersions (same labels).]

According to our objectives detailed in the previous section, we want to: 1) minimize the dispersion of positive example; 2) minimize the dispersion of negative example; and 3) maximize the distinction between them (see Figure 2). To do so, we define a new intra dispersion $A_c$ and a new inter dispersion $R_c$ as follows:

$$A_c = \sum_{i=1}^{I} \alpha_i u_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\vec{x}_{ni}^k - \bar{x}_i^k)^T W_i (\vec{x}_{ni}^k - \bar{x}_i^k)$$

$$R_c = \sum_{i=1}^{I} \alpha_i u_i \sum_{k=1}^{2} \tilde{\pi}^k (\bar{x}_i^k - \vec{q}_i)^T W_i (\bar{x}_i^k - \vec{q}_i)$$
where $\tilde{\pi}^1$ is the sum of weights of positive example and $\tilde{\pi}^2$ is the sum of weights of negative example. The parameters $u_i$ and $W_i$ depend on each feature's dispersion. The parameter $\alpha_i$ ensures that discriminating features receive more importance than common features. To meet our goals, we must minimize the intra dispersion $A_c$ and maximize the inter dispersion $R_c$. These two conditions are equivalent to the following: maximize the inter dispersion under the constraint that the sum of the inter and intra dispersions is constant, which automatically minimizes the intra dispersion. Consequently, parameter computation can be formulated as a constrained optimization problem: maximize $R_c$ under the constraint $A_c + R_c = 1$. Two other normalization constraints are introduced: $\sum_{i=1}^{I} \frac{1}{u_i} = 1$ and $\det(W_i) = 1 \; \forall i$. Using Lagrange multipliers, we have to maximize:

$$L = R_c - \lambda' (A_c + R_c - 1) - \lambda \left( \sum_{i=1}^{I} \frac{1}{u_i} - 1 \right) - \sum_{i=1}^{I} \lambda_i (\det(W_i) - 1)$$

where we consider that $\tilde{\pi}^1 = \tilde{\pi}^2 = 1$. The optimal solutions are the following:

$$u_i = \sum_{j=1}^{I} \sqrt{\frac{f_j}{f_i}}, \quad \text{where} \quad f_i = -\alpha_i \sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (\vec{x}_{ni}^k - \bar{x}_i^k)^T W_i (\vec{x}_{ni}^k - \bar{x}_i^k),$$

$$W_i = (\det(\mathrm{Cov}A_i'))^{1/h_i} (\mathrm{Cov}A_i')^{-1},$$

where $h_i$ is the dimension of the $i$th feature, and the $(r,s)$th element of the intra covariance matrix $\mathrm{Cov}A_i'$ is

$$\mathrm{Cov}A_i'^{\,rs} = \frac{\sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k (x_{ni}^k - \bar{x}_i^k)_r (x_{ni}^k - \bar{x}_i^k)_s}{\sum_{k=1}^{2} \sum_{n=1}^{N_k} \pi_n^k}$$
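As a rough numerical sketch of these closed-form solutions (toy data of our own; the sign of $f_i$ is dropped since only the ratios $f_j / f_i$ matter), the per-feature matrices and weights can be computed as follows:

```python
import numpy as np

def optimal_parameters(pos, neg, pi_pos, pi_neg, alphas):
    """Closed-form W_i and u_i from the weighted intra dispersions.

    pos, neg : lists with one (N_k, h_i) array per feature i
    pi_pos, pi_neg : relevance degrees of the positive/negative images
    alphas : discrimination weights alpha_i, one per feature
    """
    f, Ws = [], []
    for Xp, Xn, a in zip(pos, neg, alphas):
        # Deviations of each set from its own weighted mean x_bar_i^k.
        D = np.vstack([Xp - np.average(Xp, axis=0, weights=pi_pos),
                       Xn - np.average(Xn, axis=0, weights=pi_neg)])
        w = np.concatenate([pi_pos, pi_neg])
        cov = (w[:, None] * D).T @ D / w.sum()         # intra covariance CovA'_i
        Wi = np.linalg.det(cov) ** (1 / D.shape[1]) * np.linalg.inv(cov)
        Ws.append(Wi)                                  # det(W_i) = 1 by construction
        f.append(a * np.sum(w * np.einsum('nd,de,ne->n', D, Wi, D)))
    f = np.array(f)
    return Ws, np.sqrt(f).sum() / np.sqrt(f)           # u_i = sum_j sqrt(f_j / f_i)

rng = np.random.default_rng(3)
pos = [rng.random((4, 2)), rng.random((4, 3))]         # I = 2 features, h = (2, 3)
neg = [rng.random((3, 2)), rng.random((3, 3))]
Ws, u = optimal_parameters(pos, neg, np.ones(4), np.ones(3), np.array([0.9, 0.2]))
print(np.round(u, 2), [round(float(np.linalg.det(W)), 3) for W in Ws])
```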
The parameter $\alpha_i$ is used to distinguish discriminating features from common features. It must enable us to allocate more weight to features for which positive example is clearly distinct from negative example, and less weight to features for which they are confused. A good estimate for this parameter is given by a distinction measure between the average of positive example and the average of negative example: the closer these averages, the smaller $\alpha_i$ must be. For example, if our features are color histograms, $\alpha_i$ can be computed using the $\chi^2$ distance measure for histograms used in [3]:

$$\alpha_i = \chi^2(\bar{x}_i^1, \bar{x}_i^2) = \sum_s \frac{(\bar{x}_i^1(s) - \bar{x}_i^2(s))^2}{\bar{x}_i^1(s) + \bar{x}_i^2(s)}$$
5. Implementation and experiments

Tests were performed on 10,000 images from The Pennsylvania State University [1]. This database includes several families of images taken under different illumination conditions. For each image, we partition the HSI color space into 3^3 = 27 subspaces. To each subspace corresponds a binned color histogram, which serves as a feature. We performed many tests for retrieval and refinement. Even when positive and negative examples are not readily distinguishable, our system succeeded in identifying discriminating features and sorting the resulting images according to these features. Figures 3.1 and 3.2 give an example of a two-step query. In the first step, an image of a green tree under a blue sky is given as positive example. The first nine resulting images are shown in Figure 3.1, where we notice the presence of two images of a brown bird on a green tree under a blue sky. In the refinement step, the same image (tree under the sky) is kept as positive example, and an image of a bird on a tree under the sky is chosen as negative example. The results (Figure 3.2) show that the images of birds are discarded and that all the retrieved images contain only trees under the sky.

[Figure 3.1. First-step results.]

[Figure 3.2. Second-step results.]
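A rough sketch of the feature extraction described at the beginning of this section, assuming the pixels are already converted to HSI with each channel scaled to [0, 1); binning the intensity channel within each subspace is one plausible reading of the description above, not a detail confirmed by the paper:

```python
import numpy as np

def hsi_features(pixels, bins=8):
    """Split HSI space into 3 x 3 x 3 = 27 subspaces and build one binned
    histogram per subspace; `pixels` is (N, 3) with channels in [0, 1)."""
    cells = np.minimum((pixels * 3).astype(int), 2)        # per-channel cell in {0,1,2}
    subspace = cells[:, 0] * 9 + cells[:, 1] * 3 + cells[:, 2]
    feats = []
    for s in range(27):
        pts = pixels[subspace == s]
        # Bin the intensity channel of the pixels falling in this subspace.
        h, _ = np.histogram(pts[:, 2], bins=bins, range=(0.0, 1.0))
        feats.append(h / max(len(pixels), 1))              # normalize by image size
    return feats                                           # 27 histogram features

img = np.random.default_rng(4).random((5000, 3))           # stand-in for a converted image
print(round(sum(float(f.sum()) for f in hsi_features(img)), 6))   # 1.0
```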
6. Conclusions

In this paper, we studied the relevance of negative example in content-based image retrieval. We gave details and justifications on how it can be combined with positive example to identify the important features to be used in the retrieval process. This led us to propose a new model for positive and negative feedback, which we tested on a diversified database.
References

[1] J. Z. Wang, J. Li, and G. Wiederhold. SIMPLIcity: Semantics-sensitive Integrated Matching for Picture LIbraries. IEEE Trans. PAMI, vol. 23, no. 9, pp. 947-963, 2001.
[2] J. Han, Y. Cai, and N. Cercone. Knowledge discovery in databases: An attribute-oriented approach. 18th Int. Conf. on VLDB, pp. 547-559, Vancouver, 1992.
[3] R. Brunelli and O. Mich. Histogram analysis for image retrieval. Pattern Recognition, vol. 34, no. 8, 2001.
[4] N. J. Belkin, J. Perez-Carballo, C. Cool, S. Lin, S. Y. Park, S. Y. Rieh, P. Savage, C. Sikora, H. Xie, and J. Allan. Rutgers' TREC-6 interactive track experience. 6th Text Retrieval Conference, pp. 597-610, 1998.
[5] H. Müller, W. Müller, D. M. Squire, S. Marchand-Maillet, and T. Pun. Strategies for positive and negative relevance feedback in image retrieval. Technical Report 00.01, Computer Vision Group, Computing Centre, Univ. of Geneva, 2000.
[6] N. Vasconcelos and A. Lippman. Learning from user feedback in image retrieval systems. NIPS'99, Denver, Colorado, 1999.
[7] C. Nastar, M. Mitschke, and C. Meilhac. Efficient query refinement for image retrieval. IEEE CVPR, pp. 547-552, Santa Barbara, 1998.
[8] R. Picard, T. P. Minka, and M. Szummer. Modeling user subjectivity in image libraries. MIT Media Lab Perceptual Computing TR #382, 1996.
[9] Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: Querying databases through multiple examples. 24th VLDB Conf., pp. 433-438, New York, 1998.
[10] Y. Rui. Efficient indexing, browsing and retrieval of image/video content. PhD thesis, Department of Computer Science, Univ. of Illinois at Urbana-Champaign, 1999.
[11] Y. Rui and T. S. Huang. Optimizing learning in image retrieval. IEEE CVPR, Hilton Head, SC, USA, 2000.
[12] A. Trouvé and Y. Yu. Metric similarities learning through examples: An application to shape retrieval. 3rd EMMCVPR Int. Workshop, pp. 50-62, Sophia-Antipolis, 2001.