Improving the retrieval performance of content-based image retrieval systems: The GIVBAC approach

Lars Bröcker, Manfred Bogen
GMD - German National Research Center for Information Technology
Schloss Birlinghoven, D-53754 Sankt Augustin
Lars.Broecker, [email protected]

Armin B. Cremers
Institute of Computer Science III, University of Bonn
Römerstraße 164, D-53117 Bonn
[email protected]
Abstract

The retrieval performance of content-based image retrieval (CBIR) systems still leaves much to be desired, especially when the system serves as an interface to an image collection covering many different topics. The lack of semantic information about the images leads to large numbers of false matches caused by misleading similarities in the visual primitives that are retrieved. This paper introduces an approach called GIVBAC that tries to reduce the number of false matches in query result sets. It relies heavily on user feedback on retrieval results with regard to user-definable thematic groups. Feedback is used globally, i.e. the feedback of one user influences the behaviour of the whole system. The weight of individual votes is a parameter of GIVBAC, so that it can be adjusted to the level of trust placed in the user base, i.e. higher values in a closed user group and lower values in an open group with anonymous users. The paper presents the results of an evaluation comparing the performance of an unmodified CBIR system with a system that uses GIVBAC as an interface to its users.

Keywords

Content-based image retrieval, user feedback, image databases, digital libraries

1. Introduction

CBIR systems have begun to see wider use in the last few years. Even big search engines like Altavista or Lycos now have subsystems that let users search specifically for images. Although the search interfaces are necessarily very restricted in their functionality, the result sets are often adequate for the information need. In such environments people are used to big result sets with a lot of false matches, since that is the usual response of an internet search engine. In the case of an information system for a more restricted data set, however, such as the image part of a digital library or the image collection of an agency, users are less tolerant towards the overall quality of result sets. The drawback of common CBIR systems remains: the more diverse the topics covered in the collection, the higher the risk of reporting false matches because of similarities in the visual primitives that have no equivalent in the semantics. This has been called the semantic gap in [5], and the chances that this gap will be bridged in the foreseeable future are slim.

Thus, systems are desired which impose additional structure onto the image database without the need for manual annotation. The benefits gained by such a structure are discussed in [2]. GIVBAC goes one step further. Instead of generating a static set of groups before the first use of the system, new groups can be created at any time (even by the users themselves, if the proprietors permit that). Whether an image becomes a member of one of these groups depends on feedback given by the users. If many users deem an image to be a fitting member of a group and vote accordingly, the rank of the image in the group improves, which in turn improves the ranking of the image in result lists for queries regarding this group. The same holds true for images that many users consider a false match for a group: the ranking of these images worsens accordingly, so that they may in time be omitted from the visible part of the result lists for that group.

This voting process distinguishes GIVBAC from other approaches. Nearly every system in this field that incorporates user feedback does so within the confines of relevance feedback (see e.g. [6, 8]). This technique has its roots in information retrieval on texts, where the scope of feedback is confined to one query only. This makes sense, because different users may have differing levels of knowledge in an area, so that an introductory text may be very relevant for one user while it is no longer relevant to another who finds no new information in it. The situation in CBIR is not the same, however. People are often looking for images they have seen before or that are similar to their example image. The individual level of knowledge on a topic therefore has less subjective impact on the assessment of query results, which in turn allows the utilization of user feedback outside of the confines of individual queries.
2. Approach

GIVBAC has to satisfy several requirements. The main requirement is independence from any specific CBIR system, since the approach is supposed to be used in addition to an already existing information system. In order to provide a stable service, it needs to be independent of features that could become deprecated in a new release of the CBIR system used. Since the composition of the image signatures is often a trade secret of the vendor, the approach has to build its own data structures that keep track of modifications to the data. In addition, the data from user feedback has to be processed inside the client. A direct consequence of the independence requirement is the choice to implement the client as the interface between the information system and the users. Every interaction between user and information system is handled by this interface.
2.1. Grouping the collection

Since automatic clustering using the CBIR system is not feasible (because of the high error margin), the approach depends on human input for the generation of groups. A group is created through a simple query, the results and parameters of which are saved. Each group contains two membership lists, the hotlist and the coldlist. The hotlist contains all images that represent relevant results, while the coldlist contains the mismatches.
To which of these lists an image of the group belongs depends on the value of one parameter v, with 0 ≤ v ≤ 1. Values of v ≤ 0.5 gain the image inclusion in the hotlist, while images with v > 0.5 are considered part of the coldlist. New members of any group start with an initial value of v = 0.5, so that in new groups all images initially belong, technically, to the hotlist. Since this initial value does not modify the results in any way, it does not lead to more false matches than before. The actual splitting into the two lists starts with the second piece of feedback on an image.
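A minimal sketch of how such a group could be represented, assuming the membership parameter v as reconstructed above; the class and attribute names (Group, vote_value) are illustrative and not taken from the paper:

```python
class Group:
    """A thematic group: the saved query plus one membership value per image."""

    def __init__(self, name, query_parameters):
        self.name = name
        self.query_parameters = query_parameters  # example image, feature weights, threshold
        self.vote_value = {}                      # image id -> v in [0, 1], 0.5 = neutral

    def add_image(self, image_id):
        # new members start at the neutral value, i.e. technically on the hotlist
        self.vote_value.setdefault(image_id, 0.5)

    def hotlist(self):
        return [img for img, v in self.vote_value.items() if v <= 0.5]

    def coldlist(self):
        return [img for img, v in self.vote_value.items() if v > 0.5]
```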
2.2. Query Processing

Figure 1 shows the steps necessary for a query under the GIVBAC approach. The steps are carried out either on the server, inside GIVBAC, or depend directly on user actions. Blue boxes indicate steps that are visible to the user and depend on user input, whereas yellow boxes indicate steps that are transparent to the user. A query that makes use of the group concept differs from a normal query only insofar as the group is specified as an additional parameter. This parameter remains in the interface, while the rest of the query is passed on to the CBIR subsystem that performs the query. The results are sent back to the interface, where they are processed in a unit that is described in detail in section 2.3. The modified list of results is displayed so that the user can give his feedback. The data from this feedback is then used to perform updates on the internal weights of the groups (see section 2.4), which are stored in the database management system. Finally, the user can either initiate another iteration of the query or end his search.
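A sketch of this query loop, assuming the Group structure above; cbir_query (the call into the underlying CBIR system) and collect_feedback (the user interaction step) are hypothetical helpers, while modify_results and update_membership are the functions sketched in sections 2.3 and 2.4 below:

```python
def run_query(group, example_image, feature_weights, threshold,
              cbir_query, collect_feedback):
    """One GIVBAC query iteration: query the backend, re-rank, display, learn."""
    # The group parameter stays in the interface; only the CBIR part of the
    # query is passed on to the underlying system.
    raw_results = cbir_query(example_image, feature_weights)  # [(image_id, rating 0..100)]

    # Re-rank the results using the group's membership values (section 2.3).
    modified = modify_results(raw_results, group, threshold)

    # Display the modified list and gather per-image feedback:
    # +1 = relevant for the group, -1 = not relevant.
    feedback = collect_feedback(modified)                     # {image_id: +1 or -1}

    # Feed the votes back into the group weights (section 2.4).
    for image_id, vote in feedback.items():
        update_membership(group, image_id, vote)

    return modified
```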
2.3. Modification of results

A query using groups is processed as follows: the parameters of the query are given to the underlying CBIR system, which performs a "normal" query on the database. The results from this query are then analyzed by a module of the interface. The ratings of images belonging to the group in question are modified according to these rules:

1. Let r be the rating the image received from the CBIR system.
2. Let m = min(r, 100 - r) be the radius of the biggest interval that has r as its midpoint and lies entirely inside [0, 100].
3. Let r' = (r - m) + 2m·v be the new rating of the image, where v is its membership parameter in the group.
Figure 1. Necessary steps for a query using GIVBAC
Step 2 is necessary since GIVBAC uses its own scale on which similarity is measured; this is done to ensure independence from vendors. A rating of 0 means complete congruence with the example, whereas a rating of 100 stands for complete dissimilarity. The variable m denotes the radius of the biggest interval having r as its midpoint and lying entirely inside the interval [0, 100]. Step 3 then computes the new rating for the image by multiplying the size of the interval (2m) by v and adding that value to the left edge of the interval (r - m). This step gives images from the hotlist a better rating, while images from the coldlist are punished. After all of the results have been processed, the threshold is applied to the modified result set. Only the images satisfying the threshold are then presented to the user.
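A minimal sketch of this re-ranking step, following the three rules above (the function and parameter names are illustrative):

```python
def modify_results(results, group, threshold):
    """Re-rank CBIR results for one group and apply the user-defined threshold.

    results: list of (image_id, r) with r in [0, 100], where 0 means full match.
    group:   holds vote_value[image_id] = v in [0, 1], with 0.5 as neutral value.
    """
    modified = []
    for image_id, r in results:
        v = group.vote_value.get(image_id, 0.5)  # images unknown to the group stay unchanged
        m = min(r, 100 - r)                      # radius of the largest interval around r inside [0, 100]
        r_new = (r - m) + 2 * m * v              # v < 0.5 improves the rating, v > 0.5 worsens it
        if r_new <= threshold:                   # only images satisfying the threshold are shown
            modified.append((image_id, r_new))
    return sorted(modified, key=lambda pair: pair[1])
```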
2.4. Processing user feedback

When the results are displayed, the users have the opportunity to give feedback on every image. Feedback is given with regard to the group, i.e. positive feedback indicates that the image is a relevant member of the group that was selected at the start of the query (and vice versa for negative feedback). The effect of feedback is a modification of the value of v. If feedback is given on an image that is not yet a member of the group in question, the image is added to the hotlist of the group with the initial value v = 0.5 (see section 2.1), regardless of the type of feedback it received. If an image is already part of the group, v is updated by one of two functions, one for positive and one for negative feedback.
These two functions effect the necessary modifications of the parameter v: negative feedback causes the parameter to grow, whereas positive feedback lessens it. The additional parameter s that is introduced in these functions allows scaling the impact of one piece of feedback in relation to the size of the expected user base. In a system with named users or a highly specialized environment, the possibility of wantonly false feedback is relatively small, so greater trust can be placed in the users. For a bigger audience, e.g. a search engine on the WWW, the possibility of abuse is much higher, so less trust is put into individual ratings. This makes s effectively a gauge for the trustworthiness of the users: the smaller the value of s, the bigger the impact of an individual rating on the new value of v.

This wraps up the necessarily short explanation of the approach. No assumptions are made about special features that a CBIR system might offer, since the features themselves are not used in the approach, only the rating that is generated by the system. Since the approach uses its own scale for the purpose of ranking images, implementations of the approach need to perform the necessary calculations to transform the scale of the host system into the internal version.
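The concrete update functions are not reproduced in this synopsis; the following sketch shows one possible pair of functions that matches the behaviour described above (v bounded to [0, 1], negative feedback increases v, positive feedback decreases it, and values of s close to 1 give a single vote a large impact). It is an assumption for illustration, not the formulas from [3]:

```python
def update_membership(group, image_id, vote, s=1.01):
    """Apply one piece of feedback to an image's membership value v.

    vote: +1 for positive feedback (relevant for the group), -1 for negative.
    s:    trust parameter; values close to 1 let a single vote move v almost
          all the way, larger values require more votes for the same effect.
    NOTE: the step of size 1/s towards 0 (positive) or 1 (negative) used here
    is only one plausible choice consistent with section 2.4, not the exact
    update functions of GIVBAC.
    """
    v = group.vote_value.get(image_id, 0.5)  # unknown images start at the neutral value
    if vote > 0:
        v = v - v / s                        # positive feedback lessens v (better ranking)
    else:
        v = v + (1.0 - v) / s                # negative feedback lets v grow (worse ranking)
    group.vote_value[image_id] = v
```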
3. Evaluation

In order to evaluate the approach outlined in section 2, a prototype has been implemented and tested against an image database consisting of about 1500 images. Since GIVBAC is designed to be an interface to CBIR systems, an underlying CBIR system was needed. The prototype acts as an interface to an Oracle 8i system running the Virage Search Engine, which is part of the InterMedia cartridge from Oracle. This engine uses four different features, namely global and local color, texture and shape, all of which can be weighted in a query. Results are rated on a scale from 0 to 100, 0 meaning a full match. Users can limit the results with a threshold that omits all images with a rating higher than the threshold.

The evaluation consisted of five performance tests, each one measuring the changes in performance for one particular thematic group of images. The members of these groups were determined before the tests in order to enable the application of recall and precision measures (for a definition of these measures, see [1]). The groups covered the following topics: cards, people, planets, pyramids and rhinoceros. The tests consisted of 25 queries to GIVBAC for each group. Every query used an example from the group chosen at random. The weights for the features were freely selected by the test persons, just as in a real scenario. User feedback was then collected for each of these queries. The queries which generated the groups were used as references against which the changes in performance could be measured; the references for the groups equal the performance the Oracle system would reach given the query parameters. Every fifth query was followed by a query using the reference image and parameters. The actual results at these points were then plotted in a recall-precision diagram alongside the reference query. The user group for the tests consisted of five named users, so s was set to a relatively small value, namely 1.01. Larger values for this parameter would lead to the same results but would need more iterations to get there.

The diagrams in figure 2 show the results of the tests. The results for GIVBAC are significantly better than the reference results, showing the feasibility of the approach. While it was to be expected that each group shows increases in precision, the increases in recall deserve a closer look. They stem from the fact that users gave positive feedback on images that were not part of the initial groups, which led to better ratings for these images. Since an unmodified system would not have retrieved these images, this is an additional advantage of using GIVBAC.
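The plotted values are the standard precision and recall measures from [1]; a small sketch of how they can be computed per query, where relevant is the pre-determined set of images belonging to the group and retrieved is the result set returned for a query (names are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved images that are relevant;
    recall = share of relevant images that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```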
The biggest gain can be seen in the group of images depicting rhinos. The size of the result set dropped by nearly 50% in the course of only 25 queries, visible in the big increase in precision. Yet while the overall size of the result set was still falling, recall could be expanded to 90% of the relevant images in the group. Even the group depicting cards benefits from GIVBAC, although it is well suited for general CBIR since nearly all of these images have the same structure. The gains in precision are not as high, but after 25 queries the system shows the first 80% of the relevant images of the group without a false match. The results of the performance tests show the feasibility of the approach. The improvement covers precision as well as recall, so that the result sets are at least as good as in an unmodified system. The tests also show that GIVBAC works even under difficult preconditions, although this may necessitate a bigger number of iterations until satisfactory results are reached.
4. Conclusion

This paper described the outline of an approach that aims at improving the retrieval performance of CBIR systems. This is done by adding structures that model the thematic content of the database and by using user feedback to modify the membership of images in the groups of that structure. In this way feedback is not only used to get better results for one user but to improve results for all following users. The approach is used as an interface to already existing CBIR systems and their databases and is independent of any specific CBIR system. That allows for easy adaptation to changes in the underlying system or an exchange for a new one. The approach uses thematic groups that can be generated by the owners or users of the system, and feedback is given on the relevance of results with regard to one of these groups. An additional feature of the approach is its scalability with regard to the expected size of the user base. Evaluation results are given and show the possible gains: precision increased by between 18% and 48%, and recall by 10% to 20%.

Further work will include the integration of a client using this approach into a digital library developed at our department, so that it may be used by a wider audience. This will allow the collection of long-term data on the performance of the client as well as user feedback on the client itself. While the approach is best suited for the image parts of digital libraries, in which some degree of control over the use of the system can be exerted, its application in search engines on the WWW is possible thanks to the scaling factor. Usage in such an environment would necessitate additional efforts in the area of rights management (such as: Who is allowed to create new groups? Should feedback of named users count for more than that of anonymous users?), which is outside the scope of this work.
5. Acknowledgements

This paper is a brief synopsis of the master's thesis of the first author [3]. Thanks go especially to Dr. Jens E. Wolff at the University of Bonn for helpful discussions and advice on the thesis.
References

[1] Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern Information Retrieval, p. 75-80. Addison-Wesley, 1999.
[2] M. Borowski, L. Bröcker, S. Heisterkamp, J. Löffler. Structuring the Visual Contents of Digital Libraries Using CBIR Systems. In 2000 IEEE Conference on Information Visualization, p. 288-293. IEEE, 2000.
[3] Lars Bröcker. Design und Implementierung eines Verfahrens zur Verwendung von Benutzerbewertungen zur Verbesserung des Anfrageverhaltens eines CBIR-Systems. Master's thesis, Rheinische Friedrich-Wilhelms-Universität Bonn, 2001.
[4] A. Del Bimbo. Visual Information Retrieval. Morgan Kaufmann Publishers, 1999.
[5] John Eakins, Margaret Graham. Content-based Image Retrieval. University of Northumbria at Newcastle, 1999.
[6] David Squire, Wolfgang Müller, Henning Müller. Relevance Feedback and Term Weighting Schemes for Content-Based Image Retrieval. In Visual Information and Information Systems, LNCS 1614, p. 549-556. Springer-Verlag, 1999.
[7] Remco C. Veltkamp, Mirela Tanase. Content-Based Image Retrieval Systems: A Survey. Technical Report UU-CS-2000-34, Department of Computer Science, Utrecht University, 2000.
[8] M. E. J. Wood, N. W. Campbell, B. T. Thomas. Iterative Refinement by Relevance Feedback in Content-Based Digital Image Retrieval. In Proceedings of the 6th ACM International Conference on Multimedia, p. 13-20. ACM Press, 1998.
Figure 2. The results of the evaluation, shown in a recall-precision diagram for each group of images