Adaptive Information Filtering: Detecting Changes in Text Streams

Carsten Lanquillon

Ingrid Renz

DaimlerChrysler Research and Technology, D-89013 Ulm, Germany, Phone: +49 731 505 2809

DaimlerChrysler Research and Technology, D-89013 Ulm, Germany, Phone: +49 731 505 2351

[email protected]

[email protected]

ABSTRACT


The task of information filtering is to classify documents from a stream as either relevant or non-relevant according to a particular user interest, with the objective of reducing information load. When using an information filter in an environment that is changing with time, methods for adapting the filter should be considered in order to retain classification accuracy. We favor a methodology that attempts to detect changes and adapts the information filter only if inevitable, in order to minimize the amount of user feedback needed for providing new training data. Yet, detecting changes may require costly user feedback as well. This paper describes two methods for detecting changes without user feedback. The first method is based on evaluating an expected error rate, while the second one observes the fraction of classification decisions made with a confidence below a given threshold. Further, a heuristic for automatically determining this threshold is suggested, and the performance of this approach is experimentally explored as a function of the threshold parameter. Some empirical results show that both methods work well in a simulated change scenario with real-world data.


Keywords: Information filtering, text classification, statistical quality control, change detection.


1 INTRODUCTION

The primary task of information filtering is to reduce a user's information load with respect to his or her interest. The filter is supposed to remove all non-relevant documents from an incoming stream, such that only relevant documents are presented to the user. Thus, information filtering can be described as a binary classification problem. In this paper, we only consider text documents and regard information filtering as a specific instance of text classification.

Classification problems can be solved by applying supervised learning techniques which learn from a set of given examples and can then be used to determine the class of new, unseen observations. A comparison of some common learning algorithms for text classification is given in [3]. The authors emphasize that modeling classifiers in static domains is generally sufficiently well controlled. Here, the focus is on coping with dynamically changing domains.

The application of supervised learning algorithms to classification problems is based on the essential assumption that the distributions of the training data and the new data are somewhat similar. Even if this assumption may hold at first, it may become invalid in a long-term application. It should rather be expected that the content of the incoming texts changes as time proceeds. In this case, it is inevitable to adapt the applied information filter to the new situation in order to retain classification accuracy. We favor a methodology that attempts to detect changes and adapts the information filter only if inevitable, in order to minimize the amount of required user feedback. Alternatively, an information filter could be relearned at regular intervals no matter whether changes really occur. This, however, requires the user to regularly provide new training texts and thus to read many non-relevant texts. This is prohibitive since the task of information filtering is to reduce information load.

The following section lists some related work and states the differences to our approach. Section 3 describes the types of changes we are looking at and in which way they affect classification performance. Two methods for detecting changes are described in Section 4. Section 5 briefly discusses two adaptation strategies. Empirical results for both methods are presented in Section 6. Finally, Section 7 summarizes the results and presents some open questions.

2 RELATED WORK

Concerning the detection of changes, the objective of this paper is similar to the task of topic detection and tracking (TDT) [2]. Yet, TDT is defined as an unsupervised learning problem. Hence, there are no relevance classes with respect to a particular user interest. Furthermore, an important issue in this paper is to minimize the amount of extra work to be done by the user. Therefore, the primary interest is not to detect the occurrence of changes as early as possible but to detect changes when they have a significant effect on the performance of the classifier. Once new topics have been detected, the tracking of new topics in TDT and adapting a classifier have further similarities.



Klinkenberg did some interesting research on adaptive information filtering [4]. However, while Klinkenberg focuses on changes in the user interest, we examine changes in the content of a text stream. The effect on the performance of an information filter is very similar, though. Klinkenberg also tries to cope with dynamic aspects by detecting changes first. He monitors indicators which are based on classification results and generally require the true class labels of the new texts in order to be evaluated. Although his adaptive approaches achieve promising results, in our setting they are intractable because providing true class labels for all new texts which have been kept from the user is prohibitive.

Allan's work on incremental relevance feedback for information filtering [1] also considers aspects of changing user interests rather than changes in the text stream. Allan uses relevance judgments on texts presented to the user in order to generate a better query or maintain its quality over time. The application of relevance feedback is also crucial when changes occur in a text stream. Yet, this approach cannot be used to prevent relevant texts from being kept from the user, which will be a major issue in this paper.

3 DYNAMIC ASPECTS

For the sake of simplicity, we assume that each text can be uniquely associated with exactly one topic. Further, each topic must be either relevant or non-relevant with respect to a particular user interest. Looking at a stream of incoming texts, we consider the following types of changes:

- new topics arise
- existing topics disappear
- existing topics change (i.e. their content).

[Figure 1: Indicators derived from the filtering process. Natural text is preprocessed into a feature vector and then classified into a class label; text properties and classification properties serve as indicators.]

Even if the performance of a classifier may be assessable in a static domain, its estimation may become obsolete in dynamically changing domains. Thus, it is inevitable to continuously observe the performance of a classifier. However, evaluating classification results in order to detect changes commonly requires knowledge of the true class labels. Providing these constitutes a lot of extra cost which conflicts with the objective of information filtering, because texts that are kept from the user must then be read nonetheless.

Lewis did some seminal work on autonomous text classification systems [6]. He suggests monitoring the performance in an ongoing fashion and adapting to changes if necessary. He also describes the problem of evaluating performance (effectiveness) measures and proposes an approach that is designed for classifiers that respond to new documents with probabilities of class membership. He estimates the system's performance based on these probabilities and the classifier's decisions about class membership and thus does not require the true class labels to be specified. One of our change detection methods is based on this idea.

4 DETECTING CHANGES

This section explores two methods for detecting changes without user feedback. However, if a change is detected and it is advisable to adapt the classifier, extra work for providing new training examples may still be required. First, we briefly motivate the usage of quality control charts for detecting changes as deviations from expected behavior. This requires indicators to be derived from the underlying process. Hence, possible indicators for text classification systems are described. Further, we discuss the problem of evaluating common performance indicators and introduce two alternative ways of assessing them.

4.1 Statistical Quality Control

For detecting changes, we assume that the text stream is divided into batches with respect to its chronological order. Even if information filtering is regarded as an on-line process where each text is classified as it arrives, constructing batches is quite natural since, for example, all texts arriving in the course of a day or a week can be grouped together. The value of any indicator that could be used to detect changes is then calculated separately for each batch. Using batches offers the advantage of detecting changes by observing the deviation of the indicator value of the current batch from the mean derived from indicator values of past batches. Averaging also reduces stochastic perturbations, so that less attention is given to outliers.

A changing topic can be interpreted as the superposition of two similar topics, where one of these topics disappears while the other arises. Furthermore, a change due to a disappearing topic does not have any serious effect on the performance unless this topic is similar to an existing topic but belongs to the other relevance class, or unless too many obsolete topics make modeling the information filter difficult. Therefore, in the following we will focus on changes due to new topics only.

We will use a Shewhart control chart for detecting changes. The idea is to test whether or not a single observation of an indicator $v$ is within acceptable variation. More formally, we assume that a change has occurred at time $t$ if $v_t > \bar{v} + 3 s_v$, according to the three-sigma control limit of the Shewhart chart, given $\bar{v}$ as an estimate of the indicator mean and $s_v$ as an estimate of the corresponding standard deviation. See [8] for an introduction to statistical quality control.
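For illustration, here is a minimal Python sketch of this three-sigma check on batch-wise indicator values; the function name and the toy numbers are ours and not part of the paper:

```python
import statistics

def change_suspected(history, current, k=3.0):
    """Shewhart-style check: flag a change when the current batch indicator
    exceeds the mean of past batch indicators by more than k standard deviations."""
    if len(history) < 2:
        return False  # not enough past batches to estimate the variation
    mean = statistics.mean(history)
    std = statistics.stdev(history)
    return current > mean + k * std

# Indicator values of past batches vs. the current batch (illustrative values).
past = [0.05, 0.06, 0.04, 0.05, 0.06]
print(change_suspected(past, 0.05))  # False: within acceptable variation
print(change_suspected(past, 0.12))  # True: deviates by more than three sigma
```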

Assuming forced recognition, a text of a new topic can either be classified correctly or be misclassified. If it is already classified correctly, the new topic is hard to notice at all; but with regard to classification performance, noticing it is not even necessary. A non-relevant text that is classified as relevant is rather easy to detect if we assume feedback for all texts presented to the user. However, it is difficult to realize that a relevant text is erroneously kept from the user. This problem is crucial to the application of information filters: how can we guarantee that relevant texts are not permanently kept from the user?

4.2 Indicators for Text Classification

We consider two categories of indicators that can be derived during the text classification process, as depicted in Figure 1:


- Text properties: The indicator characterizes a current subset (batch) of the text stream, e.g. the class distribution or frequencies of words.
- Classification properties: The indicator is based on final or intermediate classification results, e.g. performance measures such as the error rate.

Indicators derived from text properties basically concern the preprocessing of texts. Natural texts are transformed into feature vectors which can then be handled by common classifiers. By observing text properties, there is a chance of detecting changes even before actually classifying any new texts, and the user can be warned that the upcoming decisions may be uncertain. Note that in some cases these indicators will be very important. Due to changes, the feature vector of a new text could contain very few or, by contrast, too many of the features which are selected for text representation. In this case, new texts will be difficult to handle by the classifier. Thus, indicators based on text properties may be necessary for detecting certain changes, and we will explore them in future research. In the following, however, we focus on indicators which are derived from classification properties.

Recall that information filtering has been defined as a binary text classification problem where the decision is whether or not a text is relevant for a certain user. Assume that a set of N texts has been classified by a text classification system and the true class labels are given by an expert. The relationship between the classification decisions and the true class labels can be summarized in a contingency table as shown in Figure 2. For example, entry a is the number of texts that the information filter presented to the user and that are in fact relevant. As mentioned in Section 3, the values for entries a and b may be known since the corresponding texts are presented to the user. However, the values for entries c and d are generally unknown.

[Figure 2: Contingency table for N = a + b + c + d decisions, with a: presented and relevant, b: presented and non-relevant, c: kept back and relevant, d: kept back and non-relevant.]

Common performance measures in information retrieval are recall and precision. In terms of the contingency table, they are defined as:

recall = a / (a + c)
precision = a / (a + b)

In words, precision is an estimate of the probability that a text presented as relevant to the user is indeed relevant. Assuming feedback for all texts presented to the user (entries a and b) is available, it can always be evaluated. Even if precision can be used to detect changes under certain circumstances, it is likely to fail in detecting changes in case texts of relevant topics are classified as non-relevant, because precision does not take into account texts kept from the user as non-relevant (entries c and d). Since the detection of this type of change was found to be of major importance, we will not use precision other than for presenting empirical results. Likewise, recall is an estimate of the probability that the filter lets relevant texts through to the user. Hence, it could be used to detect an increasing number of relevant texts kept back from the user. However, it can only be calculated when the true class labels of all texts are available. Another performance measure that is commonly used for classification problems is the error rate. Its calculation also requires the true class labels of all texts:

error rate = (b + c) / (a + b + c + d)

See [6] for a brief overview of further performance measures.

4.3 Assessing Performance Indicators

As demonstrated above, evaluating performance indicators is commonly not feasible since the true class labels are usually not available. In the following, two methods that try to overcome this evaluation problem are suggested.

4.3.1 Estimation of Expected Performance

In order to estimate the expected performance of a classifier on new data, Lewis [6] models the unknown expert judgment (the true class labels) by a Bernoulli (0/1) random variable $Z_i$ with parameter $p_i$ for each classified text $t_i$, giving the probability that $Z_i$ will take on the value 1. The event $Z_i = 1$ occurs if the current text is actually relevant, and $Z_i = 0$ otherwise. The value of each $p_i$ is assumed to be the classifier's estimate of the probability of class membership. Further, it is assumed that each document is judged independently of all others. Lewis provides a general expression for an expected performance measure as the sum of the performance values that would be obtained for each of the $2^N$ possible combinations of the expert judgments for all texts of the current batch, weighted by the probability of each judgment as determined by the classifier. For large N, it is not feasible to directly evaluate this expression. For the expected error rate, however, a simpler expression is given by
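As a small worked illustration of these contingency-table measures, the following Python sketch computes them for made-up counts a, b, c, d; the numbers are purely illustrative:

```python
def recall(a, c):
    return a / (a + c)

def precision(a, b):
    return a / (a + b)

def error_rate(a, b, c, d):
    return (b + c) / (a + b + c + d)

# Hypothetical batch of N = 100 decisions:
# a = relevant texts shown, b = non-relevant texts shown,
# c = relevant texts kept back, d = non-relevant texts kept back.
a, b, c, d = 40, 10, 5, 45
print(recall(a, c))            # 0.888...
print(precision(a, b))         # 0.8
print(error_rate(a, b, c, d))  # 0.15
```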


$$v_{\text{expected}} = \frac{1}{N} \sum_{i=1}^{N} \left[ (1 - 2 d_i)\, p_i + d_i \right] \qquad (1)$$

where N is the current batch size and $d_i$ is the classifier's decision for text $t_i$, with $d_i = 1$ if $t_i$ is classified as relevant, and 0 otherwise. This approach requires the classifier to respond to new texts with probabilities of class membership. Generally, however, classifiers may respond with confidence scores for each class that need not be estimates of this probability. Hence, the obtained expression need not estimate the expected performance indicator as intended by Lewis when applied to classifiers that do not respond with probabilities of class membership. Yet, this expression still represents an indicator for the average confidence of all decisions per batch and can thus be used to detect changes without any user feedback.
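A minimal sketch of how the indicator in Equation (1) might be computed, assuming the classifier supplies a probability of relevance p_i and a binary decision d_i for each text; the function name and example values are illustrative only:

```python
def expected_error_rate(probs, decisions):
    """Expected error rate per Equation (1): probs[i] is the classifier's
    estimated probability that text i is relevant, decisions[i] is 1 if
    text i is classified as relevant and 0 otherwise."""
    n = len(probs)
    return sum((1 - 2 * d) * p + d for p, d in zip(probs, decisions)) / n

# Confident decisions yield a low expected error ...
print(expected_error_rate([0.95, 0.05, 0.9], [1, 0, 1]))   # ~0.067
# ... while uncertain decisions push the indicator up.
print(expected_error_rate([0.55, 0.45, 0.6], [1, 0, 1]))   # ~0.433
```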

4.3.2 Virtual Rejects

Instead of evaluating or estimating any common performance measure, we will now derive an indicator from intermediate classification results that does not require the true class labels to be specified either. The method is based on the approach described in [5].

We assume that a confidence score for each class can be obtained from the classifier. A change which would require adaptation is suspected when the current classification confidences drop significantly below the previously observed confidences.


We will observe the winning confidence scores (i.e. the confidence for the assigned class) for all texts in the current batch that are classified as non-relevant. We define a virtual reject class V which consists of all texts classified as non-relevant with a confidence score below a threshold $\theta$. Let N be the current batch size and $\text{reject}(t_i) = 1$ if text $t_i \in V$, and 0 otherwise. The reject indicator is evaluated as

$$v_{\text{reject}} = \frac{1}{N} \sum_{i=1}^{N} \text{reject}(t_i) \qquad (2)$$

When $v_{\text{reject}}$ increases significantly as determined by the Shewhart control chart, we conclude that the classification confidences have decreased and a change might have occurred.
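A possible implementation of the reject indicator in Equation (2), assuming each text in the current batch comes with its classification decision and winning confidence score; the data layout and threshold value are assumptions made for illustration:

```python
def reject_indicator(batch, theta):
    """Fraction of texts in the batch (Equation 2) that were classified as
    non-relevant with a winning confidence score below the threshold theta.
    batch: list of (is_relevant_decision, winning_confidence) pairs."""
    n = len(batch)
    rejected = sum(1 for relevant, conf in batch if not relevant and conf < theta)
    return rejected / n

# Hypothetical batch: (decision, winning confidence) for each incoming text.
batch = [(True, 0.9), (False, 0.8), (False, 0.12), (False, 0.15), (True, 0.6)]
print(reject_indicator(batch, theta=0.17))  # 0.4: two virtual rejects out of five
```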

The key problem of this approach is to define the confidence threshold $\theta$. The objective is to find a value that best separates the distributions of confidence scores for each relevance class independent of the actual classification. This corresponds to finding the point of intersection of the density functions of two overlapping distributions. For approximating the value of the threshold $\theta$ we proceed as follows. We perform k-fold cross-validation on the training texts to get unbiased confidence scores. The scores obtained for the non-relevant class are split into two sets $S_{rel}$ and $S_{non}$ of size $N_{rel}$ and $N_{non}$ for texts in fact being relevant and non-relevant, respectively. Let $n_{non}(\theta)$ denote the number of scores in $S_{non}$ below $\theta$ and $n_{rel}(\theta)$ the number of scores in $S_{rel}$ above $\theta$, both as functions of $\theta$. We then determine $\theta$ such that

$$\frac{n_{rel}(\theta)}{N_{rel}} \approx \frac{n_{non}(\theta)}{N_{non}} \qquad (3)$$
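The following sketch shows one way this heuristic could be realized, assuming the cross-validated confidence scores for the non-relevant class have already been split by true class; it simply scans candidate thresholds and picks the one that best balances the two fractions in Equation (3). All names and values are illustrative assumptions:

```python
def choose_threshold(scores_rel, scores_non, candidates):
    """Pick theta so that the fraction of truly relevant texts scoring above
    theta roughly equals the fraction of truly non-relevant texts scoring
    below theta (Equation 3). Scores are cross-validated confidences for the
    non-relevant class, split by true relevance class."""
    def imbalance(theta):
        frac_rel = sum(1 for s in scores_rel if s > theta) / len(scores_rel)
        frac_non = sum(1 for s in scores_non if s < theta) / len(scores_non)
        return abs(frac_rel - frac_non)
    return min(candidates, key=imbalance)

# Hypothetical cross-validated scores for the two true classes.
scores_rel = [0.05, 0.10, 0.12, 0.20, 0.25]   # truly relevant texts
scores_non = [0.15, 0.30, 0.40, 0.55, 0.70]   # truly non-relevant texts
candidates = [i / 100 for i in range(5, 40)]
print(choose_threshold(scores_rel, scores_non, candidates))  # 0.2 for these toy scores
```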

Note that the reject class is only virtual, i.e. the classifier is still forced to decide whether or not a text is relevant, for two reasons. First, a threshold that is good for detecting changes need not be optimal for deciding about the class labels, since it is not the winning confidence score that is of interest but the distributions of confidence scores considered independently for each relevance class. Second, this makes it easier to compare the performance of this information filter to other approaches.

5 ADAPTING TO CHANGES

An information filter must be adapted to changes in order to retain classification accuracy. Basically, there are two ways of adapting a classifier. On the one hand, the classifier can be completely relearned from scratch based only on a currently representative training set. On the other hand, an existing classifier can be updated based on some current examples. The difficulty of the first approach is to provide a truly representative training set for the current situation, see [9] for example. A simple approach that often works well in practice (basically depending on the batch size) is to assume that the examples of the most recent batch are representative of the current situation. Updating a classifier based on some new examples brings up the question of how to combine the knowledge inherent to the existing classifier with new examples in order to yield a better classifier. This is commonly done by incremental learning algorithms. Note, though, that usually the design goal in incremental learning is to produce a model that does not depend on the sequence in which the training examples are presented (e.g., see [11]). For our problem, however, this may not be the case. The possibilities of updating a classifier crucially depend on the selected learning algorithm, including the preprocessing of texts.

In this paper, the main focus is on exploring the change detection methods. Therefore, we are satisfied with having detected a change and skip the adaptation phase.
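To tie detection and adaptation together, here is a rough sketch of a batch-wise filtering loop that relearns from the most recent batch only when the three-sigma limit is exceeded; train_classifier, indicator and label_batch are hypothetical interfaces assumed to be supplied by the surrounding system, not part of the paper:

```python
import statistics

def filter_stream(batches, train_classifier, indicator, label_batch, k=3.0):
    """Process a stream of text batches, adapting only when needed.
    train_classifier(labelled_texts) -> predicate, indicator(clf, batch) -> float,
    and label_batch(batch) -> labelled texts are assumed interfaces."""
    clf = train_classifier(label_batch(batches[0]))  # first batch = initial training set
    history = []                                     # indicator values of past batches
    for batch in batches[1:]:
        v = indicator(clf, batch)                    # e.g. expected error or reject rate
        if len(history) >= 2 and v > statistics.mean(history) + k * statistics.stdev(history):
            # Change suspected: obtain feedback for this batch and relearn from scratch.
            clf = train_classifier(label_batch(batch))
            history = []                             # restart the control chart
        else:
            history.append(v)
        yield [text for text in batch if clf(text)]  # pass only texts judged relevant
```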

6 EMPIRICAL RESULTS

The experiments made for the evaluation of our methods are based on a subset of the data set of the Text REtrieval Conference (TREC). This data set consists of English business news texts from different sources which are assigned to one or more categories (topics). Table 1 shows the 10 selected topics with the number of texts assigned to them. Categories 001 and 003 are defined to be relevant while the remaining 8 categories represent non-relevant topics.

For the experiments, two change scenarios are simulated. The texts are randomly split into 21 batches. Each batch contains 332 texts, with some of the texts being discarded. The first batch serves as the initial training set while the remaining 20 batches represent the temporal development. The texts of each batch are classified as if they were new texts of an incoming stream. Based on these texts, an indicator may be evaluated for the purpose of detecting changes. Finally, these texts may be used for adapting the information filter.

In the first scenario (Scenario A), an abrupt change (shift) is simulated between the two relevant categories 003 (Joint Ventures) and 001 (Antitrust Cases Pending). Concerning relevant texts, batches 0 to 13 contain only texts of topic 003. In batch 14, there are texts of both relevant topics, and finally from batch 15 on, there are only texts of topic 001. Each batch contains the same composition of texts of the non-relevant topics. Thus, there are 58 relevant and 274 non-relevant texts of different topics in each batch.

In the second scenario (Scenario B), a new relevant topic slowly arises (drift) while another relevant topic is always present. In each batch, there are 40 relevant texts of topic 003. Starting from batch 12, a gradually increasing number of texts of the relevant topic 001 is added, finally yielding a total of 98 relevant texts per batch. In order to keep the batch size constant, some texts of the non-relevant topic 100 (U.S. Protectionist Measures) are omitted. Thus the number of non-relevant texts per batch gradually drops from 292 to 234.

For the task of classification, a simple similarity-based classifier is applied which is a variant of Rocchio's method for relevance feedback [10] and is described as the Find Similar method in [3].

Topic  Name of the Topic                       Texts
001    Antitrust Cases Pending                   400
003    Joint Ventures                            842
004    Debt Rescheduling                         355
005    Dumping Charges                           483
006    Third World Debt Relief                   528
054    Satellite Launch Contracts                343
100    U.S. Protectionist Measures               808
128    Privatization of State Assets             635
142    Impact of Gov. Regulated Farming ...     2422
180    Ineffectiveness of U.S. Embargoes ...     210
       Total                                    7026

Table 1: Selected topics of the TREC data set.


[Figure 3: No adaptation to changes (Scenario A).]
[Figure 4: No adaptation to changes (Scenario B).]
[Figure 5: Relearning after each batch (Scenario A).]
[Figure 6: Relearning after each batch (Scenario B).]

The classifier models each of the two relevance classes with exactly one prototype. Each prototype is the average (or centroid) of all vector representations of texts of the corresponding class. For representing texts as vectors, some stop words (like "and", "or", etc.) and words that occur fewer than five times in the training set are removed. From the remaining set of words, a maximum of 1000 words per relevance class is selected according to the mutual information measure also described in [3].
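A rough sketch of such a centroid-based classifier, in the spirit of the Find Similar variant described above; the feature weighting and preprocessing are simplified here, and all function names and the toy vectors are our own illustrative assumptions:

```python
import math
from collections import Counter

def centroid(vectors):
    """Average (centroid) of term-weight vectors; one prototype per relevance class."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {term: weight / len(vectors) for term, weight in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def train(relevant_vectors, non_relevant_vectors):
    """Build one prototype for each relevance class."""
    return centroid(relevant_vectors), centroid(non_relevant_vectors)

def classify(text_vector, prototypes):
    """Return (is_relevant, winning confidence); the confidence feeds the indicators above."""
    proto_rel, proto_non = prototypes
    sim_rel, sim_non = cosine(text_vector, proto_rel), cosine(text_vector, proto_non)
    return sim_rel >= sim_non, max(sim_rel, sim_non)

# Toy usage with tiny bag-of-words vectors (real texts would be preprocessed first).
prototypes = train([{"merger": 2, "venture": 1}], [{"debt": 3, "relief": 1}])
print(classify({"venture": 1, "deal": 1}, prototypes))
```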

In the following, results of four approaches averaged over 10 trials with different random seeds are presented for each change scenario. The first approach does not adapt to changes while the other three approaches are adaptive. One of them is not feasible in practice because it requires complete user feedback and therefore only serves as a benchmark. The remaining two approaches use the change detection methods described in Section 4.

Figures 3 and 4 show the performance of the approach that does not adapt to changes. The information filter is learned based on the training texts of the first batch and is then left unchanged. Initially, the performance in both scenarios is quite good. After the change has occurred in each scenario, however, the performance significantly decreases and is no longer acceptable. The performance for Scenario B also shows that precision cannot detect this kind of change.

As mentioned in the introduction, an alternative yet infeasible approach for coping with changes is to relearn the information filter at regular intervals no matter if there are any changes. Figures 5 and 6 show the performance of this benchmark approach. In both scenarios, the former level of performance can be regained after the change has occurred. Hence, the assumption that the texts of one batch are truly representative for learning the information filter holds for our experiments. While there is no attempt to detect changes, the required user feedback for providing new training data is prohibitive due to adapting after each batch.

The first feasible approach monitors the expected error rate. Figures 7 and 8 show that the performance is about as good as that of the benchmark approach. In Scenario A the change is detected in each trial, yet there are three false alarms altogether. Since the change is more gradual in Scenario B, several adaptations throughout the change are acceptable. Though the changes are reliably detected in all trials of both scenarios, the curves labeled expected error show that the changes only have a weak impact on the values of the indicator.

The other feasible approach monitors the indicator derived from the virtual rejects. The results are shown in Figures 9 and 10. Again, the performance can keep up with that of the benchmark approach. Except for one false alarm in Scenario A, the information filter is adapted once per trial, as indicated by the peaks of the curve labeled reject. Note that the change in the indicator values due to the change in the text stream is larger than that of the expected error indicator.


[Figure 7: Monitoring expected error (Scenario A).]
[Figure 8: Monitoring expected error (Scenario B).]
[Figure 9: Monitoring virtual rejects (Scenario A).]
[Figure 10: Monitoring virtual rejects (Scenario B).]

Tables 2 and 3 summarize the results for both scenarios, showing average values of recall, precision and error rate over all batches as well as the average number of adaptations performed in each trial. Both adaptive approaches reliably detect the change and can thus regain the initial level of performance. They differ only slightly with respect to the performance measures.

The following experiment shows the performance of the approach monitoring the virtual rejects as a function of the threshold parameter $\theta$. From the previous change Scenario A, three batches before the change and three batches after the change are taken. However, this time the indicator is only observed and no adaptation is performed in case a change is suspected. In order to assess the performance of the change detection method, we compare the average of the reject indicator $v_{\text{reject}}$ before and after the change. The difference is expressed in multiples of the standard deviation (sigma) estimated for the reject indicator from all reject indicator values observed before the change has occurred. The larger the difference, the better the ability to detect changes. Note, for example, that a change requires a difference of at least three sigma in order to be detected (see Section 4).

Figure 11 shows these differences as a function of the threshold parameter $\theta$ as well as the three-sigma control limit. First, the differences increase with increasing threshold values $\theta$ since more misclassified texts fall into the region below the given threshold. Towards the end, mostly correctly classified texts are captured by further increased values of $\theta$. The difference is sufficiently large for being detected, i.e. it is larger than three times the standard deviation, approximately between $\theta = 0.12$ and $\theta = 0.21$. For this setting, $\theta$ should have been selected within these bounds in order to lead to a successful change detection. In all trials, our heuristic provided values for $\theta$ between 0.165 and 0.185. This fits exactly into the acceptable range and can thus lead to a successful change detection.

[Figure 11: Increase of $v_{\text{reject}}$ in multiples of sigma as a function of the threshold $\theta$.]

Approach          Recall    Precision  Error    Adaptations
No Adaptation     70.62%    61.09%     11.37%   0.0
Relearn           94.28%    78.07%      5.96%   20.0
Expected Error    90.14%    72.91%      7.49%   1.3
Virtual Rejects   91.34%    75.21%      6.81%   1.1

Table 2: Average results over all batches for Scenario A.

Approach          Recall    Precision  Error    Adaptations
No Adaptation     77.32%    68.95%     11.35%   0.0
Relearn           94.08%    72.91%      6.74%   20.0
Expected Error    89.52%    70.84%      8.19%   1.4
Virtual Rejects   87.47%    71.57%      8.25%   1.0

Table 3: Average results over all batches for Scenario B.

Note, however, that all differences due to changes barely exceed the three-sigma control limit even though the simulated change is abrupt and rather distinct.

7 CONCLUSION

The experiments carried out here and the results in [5] show that detecting changes is possible without user feedback. The proposed performance indicators reliably detected the simulated changes, though there were a few false alarms. The amount of extra work required to be done by the user in order to retain the desired classification accuracy can thus be significantly decreased.

Our suggested heuristic for determining the threshold parameter $\theta$ required for evaluating the reject indicator is based on the assumption that the similarity of existing relevant texts to the representation of the non-relevant class is prototypical for any topic (i.e. also novel topics) that is known not to be non-relevant. Obviously, this need not be true in general since any new topic may be closer to or further in feature space from the known non-relevant texts than the existing relevant topics. For this specific setting, however, first experiments show that the heuristic yields good values with regard to the change detection ability. Yet, more experiments in different settings must be carried out in order to further justify the suggested approaches.

Note that the change scenario examined here exhibits only a rather drastic change in the domain. In real-world applications, however, changes usually are much slower and less radical and therefore more difficult to detect. It is a well-known weakness of the Shewhart control chart that small changes may not be detected. Therefore, CUSUM control charts could be used in addition (see [8]). This technique accumulates deviations from an expected value and can thus detect small changes which occur successively, while the Shewhart control chart only tests whether a single observation of an indicator is within acceptable variation. In future research, further evaluations with more realistic change scenarios will follow.

A major problem of our methods is that even quite radical changes need not have a discernible effect on the confidence scores. Further experiments show, for example, that the scores of new texts may be similar to those of texts of existing topics of the non-relevant class even if they belong to previously unseen topics. This problem especially occurs when the non-relevant class consists of many diverse topics. Future research should reveal whether indicators based on text properties can be of any help in this situation. Another idea for coping with this problem is to model each class with several rather than one prototype. Here, text clustering may be useful for determining topics inherent to each relevance class. Furthermore, in future research indicators based on text properties will also be applied for detecting changes. So far, only an adaptation strategy that relearns an information filter from scratch was applied. However, approaches that update existing filters based on some new training data should also be considered because they provide more potential for further reducing the amount of required user feedback. In this context, text clustering may also be used in order to group new texts, helping the user to label these texts. Furthermore, the idea of active learning as applied to text classification in [7] should be considered.

REFERENCES

[1] J. Allan. Incremental relevance feedback for information filtering. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Zurich, Switzerland, 1996.

[2] J. Allan, J. Carbonell, G. Doddington, J. Yamron, and Y. Yang. Topic detection and tracking pilot study final report. In Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, 1998.

[3] S. Dumais, J. Platt, D. Heckerman, and M. Sahami. Inductive learning algorithms and representations for text categorization. In Proceedings of the Seventh International Conference on Information and Knowledge Management, 1998.

[4] R. Klinkenberg and I. Renz. Adaptive information filtering: Learning in the presence of concept drifts. In Learning for Text Categorization, pages 33-40, Menlo Park, California, 1998. AAAI Press.

[5] C. Lanquillon. Information filtering in changing domains. In Proceedings of the IJCAI'99 Workshop on Machine Learning for Information Filtering, Stockholm, Sweden, 1999.

[6] D. D. Lewis. Evaluating and optimizing autonomous text classification systems. In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'95), pages 246-254, July 1995.

[7] D. D. Lewis and W. A. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), pages 3-12, London, 1994. Springer-Verlag.

[8] D. C. Montgomery. Introduction to Statistical Quality Control. Wiley, New York, 3rd edition, 1997.

[9] G. Nakhaeizadeh, C. Taylor, and C. Lanquillon. Evaluating usefulness for dynamic classification. In Proceedings of the Fourth International Conference on Knowledge Discovery & Data Mining, pages 87-93, New York, 1998.

[10] J. J. Rocchio, Jr. Relevance feedback in information retrieval. In G. Salton, editor, The SMART Retrieval System: Experiments in Automatic Document Processing, pages 313-323. Prentice Hall, 1971.

[11] P. E. Utgoff. Incremental learning of decision trees. Machine Learning, 4:161-186, 1989.