Evaluation Measures for TCBR Systems

M.A. Raghunandan¹, Nirmalie Wiratunga², Sutanu Chakraborti³, Stewart Massie², and Deepak Khemani¹

¹ Department of Computer Science and Engineering, Indian Institute of Technology Madras, India
[email protected], [email protected]
² School of Computing, The Robert Gordon University, Scotland, UK
{nw,sm}@comp.rgu.ac.uk
³ Tata Research Development and Design Centre, Pune, India
[email protected]

Abstract. Textual case-based reasoning (TCBR) systems, where the problem and solution are in free-text form, are hard to evaluate. In the absence of class information, domain experts are needed to evaluate solution quality and provide relevance information. This approach is costly and time consuming. We propose three measures that can be used to compare alternative TCBR system configurations in the absence of class information. The main idea is to quantify alignment as the degree to which similar problems have similar solutions. Two local measures capture this information by analysing similarity between problem and solution neighbourhoods at different levels of granularity, whilst a global measure achieves the same by analysing similarity between problem and solution clusters. We determine the suitability of the proposed measures by studying their correlation with classifier accuracy on a health and safety incident reporting task. Strong correlation is observed for all three approaches, with the local measures being slightly superior to the global one.
1 Introduction

Textual case-based reasoning (TCBR) systems often contain problems and solutions in the form of natural language sentences [1]. Unlike classification domains, where a solution (class label) is associated with a group of cases, problem descriptions here map onto unique solutions. Other forms of CBR systems, such as those in configuration and design domains, are also less likely to contain groups of cases with identical solutions. However, with problem decomposition, these tasks are often resolved using classification approaches [2].

Evaluation methods for text classification tasks are well studied, and measures such as accuracy, or information retrieval measures such as precision and recall, are commonly employed [3,4]. These measures evaluate the correspondence between the actual and proposed class labels. With TCBR systems, comparison of actual and proposed textual solution content is hard due to variability in vocabulary usage and the uniqueness of solutions. Measures of relevance to the problem, rather than solution comparisons based on exact matches, are required.
The applications of evaluation measures in TCBR are four-fold. Firstly, they help predict the effectiveness of the system in solving unseen problems. A challenge here is to arrive at competent estimates without relying on human relevance judgements, which are hard to obtain. Secondly, during the design phase of a CBR system, decisions have to be taken regarding the contents of the knowledge containers [5] in relation to case representation, indexing and similarity measures, adaptation, and case authoring. Evaluation measures are useful in guiding these design choices. Thirdly, evaluation measures facilitate maintenance of the case base. Examples of maintenance tasks are the deletion of noisy or redundant cases or features, the addition of representative cases, and the refinement of case representation or similarity measures. Visualization tools, which we briefly explore in this paper, are specifically relevant in this context. Finally, evaluation measures can be used to measure and compare the complexity of different TCBR problem domains, for a given choice of representation.

Empirical evaluation measures in TCBR remain an open area of research. Although a few measures have been proposed, to date no systematic comparison across these measures has been reported. Recent work reported in the literature exploits the basic tenet of CBR that "similar problems have similar solutions". The general approach involves devising measures that quantify the extent to which this assumption holds under different system configurations. Local alignment measures capture the similarity of problems and solutions in the neighbourhood of individual cases. In contrast, global alignment compares case clusters generated separately from the problem and solution spaces, derived from the entire case base.

In this paper we provide a meta-evaluation of TCBR alignment measures by analysing the correlation of these measures with standard evaluation measures such as accuracy. Such a study presupposes that textual cases are also labelled with class information. In our experiments, we use a real-world dataset from the medical health and safety incident reporting domain. Each case consists of both a textual problem and solution, but importantly also has an incident-related class label associated with it. We have created several case bases from this domain with different levels of difficulty. This provides CBR systems exhibiting a range of accuracy values, thereby facilitating the correlation study with local and global textual alignment measures.

Section 2 discusses related work in evaluation and visualization. Section 3 presents the main idea of this paper, namely the relationship between classification accuracy and problem-solution space alignment. Next, Section 4 describes the local alignment measures, followed by the global alignment measure in Section 5. An evaluation of alignment measures on a real-world health and safety reporting task is presented in Section 6, with conclusions in Section 7.
2 Related Work

Estimating data complexity is a fundamental task in machine learning. In supervised classification tasks, which have been the focus of most work reported so far, data complexity helps in explaining the behaviour of different classifiers over diverse datasets. It is also useful in parameter tuning, an example being the selection of feature combinations that minimize the complexity of the classification problem. Several approaches for measuring complexity have been proposed: statistical ones based on overlap between probability
distributions of classes [6], information-theoretic approaches based on estimating the information needed to encode all feature values in a given class with class labels [7], graph-theoretic approaches based on constructing Minimum Spanning Trees connecting data points to their nearest neighbours and observing the number of adjacent points with opposite class labels [8], and approaches based on splitting the feature space into hypercubes (hyperspheres) and measuring the class purity of these hypercubes [9]. Many of these approaches can be extended to complexity evaluation of supervised text classification problems, with allowance for the fact that textual data is characterized by high dimensionality and often leads to sparse representations.

Unfortunately, most real-world TCBR tasks cannot be neatly mapped onto run-of-the-mill classification tasks. Hence, from a practical TCBR perspective, two other relatively less well studied problems turn out to be interesting. The first: can we estimate the complexity of a text collection without resorting to any additional information (such as class labels)? The second: can we estimate the complexity of given text collections, each of which has a problem and a solution component? In the case of the first problem, complexity would measure the clustering tendency within the collection. The work of Vinay et al. [10] falls into this category; they use the Cox-Lewis measure to test whether the data points are well clustered or spread randomly in the feature space. In the context of the second problem, complexity reflects the degree to which we can expect the system to be competent in answering a new problem by retrieving solutions to similar problems in the repository. This involves studying the alignment between the problem and solution components of texts, and our current work belongs to this category.

Local measures of case base alignment have been independently studied by Lamontagne [11] and Massie et al. [12] in the context of very different TCBR applications. Global measures explored by Chakraborti et al. [13] have not been evaluated beyond a supervised setting. Our work attempts to fill this void and proposes a global alignment measure which can be compared against the two local alignment measures on a level playing field.

The absence of readily available human judgments makes evaluation a formidable problem in TCBR, which researchers often tend to sweep under the rug, or alternatively address by making grossly simplifying assumptions (such as the availability of class knowledge) that are defeated in the real world. In addition to proposing and comparing new alignment approaches, our current work adopts a novel evaluation strategy, which is central to our comparative empirical analysis. The study consolidates past and novel work, so that TCBR researchers can choose the appropriate measure based on the nuances of their domain, and also propose further enhancements.
3 Problem and Solution Alignment

In CBR the "similar problems have similar solutions" assumption is often taken for granted, whereas in fact it is a measure not only of the suitability of CBR for the domain but also of the competence of the system design. Essentially, quantifying the alignment between the problem and solution spaces allows us to measure the degree to which the similarity assumption is respected by a CBR system.

The complexity of a domain in CBR is a measure of the difficulty of the problem being faced. Complexity is dependent on the extent to which a natural structure exists
within the domain, over which we have no control. However, complexity also depends on the contents of the knowledge containers in terms of the availability of cases and of the case representation and similarity knowledge chosen, over which we have at least some control. In designing a CBR system, choices are made about the contents of the knowledge containers with the aim of making some aspects of the underlying natural structure apparent. This process results in clusters of cases being formed within the problem space that identify groups of recurring concepts, as shown in the problem space of Figure 1. In Information Retrieval, data complexity has been estimated by measuring the extent to which documents in a dataset form clusters as opposed to a random distribution [10].
Fig. 1. Similarity assumption and alignment
However, CBR is a problem-solving methodology and cases consist of problem-solution pairs. This adds another layer of complexity. There is no guarantee that a strong structure in the problem space will lead to a competent CBR system. For example, consider an accident reporting system in which the problem representation captures and organises problems in relation to "type of industry". The system may be poor at identifying solutions to similar accidents that occur equally across all industry sectors. Here the problem and solution spaces are not aligned to the same concepts.

The solution space also has a natural structure and forms clusters that are determined by the chosen design configuration (as depicted in Figure 1). In unsupervised tasks, the structure or concepts in the solution space emerge from the nature of the domain and the knowledge chosen to represent the solution. A competent CBR system requires a strong relationship between the concepts identified in the problem and solution spaces. It is this problem-solution alignment that we aim to measure in this paper.

We use a relatively large Health and Safety dataset in this paper, where each case is composed of three parts: the problem description, the solution description, and the class label. This allows us to compare concepts identified in the problem and solution spaces with those identified by the domain expert as class labels. By comparing the correlation between alignment measurements and accuracy we are able to provide a
more comprehensive evaluation of local and global techniques for measuring alignment in unsupervised problems.

In the next sections we compare and contrast three measures of case base alignment: Case Alignment, Case Cohesion, and Global Alignment. The first two are local alignment measures, which first calculate the alignment of a case in its neighbourhood and then aggregate the alignment values of all cases to obtain a value for the entire case base. The third is a global measure, which calculates the alignment value for the case base as a whole.
4 Local Alignment Measures for TCBR

Case Alignment [14] measures the degree of alignment between the problem and solution spaces by using similarity between cases, on both the problem and the solution side, in the neighbourhood of a given case. The similarity measure chosen is the same as that used in a normal retrieve operation; in our experiments we used cosine similarity. The number of cases in the neighbourhood is a parameter which can be varied to obtain the best results.

Case Cohesion [11] is less granular in that it measures the overlap between the problem and solution spaces in terms of the number of common neighbours in each space for a given case. Neighbours are ascertained by set similarity thresholds applicable to each of the problem and solution neighbourhoods.

The Global Alignment Measure looks at the regularity in the problem and solution spaces by analysing the problem and solution term-document matrices. It differs from local alignment in that the case base is considered as a whole to calculate the alignment.

4.1 Case Alignment

Case alignment measures how well the neighbourhoods of cases are aligned on the problem and solution sides. A leave-one-out test is used to calculate the alignment of each target case t with reference to its k nearest neighbours, {c1,...,ck}, on the problem side:

\[ Align(t, c_i) = 1 - \frac{D_{soln}(t, c_i) - D_s^{min}}{D_s^{max} - D_s^{min}} \]

The function Dsoln(t,ci) measures the distance between the target case t and the case ci on the solution side, and Dprob(t,ci) does the same on the problem side. The initial neighbours {c1,...,ck} are identified using Dprob(t,ci). Here Ds_min is the minimum distance from t to any case on the solution side, and Ds_max is the maximum. Align(t,c2), Align(t,c3),...,Align(t,ck) are calculated in the same way.

The case alignment for the case t in its neighbourhood is a weighted average of the individual alignments with each of the cases:

\[ CaseAlign(t) = \frac{\sum_{i=1}^{k} \bigl(1 - D_{prob}(t, c_i)\bigr) \cdot Align(t, c_i)}{\sum_{i=1}^{k} \bigl(1 - D_{prob}(t, c_i)\bigr)} \]

An alignment value closer to 1 indicates that the problem and solution spaces are well aligned around the case, whereas a value closer to 0 indicates poor alignment.
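The computation is straightforward to prototype. Below is a minimal Python sketch of Align and CaseAlign, including the whole-case-base average (CaseBaseAlign, defined after Figure 2). It assumes each case is a (problem_vector, solution_vector) pair of NumPy arrays; the function and variable names are ours, not the paper's, and edge cases such as zero vectors are not handled.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def case_alignment(target, cases, k=3):
    """CaseAlign(t): weighted average alignment of a target case within
    its k problem-side nearest neighbours."""
    d_prob = np.array([cosine_distance(target[0], c[0]) for c in cases])
    d_soln = np.array([cosine_distance(target[1], c[1]) for c in cases])
    nn = np.argsort(d_prob)[:k]              # problem-side neighbours
    # Align(t, ci): solution-side distance normalised by the nearest and
    # farthest solution-side distances from t (Ds_min, Ds_max).
    ds_min, ds_max = d_soln.min(), d_soln.max()
    if ds_max == ds_min:                     # degenerate: all equally distant
        return 1.0
    align = 1.0 - (d_soln[nn] - ds_min) / (ds_max - ds_min)
    weights = 1.0 - d_prob[nn]               # closer problems weigh more
    return float(np.sum(weights * align) / np.sum(weights))

def case_base_alignment(case_base, k=3):
    """CaseBaseAlign: leave-one-out average of CaseAlign over all cases."""
    scores = [case_alignment(c, case_base[:i] + case_base[i + 1:], k)
              for i, c in enumerate(case_base)]
    return float(np.mean(scores))
```

Sorting the per-case scores before averaging yields the alignment profile discussed below.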
Fig. 2. a) Case alignment calculation b) Case base alignment profile
Figure 2(a) illustrates the alignment computation for a target case t, with three nearest neighbours from the problem side, {c1,c2,c3}. Notice how the relative distances of the problem-space neighbours differ in the solution space. It is this difference that is captured by the alignment measure and normalised by the nearest and farthest neighbour distances in the solution space.

A plot of the alignment values for all cases, sorted in increasing order, gives the local alignment profile of the case base. An example alignment profile for a set of 50 cases from a health and safety case base is shown in Figure 2(b). The area under this curve provides an average local alignment score for a given case base:

\[ CaseBaseAlign = \frac{\sum_{i=1}^{N} CaseAlign(c_i)}{N} \]

where N is the number of cases in the case base. As with complexity profiles used in classification tasks [12], alignment profiles provide useful insight into individual cases as well as groups of cases that exhibit similar alignment characteristics. We expect that such profiles can be exploited for maintenance purposes in the future; for instance, the focus of case authoring could be informed by areas of poor alignment.

4.2 Case Cohesion

Case cohesion is a measure of the overlap between retrieval sets on the problem and solution sides. It is measured by looking at the neighbourhood of a target case on both the problem and the solution side. We retrieve cases which are close to the target case on the problem as well as the solution side, within some threshold, to form the sets RSprob and RSsoln. The degree to which RSprob and RSsoln are similar is an indication of the cohesion of the case: cases for which RSprob and RSsoln are identical have strong cohesion, and those for which they are completely different have weak cohesion. This concept is defined below.

The nearest-neighbour sets of a case t on the problem and solution sides are given by:

\[ RS_{prob}(t) = \{ c_i \in CB : D_{prob}(t, c_i) < \delta_{prob} \} \]
\[ RS_{soln}(t) = \{ c_i \in CB : D_{soln}(t, c_i) < \delta_{soln} \} \]
δprob and δsoln are the distance thresholds on the problem and solution side respectively. The functions Dprob and Dsoln are as defined in Section 4.1 and compute pair-wise distances on the problem and solution side. The intersection and union of these two retrieval sets are then used to calculate the cohesion of a case:

\[ CaseCohesion(c_i) = \frac{\lvert RS_{prob}(c_i) \cap RS_{soln}(c_i) \rvert}{\lvert RS_{prob}(c_i) \cup RS_{soln}(c_i) \rvert} \]

Case base cohesion is then the average case cohesion over all cases in the case base:

\[ CaseBaseCohesion = \frac{\sum_{i=1}^{N} CaseCohesion(c_i)}{N} \]

where N is the number of cases in the case base.

Figure 3(a) illustrates the cohesion calculation for a target case. As with case alignment, distances are computed to identify nearest neighbours not only in the problem space but also in the solution space. Unlike case alignment, however, the distances are not directly used in the cohesion computation; instead it is the retrieval sets that are compared. As a result, case cohesion scores are less granular than alignment scores. This can be clearly observed when comparing the cohesion profile for a case base in Figure 3(b) with its alignment profile in Figure 2(b).
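For contrast with the alignment sketch above, here is a minimal sketch of the cohesion computation, assuming precomputed N×N problem- and solution-side distance matrices. The helper names are ours, and the threshold defaults are taken from the best-performing setting reported in Section 6, not prescribed by the measure itself.

```python
import numpy as np

def case_cohesion(i, d_prob, d_soln, delta_prob=0.4, delta_soln=0.8):
    """Jaccard overlap of the problem- and solution-side retrieval sets
    of case i, given (N, N) pairwise distance matrices."""
    n = d_prob.shape[0]
    rs_prob = {j for j in range(n) if j != i and d_prob[i, j] < delta_prob}
    rs_soln = {j for j in range(n) if j != i and d_soln[i, j] < delta_soln}
    union = rs_prob | rs_soln
    if not union:                   # isolated case: treat cohesion as 0
        return 0.0
    return len(rs_prob & rs_soln) / len(union)

def case_base_cohesion(d_prob, d_soln, delta_prob=0.4, delta_soln=0.8):
    """CaseBaseCohesion: average cohesion over all cases."""
    n = d_prob.shape[0]
    return float(np.mean([case_cohesion(i, d_prob, d_soln,
                                        delta_prob, delta_soln)
                          for i in range(n)]))
```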
Fig. 3. a) Case cohesion calculation b) Case base cohesion profile
5 Global Alignment Measure

A global alignment measure derives alignment scores by comparing problem- and solution-space clusters. Although local case base profiles provide a map of well-aligned and poorly aligned areas of the case base, they do so without revealing interesting patterns, regularities and associations in the case base in relation to the features. For this purpose, work in [13] has adopted a "case base image" metaphor as the basis for a Global Alignment Measure, whereby a textual case base is viewed as an image generated from its case-feature matrix such that visually interesting associations are revealed. Unique to this approach is that clusters can be used to observe both cases and features. Importantly for alignment, this approach can be used to form and compare clusters in both the problem and the solution space.
5.1 Image Metaphor

Consider the trivial case-feature matrix in Figure 4, taken from [15]. Here nine textual cases and their corresponding binary feature-vector representations appear in a matrix where shaded cells indicate the presence of a keyword in the corresponding case. Very simply put, such a matrix, with shaded cells, is the "case base as image" metaphor. A 2-way stacking process is applied to transform such a matrix into its clustered image, in which useful concepts in the form of underlying term associations or case similarity patterns can be observed.
Fig. 4. Binary vector representation of 9 cases taken from the Deerwester Collection [15]
5.2 Case-Feature Stacking Algorithm

The aim of the 2-way stacking algorithm is to transform a given matrix such that similar cases as well as similar features are grouped close together. Given a random initial ordering of cases and features, an image of the case base is obtained by row stacking followed by column stacking (see Figure 5). Given the first row, stacking identifies the most similar row from the remaining rows and makes that the second row. Next, the row that is most similar to the first two rows is chosen to be the third row; in calculating this similarity, more weight is given to the second row than the first. This process continues until all rows are stacked. A similar iterative approach is applied to column stacking, except that columns are selected instead of rows. Essentially, column stacking employs feature similarity while row stacking is based on case similarity.

The weighted similarity employed here ensures that more recently stacked vectors play a greater role in deciding the next vector to be stacked. This ensures continuity in the image, whereby a gradual transition across the image is achieved by a decaying weighting function. Unlike a step function, which only considers the most recent stacking and disregards previously stacked vectors, a decaying function avoids abrupt changes across the image. Figure 6 shows the resultant images for the matrix in Figure 4 after row and column stacking are applied.
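A minimal sketch of the greedy stacking procedure follows. The choice of a 1/j decay over the k most recently stacked rows is our assumption; the paper only requires that recently stacked vectors weigh more.

```python
import numpy as np

def _cos(a, b):
    """Cosine similarity, returning 0 for zero vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0 or nb == 0 else float(np.dot(a, b)) / (na * nb)

def stack_rows(M, k=5):
    """Greedy row stacking: returns a row ordering of matrix M in which
    each next row is the one most similar to the recently stacked rows."""
    order = [0]                          # start from an arbitrary first row
    remaining = set(range(1, M.shape[0]))
    while remaining:
        recent = order[::-1][:k]         # most recently stacked first
        def score(r):                    # decaying weight 1/(j+1)
            return sum(_cos(M[r], M[c]) / (j + 1)
                       for j, c in enumerate(recent))
        nxt = max(remaining, key=score)
        order.append(nxt)
        remaining.remove(nxt)
    return order

def stack(M, k=5):
    """2-way stacking: rows (cases) first, then columns (features)."""
    rows = stack_rows(M, k)
    cols = stack_rows(M.T, k)            # column stacking via the transpose
    return M[np.ix_(rows, cols)], rows, cols
```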
Fig. 5. The stacking algorithm
Fig. 6. Images from Deerwester Collection after row stacking and column stacking
5.3 Stacked Global Alignment Measure

Alignment can be measured by comparing the stacked matrices resulting from problem- and solution-side stacking. The matrix Mp obtained by stacking the problem side shows the best possible ordering of the problem space, and Ms is obtained similarly from the solution side. In order to establish the alignment between Mp and Ms, a third matrix Msp is generated by stacking the solution space using the case ordering obtained from problem-side stacking. Figure 7 demonstrates how the case ordering from the problem side is enforced on the solution side to obtain the matrix Msp (the matrix in the middle). Here the 5 cases are ordered as in Mp, while the 4 features are ordered according to Ms. The more similar the case ordering of Msp to that of Ms, the greater the alignment between the problem and solution spaces. We quantify this alignment by measuring the average similarity between neighbouring cases in the matrices Ms and Msp.
Fig. 7. Three matrices used in the global alignment measure
The average similarity of a case (other than the first) to the cases stacked before it in the matrix is given by:

\[ Sim(c_i) = \sum_{j=1}^{k} sim(c_i, c_{i-j}) \cdot \frac{1}{j} \]

This is a weighted sum of similarity values to the k previously stacked cases in the matrix, with the weight decaying with distance j; sim(ci,cj) is the cosine similarity function in our experiments. The average similarity value for the matrix is:

\[ Sim(M) = \frac{\sum_{i=2}^{N} Sim(c_i)}{N - 1} \]

We calculate the average similarity values for the matrices Ms and Msp, to get Sim(Ms) and Sim(Msp). The global alignment value is then:

\[ GlobalAlign = \frac{Sim(M_{sp})}{Sim(M_s)} \]

Ms is the optimally stacked matrix, and Msp is the matrix obtained by arranging the cases according to the problem-side ordering; hence Sim(Ms) ≥ Sim(Msp). For well-aligned datasets, the best problem-side ordering should be close to the best solution-side ordering, and the GlobalAlign ratio will be close to 1. For poorly aligned datasets, on the other hand, there will be considerable difference between the problem- and solution-side orderings, and the GlobalAlign ratio will be much less than 1.
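Given the orderings produced by stacking, the global measure reduces to a few lines. The sketch below reuses _cos and stack_rows from the Section 5.2 sketch and follows the Sim and GlobalAlign definitions above; it is illustrative, not the authors' implementation.

```python
def matrix_similarity(M, order, k=5):
    """Sim(M): average weighted similarity of each case to the (up to) k
    cases stacked immediately before it in the given ordering."""
    n = len(order)
    total = 0.0
    for i in range(1, n):
        total += sum(_cos(M[order[i]], M[order[i - j]]) / j
                     for j in range(1, min(k, i) + 1))
    return total / (n - 1)

def global_align(P, S, k=5):
    """GlobalAlign = Sim(Msp) / Sim(Ms), where P and S are problem- and
    solution-side case-feature matrices with rows aligned by case."""
    order_p = stack_rows(P, k)           # problem-side case ordering (Mp)
    order_s = stack_rows(S, k)           # solution-side case ordering (Ms)
    # Msp is simply S traversed in the problem-side case ordering.
    return matrix_similarity(S, order_p, k) / matrix_similarity(S, order_s, k)
```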
6 Evaluation and Experimental Results

We wish to evaluate the effectiveness of the local and global alignment measures. To do this we evaluate how well each measure correlates with classifier accuracy over 10 case bases. A strong correlation indicates that the measure is performing well and provides a good indication of case base alignment.

In our experiments we use cosine similarity to measure pair-wise similarity between cases in the alignment algorithms and, in conjunction with k-Nearest Neighbour (k-NN) with weighted voting, to calculate classifier accuracy. A leave-one-out experimental design is used, and for each algorithm a paired series of results for alignment and accuracy is calculated and compared to give the correlation coefficient.

6.1 Dataset Preparation

The evaluation uses the UK National Health Service Grampian (NHSG) Health and Safety dataset, which contains incident reports in the medical domain. Each case consists of a description of the situation or incident, and the resolution of the situation, i.e., what action was taken when the incident occurred. The cases also have a class label called "Care Stage", which can be used as a solution class for this dataset.

The NHSG dataset contains a total of 4011 cases distributed across 17 classes. We generated 10 different datasets, each containing about 100 cases and a varying number of classes, from 2 to 10. The cases are preprocessed using the OpenNLP library, available as part of the jCOLIBRI [16] framework. First, cases are organized into paragraphs, sentences, and tokens using the OpenNLP Splitter. Stop words are removed; words are stemmed and then tagged with part-of-speech. The remaining
features are pruned to remove terms which occur in fewer than 2% of the cases. After these operations, the vocabulary size of each dataset is no more than 200.

6.2 Evaluation Methodology

Alignment measures are compared to each other by their correlation with problem-side accuracies. How confident are we that problem-side accuracy based on class labels can be used in this way? To answer this, we analysed classifier accuracies on the problem and solution sides. Essentially, a leave-one-out test on each case base was used to calculate classifier accuracy for the problem and solution sides separately. We use weighted 3-NN on the problem-side representation, with class labels as pseudo-solutions, to calculate the problem-side accuracy (a similar approach was followed for solution-side accuracies). The 10 datasets and their problem- and solution-side accuracies appear in Figure 8. The acronym of each dataset includes the number of classes and the approximate size of the case base; for example, CB3-100 indicates a case base with 3 classes and a total of 100 cases.
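A sketch of this meta-evaluation loop follows, assuming an (N, d) matrix X of problem-side vectors and class labels y per case base. SciPy's pearsonr supplies the correlation coefficient; the case_bases container and the reuse of case_base_alignment from the Section 4.1 sketch are hypothetical wiring, not the authors' code.

```python
import numpy as np
from scipy.stats import pearsonr

def loo_knn_accuracy(X, labels, k=3):
    """Leave-one-out weighted k-NN accuracy using cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)         # hold out the query case
    correct = 0
    for i, true_label in enumerate(labels):
        votes = {}
        for j in np.argsort(sims[i])[-k:]:  # similarity-weighted voting
            votes[labels[j]] = votes.get(labels[j], 0.0) + sims[i, j]
        correct += max(votes, key=votes.get) == true_label
    return correct / len(labels)

# Meta-evaluation over the 10 case bases (hypothetical wiring):
# accuracies = [loo_knn_accuracy(X, y) for (X, y) in case_bases]
# alignments = [case_base_alignment(cb, k=9) for cb in alignment_inputs]
# r, _ = pearsonr(alignments, accuracies)
```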
Fig. 8. Problem and Solution side accuracy values
The problem-side accuracies vary from a low of 47% (10-class) to a high of 93% (2-class), highlighting the different complexities of our case bases. The solution-side accuracy is a measure of the confidence associated with the class labels of each dataset. These remain mostly constant and high (from 90% to 98%), indicating that the solution-side concepts have a strong relationship with the class labels. Hence we are justified in using class labels to calculate classifier accuracy on the problem side. We now have an objective measure of the datasets which can be used to judge our alignment measures: it allows us to correlate the alignment values obtained for all the datasets with their problem-side classifier accuracy values.

6.3 Results with Local and Global Alignment

The case alignment measure for each dataset was calculated with different values of the parameter k, as explained in Section 4.1.
Fig. 9. Comparison of Accuracy and Case Alignment values
Figure 9 plots the Case Alignment values against accuracy for all the datasets. Generally, the Case Alignment values follow the accuracy values closely for all values of k; i.e., case bases with low accuracy also have low alignment, and vice versa. We found positive correlation for all values of k, with the maximum correlation of 0.935 at k=9.

Unlike case alignment, case cohesion (Section 4.2) requires two parameters: the problem and solution neighbourhood thresholds, δprob and δsoln. Figure 10 plots the correlation with accuracy for case cohesion with different values of δprob and δsoln. Each correlation point on the graph is obtained by comparing the cohesion values for all 10 datasets with the accuracy values for a given δprob and δsoln. The x-axis consists of four δprob threshold values, and each graph line corresponds to results with one of four δsoln threshold values. Generally, cohesion is sensitive to the threshold values: the best correlation (0.983) is achieved with δsoln=0.8, and no correlation (close to zero) with δsoln=0.4. Figure 11 provides a closer look at the 10 cohesion values obtained for each dataset with δsoln=0.8, for different values of δprob. The best correlation of 0.983 is achieved with δprob=0.4 and δsoln=0.8.
Fig. 10. Cohesion-Accuracy correlation with different δprob and δsoln threshold values
Fig. 11. Comparison of Accuracy and Cohesion values
The global alignment measure for each dataset was calculated as explained in Section 5. Figure 12 plots these values against the accuracy values for different k. Here the k parameter refers to the number of previously stacked cases that contribute to the weighted similarity computation. As with the local measures, global alignment also correlates well with accuracy. However, like cohesion it is sensitive to the k parameter: correlation increases with k at first, from 0.63 (k=3) to 0.84 (k=30), and decreases thereafter.
Fig. 12. Comparison of Accuracy and Global alignment values
6.4 Discussion on Evaluation Results

Generally, strong correlation with accuracy is observed for all three alignment measures. The best correlation values were 0.935 for the case alignment measure, 0.983 for the case cohesion measure, and 0.837 for the global alignment measure. Overall, the local measures result in better correlation than the global measure. This is because local measures are better able to capture k-NN's retrieval performance. However, the positive correlation with the global measure is encouraging, because unlike local
measures, its visualisation aspect (i.e. the case base image metaphor) creates interesting opportunities for TCBR interaction and maintenance.

All three approaches require parameter tuning. Case cohesion is very sensitive to the threshold values (δprob and δsoln), and so these must be chosen carefully. The global alignment measure is slightly less sensitive, whilst the case alignment measure shows very little variation with k and is therefore the most consistent of the three.
7 Conclusions

Evaluation is a challenge in TCBR systems, yet to our knowledge there have been no comparative studies of evaluation measures. Here we consolidate existing and novel TCBR evaluation measures in a correlation study with accuracy. Results show strong correlation, allowing us to conclude that such measures can be suitably applied to evaluate TCBR systems in the absence of class knowledge. Local measures showed stronger correlation with accuracy than the global alignment measure. It is our observation that the local case alignment measure is the most consistent because it is least sensitive to parameter tuning.

In future work, evaluation measures could be utilized to optimize TCBR systems. In particular, we would be interested in applying alignment measures as a fitness function for feature weighting or selection algorithms. The measures could also be applied to maintenance, in particular to identify neighbourhoods with poorly aligned problem and solution concepts. Local profiles and stacked images of case bases provide useful insight, with potential for interactive knowledge acquisition tools.
Acknowledgements

RGU and IIT exchanges are funded by UKIERI. The dataset for this work was provided by NHS Grampian, UK.
References

1. Weber, R., Ashley, K., Bruninghaus, S.: Textual CBR. Knowledge Engineering Review (2006)
2. Wiratunga, N., Craw, S., Rowe, R.: Learning to adapt for case based design. In: Proc. of the 6th European Conf. on CBR, pp. 421–435 (2002)
3. Bruninghaus, S., Ashley, K.: Evaluation of Textual CBR Approaches. In: AAAI 1998 Workshop on TCBR, pp. 30–34 (1998)
4. Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Proc. of the European Conf. on ML, pp. 137–142 (1998)
5. Richter, M.: Introduction. In: Case-Based Reasoning Technology: From Foundations to Applications, pp. 1–15 (1998)
6. Glick, N.: Separation and probability of correct classification among two or more distributions. Annals of the Institute of Statistical Mathematics 25, 373–383 (1973)
7. Wallace, C.S., Boulton, D.M.: An information measure for classification. Computer Journal 11(2), 185–194 (1968)
8. Marchette, D.J.: Random Graphs for Statistical Pattern Recognition. Wiley Series in Probability and Statistics (2004)
9. Singh, S.: Prism, Cells and Hypercuboids. Pattern Analysis & Applications 5 (2002)
10. Vinay, V., Cox, I.J., Milic-Frayling, N., Wood, K.: Measuring the Complexity of a Collection of Documents. In: Proc. of the 28th European Conf. on Information Retrieval, pp. 107–118 (2006)
11. Lamontagne, L.: Textual CBR Authoring using Case Cohesion. In: 3rd Workshop on TCBR: Reasoning with Text, Proceedings of the ECCBR 2006 Workshops, pp. 33–43 (2006)
12. Massie, S., Craw, S., Wiratunga, N.: Complexity profiling for informed case-base editing. In: Proc. of the 8th European Conf. on Case-Based Reasoning, pp. 325–339 (2006)
13. Chakraborti, S., Beresi, U., Wiratunga, N., Massie, S., Lothian, R., Watt, S.: A Simple Approach towards Visualizing and Evaluating Complexity of Textual Case Bases. In: Proc. of the ICCBR 2007 Workshops (2007)
14. Massie, S., Wiratunga, N., Craw, S., Donati, A., Vicari, E.: From Anomaly Reports to Cases. In: Proc. of the 7th International Conf. on Case-Based Reasoning, pp. 359–373 (2007)
15. Deerwester, S., Dumais, S., Landauer, T., Furnas, G., Harshman, R.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
16. jCOLIBRI Framework, Group for Artificial Intelligence Applications, Complutense University of Madrid, http://gaia.fdi.ucm.es/projects/jcolibri/jcolibri2/index.html