Cautious Induction in Inductive Logic Programming

Simon Anthony and Alan M. Frisch
Department of Computer Science, University of York, York, YO1 5DD, UK.
Email: {simona, frisch}@cs.york.ac.uk

Abstract. Many top-down Inductive Logic Programming systems use a greedy, covering approach to construct hypotheses. This paper presents an alternative, cautious approach, known as cautious induction. We conjecture that cautious induction can allow better hypotheses to be found, with respect to some hypothesis quality criteria. This conjecture is supported by the presentation of an algorithm called CILS, and by a complexity analysis and empirical comparison of CILS with the Progol system. The results are encouraging and demonstrate the applicability of cautious induction to problems with noisy datasets, and to problems which require large, complex hypotheses to be learnt.

1 Introduction

Within the Inductive Logic Programming (ILP) paradigm [4, 2], many of the top-down (or refinement-based) systems, such as FOIL [5] and Progol [3], employ a greedy, covering approach to the construction of a hypothesis. Informally, from the training sample and background knowledge, the learner searches a space of clauses to find a single clause which covers some subset of the positive examples, and is "best" with respect to some quality criteria. This is shown conceptually in figure 1, where the +'s represent positive examples from a training sample, and the set labelled c represents the set of positive examples covered by the clause c.

Fig. 1. Clause c covering a subset of the positive examples.

Once found, c is added to the hypothesis and the positive examples it covers are removed from the training sample. This process is then repeated with the

remaining examples and background knowledge to find a further clause c′, as shown in figure 2. Clause c′ is also added to the hypothesis and the positive examples it covers are removed from the training sample. This process continues until the hypothesis covers all the positive examples. Notice that a repeated search of the space of clauses is necessary, and that each search is with respect to an increasingly small set of positive examples. Therefore, the selection of a clause is dependent on the clauses already included in the hypothesis. For instance, the selection of c′ was based upon its coverage of the positive examples left uncovered by c. In other words, repeatedly selecting the locally "best" clause is assumed to determine the globally "best" hypothesis.

Fig. 2. Clause c′ covering a subset of the remaining positive examples.

This assumption will not hold in general. For instance, consider the seven positive examples in the problem depicted above. The space of clauses might contain clauses d and d′ which, if hypotheses containing fewer clauses are preferred, together form a "better" hypothesis, covering all the positive examples. This is shown in figure 3.

Fig. 3. Two clauses, d and d′, covering the seven positive examples.

Cautious induction is an approach which performs a single, complete search through the space of clauses and retains a nonempty set of candidate clauses, all of which meet some quality criteria. A subset of these candidate clauses is then selected to form a hypothesis which covers all the positive examples. Informally, the task of searching for the candidate clauses has been separated from the task of choosing which clauses to include in the hypothesis. Accordingly, our conjecture is that cautious induction avoids the dependency on the order of clause selection, and can allow "better" hypotheses, such as the hypothesis {d, d′} in the example above, to be found. This conjecture is supported by the presentation of an algorithm called CILS (Cautious Inductive Learning System) which implements this approach. CILS is validated by a complexity analysis and an empirical comparison against the Progol system. In particular, this comparison highlights the potential of cautious induction for learning complex hypotheses, and also for learning in the presence of noise.

The rest of this paper is organised as follows: the learning setting is defined in section 2. Refinement is introduced in section 3 together with a description of CILS. Section 4 introduces Progol and reports both a theoretical and an empirical comparison against CILS, and we draw some conclusions in section 5.

2 The Learning Setting

Unless otherwise noted, this paper will assume the following learning setting, which we shall refer to as the normal setting. The teacher selects a target concept and provides the learner with a finite, nonempty training sample of examples, each of which is correctly labelled¹ as either positive or negative. From this training sample and any available background knowledge the learner constructs a hypothesis of the concept. Since our representation is that of logic programs, examples are atoms built over a target predicate and the background knowledge is a finite set of definite clauses². The hypothesis is represented as a finite set of nonrecursive definite clauses whose heads are built over the target predicate and whose bodies consist of literals built over predicates defined in the background knowledge. It will be useful to introduce the following relationship between a hypothesis and an example, known as coverage.

Definition 1 (Coverage) Let B denote background knowledge, T a training sample and c a clause. An example e ∈ T is said to be covered by c, with respect to B, if B ∪ {c} ⊨ e, and uncovered otherwise. Example e is said to be covered by a hypothesis H, with respect to B, if H contains at least one clause which covers e, and uncovered otherwise.
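To make the coverage relation concrete, here is a minimal Python sketch of the test B ∪ {c} ⊨ e, restricted for simplicity to ground (variable-free) definite clauses. The representation and helper names are ours, purely for illustration; a real ILP system performs this entailment test with a full Prolog engine.

    # Naive forward chaining over ground definite clauses: each clause is a
    # pair (head_atom, [body_atoms]). This is a simplified stand-in for the
    # entailment test B u {c} |= e of definition 1.

    def entails(clauses, goal):
        facts, changed = set(), True
        while changed:
            changed = False
            for head, body in clauses:
                if head not in facts and all(b in facts for b in body):
                    facts.add(head)           # all body atoms known: derive head
                    changed = True
        return goal in facts

    def covers(B, c, e):
        """Example e is covered by clause c w.r.t. B iff B u {c} |= e."""
        return entails(B + [c], e)

    # Hypothetical ground example:
    B = [("parent(tom,bob)", []), ("parent(bob,ann)", [])]
    c = ("gp(tom,ann)", ["parent(tom,bob)", "parent(bob,ann)"])
    print(covers(B, c, "gp(tom,ann)"))        # True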

In this setting, a hypothesis H must satisfy the following two conditions with respect to the background knowledge and a training sample T:

– Completeness: H must cover every positive example in T, and
– Consistency: H must leave every negative example in T uncovered.

¹ We shall later relax this assumption to allow noisy data.
² Throughout this paper clauses shall be represented as a set of literals.

A hypothesis satisfying these conditions is said to explain, or to be correct with respect to, that training sample. Additionally, we shall require that each clause within a hypothesis must cover at least one positive example (to avoid redundancy). Since more than one hypothesis may explain a given training sample, we introduce the following two quality criteria by which one hypothesis H is to be preferred to another.

– H should be as accurate as possible in classifying unseen examples, and
– H should be as small as possible, under the assumption that simpler explanations are more likely to be correct.

The question of how these two criteria are to be measured is postponed until section 4.3. We conclude this section with one final notational convention: the cardinality of a set of objects S shall be denoted |S|.

3 The Search for Clauses

We now turn to consider the task of finding the clauses which make up a hypothesis. From the outset it will be useful to define the size of a clause as the total number of literals in its head and body, and to introduce subsumption in order to allow us to talk of the generality and specificity of clauses.

Definition 2 (Subsumption) Let c and c′ denote two clauses. Clause c subsumes clause c′ iff there exists a substitution θ such that cθ ⊆ c′. We say that c is more general than c′, and, conversely, that c′ is more specific than c.
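The following small Python sketch illustrates this subsumption test, using backtracking to search for a substitution θ with cθ ⊆ c′. The literal representation, and the convention that names starting with an upper-case letter are variables, are assumptions of ours.

    # Backtracking test of definition 2: does some theta map every literal
    # of c onto a literal of c'? Literals are tuples such as ("p", "X", "a").

    def is_var(term):
        return term[0].isupper()

    def match(lit, target, theta):
        """Try to extend theta so that lit instantiated by theta equals target."""
        if lit[0] != target[0] or len(lit) != len(target):
            return None
        theta = dict(theta)                   # copy, so callers may backtrack
        for a, b in zip(lit[1:], target[1:]):
            if is_var(a):
                if theta.setdefault(a, b) != b:
                    return None               # variable already bound elsewhere
            elif a != b:
                return None                   # constant mismatch
        return theta

    def subsumes(c, c_prime, theta=None):
        """Does clause c (a list of literals) subsume clause c_prime?"""
        theta = theta or {}
        if not c:
            return True
        first, rest = c[0], c[1:]
        return any(
            subsumes(rest, c_prime, t)
            for lit in c_prime
            if (t := match(first, lit, theta)) is not None
        )

    print(subsumes([("p", "X", "Y")], [("p", "a", "b")]))   # True: {X/a, Y/b}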

3.1 Conventional Refinement

Search by refinement can be viewed as the search of a tree³ of clauses. Informally, search begins with a set of initial clauses and an evaluation function is used to select a clause from this set. If this clause satisfies some goal condition the search terminates, otherwise the selected clause is removed. Provided the evaluation of the clause meets some quality criteria (verified by a pruning function), a refinement operator is applied to generate a set of refinements (incremental specialisations) which are added to the set. An incremental specialisation of a clause c is typically defined to be the addition of a literal, built over a predicate defined in the background knowledge, to the body of c. Each such predicate is often accompanied by a nonempty, finite set of usage declarations (sometimes referred to as mode declarations) which impose type and mode restrictions on the literals that are to be added to a clause. More precise definitions of these concepts are provided below.

³ Refinement is often viewed as the search of a graph. By allowing a clause to appear more than once, we simplify our presentation to talk in terms of trees.

Definition 3 (Usage Declarations) Let c denote a clause and let p denote either the target predicate or a predicate defined in the background knowledge. A usage declaration for p specifies the functional structure that each argument of p must have, and restricts the symbols that may occupy positions within each argument when a literal, built over p, is added to c. Positions that are to be occupied by a variable symbol are given one of the following two modes:

– Positive – indicating that this position must contain a variable symbol already present in c, or
– Negative – indicating that this position may contain any variable symbol.

Positions that are to be occupied by a constant are given the mode declaration Hash. Each of these mode declarations is accompanied by a type declaration. A literal is said to satisfy a usage declaration, with respect to a clause c, if its functional structure matches that of the declaration, and the positions in its arguments are occupied by appropriate symbols of the correct mode and type.

Although very similar to the mode declarations used by Progol, usage declarations do not include a recall. The recall is described in section 4.1.

Definition 4 (Initial Clauses) Let p denote the target predicate of a learning problem and U a set of usage declarations for p. The set of initial clauses, denoted IC, is the nonempty set of all bodiless clauses whose heads are built over p and satisfy U (up to variable renaming).

Definition 5 (Refinement Operator) Let c denote a clause, U a set of usage declarations and pred(B) the set of predicates defined in the background knowledge B. Furthermore, for a predicate p ∈ pred(B), let litU(c, p) denote the set of literals built over p that satisfy U with respect to c. A refinement operator ρ is a function from a single clause to a set of clauses, each member of which contains an additional body literal. ρ(c) is therefore defined as:

    ρ(c) = {c ∪ {l} | l ∈ litU(c, p) and p ∈ pred(B)}

The set of all clauses that can be generated by the repeated application of ρ to a clause from IC is known as the refinement space. Each clause in this space is known as a refinement. Typically a teacher-specified size parameter is also used, restricting the size of the refinements and ensuring that the refinement space is finite.
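A simplified sketch of such a refinement operator is given below. It reduces a usage declaration to a predicate name plus a mode string such as "+-" (position 1 must reuse a variable of c; position 2 may also be new), and omits Hash (constant) positions and type restrictions; all names here are illustrative rather than any system's actual code.

    from itertools import product

    def clause_vars(clause):
        return {t for lit in clause for t in lit[1:] if t[0].isupper()}

    def refinements(clause, usage_decls, size_limit):
        """One-literal specialisations of `clause` satisfying the (reduced)
        usage declarations, up to the teacher-specified size limit."""
        if len(clause) >= size_limit:
            return []
        old = sorted(clause_vars(clause))
        fresh = "V%d" % len(old)              # a single new variable, for brevity
        result = []
        for pred, modes in usage_decls:
            # "+" positions draw from existing variables; "-" may also be fresh
            choices = [old if m == "+" else old + [fresh] for m in modes]
            for args in product(*choices):
                lit = (pred,) + args
                if lit not in clause:
                    result.append(clause | {lit})
        return result

    c = frozenset({("gp", "X", "Y")})         # a bodiless initial clause
    for r in refinements(c, [("parent", "+-")], size_limit=3):
        print(sorted(r))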

Definition 6 (Evaluation Function) Let T denote a training sample and c a clause. An evaluation function returns a real number for c based on a number of attributes of c, typically:

– the number of positive and negative examples from T covered by c, and
– the size of c.

Given a set of clauses C, best(C) denotes a clause from within C which maximises the evaluation function.

The implementation of the evaluation function dictates the strategy adopted by the refinement operator in searching the refinement space. Typically this function is defined to allow an A* strategy to be used.

Definition 7 (Pruning Function) Let c denote a clause. The boolean function prune(c) is true if it is expected that c and all its subsequent refinements do not meet the quality criteria for a hypothesis clause, and false otherwise. This judgement is usually made by deciding if the value of eval(c) is likely to increase or decrease with further refinement.

Definition 8 (Goal Condition) Typically the goal condition of a search by refinement is satisfied when a consistent clause is selected, and that clause can be shown to be the "best" clause in the refinement space, with respect to the quality criteria.

A top-down learning system is distinguished by its implementation of each of these concepts. These implementations are combined to produce a strategy for searching the refinement space, as shown in the following algorithm.

Algorithm 9 (Search by Refinement) Let best denote an evaluation function, ρ a refinement operator, prune a pruning function, and IC the set of initial clauses in some learning problem. The following algorithm searches a refinement space and returns the first clause satisfying the goal condition:

    let C = IC
    let c = best(C)
    while c does not meet the goal condition
        let C = C − {c}
        if prune(c) = false then
            let C = C ∪ ρ(c)
        let c = best(C)
    end while
    return c

The algorithm will always return a clause since IC is nonempty by definition.
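For concreteness, the following Python sketch realises algorithm 9 with best implemented as a max-priority queue keyed on the evaluation function, giving the A*-style strategy mentioned above. The functions eval_fn, rho, prune and is_goal are supplied by the learner; the sketch is our own construction, not any system's actual code.

    import heapq

    def search_by_refinement(initial_clauses, eval_fn, rho, prune, is_goal):
        # heapq is a min-heap, so evaluations are negated to pop the best
        # first; the counter breaks ties without comparing clauses themselves
        heap = [(-eval_fn(c), i, c) for i, c in enumerate(initial_clauses)]
        heapq.heapify(heap)
        counter = len(heap)
        while heap:
            _, _, c = heapq.heappop(heap)     # c = best(C); C = C - {c}
            if is_goal(c):
                return c
            if not prune(c):
                for r in rho(c):              # C = C u rho(c)
                    heapq.heappush(heap, (-eval_fn(r), counter, r))
                    counter += 1
        return None                           # unreachable if a goal clause exists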

3.2 Cautious Refinement in CILS

Recall that in cautious induction a single, complete search of the refinement space is performed, and a set of candidate clauses that satisfy the quality criteria is retained. Since we are no longer looking for an individual clause in the refinement space, the search terminates when all clauses in the refinement space have been either examined or pruned. Consequently, CILS requires rather different pruning conditions from other learning systems. In deciding whether to prune a clause CILS examines both the coverage and size of the clause, and, in particular, its relationship to the candidate clauses that have already been found. Both of these criteria exploit CILS' evaluation function, which is defined below. The trade-off used between the coverage and size of a clause is similar to that employed by other learning systems such as Progol [3], and provides an effective yet simple way of guiding the search towards clauses of a suitable generality.

Algorithm 10 (CILS' Evaluation Function) Let c denote a clause, p(c) and n(c) the number of positive and negative examples from training sample T covered by c, and sz(c) the size of c. The evaluation function eval is given by:

    eval(c) = p(c) − (n(c) + sz(c))

Algorithm 11 (CILS' Pruning Function) Let c denote a clause and eval denote CILS' evaluation function. Furthermore, let p(c) denote the number of positive examples from training sample T covered by c, sz(c) the size of c, and size a teacher-specified restriction on the size of any clause in the refinement space. The function prune(c) is true if any of the following conditions hold for c:

– p(c) < sz(c), in which case eval(c) < 0 and, for any subsequent refinement c′ of c, eval(c′) < 0,
– sz(c) > size, or
– a "better" candidate clause has already been found. If the selected clause c covers a set P of positive examples, then a "better" clause c′ covers a set P′ of positives where sz(c) ≥ sz(c′) and P ⊆ P′.

Each predicate defined in the background knowledge is accompanied by a finite, nonempty set of usage declarations which, together with a size parameter, form the syntactic bias used by CILS. Where the usage declaration requires one or more constants to appear in a literal, each positive example is used to find potential constants. A new literal is then generated for each distinct alternative. Since CILS examines or prunes every refinement, its search by refinement differs from algorithm 9, and is presented below.

Algorithm 12 (CILS' Search by Refinement) Let Can denote the set of candidate clauses, C a set of clauses and T⁺ the set of positive examples from a training sample T. Furthermore, let best denote CILS' evaluation function, prune CILS' pruning conditions, ρ CILS' refinement operator and IC the set of initial clauses. CILS' search by refinement proceeds as follows:

    let C = IC
    let Can = T⁺
    while C ≠ ∅
        let c = best(C)
        let C = C − {c}
        if prune(c) = false then
            if c is consistent and eval(c) ≥ 0 then
                let Can = Can ∪ {c}
            otherwise
                let C = C ∪ ρ(c)
    end while
    return Can
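A Python sketch of algorithms 10–12 follows. Clauses are assumed to carry their positive cover p(c), negative cover n(c), size sz(c) and the set of positives they cover; these field names, and the list-based realisation of best, are our own simplifications.

    from collections import namedtuple

    # Hypothetical clause record; pos_set holds the positives the clause covers
    Clause = namedtuple("Clause", "p n sz pos_set")

    def eval_cils(c):                         # algorithm 10
        return c.p - (c.n + c.sz)

    def prune_cils(c, size_limit, candidates):    # algorithm 11
        if c.p < c.sz:                        # eval(c) < 0 and can never recover
            return True
        if c.sz > size_limit:                 # teacher-specified size restriction
            return True
        # a "better" candidate exists: no larger, covering a superset of positives
        return any(d.sz <= c.sz and c.pos_set <= d.pos_set for d in candidates)

    def cils_search(initial_clauses, positive_unit_clauses, rho, size_limit):
        """Algorithm 12: one complete search returning all candidate clauses."""
        C = list(initial_clauses)
        Can = list(positive_unit_clauses)     # let Can = T+
        while C:                              # while C is nonempty
            C.sort(key=eval_cils)
            c = C.pop()                       # c = best(C); C = C - {c}
            if not prune_cils(c, size_limit, Can):
                if c.n == 0 and eval_cils(c) >= 0:
                    Can.append(c)             # consistent candidate found
                else:
                    C.extend(rho(c))          # otherwise refine further
        return Can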

Each positive example is itself included as a candidate clause since it has an evaluation of 0. This also ensures that a complete hypothesis can always be generated from the set of candidate clauses. Currently a greedy cover-set algorithm is used to construct the hypothesis from the set of candidate clauses. Despite this admittedly greedy aspect of CILS, notice that each candidate clause has been found independently of the others.
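The greedy cover-set step mentioned above might be sketched as follows; pos_set is the assumed field holding the positives a candidate covers, and the toy data echoes the {d, d′} example of section 1. The loop terminates because each positive example is itself a candidate covering itself.

    from collections import namedtuple

    Cand = namedtuple("Cand", "name pos_set")     # hypothetical candidate record

    def greedy_cover(candidates, positives):
        """Repeatedly pick the candidate covering the most still-uncovered
        positives until every positive example is covered."""
        uncovered, hypothesis = set(positives), []
        while uncovered:
            best = max(candidates, key=lambda c: len(c.pos_set & uncovered))
            if not best.pos_set & uncovered:  # defensive: no progress possible
                break
            hypothesis.append(best)
            uncovered -= best.pos_set
        return hypothesis

    d  = Cand("d",  {1, 2, 3, 4})
    dp = Cand("d'", {5, 6, 7})
    print([c.name for c in greedy_cover([d, dp], range(1, 8))])   # ['d', "d'"]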

4 Comparison of CILS with Progol

This section compares, with a complexity and an empirical analysis, CILS against the ILP learner Progol [3]. Progol has been successfully applied to a variety of real-world applications and is regarded by many as the state of the art in ILP systems. To aid our comparison of these algorithms an introduction to Progol is given in the following subsection.

4.1 The Progol Algorithm

Progol adopts a greedy, covering approach to the construction of a hypothesis. To avoid generating clauses which cover no positive examples, Progol bounds each refinement space from below by a most specific clause, ⊥, constructed from a single positive example⁴. ⊥ is required to cover the positive example e used in its construction, and, since every refinement must subsume ⊥, they must all cover e. This is shown in figure 4. Thus Progol uses two algorithms during learning: one to construct ⊥, and one to search the ensuing refinement space. The following two subsections outline these algorithms in order to give some insight into our evaluation of Progol.

Construction of ⊥ Since ⊥ is most specific, it contains all the literals that may appear in any refinement. The target predicate, and each predicate defined in the background knowledge, are each accompanied by a usage declaration and a positive integer known as the recall⁵. For recall r, all possible literals that satisfy a usage declaration are each added at most r times to ⊥, with a bound placed on variable depth to ensure this process terminates.

Search by Refinement An A* search strategy is used to search the refinement space. In particular, since any clause in this space must subsume ⊥, the literals in the body of such a clause must be a subset of those in the body of ⊥. However, notice that if a binding between variables in a literal in ⊥ is split when that literal is added to a clause, the resulting refinement will still subsume ⊥. Progol allows variable bindings to be split in this way, provided that each resulting literal satisfies a usage declaration for the predicate over which it is built. Therefore, informally, Progol's refinement operator passes left to right over the literals in the body of ⊥. For a particular literal l and clause c being refined, the set of refinements of c consists of clauses of the form c ∪ {l′}, where each l′ is a copy of l having a different split of the variable bindings in its arguments. In spite of the differences in the refinement operator, Progol's goal condition, evaluation function and pruning conditions are of the form described in section 3.1.

Fig. 4. The refinement space in Progol, bounded from above and below.

⁴ In fact, Progol treats a training sample as a sequence of examples, and ⊥ is constructed from the first example in this sequence.
⁵ Given a literal l which satisfies a usage declaration, the recall determines how many solutions of l are found, and each is added as a separate literal to the body of ⊥.

Criticisms of Progol Whilst the use of ⊥ ensures that each refinement covers at least one positive example, a number of other problems are introduced. We outline two that are of particular interest to us here:

– any constant symbol in a literal in a refinement must be present in the corresponding literal in ⊥. Thus all constant symbols must be determined from a single positive example.
– if the training sample is noisy, ⊥ may be constructed from a noisy positive example, possibly adding an unwanted clause to the hypothesis and affecting the remainder of the hypothesis' construction.

4.2 Analysis of Algorithm Complexity

This subsection compares the space complexities of CILS and Progol. The complexity bounds presented are slightly simplified to improve their readability. It will be useful to begin by introducing the concepts of Bell numbers and DDBF-trees.

Bell Numbers and DDBF-trees

Definition 13 (Bell Numbers) The Bell number B(m) is the number of distinct partitions of m objects. B(m) is defined by the following recurrence, where C(m − 1, i) denotes the binomial coefficient:

    B(m) = 1                                        if m = 0
    B(m) = Σ_{i=0}^{m−1} C(m − 1, i) · B(i)         otherwise
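The recurrence can be computed directly; the following short Python function is merely a sanity check of the definition.

    from math import comb

    def bell(m):
        """Number of distinct partitions of m objects (definition 13)."""
        if m == 0:
            return 1
        return sum(comb(m - 1, i) * bell(i) for i in range(m))

    print([bell(m) for m in range(7)])        # [1, 1, 2, 5, 15, 52, 203]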

Bell numbers allow us to determine the cardinality of the set of initial clauses IC, as is demonstrated by the following theorem.

Theorem 14 (The cardinality of IC). Let p denote a target predicate having at most j positions in its arguments, and let n (0 ≤ n ≤ j) denote the number of positions in p that are to contain constant symbols. Furthermore, let T denote a set of n-tuples of possible constant values for the positions in p. The cardinality of IC is given by:

    B(j − n) · |T|

Proof. Consider the (j − n) variable positions in the arguments of p as a set S; an assignment of variable symbols to these positions corresponds, up to variable renaming, to a partition of S. There are therefore B(j − n) distinct variabilisations of the (j − n) positions. For each distinct partition there are |T| different n-tuples for the n constants, giving B(j − n) · |T| unique literals built over p in IC. □
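As an illustration of theorem 14, the set partitions underlying B(j − n) can be enumerated explicitly. The script below is our own construction, with hypothetical values of j, n and T; it confirms the count B(j − n) · |T| on a small case.

    def partitions(items):
        """All set partitions of a list; each partition is one head literal
        up to variable renaming."""
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for part in partitions(rest):
            for i in range(len(part)):        # put `first` into an existing block
                yield part[:i] + [[first] + part[i]] + part[i + 1:]
            yield [[first]] + part            # or into a block of its own

    j, n = 3, 1                               # hypothetical arity and constant count
    T = [("a",), ("b",)]                      # possible 1-tuples of constants
    heads = [(p, t) for p in partitions(list(range(j - n))) for t in T]
    print(len(heads))                         # B(2) * |T| = 2 * 2 = 4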

Recall that the refinement space searched by CILS can be represented as a tree. However, it turns out that the branching factor of this tree varies with its depth, since the number of distinct variables in a clause (and hence the number of literals satisfying a usage declaration) increases as more literals are added to that clause. In order to capture this phenomenon, we introduce the concept of a Depth-Dependent Branching Factor tree (DDBF-tree). Such a tree is shown in figure 5.

Definition 15 (DDBF-tree) A DDBF-tree is a tree whose branching factor at a given node is dependent on the depth of that node.

Fig. 5. A DDBF-tree with branching factor 3(i + 1) at depth i.

For branching factor b_i at depth i, it can be seen that the number of nodes at a given depth n (n > 0) is at most (b_{n−1})^n. Hence the total number of nodes in a DDBF-tree, up to and including those nodes at depth n, is at most n·(b_{n−1})^n.

Space Complexity of CILS CILS' refinement space is a forest of DDBF-trees, the cardinality of the forest being precisely the cardinality of the set IC. Furthermore, the cardinality of CILS' refinement operator when refining a clause of size s is the branching factor of the DDBF-tree at depth s. Therefore the space complexity of CILS is the total number of nodes across all of the DDBF-trees. By determining an upper bound on the cardinality of CILS' refinement operator we can determine the space complexity of CILS. The proofs of the following theorem and its corollary are given in appendix A.

Theorem 16 (Complexity of CILS' Refinement Operator). Let c denote a clause being refined, let sz(c) denote the size of c and let j denote an upper bound on the number of positions in the arguments of the target predicate and all predicates defined in the background knowledge. Furthermore, let U denote a set of usage declarations and T⁺ the set of positive examples from training sample T. The cardinality of CILS' refinement operator is upper bounded by:

    (sz(c)·j)^j · B(j) · |T⁺| · |U|

Corollary 17 (Space Complexity of CILS). Let s denote a teacher-specified restriction on clause size and let j, U and T⁺ be defined as in theorem 16. The space complexity of CILS is upper bounded by:

    (s·j·B(j)·|T⁺|·|U|)^{js+1}

In the worst case, CILS must search each node in this refinement space. In addition, a subset of the resulting set of candidate clauses must be selected to form the final hypothesis.

Space Complexity of Progol By comparison, we present the following two theorems based on those given by Muggleton [3] for the space complexity of Progol.

Theorem 18 (Space complexity of ⊥ ([3], theorem 26)). Let |U| denote the cardinality of the set of usage declarations U, and let r denote an upper bound on the recall associated with these declarations. Furthermore, let i denote a teacher-specified upper bound on variable depth in ⊥, and let j denote an upper bound on the number of positions in the arguments of any predicate. The size of ⊥ is upper bounded by:

    (r·|U|·j)^{2ij}

Theorem 19 (Space complexity of Progol ([3], theorem 39)). Let s denote a teacher-specified restriction on the size of a clause, |⊥| denote the size of ⊥ given in theorem 18, and let j denote the upper bound on the number of positions in the arguments of any predicate. The size of the refinement space searched by Progol is upper bounded by:

    |⊥|^s · j · s^j

Notice that in the worst case, Progol must construct ⊥ and search the ensuing refinement space once for each positive example in the training sample.
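To get a feel for the two bounds, one can plug hypothetical parameter values into the expressions above (using theorem 19 as reconstructed here). This compares orders of magnitude only, and measures neither system; all values are invented for illustration.

    from math import comb

    def bell(m):
        return 1 if m == 0 else sum(comb(m - 1, i) * bell(i) for i in range(m))

    def cils_space(s, j, n_pos, n_usage):     # corollary 17
        return (s * j * bell(j) * n_pos * n_usage) ** (j * s + 1)

    def progol_space(s, j, i, r, n_usage):    # theorems 18 and 19
        bottom = (r * n_usage * j) ** (2 * i * j)
        return bottom ** s * j * s ** j

    # Hypothetical settings: clause size 2, arity bound 2, depth bound 2,
    # recall 10, 100 positive examples, 5 usage declarations
    s, j, i, r, n_pos, n_usage = 2, 2, 2, 10, 100, 5
    print("CILS  : %.2e" % cils_space(s, j, n_pos, n_usage))
    print("Progol: %.2e" % progol_space(s, j, i, r, n_usage))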

Discussion The expression bounding the space complexity of CILS reveals two unfortunate aspects of the algorithm. Firstly, the size of the refinement space grows polynomially in the cardinality of the set of positive examples. This results from the constant selection algorithm used in CILS, which examines each positive example to find constants to include in a refinement. In contrast, Progol uses the recall r of each usage declaration in the construction of ⊥ to generate r potential tuples of constants. This difference is negligible where the recall is equal to the cardinality of the set of positive examples. Secondly, the refinement space searched by CILS contains some redundancy, since a separate tree is searched for each member of IC. Progol avoids this difficulty by using ⊥ as a lower bound on the refinement space.

Recently, Srinivasan [6] has developed an induce-max operator for Progol which constructs and searches a refinement space for each positive example in the training sample. A greedy subset-cover algorithm constructs a hypothesis from the clauses found. Although similar to our approach, two important differences remain:

– the problem of constants in ⊥, as noted in section 4.1, and
– a repeated search of refinement spaces is still performed.

The time required by induce-max is the same as that for the worst case of Progol, although notice that one refinement space is always constructed for each positive example.

In conclusion, notice that the space complexity of CILS is of a similar order to that of Progol, and both are exponential in j and s, and polynomial in |U|, |T⁺| and r. However, although CILS' refinement space is likely to be larger in general, CILS only constructs one such space. Notice also that the time required by Progol is dependent on the number of refinement spaces constructed and searched. In other words, Progol runs in time proportional to the number of clauses in the hypothesis, whilst CILS runs in time independent of this. Hence, CILS is likely to run faster than Progol where the number of clauses in the hypothesis exceeds the proportion by which CILS' refinement space is larger than Progol's.

4.3 Empirical Comparison

The Dataset CILS and Progol were compared using a reduced mutagenesis dataset, made available in the Progol4.1 distribution [1]. The goal of learning is to identify the properties of a compound that make it active as a mutagen [7]. The target predicate is active/1, with background knowledge consisting of around 13,000 clauses, the majority of which are atomic. A noisy training sample of 101 positive and 60 negative examples is presented to each learner by the teacher. Until now we have assumed training samples to be free from noise, and so we pause to explain how each algorithm is extended to handle noisy examples.

Handling Noise Progol, unlike CILS, was originally designed to handle noisy data. In Progol, the consistency requirement for clauses is relaxed, with a clause being required to cover no more than a teacher-specified number of negative examples. This setting is known as the noise parameter. A similar noise parameter was introduced to CILS, allowing a clause to be added to the set of candidate clauses if it covers no more than the specified number of negative examples.

Experimental Procedure and Results A 10-fold cross-validation test was carried out on the data using a variety of settings for the noise and size parameters. The algorithms were compared by considering:

– the average time taken to construct the 10 hypotheses at a particular noise setting. Note that these runtimes are for an SGI Indy workstation. Additionally, note that Progol is written in C whilst CILS is written in SICStus Prolog (version 3.0).
– the average quality of the 10 hypotheses at a particular noise setting. Quality is measured in terms of the average number of clauses in the 10 hypotheses, together with their average accuracy in predicting the seen negative, and the unseen positive and negative examples.

The results of the experiments with the size parameter set to 2 are tabulated in table 1. Each line represents the average performance of the 10 hypotheses produced at each given level of noise.

Discussion Most significant are the markedly different effects of the noise parameter on the hypotheses produced. Higher levels of noise allow CILS to construct smaller, more general hypotheses covering more positive and negative examples than those of Progol. This behaviour stems from the greedy approach to hypothesis construction used by Progol, in which a number of refinement spaces are searched, each with respect to progressively fewer positive examples. Since clauses found in these latter spaces have fewer positive examples to cover, they must also cover fewer negative examples if they are to satisfy Progol's goal condition. In contrast, the cautious search of a single refinement space in CILS allows all clauses to make equal use of the relaxed consistency condition.

Two important differences are apparent in the time taken to construct hypotheses by the two systems. Firstly, CILS' runtime was independent of the setting of the noise parameter, whilst Progol's runtime increased as the level of noise decreased. This behaviour was anticipated from the complexity analysis of the previous subsection, since the size of the hypothesis is inversely related to the noise setting. Secondly, despite its implementation in Prolog, CILS was significantly faster than Progol at all levels of noise. This results from the refinement spaces searched by CILS and Progol being of a similar size whilst the hypothesis contains a large number of clauses.

Progol
                          Seen     Unseen
    Noise   Secs   Size   NC(%)    PC(%)   NC(%)   TC(%)
     15    169.0   34.4    90.0     61.4    81.7    68.9
     10    194.0   37.2    93.0     55.4    80.0    64.6
      5    368.8   48.3    98.5     49.5    88.3    64.0
      2    406.9   51.5   100.0     49.5    91.7    65.2
      1    406.9   51.5   100.0     49.5    91.7    65.2
      0    406.9   51.5   100.0     49.5    91.7    65.2

CILS
                          Seen     Unseen
    Noise   Secs   Size   NC(%)    PC(%)   NC(%)   TC(%)
     15     63.5   19.1    59.8     75.2    51.7    66.5
     10     58.1   36.6    89.3     58.4    81.7    67.1
      5     57.1   37.0    90.1     55.4    85.0    66.5
      2     57.1   47.0    95.4     49.5    85.0    62.7
      1     57.4   49.1    97.2     49.5    88.3    64.0
      0     57.5   51.5   100.0     47.5    93.3    64.6

Table 1. Empirical comparison of CILS against Progol. (Notes: Seen and Unseen refer to the examples that were used to construct and test the 10 hypotheses respectively, and Size refers to the average number of clauses in these hypotheses. PC, NC and TC indicate the average percentage of positive, negative and total examples correctly labelled by the 10 hypotheses.)

5 Conclusions

The encouraging comparison of CILS against Progol has demonstrated the potential of cautious induction. In particular, this analysis has highlighted its applicability to problems involving:

– noisy datasets, since the search by refinement is not centred on a single, possibly noisy, positive example, and
– large, complex hypotheses, since CILS runs in time independent of the number of clauses in a hypothesis.

We propose to continue this work in several directions. Firstly, we are currently investigating ways in which the redundancy in the refinement operator can be removed. This may be possible through the use of different forms of syntactic bias. Also, we are interested in altering the constant selection algorithm to allow the algorithm's time and space complexity to become independent of the cardinality of the training sample. This is likely to involve statistical sampling of positive examples from which to find the possible constants. Finally, further analysis and experimentation on a wider range of datasets should provide a clearer insight into the benefits of cautious induction over conventional greedy approaches.

References

1. S. Muggleton. Progol 4.1 distribution, release notes and example sets.
2. S. Muggleton. Inductive logic programming. New Generation Computing, 8:295–318, 1991.
3. S. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245–286, 1995.
4. S. Muggleton and L. De Raedt. Inductive logic programming: theory and methods. Journal of Logic Programming, 19-20:629–679, 1994.
5. J. R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239–266, 1990.
6. A. Srinivasan. P-Progol 2.1 distribution, release notes and example sets.
7. A. Srinivasan, S. Muggleton, M. J. E. Sternberg, and R. D. King. Theories for mutagenicity: A study in first-order and feature-based induction. Technical Report PRG-TR-8-95, Oxford University Computing Laboratory, 1995.

Acknowledgements

We thank David Page and Alistair Willis for their comments and suggestions concerning this work. The first author is funded by an EPSRC studentship, and by a travel award from Data Connection Ltd.

A Proofs of theorem 16 and corollary 17

Proof (Theorem 16). Each of the sz(c) literals in c contains at most j distinct variables, and so there are at most sz(c)·j variables in c. Each new literal (constructed from a particular usage declaration in U) to be added to c will require at most j existing variables. There are C(sz(c)·j, j) ways of choosing j variables from those already in c, which is upper bounded by (sz(c)·j)^j. The new literal will also require at most j new variables and at most j constant symbols. Since there are |T⁺| different tuples of constant symbols, there are at most B(j)·|T⁺| combinations of constants and new variable symbols, from theorem 14. Therefore there are (sz(c)·j)^j · B(j)·|T⁺| literals that satisfy each of the |U| usage declarations, giving an upper bound on the cardinality of CILS' refinement operator of:

    (sz(c)·j)^j · B(j) · |T⁺| · |U|                                   □

Proof (Corollary 17). From the expression for the total number of nodes in a DDBF-tree and the result of theorem 16, the space complexity of CILS is upper bounded by:

    (B(j)·|T⁺|) · s · (((s − 1)·j)^j · B(j)·|T⁺|·|U|)^s

which, when simplified, gives an upper bound on CILS' space complexity of:

    (s·j·B(j)·|T⁺|·|U|)^{js+1}                                        □
