Generating Numerical Literals During Refinement

Simon Anthony and Alan M. Frisch
Department of Computer Science, University of York, York YO1 5DD, UK.
Email: {simona, [email protected]}

Abstract. Despite the rapid emergence and success of Inductive Logic Programming, problems still surround number handling, problems directly inherited from the choice of logic programs as the representation language. Our conjecture is that a generalisation of the representation language to Constraint Logic Programs provides an effective solution to this problem. We support this claim with the presentation of an algorithm called NUM, to which a top-down refinement operator can delegate the task of finding numerical literals. NUM can handle equations, in-equations and dis-equations in a uniform way, and, furthermore, provides more generality than competing approaches since numerical literals are not required to cover all the positive examples available.
1 Introduction

Many of the concepts that we might wish to learn involve numbers. However, within the Inductive Logic Programming (ILP) paradigm [6], such numerical reasoning is poorly handled, a problem inherited from the choice of logic programs as the representation language.

Several other researchers have addressed this problem. Within the representation language of conventional logic programs, Muggleton and Page [5] examined the use of higher-order logic within Progol, allowing equations to appear as arguments to higher-order predicates. More recently Srinivasan [8] has built into Progol a form of lazy evaluation, allowing constants in literals to be left unevaluated until refinement. In a positive-only learning setting, Karalic's FORS system [3] constructs hypotheses by fitting low-error regression lines through disjoint subsets of the training sample. Sebag and Rouveirol [7] have considered extending the representation language to Constraint Logic Programs (CLPs), and examine the generation of constraints to discriminate between features of a single positive and several negative examples. A more detailed examination of several of these existing approaches is provided in Section 4.

This paper addresses the problem within a top-down (or refinement-based) setting, and advocates the generalisation of the representation language to that of CLP. An algorithm called NUM is presented which generates literals built over numerical predicates for addition to clauses. NUM therefore interacts with a refinement operator in the following way. The refinement operator continues to generate literals built over non-numerical predicates as before. However, if a literal is to be built over a numerical predicate, this task is delegated to NUM. NUM returns a set of literals built over this numerical predicate which the refinement operator can use to produce refinements. By design, this interaction should be possible with most existing refinement operators in top-down ILP systems.

The rest of this paper is organised as follows: Section 2 provides an introduction to CLP and defines the learning setting used throughout the paper. NUM is presented, by example, in Section 3, with Section 4 comparing our approach to existing work on number handling in ILP. We conclude in Section 5, and discuss further extensions and applications of this approach.
2 Background

2.1 The CLP Scheme
The CLP scheme [1] extends conventional logic programming by introducing a set Σ (a signature) of interpreted predicate and function symbols, whose interpretation is fixed by an associated Σ-structure D. D consists of a set D, known as the computation domain, and an assignment of relations and functions on D to the symbols in Σ which respects the arities of these symbols. Somewhat improperly, we shall consistently refer to uninterpreted predicate symbols as predicate symbols, and interpreted predicate symbols as constraint symbols.

It will be useful to introduce the following notation and terminology. The existential closure of a logical formula φ shall be denoted ∃φ and the universal closure of φ shall be denoted ∀φ. A D-model of a formula is a model of the formula with the same domain as D and the same interpretation for the symbols in Σ as D. Thus, for formulas φ and ψ, we write ψ ⊨_D φ to denote that φ is true in all D-models of ψ. The set of definable, uninterpreted predicate symbols shall be denoted Π. An ordinary atom has the form p(t1, ..., tj), where the t1, ..., tj are terms and p ∈ Π. Likewise, a constrained atom has the form p(t1, ..., tj), where the t1, ..., tj are terms and p ∈ Σ. An atom is either an ordinary atom or a constrained atom, and a literal is an atom or a negated atom. A CLP-program is a set of constrained clauses of the form h ← b1, ..., bn, where h is an ordinary atom and the b1, ..., bn are atoms.

The pair ⟨Σ, D⟩ is known as a constraint object, and a constraint symbol is said to belong to any constraint object in which it appears in the signature Σ. A constraint object X represents a particular instance of the CLP scheme, denoted CLP(X), and defines a CLP language. In order to implement a CLP language a constraint solver is required that is capable of solving conjunctions of constraints built over the constraint symbols in Σ. For instance, Jaffar et al. have developed and implemented the constraint object ℜ in the language CLP(ℜ) [2].

Negative       Constraint            Satisfiable    C1            C2
t(1, ?)        ? > C1 × 1 + C2       no             ?             3.75 ≤ C2
t(1, 6)        6 > C1 × 1 + C2       yes            0.25 ≤ C1     3.75 ≤ C2
t(2, 4)        4 > C1 × 2 + C2       no             0.25 ≤ C1     3.75 ≤ C2

Notice that so far the c-variables are both lower bounded, and that the resulting numerical literal will cover three negative examples. The algorithm then considers each of the remaining positive examples in turn, and substitutes each into the literal schema to produce a constraint. These constraints are again added to the constraint store only if the resulting store is satisfiable. If a constraint cannot be added, the corresponding positive example will be left uncovered by the resulting numerical literal. The effects of these constraints on C1 and C2 are shown in the following table.

Positive       Constraint              Satisfiable    C1                  C2
t(1, 4.5)      4.5 ≤ C1 × 1 + C2       yes            0.25 ≤ C1 < 1.5     3.75 ≤ C2 < 5
t(0.5, 2.5)    2.5 ≤ C1 × 0.5 + C2     yes            0.25 ≤ C1 < 1.5     3.75 ≤ C2 < 5
t(2, 6)        6 ≤ C1 × 2 + C2         yes            0.6̇ ≤ C1 < 1.5      4 ≤ C2 < 5
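To make the constraint-store manipulation concrete, the following small sketch (ours, not the paper's; it assumes SWI-Prolog's library(clpr) as a stand-in for the CLP(ℜ) solver) posts the constraints obtained from the generating subset {t(−1, 3), t(0, 3.5)} and from the negative example t(1, 6) under the inverse schema, and then reads the induced bound on C1 from the store:

    :- use_module(library(clpr)).

    % Constraint store for the literal schema  Y =< C1*X + C2.
    % Positives are substituted into the schema itself, negatives into its
    % inverse  Y > C1*X + C2.
    linear_bounds(C1, C2, SupC1) :-
        { 3   =< C1 * (-1) + C2 },   % positive t(-1, 3)
        { 3.5 =< C1 * 0    + C2 },   % positive t(0, 3.5)
        { 6   >  C1 * 1    + C2 },   % negative t(1, 6), via the inverse schema
        sup(C1, SupC1).              % projection of the store onto C1

    % ?- linear_bounds(C1, C2, SupC1).
    % should report SupC1 = 1.5, with C1 and C2 still non-ground but
    % constrained.  The satisfiability test performed by constrain can be
    % made without committing to the new constraint, e.g.
    %   ?- linear_bounds(C1, C2, _), \+ \+ { 4.5 =< C1 * 1 + C2 }.

The remaining bounds reported in the tables above depend on all of the negative examples, so this fragment reproduces only part of the store; it is intended only to show how each substituted example becomes one constraint over the c-variables.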
Fig. 3. Hypothesis enclosing the region containing the positives in training sample C.

Therefore, from the choice of the pair {t(−1, 3), t(0, 3.5)}, the algorithm produces the literal schema:

    Y ≤ C1 × X + C2   where 0.6̇ ≤ C1 < 1.5 and 4 ≤ C2 < 5

Other possible literal schemas are found by examining the other possible generating subsets. One solution of each of these schemas, together with those found for other instances of the linear in-equation usage declaration, makes up the set of literals returned to the refinement operator. NUM can then be called again to carry out the same process for each instance of the usage declaration for the quadratic in-equation. In particular, this results in the discovery of the following literal schema:

    Y ≥ pow(X, 2) + C3   where 1 ≤ C3 < 2

Solutions of this schema cover all the positive examples and the two negatives left uncovered by the linear in-equation schema found above. Since none of the literal schemas introduce new variables, a learning system can be expected to find a complete and consistent hypothesis similar to the one shown below and in Figure 3.

    t(X, Y) ← Y ≥ pow(X, 2) + 2, Y ≤ 1 × X + 4
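As a constrained clause in a concrete CLP system, the hypothesis above can be written as follows (a sketch assuming SWI-Prolog's library(clpr); pow(X, 2) is spelled out as X*X):

    :- use_module(library(clpr)).

    t(X, Y) :-
        { Y >= X*X + 2 },     % quadratic in-equation literal
        { Y =< 1*X + 4 }.     % linear in-equation literal

    % ?- t(1, 4.5).   succeeds: a positive example of the training sample
    % ?- t(2, 4).     fails: this negative is excluded by the quadratic literal
    % ?- t(1, 6).     fails: this negative is excluded by the linear literal

Because the body literals are constraints rather than ordinary atoms, the clause also answers partially instantiated queries; for example, ?- t(1, Y) leaves Y constrained to 3 ≤ Y ≤ 5.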
3.3 Inexact Equations and Approximation Constraints

As described thus far, NUM handles perfect data and finds exact relationships between the positive examples. In order to allow inexact relationships to be found, we introduce the approximation constraint.
Fig. 4. A training sample for an approximate quadratic equation.
Definition 5 (Approximation Constraint) The approximation constraint, denoted ≈/2, is defined as an equation which holds, for some X, Y and error term ε, if Y = X ± ε. Thus numerical literals built over ≈/2 can be solved, for some teacher-specified value ε, as:

    Y ≈ X  ⟺  (Y − ε ≤ X) ∧ (Y + ε ≥ X)

Consider, for example, finding an inexact quadratic relationship between a set of positive examples. A training sample of 30 positive and 30 negative examples was generated from the following approximate constraint with error ε = 1.

    Y ≈ 2 × pow(X, 2) + 3

This training sample, together with the corresponding inexact quadratic equation, is shown in Figure 4. From a suitable usage declaration, the following literal schema can be found:

    Y ≈ C1 × pow(X, 2) + C2   where 1.927 ≤ C1 < 2.039 and 2.756 ≤ C2 < 3.149
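A minimal rendering of Definition 5 as a CLP(R) predicate (a sketch; the predicate name approx/3 and the use of SWI-Prolog's library(clpr) are ours) makes the two-inequality solved form explicit:

    :- use_module(library(clpr)).

    % approx(Y, X, Eps): Y approximates X to within the teacher-specified
    % error Eps, i.e.  X - Eps =< Y =< X + Eps.
    approx(Y, X, Eps) :-
        { Y - Eps =< X },
        { Y + Eps >= X }.

    % For one solution of the schema above, say C1 = 2 and C2 = 3, and the
    % teacher-specified error Eps = 1:
    % ?- approx(Y, 2 * 1.5 * 1.5 + 3, 1).
    % constrains Y to the interval 6.5 =< Y =< 8.5.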
3.4 The NUM Algorithm

We conclude this section with a more formal presentation of the algorithm and an analysis of its complexity.
Algorithm 6 (The NUM algorithm) Let u denote a usage declaration for a numerical literal having n hash-mode symbol declarations, and let T+ denote the set of positive and T− the set of negative examples from a training sample T. Furthermore, let Lits denote a set of literal schemas, and let constrain, inverse and solve denote the following three functions:

- constrain(l, e) substitutes the variables in literal l with the corresponding values in example e, and adds the resulting constraint on the c-variables in l to the constraint store only if the resulting constraint store is satisfiable.
- inverse(l) returns a copy of the numerical literal l, with its constraint symbol replaced with its inverse. This copy involves no renaming of variables.
- solve(l) returns a solution of the numerical literal l with respect to the constraint store.

The algorithm is defined as follows:
    input a usage declaration u having n hash-mode symbol declarations
    let Lits = ∅
    let GenSet = {{e1, ..., en} | e1, ..., en ∈ T+}
    for each instance lit of u
        for each S ∈ GenSet
            let lit′ be a copy of lit with all variables renamed
            for each e+ ∈ S
                constrain(lit′, e+)
            end for
            let inv = inverse(lit′)
            for each e− ∈ T−
                constrain(inv, e−)
            end for
            for each e+ ∈ (T+ − S)
                constrain(lit′, e+)
            end for
            let Lits = Lits ∪ {solve(lit′)}
        end for
    end for
    return Lits
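The constraint-store operations used by the algorithm can be sketched directly in a CLP(R) host. Everything below (the schema_constraint/4 representation, the predicate names, and the use of SWI-Prolog's library(clpr)) is our illustration rather than the paper's implementation; inversion is applied to the constraint term, which amounts to the same thing as the paper's inverse since no variables are renamed:

    :- use_module(library(clpr)).

    % Literal schemas: given the example values and the c-variables,
    % produce the constraint to be posted.
    schema_constraint(linear, [Xv, Yv], [C1, C2], Yv =< C1*Xv + C2).

    inverse(A =< B, A > B).
    inverse(A >= B, A < B).

    % constrain(+Schema, +CVars, +Example): substitute the example's values
    % for the example variables (the c-variables stay shared with the store)
    % and add the resulting constraint only if the store remains satisfiable.
    constrain(Schema, CVars, Example) :-
        Example =.. [_ | Vals],
        schema_constraint(Schema, Vals, CVars, C0),
        (   \+ \+ { C0 }        % satisfiable together with the current store?
        ->  { C0 }              % yes: commit the constraint
        ;   true                % no: leave the store unchanged
        ).

    % The same for negative examples, via the inverse constraint symbol.
    constrain_inverse(Schema, CVars, Example) :-
        Example =.. [_ | Vals],
        schema_constraint(Schema, Vals, CVars, C0),
        inverse(C0, Inv),
        (   \+ \+ { Inv } -> { Inv } ; true ).

    % ?- Cs = [C1, C2],
    %    constrain(linear, Cs, t(-1, 3)),
    %    constrain(linear, Cs, t(0, 3.5)),
    %    constrain_inverse(linear, Cs, t(1, 6)).
    % posts the same constraints as the first steps of the worked example above.

Renaming all variables for each generating subset, as the algorithm requires, then simply amounts to calling constrain/3 with a fresh list of c-variables for each S ∈ GenSet; solve can be realised over the resulting store, for instance by reading interval projections with inf/2 and sup/2 (as in the literal schemas of Section 3) or by committing to one point inside them.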
Notice that in the case of equations, unique values are found for the c-variables from the generating subset. Such equations can therefore be found by this algorithm within a positive-only learning framework. Notice also that this algorithm can be used to find the set of examples covered by the clause being refined following the addition of a numerical literal to its body.

Theorem 7 (Complexity of NUM). Let u denote the usage declaration for a numerical literal, |u| the number of instances of u, and n the number of hash-mode symbol declarations in u. Furthermore, let T+ denote the set of positive examples in the training sample T. The complexity of the algorithm is upper bounded by:

    |u| · |T+|^n · |T|

Proof. Recall that examples are atomic, as defined in Section 2.2. Therefore, there are (|T+| choose n) ways of choosing subsets of cardinality n from the set T+, which is upper bounded by |T+|^n. Multiplying this by the number of instances of u and the cardinality of the training sample gives the above result and completes the proof. □

Notice that the complexity of NUM increases polynomially with the size of the training sample. From this observation we are currently investigating whether we can predict the number of generating subsets needed to be confident of finding all the required numerical literals from a single usage declaration. Notice also that one generating subset is required for each occurrence of a numerical literal in a hypothesis. Therefore, if the hypothesis contains m numerical literals, each having at most n hash-mode positions in their corresponding usage declarations, then the training sample must contain at least m × n positive examples in order to find this hypothesis.
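As an illustration (with figures chosen purely for the sake of the example), consider the linear in-equation schema of Section 3, which has n = 2 hash-mode symbol declarations, applied to a training sample like that of Section 3.3 with |T+| = 30 and |T| = 60. If the usage declaration has |u| = 2 instances, the bound gives at most 2 × 30² × 60 = 108,000 constraint-store operations; doubling the number of positive examples alone multiplies the |T+|^n factor by four, in line with the polynomial growth noted above.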
4 Comparison to Other Work

Some of the main approaches to number handling in ILP are compared to our approach, with respective advantages and differences highlighted.
4.1 Higher-Order Predicates in Progol

Progol [4] is an ILP system which finds hypothesis clauses by searching a refinement space which is bounded from below by a most specific clause ⊥, constructed from a single positive example.⁵ Since all clauses in the refinement space must subsume ⊥, the body of each clause must consist of some subset of the literals in the body of ⊥. Accordingly, any constant symbols appearing in a literal in a hypothesis clause must be present in the corresponding literal in ⊥, and hence derived from a single positive example.

Muggleton and Page [5] consider the use of higher-order logic to allow numerical reasoning in Progol. A higher-order predicate, for instance num/1, is defined in the background knowledge with one clause for each possible form of equation. In their work, the authors restrict themselves to the following four forms of equation:

    Y = C1 × X + C2
    Y = C1 × pow(X, C2)
    Y = pow(C1, X)
    C1 = pow(X, 2) + pow(Y, 2)

The clauses defining num/1 must guess the constant values for these equations from the single positive example used to construct ⊥. For instance, consider the following clause, similar to that provided by Muggleton and Page, for linear equations:

    num(Y = M * X + C) :-
        not(Y == X),
        M1 is Y / X, round(M1, M),
        C1 is Y - M * X, round(C1, C).
⁵ In fact, Progol views the training sample as a sequence, and ⊥ is constructed from the first example in this sequence.
Clearly, in the general case, this clause is unlikely to find the equation of a line relating a number of positive examples. However, this rather obvious shortcoming is not immediately apparent in the examples presented by Muggleton and Page since:

1. the restricted forms of equations considered by the authors only contain a small number of constants, and
2. the training samples are ordered to ensure that the first positive example would yield suitable values, when rounded, for the constants in the desired equation.

For instance, given this definition of num/1, Progol will be unable to find the simple linear equation (t(X, Y) ← Y = 3 × X + 4) from the following positive examples, regardless of the order in which they are presented.

    t(−2, −2)   t(−1, 1)   t(0, 4)   t(1, 7)   t(2, 10)

Furthermore, this approach cannot handle in-equations and dis-equations since there is no clear way of determining the values of constants, other than by random selection. It is also incapable of handling more general forms of equations, particularly those involving more constants.
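To see the failure on this linear example concretely, the num/1 clause of this subsection can be run against each of the five examples in isolation (the round/2 helper is assumed here, since the paper does not show it); whichever single example Progol selects to build ⊥, the guessed constants are wrong:

    % Assumed helper: round/2 as a predicate wrapper around arithmetic round/1.
    round(X, R) :- R is round(X).

    num(Y = M * X + C) :-
        not(Y == X),
        M1 is Y / X, round(M1, M),
        C1 is Y - M * X, round(C1, C).

    % Guessing the constants from one example of t(X, Y) with Y = 3*X + 4:
    % ?- num(1 = M * (-1) + C).     M = -1, C = 0   (guesses Y = -X)
    % ?- num(7 = M * 1 + C).        M = 7,  C = 0   (guesses Y = 7X)
    % ?- num(10 = M * 2 + C).       M = 5,  C = 0   (guesses Y = 5X)
    % ?- num(4 = M * 0 + C).        raises an evaluation error (zero divisor)
    % ?- num(-2 = M * (-2) + C).    fails, since Y == X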
4.2 Lazy Evaluation in Progol
In order to overcome Progol's commitment to constants in ⊥, Srinivasan and Camacho [8] have recently developed what they term lazy evaluation of literals. Background knowledge predicate symbols to be lazily evaluated are added as literals to the body of ⊥ with Skolem constants occupying any positions in which a constant symbol is to appear. These literals shall be called lazy literals. During refinement, a lazy literal from ⊥ is added to a clause only if:

1. constants can be found to replace the Skolem constants (for instance, linear and nonlinear regression techniques can be used to find constants for equations), and
2. the resulting clause is both consistent and complete.

The completeness requirement in the second of these two points highlights a problem with this approach. It is assumed that sufficient non-lazy predicate symbols are available to discriminate positive examples not related by the same lazy literal. Thus, only one lazy literal may appear in any clause. For instance, consider the problem of finding a hypothesis to explain the training sample shown in Figure 5, where the body of ⊥ contains only the single lazy literal:

    Y = sk1 × X + sk2

The completeness requirement necessitates fitting a line through all the positive examples, producing a very poor fit. For this relationship to be captured more accurately by Srinivasan and Camacho, the background knowledge must contain sufficient non-lazy predicate symbols to identify a feature common to one set of related positive examples and absent from the other.
Fig. 5. Fitting a regression line through a set of positive examples.
4.3 Disjunctive Version Spaces

Sebag and Rouveirol [7] have also examined the possibility of using CLP to improve number handling in ILP. They attempt to reformulate the search of the refinement space as a Constraint Satisfaction Problem (CSP), and thus delegate the complexity of this search to existing constraint solvers. In particular, their work focuses on the problem of generalising a single positive example against a number of negative examples. These examples are represented as clauses, with no background knowledge. A hypothesis (a single clause) must subsume the positive example e+ (since it is a clause) and none of the negatives. This discrimination is achieved by adding to the body of the hypothesis at most one of the following, for each negative example e−:

1. a literal present in the body of e+ but not e−, or
2. a numerical literal containing variables already present in the hypothesis clause.

The latter of these is of interest to us here. Sebag and Rouveirol restrict a numerical literal to be a simple equation or in-equation of the form X < c or X − Y < c (for some constant c). These forms are built in, and in contrast to our approach and those involving Progol above, no mechanism exists for the teacher to specify particular forms of numerical literals. As a result, neither nonlinear relationships, nor relationships in more than two dimensions can be found.
5 Conclusions

A continuing problem within ILP has been the poor handling of numbers offered by ILP systems. Using the improved number handling capabilities of the CLP scheme, we have addressed this problem by providing an algorithm capable of uniformly handling a variety of forms of numerical reasoning. In particular, this algorithm is independent of a particular learning system and has been shown to be more general than several competing approaches to number handling in ILP.

We propose to continue this research in two further directions. The first is to incorporate this algorithm into one, or possibly a variety, of existing ILP systems and compare its performance against other proposed approaches to number handling on several large datasets. We are particularly interested in using our approach to analyse economic and financial data. Secondly, we hope to consider the benefits of using other constraint objects in learning. These would allow a similar treatment of other, possibly non-numerical, computation domains. Of particular interest are the boolean and finite domain constraint objects.
References

1. J. Jaffar and M. J. Maher. Constraint logic programming: A survey. Journal of Logic Programming, 19-20:503–583, 1994.
2. J. Jaffar, S. Michaylov, P. Stuckey, and R. H. C. Yap. The CLP(ℜ) language and system. ACM Transactions on Programming Languages and Systems, 14(3):339–395, 1992.