Laboratoire d'Informatique Fondamentale de Lille
Publication IT-302
Set-Based Analysis for Logic Programming and Tree Automata
P. Devienne, JM. Talbot and S. Tison
{devienne,talbot,tison}@lifl.fr
May 1997
© L.I.F.L. - U.S.T.L.
LABORATOIRE D'INFORMATIQUE FONDAMENTALE DE LILLE, U.R.A. 369 C.N.R.S.
UNIVERSITE DES SCIENCES ET TECHNOLOGIES DE LILLE, U.F.R. d'I.E.E.A., Bât. M3, 59655 VILLENEUVE D'ASCQ CEDEX
Tel. (+33) 3 20 43 47 24 - Fax (+33) 3 20 43 65 66 - E-mail direction@lifl.fr
Summary
Static program analysis aims to extract from programs properties that allow more efficient and safer implementations. One property that can prove interesting is the "computational" semantics, that is, what the program computes. For decidability reasons, only an approximation of this semantics can be considered. Set-based analysis [Hei92a] is a method, both elegant and precise, for computing such an approximation. In the logic programming setting, it can be regarded as type inference (in the sense of [MR85]). In [FSVY91], a simpler presentation of set-based analysis is proposed: it is based on transformations of logic programs and on algorithms operating on alternating tree automata. However, the authors mainly deal with type checking only (i.e. testing membership in the approximate semantics of the program). In this paper we present a new method for set-based analysis that reuses the main program transformation proposed in [FSVY91]. Tree automata are our tool both for computing and for representing the result of set-based analysis. This gives a global and coherent approach to set-based analysis while keeping the simplicity of the approach presented in [FSVY91], and it yields a characterization of the complexity of both the problem and our method. Moreover, the use of tree automata should allow an efficient implementation, contrary to the conclusions of [FSVY91].
Keywords: Logic Programming, Set-Based Analysis, Tree Automata, Type Inference.
Abstract
Compile-time program analysis aims to extract from a program properties useful for efficient implementations and software verification. A property of interest is the computational semantics of a program. For decidability reasons, only an approximation of this semantics can be computed. Set-based analysis [Hei92a] provides an elegant and accurate method for this. In the logic programming framework, this computation can be related to type inference [MR85]. In [FSVY91], a simpler presentation based on program transformation and algorithms on alternating tree automata is proposed. Unfortunately, the authors focused on type checking (i.e. a membership test to the approximate semantics). We propose in this paper a new method to achieve set-based analysis, reusing the main transformation described in [FSVY91]. The main tool for both computation and representation of the result of set-based analysis is tree automata. This leads to a global and coherent presentation of the problem of set-based analysis combined with the simplicity of [FSVY91]. We also obtain a complexity characterization for the problem and for our method. This tree automata approach should lead to an efficient implementation, contrary to the first conclusions of [FSVY91].
Keywords: Logic programming, Set-based analysis, Tree automata, Type inference.
1 Introduction
Static analysis aims to extract run-time properties from a program. Such properties can be very useful in the different steps of program development, such as debugging (typing, array bounds checking, ...) or optimization (dead code and redundancy elimination, specialization, ...). Abstract interpretation [CC77] [CC92b] provides a general framework for this kind of analysis. This framework just has to be "instantiated" according to the properties of interest [CC92a] [Deu91] [CH92]. The basic idea is to replace the domain of computation with an abstract domain and to perform a (terminating) abstract computation over the latter. One of the most important properties of a program is its computational semantics, which is known not to be computable. Thus, for decidability and efficiency reasons, only an approximation of the semantics can be computed. Different kinds of approximation have been considered [XW88] [GdW92], giving methods of varying accuracy. An elegant method for semantics approximation is set-based analysis [Hei92b]. It has been introduced for different programming paradigms such as imperative [Hei92b], functional [Hei94] and logic [HJ90] programming. The main feature of this analysis is that the only approximation consists in ignoring inter-variable dependencies; that is, reasoning about the value of a variable (at a point of the computation) with respect to the value of another one is not allowed. This leads to associating with each variable of the program the set of possible values taken by this variable during the computation. Although set-based analysis can be expressed as an abstract interpretation [CC95], its formulation outside this framework remains more natural. In the logic programming paradigm, set-based analysis involves computing an upper approximation of the success set of a program. This approach can be related to the notion of typing defined in [Mis84] and [MR85].
Unlike other works concerning type systems based on an extension of the logical language [MO84], [Zob87], the type of a logic program is simply defined as a recursive (regular) superset of its logical consequences. The intuitive interpretation is that this superset contains the atoms that may succeed for the considered program. For logic programming, set-based analysis of a logic program is defined as the least set-based model of this program. A set-based model is a set of ground atoms that is closed under application of the ground instances of the clauses of the program and that consists of instances of the heads of the clauses under a set-substitution (which maps a variable to a set of ground terms). To compute a finite presentation of the least set-based model of a program, a set of quantified set expressions is extracted from the program. The least solution of those expressions is exactly the least set-based model of the input program. Quantified set expressions are resolved into a regular tree grammar whose generated language is the least solution of those expressions. Unfortunately, solving such quantified set expressions requires a rather complicated algorithm, and its complexity is not addressed. This led the authors to give a simpler but less accurate method using set constraints [Hei92a], [HJ92]. A different approach is taken in [FSVY91]. The idea developed in that paper is that the least set-based model of a logic program can be seen as the exact Herbrand model of an approximate logic program. This approximate program, called a type program, has a specific syntactic form which limits its computational power. An algorithm for computing the least model of such programs based on tree automata transformations is described, but the result of those transformations is not equivalent to a "tree grammar" representation.
We propose in this paper a simple method to compute the least set-based model of a logic program. As in [FSVY91], it is based on tree automata and on the "equivalence" between the least set-based model of a logic program and the least Herbrand model of the associated type program. Because of its easy handling and its expressive power, we extend the use of the tree automata formalism to the computation and representation of the least model of a type program. The following section is devoted to a detailed presentation of previous works about set-based analysis for logic programs. In Section 3, quasi-automaton programs are introduced as a particular form of type programs. In the same section, the main tool of our method, that is tree automata, is defined. Our method is described as a system of inference rules over tree automata in Section 4: starting from an "empty" automaton, the application of the system leads to a fix-point automaton. The language recognized by this automaton corresponds to the least Herbrand model of the considered quasi-automaton program and so to the least set-based model of the input program. Subsections 4.2 to 4.4 deal with confluence and termination, correctness, and complexity of the method. Finally, a short example is presented in Section 5.
2 Preliminaries
2.1 Notations
$\mathit{TERM}(\Sigma, \mathit{Var})$ is the set of terms built over the function symbols of the signature $\Sigma$ and the set of variables $\mathit{Var}$. $\mathit{TERM}(\Sigma)$ is the set of ground terms. $t$ will denote the tuple $(t_1, \ldots, t_n)$. Notations and main results used in this paper concerning logic programming are mostly taken from [Llo87]. A few other notations are fixed as follows. Let $P$ be a logic program and $c$ a clause of $P$. $\mathit{Pred}(P)$ is the set of predicate symbols occurring in $P$. $\mathit{head}(c)$ (resp. $\mathit{body}(c)$) will denote the head (resp. the set of atoms of the body) of $c$, and $\mathit{Body\_only}(c)$ will denote the set of variables that occur only in the body-part of $c$. $\mathit{Subst}(V,E)$ is the set of substitutions having a finite set of variables $V$ for domain and ranging over a set $E$. Finally, $\mathit{Var}(e)$ is the set of variables which occur in $e$ (where $e$ is either a term, an atom, a set of terms or a set of atoms).
2.2 Set-based Analysis of Logic Programs
The main feature of set-based analysis is that the only approximation made during the computation consists in ignoring inter-variable dependencies. This leads to associating with each variable of the program the set of possible values taken by this variable during the computation. In the logic programming framework, this notion is captured by set-substitutions. A set-substitution $\sigma$ maps variables onto sets of ground terms and can be extended to terms in a canonical way:
$$\sigma(a) = \{a\}$$
$$\sigma(f(t_1, \ldots, t_n)) = \{ f(s_1, \ldots, s_n) \mid s_1 \in \sigma(t_1), \ldots, s_n \in \sigma(t_n) \}$$
An exception to this definition is the empty set-substitution $\sigma_\emptyset$: $\forall t \in \mathit{TERM}(\Sigma, \mathit{Var}),\ \sigma_\emptyset(t) = \emptyset$. A set-based interpretation $I$ of a logic program $P$ is a set of ground atoms defined as set-instances of the heads of clauses of $P$:
$$I = \bigcup_{c \in P} \sigma_c(\mathit{head}(c))$$
where $\sigma_c$ is a (possibly empty) set-substitution.
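The canonical extension of a set-substitution to terms can be illustrated with a minimal Python sketch. The term encoding (variables as strings, other terms as tuples headed by their symbol) and the name `apply_set_subst` are assumptions made for this illustration only; they do not come from the paper.

```python
from itertools import product

# Hypothetical encoding: a variable is a str; a constant is the 1-tuple
# (symbol,); a compound term is the tuple (symbol, arg1, ..., argn).

def apply_set_subst(sigma, term):
    """Set of ground terms denoted by `term` under the set-substitution `sigma`."""
    if isinstance(term, str):            # variable: its set of possible values
        return set(sigma[term])
    symbol, *args = term
    arg_sets = [apply_set_subst(sigma, a) for a in args]
    # Every argument position is chosen independently, so two occurrences of
    # the same variable may receive different values: this is exactly the
    # loss of inter-variable dependencies.
    return {(symbol, *choice) for choice in product(*arg_sets)}

a, b = ("a",), ("b",)
sigma = {"Y": {a, b}}
result = apply_set_subst(sigma, ("g", "Y", "Y"))
```

With `Y` mapped to `{a, b}`, the term `g(Y, Y)` denotes four ground terms, including `g(a, b)`, which no ordinary substitution could produce.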
A set-based model $I$ of a logic program $P$ is a set-based interpretation of $P$ s.t. for every ground instance $H \leftarrow B_1, \ldots, B_n$ of a clause of $P$, $\{B_1, \ldots, B_n\} \subseteq I$ implies $H \in I$. Set-based analysis of a logic program $P$ amounts to computing the least set-based model of $P$ ($\mathit{lsbm}(P)$). The method described in [HJ90] to compute the least set-based model of a logic program can be summarized as follows: a set of quantified set expressions is extracted from the program $P$. The least solution of those expressions is exactly the least set-based model of $P$, which can be computed by resolving them into a tree grammar. The main drawback of such a method is that the solving algorithm is rather complicated and its complexity is unknown. A much simpler approach is taken in [FSVY91]. The authors propose a membership testing algorithm for the least set-based model. It is mostly based on the transformation of a logic program into a weaker one called a type program. As usual, it is assumed that two different clauses of the logic program do not share variables. Let $p_0(t_0) \leftarrow p_1(t_1), \ldots, p_m(t_m)$ be a clause of a logic program; for each variable $X_i$ which occurs in $p_0(t_0)$, a new monadic predicate symbol $x_i$ is created. Roughly speaking, the success set of the predicate $x_i$ will be the set of possible values taken by the variable $X_i$ during the computation. For each predicate symbol $p$, a new function symbol $f_p$ is introduced and atoms of the form $p(t)$ are replaced by $\mathit{type}(f_p(t))$, where $\mathit{type}$ is a new monadic predicate symbol. Finally, each occurrence of a variable $X_i$ in the head of the clause $\mathit{type}(f_{p_0}(t_0))$ is distinguished from the other occurrences by adding an index to this variable: for instance, $X_i^j$ for the $j$-th occurrence of the variable $X_i$. $\mathit{type}(f_{p_0}(\tilde{t}_0))$ denotes the resulting atom after these renamings.
The last part of the transformation consists in expressing the type of the head of the clause in terms of the "type" of its variables (i.e. the values possibly taken by those variables), and the "type" of those variables in terms of the type of the body-part atoms:
$$\mathit{type}(f_{p_0}(\tilde{t}_0)) \leftarrow x_1(X_1^1), \ldots, x_k(X_k^l)$$
$$x_i(X_i) \leftarrow \mathit{type}(f_{p_1}(t_1)), \ldots, \mathit{type}(f_{p_m}(t_m))$$
If the head of the clause is ground, then the transformation proceeds as if there were a variable $X$ in the head (footnote 1). For each head-only variable $X$ in a clause $c$, the atom $\mathit{all}(X)$ is added to the body of the clause $c$. $\mathit{all}$ is a new predicate symbol succeeding for every ground term built over the signature of the original logic program (footnote 2).
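Example 1
The logic program in Figure 1 is transformed into the type program in Figure 2.

Figure 1 (logic program):
$p(a, a)$
$p(b, a)$
$q(X) \leftarrow p(X, X)$
$q(g(Y, Y)) \leftarrow q(Y)$

Figure 2 (type program):
$\mathit{type}(f_p(a, a))$
$\mathit{type}(f_p(b, a))$
$\mathit{type}(f_q(X)) \leftarrow x(X)$
$x(X) \leftarrow \mathit{type}(f_p(X, X))$
$\mathit{type}(f_q(g(Y^1, Y^2))) \leftarrow y(Y^1), y(Y^2)$
$y(Y) \leftarrow \mathit{type}(f_q(Y))$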
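The core of this transformation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of [FSVY91] itself: the encoding (variables as strings, atoms and terms as tuples), the function name `to_type_clauses`, and the naming of the new predicate after the lower-cased variable are all hypothetical. Every head occurrence is indexed, even when the variable occurs once, and neither the ground-head trick nor the `all(X)` atoms for head-only variables are covered.

```python
def to_type_clauses(head, body):
    """Translate one clause `head <- body` into type-program clauses (sketch)."""
    occurrences = []   # (variable, indexed copy) in left-to-right order
    counts = {}        # number of head occurrences seen so far per variable

    def rename(term):
        # Give each head occurrence of a variable its own indexed copy.
        if isinstance(term, str):
            counts[term] = counts.get(term, 0) + 1
            fresh = f"{term}{counts[term]}"
            occurrences.append((term, fresh))
            return fresh
        return (term[0], *(rename(t) for t in term[1:]))

    def wrap(atom):
        # p(t1, ..., tn) becomes type(f_p(t1, ..., tn)).
        return ("type", ("f_" + atom[0], *atom[1:]))

    renamed_head = wrap((head[0], *(rename(t) for t in head[1:])))
    if body and not counts:
        raise NotImplementedError("ground head with a non-empty body is not covered")
    # The type of the head is expressed by the types of its variable occurrences.
    clauses = [(renamed_head, [(v.lower(), fresh) for v, fresh in occurrences])]
    # The type of each head variable is expressed by the typed body atoms.
    typed_body = [wrap(b) for b in body]
    for v in counts:
        clauses.append(((v.lower(), v), typed_body))
    return clauses

example = to_type_clauses(("q", ("g", "Y", "Y")), [("q", "Y")])
```

On the clause q(g(Y, Y)) ← q(Y) this yields the two clauses of Figure 2 that mention `y`, with `Y1` and `Y2` standing for the indexed occurrences.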
The main point is that the exact semantics of the type program computed in this way is almost the same as the set-based approximate semantics of the original program:
$$p(t) \in \mathit{lsbm}(P) \iff P_{type} \models \mathit{type}(f_p(t))$$
The authors notice that type programs are particular cases of proper programs, which simulate 2-way alternating tree automata. From this they deduce an EXPTIME algorithm for "type checking", that is, deciding whether a ground atom is a logical consequence of such a program. A type inference algorithm is also proposed, based on the transformation of 2-way alternating tree automata into 1-way alternating tree automata, achieved with an EXPTIME and EXPSPACE algorithm. It computes a logically equivalent program in reduced-regular form, with clauses of the following kinds:
- $p(a)$ where $a$ is a constant symbol
- $p(f(X_1, \ldots, X_n)) \leftarrow p_1(X_1), \ldots, p_n(X_n)$ where the $X_i$'s are all different
- $p(X) \leftarrow p_1(X), \ldots, p_n(X)$
The first two kinds of clauses can be viewed as transition rules of a tree automaton and the last one as an intersection between tree languages. However, this representation is not equivalent to that used in [HJ90]. To obtain a similar result, "intersection" clauses must be eliminated. For this, an exponential-time algorithm is needed (since the tree automata intersection problem is known to be EXPTIME-complete [FSVY91]). Combining this algorithm with the "proper to reduced-regular transformation" algorithm provides a doubly-exponential time algorithm. Starting from this, our goal is to compute the least model of a type program in a more efficient way.

1 This trick simulates the empty set-substitution.
2 This last point ensures that function symbols representing predicate symbols of the original program occur only at the top of terms in the success set of the type program.
3 Quasi-Automaton Programs and Tree Automata
In this section, two main notions are introduced. The first one is a syntactic restriction on the type programs we are going to consider, and the second one is the basic tool of our method.
3.1 Quasi-Automaton Programs
Type programs have a rather simple syntactic form. However, this form is not quite suitable for our purpose. Therefore, a few syntactic restrictions are added on type programs by defining the class of quasi-automaton programs.
Definition 1 A quasi-automaton (footnote 3) program has clauses of the following forms:
- $p(f(X_1, \ldots, X_m)) \leftarrow p_1(X_1), \ldots, p_m(X_m)$ where the $X_i$'s are all different.
- $p(X) \leftarrow p_1(t_1), \ldots, p_m(t_m)$ where $X$ occurs in the body-part of the clause.
One should notice that the first kind of clauses includes facts $p(a)$ for nullary function symbols (i.e. constants). A type program $P_{type}$ can easily be transformed (using a linear time and space algorithm) into an "equivalent" quasi-automaton program $P_{auto}$ by adding new predicate symbols in such a way that:
$$\forall p \in \mathit{Pred}(P_{type}),\ P_{type} \models p(t) \iff P_{auto} \models p(t)$$
Example 2 For the type program of Example 1:
$\mathit{type}(f_p(X, Y)) \leftarrow p_a(X), p_a(Y)$
$\mathit{type}(f_p(X, Y)) \leftarrow p_b(X), p_a(Y)$
$\mathit{type}(f_q(X)) \leftarrow x(X)$
$x(X) \leftarrow \mathit{type}(f_p(X, X))$
$\mathit{type}(f_q(X)) \leftarrow r_g(X)$
$r_g(g(X, Y)) \leftarrow y(X), y(Y)$
$y(X) \leftarrow \mathit{type}(f_q(X))$
$p_a(a)$
$p_b(b)$
3 The word "automaton" comes from the form of the first kind of clause and "quasi", of course, from the second one.
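The two clause forms of Definition 1 can be recognized mechanically. The following Python sketch is only an illustration: the encoding (monadic atoms as pairs `(predicate, term)`, variables as strings) and the names `variables` and `clause_kind` are assumptions, and body atoms of a function clause are assumed to be listed in the same order as the head arguments.

```python
def variables(term):
    """Set of variables occurring in a term (variables are plain strings)."""
    if isinstance(term, str):
        return {term}
    return set().union(*(variables(t) for t in term[1:]))

def clause_kind(head, body):
    """Return 'function', 'projection', or None if the clause fits neither form."""
    _, arg = head
    if isinstance(arg, str):
        # p(X) <- p1(t1), ..., pm(tm): X must occur in the body.
        in_body = set().union(*(variables(t) for _, t in body))
        return "projection" if arg in in_body else None
    # p(f(X1, ..., Xm)) <- p1(X1), ..., pm(Xm) with pairwise distinct Xi.
    xs = list(arg[1:])
    distinct_vars = all(isinstance(x, str) for x in xs) and len(set(xs)) == len(xs)
    if distinct_vars and [t for _, t in body] == xs:
        return "function"
    return None

kind_1 = clause_kind(("type", ("f_p", "X", "Y")), [("pa", "X"), ("pa", "Y")])
kind_2 = clause_kind(("x", "X"), [("type", ("f_p", "X", "X"))])
kind_3 = clause_kind(("type", ("f_q", ("g", "Y1", "Y2"))), [("y", "Y1"), ("y", "Y2")])
```

The third call shows why Example 2 introduces the auxiliary predicate $r_g$: the nested head of the type program fits neither form until it is split.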
Computing the least set-based model of a logic program is now closely related to the computation of the least (Herbrand) model of a quasi-automaton program. Tree automata provide an elegant method for that.
3.2 Tree Automata
The class of considered tree automata can be seen as a slight enhancement of the original one [GS84].
Definition 2 An $n$-ranked tree automaton (TA) $A$ is a tuple $(\Sigma, Q, F, S)$ where $\Sigma$ is a set of function symbols, $Q$ is a finite set of states, $F = (F_1, \ldots, F_n)$ is a tuple of sets of final states ($F_i \subseteq Q$) and $S$ is a set of transition rules of the form $f(q_1, \ldots, q_m) \to q$, where $f \in \Sigma$ is an $m$-ary symbol and $\{q, q_1, \ldots, q_m\} \subseteq Q$. As a particular case, rules for constant symbols look like $a \to q$. For our purpose, a restricted class of such automata is sufficient: the class of deterministic and complete TA, denoted TAdc.
Definition 3 A TA is said to be
- deterministic iff $\forall r_i, r_j \in S$ s.t. $i \neq j$, $r_i$ and $r_j$ do not have the same left-hand side;
- complete iff $\forall f \in \Sigma$, $\forall q_1, \ldots, q_m \in Q$, $\exists q \in Q$ s.t. $f(q_1, \ldots, q_m) \to q \in S$.
A deterministic and complete tree automaton (TAdc) runs over ground terms built with function symbols of $\Sigma$. This run is formally defined using a function $\mathit{run}_A$ from $\mathit{TERM}(\Sigma \cup Q)$ (i.e. the set of terms built over $\Sigma$ and $Q$, where states are considered as constants) onto $Q$, as follows.
Definition 4 For all $t \in \mathit{TERM}(\Sigma \cup Q)$, $\mathit{run}_A(t) = q$ iff $t \to_A^* q$, where $\to_A^*$ is the transitive closure of the move relation $\to_A$ defined from $\mathit{TERM}(\Sigma \cup Q)$ onto itself by $t \to_A t'$ iff $t = T[l]$, $t' = T[r]$ and $l \to r \in S$. The language recognized by an $n$-ranked TAdc $A$ is a tuple $(L_1, \ldots, L_n)$ where $L_i$ is the set of ground terms defined by
$$\forall t \in \mathit{TERM}(\Sigma),\ t \in L_i \iff \mathit{run}_A(t) \in F_i$$
Another crucial point is the notion of reachability for a state: a state $q$ is said to be reachable in a TA $A$ iff there exists a term $t$ in $\mathit{TERM}(\Sigma)$ s.t. $\mathit{run}_A(t) = q$. This definition implies, in particular, that if $q \in F_i$ and $q$ is reachable, then $L_i$ is non-empty. $\mathit{Reachable}(A)$ will denote the set of reachable states in a TA $A$. A partial ordering relation $\preceq_A$ can be defined on TAdc built over the same signature $\Sigma$ and the same set of states $Q$, as an extension of a partial ordering $\preceq_Q$ on $Q$.
Definition 5 Let $A = (\Sigma, Q, F, S)$ and $A' = (\Sigma, Q, F', S')$ be in TAdc and let $\preceq_Q$ be an ordering relation on $Q$. Then $A \preceq_A A'$ iff
$$\forall \mathit{lhs},\ (\mathit{lhs} \to q \in S \wedge \mathit{lhs} \to q' \in S') \Rightarrow q \preceq_Q q'$$
It is now easy to design a method for building a TAdc $A_f$ which recognizes the "success set" of a quasi-automaton program.
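The run function of Definition 4 and the reachability test can be sketched in a few lines of Python. The encoding is an assumption made for this illustration: a deterministic and complete rule set is a dictionary from `(symbol, tuple_of_argument_states)` to a state, and ground terms are tuples headed by their symbol. The small automaton over a constant `a` and a unary symbol `s` is a hypothetical example, not taken from the paper.

```python
def run(rules, term):
    """Bottom-up run of a deterministic complete automaton on a ground term."""
    symbol, *args = term
    return rules[(symbol, tuple(run(rules, a) for a in args))]

def reachable(rules):
    """States reached by at least one ground term, by forward chaining."""
    found = set()
    changed = True
    while changed:
        changed = False
        for (_, args), q in rules.items():
            # A right-hand side is reachable once all left-hand side states are.
            if q not in found and all(a in found for a in args):
                found.add(q)
                changed = True
    return found

# Hypothetical automaton: 'e' / 'o' track the parity of the number of s
# symbols; 'u' exists only to make the rule set complete and is unreachable.
rules = {
    ("a", ()): "e",
    ("s", ("e",)): "o",
    ("s", ("o",)): "e",
    ("s", ("u",)): "u",
}
state = run(rules, ("s", ("s", ("a",))))
states = reachable(rules)
```

Determinism and completeness are what make the dictionary lookup in `run` total and unambiguous.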
4 The Method
The main idea of our method is rather simple: starting from a quasi-automaton program $P$, the first step consists in fixing the definition part of a TAdc, that is $\Sigma$, $Q$ and $F$. This is done only by considering the predicate symbols that occur in $P$, and in such a way that each component of the tuple recognized by the TAdc corresponds to the success set of a (monadic) predicate of $P$. The second step starts from an "empty" TAdc and transforms the right-hand sides of rules according to the clauses of $P$ until a fix-point is reached.
4.1 Encoding and algorithm
More formally, the first step of the method is described as follows. Let $P$ be a quasi-automaton program, $\{p_1, \ldots, p_n\}$ the set of predicate symbols that occur in $P$, and $A = (\Sigma, Q, F, S)$ a deterministic and complete TA s.t. $\Sigma$ is the set of function symbols of $P$, $Q = \{0,1\}^n$ and $F = (F_1, \ldots, F_n)$ s.t. for each $i$, $F_i$ is the set of states having 1 on their $i$-th component. The aim of the second step is to build an automaton $A_f = (\Sigma, Q, F, S_f)$ s.t.
$$\forall t \in \mathit{TERM}(\Sigma),\ \mathit{run}_{A_f}(t) \in F_i \iff P \models p_i(t)$$
Since, for the considered automata, $\Sigma$, $Q$ and $F$ are fixed, from now on no distinction will be made between an automaton and its set of transition rules. Our algorithm is presented as a system of inference rules $R$ depending on $P$, which defines a relation $\to_R$ on complete sets of transition rules. For convenience, a few notations are introduced:
- $S_0$ denotes the set of transition rules having the all-zero state (the unique element of $\{0\}^n$) for right-hand side.
- For $q, q' \in Q$, $\mathit{switch}(q, i) = q'$ s.t. $\forall j,\ q' \in F_j \iff (q \in F_j \vee i = j)$ (footnote 4).
- $\mathit{switch\_func}(f(q_1, \ldots, q_m) \to q,\ c) = f(q_1, \ldots, q_m) \to q'$ with
$$q' = \begin{cases} \mathit{switch}(q, i_0) & \text{if } c = p_{i_0}(f(X_1, \ldots, X_m)) \leftarrow p_{i_1}(X_1), \ldots, p_{i_m}(X_m) \text{ and } \forall j \in \{1, \ldots, m\},\ q_j \in F_{i_j} \\ q & \text{otherwise} \end{cases}$$
- $\mathit{switch\_proj}(f(q_1, \ldots, q_m) \to q,\ c) = f(q_1, \ldots, q_m) \to q'$ with
$$q' = \begin{cases} \mathit{switch}(q, i_0) & \text{if } c = p_{i_0}(X) \leftarrow p_{i_1}(t_1), \ldots, p_{i_h}(t_h),\ \{q_1, \ldots, q_m\} \subseteq \mathit{Reachable}(S) \text{ and } \exists \sigma \in \mathit{Subst}(\mathit{Body\_only}(c), \mathit{Reachable}(S)) \text{ s.t. } \forall j,\ \mathit{run}_S(([X/q]\sigma)\, t_j) \in F_{i_j} \\ q & \text{otherwise} \end{cases}$$
We can then define $R$, according to the different kinds of clauses of a quasi-automaton program, as the set of the two following inference rules:
$$\frac{S \cup \{r\}}{S \cup \{\mathit{switch\_func}(r, c)\}} \quad \text{if } c = p_{i_0}(f(X_1, \ldots, X_m)) \leftarrow p_{i_1}(X_1), \ldots, p_{i_m}(X_m) \in P$$
$$\frac{S \cup \{r\}}{S \cup \{\mathit{switch\_proj}(r, c)\}} \quad \text{if } c = p_{i_0}(X) \leftarrow p_{i_1}(t_1), \ldots, p_{i_h}(t_h) \in P$$
Intuitively, the first inference rule means that if $t_1, \ldots, t_m$ are ground terms and, for each $j$, $t_j$ belongs to the $i_j$-th component of the recognized language (i.e. the success set of the predicate symbol $p_{i_j}$), then $f(t_1, \ldots, t_m)$ must belong to the $i_0$-th component of the recognized language (i.e. the success set of the predicate symbol $p_{i_0}$). The second rule is a bit more complicated: the right-hand side of the selected rule is used to instantiate the "head" variable, and other states are chosen to instantiate the body-only variables (the "reachability" condition ensures at this step that states effectively represent ground terms). If each term instantiated in this way is a "success" for the corresponding predicate symbol, then the terms represented by the right-hand side of the selected rule must be a "success" for the predicate symbol $p_{i_0}$ (i.e. the right-hand side of this rule must be final for the $i_0$-th component of the recognized language). The result of this algorithm is a TAdc $S_f$ s.t. $S_0 \to_R^* S_f$ and $\forall S,\ S_f \to_R S \Rightarrow S = S_f$.

4 The notation $\mathit{switch}(q, i) = q'$ just means that $q'$ is the same as $q$ except (possibly) on the $i$-th component, which is set to 1 in $q'$.
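A minimal Python sketch of `switch` and `switch_func` may help to read the first inference rule. The encoding is an assumption made for this illustration: a state is a tuple of 0/1 components (0-based indices, whereas the text counts from 1), a transition rule is a pair `((symbol, argument_states), state)`, and a function clause is abbreviated to the hypothetical triple `(i0, f, [i1, ..., im])` of predicate indices and function symbol.

```python
def switch(q, i):
    """Copy of state q with component i set to 1 (0-based index)."""
    return q[:i] + (1,) + q[i + 1:]

def switch_func(rule, clause):
    """Apply a function clause to one transition rule, as in the first inference rule."""
    (symbol, args), q = rule
    i0, f, body_preds = clause
    # The clause fires when the rule is for the clause's function symbol and
    # every argument state is final for the matching body predicate.
    if symbol == f and all(arg[i] == 1 for arg, i in zip(args, body_preds)):
        return (symbol, args), switch(q, i0)
    return rule

# Two predicates, ordered as (type, pa); the clause is
# type(fp(X, Y)) <- pa(X), pa(Y), i.e. i0 = 0 and body indices [1, 1].
clause = (0, "fp", [1, 1])
fired = switch_func((("fp", ((0, 1), (0, 1))), (0, 0)), clause)
unchanged = switch_func((("fp", ((0, 0), (0, 1))), (0, 0)), clause)
```

The rule whose two argument states are both final for `pa` gets its right-hand side switched on the `type` component; the other rule is returned as is.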
4.2 Confluence and Termination
We are going to show in this section that this method terminates and that $S_f$ is unique. This will be done by defining a suitable partial ordering relation $\preceq_Q$ (extended to automata as described in Section 3.2) and then using the properties of finite lattices.
Definition 6 Let $q, q' \in Q$. Then $q \preceq_Q q'$ iff $\forall i,\ q \in F_i \Rightarrow q' \in F_i$.
This partial ordering is extended to a partial ordering $\preceq_A$ on TAdc; this implies in particular that $(\mathrm{TAdc}, \preceq_A)$ is a finite (complete) lattice with $S_0$ as least element. Moreover, it is easy to see from the definition of $\mathit{switch}$ that $\to_R$ is a monotonic relation on this lattice and, by definition, $S_f$ is the least fixed-point of this relation. Therefore, since this lattice is finite, the algorithm terminates.
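On states encoded as tuples of 0/1 components (an encoding assumed for this illustration, with 0-based indices), the ordering of Definition 6 is the componentwise order. The following Python sketch checks exhaustively, for three predicates, that `switch` never decreases a state and preserves the ordering; the names `leq` and `switch` are hypothetical helpers, not the paper's notation.

```python
from itertools import product

def leq(q1, q2):
    """q1 precedes q2: every component set in q1 is also set in q2."""
    return all(a <= b for a, b in zip(q1, q2))

def switch(q, i):
    """Copy of state q with component i set to 1 (0-based index)."""
    return q[:i] + (1,) + q[i + 1:]

n = 3
states = list(product((0, 1), repeat=n))

# switch is extensive: a state never decreases.
extensive = all(leq(q, switch(q, i)) for q in states for i in range(n))

# switch is monotonic: ordered states stay ordered after the same switch.
monotonic = all(
    leq(switch(q1, i), switch(q2, i))
    for q1 in states for q2 in states if leq(q1, q2)
    for i in range(n)
)
```

Both checks are over a finite set of states, which mirrors the finiteness argument used for termination.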
4.3 Correctness
In this section, we deal with the correctness of the method. We start by proving a main lemma for soundness. It states an invariant property for $\to_R$.
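Lemma 1
(Part 1) For each transition rule $f(q_1, \ldots, q_m) \to q \in S$, let $I_i = \{x \mid q_i \in F_x\}$ for all $i$ ($1 \le i \le m$) and $I = \{y \mid q \in F_y\}$. Then, for all terms $t_1, \ldots, t_m$,
$$P \wedge \bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i) \models \bigwedge_{y \in I} p_y(f(t_1, \ldots, t_m))$$
(Part 2) $\forall t \in \mathit{TERM}(\Sigma, \mathit{Var})$ with $\mathit{Var}(t) = \{X_1, \ldots, X_l\} = V$, $\forall \sigma \in \mathit{Subst}(V, Q)$, $\forall \theta \in \mathit{Subst}(V, \mathit{TERM}(\Sigma))$, let $\mathit{run}_S(\sigma t) = q$. Then
$$P \wedge \bigwedge_{1 \le i \le l} \bigwedge_{\{j \mid \sigma X_i \in F_j\}} p_j(\theta X_i) \models \bigwedge_{\{j \mid q \in F_j\}} p_j(\theta t)$$
Let $S$ be a tree automaton for which (Part 1) and (Part 2) hold. If $S \to_R S'$, then (Part 1) and (Part 2) hold for $S'$.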
Proof :
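For (Part 1): according to the form of the clause $c$ used for this transition:
- $c = p_{k_0}(f(X_1, \ldots, X_m)) \leftarrow p_{k_1}(X_1), \ldots, p_{k_m}(X_m)$: the property remains true for unchanged rules. Let $f(q_1, \ldots, q_m) \to q \in S$ be the transition rule transformed into $f(q_1, \ldots, q_m) \to \mathit{switch}(q, k_0)$ in $S'$. By definition of $\mathit{switch\_func}$, $\forall i \in \{1, \ldots, m\}$, $q_i \in F_{k_i}$, so $k_i \in I_i$. Let $I' = \{y \mid \mathit{switch}(q, k_0) \in F_y\} = I \cup \{k_0\}$. So,
$$P \wedge \bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i) \models \bigwedge_{y \in I'} p_y(f(t_1, \ldots, t_m))$$
holds for $S'$, since
$$P \wedge \bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i) \models \bigwedge_{y \in I} p_y(f(t_1, \ldots, t_m))$$
holds because (Part 1) holds for $S$, and
$$P \wedge \bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i) \models p_{k_0}(f(t_1, \ldots, t_m))$$
holds because $\bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i) \models \bigwedge_{1 \le i \le m} p_{k_i}(t_i)$ and because the head of the clause $c$ is linear.
- $c = p_{k_0}(X) \leftarrow p_{k_1}(s_1), \ldots, p_{k_h}(s_h)$: the property remains true for unchanged rules. Let $f(q_1, \ldots, q_m) \to q$ be the transition rule in $S$ transformed into $f(q_1, \ldots, q_m) \to \mathit{switch}(q, k_0)$ in $S'$. Since (Part 1) holds for $S$, it holds for $S'$ iff:
$$P \wedge \bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i) \models p_{k_0}(f(t_1, \ldots, t_m))$$
Let $\sigma' \in \mathit{Subst}(\mathit{Body\_only}(c), \mathit{Reachable}(S))$ be the substitution used by $\to_R$; by definition of reachability, there exists $\theta'$ from $\mathit{Body\_only}(c)$ onto $\mathit{TERM}(\Sigma)$ s.t. $\forall Y \in \mathit{Body\_only}(c)$, $\mathit{run}_S(\theta'(Y)) = \sigma'(Y)$. Let $\sigma = \sigma'[X/q]$ and $\theta = \theta'[X/f(t_1, \ldots, t_m)]$. We must now prove that $\forall j,\ p_{k_j}(\theta s_j)$, by using (Part 2). By the condition of $\mathit{switch\_proj}$, $\forall j,\ \mathit{run}_S(\sigma s_j) \in F_{k_j}$. If $Y \in \mathit{Body\_only}(c)$, then $\bigwedge_{\{j \mid \sigma Y \in F_j\}} p_j(\theta Y)$ holds since $\mathit{run}_S(\theta Y) = \sigma Y$ and (Part 2) holds for $S$. For $X$, $\bigwedge_{\{j \mid \sigma X \in F_j\}} p_j(\theta X)$ must hold. By noticing that $\sigma X = q$ and $\theta X = f(t_1, \ldots, t_m)$, (Part 1) can be applied to the considered rule under the hypothesis that $\bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i)$ holds; since (Part 1) holds for $S$ by assumption, $\bigwedge_{\{j \mid q \in F_j\}} p_j(f(t_1, \ldots, t_m))$ holds. Therefore,
$$P \wedge \bigwedge_{1 \le i \le m} \bigwedge_{x \in I_i} p_x(t_i) \models p_{k_0}(\theta X)$$
holds.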
For (Part 2): we are going to prove that if (Part 1) holds for $S$, then (Part 2) holds for $S$, by induction on the structure of $t$:
- if $t$ looks like $f(X_1, \ldots, X_m)$: in this case, (Part 2) holds since it is implied by (Part 1), for the rule $f(\sigma X_1, \ldots, \sigma X_m) \to q$ and for $t_i = \theta X_i$, and so for $\theta t = f(t_1, \ldots, t_m)$.
- if $t$ looks like $f(s_1, \ldots, s_m)$: in this case, we should notice that $\sigma t = f(\sigma s_1, \ldots, \sigma s_m)$. By the induction assumption, (Part 2) holds for $s_k$, for all $k$. So, letting $\mathit{run}_S(\sigma s_k) = q_k$ for all $k$,
$$P \wedge \bigwedge_{1 \le i \le l} \bigwedge_{\{j \mid \sigma X_i \in F_j\}} p_j(\theta X_i) \models \bigwedge_{\{j \mid q_k \in F_j\}} p_j(\theta s_k)$$
Let $q = \mathit{run}_S(\sigma t) = \mathit{run}_S(f(\sigma s_1, \ldots, \sigma s_m))$ and let us consider the transition rule $f(q_1, \ldots, q_m) \to q \in S$. Since (Part 1) holds for this rule, (Part 2) does so in this case. □
As an easy corollary,
Theorem 1 [Soundness] Let $S_f$ be the least fix-point of $\to_R$. Then $\forall t \in \mathit{TERM}(\Sigma),\ \forall i,\ \mathit{run}_{S_f}(t) \in F_i \Rightarrow P \models p_i(t)$.
Proof :
It is easy to see that properties (Part 1) and (Part 2) of Lemma 1 hold for $S_0$. So, by Lemma 1, they hold for $S_f$. Then this theorem (for $S_f$) is the restriction of Lemma 1 (Part 2) to ground terms ($\mathit{TERM}(\Sigma)$). □
We now deal with the completeness of the method. Let $S_f$ be the computed automaton, that is the least fix-point of $\to_R$ on $(\mathrm{TAdc}, \preceq_A)$.
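Lemma 2 $\forall c = p_{k_0}(t_0) \leftarrow p_{k_1}(t_1), \ldots, p_{k_h}(t_h) \in P$, $\forall \theta \in \mathit{Subst}(\mathit{Var}(c), \mathit{TERM}(\Sigma))$,
$$(\forall i \in \{1, \ldots, h\},\ \mathit{run}_{S_f}(\theta t_i) \in F_{k_i}) \Rightarrow \mathit{run}_{S_f}(\theta t_0) \in F_{k_0}$$
Proof :
Let us assume that there exist a clause $c = p_{k_0}(t_0) \leftarrow p_{k_1}(t_1), \ldots, p_{k_h}(t_h)$ in $P$ and a ground substitution $\theta \in \mathit{Subst}(\mathit{Var}(c), \mathit{TERM}(\Sigma))$ s.t. $\forall i \in \{1, \ldots, h\}$, $\mathit{run}_{S_f}(\theta t_i) \in F_{k_i}$ and $\mathit{run}_{S_f}(\theta t_0) \notin F_{k_0}$. Then, according to the form of $c$:
- $c = p_{k_0}(f(X_1, \ldots, X_m)) \leftarrow p_{k_1}(X_1), \ldots, p_{k_m}(X_m)$: in this case, for all $j$ in $\{1, \ldots, m\}$ there exists a state $q_j = \mathit{run}_{S_f}(\theta X_j) \in F_{k_j}$ s.t. $f(q_1, \ldots, q_m) \to q \in S_f$ and $q \notin F_{k_0}$. A rule of $R$ could be used on $S_f$, producing a new automaton, and then $S_f$ would not be a fix-point.
- $c = p_{k_0}(X) \leftarrow p_{k_1}(t_1), \ldots, p_{k_m}(t_m)$: the substitution $\theta$ can be written as $[X/s]\theta'$ s.t. $\theta' \in \mathit{Subst}(\mathit{Body\_only}(c), \mathit{TERM}(\Sigma))$. We now consider the unique substitution $\sigma'$ from $\mathit{Body\_only}(c)$ onto $\mathit{Reachable}(S_f)$ s.t. $\forall Y \in \mathit{Body\_only}(c)$, $\sigma' Y = \mathit{run}_{S_f}(\theta' Y)$. Let $\sigma = \sigma'[X/\mathit{run}_{S_f}(s)]$. It is clear that $\forall j \in \{1, \ldots, m\}$, $\mathit{run}_{S_f}(\sigma t_j) = \mathit{run}_{S_f}(\theta t_j)$. So, $\forall j \in \{1, \ldots, m\}$, $\mathit{run}_{S_f}(\sigma t_j) \in F_{k_j}$. It implies that there exists a transition rule in $S_f$ having reachable states in its left-hand side and $q = \mathit{run}_{S_f}(s)$ for right-hand side, and that $q \notin F_{k_0}$. So an inference rule of $R$ could be applied, which contradicts the fact that $S_f$ is a fix-point. □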
Theorem 2 [Completeness] $\forall t \in \mathit{TERM}(\Sigma),\ \forall i,\ P \models p_i(t) \Rightarrow \mathit{run}_{S_f}(t) \in F_i$.
Proof :
If $P \models p_i(t)$ holds, then $p_i(t)$ is the root of a proof tree built with ground instances of clauses of $P$ [Llo87]. The proof then follows by induction on the height of this tree, using Lemma 2. □
4.4 Complexity
The problem of "computing a grammar (or tree automaton) representation of the least set-based model of a logic program" has an easy lower bound.
Proposition 1 There is no polynomial algorithm (w.r.t. the size of the program) computing a grammar (or tree automaton) representation of the least set-based model of every logic program.
Proof :
One should notice first that the least set-based model of a reduced-regular program is exactly its least Herbrand model. It is easy to encode the emptiness problem of tree automata intersection into this computation: starting from automata represented with clauses such as $p_0(f(X_1, \ldots, X_n)) \leftarrow p_1(X_1), \ldots, p_n(X_n)$, the clause $\mathit{inter}(X) \leftarrow p_{f_1}(X), \ldots, p_{f_m}(X)$ is added to the program (with $p_{f_i}$ the final state of the $i$-th tree automaton). Set-based analysis produces a grammar for which testing whether or not $\mathit{inter}$ is empty can be done in linear time w.r.t. the size of the grammar. Since the emptiness problem of tree automata intersection is known to be EXPTIME-complete [FSVY91], set-based analysis cannot be achieved in polynomial time. □
We now prove that, for a natural strategy, our algorithm is EXPTIME. First of all, let $\to_{R,\{c,l\}}$ be the relation $\to_R$ for which the chosen clause is $c$ and the transition rule used has $l$ for left-hand side. The algorithm is:
Least_Herbrand_Model(P):
  S := S_0
  Until a fix-point is reached loop
    For each c in P loop
      For each left-hand side l of a transition rule loop
        Let S' be the set of rules s.t. S ->_{R,{c,l}} S'
        S := S'
      Endloop
    Endloop
  Endloop
  Return S
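The loop above can be turned into a small executable Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: states are tuples of 0/1 components, the rule set is a dictionary from `(symbol, argument_states)` to a state, clauses are pairs `(head, body)` of monadic atoms `(predicate, term)` with variables as strings, body atoms of function clauses are assumed to follow the order of the head arguments, and the names `least_model_automaton` and `accepts` are hypothetical. The reachable set is recomputed after every change made by a projection clause so that the reachability conditions are always checked against the current automaton. The program used is the quasi-automaton program of Example 2.

```python
from itertools import product

def least_model_automaton(signature, preds, clauses):
    n = len(preds)
    idx = {p: i for i, p in enumerate(preds)}
    states = list(product((0, 1), repeat=n))
    # S0: complete rule set whose right-hand sides are all the all-zero state.
    rules = {(f, args): (0,) * n
             for f, arity in signature.items()
             for args in product(states, repeat=arity)}

    def switch(q, i):
        return q[:i] + (1,) + q[i + 1:]

    def evaluate(term, env):
        # Run over a term whose variables are replaced by states.
        if isinstance(term, str):
            return env[term]
        return rules[(term[0], tuple(evaluate(t, env) for t in term[1:]))]

    def reachable():
        found, changed = set(), True
        while changed:
            changed = False
            for (_, args), q in rules.items():
                if q not in found and all(a in found for a in args):
                    found.add(q)
                    changed = True
        return found

    def variables(term):
        if isinstance(term, str):
            return {term}
        return set().union(*(variables(t) for t in term[1:]))

    def fires(lhs, q, head, body, reach):
        """May clause head <- body switch the right-hand side q of rule lhs -> q?"""
        symbol, args = lhs
        arg = head[1]
        if not isinstance(arg, str):                       # function clause
            return symbol == arg[0] and all(
                a[idx[p]] for a, (p, _) in zip(args, body))
        if not all(a in reach for a in args):              # projection clause
            return False
        body_only = sorted(set().union(*(variables(t) for _, t in body)) - {arg})
        for choice in product(sorted(reach), repeat=len(body_only)):
            env = dict(zip(body_only, choice), **{arg: q})
            if all(evaluate(t, env)[idx[p]] for p, t in body):
                return True
        return False

    changed = True
    while changed:                                         # until a fix-point
        changed = False
        for head, body in clauses:
            is_proj = isinstance(head[1], str)
            reach = reachable() if is_proj else None
            for lhs in list(rules):
                q = rules[lhs]
                if q[idx[head[0]]] == 0 and fires(lhs, q, head, body, reach):
                    rules[lhs] = switch(q, idx[head[0]])
                    changed = True
                    if is_proj:
                        reach = reachable()
    return rules, idx

def accepts(rules, idx, pred, term):
    def run(t):
        return rules[(t[0], tuple(run(s) for s in t[1:]))]
    return run(term)[idx[pred]] == 1

signature = {"a": 0, "b": 0, "g": 2, "fp": 2, "fq": 1}
preds = ["type", "x", "y", "pa", "pb", "rg"]
clauses = [
    (("type", ("fp", "X", "Y")), [("pa", "X"), ("pa", "Y")]),
    (("type", ("fp", "X", "Y")), [("pb", "X"), ("pa", "Y")]),
    (("type", ("fq", "X")), [("x", "X")]),
    (("type", ("fq", "X")), [("rg", "X")]),
    (("rg", ("g", "X", "Y")), [("y", "X"), ("y", "Y")]),
    (("pa", ("a",)), []),
    (("pb", ("b",)), []),
    (("y", "X"), [("type", ("fq", "X"))]),
    (("x", "X"), [("type", ("fp", "X", "X"))]),
]
rules, idx = least_model_automaton(signature, preds, clauses)
```

Calling `accepts(rules, idx, "type", term)` then tests whether a ground term belongs to the success set of `type`. Building the complete rule set explicitly is what makes the cost exponential in the number of predicates, which matches the size bounds discussed in this section.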
The algorithm Least_Herbrand_Model obviously computes $S_f$. The size of the automata can easily be computed. Let $n_p$ (resp. $n_f$) be the number of predicate (resp. function) symbols of $P$ and $a_m$ the greatest arity of a function symbol. It follows that the number of states of the automaton is $2^{n_p}$, that $n_f 2^{n_p a_m}$ is an upper bound for the number of transition rules of an automaton, and that $(a_m + 1)\, n_f 2^{n_p a_m}$ (denoted $T$) is an upper bound for the size of an automaton. We assume that testing whether a state is final and "switching" a state can be achieved in constant time. Testing whether a state is reachable can be done in $O(T)$ using a forward chaining algorithm. For an application of $\to_{R,\{c,l\}}$, according to the form of the clause $c$:
- for switch_func: the test for this function can be done in at most $O(a_m)$ (denoted time_func).
- for switch_proj: let $t_c$ be the maximal size of the body-part of a clause of $P$ and $n_v$ the maximal number of body-only variables in a clause. Testing that the states occurring in the left-hand side of the selected rule are reachable can be achieved in at most $O(a_m T)$. For a $Q$-instance of a clause, testing that the states used for the instantiation are reachable costs at most $O(n_v T)$ and computing the run of each atom in the body of the clause costs at most $O(t_c)$. Since the number of possible $Q$-instances is at most $2^{n_p n_v}$, globally, for this kind of clause, we obtain $O(t_c n_v T)$ (denoted time_switch).
Since time_func < time_switch, time_func will be approximated by time_switch. So, if $n$ denotes the number of clauses, for every clause and every rule, it costs at most $O(n\, n_f 2^{n_p a_m}\, \mathit{time\_switch})$ for one step. One should notice that in one step at least one right-hand side state is switched (to a greater one). Since there are at most $n_p n_f 2^{n_p a_m}$ possible switches, this gives the maximal number of steps. It is obvious that $n_v < t_c$. Moreover, because of the particular form of the program, we can claim that $a_m < t_c$ and that $n_f < n$. Finally, it can be assumed that $n_p < n$. By noticing this and that $S$, the size of the program, bounds $n\, t_c$ from above, it can be deduced that the algorithm costs at most $O(S^4\, 2^{S^2 + 2S})$.
Proposition 2 Computing a tree grammar representation of the least set-based model of a logic program can be achieved with an exponential time algorithm w.r.t. the size of the input program.
Proof:
Obvious by considering the transformation of a logic program into a quasi-automaton program by a linear time and space algorithm, the complexity of our algorithm, and the easy relation between $TA_{dc}$ and the required tree grammar representation. □
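The size bounds used in the complexity analysis above can be evaluated for concrete parameters. The following Python sketch is only an illustration: the function name `automaton_bounds` is ours, and the sample values are read off the program analysed in Section 5 (six predicate symbols type, x, y, pa, pb, rg; five function symbols fp, fq, g, a, b; greatest arity 2).

```python
def automaton_bounds(n_p, n_f, a_m):
    """Return (number of states, bound on rules, bound T on automaton size).

    n_p: number of predicate symbols, n_f: number of function symbols,
    a_m: greatest arity of a function symbol (notation of the text).
    """
    states = 2 ** n_p                  # one state per subset of predicates
    rules = n_f * 2 ** (n_p * a_m)     # at most (2^n_p)^a_m left-hand sides per symbol
    size = (a_m + 1) * rules           # each rule mentions at most a_m + 1 states
    return states, rules, size


# Parameters of the quasi-automaton program of Section 5 (assumed as described above).
print(automaton_bounds(6, 5, 2))  # (64, 20480, 61440)
```

Even for this very small program the worst-case bound T is in the tens of thousands, whereas the automaton actually computed in Section 5 has only a handful of reachable states.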
5 Example

Figure 3 recalls the original program from Example 2, which we are going to analyze. The associated quasi-automaton program (Figure 4) has been decomposed into two parts according to the form of the clauses and the inference rules which will be applied for computing the automaton.
p(a, a)
p(b, a)
q(g(Y, Y)) ← q(Y)
q(X) ← p(X, X)

Figure 3
(1) type(fp(X, Y)) ← pa(X), pa(Y)
(2) type(fp(X, Y)) ← pb(X), pa(Y)
(3) type(fq(X)) ← x(X)
(4) type(fq(X)) ← rg(X)
(5) rg(g(X, Y)) ← y(X), y(Y)
(6) pa(a)
(7) pb(b)

(8) y(X) ← type(fq(X))
(9) x(X) ← type(fp(X, X))

Figure 4
The clauses from (1) to (7) are function clauses. In other words, a term is added to the success set of a predicate iff its immediate sub-terms belong to the respective sets. Let us translate Clause (2) to a transition rule by using the first inference rule. The transition rules (see footnote 5):

fp( _ _ _ _ pb _ , _ _ _ pa _ _ ) → ( $\overline{type}$ $\overline{x}$ $\overline{y}$ $\overline{pa}$ $\overline{pb}$ $\overline{rg}$ )

are switched to

fp( _ _ _ _ pb _ , _ _ _ pa _ _ ) → ( type $\overline{x}$ $\overline{y}$ $\overline{pa}$ $\overline{pb}$ $\overline{rg}$ )

5 In a state, pred stands for 1 and $\overline{pred}$ for 0. "_" means either 1 or 0.
This means that if the values of X and Y are respectively in pb and pa, then the value of fp(X, Y) is in type. Since the position in the tuple (type x y pa pb rg) clearly indicates the associated predicate, each predicate symbol (type, x, y, pa, pb, rg) may be replaced with the Boolean value 1, and their complements ($\overline{type}$, ...) with 0. So, by application of the first inference rule for all the clauses from (1) to (7), the following transition rules are obtained:
Transition rule                     Clause
fp(___1__, ___1__) → 100000         type(fp(X, Y)) ← pa(X), pa(Y)
fp(____1_, ___1__) → 100000         type(fp(X, Y)) ← pb(X), pa(Y)
fq(_1____) → 100000                 type(fq(X)) ← x(X)
fq(_____1) → 100000                 type(fq(X)) ← rg(X)
g(__1___, __1___) → 000001          rg(g(X, Y)) ← y(X), y(Y)
a → 000100                          pa(a)
b → 000010                          pb(b)
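Each transition rule obtained from a function clause is really a pattern standing for a whole family of concrete rules, one per choice of the unconstrained bits. The short Python sketch below makes this explicit. It is only an illustration: writing the "either 1 or 0" position as `_` is our notational assumption, and the helper names `matches` and `instances` are hypothetical.

```python
from itertools import product


def matches(pattern, state):
    """True if the bit string `state` is an instance of `pattern` ('_' = either bit)."""
    return len(pattern) == len(state) and all(
        p in ("_", s) for p, s in zip(pattern, state)
    )


def instances(pattern):
    """List every concrete state covered by `pattern`."""
    choices = [("0", "1") if p == "_" else (p,) for p in pattern]
    return ["".join(bits) for bits in product(*choices)]


# Tuple order is (type, x, y, pa, pb, rg); "___1__" only constrains the pa bit.
print(matches("___1__", "000100"))   # True
print(matches("____1_", "000100"))   # False
print(len(instances("___1__")))      # 32
```

A binary rule such as the one for clause (1) therefore abbreviates 32 × 32 concrete left-hand sides, which is why the text manipulates patterns rather than individual rules.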
One may notice that such function clauses do not need to be taken into account more than once. In fact, only the projection clauses need a fix-point operator to reach the final tree automaton.

Let us try to apply clause (8) according to the second inference rule. The only reachable states are 100000, 000100 and 000010, so the automaton is unchanged.

Let us now consider clause (9). The only modified rule is a → 000100: its left-hand side states are reachable and run(fp(000100, 000100)) ∈ F_type. So this transition rule is replaced by a → 010100.

We may consider clause (8) again: as in the previous step, the rule a → 010100 has to be modified (since run(fq(010100)) ∈ F_type) into a → 011100. It is easy to see that the second inference rule applied for clause (9) keeps the automaton unchanged.

Let us apply the second inference rule for clause (8) once more. The transition rule g(011100, 011100) → 000001 has to be modified, since 011100 is a reachable state and run(fq(g(011100, 011100))) ∈ F_type. So this rule is replaced with g(011100, 011100) → 001001.

After a few similar inference steps, a fix-point is reached and the final automaton is:

fp(___1__, ___1__) → 100000
fp(____1_, ___1__) → 100000
fq(_1____) → 100000
fq(_____1) → 100000
g(011100, 011100) → 001001
g(001001, 001001) → 001001
g(011100, 001001) → 001001
g(001001, 011100) → 001001
g(q, q') → 000001    where q, q' ∈ __1___ \ {011100, 001001}
a → 011100
b → 000010
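The inference steps traced above can be replayed mechanically. The following Python sketch is our own illustration, not the authors' implementation. It encodes clauses (1) to (7) as function clauses and clauses (8) and (9) as projection clauses, represents a state as the set of predicates whose bit is 1, and switches the right-hand side of one reachable rule at a time until no projection clause applies. All names (`analyse`, `base`, `run`, `reachable`, ...) are hypothetical. In this sketch the automaton is complete, so the all-zero state 000000 also shows up as reachable.

```python
from itertools import product

PREDS = ("type", "x", "y", "pa", "pb", "rg")      # tuple order used in the text
ARITY = {"fp": 2, "fq": 1, "g": 2, "a": 0, "b": 0}

# Clauses (1)-(7): (head predicate, function symbol, predicates required per argument).
FUNC = [
    ("type", "fp", ({"pa"}, {"pa"})),
    ("type", "fp", ({"pb"}, {"pa"})),
    ("type", "fq", ({"x"},)),
    ("type", "fq", ({"rg"},)),
    ("rg", "g", ({"y"}, {"y"})),
    ("pa", "a", ()),
    ("pb", "b", ()),
]
# Clauses (8)-(9): head(X) <- type(term), with the term built over the variable X.
PROJ = [("y", ("fq", "X")), ("x", ("fp", "X", "X"))]


def bits(state):
    """Render a state (set of predicates) as a bit string."""
    return "".join("1" if p in state else "0" for p in PREDS)


def base(f, args):
    """Right-hand side given by the function clauses alone."""
    return frozenset(
        h for h, g, req in FUNC
        if g == f and all(r <= a for r, a in zip(req, args))
    )


def analyse():
    table = {}                                    # (symbol, argument states) -> state

    def lookup(f, args):
        return table.setdefault((f, args), base(f, args))

    def run(term, q):
        """Evaluate `term` bottom-up with the variable X mapped to state q."""
        if term == "X":
            return q
        f, *subs = term
        return lookup(f, tuple(run(s, q) for s in subs))

    def reachable():
        """Forward chaining from the constants."""
        states, grew = set(), True
        while grew:
            grew = False
            for f, n in ARITY.items():
                for args in product(list(states), repeat=n):
                    q = lookup(f, args)
                    if q not in states:
                        states.add(q)
                        grew = True
        return states

    def step():
        """Switch one bit of one reachable rule, if a projection clause applies."""
        states = reachable()
        for (f, args), q in list(table.items()):
            if all(a in states for a in args):
                for p, term in PROJ:
                    if p not in q and "type" in run(term, q):
                        table[(f, args)] = q | {p}
                        return True
        return False

    while step():
        pass
    states = reachable()
    rules = {k: v for k, v in table.items() if all(a in states for a in k[1])}
    return rules, states


rules, states = analyse()
print(bits(rules[("a", ())]))   # 011100
print(bits(rules[("b", ())]))   # 000010
```

Restarting the reachability computation after every single switch is deliberately naive; it keeps the sketch close to the step-by-step trace rather than to the cost analysis of the previous section.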
The language recognized by this computed automaton is exactly the least model of the input quasi-automaton program, that is, the language generated by the following tree grammar (see footnote 6):

Type ⇒ fp(A, A) | fp(B, A) | fq(A) | fq(N)
N ⇒ g(A, A) | g(A, N) | g(N, A) | g(N, N)
A ⇒ a
B ⇒ b

By replacing the function symbols fp and fq with the predicate symbols p and q of the original program that they represent, we obtain a grammatical presentation of the least set-based model of the original program. The reader can check that if a ground atom is a logical consequence of the program given in Figure 3, then it is generated by this grammar.
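Membership in the language of this grammar is easy to test, which also illustrates that the set-based model is an approximation from above. The Python sketch below is only an illustration under our own encoding: terms are nested tuples, the dictionary `GRAMMAR` transcribes the four productions, and the function name `generates` is hypothetical.

```python
# Productions of the tree grammar: nonterminal -> list of (symbol, child nonterminals...).
GRAMMAR = {
    "Type": [("fp", "A", "A"), ("fp", "B", "A"), ("fq", "A"), ("fq", "N")],
    "N": [("g", "A", "A"), ("g", "A", "N"), ("g", "N", "A"), ("g", "N", "N")],
    "A": [("a",)],
    "B": [("b",)],
}


def generates(nonterminal, term):
    """True if `term` (a nested tuple such as ("g", ("a",), ("a",))) derives from `nonterminal`."""
    f, *subs = term
    return any(
        g == f
        and len(nts) == len(subs)
        and all(generates(n, s) for n, s in zip(nts, subs))
        for g, *nts in GRAMMAR[nonterminal]
    )


a, b = ("a",), ("b",)
gaa = ("g", a, a)

print(generates("Type", ("fp", b, a)))            # p(b, a): True
print(generates("Type", ("fq", gaa)))             # q(g(a, a)): True
print(generates("Type", ("fq", ("g", a, gaa))))   # q(g(a, g(a, a))): True
print(generates("Type", ("fp", a, b)))            # p(a, b): False
```

The atom q(g(a, g(a, a))) is generated although it is not a logical consequence of the program in Figure 3, where the clause q(g(Y, Y)) ← q(Y) requires both arguments of g to be the same term. The set-based analysis forgets this dependency between the two occurrences of Y.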
6 Remark: Bottom-up vs Top-down

One should notice that the least set-based model corresponds to the bottom-up approximate semantics described in [Hei92b]. That reference also describes a top-down (i.e. goal-directed) approximate semantics. However, the idea of this method is rather close to the one developed in [GdW92] and [GdW94]. The latter is based on a program transformation called the Magic-set Transformation. This technique aims to simulate, with a bottom-up algorithm, the top-down behavior of a (goal-directed) resolution. Therefore, performing a top-down approximate computation for a program and a goal is closely related to a bottom-up computation on the magic-set transformation of the two.
7 Conclusion

We have proposed a new method for computing the least set-based model of a logic program. It is based on type programs, introduced in [FSVY91], and on tree automata. We claim that this method is simpler than the original presentation [HJ90] and gives an equivalent representation of the solution, unlike [FSVY91]. The exact time complexity of this method is proved. It appears that this complexity is similar to that of the (less accurate) method based on set constraints. We think that, as in the case of the method using set constraints, this algorithm, although intractable in theory, could be implemented quite efficiently. Finally, considering the Magic-set transformation, this method is suitable for top-down approximation as well.
6 In fact, this grammar is an easy translation of the automaton where reachable states are replaced with nonterminal symbols.

References

[CC77] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th ACM Principles of Programming Languages Conference, pages 238–252, 1977.
[CC92a] P. Cousot and R. Cousot. Abstract interpretation and application to logic programs. Journal of Logic Programming, 1992.
[CC92b] P. Cousot and R. Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation. In Proc. PLILP'92. Springer-Verlag, 1992.
[CC95] P. Cousot and R. Cousot. Formal language, grammar and set-constraint-based program analysis by abstract interpretation. In Conference Record of FPCA'95 Conference on Functional Programming Languages and Computer Architecture, pages 170–181, 1995.
[CH92] B. Le Charlier and P. Van Hentenryck. Experimental evaluation of a generic abstract interpretation algorithm for Prolog. In Proc. IEEE ICCL 1992, pages 138–146, 1992.
[Deu91] A. Deutsch. An operational model of strictness properties and its abstraction. In Proc. 1991 Glasgow University Functional Programming Workshop, pages 82–99, 1991.
[FSVY91] T. Frühwirth, E. Shapiro, M.Y. Vardi, and E. Yardeni. Logic programs as types for logic programs. In Proceedings of the 6th IEEE-LICS, pages 300–309, jun 1991.
[GdW92] J. Gallagher and D.A. de Waal. Regular approximations of logic programs and their uses. Technical Report CSTR-92-06, University of Bristol, mar 1992.
[GdW94] J. Gallagher and D.A. de Waal. Fast and precise regular approximations of logic programs. In Proceedings of the 11th Int. Conf. on Logic Programming, pages 599–613. MIT Press, 1994.
[GS84] F. Gécseg and M. Steinby. Tree Automata. Akadémiai Kiadó, Budapest, 1984.
[Hei92a] N. Heintze. Practical aspects of set based analysis. In Proceedings of the International Joint Conference and Symposium on Logic Programming, nov 1992.
[Hei92b] N. Heintze. Set Based Program Analysis. PhD thesis, Carnegie Mellon University, sep 1992.
[Hei94] N. Heintze. Set-based analysis of ML programs. In Lisp and Functional Programming, pages 306–317. ACM, 1994.
[HJ90] N. Heintze and J. Jaffar. A finite presentation theorem for approximating logic programs. In Proceedings of the 17th ACM-POPL, pages 197–209, jan 1990.
[HJ92] N. Heintze and J. Jaffar. An engine for logic program analysis. In Proceedings of the 7th IEEE-LICS, pages 318–328, jun 1992.
[Llo87] J. Lloyd. Foundations of Logic Programming. Springer-Verlag, 1987.
[Mis84] P. Mishra. Toward a theory of types in prolog. In Proceedings of the 1st IEEE Symposium on Logic Programming, pages 289–298, jul 1984.
[MO84] A. Mycroft and R.A. O'Keefe. A polymorphic type system for prolog. Artificial Intelligence, 23:295–307, 1984.
[MR85] P. Mishra and U. Reddy. Declaration-free type checking. In Proceedings of the 12th Annual ACM Symposium on the Principles of Programming Languages, pages 7–21, 1985.
[XW88] J. Xu and D.S. Warren. A type inference system for prolog. In Proceedings of the 5th Int. Conf. and Symp. on Logic Programming, pages 604–619. MIT Press, jul 1988.
[Zob87] J. Zobel. Derivation of polymorphic types for prolog programs. In Proceedings of the 4th Int. Conf. on Logic Programming, 1987.