Language Extensions for Semantic Integration of Deductive Databases *

P. Asirelli (1), C. Renso (2), F. Turini (2)

(1) IEI-CNR, Via S. Maria - Pisa, Italy. E-mail: [email protected]
(2) Dipartimento di Informatica, Corso Italia, 40 - 56125 Pisa, Italy. E-mail: renso, [email protected]

Abstract. A language in support of semantic integration of deductive databases is proposed. The language allows one to construct mediators by extending logic programming with a suite of operators for composing programs and with message passing features. The abstract semantics and implementation techniques of the extensions are discussed, and an example of integration of databases supporting libraries and departments is used to illustrate the usefulness of the approach.
1 Introduction

At present, in the database area, much attention is devoted to the integration of different databases, typically databases that were developed within separate projects and that may reside at different sites [19]. This issue has been studied extensively over the past decades, giving rise to approaches such as federated databases, multidatabases, interoperable databases and mediators. Here, we focus on the integration of deductive databases. The problem is twofold:

- at the lower level, how to enrich traditional databases with deductive capabilities;
- at the higher level, how to integrate deductive databases which have the same semantic domain, but present different local models.
In the first case, one would like to integrate deductive database languages (such as Datalog) with traditional databases. A successful approach to this problem was implemented, for example, in the LDL++ language [3]. Our proposal does not focus on this issue, although we intend to make use of such techniques. Our aim is to study the second issue, that is, the problem of semantic integration of deductive databases.

To illustrate the idea, we show a very simple case. Consider two travel agencies, each having a database which contains information about accommodations. They have the same semantic domain (accommodations) but they use different data representations, as we can see in the following example:
* Work partially supported by the EC-US Cooperative Activity Project and by CNR, grant #96.00069.02.
Agency A

    lodging(excelsior, 4-star, rome, hotel, 500.000, 06/555175)
    lodging(holiday-inn, 4-star, florence, hotel, 600.000, 055/555002)
    lodging(solaria, 4-star, marilleva, apartments, 500.000, 0437/555789)
    ...

Agency B

    hotel(miramonti, cortina, 600.000)
    apartment(les_alpes, m.d.campiglio, 1.000.000)
    ...
In the first database the information is represented by means of the lodging predicate, where the first argument is the name of the accommodation, the second is the classification, the third is the location, the fourth is the kind of accommodation, and the last two arguments are the price in Italian lire (per night for hotels and per week for apartments) and the telephone number, respectively. The database of agency B represents accommodations in a different way. Each hotel is represented by the hotel predicate with three arguments: the name, the location and the price (in Italian lire). Apartments are represented analogously with the apartment predicate. This is an example of schematic discrepancy, where the values of one database correspond to metadata (relations) in the other [12].

In an integrated environment, the user would like to access the information of both databases in a uniform way. For example, we would like to represent, somehow, the equivalence between the information on hotels of Agency A and Agency B, so that a user can ask both databases for information about hotels without having to know the local data representation of each agency. On the other hand, we do not want to change the local databases. We aim at defining a middle layer between the data sources and the user level. This middle layer corresponds to the concept of mediator (due to Wiederhold [23]), which can be regarded as a framework to perform semantic integration over multiple data sources and reasoning systems. Our approach aims at the definition of a mediator language for the semantic integration of different conceptual schemata.

In the following, we refer to [21], where interoperability means making information systems work together. Interoperability can be intended at two levels of abstraction: syntactic (low level) and semantic (high level). At the syntactic level the "meaning" of the terms is irrelevant. Examples of this kind of interoperation are different Unix systems which work together, or cooperating SQL systems of different vendors.
Semantic interoperation means integrating systems at a level which involves shared meaning, or semantics, of terms. In other words, it is the process of specifying methods to resolve conflicts, pool information together, and define new compositional operations, based on existing operations, in the individual data sources. Ullman in [21] summarizes semantic interoperability in three broad approaches, corresponding to different architectures: Point-to-point, Facilitator and Mediator.
The logic language we have in mind is an extension of Logic Programming with mechanisms to combine deductive databases according to the Mediator model. In particular, we enrich the language of Horn clauses with meta-features which make it suitable for interoperability purposes [12]. We give a formal declarative semantics of these mechanisms, based on the immediate consequence operator.

The plan of the paper is as follows. Section 2 introduces the syntax of the language and provides an intuitive explanation of the composition operators. Section 3 motivates our work by showing, as an example, the formalization of the problem of managing a library. In Section 4 we give the denotational semantics via an extension of the immediate consequence operator. Section 5 shows the inference rules that describe the operational semantics of the language. Finally, in Section 6 we compare our language with similar approaches in the literature and we discuss future work.
2 The Language

We consider a set of meta-level operations for composing definite logic programs, originally introduced in [6, 8, 2]: Union ($\cup$), Intersection ($\cap$), and Constraint ($/$). These operations define the following language of program expressions:

$$ Pexp ::= Program \mid Pexp \cup Pexp \mid Pexp \cap Pexp \mid Pexp / Program $$

where Program is a named collection of clauses. Each set of clauses (program) is associated with a unique name by means of a global naming mechanism. In the sequel we will abuse the notation and use a program identifier to directly denote the set of clauses associated with it.

The language of program expressions has been largely investigated both from the theoretical and from the application point of view (e.g. in [2, 6, 8, 10, 15]). In all these papers the language of program expressions has been employed as a meta-language for composing object programs written in a separate language, namely the language of definite programs. Other operators were introduced. For example, an import operator $\triangleleft$ was introduced in [8] in order to handle modularization problems, and we expect that it may also turn out to be useful in the context of mediator-based languages.

Here, we consider an extended object language in which program expressions may occur in object level programs. Namely, programs are extended definite programs that may contain meta-level calls to program expressions in clause bodies. More precisely, a program is a finite set of extended definite clauses of the form

$$ A \leftarrow B_1, \ldots, B_n $$

where each $B_i$ is either an atomic formula or a meta-level formula of the form "C in Pexp", where C is an atomic formula and Pexp a program expression. A goal like "C in Pexp" introduces a form of message passing between object level programs. The idea is that the program containing the goal "C in Pexp" sends the message C to the "virtual" program denoted by "Pexp". As usual, logical variables act as input/output channels between programs.

We assume that the language in which programs are written is fixed. Namely, there is a fixed set of function and predicate symbols that include all function and predicate symbols used in the programs being considered. Moreover, program names and program composition operations are disjoint from all other constant and function symbols that may occur in programs. Notice also that in all expressions of the form "A in Pexp" occurring in the clause bodies of programs, the program expression Pexp is ground (i.e. it does not contain variables) by definition of program expression itself.

Below we will provide the language with a denotational semantics based on an extended immediate consequence operator. We will also describe an operational top-down semantics via an extension of SLD-resolution, and we will hint at methods for using the denotational semantics as a basis for an efficient bottom-up operational semantics based on techniques like the semi-naive computation strategy.

Here we give the operators an informal semantics by means of examples. Given a program expression E, we show a plain logic program that behaves as the program expression, i.e. it provides the same answers to the same queries, whatever operational semantics is in use. Such a transformational approach, thoroughly described in [7, 2], is not practical in a database context, but it may be useful for an intuitive understanding of program expressions. Consider the following plain programs

    P:  arc(a, b)
        arc(b, c)
        arc(c, b)

    Q:  arc(X, Y) ← arc(X, Z), arc(Z, Y)
P describes a graph. Q axiomatizes a general property of graphs. $P \cup Q$ behaves as a plain program containing the clauses of P and the clauses of Q. As the example shows, union may be used to factor knowledge into different modules.

Intersection combines knowledge by merging clauses with unifiable heads into clauses whose body is the conjunction of the bodies of the original clauses. The net effect is that the two plain programs act as sets of constraints upon each other. Consider the example

    P:  likes(X, Y) ← sweet(Y)
        hates(X, Y) ← bitter(Y)

    Q:  likes(bob, Y) ← sour(Y)

Then the program expression $P \cap Q$ behaves as the plain logic program

    likes(bob, Y) ← sweet(Y), sour(Y)
Notice that $P \cap Q$ does not say anything about hates, since nothing about hates is deducible from Q.

The constraint operator combines the features of union, intersection and a simple form of negation to provide an asymmetric composition between a program P and a program Q. Q acts as a set of constraints for P, as illustrated by the following example. Consider

    P:  likes(X, Y) ← blond_hair(Y)
        green_eyes(mary)
        blond_hair(mary)
        blond_hair(susan)

    Q:  likes(john, Y) ← green_eyes(Y)

The following plain program behaves as the program expression $P / Q$.

    likes(X, Y) ← X ≠ john, blond_hair(Y)
    likes(john, Y) ← green_eyes(Y), blond_hair(Y)
    green_eyes(mary)
    blond_hair(mary)
    blond_hair(susan)

Notice that the constraint is applied only to john, while the general knowledge about the rest of the people is retained.
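The transformational reading of intersection can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the construction of [7, 2]: atoms are flat tuples (Datalog style, no function symbols), variables are capitalised strings, and the helper names (`unify`, `rename`, `intersect`) are hypothetical. Union, under the same representation, is simply the concatenation of the two clause lists.

```python
import itertools

def is_var(t):
    # Assumed convention: variables are capitalised strings, constants are not.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s until an unbound term is reached.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Unify two flat atoms (pred, arg1, ..., argn); return a substitution or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s

def apply(atom, s):
    return (atom[0],) + tuple(walk(t, s) for t in atom[1:])

def rename(clause, suffix):
    """Rename the variables of a clause apart by appending a suffix."""
    def ren(a):
        return (a[0],) + tuple(t + suffix if is_var(t) else t for t in a[1:])
    head, body = clause
    return ren(head), [ren(b) for b in body]

def intersect(p, q):
    """Plain program behaving as the intersection of p and q: merge clauses with unifiable heads."""
    merged = []
    for c1, c2 in itertools.product(p, q):
        (h1, b1), (h2, b2) = rename(c1, "1"), rename(c2, "2")
        s = unify(h1, h2, {})
        if s is not None:
            merged.append((apply(h1, s), [apply(b, s) for b in b1 + b2]))
    return merged

P = [(("likes", "X", "Y"), [("sweet", "Y")]),
     (("hates", "X", "Y"), [("bitter", "Y")])]
Q = [(("likes", "bob", "Y"), [("sour", "Y")])]
P_AND_Q = intersect(P, Q)
```

On the example above, `P_AND_Q` contains a single clause for likes with bob as first argument and the conjunction of sweet and sour as body (up to variable renaming), and no clause for hates.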
3 A Motivating Example

In the following we show, as an example, a typical situation in which a library lends books to people of various departments. Each department has a database that contains information about its employees, and rules that state the local criteria for authorizing the loan of books from the library. The database of the library catalogues the publications; it contains information on users that have lost or destroyed books in the past, and rules that establish which kinds of publications are loanable to users. The agreements between the library and the departments state that only authorized people can borrow publications from the library, and that every employee is allowed to read the publications on the premises. Furthermore, special agreements between the library and each department are possible in order to grant longer periods of loan.

We propose a formalization of this problem that uses deductive databases for the representation of conceptual schemata, and mediators to allow the user to access, in a uniform way, the knowledge encoded in the various databases. Mediators are built using the language informally introduced in the previous section.
3.1 Architecture

The general architecture we refer to mirrors the architecture proposed by Ullman et al. in [18], where they propose the use of wrappers between the information sources and the mediation layer. A wrapper can be defined as a component that converts data from each source to a common model and provides a common query language for extracting information. A mediator combines, refines and integrates data from wrappers, providing applications with a "cleaner" view. The key point here is that wrapping does not affect the local models of the databases. It just builds a layer of information above the local databases.

In the following, we show the proposed architecture, the entity-relationship schemata and the source code of the deductive databases, the common model obtained from wrappers, and some examples of mediators which use the composition operators to combine data from wrappers. We will see how the message passing mechanism can be used to implement wrappers and mediators.

[Figure 1: Architecture of the system. From top to bottom: the user; the mediators layer (Med-2, Loan, Med-1); the wrapping layer (W-Department, W-IEI, W-University, W-Agreement, W-Library); the database sources (Dep. of CS, IEI, University, Agreement, Library).]

The deductive databases represent the conceptual schemata of the departments and of the library. The department of Computer Science, IEI (an Institute of the National Research Council), and the university have similar information but different ways of representing data. In other words, they have the same semantic domain, but different local schemata. Here the task of the wrapping phase is twofold. On the one hand, it translates the different models into a unique common model (which, in general, will be a superset of the various models), so that the wrapped version of each database represents its own information by means of a common model. On the other hand, wrapping transforms pieces of database information into a more compact form (as in the case of the Library database).

In the mediator layer, we define two mediators and a set of constraint rules (Loan). The user can query Med-2, which exploits information from Med-1, the wrapped databases and the constraint rules.
3.2 Entity-Relationship Diagrams

Here, we present the design of the system, showing the local schemata of the databases and the schemata obtained from wrapping. In the schema integration phase each local model gives its contribution to the common model. We explain this phase by means of an entity-relationship formalism, showing the diagrams both of the local models and of the models obtained by wrapping.

The department of Computer Science (fig. 2) represents people by means of four entities: student, PhD student, researcher, professor. Each entity is in relation with the entity authorized, which represents people that are allowed to borrow books from the library. Here the criterion is the following: each PhD student, researcher and professor is authorized to borrow books, but not all the students are. In the IEI Institute (fig. 3), people are represented by means of the researcher entity. Each researcher is authorized to borrow books from the library. In the University database (fig. 4), people are represented by means of the person entity, with first name, last name, the department they belong to and the position in the department as attributes.

[Figure 2: Department of Computer Science local schema. Entities student (FN, LN, year), phd (FN, LN, year, e-mail), researcher (FN, LN, e-mail) and professor (FN, LN, e-mail), related through authorize to the entity authorized (period).]

[Figure 3: IEI local schema. Entity researcher (FN, LN, e-mail), related through authorize to allow_loan (period).]

[Figure 4: University local schema. Entity person (FN, LN, Dept, position), related through authorize to authorized for loan (period).]
The Library database (fig. 5) states that only books and journals are loanable publications. Each entity is identified by attributes such as title, author, volume and a special flag which represents the real availability of that publication in the library. Notice that proceedings of conferences are always available since they are not loanable. Furthermore, the library has a list of bad users, i.e. users who lost books in the past. The Agreements database (fig. 6) contains information about special agreements the library may have established with some departments. This situation is represented by means of the special-agreement entity, whose attributes specify the longer period of loan for a given department.

[Figure 5: Library local schema. Entities book (title, author, avail), journal (title, volume, number, avail), proceedings (title, year) and bad_user (FN, LN); books and journals are related to loanable.]

[Figure 6: Agreements local schema. Entity special agreement (Dept, Period).]

Since the Department of CS, the University and IEI have the same semantic domain, we choose a common model that includes all the information of the local schemata we are interested in. In this example we represent people by means of first and last name, department, position and year, while we ignore e-mail. The diagram of the common model of the Department of CS, IEI and University is in figure 7. In the wrapped version of the Library (fig. 8), we represent each publication by means of the publication entity. It has attributes such as the type of publication (book, journal, ...), title, author, volume. Again, we have the list of bad users.
[Figure 7: Common Model. Entity person (FN, LN, position, year, Dept), related through authorize to authorized for loan (period).]

[Figure 8: Library wrapped schema. Entity publication (publ, title, author, volume, number, year, avail), related to loanable, and entity bad_user (FN, LN).]
3.3 Source Databases

In the following, we present the deductive databases of the departments and of the library. In the departments the authorization entity is defined by means of rules. In the Library database the loanable predicate establishes the policy of the library: for example, it is possible to borrow only books and journals, and not proceedings. The Agreements database contains information about particular conditions, that is, it associates the longer loan period with each department that has a special agreement with the library.
Department of CS

    student(gianni, rossi, 3)
    student(davide, verdi, 2)
    ...
    phd(maria, gialli, 3, maria@cs)
    researcher(giuseppe, bianchi, giuseppe@cs)
    professor(michele, neri, michele@cs)
    ...
    authorized(FN, LN, 3months) ← professor(FN, LN, _)
    authorized(FN, LN, 1month) ← researcher(FN, LN, _)
    authorized(FN, LN, 1month) ← phd(FN, LN, _, _)
    authorized(FN, LN, 1week) ← student(FN, LN, Year), Year ≥ 3
IEI

    researcher(tommaso, rosi, [email protected])
    researcher(luca, rossi, [email protected])
    ...
    allow_loan(FN, LN, 3months) ← researcher(FN, LN, _)
University

    person(maria, gialli, cs, phd)
    person(susanna, monti, economy, professor)
    person(simone, verdi, literature, student)
    ...
    authorized_for_loan(FN, LN, 1month) ← person(FN, LN, math, _)
    authorized_for_loan(FN, LN, 1week) ← person(FN, LN, economy, professor)
    authorized_for_loan(FN, LN, 1day) ← person(FN, LN, literature, professor)
Library

    book(sterling-shapiro, the-art-of-prolog, yes)
    book(ullman, database-and-knowledge-base-systems, yes)
    journal(communications-ACM, 10, 1, no)
    journal(communications-ACM, 10, 2, yes)
    journal(ieee-computer, 20, 1, no)
    journal(ieee-computer, 20, 2, yes)
    proceedings(gulp, 1995)
    proceedings(vldb, 1994)
    ...
    loanable(Author, Title, nil, nil) ← book(Author, Title, Avail)
    loanable(nil, Title, Vol, Num) ← journal(Title, Vol, Num, Avail)
    bad_user(gianni, rossi)
    ...
Agreements

    special_agreement(cs, 2months)
    special_agreement(math, 1month)
    ...
Translated Databases
We choose a common model for databases that represent "similar" information by using different local schemata. Here, we represent people by the person predicate and the authorization by the authorized_for_loan predicate. The translation into a common model is done in a "dynamic" way by means of the message passing mechanism. We denote the wrapped databases with the "W" prefix. In this phase we deal with some kind of semantic heterogeneity. In the W-Department of CS, W-IEI and W-Library databases we solve a problem of semantic discrepancy between the local schemata and the chosen wrapped schemata. In the W-University database we just rearrange the attributes of the entity by selecting the attributes that represent information useful for mediators.
W-Department of CS

    person(FN, LN, student, Year, cs) ← student(FN, LN, Year) in Department of CS
    person(FN, LN, professor, nil, cs) ← professor(FN, LN, Email) in Department of CS
    person(FN, LN, researcher, nil, cs) ← researcher(FN, LN, Email) in Department of CS
    person(FN, LN, phd, Year, cs) ← phd(FN, LN, Year, Email) in Department of CS
    authorized_for_loan(FN, LN, Period) ← authorized(FN, LN, Period) in Department of CS
W-IEI

    person(FN, LN, researcher, nil, iei) ← researcher(FN, LN, Email) in IEI
    authorized_for_loan(FN, LN, Period) ← allow_loan(FN, LN, Period) in IEI
W-University

    person(FN, LN, Position, nil, Dept) ← person(FN, LN, Dept, Position) in University
    authorized_for_loan(FN, LN, Period) ← authorized_for_loan(FN, LN, Period) in University
W-Library

    publication(book, Author, Title, nil, nil, nil, Avail) ← book(Author, Title, Avail) in Library
    publication(journal, nil, Title, Vol, Num, nil, Avail) ← journal(Title, Vol, Num, Avail) in Library
    publication(proceedings, nil, Title, nil, nil, Year, yes) ← proceedings(Title, Year) in Library
    loanable(Author, Title, Vol, Num) ← loanable(Author, Title, Vol, Num) in Library
    bad_user(FN, LN) ← bad_user(FN, LN) in Library
W-Agreements

    special_agreement(Dept, Period) ← special_agreement(Dept, Period) in Agreements
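The effect of the wrapping rules can be mimicked outside the logic language. The following Python sketch is only an illustration under our own assumptions: source relations are stored as lists of tuples, `None` stands for the constant nil, the phd relation is read with its four source arguments, and the function names are hypothetical. Each generator plays the role of one wrapped database, and the set union at the end corresponds to asking for person in W-Department of CS ∪ W-University.

```python
# Source databases as plain tuples (a subset of the example facts).
DEPARTMENT_OF_CS = {
    "student": [("gianni", "rossi", 3), ("davide", "verdi", 2)],
    "phd": [("maria", "gialli", 3, "maria@cs")],
    "researcher": [("giuseppe", "bianchi", "giuseppe@cs")],
    "professor": [("michele", "neri", "michele@cs")],
}
UNIVERSITY = {
    "person": [
        ("maria", "gialli", "cs", "phd"),
        ("susanna", "monti", "economy", "professor"),
        ("simone", "verdi", "literature", "student"),
    ],
}

NIL = None  # stands for the constant nil of the common model

def w_department_of_cs(db):
    """Yield person(FN, LN, Position, Year, Dept) tuples, one loop per wrapper rule."""
    for fn, ln, year in db["student"]:
        yield (fn, ln, "student", year, "cs")
    for fn, ln, _email in db["professor"]:
        yield (fn, ln, "professor", NIL, "cs")
    for fn, ln, _email in db["researcher"]:
        yield (fn, ln, "researcher", NIL, "cs")
    for fn, ln, year, _email in db["phd"]:
        yield (fn, ln, "phd", year, "cs")

def w_university(db):
    """Rearrange the attributes of the University person relation into the common model."""
    for fn, ln, dept, position in db["person"]:
        yield (fn, ln, position, NIL, dept)

def persons():
    """Answers to person/5 over the union of the two wrapped databases."""
    return set(w_department_of_cs(DEPARTMENT_OF_CS)) | set(w_university(UNIVERSITY))
```

Note that the source databases are never modified: the wrappers only compute a view, which is the point made above about wrapping not affecting the local models.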
Mediators
Med-1 states the general loan regulation on the basis of information about publications that are in the Library database and by checking the authorization in all the departments. Loan contains a restricting rule which defines the loan predicate, stating that the loan is possible only if a person is not a bad user and if the publication is really available in the library.

In Med-2 we find two rules. The first one deals with the loan of publications. The rule refines the information obtained by Med-1 by means of the constraint rule in Loan and by checking for special agreements. It replies to the user with the period for which the person is allowed to borrow the book. The other rule is concerned with the use of a publication on the premises. Since the consultation of a publication on the premises does not require any authorization, all employees of the departments are allowed to consult publications if they are really available. The user can query Med-2 in order to borrow a given publication or to use it on the premises. Notice that both mediators refer to the wrapped version of the databases in order to apply the operators correctly.

Finally, a short remark on negation: in this example we use negation in the constraint rule although the abstract semantics is defined for positive programs. However, the negation used here is a shorthand for a check on a list of extensional atoms.
Med-1

    loan(FN, LN, Author, Title, Vol, Num, Period) ←
        loanable(Author, Title, Vol, Num) in W-Library,
        authorized_for_loan(FN, LN, Period) in W-IEI ∪ W-Department of CS ∪ W-University
Loan

    loan(FN, LN, Author, Title, Vol, Num, Period) ←
        not bad_user(FN, LN) in W-Library,
        publication(_, Author, Title, Vol, Num, _, yes) in W-Library
Med-2

    loan_to_user(FN, LN, Author, Title, Vol, Num, Period) ←
        loan(FN, LN, Author, Title, Vol, Num, Period1) in Med-1 / Loan,
        person(FN, LN, Position, Year, Dept) in W-IEI ∪ W-Department of CS ∪ W-University,
        special_agreement(Dept, Period2) in W-Agreements,
        max_period(Period1, Period2, Period)
    use(FN, LN, Author, Title, Vol, Num) ←
        person(FN, LN, Position, Year, Dept) in W-IEI ∪ W-Department of CS ∪ W-University,
        publication(_, Author, Title, Vol, Num, _, yes) in W-Library

Notice that in Loan the variable Period occurs in the head of the rule for loan, but it does not occur in the body. However, the clause is a constraint that will never be evaluated as it is. Later, the clause will be used in the constraint construction, and this apparent anomaly will disappear.
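To make the first rule of Med-2 concrete, the sketch below evaluates it in Python over a handful of wrapped facts. It is an illustration under several assumptions of ours, not part of the proposal: the goal on Med-1 constrained by Loan is read as the conjunction of the two rule bodies for loan (which is how the intersection part of the constraint operator treats clauses with unifiable heads); the table `PERIOD_DAYS` and the definition of `max_period` are hypothetical, since the example leaves max_period unspecified; gianni rossi is assumed to be authorized so that the bad-user check is exercised; `None` stands for nil.

```python
# Hypothetical ordering of loan periods; max_period is not defined in the example.
PERIOD_DAYS = {"1day": 1, "1week": 7, "1month": 30, "2months": 60, "3months": 90}

def max_period(p1, p2):
    """Return the longer of two symbolic loan periods."""
    return max(p1, p2, key=PERIOD_DAYS.get)

# Wrapped facts (hand-picked subset).
loanable = {
    ("sterling-shapiro", "the-art-of-prolog", None, None),
    (None, "communications-ACM", 10, 1),
}
available = {("sterling-shapiro", "the-art-of-prolog", None, None)}  # Avail = yes
authorized_for_loan = {
    ("michele", "neri", "3months"),
    ("maria", "gialli", "1month"),
    ("gianni", "rossi", "1week"),  # assumed authorized, to exercise the bad_user check
}
department = {("michele", "neri"): "cs", ("maria", "gialli"): "cs", ("gianni", "rossi"): "cs"}
bad_user = {("gianni", "rossi")}
special_agreement = {"cs": "2months", "math": "1month"}

def med1():
    """loan/7 as defined by Med-1: loanable publications joined with authorizations."""
    return {(fn, ln, a, t, v, n, p)
            for (a, t, v, n) in loanable
            for (fn, ln, p) in authorized_for_loan}

def loan_constraint(fn, ln, a, t, v, n):
    """Body of the rule in Loan: not a bad user, and the publication is available."""
    return (fn, ln) not in bad_user and (a, t, v, n) in available

def loan_to_user():
    """First rule of Med-2, evaluated over the facts above."""
    answers = set()
    for fn, ln, a, t, v, n, p1 in med1():
        if not loan_constraint(fn, ln, a, t, v, n):
            continue
        dept = department.get((fn, ln))
        if dept in special_agreement:
            answers.add((fn, ln, a, t, v, n, max_period(p1, special_agreement[dept])))
    return answers
```

With these facts, the professor keeps the three-month period granted by the department, the PhD student benefits from the two-month special agreement, and neither the bad user nor the unavailable journal issue appears in any answer.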
4 Denotational Semantics

We now define the denotational semantics of the language in a bottom-up style by extending the standard immediate consequence operator ($T_P$) semantics for logic programs. The semantics we are going to give does not include the definition of the "message passing" mechanism, since the introduction of the "in" feature makes the bottom-up definition much more complex. The reader can refer to [4, 9] for the formal definition.
Recall that, for a definite logic program P, the immediate consequence operator $T_P$ is a continuous mapping over Herbrand interpretations defined as follows [22]. For any Herbrand interpretation I:

$$ A \in T_P(I) \iff (\exists B : A \leftarrow B \in ground(P) \wedge B \subseteq I) $$

where B is a (possibly empty) conjunction of atoms and ground(P) denotes the ground (i.e. fully instantiated) version of program P. Such an approach is motivated by the observation that the classical least model semantics is not compositional: it is not possible to obtain the least model of, say, the union of two programs P and Q by homomorphically composing the least models of P and Q. In [6] it is shown that the $T_P$-based semantics is in fact both compositional and fully abstract with respect to the repertoire of composition operations adopted in this paper.

In the definition of the constraint operator we will use the notation $E(F)$, with E a program expression and F a program. The intuitive meaning is that $E(F)$ contains all the clauses of E which define predicate names that also have a definition in F. In the following we give the formal definition.
Definition 1. We give the definition of $E(F)$ by induction on the structure of E.

- If E is a plain program, then $E(F)$ contains the clauses of E defining predicates that have a definition in F;
- if $E = R \cup S$ then $E(F) = R(F) \cup S(F)$;
- if $E = R \cap S$ then $E(F) = R(F) \cap S(F)$;
- if $E = R / S$ then $E(F) = R(F) / S$.

Finally, we denote by $E(F)^{\perp}$ all the clauses of E which do not belong to $E(F)$.
Definition 2. The semantics of program expressions is given as follows:

$$ T_{E \cup F}(I) = T_E(I) \cup T_F(I) $$

The set of immediate consequences of the union of two program expressions is the set-theoretic union of the immediate consequences of the two expressions. (The symbol $\cup$ is used to denote both the program composition operation and the set-theoretic union.)

$$ T_{E \cap F}(I) = T_E(I) \cap T_F(I) $$

Analogously, the set of immediate consequences of the intersection of two program expressions is the set-theoretic intersection of the immediate consequences of the two expressions.

$$ T_{E / F}(I) = T_{E \cap F}(I) \cup T_{E(F)^{\perp}}(I) \cup \left( T_{E(F)}(I) \cap T_{(F,E)^c}(I) \right) $$
The immediate consequences of a program expression E constrained by a program F are obtained by combining the union and intersection operators with a kind of complement of a program w.r.t. a program expression. Informally, consider the case in which E is a plain program constrained by a set of clauses F. The resulting program is obtained by the union of two parts. One is the intersection of the two programs, which forces them to agree during the deduction. But intersection alone is not enough, because some clauses would be missing in the result. In particular, we miss all the clauses for predicates which are defined in E and not constrained by F. These predicates are of two kinds: the ones which do not have a definition in F, and those which have a definition in F that constrains only a subset of the atoms potentially derivable in E.

The complement of a program F w.r.t. a program expression E is defined as follows:

$$ T_{(F,E)^c}(I) = B_{E(F)} \setminus T_F(B_E) $$

Here we denote by $B_E$ the Herbrand base of the program E and by $B_{E(F)}$ the set of atoms of the Herbrand base of E whose predicate names have a definition in the program F. Therefore, the immediate consequences of the complement of a program F w.r.t. a program expression E are obtained as the set difference of $B_{E(F)}$ and the heads of F instantiated over the Herbrand base of E.
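Definition 2 can be tried out on small function-free programs. The Python sketch below is an illustration under our own assumptions, not an implementation technique proposed here: programs are lists of (head, body) pairs with atoms as flat tuples, variables are capitalised strings, grounding is done naively over an explicit finite set of constants, the constraint case is handled only when the constrained expression is a plain program, and all function names are hypothetical. It recomputes the constraint example of Section 2, reading P as the program holding the general clause and the facts, and Q as the clause about john.

```python
from itertools import product

def is_var(t):
    # Assumed convention: variables are capitalised strings.
    return isinstance(t, str) and t[:1].isupper()

def ground(clauses, consts):
    """All ground instances over consts, each as a tuple (head, body_1, ..., body_n)."""
    out = []
    for head, body in clauses:
        names = sorted({t for a in [head] + body for t in a[1:] if is_var(t)})
        for values in product(sorted(consts), repeat=len(names)):
            s = dict(zip(names, values))
            out.append(tuple((a[0],) + tuple(s.get(t, t) for t in a[1:])
                             for a in [head] + body))
    return out

def herbrand_base(clauses, consts):
    """Ground atoms built from the predicates occurring in the clauses."""
    sig = {(a[0], len(a) - 1) for h, b in clauses for a in [h] + b}
    return {(p,) + args for p, n in sig for args in product(sorted(consts), repeat=n)}

def T(e, I, consts):
    """Immediate consequences of expression e w.r.t. interpretation I."""
    if isinstance(e, list):  # plain program
        return {c[0] for c in ground(e, consts) if all(a in I for a in c[1:])}
    op, left, right = e
    if op == "union":
        return T(left, I, consts) | T(right, I, consts)
    if op == "inter":
        return T(left, I, consts) & T(right, I, consts)
    if op == "constr":  # left and right are assumed to be plain programs
        defined = {head[0] for head, _ in right}
        e_f = [c for c in left if c[0][0] in defined]         # E(F)
        e_rest = [c for c in left if c[0][0] not in defined]  # clauses of E outside E(F)
        base = herbrand_base(left, consts)
        complement = {a for a in base if a[0] in defined} - T(right, base, consts)
        return (T(("inter", left, right), I, consts)
                | T(e_rest, I, consts)
                | (T(e_f, I, consts) & complement))
    raise ValueError(op)

def lfp(e, consts):
    """Iterate T from the empty interpretation up to its least fixpoint."""
    I = set()
    while True:
        J = T(e, I, consts)
        if J == I:
            return I
        I = J

P = [(("likes", "X", "Y"), [("blond_hair", "Y")]),
     (("green_eyes", "mary"), []), (("blond_hair", "mary"), []), (("blond_hair", "susan"), [])]
Q = [(("likes", "john", "Y"), [("green_eyes", "Y")])]
MODEL = lfp(("constr", P, Q), {"john", "mary", "susan"})
```

In `MODEL`, john likes only mary (green eyes and blond hair), while mary and susan like both mary and susan, in agreement with the plain program given for P constrained by Q in Section 2.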
5 Operational semantics

We now present an operational semantics for the extended language by means of a set of inference rules. The operational characterization of the language is expressed by directly extending the standard notion of SLD refutation [1]. The standard notion of SLD refutation may be defined by means of inference rules of the form

$$ \frac{Premise}{Conclusion} $$

with the meaning that Conclusion holds whenever Premise holds. We write $P \vdash_\vartheta G$ if there exists a top-down refutation for the goal G in the program (expression) P with computed substitution $\vartheta$. Let us first present the rules that model the SLD refutation for definite programs.

$$ \frac{}{P \vdash_\epsilon empty} \qquad (1) $$

$$ \frac{P \vdash_\vartheta G_1 \;\wedge\; P \vdash_\gamma G_2\vartheta}{P \vdash_{\vartheta\gamma} (G_1, G_2)} \qquad (2) $$

$$ \frac{P \vdash_\vartheta (H \leftarrow G) \;\wedge\; \gamma = mgu(A, H\vartheta) \;\wedge\; P \vdash_\delta G\vartheta\gamma}{P \vdash_{\vartheta\gamma\delta} A} \qquad (3) $$

$$ \frac{P \text{ is a program} \;\wedge\; A \leftarrow G \in P}{P \vdash_\epsilon (A \leftarrow G)} \qquad (4) $$
Rule (1) states that the empty goal is solved in any program P with the empty substitution $\epsilon$. Rule (2) deals with goal conjunction: a conjunction $(G_1, G_2)$ is solved in a program P if both goals are solved in P. More precisely, rule (2) specifies a left-to-right order in the evaluation of conjunctive goals. First goal $G_1$ is solved with computed substitution $\vartheta$, and then goal $G_2$, suitably instantiated by $\vartheta$, is solved with computed substitution $\gamma$. (Notice that we might also write rule (2) without imposing the order of evaluation of conjuncts by employing a composition operator on substitutions, as defined for instance in [17].) Rule (3) states that an atomic goal A is solved with computed substitution $\vartheta\gamma\delta$ if there is a clause $(H \leftarrow G)\vartheta$ in P such that A unifies with $H\vartheta$ via $\gamma$, and $G\vartheta\gamma$ is in turn solvable in P with computed substitution $\delta$. Notice that the substitution $\vartheta$ in $P \vdash_\vartheta (H \leftarrow G)$ in rule (3) is the empty substitution if P is a program, as specified by rule (4). On the other hand, when P is an arbitrary composition of programs the substitution $\vartheta$ in $P \vdash_\vartheta (H \leftarrow G)$ becomes relevant, as we shall see below.

The derivation relation $\vdash$ can be extended to deal with program compositions in a simple way. Namely, each composition operation is modeled by adding new inference rules defining whether some instance $(A \leftarrow G)\vartheta$ of a clause $A \leftarrow G$ belongs to the "virtual" program denoted by a program expression E, that is, by proving $E \vdash_\vartheta (A \leftarrow G)$. From now onwards, we will say that $(A \leftarrow G)\vartheta$ belongs to E as a shorthand for saying that the clause $(A \leftarrow G)\vartheta$ belongs to the virtual program $\{(A \leftarrow G)\vartheta \mid E \vdash_\vartheta (A \leftarrow G)\}$. We now present the inference rules modeling the composition operations.

$$ \frac{P \vdash_\vartheta (A \leftarrow G)}{P \cup Q \vdash_\vartheta (A \leftarrow G)} \qquad (5) $$

$$ \frac{Q \vdash_\vartheta (A \leftarrow G)}{P \cup Q \vdash_\vartheta (A \leftarrow G)} \qquad (6) $$

Rules (5) and (6) describe the union operation $\cup$. They state that a clause belongs to the program expression $P \cup Q$ if it belongs either to P or to Q.

$$ \frac{P \vdash_{\vartheta_1} (H_1 \leftarrow G_1) \;\wedge\; Q \vdash_{\vartheta_2} (H_2 \leftarrow G_2) \;\wedge\; \gamma = mgu(H_1\vartheta_1, H_2\vartheta_2)}{P \cap Q \vdash_{\vartheta_1\vartheta_2\gamma} (H_1 \leftarrow G_1, G_2)} \qquad (7) $$

Rule (7) states that a clause $(x_1 \leftarrow y, z)$ belongs to the program expression $P \cap Q$ if there is a clause $x_1 \leftarrow y$ in P and a clause $x_2 \leftarrow z$ in Q such that $x_1$ and $x_2$ unify via $\gamma$. Here, the observation made for rule (2) about the ordering of evaluation of conjuncts also holds.
$$ \frac{P(Q)^{\perp} \cup (P(Q) \cap (Q,P)^c) \cup (P \cap Q) \vdash_\vartheta (A \leftarrow G)}{P / Q \vdash_\vartheta (A \leftarrow G)} \qquad (8) $$

Rule (8) is the inference rule for the constraint operator, which is a combination of union, intersection and complement.

$$ \frac{P \vdash_\vartheta (p(x) \leftarrow G),\; notunify(x\vartheta, t\gamma)}{(P, \_)^c \vdash_\gamma (p(t) \leftarrow empty)} \qquad (9) $$

Rule (9) describes the top-down behavior of the complement operator. A unit clause $p(t) \leftarrow empty$ belongs to the complement of a program expression P (w.r.t. any other program) with computed substitution $\gamma$ if a clause $p(x) \leftarrow G$ belongs to P with computed substitution $\vartheta$ and $x\vartheta$ does not unify with $t\gamma$. The key point here is the definition of the notunify predicate: it takes n-tuples of variables as arguments and it returns true if there exists a bag of indices $\{a_1, \ldots, a_k\}$, $1 \le k \le n$, such that $(x_{a_1}, \ldots, x_{a_k})\vartheta$ are ground, and there exists at least one index $a_j$, with $j \in \{1, \ldots, k\}$, such that $t_{a_j}\gamma \neq x_{a_j}\vartheta$. If $x_i\vartheta$ is not ground for all $i \in [1, \ldots, n]$, then notunify returns false. It is worth noting that the second argument of the complement operator is not used in the rule to perform the computation. In fact, while in the bottom-up semantics the second argument defines the universe where values are chosen, in the top-down semantics the arguments are already ground when this inference rule is applied.
$$\frac{Q \vdash_{\vartheta} A}{P \vdash_{\vartheta} (A \ \mathit{in} \ Q)} \qquad (10)$$
Rule (10) defines the behavior of "in" formulae. Namely, solving an extended goal of the form "$A$ in $Q$" simply amounts to solving $A$ in the program expression $Q$. We have proved in [9] that the top-down semantics is correct w.r.t. the abstract semantics.
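As an illustration of how the goal-solving rules, the union rules and the rule for "in" formulae fit together, here is a small top-down interpreter sketched in Python. It is not the implementation referred to in the paper. The encoding is an assumption of ours: a plain program is a list of clauses, a union is the tuple `("union", E1, E2)`, an extended goal is `("in", atom, E)`, variables are uppercase-initial strings, and the library and department facts are invented for the example. The unifier omits the occurs check, and the interpreter may loop on left-recursive programs.

```python
import itertools

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s so that a and b become equal, or return None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(x, s) for x in t[1:])
    return t

def rename(t, tag):
    # Rename variables apart; program expressions (lists) are left untouched.
    if is_var(t):
        return t + tag
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(x, tag) for x in t[1:])
    return t

def clauses(expr):
    """Yield the clauses of the virtual program denoted by expr."""
    if isinstance(expr, list):            # a plain program
        yield from expr
    else:                                 # ("union", left, right)
        op, left, right = expr
        if op != "union":
            raise ValueError("only union is supported in this sketch")
        yield from clauses(left)
        yield from clauses(right)

fresh = itertools.count()

def solve(expr, goals, s):
    """Yield the substitutions that solve the goal list from left to right."""
    if not goals:                         # empty goal: succeed with s
        yield s
        return
    first, rest = goals[0], list(goals[1:])
    if first[0] == "in":                  # "A in Q": solve A in Q
        _, atom, other = first
        for s1 in solve(other, [atom], s):
            yield from solve(expr, rest, s1)
        return
    for head, body in clauses(expr):      # atomic goal: resolve with a clause
        tag = f"_{next(fresh)}"
        s1 = unify(first, rename(head, tag), s)
        if s1 is not None:
            yield from solve(expr, [rename(g, tag) for g in body] + rest, s1)

# Hypothetical databases and mediator.
library = [(("holds", "lp_book"), ())]
dept = [(("staff", "anna"), ()), (("borrowed", "anna", "lp_book"), ())]
mediator = [(("can_read", "X", "B"),
             (("in", ("borrowed", "X", "B"), dept),
              ("in", ("holds", "B"), library)))]

for answer in solve(mediator, [("can_read", "Who", "lp_book")], {}):
    print(subst("Who", answer))
```

In this sketch the mediator clause reaches into each database through an "in" goal, so the two databases keep their own clauses and are never merged into the mediator program.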
6 Conclusions

Our proposal aims at defining an interoperable language much in the spirit of other approaches developed in the last few years. In particular, we would like to mention here the HERMES [20] and TSIMMIS [11] projects, and the SchemaLog [13] and IDL [12] languages. In the following we point out some commonalities and differences between them and our proposal.

The mediator language of the HERMES project is a logical language enriched with mechanisms inherited from the theory of Hybrid Knowledge Bases [14]. A mechanism somewhat similar to our "message passing" (the domain call) enables the mediator to extract information from a particular database. There, the mediator refers to databases which are truly heterogeneous, in the sense that this mechanism allows the evaluation of queries in different domains, such as spatial, relational or text databases. The architecture of the HERMES system does not include any kind of wrapping, and the language does not offer composition operators.

In the TSIMMIS project, Ullman et al. [11] define a mediator language based on an object-oriented extension of SQL. This language is provided with a "message passing" mechanism to refer to the wrapped databases. Furthermore, it has higher order features that allow one to deal with schematic discrepancies in a compact way. This feature is shared by IDL, a higher order logic language based on Horn clauses. IDL allows variables ranging over data and metadata, the definition of a variable number of relations depending on the state of the database, and view update capabilities. SchemaLog is a logical language with higher order syntax which can be used in a federation of relational databases without wrapping. The main application fields of SchemaLog are database programming, schema browsing, schema integration and schema evolution.

It is worth noting that the common denominator of the languages mentioned above is that they are all defined starting from a logical language and enriching it with meta or higher order features. This makes the languages suitable for dealing with interoperability. We have followed this common approach by defining new mechanisms on top of logic programming and extending its formal semantics in a conservative way. Higher order syntax with variables ranging over data and metadata provides an elegant solution to the problem of schematic discrepancies. Our approach is different in that we concentrate on meta level composition of deductive databases, rather than on the use of meta features over terms and formulas.
The future developments of our research go in three main directions. The first one concerns an efficient implementation of the language. Our aim is to extend some of the classic query evaluation methods to the composition operators. In particular, we intend to study possible extensions of the seminaive evaluation strategy and of the magic set technique. Intuitively speaking, the seminaive evaluation technique consists of an efficient implementation of the iteration of the immediate consequences operator (also called naive evaluation). The efficiency is obtained by focusing only on the new facts generated at each iteration, thereby avoiding recomputing the same facts in case of recursion. The idea underlying the magic set method is to combine bottom-up and top-down techniques in order to have a "goal-driven bottom-up" computation, in the sense that the bottom-up computation computes only the facts "relevant" to the query [16, 5]. We have encouraging preliminary results in this respect. The second direction is to study how to introduce update mechanisms in the language. The last direction is the study of techniques for introducing negation in the language. It seems that extending the Fitting operator could be a good starting point [2].
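The seminaive idea can be illustrated on a single recursive program. The Python sketch below is a textbook-style example under our own assumptions and does not cover the extension to composition operators envisaged above: the predicate names `edge` and `path`, the function name `seminaive_closure` and the sample facts are hypothetical. It evaluates the rules path(X,Y) ← edge(X,Y) and path(X,Z) ← path(X,Y), edge(Y,Z), joining only the facts derived in the previous iteration with the edge relation.

```python
def seminaive_closure(edges):
    """Bottom-up evaluation of
         path(X, Y) <- edge(X, Y).
         path(X, Z) <- path(X, Y), edge(Y, Z).
    using only the newly derived facts (delta) at each iteration."""
    successors = {}
    for x, y in edges:
        successors.setdefault(x, set()).add(y)

    path = set(edges)    # all facts derived so far
    delta = set(edges)   # facts derived in the last iteration only
    while delta:
        # Apply the recursive rule to the new facts only.
        derived = {(x, z) for (x, y) in delta for z in successors.get(y, ())}
        delta = derived - path   # keep what is genuinely new
        path |= delta
    return path

edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(seminaive_closure(edges)))
```

A naive evaluation would instead rejoin the whole `path` relation with `edge` at every step, rederiving facts that are already known; restricting the join to `delta` is what avoids that recomputation.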
Acknowledgments

We are grateful to Dr. Alessandra Raffaetà for helpful comments.
References

1. K. R. Apt. Logic programming. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, Vol. B, pages 493–574. Elsevier, 1990.
2. D. Aquilino, P. Asirelli, C. Renso, and F. Turini. An operator for composing deductive databases with theories of constraints. In A. Nerode and V. W. Marek, editors, 3rd International Conference on Logic Programming and Non Monotonic Reasoning, volume 928 of LNCS, Lexington, KY, USA, June 1995.
3. N. Arni, K. Ong, and C. Zaniolo. Negation and aggregates in recursive rules: the LDL++ approach. In Proc. 3rd Int. Conference on Deductive and O-O DBs, DOOD93, Phoenix, December 6-8, 1993.
4. P. Asirelli, C. Renso, and F. Turini. Semantic integration of deductive databases. Technical Report TR B4-17, IEI-CNR, June 1996.
5. C. Beeri and R. Ramakrishnan. On the power of magic. Journal of Logic Programming, 10(3&4), 1991.
6. A. Brogi. Program Construction in Computational Logic. PhD thesis, University of Pisa, March 1993.
7. A. Brogi, A. Chiarelli, V. Mazzotta, P. Mancarella, D. Pedreschi, C. Renso, and F. Turini. Implementations of program composition operations. In M. Hermenegildo and J. Penjam, editors, Proceedings of the Sixth Int'l Symp. on Programming Language Implementation and Logic Programming, volume 844 of LNCS. Springer-Verlag, Berlin, 1994.
8. A. Brogi, P. Mancarella, D. Pedreschi, and F. Turini. Modular Logic Programming. ACM Transactions on Programming Languages and Systems, 16(4):1361–1398, 1994.
9. A. Brogi, C. Renso, and F. Turini. Program composition and message passing in logic programming. Technical report, University of Pisa, 1996.
10. A. Brogi and F. Turini. Fully abstract compositional semantics for an algebra of logic programs. Theoretical Computer Science, 150, 1995.
11. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. In Proceedings of IPSJ Conference, Tokyo, Japan, October 1994.
12. R. Krishnamurthy, W. Litwin, and W. Kent. Language features for interoperability of databases with schematic discrepancies. In ACM SIGMOD Conference, volume 20. ACM, 1991.
13. L. Lakshmanan, F. Sadri, and I. Subramanian. Logic and algebraic languages for interoperability in multidatabase systems. Technical Report TR-DB-95-01, Department of Computer Science, Concordia University, 1995.
14. J. Lu, A. Nerode, and V.S. Subrahmanian. Hybrid knowledge bases. IEEE Transactions on Knowledge and Data Engineering, 1994.
15. P. Mancarella and D. Pedreschi. An algebra of logic programs. In R. A. Kowalski and K. A. Bowen, editors, Proceedings Fifth International Conference on Logic Programming, pages 1006–1023. The MIT Press, 1988.
16. P. Mascellani and D. Pedreschi. The declarative side of magic. Submitted for publication, 1996.
17. C. Palamidessi. Algebraic properties of idempotent substitutions. In Proc. of the 17th International Colloquium on Automata, Languages and Programming (ICALP), number 443 in Lecture Notes in Computer Science, pages 386–399. Springer-Verlag, 1990.
18. Y. Papakonstantinou, H. Garcia-Molina, and J. Ullman. Medmaker: A mediation system based on declarative specifications. In ICDE, 1996. To appear.
19. A. Silberschatz, M. Stonebraker, and J. Ullman. Database research: Achievements and opportunities into the 21st century. Technical report, NSF Report, 1995.
20. V.S. Subrahmanian, S. Adali, A. Brink, J.J. Lu, A. Rajput, J. Rogers, R. Ross, and C. Ward. HERMES: A heterogeneous reasoning and mediator system. Submitted for publication. Can be found at http://www.cs.umd.edu/projects/hermes/overview/paper/index.html.
21. J. Ullman. High level interoperation. Slides.
22. M. H. van Emden and R. A. Kowalski. The semantics of predicate logic as a programming language. Journal of the ACM, 23(4):733–742, 1976.
23. G. Wiederhold. Mediators in the architecture of future information systems. IEEE Computer, 25:38–49, March 1992.