Constraints and Redundancy in Datalog

Constraints

and Levy

Alon

Dept.

Dept.

Science

CA

and

pushing

database. are considered. eliminates

itg

participate

and predicates Redundancy but considers

of identical

ancestor

of the

is, derivation atoms,

other.

case of programs algorithms

with

not only

are given, constraint detect

presence of constraints,

do not

Under

certain

query

for

detect-

including

redundancies

as tightly

cursive

major rules

are

to

straints

order

removing

by Klug

closely redundant

[Kl88]

who

that are part and the issues and constraint

that

the two

of redundancy

are are

concept

and the second is a newly intro-

Grant number NCC 2-537. † Part of the work of this author was done while visiting IBM Almaden Research Center. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 11th Principles of Database Systems/6/92/San Diego, CA © 1992 ACM 0-89791-520-8/92/0006/0067...$1.50

not.

of

when constraints


types

was

equiva-

is the well known

which

based on the notion types

Con-

inequalities.

constraints

it is shown Two

of unreachability,

Both

investigated

redundancy

The first

magic-set constraints

were investigated

(but not the data),

related.

these pa-

on constraints

queries with

In fact,

duced concept,

●This author was supported by NASA through NASA

to

and the rules,

work

are both

considered.

parts

started

of the

of the data

Earlier

lence of conjunctive

of re-

the

was investi-

to handle

in the query

as a part

pushing. optimization

is a

from

constraints

database

generalizations

[KKR90].

done

have Pushing

the

This paper deals with

for

[Ull89])

MF*90]—essentially in

appear

of interest

strategies

role.

query

that

Introduction

Two

constraints

the

of the program

1

(cf.

an important

in the

as possible.

of re-

datalog programs. constants

play

pers proposed

are pushed to the EDB

under

[Sag88] is one example

recently,

transformation

in

to the

to the database.

These

discussed

query

of pushing

from

in

the

for minimization

parts from

gated in [BK*89,

the

from

transformation

example

More

but also push constraints

the paper, the constraints

magic-set

based on ir-

literals.

assumptions

redundant

only minimal

from the given query and rules to the EDB predicates.

moving prime

one is an

constants

The algorithm equivalence

The

trees having

ac.il)

uniform

for

such that

Algorithms

ing these redundancies

that

tree of a fact

predicate. trees, that

programs

based on machabil-

is similar,

derivation no pair

rules

in any derivation

the query relevance

Redundancy

Israel


Abstract in datalog

Science

University

Jerusalem,

94305


Two types of redundancies

Sagiv†

of Computer Hebrew

University

Stanford,

in Datalog Yehoshua

of Computer

Stanford

Redundancy

we call

derivation

redundancy are present

One case, namely,

able rules when there

(and is

irrelevance

of minimal

trees).

are investigated and when they are

the problem

of unreach-

are no constraints

has an

easy solution [Ki88]. The other three cases are not as easy, and we solve them in this paper. The creating

algorithms a

we

rule-goal

redundancy-free

present tree

derivations

that that

are

based

represents can

on all

be constructed

for

the

the rule-goal

query. between

state-equivalence

These concepts

vary

of redundancy

construction

2

of

by the concepts with

those

as we move from

to another,

or from

Preliminaries

of We discuss

nodes of the tree and

associated

constraint-labels

constraints

The

tree is guided

nodes.

(i.e., only

one type

one type

of

(EDB ways.

First,

can be used in two

these algorithms

discovering;

that

dant

rules

that

gram

without

lead to redundancy

is, the algorithms can be removed

changing

find from

the result.

algorithms

can be used to push constraints

database.

In this paper,

aspect of the results.

we emphasize

However,

Vardi

of tree automata

datalog rules.

on certain

tree automata. direct

tree automat First,

it

to better

into tree automat An explicit

to incorporate

magic-set

(at

is needed,

constraint

transformation.

however,

pushing

The

into

algorithms

consists of a set of ground

and an EDB,

applying

the We

one IDB

predicate,

tree

and the

is

query

tree

consisting

a first

insight

A recent

the

rule-node.

pre-

is identical

also be

The approach

was useful in getting

[C091]

characterized

to its parent

tigate

work).

and therefore, that

However,

do not fall into

the problems

the frameworkl

our results

a large

that

tree is for

a derivation

grams. moval

we inves-

(for

of

Note

that that

for that

derivation

is that

our problems

by a datalog

program,

involve but

the EDB. an IDB

la-

We say

predicate

q,

we define two types

may exist

in datalog

a rule is redundant

2.1:

the output

pro-

if its

re-

of the program

A rule r of a datalog

if there is no EDB,

pears in some derivation reason

from

all EDBs).

is unreachable

the facts derived

rule-node A rule-node

at the root of the tree.

does not change

Definition

work.

1The

mle-

atom,

tree are goal-nodes

atoms

icate. only

and

by a ground

for each one of its subgoals.

beled by ground

of redundancies

of [C091],

are not a corollary

a derivation

goal-node.

In the rest of this section,

class of decidable problems involving datalog rules (the theory of tree automata is at the core of that

in a bottom-up

The head of an instantiated

if q is the predicate

into these problems. paper

of

gen-

and it has a single child, which is an instantiated

in

The leaves of a derivation

however,

facts

of goal-nodes

A goal node is labeled

(or

(or answer)

and an EDB,

has a child goal-node

of tree automata,

output

predicate

a program a

nodes.

as the query

is the set of all ground

for the

Given

predicate

trees that are used

evaluation.

eval-

the ground

no more new facts are generated.
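The bottom-up evaluation described here (start with the EDB facts and apply the rules until no more new facts are generated) can be illustrated with a naive fixpoint loop. This is only a sketch under stated assumptions, not a procedure taken from the paper: atoms are represented as (predicate, args) tuples, capitalized strings are treated as variables, and the names `is_var`, `match` and `bottom_up` are hypothetical.

```python
def is_var(term):
    # Assumption of this sketch: variables are capitalized strings.
    return isinstance(term, str) and term[:1].isupper()


def match(atom, fact, subst):
    """Extend subst so that atom equals the ground fact; return None on failure."""
    pred, args = atom
    fpred, fargs = fact
    if pred != fpred or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            # Bind the variable, or check it against an earlier binding.
            if subst.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return subst


def bottom_up(rules, edb):
    """Apply every rule repeatedly until no new fact is generated."""
    facts = set(edb)
    while True:
        new = set()
        for head, body in rules:
            substs = [{}]
            for atom in body:
                # Join the partial substitutions with every known fact.
                substs = [s2 for s in substs for f in facts
                          if (s2 := match(atom, f, s)) is not None]
            hpred, hargs = head
            for s in substs:
                new.add((hpred, tuple(s.get(a, a) for a in hargs)))
        if new <= facts:
            return facts
        facts |= new


# Transitive closure as a small example program.
rules = [
    (("path", ("X", "Y")), [("edge", ("X", "Y"))]),
    (("path", ("X", "Y")), [("edge", ("X", "Z")), ("path", ("Z", "Y"))]),
]
edb = {("edge", ("a", "b")), ("edge", ("b", "c"))}
derived = bottom_up(rules, edb)
```

The loop recomputes every rule in each round, which is simple but wasteful; a semi-naive evaluation would only join against facts derived in the previous round.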

and could

used for a magic-set

with

a

We continue

predicates.

sented in this paper construct rules,

Given

a bottom-up

we start

for finding

redundant

predicates.

the IDB

erated

rules.

the IDB An exten-

the rules to derive facts for

evaluation.

we consider,

while

EDB facts and apply

least

which

(EDB)

program

are two:

a reduction

The EDB predicates

for the EDB

the program

in

are the predicates

relations,

is one in which

goal)

a can only find redundant

construction

facts

however,

Second,

problems,

database

problems

do not involve

efficiency

if not in theory).

of the redundancy

order

(that

database

sional

distinguish

a), and the reasons for that

leads

practice

We prefer,

algorithms

which

are defined by the program.

rules until

we

are those

in heads of rules.

atomic datalog

prob-

The problems

predicates)

to the

uation

the theory

can also be solved as decision

to present

to the

our work.

[Va89] showed that

pred-

(IDB

predicates

the first

is a useful tool for solving

lems involving consider

from

appear

appearing

the

predicates

that

refer

we also indicate

how the second aspect follows Recently,

redun-

are allowed).

sets of predicates

extensional

which

symbols

only in bodies of rules, and the intentional icates

the pro-

Second,

predicates)

two

The

are collec-

no function

and variables

between

in a given program.

we present

which

programs

constants

We distinguish

to another.

The algorithms

datalog

tions of safe Horn rules with

program

such that

~ ap-

tree for the query pred-

I

not

Finding

also the

cially

trees.


unreachable

rules is easy; it is espe-

easy if we assume that

the query predicate

depends

on all other

constants

in the

predicates

program

case,

a rule

head

has an unreachable

IDB

predicate

erated

for

and there

[Ki88],

is unreachable

In this

if either IDB

for each EDB

the

EDB

predicate,

body

or

And

an

if no fact has exactly

with

Definition

special

its

predicate.
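For the easy case mentioned here (no constants and no constraints), unreachability can be decided at the level of predicates alone. The following is a minimal sketch under that assumption; it is not claimed to be the algorithm of [Ki88]. Rules are given as hypothetical (name, head predicate, body predicates) triples, and the function name `unreachable_rules` is likewise an assumption.

```python
def unreachable_rules(rules, edb_preds, query):
    """Return the names of rules that cannot occur in any derivation
    tree for the query predicate (constant-free programs only)."""
    # A predicate is productive if some EDB lets it derive a fact.
    productive = set(edb_preds)
    changed = True
    while changed:
        changed = False
        for _name, head, body in rules:
            if head not in productive and all(b in productive for b in body):
                productive.add(head)
                changed = True

    # Only rules whose whole body is productive can ever fire.
    usable = [r for r in rules if all(b in productive for b in r[2])]

    # Predicates the query depends on through usable rules.
    needed = {query}
    frontier = [query]
    while frontier:
        p = frontier.pop()
        for _name, head, body in usable:
            if head == p:
                for b in body:
                    if b not in needed:
                        needed.add(b)
                        frontier.append(b)

    return {name for name, head, body in rules
            if head not in needed or any(b not in productive for b in body)}


example = [
    ("r1", "p", ["e"]),
    ("r2", "q", ["p"]),
    ("r3", "q", ["t"]),   # t is an IDB predicate with no rules
    ("r4", "u", ["e"]),   # the query q does not depend on u
]
result = unreachable_rules(example, {"e"}, "q")
```

Once constants or constraints are present, this predicate-level test is no longer sufficient, which is why the other cases need the tree-based algorithms developed in the paper.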

q is unreachable

q when

are no

rl:

2.2:

p(z)

T2: p(z) q(z)

r3:

Consider

derivation

rl

:– p(y,z).

: p(z,y)

:– r(z).

p, q and of which

Rule

is the

cannot

predicates,

and

r2 is unreachable

query

if q is the

the latter

However,

r are IDB

predicate.

algorithm).

predicate,

query

e

and

rl ity,

by the above

since

any

accounts

as shown 2.3:

only

in the next

Consider

for

is more

derivation

3

:– p(z).

r3 : q(x)

:– e(x),

(regardless

puts

of which

is the

query

finding

[Sar90,

Sh87].

under

cannot

evance

is redundant

predicate).

show

definitions

of redundancy,

called

on the notion notion

of

in

the

describe

irrelevance,

of minimal

redundancy

but

rules

2.4

above

which

is

one

tree is minimal

(or

non-redundant)

in the tree,

the other.
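The minimality condition (no two identical goal-nodes such that one is an ancestor of the other) is easy to check on a concrete derivation tree. The sketch below assumes a simplified representation in which rule-nodes are omitted and a tree is a nested (atom, children) pair; the function name `is_minimal` is hypothetical.

```python
def is_minimal(node, ancestors=frozenset()):
    """True if no goal-node repeats the atom of one of its ancestors."""
    atom, children = node
    if atom in ancestors:
        return False
    below = ancestors | {atom}
    return all(is_minimal(child, below) for child in children)


# q(a) derived from p(a), which is derived from q(a) again: not minimal.
redundant = (("q", ("a",)), [(("p", ("a",)), [(("q", ("a",)), [])])])

# The same atom on two sibling branches is allowed.
siblings = (("q", ("a",)), [(("e", ("a",)), []), (("e", ("a",)), [])])
```

Only ancestor-descendant repetitions are rejected; identical atoms on different branches do not make a tree redundant under this definition.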

are no two identical

such that

2The converse of redundancies

is also true; considered

and

heads

columns

able

that

in this

node)

with g.

is, some redundancies

an AND

but not by the types

g in the are the

The

the unification.

paper.


tree

rule-goal

The

root

The heads

of

vari-

children

referred

subgoals

to

of an

a distinct

of a rule-node

The

how

consisting

(also

the

distinct

by construct-

query.

having

dis-

4.3).

with

are

no coni.e.,

have

begins

position.

rules

children

node)

irrelevance.

we explain

the

lpredicate

in each argument

a descrip-

have

of a rule

tree is a goal-node

of the query

a goal-node

of

for

If

irrel-

modification

(see Remark

tree

q.

remains

with

rules

section

the algorithm

rule-goal

to

are rectified,

of the head

constraint

the

that

of rules

relax

this

this

begin

out-

from

for determining

In the next

as input

algorithm

discuss

we assume

Informally,

goal-

one is an ancestor

equivalence

does

q, and

are irrelevant

We

variables.

atom

predicate

the

We

of the algorithm

OR

by uniform

in

for deciding

accepts

of redundancy

section.

clarity,

ing

S

can be shown

same.

the

the rule-goal

if there

notion

For

trees. the

It

a query

tion

tinct

a new

derivation subsumes

A derivation

appear

irrelevance

an algorithm

of ‘P that

the

the in

stants

on unreachability.

Definition nodes

next

is redun-

[Sag88],

redundancies

unreachabil-

Rules

of irrelevance.

to unreachability,

almost

H

is undecid-

notion

equivalence

the

The

rules

A narrower

uniform

examples.2

based

redundant

pred-

predicate

does not

However,

describes P and

all

later Generally,

than

rule

Irrelevant

property

we change but

general

tree.

section

r2 : q(z)

r2 is reachable,

query

all redundancies.

Finding

a program

program,

if the

if the query

program:

:– q(x).

r1 : p(z)

is relevant

some

example.

the following

rl

is irrelevant

an unreachable

capture

This

This

predicate.

I

the

based

and rule

z),

example,

Irrelevance

(note

not

unreachability

Example

type

I

regardless

predicate,

query

in any

:– e(~,~). :– p(z,

is p, but

is p1.

re-

predicate

be discovered

icate

I

redundancies,

it

r2 is irrelevant

if q is the

rl

used

predicate.

:– e(z).

gardless

dancy

query

program:

r3 : p1(z)

is unreachable

able

2.3, rule

in a given

r is never

for the

2.6:

T2 : P(Z,~)

that

In this

EDBs,

Example

:– q(z).

is an EDB

that

r is irrelevant

q or p is the query

is irrelevant

In this Note

all

In example

1’s as all its argu-

the following

if for

minimal

of whether

one fact

A rule

program

is gen-

ments. Example

2.5:

of

to as an that

unify

(also

called

resulting

from

tree can be viewed

as encoding

all the possible

facts of q. However, sive rules,

derivation

the construction

can go on forever. in designing

Therefore,

when to stop expanding

Example

a node-tag which

the main difficulty

illustrates

lowing

of the tree.

a goal-node

q(z, y) :– e(z, t),

q(t, t),

the

e(t, y).

r3 : q(z, y) :– p(z, y).

two

nodes

have

the

appearing

same

in

by the following

g1

Two

on the

goal-nodes

such that

set

definition.

and g2, are said

V(g2),

posi-

variable.

is induced

there exists a one-to-one onto

We

if in each argument

relation

3.2:

predicate,

:– q(z, 2).

a given rule-goal

and query predicate.

the set of variables

equivalence

Definition

r4 : p(z, y) :– q(z, y).

of

In the fol-

g. Two nodes of the same predicate

of goal-nodes

r2 : q(z, y) :– e(z, y).

below.

we consider

are said to be identical An

B

is the concept

we introduce

definitions,

denote by V(g)

this difficulty.

3.1:

r5 : q1(z)

never be usable.

tree for a given program

tion, r1 :

r4 will

that

The key to this observation

tree

arises in the decision the branches

example

concluding

has recur-

of the rule-goal

the algorithm

The following

trees for

when the program

of

the

same

if

to be equivalent

mapping,

φ, from V(g1)

φ(g1) = g2. The mapping

φ is called an isomorphism.

9

(1) ql(x)

For example, equivalent,

nodes 2, 4, and 10 in Figure

1 are

but nodes 3 and 8 are not equivalent.
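The equivalence test of Definition 3.2 amounts to looking for a one-to-one onto mapping between the variables of two goal-nodes of the same predicate. A minimal sketch, assuming constant-free atoms represented as (predicate, args) tuples; the function name `isomorphism` is hypothetical and returns the mapping when one exists, or None otherwise.

```python
def isomorphism(g1, g2):
    """Return a bijection from the variables of g1 to those of g2
    that maps g1 onto g2, or None if the nodes are not equivalent."""
    pred1, args1 = g1
    pred2, args2 = g2
    if pred1 != pred2 or len(args1) != len(args2):
        return None
    forward, backward = {}, {}
    for a, b in zip(args1, args2):
        # The mapping must be a function (forward) and one-to-one (backward).
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return None
    return forward


same_pattern = isomorphism(("q", ("x", "x")), ("q", ("t", "t")))
different_pattern = isomorphism(("q", ("x", "x")), ("q", ("x", "y")))
```

Two nodes are equivalent exactly when the same variable occupies the same set of argument positions in both, which is why a single left-to-right pass over the arguments suffices.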

r

Definition

(2) q(x,x)

3.3:

The

tag of a goal-node

noted by T(g), includes goal-nodes

T

r

that

have only variables

from V(g).

Intuitively,

(4) ~(~t)

(3) ax,t)

the tag

goal-nodes

(7) p(x,x)

(6) e(x,x)

(sjti~x)

that

g, de-

itself and all its ancestor

of a node

should the

not

of g in order

for

only

minimal

derivations.

tain

variables

not in V(g)

g contains

appear

rule-goal

again

tree

to encode

Ancestors

that

(9) e@)

(10) q(u,u)

Definition Figure

1: A rule-goal

tree.

gram

rule-goal is shown

the head not duce tors yield

tree in Figure

of r4 unifies

expand

them

subgoals of those

constructed

that nodes,

a minimal

the construction

with

further,

for

1. Notice since

derivation

therefore,

tree.

pro-

12, we do

would

tag of

only itself.

equivalent

@ is the of gl and

In Figure

pro-

to some

ances-

would

never

However,

but

goal-nodes,

g1

if φ(T(g1))

isomorphism

showing

= the

g2. 1

since

show that

rithm

4 and

of them

is tag-equivalent

they

algorithm

a branch we


works

the

describe

state-equivalence

for

in

10 are tag-equivalent, to node

2

are all equivalent). uses

the

to determine

equivalence ing

1, nodes

though

The

which

10 is sufficient

neither

(even

there might be a point in which r4 can be used in some minimal derivation. Fortunately, we can in node

where

equivalence

of the tree could go on forever,

stopping

Two

The

not

although

7 and that

are identical and

this

that

nodes

3.4:

of g.

will

nodes 1 and 2, while

and g2, are said to be tag-equivalent

T(g2), The

1 includes

the tag of node 4 includes

(11) e(u,t)

con-

need not be included

in any subtree

node 2 in Figure

all

in subtrees

in the tag of g, because these variables appear

9

condition

when In

tree.

below,

it

steps.

Step

state-

expand-

Algorithm

suffices

as tag-equivalence. in three

of

to stop

to The

1 expands

3.1, define algothe

rule-goal ing

tree for the query

with

q that

a goal-node

consisting

has a distinct

position.

branch

Step

predicate

variable

of an

1 terminates

to another

the

goal-node

the

a goal-node

that

that

unification

is identical that are with

heads

a rule

~ cannot

sible

from

ion.

Step

fashion, root

the

EDB

via nodes Steps

marked

some

that

as well.

The

that

is marked

following

example

(rather

than

such as equivalence to assure

g

then

/* Step 2: Bottom-up

the

Mark

of r identical

subgoal

g or an ancestor

of g

and

are

in

3 as rel-

2.

g

is accessible

then mark g as accessible; if a goal-node g is state-equivalent

terminat-

to an accessible goal-node

based on state3.2))

marking*/

nodes in To as accessible;

of a rule-node r accessible then mark r as accessible; if at least one child of a goal-node

to it are

a less refined

all EDB

repeat if all children

is marked,

if it appears

of the

any

to

make rule r a child of g;

in Step 2. In

in Step

(Definition

g in TO,

perform the unification

fash-

from

shows that

correctness

making to either

are acces-

is shown in Figure

of a branch

goal-node

a top-down

a goal-node

is relevant

algorithm

ing the expansion

needed

in

in each column;

such that g is not state-equivalent any expanded goal-node in To do for each rule r ∈ P do if rule r unifies with g without

there

in a bottom-up

variable

there is an unexpanded

ancestors.

are state-equivalent

The full

equivalence

g or its

as accessible

A rule

rule-node

and

are reachable

2 and 3, when

all goal-nodes

evant.

that

a distinct

while

which

a goal-node

as relevant,

nodes

with

r if

of g. Note

of the tree that

nodes

3 marks

the

marked

change

the nodes

gl

are rectified

q)

begin /* Step 1: Constructing the rule-goal tree */ Let To be a tree consisting of a goal-node for q

Step 1 will

a rule-node

of

irrelevant-rules(~,

of a

is state-

a subgoal

unification

procedure

is already

g or an ancestor

of rules

constants,

Step 2 marks

both

produce

to either

since no

will

g with

of

argument

expansion

in the tree and has been expanded. not expand

atom

in each

when it reaches a goal-node

equivalent

q, start-

h

then mark g as accessible; until no new nodes are marked;

notion, is indeed

algorithm.

/* Step 3: Top-down marking */

if the root of To is accessible Example

then mark it as relevant; repeat if g is a relevant goal-node,

3.5:

r1 : q(z, y) :– q(z, z),

r2 : q(z, y) :– e1(z,

e(z, y).

r is a child rule-node

y).

all children

r3 : q(z, y) :– p(z, y).

then mark r and its children as relevant; if a goal-node g is state-equivalent

r4 : p(z, y) :– e2(z, y). r5 : p(z, y) :– q(z, y). The rule-goal the query

to a relevant

tree created

predicate

for this program

p is shown

this tree, all the nodes would 2, and therefore,

in Figure

be marked

all rules would

and 3.

And

indeed,

h

In The relevant rules are those appearing in rule-nodes that are marked as relevant;

in Step

be deemed rel-

all other

rules are irrelevant;

end.
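The two marking passes of the procedure (Step 2, bottom-up accessibility; Step 3, top-down relevance) can be sketched on an already constructed rule-goal tree. This is an illustration under assumptions, not the authors' implementation: the `Node` class, its field names, and the function `mark` are hypothetical, the tree construction of Step 1 is taken as given, and state-equivalence is supplied as an explicit `equiv` link from an unexpanded goal-node to the expanded goal-node it is state-equivalent to.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    kind: str                       # "goal" or "rule"
    children: list = field(default_factory=list)
    is_edb: bool = False            # meaningful for goal-nodes only
    equiv: "Node | None" = None     # state-equivalent expanded goal-node
    accessible: bool = False
    relevant: bool = False


def mark(root):
    # Collect all nodes of the tree.
    nodes, stack = [], [root]
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(n.children)

    # Step 2: bottom-up marking of accessible nodes.
    for n in nodes:
        if n.kind == "goal" and n.is_edb:
            n.accessible = True
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n.accessible:
                continue
            if n.kind == "rule":
                ok = all(c.accessible for c in n.children)
            else:
                ok = any(c.accessible for c in n.children) or (
                    n.equiv is not None and n.equiv.accessible)
            if ok:
                n.accessible = changed = True

    # Step 3: top-down marking of relevant nodes.
    root.relevant = root.accessible
    changed = True
    while changed:
        changed = False
        for g in nodes:
            if g.kind != "goal":
                continue
            targets = []
            if g.relevant:
                if g.equiv is not None:
                    targets.append(g.equiv)
                for r in g.children:
                    if all(c.accessible for c in r.children):
                        targets += [r, *r.children]
            elif g.equiv is not None and g.equiv.relevant:
                targets.append(g)
            for t in targets:
                if not t.relevant:
                    t.relevant = changed = True
    return nodes
```

Rule-nodes left without the relevant mark correspond to the rules that the procedure reports as irrelevant; the sketch treats the `equiv` link symmetrically in Step 3, since state-equivalence is an equivalence relation.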

had we stopped

expanding

the tree based only on equivalence

goal-nodes

(i.e., not expanded

have deduced that

goal-node

then mark g as relevant; until no new nodes are marked;

evant in Step 3. Notice that the node q(z, y) is equivalent to the node q(z, z), but they are not state-equivalent.

of g, and

of r are accessible

Figure

of

rules.

q(z, z)), we would

r~ is irrelevant,

since we could


2:

Algorithm

3.1: Finding

irrelevant

means

that

if the

algorithm

r to be irrelevant, r4

Completeness

f

it will

q(xw

e2(X,Y)

In

T&

el(XoY)

q(XZ)

an

e(T,Z)

p

Z)

I

3: The

we use symbolic trees

with

(derivation

say that

a symbolic is minimal

establish

correctness

several

apply

it to g(z,

y).

~

Lemma

3.6:

dancy

from

two

changes

the

definition

pler:

two

defined

If we change irrelevance

to

are needed

in Algorithm

Second,

r should

in the

be unified

the unification subgoals

we

with

equivalence

(i.e.,

tree

as the

to push

~.

the

same

advantage

algorithm

for detecting for the

trees (see Lemma

constraints

(see a discussion

fore,

topic

to en-

3.10 below).

+(t)

lemma

are marked

Step or

3.9:

show

soundness

and

completeness.

derivation symbolic

tree for derivation

shows that derivation be part

trees.

There-

of some minimal

siblings

and ancestors

in Step 3).

A goal-node

symbolic

all the goal-

g of To is marked

3.1 if and derivation

only

tree for

if there

in is a

g.

predi-

in the

of Algorithm

in

via the isomorphism

trees for the query predicate

2 of Algorithm

minimal Finally,

next

Correctness

the correctness

deriva-

in Step 2 of the algorithm

symbolic

the next trees

for

coded

in the portion

marked

as relevant.

3The isomorphism

To prove

symbolic

in To

and g2 be two goal-nodes

is a minimal

derivation

Lemma

derivation

of

termination

goal-nodes

symbolic

(and this is detected

I

Proof

Let g1

if the same holds for their

based on

EDB

branch

is based on state-equivalence).

those nodes will

symbolic

to use the rule-goal to the

of this

tree,

gz.3

have minimal

it

unreachabil-

computation,

then

nodes that

pred-

tree

3.8:

The following

to

that

the

are state-equivalent

tree for

(in the size

rule-goal

of a magic-set

nodes

of the rule-goal

state-equivalent

If t is a minimal

g1,

g whenever

even if one of the

have

if we want

basis

Lemma

1, a rule

two

(which

To that

are equiv-

of Step

define

time

in order

is important

3.1

are

a goal-node

has the

alone)

This

section),

sim-

predicate

we need state-equivalence

code all derivation

in order

loop

if they

to a polynomial

of the program

becomes if they

even

definition

ityy. However,

cates

same

3.1, we first

tion trees (up to an isomorphism).

to one of its ancestors).

could

be state-equivalent Thk

for

is possible

is identical

fact,

icate. leads

of the

of g. M

have the same set of minimal

First,

2.4),

no goal-node

by the algorithm.

It shows that

then

3.1.

of state-equivalence

goal-nodes

of redun-

unreachability,

to be state-equivalent

zdent.

In

the notion

t is minimal

tree

of Algorithm

3.8 justifies

condition Remark

2).

g of To, we

to g, and

properties

con-

to Definition

to an ancestor

To, constructed not

derivation (according

of

in Section

a goal-node

is identical

trees, i.e.,

instead

are defined

Given

for gift

derivation

variables

trees

3.7:

To prove

need for tags.

algorithm.

the rule-goal tree 3.1. In the follow-

derivation

of t

is irrelevant,

TO denotes

ing proofs

root

a rule

irrelevant.

if a rule

by Algorithm

of t is identical

e2(kZ)

that

constructed

the

determines

r is indeed

so by the

section,

Definition q(x,’1’)

7 r4

Figure

means

be deemed this

stants T-r el(X~)

then

3.1, we

of g1.

Soundness

variable

IL

We extend

lemma the

shows that query

of To that

are en-

consists

of nodes

~ is defined only

it to all variables

not in g1 to a new distinct

all minimal

predicate

on the variables

of t by mapping variable.

each

Lemma

3.10:

Let To be the rule-goal

ated by Algorithm P),

derivation

tree for and

is represented

tree T

variables

of T.

the query predicate

and

an assignment

icate,

derivation

such that

the rule-goal

u to the

Claim

tree constructed

3.13

as relevant vant). g E T, node g is state-

Recall

pred-

that

To is

by Algorithm

3.1.

Let g be a goal-node

tion

r is a child

of TO. Suppose

of g and is marked

(and, hence, g is also marked

Then there

is a minimal

as rele-

symbolic

tree for g in which r is a child

deriva-

rule-node

of

9.

3. For every node v ∈ T (either a goal-node a rule-node), f(v) is marked as relevant

or

The claim

in

vant,

4. For every rule-node

r ∈ T, the node f(r)

is proved

be the children

To.

a rule-node

then there is an EDB and tree d for the query

d uses rule r.

that a rule-node = root(TO).

2. For every goal-node equivalent to f(g).

if the algo-

f, from

the nodes of T to the nodes of To, such that: 1. f(root(T))

we show that

deems r relevant,

a minimal

(and

by a symbolic

Then there is a mapping,

To prove this,

rithm

3.1. Suppose that d is a minimal derivation program

Proof:

tme cre-

is

symbolic

labeled by the same rule as node

of r.

the children

accessible.

as follows.

Let nl,.

Since r is marked

nl, ...,

By Lemma derivation

nl must

... rq

as rele-

be marked

3.9, there

as

is a minimal

tree ti for ni (i = 1, ...,

/).

By a suitable renaming of variables, we can guarantee that every pair of trees ti and tj (1 < i