Constraints
and Levy*
Alon
Dept.
Dept.
Science
CA
and
pushing
database. are considered. eliminates
itg
participate
and predicates Redundancy but considers
of identical
ancestor
of the
is, derivation atoms,
other.
case of programs algorithms
with
not only
are given, constraint detect
presence of constraints,
do not
Under
certain
query
for
detect-
including
redundancies
as tightly
cursive
major rules
are
to
straints
order
removing
by Klug
closely redundant
[Kl88]
who
that are part and the issues and constraint
that
the two
of redundancy
are are
concept
and the second is a newly intro-
Grant number NCC 2-537. † Part of the work of this author was done while visiting IBM Almaden Research Center. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 11th Principles of Database Systems/6/92/San Diego, CA © 1992 ACM 0-89791-520-8/92/0006/0067...$1.50
not.
of
when constraints
types
was
equiva-
is the well known
which
based on the notion types
Con-
inequalities.
constraints
it is shown Two
of unreachability,
Both
investigated
redundancy
The first
magic-set constraints
were investigated
(but not the data),
related.
these pa-
on constraints
queries with
In fact,
duced concept,
●This author was supported by NASA through NASA
to
and the rules,
work
are both
considered.
parts
started
of the
of the data
Earlier
lence of conjunctive
of re-
the
was investi-
to handle
in the query
as a part
pushing. optimization
is a
from
constraints
database
generalizations
[KKR90].
done
have Pushing
the
This paper deals with
for
[Ull89])
MF*90]—essentially in
appear
of interest
strategies
role.
query
that
Introduction
Two
constraints
the
of the program
1
(cf.
an important
in the
as possible.
of re-
datalog programs. constants
play
pers proposed
are pushed to the EDB
under
[Sag88] is one example
recently,
transformation
in
to the
to the database.
These
discussed
query
of pushing
from
in
the
for minimization
parts from
gated in [BK*89,
the
from
transformation
example
More
but also push constraints
the paper, the constraints
magic-set
based on ir-
literals.
assumptions
redundant
only minimal
from the given query and rules to the EDB predicates.
moving prime
one is an
constants
The algorithm equivalence
The
trees having
ac.il)
uniform
for
such that
Algorithms
ing these redundancies
that
tree of a fact
predicate. trees, that
programs
based on reachabil-
is similar,
derivation no pair
rules
in any derivation
the query relevance
Redundancy
Israel
(
[email protected].
Abstract in datalog
Science
University
Jerusalem,
94305
(
[email protected])
Two types of redundancies
Sagiv†
of Computer Hebrew
University
Stanford,
in Datalog Yehoshua
of Computer
Stanford
Redundancy
we call
derivation
redundancy are present
One case, namely,
able rules when there
(and is
irrelevance
of minimal
trees).
are investigated and when they are
the problem
of unreach-
are no constraints
has an
easy solution [Ki88]. The other three cases are not as easy, and we solve them in this paper. The creating
algorithms a
we
rule-goal
redundancy-free
present tree
derivations
that that
are
based
represents can
on all
be con-
strutted
for
the
the rule-goal
query. between
state-equivalence
These concepts
vary
of redundancy
construction
2
of
by the concepts with
those
as we move from
to another,
or from
Preliminaries
of We discuss
nodes of the tree and
associated
constraint-labels
constraints
The
tree is guided
nodes.
(i.e., only
one type
one type
of
(EDB ways.
First,
can be used in two
these algorithms
discovering;
that
dant
rules
that
gram
without
lead to redundancy
is, the algorithms can be removed
changing
find from
the result.
algorithms
can be used to push constraints
database.
In this paper,
aspect of the results.
we emphasize
However,
Vardi
of tree automata
datalog rules.
on certain
tree automata. direct
tree automata. First,
it
to better
into tree automata. An explicit
to incorporate
magic-set
(at
is needed,
constraint
transformation.
however,
pushing
The
into
algorithms
consists of a set of ground
and an EDB,
applying
the We
one IDB
predicate,
tree
and the
is
query
tree
consisting
a first
insight
A recent
the
rule-node.
pre-
is identical
also be
The approach
was useful in getting
[Co91]
characterized
to its parent
tigate
work).
and therefore, that
However,
do not fall into
the problems
the framework1
our results
a large
that
tree is for
a derivation
grams. moval
we inves-
(for
of
Note
that that
for that
derivation
is that
our problems
by a datalog
program,
involve but
the EDB. an IDB
la-
We say
predicate
q,
we define two types
may exist
in datalog
a rule is redundant
2.1:
the output
pro-
if its
re-
of the program
A rule r of a datalog
if there is no EDB,
pears in some derivation reason
from
all EDBs).
is unreachable
the facts derived
rule-node A rule-node
at the root of the tree.
does not change
Definition
work.
1The
rule-
atom,
tree are goal-nodes
atoms
icate. only
and
by a ground
for each one of its subgoals.
beled by ground
of redundancies
of [Co91],
are not a corollary
a derivation
goal-node.
In the rest of this section,
class of decidable problems involving datalog rules (the theory of tree automata is at the core of that
in a bottom-up
The head of an instantiated
if q is the predicate
into these problems. paper
of
gen-
and it has a single child, which is an instantiated
in
The leaves of a derivation
however,
facts
of goal-nodes
A goal node is labeled
(or
(or answer)
and an EDB,
has a child goal-node
of tree automata,
output
predicate
a program a
nodes.
as the query
is the set of all ground
for the
Given
predicate
trees that are used
evaluation.
eval-
the ground
no more new facts are generated.
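As an illustration of bottom-up evaluation, the following Python sketch applies the rules to the facts known so far and stops once no more new facts are generated. The representation is an assumption made only for this example and is not taken from the paper: an atom is a pair (predicate, arguments), variables are strings, constants are integers, and the names `match` and `evaluate` as well as the `path`/`edge` program are hypothetical.

```python
def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals the ground `fact`, or return None."""
    pred, args = atom
    fact_pred, fact_args = fact
    if pred != fact_pred or len(args) != len(fact_args):
        return None
    extended = dict(binding)
    for term, value in zip(args, fact_args):
        if isinstance(term, str):              # variable (assumed convention)
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                    # constant mismatch
            return None
    return extended


def evaluate(rules, edb):
    """Naive bottom-up evaluation; `rules` is a list of (head, body) pairs."""
    facts = set(edb)
    while True:
        derived = set()
        for (head_pred, head_args), body in rules:
            bindings = [{}]
            for atom in body:
                bindings = [b2 for b in bindings for f in facts
                            if (b2 := match(atom, f, b)) is not None]
            for b in bindings:
                # Rules are safe, so every head variable is bound by the body.
                derived.add((head_pred, tuple(
                    b[t] if isinstance(t, str) else t for t in head_args)))
        if derived <= facts:                   # no more new facts are generated
            return facts
        facts |= derived


# Hypothetical program: path is an IDB predicate, edge an EDB predicate.
rules = [
    (("path", ("x", "y")), [("edge", ("x", "y"))]),
    (("path", ("x", "y")), [("edge", ("x", "z")), ("path", ("z", "y"))]),
]
edb = {("edge", (1, 2)), ("edge", (2, 3))}
answer = {f for f in evaluate(rules, edb) if f[0] == "path"}
```

Here `answer` plays the role of the output for the query predicate `path`. The sketch re-derives all facts in every round, which is simple but not efficient.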
and could
used for a magic-set
with
a
We continue
predicates.
sented in this paper construct rules,
Given
a bottom-up
we start
for finding
redundant
predicates.
the IDB
erated
rules.
the IDB An exten-
the rules to derive facts for
evaluation.
we consider,
while
EDB facts and apply
least
which
(EDB)
program
are two:
a reduction
The EDB predicates
for the EDB
the program
in
are the predicates
relations,
is one in which
goal)
a can only find redundant
construction
facts
however,
Second,
problems,
database
problems
do not involve
efficiency
if not in theory).
of the redundancy
order
(that
database
sional
distinguish
a), and the reasons for that
leads
practice
We prefer,
algorithms
which
are defined by the program.
rules until
we
are those
in heads of rules.
atomic datalog
prob-
The problems
predicates)
to the
uation
the theory
can also be solved as decision
to present
to the
our work.
[Va89] showed that
pred-
(IDB
predicates
the first
is a useful tool for solving
lems involving consider
from
appear
appearing
the
predicates
that
refer
we also indicate
how the second aspect follows Recently,
redun-
are allowed).
sets of predicates
extensional
which
symbols
only in bodies of rules, and the intentional icates
the pro-
Second,
predicates)
two
The
are collec-
no function
and variables
between
in a given program.
we present
which
programs
constants
We distinguish
to another.
The algorithms
datalog
tions of safe Horn rules with
program
such that
~ ap-
tree for the query pred-
I
not
Finding
also the
cially
trees.
unreachable
rules is easy; it is espe-
easy if we assume that
the query predicate
depends
on all other
constants
in the
predicates
program
case,
a rule
head
has an unreachable
IDB
predicate
erated
for
and there
[Ki88],
is unreachable
In this
if either IDB
for each EDB
the
EDB
predicate,
body
or
And
an
if no fact has exactly
with
Definition
special
its
predicate.
q is unreachable
q when
are no
r1:
2.2:
p(z)
r2: p(z) q(z)
r3:
Consider
derivation
r1
:– p(y,z).
: p(z,y)
:– r(z).
p, q and of which
Rule
is the
cannot
predicates,
and
r2 is unreachable
query
if q is the
the latter
However,
r are IDB
predicate.
algorithm).
predicate,
query
e
and
r1 ity,
by the above
since
any
accounts
as shown 2.3:
only
in the next
Consider
for
is more
derivation
3
:– p(z).
r3 : q(x)
:– e(x),
(regardless
puts
of which
is the
query
finding
[Sar90,
Sh87].
under
cannot
evance
is redundant
predicate).
show
definitions
of redundancy,
called
on the notion notion
of
in
the
describe
irrelevance,
of minimal
redundancy
but
rules
2.4
above
which
is
one
tree is minimal
(or
non-redundant)
in the tree,
the other.
are no two identical
such that
2The converse of redundancies
is also true; considered
and
heads
columns
able
that
in this
node)
with g.
is, some redundancies
an AND
but not by the types
g in the are the
The
the unification.
paper.
tree
rule-goal
The
root
The heads
of
vari-
children
referred
subgoals
to
of an
a distinct
of a rule-node
The
how
consisting
(also
the
distinct
by construct-
query.
having
dis-
4.3).
with
are
no coni.e.,
have
begins
position.
rules
children
node)
irrelevance.
we explain
the
lpredicate
in each argument
a descrip-
have
of a rule
tree is a goal-node
of the query
a goal-node
of
for
If
irrel-
modification
(see Remark
tree
q.
remains
with
rules
section
the algorithm
rule-goal
to
are rectified,
of the head
constraint
the
that
of rules
relax
this
this
begin
out-
from
for determining
In the next
as input
algorithm
discuss
we assume
Informally,
goal-
one is an ancestor
equivalence
does
q, and
are irrelevant
We
variables.
atom
predicate
the
We
of the algorithm
OR
by uniform
in
for deciding
accepts
of redundancy
section.
clarity,
ing
S
can be shown
same.
the
the rule-goal
if there
notion
For
trees. the
It
a query
tion
tinct
a new
derivation subsumes
A derivation
appear
irrelevance
an algorithm
of ‘P that
the
the in
stants
on unreachability.
Definition nodes
next
is redun-
[Sag88],
redundancies
unreachabil-
Rules
of irrelevance.
to unreachability,
almost
H
is undecid-
notion
equivalence
the
The
rules
A narrower
uniform
examples.2
based
redundant
pred-
predicate
does not
However,
describes P and
all
later Generally,
than
rule
Irrelevant
property
we change but
general
tree.
section
r2 : q(z)
r2 is reachable,
query
all redundancies.
Finding
a program
program,
if the
if the query
program:
:– q(x).
r1 : p(z)
is relevant
some
example.
the following
r1
is irrelevant
an unreachable
capture
This
This
predicate.
I
the
based
and rule
z),
example,
Irrelevance
(note
not
unreachability
Example
type
I
regardless
predicate,
query
in any
:– e(~,~). :– p(z,
is p, but
is p1.
re-
predicate
be discovered
icate
I
redundancies,
it
r2 is irrelevant
if q is the
r1
used
predicate.
:– e(z).
gardless
dancy
query
program:
r3 : p1(z)
is unreachable
able
2.3, rule
in a given
r is never
for the
2.6:
r2 : p(z,~)
that
In this
EDBs,
Example
:– q(z).
is an EDB
that
r is irrelevant
q or p is the query
is irrelevant
In this Note
all
In example
1’s as all its argu-
the following
if for
minimal
of whether
one fact
A rule
program
is gen-
ments. Example
2.5:
of
to as an that
unify
(also
called
resulting
from
tree can be viewed
as encoding
all the possible
facts of q. However, sive rules,
derivation
the construction
can go on forever. in designing
Therefore,
when to stop expanding
Example
a node-tag which
the main difficulty
illustrates
lowing
of the tree.
a goal-node
q(z, y) :– e(z, t),
q(t, t),
the
e(t, y).
r3 : q(z, y) :– p(z, y).
two
nodes
have
the
appearing
same
in
by the following
g1
Two
on the
goal-nodes
such that
set
definition.
and g2, are said
V(g2),
posi-
variable.
is induced
there exists a one-to-one onto
We
if in each argument
relation
3.2:
predicate,
:– q(z, 2).
a given rule-goal
and query predicate.
the set of variables
equivalence
Definition
r4 : p(x, y) :– q(z, y).
of
In the fol-
g. Two nodes of the same predicate
of goal-nodes
r2 : q(z, y) :– e(z, y).
below.
we consider
are said to be identical An
B
is the concept
we introduce
definitions,
denote by V(g)
this difficulty.
3.1:
r5 : q1(z)
never be usable.
tree for a given program
tion, r1 :
r4 will
that
The key to this observation
tree
arises in the decision the branches
example
concluding
has recur-
of the rule-goal
the algorithm
The following
trees for
when the program
of
the
same
if
to be equivalent
mapping,
#, from V(g1)
@(g1) = g2. The mapping
+ is called an isomorphism.
9
(1) q1(x)
For example, equivalent,
nodes 2,4, and 10 in Figure
1 are
but nodes 3 and 8 are not equivalent.
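The equivalence test of Definition 3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: a goal-node is modelled as a pair (predicate, tuple of variable names), all arguments are assumed to be variables, and the function name `equivalent` is hypothetical. The function returns the one-to-one mapping of variables when one exists, and `None` otherwise.

```python
def equivalent(g1, g2):
    """Return a one-to-one variable mapping taking atom g1 to atom g2, or None."""
    pred1, args1 = g1
    pred2, args2 = g2
    if pred1 != pred2 or len(args1) != len(args2):
        return None
    forward, backward = {}, {}
    for a, b in zip(args1, args2):
        # The mapping must be a function (forward) and one-to-one (backward).
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return None
    return forward


same_shape = equivalent(("q", ("x", "x")), ("q", ("u", "u")))
different_shape = equivalent(("e", ("x", "t")), ("e", ("u", "u")))
```

In this sketch `q(x,x)` and `q(u,u)` are equivalent, while `e(x,t)` and `e(u,u)` are not, because two distinct variables cannot be mapped to the same variable.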
r
Definition
(2) q(x,x)
3.3:
The
tag of a goal-node
noted by T(g), includes goal-nodes r
T
r
that
have only variables
from V(g).
Intuitively,
(4) q(t,t)
(3) e(x,t)
the tag
goal-nodes
(7) p(x,x)
(6) e(x,x)
(sjti~x)
that
g, de-
itself and all its ancestor
of a node
should the
not
of g in order
for
only
minimal
derivations.
tain
variables
not in V(g)
g contains
appear
rule-goal
again
tree
to encode
Ancestors
that
(9) e@)
(10) q(u,u)
Definition Figure
1: A rule-goal
tree.
gram
rule-goal is shown
the head not duce tors yield
tree in Figure
of r4 unifies
expand
them
subgoals of those
constructed
that nodes,
a minimal
the construction
with
further,
for
1. Notice since
derivation
therefore,
tree.
pro-
12, we do
would
tag of
only itself.
equivalent
@ is the of g1 and
In Figure
pro-
to some
ances-
would
never
However,
but
goal-nodes,
g1
if @(T(g1))
isomorphism
showing
= the
g2. 1
since
show that
rithm
4 and
of them
is tag-equivalent
they
algorithm
a branch we
works
the
describe
state-equivalence
for
in
10 are tag-equivalent, to node
2
are all equivalent). uses
the
to determine
equivalence ing
1, nodes
though
The
which
10 is sufficient
neither
(even
there might be a point in which r4 can be used in some minimal derivation. Fortunately, we can in node
where
equivalence
of the tree could go on forever,
stopping
Two
The
not
although
7 and that
are identical and
this
that
nodes
3.4:
of g.
will
nodes 1 and 2, while
and g2, are said to be tag-equivalent
T(g2), The
1 includes
the tag of node 4 includes
(11) e(u,t)
con-
need not be included
in any subtree
node 2 in Figure
all
in subtrees
in the tag of g, because these variables appear
9
condition
when In
tree.
below,
it
steps.
Step
state-
expand-
Algorithm
suffices
as tag-equivalence. in three
of
to stop
to The
1 expands
3.1, define algothe
rule-goal ing
tree for the query
with
q that
a goal-node
consisting
has a distinct
position.
branch
Step
predicate
variable
of an
1 terminates
to another
the
goal-node
the
a goal-node
that
that
unification
is identical that are with
heads
a rule
~ cannot
sible
from
ion.
Step
fashion, root
the
EDB
via nodes Steps
marked
some
that
as well.
The
that
is marked
following
example
(rather
than
such as equivalence to assure
g
then
/* Step 2: Bottom-up
the
Mark
of r identical
subgoal
g or an ancestor
of g
and
are
in
3 as rel-
2.
g
is accessible
then mark g as accessible; if a goal-node g is state-equivalent
terminat-
to an accessible goal-node
based on state3.2))
marking*/
nodes in To as accessible;
of a rule-node r accessible then mark r as accessible; if at least one child of a goal-node
to it are
a less refined
all EDB
repeat if all children
is marked,
if it appears
of the
any
to
make rule r a child of g;
in Step 2. In
in Step
(Definition
g in TO,
perform the unification
fash-
from
shows that
correctness
making to either
are acces-
is shown in Figure
of a branch
goal-node
a top-down
a goal-node
is relevant
algorithm
ing the expansion
needed
in
in each column;
such that g is not state-equivalent any expanded goal-node in To do for each rule r ∈ P do if rule r unifies with g without
there
in a bottom-up
variable
there is an unexpanded
ancestors.
are state-equivalent
The full
equivalence
g or its
as accessible
A rule
rule-node
and
are reachable
2 and 3, when
all goal-nodes
evant.
that
a distinct
while
which
a goal-node
as relevant,
nodes
with
r if
of g. Note
of the tree that
nodes
3 marks
the
marked
change
the nodes
g1
are rectified
q)
begin /* Step 1: Constructing the rule-goal tree */ Let To be a tree consisting of a goal-node for q
Step 1 will
a rule-node
of
irrelevant-rules(P,
of a
is state-
a subgoal
unification
procedure
is already
g or an ancestor
of rules
constants,
Step 2 marks
both
produce
to either
since no
will
g with
of
argument
expansion
in the tree and has been expanded. not expand
atom
in each
when it reaches a goal-node
equivalent
q, start-
h
then mark g as accessible; until no new nodes are marked;
notion, is indeed
algorithm.
/* Step
3: Top-down
marking
*/
if the root of To is accessible Example
then mark it as relevant; repeat if g is a relevant goal-node,
3.5:
r1 : q(z, y) :– q(z, z),
r2 : q(z, y) :– e1(z,
e(z, y).
r is a child rule-node
y).
all children
r3 : q(z, y) :– p(z, y).
then mark r and its children as relevant; if a goal-node g is state-equivalent
r4 : p(z, y) :– e2(z, y). r5 : p(z, y) :– q(z, y). The rule-goal the query
to a relevant
tree created
predicate
for this program
p is shown
this tree, all the nodes would 2, and therefore,
in Figure
be marked
all rules would
and 3.
And
indeed,
h
In The relevant rules are those appearing in rule-nodes that are marked as relevant;
in Step
be deemed rel-
all other
rules are irrelevant;
end.
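The two marking phases (Steps 2 and 3) can be illustrated with the following Python sketch. It is a hedged sketch under several assumptions and is not the authors' code: the rule-goal tree is supplied already built, so Step 1 and the test for state-equivalence are not reproduced; state-equivalent goal-nodes are linked explicitly by the caller; and the names `Node`, `link_equiv`, `collect`, and `relevant_rules`, as well as the small example tree, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)                 # nodes are compared and hashed by identity
class Node:
    kind: str                        # "goal", "rule", or "edb"
    label: str                       # atom for goal/EDB nodes, rule name for rule-nodes
    children: list = field(default_factory=list)
    equiv: list = field(default_factory=list)   # state-equivalent goal-nodes


def link_equiv(a, b):
    """Record that goal-nodes a and b are state-equivalent (assumed given)."""
    a.equiv.append(b)
    b.equiv.append(a)


def collect(root):
    nodes, stack = [], [root]
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(n.children)
    return nodes


def relevant_rules(root):
    nodes = collect(root)
    # Step 2: bottom-up marking of accessible nodes.
    accessible = {n for n in nodes if n.kind == "edb"}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n in accessible:
                continue
            if n.kind == "rule":
                ok = all(c in accessible for c in n.children)
            else:
                ok = (any(c in accessible for c in n.children)
                      or any(e in accessible for e in n.equiv))
            if ok:
                accessible.add(n)
                changed = True
    # Step 3: top-down marking of relevant nodes.
    relevant = {root} if root in accessible else set()
    changed = True
    while changed:
        changed = False
        for g in nodes:
            if g.kind != "goal":
                continue
            if g not in relevant:
                if not any(e in relevant for e in g.equiv):
                    continue
                relevant.add(g)
                changed = True
            for r in g.children:
                if r not in relevant and all(c in accessible for c in r.children):
                    relevant.add(r)
                    relevant.update(r.children)
                    changed = True
    return {n.label for n in nodes if n.kind == "rule" and n in relevant}


# Hypothetical tree: r2 depends on a goal-node p(x) to which no rule was attached.
r1 = Node("rule", "r1", [Node("edb", "e(x)")])
r2 = Node("rule", "r2", [Node("goal", "p(x)")])
leaf = Node("goal", "q(u)")
r4 = Node("rule", "r4", [Node("edb", "e(x,u)"), leaf])
root = Node("goal", "q(x)", [r1, r2, r4])
link_equiv(leaf, root)
result = relevant_rules(root)
```

In this example `r2` is reported as irrelevant because its only subgoal is never marked as accessible, whereas `r4` is relevant because its unexpanded subgoal is linked to the accessible root.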
had we stopped
expanding
the tree based only on equivalence
goal-nodes
(i.e., not expanded
have deduced that
goal-node
then mark g as relevant; until no new nodes are marked;
evant in Step 3. Notice that the node q(z, y) is equivalent to the node q(z, z), but they are not state-equivalent.
of g, and
of r are accessible
Figure
of
rules.
q(z, z)), we would
r~ is irrelevant,
since we could
2:
Algorithm
3.1: Finding
irrelevant
means
that
if the
algorithm
r to be irrelevant, r4
Completeness
f
it will
q(xw
e2(X,Y)
In
T&
el(XoY)
q(XZ)
an
e(T,Z)
p
Z)
I
3: The
we use symbolic trees
with
(derivation
say that
a symbolic is minimal
establish
correctness
several
apply
it to g(z,
y).
~
Lemma
3.6:
dancy
from
two
changes
the
definition
pler:
two
defined
If we change irrelevance
to
are needed
in Algorithm
Second,
r should
in the
be unified
the unification subgoals
we
with
equivalence
(i.e.,
tree
as the
to push
~.
the
same
advantage
algorithm
for detecting for the
trees (see Lemma
constraints
(see a discussion
fore,
topic
to en-
3.10 below).
+(t)
lemma
are marked
Step or
3.9:
show
soundness
and
completeness.
derivation symbolic
tree for derivation
shows that derivation be part
trees.
There-
of some minimal
siblings
and ancestors
in Step 3).
A goal-node
symbolic
all the goal-
g of To is marked
3.1 if and derivation
only
tree for
if there
in is a
g.
predi-
in the
of Algorithm
in
via the isomorphism
trees for the query predicate
2 of Algorithm
minimal Finally,
next
Correctness
the correctness
deriva-
in Step 2 of the algorithm
symbolic
the next trees
for
coded
in the portion
marked
as relevant.
3The isomorphism
To prove
symbolic
in To
and g2 be two goal-nodes
is a minimal
derivation
Lemma
derivation
of
termination
goal-nodes
symbolic
(and this is detected
I
Proof
Let g1
if the same holds for their
based on
EDB
branch
is based on state-equivalence).
those nodes will
symbolic
to use the rule-goal to the
of this
tree,
g2.3
have minimal
it
unreachabil-
computation,
then
nodes that
pred-
tree
3.8:
The following
to
that
the
are state-equivalent
tree for
(in the size
rule-goal
of a magic-set
nodes
of the rule-goal
state-equivalent
If t is a minimal
g1,
g whenever
even if one of the
have
if we want
basis
Lemma
1, a rule
two
(which
To that
are equiv-
of Step
define
time
in order
is important
3.1
are
a goal-node
has the
alone)
This
section),
sim-
predicate
we need state-equivalence
code all derivation
in order
loop
if they
to a polynomial
of the program
becomes if they
even
definition
ity. However,
cates
same
3.1, we first
tion trees (up to an isomorphism).
to one of its ancestors).
could
be state-equivalent This
for
is possible
is identical
fact,
icate. leads
of the
of g. M
have the same set of minimal
First,
2.4),
no goal-node
by the algorithm.
It shows that
then
3.1.
of state-equivalence
goal-nodes
of redun-
unreachability,
to be state-equivalent
zdent.
In
the notion
t is minimal
tree
of Algorithm
3.8 justifies
condition Remark
2).
g of To, we
to g, and
properties
con-
to Definition
to an ancestor
To, constructed not
derivation (according
of
in Section
a goal-node
is identical
trees, i.e.,
instead
are defined
Given
for g if t
derivation
variables
trees
3.7:
To prove
need for tags.
algorithm.
the rule-goal tree 3.1. In the follow-
derivation
of t
is irrelevant,
TO denotes
ing proofs
root
a rule
irrelevant.
if a rule
by Algorithm
of t is identical
e2(kZ)
that
constructed
the
determines
r is indeed
so by the
section,
Definition q(x,’1’)
7 r4
Figure
means
be deemed this
stants T-r el(X~)
then
3.1, we
of g1.
Soundness
variable
IL
We extend
lemma the
shows that query
of To that
are en-
consists
of nodes
~ is defined only
it to all variables
not in g1 to a new distinct
all minimal
predicate
on the variables
of t by mapping variable.
each
Lemma
3.10:
Let To be the rule-goal
ated by Algorithm P),
derivation
tree for and
is represented
tree T
variables
of T.
the query predicate
and
an assignment
icate,
derivation
such that
the rule-goal
u to the
Claim
tree constructed
3.13
as relevant vant). g ∈ T, node g is state-
Recall
pred-
that
To is
by Algorithm
3.1.
Let g be a goal-node
tion
r is a child
of TO. Suppose
of g and is marked
(and, hence, g is also marked
Then there
is a minimal
as rele-
symbolic
tree for g in which r is a child
deriva-
rule-node
of
9.
3. For every node v ∈ T (either a goal-node a rule-node), f(v) is marked as relevant
or
The claim
in
vant,
4. For every rule-node
r ∈ T, the node f(r)
r c T, the node f(r)
is proved
be the children
To.
a rule-node
then there is an EDB and tree d for the query
d uses rule r.
that a rule-node = root(TO).
2. For every goal-node equivalent to f(g).
if the algo-
f, from
the nodes of T to the nodes of To, such that: 1. f(root(T))
we show that
deems r relevant,
a minimal
(and
by a symbolic
Then there is a mapping,
To prove this,
rithm
3.1. Suppose that d is a minimal
derivation program
Proof:
tree cre-
is
symbolic
labeled by the same rule as node
of r.
the children
accessible.
as follows.
Let nl,.
Since r is marked
nl, ...,
By Lemma derivation
nl must
... rq
as rele-
be marked
3.9, there
as
is a minimal
tree ti for ni (i = 1, ...,
/).
By a suitable renaming of variables, we can guarantee that every pair of trees ti and tj (1 < i