Keys with Upward Wildcards for XML

Wenfei Fan, Peter Schwenzer, Kun Wu
Temple University
fan, schwenp, [email protected]

Abstract

The paper proposes a key constraint language for XML and investigates its associated decision problems. The language is defined in terms of regular path expressions extended with downward and upward wildcards, which can move not only down XML document trees but also up them. In a uniform syntax it can express both absolute keys and relative keys, which are important for hierarchically structured data, including but not limited to XML documents. In addition to being expressive, keys defined in the language can be reasoned about efficiently. In particular, the paper provides a sound and complete set of inference rules and a cubic time algorithm for determining (finite) implication of these key constraints.

Supported by RIF fund, Temple University.

1 Introduction

XML (eXtensible Markup Language [6]) has become the prime standard for data exchange on the Web. XML data typically originates in databases, in which keys are commonly used to convey an essential part of the semantics of the data. In order to use XML to represent data currently residing in databases, we must be able to express the original semantics of the data, in particular, keys. However, the key specification supported by the XML standard [6] is too weak to fully express relational database keys or keys for hierarchically structured data. The absence of a robust key specification for XML has limited its use as a data integration tool, among other things. This demonstrates the need for a better key specification language for XML: for specifying the semantics of the data, preserving the original information in data exchange, preventing update anomalies, indexing and archiving the data, and formulating and optimizing queries.

In relational databases, to define a key we specify the name of a relation (a set of tuples) and a set of attributes of the relation that uniquely identifies tuples in the relation. Along the same lines, to define a key for XML data we specify a pair (Q, S), where Q is a path expression that identifies a set of XML elements on which the key is defined, denoted by ⟦Q⟧ and called the target set; and S is a set of path expressions that provides identification for elements in ⟦Q⟧, called the key paths. Since XML data is typically modeled as a tree, ⟦Q⟧ represents a set of nodes in an XML document tree, and S specifies another set of nodes whose values identify nodes in ⟦Q⟧.

As observed by [7], two forms of keys are particularly important for hierarchically structured data, such as XML documents and scientific databases. The first is the absolute key. Similar to relational database keys, an absolute key identifies a unique node x in ⟦Q⟧ with the values of nodes in the subtree rooted at x. In other words, S specifies components of an element in ⟦Q⟧. The second is the relative key, which is analogous to a key for a weak entity in a relational database. Recall that a key of a weak entity is composed of a key of its "parent" entity and some additional identification [25]. In XML data, a relative key uniquely identifies a node x in ⟦Q⟧ with two parts: (values of) nodes in the subtrees of some "ancestors" of x, and (values of) nodes in the subtree of x that provide additional identification relative to those ancestors. Relative keys are commonly used in hierarchically structured data.

Example. To illustrate absolute and relative keys, let us consider an XML document that contains information about the recent presidential election in the U.S. The document, represented as a tree in Figure 1, describes the states won by the republican and democratic candidates, respectively. A state has a number of counties, and a county in turn has a number of cities. One wants to define keys for state, county and city elements.

[Figure 1: An XML tree. Root r with republican and democratic subtrees of state, county and city nodes; visible labels include name and votes, and values include 'NC', 'NJ', 'Camden', 'Cherry Hill', 'Pine View' and '10K'.]

A key of state is specified with the name of state:

(_*.state, {name}).

Here _* is a combination of the (downward) wildcard `_` and the Kleene star `*`; it matches any path and thus allows one to move down the XML document tree to an arbitrary depth. The path expression _*.state identifies the target set, which consists of all state nodes in the tree no matter where they occur. Of these state nodes, name is defined to be a key. It asserts that two distinct state nodes cannot have name subelements with the same (string) value. In other words, the value of a name subelement uniquely identifies a state node. This is an absolute key since a name subelement is in the subtree rooted at a state node. Observe that two different notions of equality are used: we assume value equality when comparing the values of nodes reached by following the key path name, and node identity when comparing state nodes in the target set. The equality issue is important for XML keys since XML trees do not allow sharing of nodes. In contrast, only value equality is needed when defining a relational database key.

When it comes to a key for county nodes, the story is more complicated. The name of a county uniquely identifies the county, but only within the state in which the county is located. For example, both NJ (New Jersey) and NC (North Carolina) have a unique county named `Camden', but with only the county name one cannot distinguish between the two different counties in the two states. In other words, name of county is a key of county nodes relative to state. To uniquely identify a county in the document, one needs not only the name of the county, but also the name of the state that the county belongs to. To capture this semantics, we specify a relative key:

(_*.state.county, {name, ucp.name}).

Here ucp is the upward wildcard symbol, which means "moving up to the parent node" in an XML tree. We call the constraint a relative key because it needs the name of the state, which is at a level higher than county nodes, as opposed to an absolute key specification. Similarly, to uniquely identify city nodes we need to specify a relative key. Indeed, the `Camden' counties in NJ and NC both have a city also named `Camden'. The relative key consists of the name of the city, the name of the county in which the city is located, and the name of the state where the county is located. That is:

(_*.state.county.city, {name, ucp.name, ucp.ucp.name}).

The constraints given above are keys defined in our key constraint language for XML, denoted by K. There are two important technical questions in connection with keys for XML. One concerns (finite) satisfiability of keys: given a finite set of keys, is there a (finite) XML document that satisfies these keys? In other words, are these keys meaningful? The second question concerns implication of keys: suppose it is known that an XML document satisfies a finite set of keys; does it follow that the document must also satisfy some other key? It is called the finite implication problem if only finite XML documents are considered. These are the classical decision problems studied in relational dependency theory [2], and they are equally important for XML data. For example, in data integration, it is increasingly common to use XML as a uniform data format for mediators [5, 24]. One may want to know whether a key φ holds in a mediator interface. However, this cannot be verified directly since the mediator interface does not contain data. One way to verify φ is to show that it is implied by keys that are known to hold [19]. These decision problems have not yet received the attention they deserve in connection with XML data.

Contributions. The main contributions of the paper are the following:

1. We propose a key constraint language K for XML. The language is defined in terms of regular path expressions extended with two forms of wildcards: a wildcard `_*` that can move down XML trees, as commonly used in XML query languages, and an upward wildcard ucp that moves up XML trees. With these wildcards, K can express both absolute and relative keys uniformly, using the same syntax. The key specification is independent of any DTD and other schema specifications.
2. We show that keys defined in the language K can be reasoned about efficiently. More specifically, we demonstrate that any keys of K are always finitely satisfiable, and for (finite) implication of K keys, we provide a sound and complete set of inference rules and a cubic time algorithm.

Related work. Key specifications for XML have been proposed in the XML standard (DTD) [6], XML Schema [27] and in a recent proposal [7]. One can view ID attributes in a DTD as keys. This key specification is rather weak. First, IDs can only be specified in DTDs, so they do not help documents without DTDs. Second, ID attributes are not "scoped". In contrast to relational database keys, they are unique within the entire document rather than among a "class" of elements. As a result, one cannot, for example, allow a student (element) and a person (element) to use the same SSN as an ID. Third, "keys" can only be defined with XML attributes. In practice, it is common to define keys in terms of subelements or even with path expressions. For example, it is natural to assume that SSN is a key of a person, no matter where SSN occurs in the XML subtree rooted at the person. Fourth, with ID attributes one can only specify unary keys, i.e., keys defined in terms of a single ID attribute. In many situations one wants "compound" keys; e.g., a key of enrollment elements is composed of a student id and a course id, indicating that the student is taking the course. Finally, at most one ID attribute can be specified for each element type, while in practice one may want more than one key, e.g., both SSN and driver license number as keys of person elements. Our key specification overcomes these limitations.

XML Schema specifies keys in terms of XPath [14] expressions. XPath provides a "parent" function that is similar to our upward wildcard. However, the equivalence and containment of XPath expressions are, as far as the authors are aware, unresolved; and these issues are important if we want to reason about keys as we do in relational databases. Because of this, XML Schema defines a key with a list (not a set) of paths. While this avoids issues of equivalence of XPath expressions, one can construct keys that are, presumably, equivalent but have different or even anomalous presentations. For example (using [...] for lists), (state, [name]) and (state, [name, name]) are essentially the same constraint. Until we know how to determine the equivalence of XPath expressions, there is no general method of determining whether two such specifications are equivalent. Another technical issue is value equality. XML Schema restricts equality to text, but in many situations keys are not so restricted. Following [7], we will provide a more general definition of value equality in Section 2.

Closer to our work is the key specification recently proposed by [7]. In that paper, a general definition of value equality and the notions of absolute and relative keys were introduced. However, [7] did not consider the upward wildcard, and it uses different syntactic forms to describe absolute and relative keys. In the syntax of [7], to identify an element in an XML document, one needs to specify a chain of (relative) keys. In contrast, in our key language K, a single key suffices for this purpose. In addition, decision problems associated with relative keys of [7] have not been studied, while these are the central technical problems investigated in this paper.

The (finite) implication problems for keys and the more general functional dependencies have been well studied for relational databases (see, e.g., [2, 25]). For XML, these problems have been investigated in [8, 18, 17, 16]. The work in [8] studied (finite) implication of absolute keys in the absence of the upward wildcard, but stopped short of addressing relative keys. The papers [18, 17, 16] investigated the (finite) implication problems associated with a class of simple keys (and foreign keys) in the presence and absence of DTDs. The key constraints considered in those papers are defined in terms of XML attributes and are not as expressive as the keys studied in this paper. We do not consider foreign keys and DTDs here. Constraints defined in terms of navigation paths have been studied for semistructured [1] and XML data [3, 9, 10, 11, 12]. These constraints are generalizations of the inclusion dependencies commonly found in relational databases, and are not capable of expressing keys. Generalizations of functional dependencies have also been studied [20, 23, 28], but in database settings that are quite different from the tree model for XML data considered in this paper.

Organization. The rest of the paper is organized as follows. Section 2 defines XML trees, value equality, extended regular path expressions, and the key constraint language K for XML. Section 3 investigates containment of extended regular path expressions, and studies (finite) satisfiability and (finite) implication of keys in K. Finally, Section 4 identifies directions for further research. Proofs of the results established in the paper are given in the appendix.

2 Regular path expressions and key constraints

In this section, we first present the notions of XML trees and value equality on XML trees. We then introduce two classes of extended regular path expressions. Finally, we define the key constraint language K for XML in terms of these two classes of path expressions.

2.1 XML trees

A Tree Model. Along the same lines as XML APIs (DOM [4]), query languages (XSL [13, 29], XQL [26]) and schema specifications (XML Schema [27], XPath [14]), we define an XML document tree as follows. Assume a countably infinite set E of element labels (tags), a countably infinite set A of attribute names, and a symbol S indicating text (e.g., PCDATA in XML [6]). Assume that E, A and {S} are pairwise disjoint.

Definition 2.1: An XML (document) tree T is defined to be (V, lab, ele, att, val, r), where
- V is a set of vertices (nodes);
- lab is a function from V to E ∪ A ∪ {S};
- ele is a partial function from V to sequences of vertices in V such that for any v ∈ V, if ele(v) is defined then lab(v) ∈ E;
- att is a partial function from V × A to V such that for any v ∈ V and l ∈ A, if att(v, l) = v' then lab(v) ∈ E and lab(v') = l;
- val is a partial function from V to string values such that for any node v ∈ V, val(v) is a string iff either lab(v) = S or lab(v) ∈ A;
- r is a distinguished vertex in V and is called the root of T. Without loss of generality, assume lab(r) = r, and that there is a unique node in T labeled r.

For any node v ∈ V, if ele(v) is defined then the nodes in ele(v) are called the subelements of v. For any attribute l ∈ A, if att(v, l) = v' then v' is called an attribute of v. In either case we say that there is a parent-child edge from v to v'. The subelements and attributes of v are called the children of v. An XML tree has a tree structure, i.e., for each node v ∈ V, there is a unique path of parent-child edges from the root r to v. An XML tree T is said to be finite if V is finite.

Intuitively, V is the set of vertices of the tree T. Nodes in V can be classified into three types: text nodes, attribute nodes, and element nodes. Text nodes have no name but carry text, attribute nodes both have a name and carry text, and element nodes have a name but do not carry text. More specifically, if a node x is an element node, i.e., a node labeled with an element tag in E, then the functions ele and att define the children of x, which are partitioned into subelements and attributes. The subelements of node x are ordered. In contrast, the attributes of node x are unordered and are identified by their labels (names). The function val assigns string values to attribute and text nodes, i.e., nodes labeled with attributes in A or with the label S. Observe that T has a tree structure. As a result, sharing of nodes is not allowed in T. For example, Figure 1 depicts an XML tree. There is a one-to-one mapping between XML trees and XML documents (when the order of attributes is ignored).
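To make Definition 2.1 concrete, here is a minimal Python sketch of the tree model. The class `Node`, its `kind` tag, the text-label encoding `"#S"` and the helpers `element`, `text` and `well_formed` are hypothetical names chosen for this illustration, not part of the paper. `well_formed` checks only the per-node labeling conditions of the definition; the unique-path property is guaranteed by construction, since each node records a single parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TEXT = "#S"  # assumed encoding of the text label S (disjoint from element/attribute names)


@dataclass(eq=False)  # eq=False keeps `==` as node identity rather than structural equality
class Node:
    kind: str                                                    # "element", "attribute" or "text"
    label: str                                                   # lab(v)
    value: Optional[str] = None                                  # val(v): text/attribute nodes only
    subelements: List["Node"] = field(default_factory=list)      # ele(v), ordered
    attributes: Dict[str, "Node"] = field(default_factory=dict)  # att(v, l), keyed by name
    parent: Optional["Node"] = field(default=None, repr=False)

    def children(self) -> List["Node"]:
        """All targets of parent-child edges: subelements, then attributes."""
        return self.subelements + list(self.attributes.values())


def element(label: str, *subs: Node, **attrs: str) -> Node:
    """Build an element node; keyword arguments become attribute nodes."""
    v = Node("element", label)
    for s in subs:
        s.parent = v
        v.subelements.append(s)
    for name, val in attrs.items():
        v.attributes[name] = Node("attribute", name, value=val, parent=v)
    return v


def text(s: str) -> Node:
    return Node("text", TEXT, value=s)


def well_formed(v: Node) -> bool:
    """Recursively check the labeling conditions of Definition 2.1."""
    if v.kind == "element":
        if v.value is not None or v.label == TEXT:
            return False  # element nodes have a name but carry no text
        if any(a.kind != "attribute" or a.label != name for name, a in v.attributes.items()):
            return False  # att(v, l) = v' requires lab(v') = l
        return all(c.parent is v and well_formed(c) for c in v.children())
    if v.kind == "attribute" and v.label == TEXT:
        return False
    if v.kind == "text" and v.label != TEXT:
        return False
    # text and attribute nodes carry a string value and are leaves
    return (v.kind in ("attribute", "text") and v.value is not None
            and not v.subelements and not v.attributes)


county = element("county", element("name", text("Camden")))
root = element("r", element("democratic", element("state", element("name", text("NJ")), county)))
print(well_formed(root))  # True
```

Using `eq=False` is a deliberate choice: keys compare target nodes by identity, so `==` on nodes should not silently become a structural comparison.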

Value equality. The notion of equality is central to a definition of keys. In relational databases, this is not a problem: one only needs to compare values of atomic types, e.g., integer, real and string values, when checking the satisfaction of a key. An XML tree has a hierarchical structure, and it is no longer trivial to compare the values of two XML trees (subtrees). Intuitively, we need a definition of equality on the values of XML trees such that if two trees are value equal, then the two XML documents represented by the two trees are the same. We next present our definition of value equality. Let T = (V, lab, ele, att, val, r) be an XML tree, and v, v' be two nodes in V. Informally, v and v' are value equal if they have the same tag (label) and, in addition, either they have the same (string) value (when they are text or attribute nodes) or their children are pairwise value equal (when they are element nodes). Formally, v and v' are value equal, denoted by v =v v', iff the following conditions are satisfied:

- lab(v) = lab(v');
- if lab(v) = S or lab(v) ∈ A, then val(v) = val(v'), i.e., they have the same string value;
- if lab(v) ∈ E, then
  - for any l ∈ A, att(v, l) is defined iff att(v', l) is defined, and val(att(v, l)) = val(att(v', l)), i.e., they have the same attributes and their attributes pairwise have the same string values;
  - if ele(v) = [v1, ..., vk], then ele(v') = [v1', ..., vk'] and for all i ∈ [1, k], vi =v vi', i.e., their children are pairwise value equal.

For example, referring to Figure 1, the leftmost city node (Camden in NC) and the rightmost city node (Camden in NJ) are value equal because they have the same tag (city) and all their children (name, votes) are pairwise value equal.
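The recursive definition of =v translates directly into code. The following Python sketch is illustrative only: the class `Node`, the helpers `el`, `txt` and `value_equal`, and the `"#S"` text label are hypothetical names, and, as a simplifying assumption, attribute nodes are folded into a name-to-string dictionary, since the definition compares attributes only by name and string value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # `==` on Node stays node identity; value equality is value_equal()
class Node:
    label: str                                                # element tag, or "#S" for text
    value: Optional[str] = None                               # string value of text nodes
    subelements: List["Node"] = field(default_factory=list)   # ordered
    attributes: Dict[str, str] = field(default_factory=dict)  # attribute name -> string value


def el(label: str, *subs: Node, **attrs: str) -> Node:
    return Node(label, subelements=list(subs), attributes=dict(attrs))


def txt(s: str) -> Node:
    return Node("#S", value=s)


def value_equal(v: Node, w: Node) -> bool:
    """v =v w: same label, then same string value or pairwise value-equal children."""
    if v.label != w.label:
        return False
    if v.label == "#S":
        return v.value == w.value      # text nodes: compare string values
    if v.attributes != w.attributes:   # same attribute names with the same values;
        return False                   # dict comparison ignores attribute order
    return len(v.subelements) == len(w.subelements) and all(
        value_equal(a, b) for a, b in zip(v.subelements, w.subelements))


nc = el("city", el("name", txt("Camden")), el("votes", txt("10K")))
nj = el("city", el("name", txt("Camden")), el("votes", txt("10K")))
print(value_equal(nc, nj), nc is nj)  # True False
```

The last line shows the two notions of equality side by side: the two city nodes are value equal but are distinct nodes. Subelement order matters, while attribute order does not, mirroring the definition.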

2.2 Extended regular path expressions

As mentioned in Section 1, a key constraint φ of K is specified with a pair (Q, S), where Q is a regular path expression called the target path of φ, and S is a set of path expressions called the key paths of φ. The target path identifies a set of nodes in an XML tree on which the key is defined, denoted by ⟦Q⟧. The key paths provide identification for nodes in ⟦Q⟧, similar to the key attributes of a key in a relational database. Before we formally give the definition of constraints, we first present two classes of regular path languages, denoted by TL and KL, for specifying the target paths and key paths, respectively.

Paths. A (simple) path is a sequence of node labels, syntactically defined as follows:

$$\beta ::= \epsilon \mid l.\beta$$

where ε is the empty path, the node label l is in E ∪ A ∪ {S}, and "." is the concatenation operator. Intuitively, a path represents the sequence of node labels in a parent-child path in an XML tree. More precisely, let T be an XML tree, v1, v2 be nodes in T and β be a path. We say that there is a path β from v1 to v2, denoted by T ⊨ β(v1, v2), if there is a parent-child path from v1 to v2 whose sequence of node labels is β. A path expression specifies a (possibly infinite) set of paths, defined as follows.

The path language TL. This language will be used to specify target paths in constraints. A path expression of TL is syntactically defined as follows:

$$Q ::= \epsilon \mid l \mid \_ \mid \_^* \mid Q.Q$$

where "_" is the wildcard symbol that matches any label and "_*" is the combination of the wildcard and Kleene closure that represents any path. A path expression Q of TL defines a regular language that is a set of paths. We use L(Q) to denote the language defined by Q, and use β ∈ L(Q) to denote that path β is in the language defined by Q. For example, _*.state and _*.state.county are path expressions in TL. The wildcard "_" is commonly used in queries on semistructured and XML data [1]. It allows one to move down an XML tree. More specifically, let T be an XML tree, and v1, v2 be nodes in T. We write T ⊨ _(v1, v2) if v2 is a child of v1, i.e., either a subelement or an attribute of v1, no matter what its node label is. Intuitively, it indicates moving from a parent node to a child node. Similarly, we write T ⊨ _*(v1, v2) if there is a (possibly empty) parent-child path β from v1 to v2, no matter what the path β is. In general, given any path expression Q in TL, we say v2 is reachable by following Q from v1, denoted by T ⊨ Q(v1, v2), iff there exists a path β ∈ L(Q) such that T ⊨ β(v1, v2). In particular, we use ⟦Q⟧ to denote the set of nodes in T that are reachable by following Q from the root r, i.e.,

$$[\![Q]\!] = \{\, v \mid T \models Q(r, v) \,\}.$$

For any TL expressions Q1, Q2, it is easy to verify that

$$Q_1.\_^*.\_^*.Q_2 = Q_1.\_^*.Q_2.$$

That is, the two path expressions define the same set of paths. Thus, without loss of generality, we assume that a TL expression does not contain consecutive _*'s, i.e., it does not contain _*._*.

The path language KL. The language KL is used to express key paths. A path expression P of KL has the form ρ.β, where ρ is a (possibly empty) sequence of ucp symbols and β is a nonempty (simple) path. Here ucp is the upward wildcard that matches any node label upwards. We call ρ the upward prefix of P and denote it by P^u. For example, ucp.name and ucp.ucp.name are path expressions in KL. Intuitively, in an XML tree T, the upward wildcard ucp indicates moving from a child node up to its parent. More specifically, let v1 and v2 be nodes in T. We write T ⊨ ucp(v2, v1) iff v2 is a child of v1, i.e., either a subelement or an attribute of v1, no matter what its node label is. In general, given any path expression P = ρ.β in KL with ρ being the upward prefix of P, we say v1 is reachable by following P from v2, denoted by T ⊨ P(v2, v1), iff there is a node v in T such that v is

reached by moving up k levels from v2 and T ⊨ β(v, v1), where k is the number of ucp symbols in ρ. Observe that because T is a tree, there exists at most one v reachable by moving up k levels from v2. Let n be a node in an XML tree T and P be a path expression of KL. We use n⟦P⟧ to denote the set of nodes reachable from n by following P, i.e.,

$$n[\![P]\!] = \{\, v \mid T \models P(n, v) \,\}.$$

It should be noted that (simple) paths are path expressions of both TL and KL.

Notations. We next introduce some notations in connection with path expressions of TL and KL. The length of a path β, denoted by |β|, is the number of node labels in β. That is, |ε| = 0 and |l.β'| = 1 + |β'|. Let ρ be a sequence of upward wildcard symbols. We use |ρ| to denote the number of upward wildcard symbols in ρ. In general, given any path expression P of KL, we use |P^u| to denote the number of upward wildcard symbols in the upward prefix of P. For example, if P = ucp.ucp.name, then |P^u| = 2.

Let Q1, Q2 be path expressions in TL (resp. KL). We say Q1 is a prefix of Q2, denoted by Q1 ⪯p Q2, if there exists Q in TL (resp. KL) such that Q2 = Q1.Q. Analogously, Q1 is called a suffix of Q2, denoted by Q1 ⪯s Q2, if there exists Q such that Q2 = Q.Q1. For example, we have _*.state ⪯p _*.state.county and county ⪯s _*.state.county.

Let Q1, Q2 be path expressions in TL (resp. KL). We say Q1 is contained in Q2, denoted by Q1 ⊆ Q2, if for any XML tree T and any nodes x and y in T, if T ⊨ Q1(x, y) then T ⊨ Q2(x, y). In particular, for any TL expressions Q1, Q2 and Q, the following holds:

$$Q_1.Q.Q_2 \subseteq Q_1.\_^*.Q_2.$$

For example, _*.state ⊆ _*.

Let Q and P be path expressions in TL and KL, respectively. For any XML tree T and any nodes x, y in T, we write T ⊨ Q.P(x, y) iff there exists a node v such that T ⊨ Q(x, v) and T ⊨ P(v, y). A special case is when Q = Q0.Qs, Qs is a path, and |Qs| = |P^u|, i.e., Q has a path suffix Qs, P = ρ.β with ρ being the upward prefix of P, and |Qs| = |ρ|. In this case, if T ⊨ Q.P(x, y) then T ⊨ Q0.β(x, y).

2.3 A key constraint language for XML

In terms of TL and KL, we define key constraints of K.

Definition 2.2: A key constraint φ for XML is an expression of the form

(Q, {P1, ..., Pk}),

where Q is a path expression in TL and is called the target path of φ; P1, ..., Pk are path expressions in KL and are called the key paths of φ, with k ≥ 1. In addition, Q has a suffix Qs such that Qs is a path (i.e., it contains neither `_` nor `_*`) and for each i ∈ [1, k], |Qs| ≥ |Pi^u|, where Pi^u is the upward prefix of Pi. An XML tree T satisfies φ, denoted by T ⊨ φ, iff for any n1, n2 in ⟦Q⟧, if for all i ∈ [1, k] there are x ∈ n1⟦Pi⟧ and y ∈ n2⟦Pi⟧ such that x =v y, then n1 = n2, i.e.,

$$\forall n_1\, n_2 \in [\![Q]\!]\ \Big( \bigwedge_{1 \le i \le k} \exists x \in n_1[\![P_i]\!]\ \exists y \in n_2[\![P_i]\!]\ (x =_v y) \;\rightarrow\; n_1 = n_2 \Big).$$

The set of all key constraints is denoted by K.

Definition 2.3: A key (Q, {P1, ..., Pk}) is called an absolute key if for each i ∈ [1, k], the upward prefix of Pi is empty, i.e., |Pi^u| = 0 (Pi does not contain an upward wildcard). Otherwise it is called a relative key.

For example, the three constraints we have seen in Section 1 are keys of K. More specifically, the key for state nodes is an absolute key, and the other two are relative keys. Other examples of K constraints include:

(_*, {ID}), (_*.book, {name}), (_*.book.chapter, {number, ucp.name}).

The first two are absolute keys, and the last one is a relative key. The first key in fact describes the semantics of ID attributes in DTDs: for any node, no matter where it occurs in a document, if it has an ID, then the ID value uniquely identifies the node in the whole document. The second key states that name is a key for the collection of all book nodes in a document. The third key states that number is a key for chapter, but only within the book that contains the chapter. To uniquely identify a chapter in the document, one needs the number of the chapter as well as the name of the book that contains the chapter. It should be noted that two notions of equality are used to define keys: value equality (=v) when comparing nodes reached by following key paths, and node identity (=) when comparing two nodes in the target set. This is different from keys in relational databases, in which only value equality is needed.
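Definition 2.2 can be checked directly on a small tree by evaluating ⟦Q⟧ from the root, evaluating n⟦Pi⟧ from each target node, and comparing every pair of distinct targets. The Python sketch below is a naive illustration under stated assumptions, not the paper's method: path expressions are written as dot-separated strings, attributes are omitted for brevity, and `Node`, `el`, `txt`, `eval_tl`, `eval_kl` and `satisfies` are hypothetical names. The sample tree is a made-up fragment modeled loosely on Figure 1; its vote counts are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass(eq=False)  # node identity for `==`; value equality is separate
class Node:
    label: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def el(label: str, *children: Node) -> Node:
    v = Node(label, children=list(children))
    for c in children:
        c.parent = v
    return v


def txt(s: str) -> Node:
    return Node("#S", value=s)


def value_equal(a: Node, b: Node) -> bool:
    return (a.label == b.label and a.value == b.value
            and len(a.children) == len(b.children)
            and all(value_equal(x, y) for x, y in zip(a.children, b.children)))


def subtree(v: Node) -> List[Node]:
    """v and all its descendants: the nodes reachable from v by `_*`."""
    out, stack = [], [v]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out


def eval_tl(root: Node, q: str) -> List[Node]:
    """[[Q]] for a TL expression such as '_*.state.county'."""
    frontier = [root]
    for tok in (q.split(".") if q else []):
        nxt = {}  # insertion-ordered set of nodes (hashed by identity)
        for v in frontier:
            targets = subtree(v) if tok == "_*" else [
                c for c in v.children if tok in ("_", c.label)]
            for c in targets:
                nxt[c] = None
        frontier = list(nxt)
    return frontier


def eval_kl(n: Node, p: str) -> List[Node]:
    """n[[P]] for P = ucp...ucp.beta: move up |P^u| levels, then follow beta down."""
    toks = p.split(".")
    ups = 0
    while ups < len(toks) and toks[ups] == "ucp":
        ups += 1
    v: Optional[Node] = n
    for _ in range(ups):
        v = v.parent if v is not None else None
    if v is None:
        return []  # ruled out in Definition 2.2 by |Qs| >= |Pi^u|
    frontier = [v]
    for label in toks[ups:]:
        frontier = [c for u in frontier for c in u.children if c.label == label]
    return frontier


def satisfies(root: Node, q: str, key_paths: Sequence[str]) -> bool:
    """T |= (Q, {P1, ..., Pk}): no two distinct targets agree on every key path."""
    targets = eval_tl(root, q)
    for i, n1 in enumerate(targets):
        for n2 in targets[i + 1:]:
            if all(any(value_equal(x, y) for x in eval_kl(n1, p) for y in eval_kl(n2, p))
                   for p in key_paths):
                return False
    return True


def city(name: str, votes: str) -> Node:
    return el("city", el("name", txt(name)), el("votes", txt(votes)))


root = el("r",
          el("republican", el("state", el("name", txt("NC")),
                              el("county", el("name", txt("Camden")), city("Camden", "10K")))),
          el("democratic", el("state", el("name", txt("NJ")),
                              el("county", el("name", txt("Camden")),
                                 city("Camden", "10K"), city("Cherry Hill", "10K")))))
print(satisfies(root, "_*.state.county.city", ["name"]))  # False
```

On this fragment the absolute key on city names fails because the two Camden cities agree on name, while the relative key with `ucp.name` and `ucp.ucp.name` holds, matching the discussion of Figure 1. The pairwise loop is quadratic in |⟦Q⟧| and is meant only to make the semantics visible.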

Note that our key specification is independent of DTDs and other schema specifications. In a key, the target path Q starts at the root of T. To simplify the discussion we require |Qs| ≥ |Pi^u|. This ensures, among other things, that the sequence Pi^u of ucp symbols never seeks the "parent" of the root of an XML tree. Intuitively, φ is a key for nodes in ⟦Q⟧, and the (nonempty) set of key paths of φ provides identification for these nodes. If two nodes n1, n2 both have all the key paths and agree on them, then they must be the same node. In other words, the key constraint requires that if two nodes in ⟦Q⟧ are distinct, then the two sets of nodes reached on some Pi must be disjoint up to value equality. It should be mentioned that we do not require n1⟦Pi⟧ and n2⟦Pi⟧ to be singleton sets. That is, there may be multiple nodes reachable from n1 or n2 by following Pi. As long as there exist some xi ∈ n1⟦Pi⟧ and yi ∈ n2⟦Pi⟧ such that xi and yi are value equal for all i ∈ [1, k], n1 and n2 must be the same node. In particular, if Pi is missing at either n1 or n2, then n1⟦Pi⟧ and n2⟦Pi⟧ are disjoint up to value equality, and the key is satisfied in this case. This definition of keys takes into account the semistructured nature of XML data [1]. For example, the XML tree in Figure 1 satisfies

(_*.state.county.city, {name, ucp.name, ucp.ucp.name}),

since no two distinct city nodes agree on all the key paths up to value equality. However, it does not satisfy

(_*.state.county.city, {name}),

because the city of Camden in the county of Camden of NC and the city of Camden in the county of Camden of NJ agree on name, but they are different cities.

A path is intended to represent a parent-child path in an XML tree. Note that an attribute or a text (S) node must be a leaf in an XML tree and cannot have any outgoing edges. Therefore, we assume that in any key (Q, {P1, ..., Pk}), the target path Q does not contain any attribute or the symbol S, and moreover, if Pi contains an attribute or S, then it is of the form Pi'.l, where Pi' does not contain any attribute or the symbol S.

3 Decision problems for K constraints

In this section, we investigate the (finite) satisfiability and finite implication problems associated with key constraints of K. We first describe these decision problems. Let Σ be a finite set of constraints and T an XML tree. We use T ⊨ Σ to denote that T satisfies Σ, that is, T ⊨ ψ for every ψ ∈ Σ. Let Σ ∪ {φ} be a finite set of K constraints. We use Σ ⊨ φ to denote that Σ implies φ, that is, for any XML tree T, if T ⊨ Σ, then T ⊨ φ.

The satisfiability problem for K is to determine, given any finite set Σ of key constraints of K, whether there exists an XML tree T such that T ⊨ Σ. The finite satisfiability problem for K is to determine whether there exists a finite XML tree T such that T ⊨ Σ. The implication problem for K is to determine, given any finite set of key constraints Σ ∪ {φ} in K, whether Σ ⊨ φ. The finite implication problem for K is to determine whether Σ finitely implies φ, that is, whether for any finite XML tree T, if T ⊨ Σ, then T ⊨ φ. As an example, consider

Σ = {(_*, {ID})}, φ = (_*.person, {ID, name}).

Then Σ ⊨ φ (see a proof shortly), which is an instance of the implication problem for K. To answer these questions, we need to determine containment of path expressions of TL and KL, that is, to determine, given any path expressions Q1 and Q2 of TL (or KL), whether Q1 ⊆ Q2. We first investigate the containment problems for KL and TL, and then study the decision problems for K. Proofs of the results are given in the appendix.

3.1 Containment of path expressions

The containment problems are not only interesting in their own right, but also important in the analyses of inference and (finite) implication of constraints.

Containment of KL expressions. The analysis of this problem is trivial: for any path expressions P1, P2 in KL, P1 ⊆ P2 iff P1 and P2 are (syntactically) equal (assuming that P1 and P2 do not contain ε unless they are the empty path ε). This can be easily verified by induction on the number of upward wildcard symbols in P1, i.e., on |P1^u|. As a result, containment of KL expressions can be decided in linear time.

Containment of TL expressions. It should be noted that TL is a star-free regular language (see, e.g., [30]). As shown in [22], the containment problem for star-free languages is coNP-complete in general. In contrast, we have shown in [8] that containment of TL expressions is in square time.

Theorem 3.1 [8]: Containment of TL expressions can be determined in square time.

The idea of the proof is as follows. Given two TL expressions Q1, Q2, we consider their NFAs (nondeterministic finite state automata; see, e.g., [21] for a definition), denoted by M(Q1) and M(Q2), respectively. We treat these NFAs as graphs and define a certain simulation relation from M(Q1) to M(Q2) (see, e.g., [1] for discussions of simulation in connection with graphs and semistructured data). It can be shown that Q1 ⊆ Q2 if and only if there exists a simulation relation from M(Q1) to M(Q2), and moreover, there is a square time algorithm for determining whether such a simulation exists. A detailed proof can be found in [8].
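For experimenting with small TL expressions, containment can also be decided by a textbook technique that is deliberately different from the paper's: run the subset construction for both NFAs in lockstep and look for a path accepted by M(Q1) but not by M(Q2). This sketch does not reproduce the simulation-based square-time algorithm of [8] and can take exponential time in the worst case. It relies on one observation: labels occurring in neither expression are matched only by wildcards, so a single `FRESH` placeholder stands for all of them and the alphabet becomes finite. The names `tl_contained`, `kl_contained` and `FRESH`, and the dot-separated string encoding, are assumptions of this sketch.

```python
from typing import FrozenSet, Iterable, List

ANY, STAR = "_", "_*"
FRESH = object()  # stands for every label that occurs in neither expression


def _close(q: List[str], states: Iterable[int]) -> FrozenSet[int]:
    """Epsilon closure: position i may skip a `_*` token, which can match the empty path."""
    out = set(states)
    stack = list(out)
    while stack:
        i = stack.pop()
        if i < len(q) and q[i] == STAR and i + 1 not in out:
            out.add(i + 1)
            stack.append(i + 1)
    return frozenset(out)


def _step(q: List[str], states: FrozenSet[int], a: object) -> FrozenSet[int]:
    """Read one label `a`: `_*` stays put, `_` or a matching label advances one position."""
    nxt = {i if q[i] == STAR else i + 1
           for i in states if i < len(q) and (q[i] in (ANY, STAR) or q[i] == a)}
    return _close(q, nxt)


def tl_contained(q1: str, q2: str) -> bool:
    """Decide L(Q1) ⊆ L(Q2) for TL expressions written like '_*.state.county'."""
    t1 = q1.split(".") if q1 else []
    t2 = q2.split(".") if q2 else []
    alphabet = [t for t in set(t1 + t2) if t not in (ANY, STAR)] + [FRESH]
    start = (_close(t1, [0]), _close(t2, [0]))
    seen, todo = {start}, [start]
    while todo:
        s1, s2 = todo.pop()
        if len(t1) in s1 and len(t2) not in s2:
            return False  # some path is in L(Q1) but not in L(Q2)
        for a in alphabet:
            n1 = _step(t1, s1, a)
            if n1:  # Q1 can still read `a` from here
                pair = (n1, _step(t2, s2, a))
                if pair not in seen:
                    seen.add(pair)
                    todo.append(pair)
    return True


def kl_contained(p1: str, p2: str) -> bool:
    """Per the text above, KL containment reduces to syntactic equality."""
    return p1 == p2


print(tl_contained("_*.state", "_*"))  # True
```

The example checks the containment _*.state ⊆ _*, which the implication example Σ ⊨ φ above relies on.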

3.2 Satisfiability of keys

Given a key constraint language for XML, it is important to be able to decide whether keys expressed in the language can be satisfied by any (finite) XML tree at all. Better still, it is desirable that all keys be meaningful, i.e., that for any finite set of keys one can always find a (finite) XML tree that satisfies them. Some key specifications do not have this property. For example, keys proposed in XML Schema [27] and the strong keys defined in [7] do not have the finite model property: for some keys expressed in those languages, there is no finite XML document that satisfies them. In relational databases, given any finite set of keys (and foreign keys) over a relational schema, one can always find a nonempty finite instance of the schema that satisfies the constraints. In fact, one can construct such a finite instance by creating a tuple for each relation so that the instance satisfies the constraints. In other words, any finite set of keys (and foreign keys) in a relational database is always finitely satisfiable. Keys of $\mathcal{K}$ also have this property. Indeed, given any finite set $\Sigma$ of key constraints in $\mathcal{K}$, one can always find a finite XML tree $T$ such that $T \models \Sigma$. In particular, the tree consisting of a single node (the root) satisfies any keys in $\mathcal{K}$, since every target set $[\![Q]\!]$ in that tree contains at most one node. In other words, keys of $\mathcal{K}$ are always finitely satisfiable.

3.3 Implication of keys

We first show that for constraints of $\mathcal{K}$, implication and finite implication are the same problem (see the appendix for a proof).

Proposition 3.2: The implication and finite implication problems for $\mathcal{K}$ coincide.

By Proposition 3.2, we can also use $\Sigma \models \varphi$ to denote that $\Sigma$ finitely implies $\varphi$. In addition, all the results for finite implication of $\mathcal{K}$ constraints established below also hold for implication of $\mathcal{K}$ constraints.

We next show that finite implication of $\mathcal{K}$ constraints is finitely axiomatizable. We present a set of inference rules for finite implication of key constraints of $\mathcal{K}$, denoted by $\mathcal{I}$, as follows:

superkey:
$$\frac{(Q, S), \quad P \in \mathrm{KL}}{(Q, S \cup \{P\})}$$

subnode:
$$\frac{(Q.\beta, \{\varrho.P_1, \ldots, P'_l, \ldots, \varrho.P_k\})}{(Q, \{P_1, \ldots, P_l, \ldots, P_k\})}$$
where $\beta, \beta'$ are paths and $P_l = \beta.\beta'$; $\varrho$ is a sequence of ucp's with $|\varrho| = |\beta|$; and $P'_l$ is either $\beta'$ or $\varrho.P_l$.

containment:
$$\frac{(Q, S), \quad Q' \subseteq Q}{(Q', S)}$$

empty-path:
$$\frac{S \subseteq \mathrm{KL}, \quad S \text{ is nonempty}}{(\varepsilon, S)}$$

We briefly illustrate these rules as follows.

superkey. This rule also holds for keys in relational databases. It states that if $S$ is a key for $[\![Q]\!]$, then so is any superset of $S$.

subnode. To understand this rule, consider two nodes $n_1, n_2 \in [\![Q]\!]$. Suppose that for all $i \in [1, k]$ there exist $x_i \in n_1[\![P_i]\!]$ and $y_i \in n_2[\![P_i]\!]$ such that $x_i =_v y_i$. Since $\beta$ is a prefix of $P_l$, there must be nodes $n'_1, n'_2 \in [\![Q.\beta]\!]$ such that $n'_1 \in n_1[\![\beta]\!]$, $n'_2 \in n_2[\![\beta]\!]$, $x_l \in n'_1[\![\beta']\!]$ and $y_l \in n'_2[\![\beta']\!]$. Observe that $x_l \in n'_1[\![P'_l]\!]$ and $y_l \in n'_2[\![P'_l]\!]$, whichever form $P'_l$ takes. Moreover, since moving up $|\varrho| = |\beta|$ levels from $n'_1$ (resp. $n'_2$) leads back to $n_1$ (resp. $n_2$), for any $i \in [1, k]$ with $i \neq l$ we have $x_i \in n'_1[\![\varrho.P_i]\!]$ and $y_i \in n'_2[\![\varrho.P_i]\!]$. Since $(Q.\beta, \{\varrho.P_1, \ldots, P'_l, \ldots, \varrho.P_k\})$ is a key, we must have $n'_1 = n'_2$. In addition, $n_1 = n_2$, since $n_1$ and $n_2$ are the ancestors of $n'_1$ and $n'_2$ that lie $|\beta|$ levels up, respectively, and XML trees do not allow sharing of nodes. Thus $(Q, \{P_1, \ldots, P_l, \ldots, P_k\})$ is also a key. Note that if $\beta = \varepsilon$, then the key in the precondition and the one in the consequence of the rule are the same.

containment. This rule holds because $[\![Q']\!] \subseteq [\![Q]\!]$ given $Q' \subseteq Q$, and a key for $[\![Q]\!]$ is also a key for any subset of $[\![Q]\!]$.

empty-path. This rule is sound because $[\![\varepsilon]\!]$ consists of a single node, namely, the root.

Given a finite set $\Sigma \cup \{\varphi\}$ of $\mathcal{K}$ constraints, we use $\Sigma \vdash_{\mathcal{I}} \varphi$ to denote that $\varphi$ is provable from $\Sigma$ using $\mathcal{I}$, that is, there is an $\mathcal{I}$-proof of $\varphi$ from $\Sigma$. For example, recall the constraints $\Sigma = \{(\_^*, \{\mathit{ID}\})\}$ and $\varphi = (\_^*.\mathit{person}, \{\mathit{ID}, \mathit{name}\})$ given earlier. We show $\Sigma \vdash_{\mathcal{I}} \varphi$ as follows. By $\_^*.\mathit{person} \subseteq \_^*$ and the containment rule, we have

$$\Sigma \vdash_{\mathcal{I}} (\_^*.\mathit{person}, \{\mathit{ID}\}).$$

By the superkey rule, we have

$$\Sigma \vdash_{\mathcal{I}} (\_^*.\mathit{person}, \{\mathit{ID}, \mathit{name}\}).$$

Thus $\Sigma \vdash_{\mathcal{I}} \varphi$. The theorem below shows that $\mathcal{I}$ is indeed a finite axiomatization for $\mathcal{K}$ constraint implication.

Theorem 3.3: The set $\mathcal{I}$ of inference rules is sound and complete for finite implication of $\mathcal{K}$ constraints.

The theorem claims that, given any finite set $\Sigma \cup \{\varphi\}$ of $\mathcal{K}$ constraints, $\Sigma \models \varphi$ iff $\Sigma \vdash_{\mathcal{I}} \varphi$. Soundness of $\mathcal{I}$ can be verified by induction on the lengths of $\mathcal{I}$-proofs. To prove its completeness, it suffices to show that if $\Sigma \models \varphi$, then $\Sigma \vdash_{\mathcal{I}} \varphi$. More specifically, we show that if $\Sigma \nvdash_{\mathcal{I}} \varphi$, then there exists a finite XML tree $G$ such that $G \models \Sigma$ but $G \not\models \varphi$. The proof is somewhat involved; see the appendix for the details. A similar result was proven in [8]. However, keys considered in [8] are defined in terms of less expressive path expressions; in particular, the upward wildcard ucp was not considered there. As an example, consider the constraints $\Sigma$ and $\varphi$ given earlier. We have shown $\Sigma \vdash_{\mathcal{I}} \varphi$; by Theorem 3.3, $\Sigma \models \varphi$.

Finally, we show that absolute keys and relative keys can be reasoned about efficiently.

Theorem 3.4: The finite implication problem for $\mathcal{K}$ is decidable in cubic time.

A cubic time algorithm for determining finite implication of $\mathcal{K}$ constraints can be found in the appendix. The algorithm is based on the inference rules of $\mathcal{I}$. By Proposition 3.2, the algorithm can also be used to determine implication of $\mathcal{K}$ constraints. Note that the complexity is measured in the size of the keys of $\mathcal{K}$, which is in general much smaller than the size of a data instance, i.e., an XML document.

4 Conclusion

We have proposed a key constraint language for XML defined in terms of regular path expressions extended with downward and upward wildcards. Despite the simple syntax of the language, we have demonstrated that it is capable of expressing both absolute and relative keys, which are important for hierarchically structured data such as XML documents and scientific databases. We have investigated decision problems associated with the key constraint language. More specifically, we have shown that any keys expressed in the language can be satisfied by a finite XML tree. We have also shown that the implication and finite implication problems for the key language are finitely axiomatizable and are decidable in cubic time.

A number of questions remain open. First, the decision problems are investigated in the absence of XML DTDs. Keys may interact with DTDs, and the interaction may complicate reasoning about keys [17, 16, 10]. This issue needs further investigation. Second, keys also interact with foreign keys. As shown by [18], the (finite) implication problem for simple keys and foreign keys is undecidable even in the context of relational databases. It is important to find practical restrictions under which the decision problems for XML keys and foreign keys are decidable. Finally, one may want to define keys with more general path expressions. The decision problems for more expressive keys need to be studied.

Acknowledgments. We thank Peter Buneman for helpful discussions.

References

[1] S. Abiteboul, P. Buneman, and D. Suciu. Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, 2000.
[2] S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
[3] S. Abiteboul and V. Vianu. Regular path queries with constraints. In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 122–133, 1997.
[4] V. Apparao et al. Document Object Model (DOM) Level 1 Specification. W3C Recommendation, Oct. 1998. http://www.w3.org/TR/REC-DOM-Level-1.
[5] C. Baru, A. Gupta, B. Ludäscher, R. Marciano, Y. Papakonstantinou, P. Velikhov, and V. Chu. XML-based information mediation with MIX. In Proc. ACM SIGMOD Conf. on Management of Data, pages 597–599, 1999.
[6] T. Bray, J. Paoli, and C. M. Sperberg-McQueen. Extensible Markup Language (XML) 1.0. W3C Recommendation, Feb. 1998. http://www.w3.org/TR/REC-xml.
[7] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Keys for XML. Draft manuscript, 2000.
[8] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Reasoning about keys for XML. Technical Report TUCIS-TR-2000-005, Dept. of Computer and Information Sciences, Temple University, 2000.
[9] P. Buneman, W. Fan, and S. Weinstein. Path constraints on semistructured and structured data. In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 129–138, 1998.
[10] P. Buneman, W. Fan, and S. Weinstein. Interaction between path and type constraints. In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 56–67, 1999.
[11] P. Buneman, W. Fan, and S. Weinstein. Query optimization for semistructured data using path constraints in a deterministic data model. In Proc. Int'l Workshop on Database Programming Languages (DBPL), 1999.
[12] P. Buneman, W. Fan, and S. Weinstein. Path constraints in semistructured databases. J. Computer and System Sciences (JCSS), in press.
[13] J. Clark. XSL Transformations (XSLT). W3C Recommendation, Nov. 1999. http://www.w3.org/TR/xslt.
[14] J. Clark and S. DeRose. XML Path Language (XPath). W3C Recommendation, Nov. 1999. http://www.w3.org/TR/xpath.
[15] H. B. Enderton. A Mathematical Introduction to Logic. Academic Press, 1972.
[16] W. Fan and L. Libkin. Finite implication of key and foreign key constraints for XML data. Technical Report TUCIS-TR-2000-003, Dept. of Computer and Information Sciences, Temple University, 2000.
[17] W. Fan and L. Libkin. Finite satisfiability of key and foreign key constraints for XML data. Technical Report TUCIS-TR-2000-002, Dept. of Computer and Information Sciences, Temple University, 2000.
[18] W. Fan and J. Siméon. Integrity constraints for XML. In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 23–34, 2000.
[19] D. Florescu, L. Raschid, and P. Valduriez. A methodology for query reformulation in CIS using semantic knowledge. Int'l J. Cooperative Information Systems (IJCIS), 5(4):431–468, 1996.
[20] C. Hara and S. Davidson. Reasoning about nested functional dependencies. In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 91–100, 1999.
[21] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 1979.
[22] H. Hunt, D. Rosenkrantz, and T. Szymanski. On the equivalence, containment, and covering problems for the regular and context-free languages. J. Computer and System Sciences (JCSS), 12:222–268, 1976.
[23] M. Ito and G. E. Weddell. Implication problems for functional constraints on databases supporting complex objects. J. Computer and System Sciences (JCSS), 50(1):165–187, 1995.
[24] Y. Papakonstantinou and V. Vianu. Type inference for views of semistructured data. In Proc. ACM Symp. on Principles of Database Systems (PODS), pages 35–46, 2000.
[25] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill Higher Education, 2000.
[26] J. Robie, J. Lapp, and D. Schach. XML Query Language (XQL). Workshop on XML Query Languages, Dec. 1998.
[27] H. S. Thompson, D. Beech, M. Maloney, and N. Mendelsohn. XML Schema Part 1: Structures. W3C Working Draft, Apr. 2000. http://www.w3.org/TR/xmlschema-1/.
[28] M. F. van Bommel and G. E. Weddell. Reasoning about equations and functional dependencies on complex objects. IEEE Transactions on Knowledge and Data Engineering, 6(3):455–469, 1994.
[29] P. Wadler. A formal semantics for patterns in XSL. Technical report, Computing Sciences Research Center, Bell Labs, Lucent Technologies, 2000.
[30] S. Yu. Regular languages. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 1, pages 41–110. Springer, 1996.

Appendix

Proof of Proposition 3.2: Observe that, given any finite set $\Sigma \cup \{\varphi\}$ of $\mathcal{K}$ constraints, $\Sigma \models \varphi$ iff there exists no XML tree $T$ such that $T \models \bigwedge\Sigma \wedge \neg\varphi$. Thus it suffices to show that if there exists an XML tree $T$ such that $T \models \bigwedge\Sigma \wedge \neg\varphi$, then there must be a finite XML tree $T'$ such that $T' \models \bigwedge\Sigma \wedge \neg\varphi$. That is, the complement of the implication problem for $\mathcal{K}$ has the finite model property [15]. This can be verified as follows. Let $\varphi = (Q, \{P_1, \ldots, P_k\})$. Since $T \models \neg\varphi$, there are nodes $n_1, n_2 \in [\![Q]\!]$, $x_i \in n_1[\![P_i]\!]$ and $y_i \in n_2[\![P_i]\!]$ for $i \in [1, k]$ such that $x_i =_v y_i$ but $n_1 \neq n_2$. Let $T'$ be the finite subtree of $T$ that consists solely of all the nodes on the paths from the root to $x_i, y_i$ for all $i \in [1, k]$. It is easy to verify that $T' \models \Sigma$ but $T' \not\models \varphi$. Moreover, $T'$ is a finite XML tree. $\Box$

Proof of Theorem 3.3: We show that $\mathcal{I}$ is complete for finite implication of $\mathcal{K}$ constraints, i.e., given any finite set $\Sigma \cup \{\varphi\}$ of $\mathcal{K}$ constraints, if $\Sigma \models \varphi$ then $\Sigma \vdash_{\mathcal{I}} \varphi$. To do so, it suffices to show that if $\Sigma \nvdash_{\mathcal{I}} \varphi$, then there is a finite XML tree $G$ such that $G \models \Sigma$ but $G \not\models \varphi$, i.e., $\Sigma \not\models \varphi$.

To facilitate the construction of $G$, we introduce a notion of abstract trees. An abstract tree $T$ is an extension of an XML tree that allows `$\_^*$' as a node label. As a result, a parent-child path $Q$ in $T$ is a path expression of TL, which may contain `$\_^*$'. Let $x, y$ be two nodes in $T$ and $Q'$ be a path expression of TL. We say $T \models Q'(x, y)$ iff there exists a parent-child path $Q$ in $T$ such that $T \models Q(x, y)$ and $Q \subseteq Q'$. However, we do not allow the upward wildcard ucp to move up from a node labeled $\_^*$, i.e., $T \not\models \mathrm{ucp}(y, x)$ for any node $y$ labeled $\_^*$. The connection between abstract trees and XML trees is revealed by the following claim.

Claim: If there exists a finite abstract tree $T$ such that $T \models \Sigma$ and $T \not\models \varphi$, then there exists a finite XML tree $T'$ such that $T' \models \Sigma$ and $T' \not\models \varphi$.

Proof of Claim: Given an abstract tree $T$ such that $T \models \Sigma$ and $T \not\models \varphi$, we construct a finite XML tree $T'$ from $T$ as follows. Let $\tau$ be a fresh node label (element tag) that does not appear in $\Sigma \cup \{\varphi\}$. We replace every occurrence of `$\_^*$' in $T$ with $\tau$, and refer to the finite XML tree obtained in this way as $T'$. We show that $T' \models \Sigma$ and $T' \not\models \varphi$.

We first show $T' \models \Sigma$. Suppose, by contradiction, that there is $\psi \in \Sigma$ such that $T' \not\models \psi$. Assume that $\psi = (Q, \{P_1, \ldots, P_k\})$. Then by Definition 2.2, there must exist $n_1, n_2 \in [\![Q]\!]$, $x_i \in n_1[\![P_i]\!]$ and $y_i \in n_2[\![P_i]\!]$ in $T'$ for all $i \in [1, k]$ such that $x_i =_v y_i$, but $n_1 \neq n_2$. Observe that the set of nodes in $T'$ is the same as that in $T$. By the choice of $\tau$, we must have $n_1, n_2 \in [\![Q]\!]$ in $T$. Note that the $P_i$'s are in KL and thus do not contain `$\_^*$'. Thus, by the definition of abstract trees, in $T$ we also have $x_i \in n_1[\![P_i]\!]$ and $y_i \in n_2[\![P_i]\!]$ for all $i \in [1, k]$. Therefore, $T \not\models \psi$. This contradicts the assumption that $T \models \Sigma$. Thus $T' \models \Sigma$.

We next show $T' \not\models \varphi$. Let $\varphi = (Q, \{P_1, \ldots, P_k\})$. By $T \not\models \varphi$, there exist $n_1, n_2 \in [\![Q]\!]$, $x_i \in n_1[\![P_i]\!]$ and $y_i \in n_2[\![P_i]\!]$ in $T$ for all $i \in [1, k]$ such that $x_i =_v y_i$ but $n_1 \neq n_2$. By the construction of $T'$ and the choice of $\tau$, we must have $n_1, n_2 \in [\![Q]\!]$ in $T'$. Recall that the $P_i$'s do not contain `$\_^*$' by the definition of KL expressions. Therefore, we have $x_i \in n_1[\![P_i]\!]$ and $y_i \in n_2[\![P_i]\!]$ in $T'$ for all $i \in [1, k]$. As a result, $T' \not\models \varphi$. This completes the proof of the claim. $\Box$

Equipped with the claim, we proceed to prove Theorem 3.3. Assume $\Sigma \nvdash_{\mathcal{I}} \varphi$. We construct a finite XML tree $G$ such that $G \models \Sigma$ and $G \not\models \varphi$. Assume $\varphi = (Q, \{P_1, \ldots, P_k\})$, $\Sigma = \{\psi_1, \ldots, \psi_n\}$, and $\psi_i = (Q_i, \{P_{i1}, \ldots, P_{ik_i}\})$ for $i \in [1, n]$. Consider the key $\varphi$. By the definition of KL expressions, we assume that for each $j \in [1, k]$, $P_j = \varrho_j.\beta_j$, where $\varrho_j$ is the upward prefix of $P_j$ and $\beta_j$ is a nonempty path of node labels. Without loss of generality, also assume that for all $j, l \in [1, k]$, $|\varrho_j| \geq |\varrho_l|$ if $j < l$, and that there is $m \in [1, k]$ such that for all $j > m$, we have $P_j = \beta_j$, i.e., $|\varrho_j| = 0$. Given these, we construct $G$ in two steps: we first build an abstract tree $T$ using $\varphi$, and then construct $G$ from $T$.

Observe that if $Q = \varepsilon$, then $\Sigma \vdash_{\mathcal{I}} \varphi$ by the empty-path rule in $\mathcal{I}$, which contradicts the assumption. Thus we assume $Q \neq \varepsilon$ in the sequel.

The abstract tree $T$ is depicted in Figure 2. The root $r$ of $T$ has two disjoint branches, referred to as $T_1$ and $T_2$, each consisting of a $Q$ path. Let $n_1, n_2$ be the two distinct nodes at the ends of the $Q$ paths, respectively. For each $j \in [1, k]$, let $u_j, v_j$ be the two nodes that are $|\varrho_j|$ levels up from $n_1$ and $n_2$, respectively. From $u_j$ and $v_j$ we add two disjoint $\beta_j$ paths. Let $x_j$ and $y_j$ refer to the nodes at the ends of the $\beta_j$ paths, respectively. We assume that $x_j =_v y_j$ for $j \in [1, k]$, but $x \neq_v y$ for every other pair of nodes $x, y$ in $T$. This can be achieved by assuming that each node has a text subelement (if it is not already an attribute or a text node) and assigning appropriate string values to these text (attribute) nodes. Observe the following:

• $T \not\models \varphi$. Indeed, $n_1$ and $n_2$ agree on all the key paths of $\varphi$, but they are distinct nodes.
• For all $j \in [1, k]$, $\beta_j$ contains neither `$\_^*$' nor `ucp'.
• The path from the root $r$ to $u_1$ (resp. $v_1$) is a prefix of $Q$ and may contain `$\_^*$'. In contrast, the paths below $u_1$ and $v_1$ do not contain `$\_^*$', by Definition 2.2.

[Figure 2: An abstract tree, with branches $T_1$ and $T_2$]
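The two-branch construction can be sketched in Python for the special case where $Q$ is a plain sequence of labels (no `$\_^*$'), so that $T$ is already an ordinary XML tree; the merging step that follows is not modeled. The sketch builds both branches, hangs a copy of $\beta_j$ under the nodes $|\varrho_j|$ levels above $n_1$ and $n_2$, gives $x_j$ and $y_j$ the same value and every other node a distinct one, and then confirms that $n_1$ and $n_2$ agree on every key path. The encoding of a key path as a pair (number of ucp steps, list of labels) and the names `build_two_branch_tree`, `reach` and `agree_on_all` are assumptions made for illustration.

```python
from itertools import count
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding of a key path rho_j.beta_j as (|rho_j|, beta_j).
KeyPath = Tuple[int, Sequence[str]]


class Node:
    def __init__(self, label: str, value: str, parent: Optional["Node"] = None):
        self.label, self.value, self.parent = label, value, parent
        self.children: List["Node"] = []
        if parent is not None:
            parent.children.append(self)


def build_two_branch_tree(q: Sequence[str], keys: Sequence[KeyPath]) -> Tuple[Node, Node, Node]:
    """Return (r, n1, n2) for the tree of Figure 2, restricted to a label-only q."""
    fresh = count()
    root = Node("r", f"v{next(fresh)}")
    ends = []
    for _ in range(2):                                # branches T1 and T2
        node = root
        for label in q:
            node = Node(label, f"v{next(fresh)}", node)
        ends.append(node)                             # n1, then n2
    for j, (up, beta) in enumerate(keys):
        assert 0 <= up <= len(q) and len(beta) > 0
        for end in ends:
            u = end
            for _ in range(up):                       # u_j (resp. v_j)
                u = u.parent
            for label in beta[:-1]:
                u = Node(label, f"v{next(fresh)}", u)
            Node(beta[-1], f"shared{j}", u)           # x_j and y_j share a value
    return root, ends[0], ends[1]


def reach(n: Node, key: KeyPath) -> List[Node]:
    """Nodes reached from n by climbing |rho| levels and then following beta."""
    up, beta = key
    frontier = [n]
    for _ in range(up):
        frontier = [m.parent for m in frontier if m.parent is not None]
    for label in beta:
        frontier = [c for m in frontier for c in m.children if c.label == label]
    return frontier


def agree_on_all(n1: Node, n2: Node, keys: Sequence[KeyPath]) -> bool:
    """True if, for every key path, some x reached from n1 and y from n2 share a value."""
    return all({x.value for x in reach(n1, k)} & {y.value for y in reach(n2, k)}
               for k in keys)


# phi = (state.county, {ucp.name, name}): a relative key for counties.
keys = [(1, ["name"]), (0, ["name"])]
r, n1, n2 = build_two_branch_tree(["state", "county"], keys)
print(n1 is not n2, agree_on_all(n1, n2, keys))   # True True, so T violates phi
```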



We next modify $T$ so that the modified tree satisfies $\Sigma$. For each $i \in [1, n]$, if the tree does not satisfy $\psi_i$, then there must be $x, y \in [\![Q_i]\!]$, $x'_j \in x[\![P_{ij}]\!]$ and $y'_j \in y[\![P_{ij}]\!]$ for all $j \in [1, k_i]$ such that $x'_j =_v y'_j$, but $x \neq y$ (recall that $\psi_i = (Q_i, \{P_{i1}, \ldots, P_{ik_i}\})$). By the construction of $T$, it is easy to see that $x, x'_j$ and $y, y'_j$ are in different branches. Without loss of generality, assume that $x, x'_j$ are in $T_1$ and $y, y'_j$ are in $T_2$. By the definition of the tree $T$, if $x'_j =_v y'_j$, then there must be $s \in [1, k]$ such that $x'_j = x_s$ and $y'_j = y_s$. Moreover, $k_i \leq k$. Without loss of generality, assume that $x'_j = x_j$ and $y'_j = y_j$ for $j \in [1, k_i]$. To ensure the satisfaction of $\psi_i$, we merge $x$ and $y$ into the same node, and also merge their ancestors so that they are pairwise identical. There are two cases to consider.

(1) $x$ and $y$ are above $n_1$ and $n_2$, i.e., they are not in the subtrees rooted at $n_1, n_2$ (see Figure 3 (a)). By the definition of abstract trees, there must be TL expressions $Q', Q''$, a number $l < m$ and a path $\beta'_l$ such that $Q = Q'.Q''$, $Q'.\beta'_l \subseteq Q_i$ and $\beta'_l$ is a prefix of $\beta_l$. That is, the concatenation of a prefix of $Q$ and a prefix of $\beta_l$ is contained in $Q_i$. Note that $\beta'_l$ may be $\varepsilon$; in that case, a prefix of $Q$ is contained in $Q_i$. We merge $x$ and $y$ into the same node and, moreover, for all the corresponding nodes $x', y'$ on the two paths from the root $r$ to $x, y$ in the two branches, we let $x' = y'$. In particular, if $x' = u_s$ and $y' = v_s$ for some $s < l$, then we discard the $\beta_s$ path in the $T_2$ branch (see Figure 3 (b)). Let $T'$ be the tree obtained after this process. It can be verified that $T' \models \psi_i$. In addition, for any $\psi \in \Sigma$, if $\psi$ is satisfied by the tree before the processing of $\psi_i$, then $T'$ also satisfies $\psi$. Moreover, $T' \not\models \varphi$, since $n_1$ and $n_2$ remain distinct in $T'$ but they agree on all the key paths in $\varphi$.

(2) $x$ and $y$ are below $n_1$ and $n_2$, i.e., $x$ and $y$ are in the subtrees of $n_1$ and $n_2$ (see Figure 4). In this case, we show $\Sigma \vdash_{\mathcal{I}} \varphi$. Indeed, if $x$ and $y$ are below $n_1$ and $n_2$, then there must be $l \geq m$ and paths $\beta'_l, \beta''_l$ such that $Q.\beta'_l \subseteq Q_i$ and $P_l = \beta'_l.\beta''_l$. In addition, there is a sequence $\varrho$ of ucp symbols such that $|\varrho| = |\beta'_l|$ and, for each $j \in [1, k_i]$ with $j \neq l$, we have $\varrho.P_j = P_{ij}$. When $j = l$, we have either $\varrho.P_l = P_{il}$ or $\beta''_l = P_{il}$. That is, $\psi_i$ is of the form

$$(Q_i, \{\varrho.P_1, \ldots, P_{il}, \ldots, \varrho.P_{k_i}\}).$$

By $\psi_i$ and the containment rule in $\mathcal{I}$, we have

$$\Sigma \vdash_{\mathcal{I}} (Q.\beta'_l, \{\varrho.P_1, \ldots, P_{il}, \ldots, \varrho.P_{k_i}\}).$$

By this and the subnode rule,

$$\Sigma \vdash_{\mathcal{I}} (Q, \{P_1, \ldots, P_l, \ldots, P_{k_i}\}).$$

Finally, by the superkey rule, we have

$$\Sigma \vdash_{\mathcal{I}} (Q, \{P_1, \ldots, P_l, \ldots, P_{k_i}, \ldots, P_k\}).$$

Observe that a special case is when $\beta'_l = \varepsilon$, i.e., $Q$ is contained in $Q_i$. In that case the containment and superkey rules suffice. Thus $\Sigma \vdash_{\mathcal{I}} \varphi$. This contradicts our assumption that $\Sigma \nvdash_{\mathcal{I}} \varphi$. Therefore, case (2) cannot happen if $\Sigma \nvdash_{\mathcal{I}} \varphi$.

We repeat the procedure until all keys in $\Sigma$ have been processed. Let $T'$ denote the abstract tree obtained at the end of the process. We show that $T' \models \Sigma$ and $T' \not\models \varphi$. By the assumption that $\Sigma \nvdash_{\mathcal{I}} \varphi$, for any $\psi \in \Sigma$, either $T \models \psi$, or $T \not\models \psi$ and $\psi$ has the properties described in case (1). It follows from the discussion above that the processing of case (1) ensures that $T' \models \Sigma$ and $T' \not\models \varphi$. Now we have an abstract tree $T'$ such that $T' \models \Sigma$ and $T' \not\models \varphi$. By the claim shown earlier, there must exist a finite XML tree $G$ such that $G \models \Sigma$ and $G \not\models \varphi$. Therefore, $\Sigma \not\models \varphi$. This completes the proof of Theorem 3.3. $\Box$

[Figure 3: The abstract trees in case 1, panels (a) and (b)]

[Figure 4: The abstract tree in case 2]

Proof of Theorem 3.4: Using the inference rules of $\mathcal{I}$, we develop a cubic time algorithm for determining finite implication of $\mathcal{K}$ constraints (Figure 5). The correctness of the algorithm follows from the proof of Theorem 3.3. We next analyze its complexity. For each $\psi \in \Sigma \cup \{\varphi\}$ in the input, let $|\psi|$ denote the size of $\psi$, and let $|\Sigma|$ denote the size of $\Sigma$. Observe that each $\psi \in \Sigma$ is processed at most once, and the condition test in the body of the loop takes at most $O(|\psi|^2 |\varphi|)$ time by Theorem 3.1. Hence the loop takes $O(|\Sigma|^2 |\varphi|)$ time. Thus the algorithm is in cubic time. $\Box$

Algorithm
Input: a finite set $\Sigma \cup \{\varphi\}$ of $\mathcal{K}$ constraints, where $\varphi = (Q, \{P_1, \ldots, P_k\})$
Output: `yes' iff $\Sigma \models \varphi$, and `no' otherwise

1. if $Q = \varepsilon$ then output `yes' and terminate;
2. for each $\psi \in \Sigma$ do
     if $\psi = (Q', \{P'_1, \ldots, P'_m\})$ and all of the following conditions are satisfied:
       (i) there is a path $\beta$ such that $Q.\beta \subseteq Q'$, and
       (ii) there exists a sequence $\varrho$ of ucp symbols such that $|\varrho| = |\beta|$, and
       (iii) there exist a path $\beta'$ and a number $l \in [1, k]$ such that $P_l = \beta.\beta'$, and
       (iv) there exists $s \in [1, m]$ such that either $P'_s = \beta'$ or $P'_s = \varrho.\beta.\beta'$, and
       (v) for all $i \in [1, m]$ with $i \neq s$, there is $j \in [1, k]$ such that $P'_i = \varrho.P_j$
     then output `yes' and terminate;
3. output `no';

Figure 5: An algorithm for testing $\mathcal{K}$ constraint implication
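A direct Python transcription of the algorithm in Figure 5 might look as follows. It is a sketch under stated assumptions: the tuple encoding of paths and keys is hypothetical, each candidate path $\beta$ is enumerated as a label-only prefix of some key path $P_l$ (keeping $\beta'$ nonempty), and the containment test for condition (i) is an exact but potentially exponential search over pairs of position sets rather than the quadratic method behind Theorem 3.1, so this version does not attain the cubic bound.

```python
from collections import deque
from typing import FrozenSet, List, Sequence, Set, Tuple

# Hypothetical encoding: a path is a tuple of tokens (a label, "_", "_*" or "ucp");
# a key (Q, {P1, ..., Pk}) is a pair (Q, list of key paths).
Path = Tuple[str, ...]
Key = Tuple[Path, List[Path]]


def _closure(path: Path, pos: Set[int]) -> FrozenSet[int]:
    """Let "_*" tokens match the empty word."""
    out, stack = set(pos), list(pos)
    while stack:
        i = stack.pop()
        if i < len(path) and path[i] == "_*" and i + 1 not in out:
            out.add(i + 1)
            stack.append(i + 1)
    return frozenset(out)


def _step(path: Path, pos: FrozenSet[int], a: str) -> FrozenSet[int]:
    """Read label a: "_*" loops in place, "_" or a matching label advances."""
    nxt = {i if path[i] == "_*" else i + 1
           for i in pos if i < len(path) and path[i] in ("_*", "_", a)}
    return _closure(path, nxt)


def contained(q1: Path, q2: Path) -> bool:
    """Exact L(q1) ⊆ L(q2) for downward paths, via pairs of position sets."""
    labels = {t for t in q1 + q2 if t not in ("_", "_*")}
    fresh = "#"
    while fresh in labels:                            # stands for all other labels
        fresh += "#"
    start = (_closure(q1, {0}), _closure(q2, {0}))
    seen, queue = {start}, deque([start])
    while queue:
        s1, s2 = queue.popleft()
        if len(q1) in s1 and len(q2) not in s2:
            return False
        for a in labels | {fresh}:
            pair = (_step(q1, s1, a), _step(q2, s2, a))
            if pair[0] and pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


def implies(sigma: Sequence[Key], phi: Key) -> bool:
    """Sketch of the Figure 5 test: True means `yes', False means `no'."""
    q, ps = phi
    if q == ():
        return True                                   # step 1 (empty-path rule)
    for q2, ps2 in sigma:                             # step 2
        for pl in ps:
            for t in range(len(pl)):                  # split P_l = beta.beta', beta' nonempty
                beta, rest = pl[:t], pl[t:]
                if "ucp" in beta:
                    break                             # beta must be a path of labels
                if not contained(q + beta, q2):
                    continue                          # condition (i)
                rho = ("ucp",) * t                    # condition (ii): |rho| = |beta|
                lifted = {rho + p for p in ps}        # every rho.P_j
                for s, p_s in enumerate(ps2):
                    if p_s not in (rest, rho + pl):
                        continue                      # condition (iv)
                    if all(p in lifted for i, p in enumerate(ps2) if i != s):
                        return True                   # condition (v)
    return False                                      # step 3


sigma = [(("_*",), [("ID",)])]
print(implies(sigma, (("_*", "person"), [("ID",), ("name",)])))                   # True
county_key = (("_*", "state", "county"), [("ucp", "name"), ("name",)])
print(implies([county_key], (("_*", "state"), [("name",), ("county", "name")])))  # True
print(implies([county_key], (("_*", "state"), [("name",)])))                      # False
```

The first call is the running example ($\beta = \varepsilon$, so only containment and superkey are involved). The second exercises the subnode rule: a county key relative to its state, together with $\beta = \mathit{county}$, yields a key for states on their name and a county name.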