Finitely Representable Nested Relations

E. Bertino, B. Catania
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy. Email: {bertino,catania}@dsi.unimi.it

Limsoon Wong
BioInformatics Centre & Real World Computing Partnership Novel Function Institute of Systems Science Laboratory Singapore 119597. Email:
[email protected].
Keywords: Databases, nested relational calculus, constraints.
1 Introduction

The need for sophisticated functionality has led to the evolution of database theory, requiring the definition of appropriate data models. In this respect, at least two important research directions have been devised: the first is the definition of complex object models [1, 5]; the second is the use of constraint models, which use mathematical constraints to finitely represent infinite information [8]. Several approaches have been proposed to model complex data by using finitely representable relations. By finitely representable nested relations we mean relations that are nested and whose sets can be either finite, as in the traditional nested relational model, or infinite but finitely representable, as in the constraint relational model. Most of the proposed languages model sets only up to a given height of nesting [12]. Others do not have this restriction but are defined only for specific theories. For example, the semantics assigned to C-CALC [7] makes sense only when using the dense-order theory. For others, such as LyriC [3], the definition of a formal basis, supporting the definition and analysis of relevant language properties, has been left to future work.

The aim of this paper is the definition of a model and a query language for finitely representable nested relations, overcoming some limitations of the previous proposals. Our language, called frNRC, is obtained by extending NRC [14] to deal with possibly infinite relations that are finitely representable using the real polynomial constraint theory. NRC is similar to the well-known comprehension mechanism in functional programming, and its formulation is based on structural recursion [5] and on monads [13]. NRC has been proved equivalent to most nested relational languages presented before. The choice of this language is motivated by the fact that the formal semantics assigned to NRC, and the structural recursion on which it is based, allow us to prove several results about frNRC in a simple way.
Moreover, even though frNRC has been defined for the theory of real polynomial constraints, other theories can easily be modeled in the same framework. One of the main results about this language is that frNRC, like NRC, has the conservative extension property. This means that, when input and output are restricted to a specific degree of nesting, any higher degree of nesting generated by the computation is useless [14]. Thus, when input and output relations are flat constraint relations, frNRC expressions can be mapped into first-order logic extended with polynomial constraints. Note that, even though this result is a consequence of properties of NRC and of constraint query languages, nobody has proved it before. Moreover, by giving a constructive proof, we show that frNRC is effectively computable. In particular, it is equivalent to the constraint relational calculus extended with real polynomial constraints, modulo encoding/decoding of input/output. The same proof shows that the language has NC data complexity. The proposed proof, which can be applied to other languages as well (for example, to LyriC [3]), clearly shows the main issues arising in the compilation of nested constraint query languages into flat constraint query languages.
2 Finitely Representable Nested Relational Calculus

Types. In the traditional constraint query setting, a relation can be an infinite set of tuples taking values from a given domain, as long as the set is finitely representable by a finite number of constraints expressed in a decidable logical theory [6, 8, 10]. We extend this paradigm to sets that can be nested to an arbitrary depth. For this purpose, we choose the theory of real polynomial inequality constraints. In particular, we allow such infinite sets of tuples of reals to appear at any depth in a nested relation (see footnote 1). However, we do not allow a nested set to have an infinite number of such infinite sets as its elements, in order to guarantee effective computability and low data complexity. To be precise, the types that we want to consider are:

$$s ::= \mathbb{R} \mid s_1 \times \cdots \times s_n \mid \{s\} \mid \{_{fr}\, \mathbb{R} \times \cdots \times \mathbb{R}\}$$

The type $\mathbb{R}$ contains all the real numbers. The type $s_1 \times \cdots \times s_n$ contains $n$-ary tuples whose components have types $s_1$, ..., $s_n$ respectively. The type $\{s\}$ contains sets of finite cardinality whose elements are objects of type $s$. The type $\{_{fr}\, s\}$ contains sets of (possibly) infinite cardinality whose elements are objects of type $s$, where $s$ is a type of the form $\mathbb{R} \times \cdots \times \mathbb{R}$. We also require each set in $\{_{fr}\, s\}$ to be finitely representable in the sense of [6, 8, 10]. For convenience, we also introduce a `type' $\mathbb{B}$ to stand for Booleans. However, for economy, we use the number 0 to stand for false and the number 1 to stand for true.
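The type grammar above can be sketched as a small data structure. The following is only an illustrative sketch: the class names `Real`, `Prod`, `FinSet`, `FrSet` and the function `well_formed` are hypothetical names introduced here, not part of the paper. The check encodes the restriction that a finitely representable set may only contain tuples of reals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Real:
    """The base type R of real numbers."""


@dataclass(frozen=True)
class Prod:
    """The product type s1 x ... x sn."""
    parts: tuple


@dataclass(frozen=True)
class FinSet:
    """The finite set type {s}."""
    elem: object


@dataclass(frozen=True)
class FrSet:
    """The finitely representable set type {fr s}."""
    elem: object


def is_flat_real(t):
    """True if t has the form R x ... x R (a single R counts as arity 1)."""
    if isinstance(t, Real):
        return True
    return isinstance(t, Prod) and all(isinstance(p, Real) for p in t.parts)


def well_formed(t):
    """Check a type against the grammar: fr sets range over real tuples only."""
    if isinstance(t, Real):
        return True
    if isinstance(t, Prod):
        return all(well_formed(p) for p in t.parts)
    if isinstance(t, FinSet):
        return well_formed(t.elem)
    if isinstance(t, FrSet):
        return is_flat_real(t.elem)
    return False


# A finite set of pairs (real, infinite set of reals) is allowed ...
ok_type = FinSet(Prod((Real(), FrSet(Real()))))
# ... but an fr set whose elements are themselves sets is not.
bad_type = FrSet(FinSet(Real()))
```

Under this reading, a finite set may contain infinite sets, but an infinite set never contains other sets, which is what keeps the number of infinite sets inside any object finite.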
Expressions. To express queries over our finitely representable nested relations, we extend the nested relational calculus NRC defined in [5, 14]. We call the extended calculus frNRC, standing for finitely representable NRC. The syntax and typing rules of frNRC are presented in Figure 1. We often omit the type superscripts as they can be inferred. An expression $e$ having free variables $\vec{x}$ is interpreted as a function $f(\vec{x}) = e$, which given input $\vec{O}$ produces $e[\vec{O}/\vec{x}]$ as its output. An expression $e$ with no free variable can be regarded as a constant function $f(\vec{x}) = e$ that returns $e$ on all input $\vec{x}$. In the following, we present the language incrementally.

NRC rules

$$\frac{}{x^s : s} \qquad \frac{}{c : \mathbb{R}} \qquad \frac{e_1 : s_1 \;\cdots\; e_n : s_n}{(e_1, \ldots, e_n) : s_1 \times \cdots \times s_n} \qquad \frac{e : s_1 \times \cdots \times s_n}{\pi_i\, e : s_i}$$

$$\frac{}{\{\}^s : \{s\}} \qquad \frac{e : s}{\{e\} : \{s\}} \qquad \frac{e_1 : \{s\} \quad e_2 : \{s\}}{e_1 \cup e_2 : \{s\}} \qquad \frac{e_1 : \{t\} \quad e_2 : \{s\}}{\bigcup\{e_1 \mid x^s \in e_2\} : \{t\}}$$

$$\frac{e_1 : \mathbb{R} \quad e_2 : \mathbb{R}}{e_1 = e_2 : \mathbb{B}} \qquad \frac{e_1 : \mathbb{B} \quad e_2 : s \quad e_3 : s}{\mathit{if}\ e_1\ \mathit{then}\ e_2\ \mathit{else}\ e_3 : s} \qquad \frac{e : \{\mathbb{R}\}}{\mathit{empty}\ e : \mathbb{B}}$$

Rules for finitely representable sets and constraints

$$\frac{}{\{_{fr}\}^s : \{_{fr}\, s\}} \qquad \frac{e : s}{\{_{fr}\, e\} : \{_{fr}\, s\}} \qquad \frac{e_1 : \{_{fr}\, s\} \quad e_2 : \{_{fr}\, s\}}{e_1 \cup_{fr} e_2 : \{_{fr}\, s\}} \qquad \frac{e_1 : \{_{fr}\, s_1\} \quad e_2 : \{_{fr}\, s_2\}}{\bigcup\{_{fr}\, e_1 \mid x^{s_2} \in_{fr} e_2\} : \{_{fr}\, s_1\}}$$

$$\frac{e_1 : \mathbb{R} \quad e_2 : \mathbb{R}}{e_1 + e_2 : \mathbb{R}} \qquad \frac{e_1 : \mathbb{R} \quad e_2 : \mathbb{R}}{e_1 - e_2 : \mathbb{R}} \qquad \frac{e_1 : \mathbb{R} \quad e_2 : \mathbb{R}}{e_1 \cdot e_2 : \mathbb{R}} \qquad \frac{e_1 : \mathbb{R} \quad e_2 : \mathbb{R}}{e_1 \div e_2 : \mathbb{R}}$$

$$\frac{e : \{_{fr}\, \mathbb{R}\}}{\mathit{empty}_{fr}\ e : \mathbb{B}} \qquad \frac{}{\mathbb{R} : \{_{fr}\, \mathbb{R}\}}$$

Rules for integrating sets and finitely representable sets

$$\frac{e_1 : \{_{fr}\, s_1\} \quad e_2 : \{s_2\}}{\bigcup\{_{fr}\, e_1 \mid x^{s_2} \in e_2\} : \{_{fr}\, s_1\}}$$

Figure 1: frNRC syntax and typing rules

NRC. NRC is equivalent to the usual nested relational algebra [1, 5]. The semantics of the NRC rules is as follows. Variables $x^s$ are available for each type $s$. Every real number $c$ is available. The operations for tuples are standard. Namely, $(e_1, \ldots, e_n)$ forms an $n$-tuple whose $i$-th component is $e_i$, and $\pi_i\, e$ returns the $i$-th component of the $n$-tuple $e$. $\{\}$ forms the empty set. $\{e\}$ forms the singleton set containing $e$. $e_1 \cup e_2$ unions the two sets $e_1$ and $e_2$. $\bigcup\{e_1 \mid x \in e_2\}$ maps the function $f(x) = e_1$ over all elements in $e_2$ and then returns their union; thus if $e_2$ is the set $\{o_1, \ldots, o_n\}$, the result of this operation is $f(o_1) \cup \cdots \cup f(o_n)$. For example, $\bigcup\{\{(x, x)\} \mid x \in \{1, 2\}\}$ evaluates to $\{(1, 1), (2, 2)\}$. The operations for Booleans are also quite typical, with the understanding that true is represented by 1 and false is represented by 0. $e_1 = e_2$ returns true if $e_1$ and $e_2$ have the same value and returns false otherwise. $\mathit{empty}\ e$ returns true if $e$ is an empty set and returns false otherwise. Finally, $\mathit{if}\ e_1\ \mathit{then}\ e_2\ \mathit{else}\ e_3$ evaluates to $e_2$ if $e_1$ is true and evaluates to $e_3$ if $e_1$ is false; it is undefined otherwise.

Finitely representable relations and constraints. We add constructs analogous to the finite set constructs of NRC to manipulate finitely representable sets, and constructs for arithmetic to express real polynomial constraints (see footnote 2). The semantics of the first four rules is analogous to that of the rules for finite sets, except that each operation returns a finitely representable set rather than a finite set. The four arithmetic operations have the usual interpretation. $\mathit{empty}_{fr}\ e$ tests if the finitely representable set $e$ of reals is empty. Finally, the symbol $\mathbb{R}$ denotes the infinite (but finitely representable) set of all real numbers. It is the presence of this symbol $\mathbb{R}$ that allows us to express unbounded quantification. For example, given a polynomial $f(x)$, we can express its set of roots easily: $\bigcup\{_{fr}\ \mathit{if}\ f(x) = 0\ \mathit{then}\ \{_{fr}\, x\}\ \mathit{else}\ \{_{fr}\} \mid x \in_{fr} \mathbb{R}\}$. Similarly, we can express the usual linear order on the reals, because the formula $\exists z.(z \neq 0) \wedge (y - x = z^2)$, which holds iff $x < y$, is expressible as $\mathit{not}(\mathit{empty}_{fr}(\bigcup\{_{fr}\ \mathit{if}\ \mathit{not}(z = 0)\ \mathit{then}\ (\mathit{if}\ y - x = z \cdot z\ \mathit{then}\ \{_{fr}\, z\}\ \mathit{else}\ \{_{fr}\})\ \mathit{else}\ \{_{fr}\} \mid z \in_{fr} \mathbb{R}\}))$, with $\mathit{not}$ implemented in the obvious way.

Integrating sets and finitely representable sets. The constructs described above let us manipulate finite sets and finitely representable sets independently. In order for these two kinds of sets to interact, we need one more construct; see Figure 1. This construct lets us convert a finite set of real tuples into a finitely representable one. The semantics of $\bigcup\{_{fr}\, e_1 \mid x \in e_2\}$ is to apply the function $f(x) = e_1$ to each element of $e_2$ and then return their union as a finitely representable set. That is, if $e_2$ is the set $\{o_1, \ldots, o_n\}$, then it produces the finitely representable set $f(o_1) \cup_{fr} \cdots \cup_{fr} f(o_n)$. For example, the conversion of a finite set $e$ of real tuples to a finitely representable one can be expressed as $\bigcup\{_{fr}\, \{_{fr}\, x\} \mid x \in e\}$.

Footnote 1: Although this paper considers real numbers, other domains can easily be considered.

Footnote 2: Different sets of rules can be inserted to represent different logical theories (for example, dense-order).

Before we study the properties of frNRC, let us briefly introduce a nice shorthand, based on the comprehension notation [4, 13], for writing frNRC queries. Recall from [4, 5, 14] that the comprehension $\{e \mid A_1, \ldots, A_n\}$, where each $A_i$ either has the form $x_i \in e_i$ or is an expression $e_i$ of type $\mathbb{B}$, has a direct correspondent in NRC that is given by recursively applying the following equations:

$$\{e \mid x_i \in e_i, \ldots\} = \bigcup\{\{e \mid \ldots\} \mid x_i \in e_i\}$$
$$\{e \mid e_i, \ldots\} = \mathit{if}\ e_i\ \mathit{then}\ \{e \mid \ldots\}\ \mathit{else}\ \{\}$$
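The set constructs just described can be illustrated with a minimal Python sketch. Finite sets are modelled as `frozenset` values. Finitely representable sets are modelled here, purely as an assumption of this sketch, by membership predicates; this is not the constraint representation used in the paper, and it cannot support emptiness tests or iteration over infinite sets. The names `ext`, `fr_singleton` and `fr_ext_over_finite` are hypothetical.

```python
def ext(f, s):
    """The NRC construct: apply f to each element of the finite set s
    and return the union of the resulting finite sets."""
    result = frozenset()
    for x in s:
        result |= f(x)
    return result


def fr_singleton(a):
    """A singleton fr set, modelled (assumption) as a membership predicate."""
    return lambda v: v == a


def fr_ext_over_finite(f, s):
    """The integration construct over a finite set s: v belongs to the
    result iff v belongs to f(x) for some x in s."""
    elems = list(s)
    return lambda v: any(f(x)(v) for x in elems)


# The example from the text: the union of {(x, x)} for x in {1, 2}.
pairs = ext(lambda x: frozenset({(x, x)}), frozenset({1, 2}))

# Converting a finite set of reals into a (predicate-modelled) fr set.
as_fr = fr_ext_over_finite(fr_singleton, frozenset({1.0, 2.5}))
```

The predicate model is enough to show how the integration construct turns a finite set into a finitely representable one; a faithful implementation would manipulate polynomial constraints instead.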
The comprehension notation is very user-friendly. For example, it allows us to write $\{(x, y) \mid x \in e_1, y \in e_2\}$ for the Cartesian product of $e_1$ and $e_2$ instead of the clumsier $\bigcup\{\bigcup\{\{(x, y)\} \mid y \in e_2\} \mid x \in e_1\}$. The comprehension notation can be extended naturally to all frNRC expressions. We can interpret the comprehension $\{_{fr}\, e \mid A_1, \ldots, A_n\}$, where each $A_i$ either has the form $x_i \in e_i$, or has the form $x_i \in_{fr} e_i$, or is an expression $e_i$ of type $\mathbb{B}$, as an expression of frNRC by recursively applying the following equations:

$$\{_{fr}\, e \mid x_i \in e_i, \ldots\} = \bigcup\{_{fr}\, \{_{fr}\, e \mid \ldots\} \mid x_i \in e_i\}$$
$$\{_{fr}\, e \mid x_i \in_{fr} e_i, \ldots\} = \bigcup\{_{fr}\, \{_{fr}\, e \mid \ldots\} \mid x_i \in_{fr} e_i\}$$
$$\{_{fr}\, e \mid e_i, \ldots\} = \mathit{if}\ e_i\ \mathit{then}\ \{_{fr}\, e \mid \ldots\}\ \mathit{else}\ \{_{fr}\}$$
For example, the query to find the roots of $f(x)$ becomes $\{_{fr}\, x \mid x \in_{fr} \mathbb{R}, f(x) = 0\}$. Similarly, the query to test if $x < y$ becomes $\mathit{not}(\mathit{empty}_{fr}(\{_{fr}\, z \mid z \in_{fr} \mathbb{R}, \mathit{not}(z = 0), y - x = z \cdot z\}))$. In addition to comprehension, we also find it convenient to use a little bit of pattern matching, which can be removed in a straightforward manner. For example, we write $\{(x, z) \mid (x, y) \in e_1, (y', z) \in e_2, y = y'\}$ for relational composition instead of the more official $\{(\pi_1\, xy, \pi_2\, yz) \mid xy \in e_1, yz \in e_2, \pi_2\, xy = \pi_1\, yz\}$. We should also remark that while frNRC provides only an equality test on $\mathbb{R}$ and emptiness tests on $\{\mathbb{R}\}$ and $\{_{fr}\, \mathbb{R}\}$, these operations can be lifted to every type $s$ using frNRC as the ambient language; see [14]. Similarly, commonly used operations such as set membership, subset tests, set difference, and set intersection are expressible at all types in frNRC.
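The desugaring of comprehensions over finite sets can be checked on small inputs. The sketch below is restricted to finite sets, represented as Python `frozenset` values; the helper names `ext`, `cartesian` and `compose` are hypothetical. Each function is written in the desugared form, with nested unions and a conditional, and can be compared against the corresponding Python set comprehension.

```python
def ext(f, s):
    """Apply f to each element of the finite set s and union the results."""
    result = frozenset()
    for x in s:
        result |= f(x)
    return result


def cartesian(e1, e2):
    """Desugared form of {(x, y) | x in e1, y in e2}."""
    return ext(lambda x: ext(lambda y: frozenset({(x, y)}), e2), e1)


def compose(e1, e2):
    """Desugared form of relational composition: a generator becomes a
    union, and a Boolean condition becomes 'if ... then ... else {}'."""
    return ext(
        lambda xy: ext(
            lambda yz: frozenset({(xy[0], yz[1])}) if xy[1] == yz[0] else frozenset(),
            e2,
        ),
        e1,
    )


r1 = frozenset({(1, 2), (2, 3)})
r2 = frozenset({(2, 5), (3, 7)})
product_result = cartesian(frozenset({1, 2}), frozenset({3}))
composition_result = compose(r1, r2)
```

Here `xy[0]` and `yz[1]` play the role of the projections that pattern matching hides in the surface syntax.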
3 Conservative Extension Property

Given a type $s$, the height of $s$ is defined as the depth of nesting of set brackets $\{\}$ and $\{_{fr}\}$ in $s$. Given an expression $e$ of frNRC, the height of $e$ is defined as the maximum height of all the types that appear in the typing derivation of $e$. For example, $\{(x, y) \mid x \in e_1, y \in e_2\}$ has height 1 if both $e_1$ and $e_2$ have height 1. On the other hand, $\{(x, \{_{fr}\, z \mid z \in_{fr} \mathbb{R}, z < x\}) \mid x \in e\}$ has height 2 if $e$ has height 1.

Definition 3.1 A language L is said to have the conservative extension property if every function $f : s_1 \to s_2$ that is expressible in L can be expressed using an expression of height no more than the maximum of the heights of $s_1$ and $s_2$. □

We now prove that frNRC has the conservative extension property, just like NRC [14]. As in [14], a set of strongly normalizing rewrite rules that reduces set height is given. Then we show that the induced normal forms have height no more than that of their free variables (i.e., their input variables).
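The height of a type, as defined at the start of this section, can be computed by a short recursion. In this sketch types are encoded as tagged tuples, which is an assumption made here for brevity: `("R",)` for the reals, `("prod", t1, ..., tn)` for products, `("set", t)` for finite sets and `("frset", t)` for finitely representable sets. The function covers types only; the height of an expression additionally requires its typing derivation.

```python
def height(t):
    """Depth of nesting of set brackets (finite or fr) in the type t."""
    kind = t[0]
    if kind == "R":
        return 0
    if kind == "prod":
        return max((height(p) for p in t[1:]), default=0)
    if kind in ("set", "frset"):
        return 1 + height(t[1])
    raise ValueError(f"unknown type constructor: {kind}")


def height_bound(s1, s2):
    """The bound of Definition 3.1 for a function from s1 to s2."""
    return max(height(s1), height(s2))


R = ("R",)
flat_pairs = ("set", ("prod", R, R))             # type of {(x, y) | ...}
with_fr_part = ("set", ("prod", R, ("frset", R)))  # type of {(x, {fr z | ...}) | ...}
```

The two sample types correspond to the two example expressions in the text, of height 1 and 2 respectively.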
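Table 1: Rewrite rules

Rules for tuples, conditionals, unions, and emptiness tests:

$\pi_i\,(e_1, \ldots, e_n) \leadsto e_i$
$\mathit{if}\ \mathit{true}\ \mathit{then}\ e_1\ \mathit{else}\ e_2 \leadsto e_1$
$\mathit{if}\ \mathit{false}\ \mathit{then}\ e_1\ \mathit{else}\ e_2 \leadsto e_2$
$\{\} \cup e \leadsto e$
$e \cup \{\} \leadsto e$
$\mathit{empty}\,(e_1 \cup \cdots \cup e_n) \leadsto \mathit{false}$, if some $e_i$ has the form $\{e\}$
$\mathit{empty}\,(e_1 \cup \cdots \cup e_n) \leadsto \mathit{true}$, if every $e_i$ has the form $\{\}$
$\mathit{empty}_{fr}\,(e_1 \cup_{fr} \cdots \cup_{fr} e_n) \leadsto \mathit{false}$, if some $e_i$ has the form $\{_{fr}\, e\}$
$\mathit{empty}_{fr}\,(e_1 \cup_{fr} \cdots \cup_{fr} e_n) \leadsto \mathit{true}$, if every $e_i$ has the form $\{_{fr}\}$

Rules for finite sets:

$\bigcup\{e \mid x \in \{\}\} \leadsto \{\}$
$\bigcup\{e_1 \mid x \in \{e_2\}\} \leadsto e_1[e_2/x]$
$\bigcup\{e_1 \mid x \in e_2 \cup e_3\} \leadsto \bigcup\{e_1 \mid x \in e_2\} \cup \bigcup\{e_1 \mid x \in e_3\}$
$\bigcup\{e_1 \mid x \in \bigcup\{e_2 \mid y \in e_3\}\} \leadsto \bigcup\{\bigcup\{e_1 \mid x \in e_2\} \mid y \in e_3\}$
$\bigcup\{e_1 \mid x \in \mathit{if}\ e_2\ \mathit{then}\ e_3\ \mathit{else}\ e_4\} \leadsto \mathit{if}\ e_2\ \mathit{then}\ \bigcup\{e_1 \mid x \in e_3\}\ \mathit{else}\ \bigcup\{e_1 \mid x \in e_4\}$

Rules for finitely representable sets:

$\bigcup\{_{fr}\, e \mid x \in_{fr} \{_{fr}\}\} \leadsto \{_{fr}\}$
$\bigcup\{_{fr}\, e_1 \mid x \in_{fr} \{_{fr}\, e_2\}\} \leadsto e_1[e_2/x]$
$\bigcup\{_{fr}\, e_1 \mid x \in_{fr} e_2 \cup_{fr} e_3\} \leadsto \bigcup\{_{fr}\, e_1 \mid x \in_{fr} e_2\} \cup_{fr} \bigcup\{_{fr}\, e_1 \mid x \in_{fr} e_3\}$
$\bigcup\{_{fr}\, e_1 \mid x \in_{fr} \bigcup\{_{fr}\, e_2 \mid y \in_{fr} e_3\}\} \leadsto \bigcup\{_{fr}\, \bigcup\{_{fr}\, e_1 \mid x \in_{fr} e_2\} \mid y \in_{fr} e_3\}$
$\bigcup\{_{fr}\, e_1 \mid x \in_{fr} \mathit{if}\ e_2\ \mathit{then}\ e_3\ \mathit{else}\ e_4\} \leadsto \mathit{if}\ e_2\ \mathit{then}\ \bigcup\{_{fr}\, e_1 \mid x \in_{fr} e_3\}\ \mathit{else}\ \bigcup\{_{fr}\, e_1 \mid x \in_{fr} e_4\}$

Rules for the integration construct:

$\bigcup\{_{fr}\, e \mid x \in \{\}\} \leadsto \{_{fr}\}$
$\bigcup\{_{fr}\, e_1 \mid x \in \{e_2\}\} \leadsto e_1[e_2/x]$
$\bigcup\{_{fr}\, e_1 \mid x \in e_2 \cup e_3\} \leadsto \bigcup\{_{fr}\, e_1 \mid x \in e_2\} \cup_{fr} \bigcup\{_{fr}\, e_1 \mid x \in e_3\}$
$\bigcup\{_{fr}\, e_1 \mid x \in \bigcup\{e_2 \mid y \in e_3\}\} \leadsto \bigcup\{_{fr}\, \bigcup\{_{fr}\, e_1 \mid x \in e_2\} \mid y \in e_3\}$
$\bigcup\{_{fr}\, e_1 \mid x \in_{fr} \bigcup\{_{fr}\, e_2 \mid y \in e_3\}\} \leadsto \bigcup\{_{fr}\, \bigcup\{_{fr}\, e_1 \mid x \in_{fr} e_2\} \mid y \in e_3\}$
$\bigcup\{_{fr}\, e_1 \mid x \in \mathit{if}\ e_2\ \mathit{then}\ e_3\ \mathit{else}\ e_4\} \leadsto \mathit{if}\ e_2\ \mathit{then}\ \bigcup\{_{fr}\, e_1 \mid x \in e_3\}\ \mathit{else}\ \bigcup\{_{fr}\, e_1 \mid x \in e_4\}$

Table 1 shows the rewrite rules that we want to use. Those for NRC are taken from [14]. As usual, we assume that bound variables are renamed to avoid capture and that $e_1[e_2/x]$ denotes the expression obtained by replacing all free occurrences of $x$ in $e_1$ by $e_2$. It is readily verified that the proposed rewrite rules are sound. That is, expressions obtained from $e_1$ by rewriting are semantically equivalent to $e_1$. Furthermore, using a straightforward adaptation of the termination measure given in [14], we can prove the following result.

Proposition 3.2 If $e_1 \leadsto e_2$, then $e_1 = e_2$ (see footnote 3). Moreover, the rewrite system presented in Table 1 is guaranteed to stop no matter in what order these rules are applied (it is strongly normalizing). □

The following result follows by a simple induction on the structure of expressions.

Proposition 3.3 Let $e : s$ be an expression of frNRC having free variables $x_1 : s_1$, ..., $x_n : s_n$ such that $e$ is a normal form with respect to the above rewrite system. Then the height of $e$ is at most the maximum of the heights of $s$, $s_1$, ..., $s_n$. □

Combining Propositions 3.2 and 3.3, we conclude the following.

Theorem 3.4 frNRC has the conservative extension property. □

Paredaens and Van Gucht gave a translation for mapping nested relational algebra expressions having flat relations as input to an equivalent expression in first-order logic with bounded quantification [11]. This translation can be easily adapted to provide a translation for mapping frNRC expressions of height 1 to first-order logic with polynomial constraints. The next result follows from this and Theorem 3.4.

Corollary 3.5 If $f : s_1 \to s_2$ is a function expressible in frNRC and $s_1$ and $s_2$ have height 1, then $f$ is expressible in first-order logic with polynomial constraints. □

Footnote 3: The symbol $=$ denotes semantic equivalence.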
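A few of the finite-set rules of Table 1 can be exercised with a small normalizer. This is a sketch under stated assumptions, not the authors' procedure: expressions are encoded as tagged tuples (a hypothetical encoding), only the rules for projection, union with the empty set, and the big union over an empty set, a singleton, a union, or another big union are implemented, and bound variable names are assumed to be pairwise distinct so that substitution cannot capture variables.

```python
def subst(e, x, v):
    """Replace free occurrences of variable x in e by v (no capture check)."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag in ("const", "empty"):
        return e
    if tag == "sing":
        return ("sing", subst(e[1], x, v))
    if tag == "union":
        return ("union", subst(e[1], x, v), subst(e[2], x, v))
    if tag == "tuple":
        return ("tuple", tuple(subst(c, x, v) for c in e[1]))
    if tag == "proj":
        return ("proj", e[1], subst(e[2], x, v))
    if tag == "ext":  # ("ext", y, body, src) is the big union of body for y in src
        _, y, body, src = e
        new_body = body if y == x else subst(body, x, v)
        return ("ext", y, new_body, subst(src, x, v))
    raise ValueError(tag)


def rewrite(e):
    """Normalize e with respect to the implemented subset of Table 1."""
    tag = e[0]
    if tag in ("var", "const", "empty"):
        return e
    if tag == "sing":
        return ("sing", rewrite(e[1]))
    if tag == "tuple":
        return ("tuple", tuple(rewrite(c) for c in e[1]))
    if tag == "proj":
        inner = rewrite(e[2])
        if inner[0] == "tuple":          # projection of an explicit tuple
            return inner[1][e[1] - 1]
        return ("proj", e[1], inner)
    if tag == "union":
        a, b = rewrite(e[1]), rewrite(e[2])
        if a == ("empty",):
            return b
        if b == ("empty",):
            return a
        return ("union", a, b)
    if tag == "ext":
        _, x, body, src = e
        src = rewrite(src)
        if src == ("empty",):            # iteration over the empty set
            return ("empty",)
        if src[0] == "sing":             # iteration over a singleton
            return rewrite(subst(body, x, src[1]))
        if src[0] == "union":            # distribute over union
            return rewrite(("union", ("ext", x, body, src[1]),
                            ("ext", x, body, src[2])))
        if src[0] == "ext":              # un-nest the generator
            _, y, b2, s2 = src
            return rewrite(("ext", y, ("ext", x, body, b2), s2))
        return ("ext", x, rewrite(body), src)
    raise ValueError(tag)


x, y, p = ("var", "x"), ("var", "y"), ("var", "p")
one, two = ("const", 1), ("const", 2)

# Big union of {(x, x)} for x in {1} union {2}.
first = rewrite(("ext", "x", ("sing", ("tuple", (x, x))),
                 ("union", ("sing", one), ("sing", two))))

# Big union of {pi_1 p} for p in (big union of {(y, y)} for y in r).
second = rewrite(("ext", "p", ("sing", ("proj", 1, p)),
                  ("ext", "y", ("sing", ("tuple", (y, y))), ("var", "r"))))
```

In the second example the intermediate set of pairs disappears from the normal form, which is the effect the height argument relies on.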
Thus all functions $f : s_1 \to s_2$ in frNRC, with $s_1$ and $s_2$ of height 1, are effectively computable by compiling into constraint query languages such as those proposed in [6, 8, 10]. As a consequence, we can make use of well-known results [2, etc.] on constraint query languages to analyze the expressiveness of frNRC with respect to such functions. For example, an immediate consequence is that frNRC cannot express parity test, connectivity test, and transitive closure. We can also use the above "compilation procedure" to study the expressive power of frNRC on functions whose types have heights exceeding 1. We borrow an example from [9] for illustration. A set of sets $O = \{O_1, \ldots, O_n\} : \{\{\mathbb{R}\}\}$ is said to have a family of distinct representatives iff it is possible to pick an element $x_i$ from each $O_i$ such that $x_i \neq x_j$ whenever $i \neq j$. It is known from [9] that NRC cannot test if a set has distinct representatives. We show that this test cannot be expressed in frNRC either.

Corollary 3.6 frNRC cannot test if a set of sets has distinct representatives.

Proof. frNRC cannot express parity test. It follows that it cannot test if a chain has an even number of nodes. Let a set $X_m = \{(x_1, x_2), \ldots, (x_{m-1}, x_m)\}$ be given, where $m > 2$. Then we can construct in frNRC the set $S_m = \{\{x_1\}, \{x_m\}, \{x_1, x_3\}, \{x_2, x_4\}, \ldots, \{x_{m-2}, x_m\}\}$. According to [9], $S_m$ has distinct representatives iff $m$ is even. It follows that frNRC cannot test for distinct representatives. □
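The combinatorial fact used in the proof of Corollary 3.6 can be checked by brute force for small chains. The sketch below builds the family directly from the indices 1, ..., m rather than from the chain relation, and the names `chain_family` and `has_distinct_representatives` are hypothetical. The search is a plain backtracking search and is only intended for small inputs.

```python
def chain_family(m):
    """The family {x1}, {xm}, {x1, x3}, {x2, x4}, ..., {x(m-2), xm},
    with each node represented by its index."""
    family = [{1}, {m}]
    family += [{i, i + 2} for i in range(1, m - 1)]
    return family


def has_distinct_representatives(family):
    """Backtracking search for one representative per set, all distinct."""
    def search(k, used):
        if k == len(family):
            return True
        return any(search(k + 1, used | {x}) for x in family[k] if x not in used)

    return search(0, frozenset())


# Map each chain length m to whether its family has distinct representatives.
results = {m: has_distinct_representatives(chain_family(m)) for m in range(3, 10)}
```

For every chain length tried, the family has distinct representatives exactly when the length is even, in agreement with the statement cited from [9].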
4 Effective Computability and Complexity

Recall that expressions in frNRC can iterate over infinite sets. An important question that arises is whether every function expressible in frNRC is computable. In the previous section, we saw that if a function in frNRC has input and output of height 1, then it is computable. In this section, we lift this result to functions of all heights. Our strategy is as follows. We find a total computable function $p_s : s \to s'$ to encode nested finitely representable sets into flat finitely representable sets. We also find a partial computable decoding function $q_s : s' \to s$ so that $q_s \circ p_s = id$. Finally, we find a translation $(\cdot)'$ that maps $f : s_1 \to s_2$ in frNRC to $(f)' : s_1' \to s_2'$ in frNRC such that $q_{s_2} \circ (f)' \circ p_{s_1} = f$. Note that $(f)'$ has height 1 and is thus computable.

Before we define $p$ and $q$, let us first define $s'$, the type to which $s$ is encoded. Notice that $s'$ always has the form $\{_{fr}\, \mathbb{R} \times \cdots \times \mathbb{R}\}$.

$$\mathbb{R}' = \{_{fr}\, \mathbb{R}\}$$
$$(s_1 \times \cdots \times s_n)' = \{_{fr}\, t_1 \times \cdots \times t_n\}, \text{ where } s_i' = \{_{fr}\, t_i\}$$
$$\{_{fr}\, s\}' = \{_{fr}\, \mathbb{R} \times s\}$$
$$\{s\}' = \{_{fr}\, \mathbb{R} \times \mathbb{R} \times t\}, \text{ where } s' = \{_{fr}\, t\}$$

The encoding function $p_s : s \to s'$ is defined by induction on $s$. In what follows, $\vec{0}$ stands for a tuple of zeros $(0, \ldots, 0)$ having the appropriate arity. A finitely representable set is coded by tagging each element by 1 if the set is nonempty, and is coded by a tuple of zeros if it is empty. A finite set is coded by tagging each element by 1 and by a unique identifier if the set is nonempty, and is coded by a tuple of zeros if it is empty. More precisely,

$$p_{\mathbb{R}}(o) = \{_{fr}\, o\}$$
$$p_{s_1 \times \cdots \times s_n}((o_1, \ldots, o_n)) = \{_{fr}\, (x_1, \ldots, x_n) \mid x_1 \in_{fr} p_{s_1}(o_1), \ldots, x_n \in_{fr} p_{s_n}(o_n)\}$$
$$p_{\{_{fr} s\}}(O) = \{_{fr}\, (0, \vec{0})\}, \text{ if } O \text{ is empty. Otherwise, } p_{\{_{fr} s\}}(O) = \{_{fr}\, (1, x) \mid x \in_{fr} O\}$$
$$p_{\{s\}}(O) = \{_{fr}\, (0, 0, \vec{0})\}, \text{ if } O \text{ is empty. Otherwise, } p_{\{s\}}(O) = O_1 \cup_{fr} \cdots \cup_{fr} O_n, \text{ if } O = \{o_1, \ldots, o_n\} \text{ and } O_i = \{_{fr}\, (1, i, x) \mid x \in_{fr} p_s(o_i)\}$$

Note that we allow the $i$'s above to be any numbers, so long as they are distinct positive integers.
We use $\bigcup\{e_1 \mid x \in_{fr} e_2\}$ to stand for the application of $f(x) = e_1$ to each element of $e_2$, provided the finitely representable set $e_2$ has a finite number of elements, followed by the finite union of the results. The comprehension notation $\{e \mid A_1, \ldots, A_n\}$ is then extended to allow $A_i$ to be of the form $x_i \in_{fr} e_i$, and the translation equations are augmented to include the equation $\{e \mid x_i \in_{fr} e_i, \ldots\} = \bigcup\{\{e \mid \ldots\} \mid x_i \in_{fr} e_i\}$.

The decoding function $q_s : s' \to s$, which strips the tags and identifiers introduced by $p_s$, can be defined as follows:

$$q_{\mathbb{R}}(O) = o, \text{ if } O = \{_{fr}\, o\}$$
$$q_{s_1 \times \cdots \times s_n}(O) = (o_1, \ldots, o_n), \text{ if } o_i = q_{s_i}(\{_{fr}\, x_i \mid (x_1, \ldots, x_n) \in_{fr} O\})$$
$$q_{\{_{fr} s\}}(O) = \{_{fr}\, x \mid (1, x) \in_{fr} O\}$$
$$q_{\{s\}}(O) = \{q_s(\{_{fr}\, y \mid (1, j, y) \in_{fr} O, i = j\}) \mid (1, i, x) \in_{fr} O\}$$
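The encoding and decoding functions can be tried out on finite data. The sketch below covers only reals, products and finite sets; finitely representable sets are left out because this illustration represents every flat set as an explicit finite Python `frozenset` of rows, which is an assumption of the sketch and not the constraint representation of the paper. The names `Prod`, `FinSet`, `width`, `encode` and `decode` are hypothetical.

```python
from itertools import product

R = "R"


def Prod(*parts):
    return ("prod", parts)


def FinSet(elem):
    return ("set", elem)


def width(t):
    """Number of real columns in the flat encoding of type t."""
    if t == R:
        return 1
    if t[0] == "prod":
        return sum(width(u) for u in t[1])
    return 2 + width(t[1])  # tag column, identifier column, element columns


def encode(t, o):
    """p: encode object o of type t as a flat set of real tuples."""
    if t == R:
        return frozenset({(o,)})
    if t[0] == "prod":
        parts = [encode(u, x) for u, x in zip(t[1], o)]
        return frozenset(sum(combo, ()) for combo in product(*parts))
    if not o:  # empty finite set: a single row of zeros
        return frozenset({(0, 0) + (0,) * width(t[1])})
    rows = set()
    for i, elem in enumerate(o, start=1):  # any distinct positive ids will do
        rows |= {(1, i) + x for x in encode(t[1], elem)}
    return frozenset(rows)


def decode(t, O):
    """q: strip the tags and identifiers introduced by encode."""
    if t == R:
        (row,) = O
        return row[0]
    if t[0] == "prod":
        out, start = [], 0
        for u in t[1]:
            w = width(u)
            out.append(decode(u, frozenset(row[start:start + w] for row in O)))
            start += w
        return tuple(out)
    ids = {row[1] for row in O if row[0] == 1}
    return frozenset(
        decode(t[1], frozenset(row[2:] for row in O if row[0] == 1 and row[1] == i))
        for i in ids
    )


t = FinSet(Prod(R, FinSet(R)))
o = frozenset({(1.0, frozenset({2.0, 3.0})), (4.0, frozenset())})
flat = encode(t, o)
```

Every row of `flat` has the same number of columns, and decoding it returns the original nested object, which is the finite-data counterpart of the round-trip property of the two functions.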
It is clear that $p_s$ and $q_s$ are both computable, even though they cannot be expressed in frNRC. Moreover, using the fact that $p_s(O)$ is never empty, by induction on the structure of $s$ we can show that $q_s$ is the inverse of $p_s$.

Proposition 4.1 $q_s \circ p_s = id$. □

Note that $p_s$ is not deterministic. Let $O_1 : s'$ and $O_2 : s'$. Then we say $O_1 \approx O_2$ if $q_s(O_1) = q_s(O_2)$. That is, $O_1$ and $O_2$ are equivalent encodings of an object $O : s$. It is clear that whenever $O_1 \approx O_1'$, ..., and $O_n \approx O_n'$, then $\{_{fr}\, (x_1, \ldots, x_n) \mid x_1 \in_{fr} O_1, \ldots, x_n \in_{fr} O_n\} \approx \{_{fr}\, (x_1, \ldots, x_n) \mid x_1 \in_{fr} O_1', \ldots, x_n \in_{fr} O_n'\}$. It is also obvious that whenever $O \approx O'$, then $\{_{fr}\, x_i \mid (x_1, \ldots, x_n) \in_{fr} O\} \approx \{_{fr}\, x_i \mid (x_1, \ldots, x_n) \in_{fr} O'\}$. We can now state the following key proposition.

Proposition 4.2 For every function $f : s_1 \to s_2$ in frNRC, there is a function $(f)' : s_1' \to s_2'$ such that the following diagram commutes:

[Diagram: three adjacent squares (left, middle, right) relating $s_1$, $s_2$, $s_1'$ and $s_2'$, built from the arrows $id$, $f$, $(f)'$, $p_{s_1}$, $q_{s_1}$, $p_{s_2}$ and $q_{s_2}$; in particular $q_{s_2} \circ (f)' \circ p_{s_1} = f$.]

Proof Sketch. The left and right squares commute by the definitions of $p_s$, $q_s$, and $\approx$. It is then possible to construct $(f)'$ by induction on the structure of the frNRC expression that defines $f$ such that the middle square, and thus the entire diagram, commutes. □
Now let $f : s_1 \to s_2$ be a function in frNRC, where $s_1$ and $s_2$ have arbitrary nesting depths. Proposition 4.2 implies that there is a function $(f)' : s_1' \to s_2'$ in frNRC such that $q_{s_2} \circ (f)' \circ p_{s_1} = f$. Since $s_1'$ and $s_2'$ are both of height 1, by Theorem 3.4, we can assume that $(f)'$ has height 1. Then by Corollary 3.5, we conclude that $(f)'$ is effectively computable. Since $q_s$ and $p_s$ are also computable, we have the very desirable result below.

Theorem 4.3 All functions expressible in frNRC are effectively computable. □
The above "compilation procedure" showed that frNRC can be embedded in first-order logic with polynomial constraints, modulo the encodings $p_s$ and $q_s$ (thus, frNRC is closed). The converse is also true. For example, a formula $\exists x.\varphi(x)$ can be expressed in frNRC as $\mathit{not}(\mathit{empty}_{fr}\, \{_{fr}\, 1 \mid x \in_{fr} \mathbb{R}, \varphi(x)\})$. So frNRC does not gain us extra expressive or computational power, compared to the usual constraint query languages. However, it gives a more natural data model and a more convenient query language, since it is no longer necessary to model our spatial databases as a set of flat tables.

Results about the data complexity of frNRC can be obtained from the results presented in Section 4 and from [8]. Consider the diagram introduced in Proposition 4.2. As $(f)'$ is expressed in first-order logic extended with polynomial constraints, it follows from [8] that its data complexity is in NC. Moreover, it is simple to show that the encoding and decoding functions $p_s$ and $q_s$ are also in NC. Thus:

Proposition 4.4 frNRC has data complexity in NC. □
References

[1] S. Abiteboul and P. Kanellakis. Query languages for complex object databases. SIGACT News, 21(3):9–18, 1990.
[2] M. Benedikt, G. Dong, L. Libkin, and L. Wong. Relational expressive power of constraint query languages. In Proc. of 15th ACM Symposium on Principles of Database Systems, pages 5–16, June 1996.
[3] A. Brodsky and Y. Kornatzky. The LyriC language: querying constraint objects. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 1995.
[4] P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87–96, March 1994.
[5] P. Buneman, S. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. Theoretical Computer Science, 149(1):3–48, September 1995.
[6] S. Grumbach and J. Su. Finitely representable databases. In Proc. of 13th ACM Symposium on Principles of Database Systems, pages 289–300, May 1994.
[7] S. Grumbach and J. Su. Dense-order constraint databases. In Proc. ACM Symposium on Principles of Database Systems, pages 66–77, 1995.
[8] P. Kanellakis, G. Kuper, and P. Revesz. Constraint query languages. Journal of Computer and System Sciences, 51:25–52, 1995.
[9] L. Libkin and L. Wong. On representation and querying incomplete information in databases with multisets. Information Processing Letters, 56:209–214, November 1995.
[10] J. Paredaens, J. Van den Bussche, and D. Van Gucht. Towards a theory of spatial database queries. In Proceedings of 13th ACM Symposium on Principles of Database Systems, pages 279–288, May 1994.
[11] J. Paredaens and D. Van Gucht. Converting nested relational algebra expressions into flat algebra expressions. ACM Transactions on Database Systems, 17(1):65–93, March 1992.
[12] P.Z. Revesz. Datalog queries of set constraint databases. In Proc. of the Int. Conf. on Database Systems, pages 424–438, 1995.
[13] P. Wadler. Comprehending monads. Mathematical Structures in Computer Science, 2:461–493, 1992.
[14] L. Wong. Normal forms and conservative extension properties for query languages over collection types. Journal of Computer and System Sciences, 52(3):495–505, 1996.
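A Some Relevant Proofs

Proposition 5.2 For every function $f : s_1 \to s_2$ in frNRC, there is a function $(f)' : s_1' \to s_2'$ such that the following diagram commutes:

[Diagram: three adjacent squares (left, middle, right) relating $s_1$, $s_2$, $s_1'$ and $s_2'$, built from the arrows $id$, $f$, $(f)'$, $p_{s_1}$, $q_{s_1}$, $p_{s_2}$ and $q_{s_2}$.]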
Proof. We note that the left and right squares commute by the definitions of $p_s$, $q_s$, and $\approx$. We now construct $(f)'$ by induction on the structure of the frNRC expression that defines $f$, and we argue that the middle square, and thus the entire diagram, commutes. To simplify notation, we omit the subscript $s$ from $p_s$ and $q_s$ in the proof below. We also omit the argument for the more obvious cases.

Case $f(\vec{x}) = x_i$. Set $(f)'(O) = \{_{fr}\, x_i \mid (x_1, \ldots, x_n) \in_{fr} O\}$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. Then $(f)'(O) \approx p(x_i)$. Since $q \circ p = id$, we have $q((f)'(O)) = x_i$. So the middle square commutes.

Case $f(\vec{x}) = c$. Set $(f)'(O) = \{_{fr}\, c\}$.

Case $f(\vec{x}) = \pi_i\, e$. Let $g(\vec{x}) = e$. Set $(f)'(O) = \{_{fr}\, x_i \mid (x_1, \ldots, x_n) \in_{fr} (g)'(O)\}$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g)'(O)) = e$. Thus $(g)'(O) \approx p(e)$. By definition, $p(\pi_i\, e) = \{_{fr}\, x_i \mid (x_1, \ldots, x_n) \in_{fr} p(e)\}$. Then $(f)'(O) \approx p(\pi_i\, e)$. Hence $q((f)'(O)) = \pi_i\, e$. So the middle square commutes.

Case $f(\vec{x}) = (e_1, \ldots, e_n)$. Let $g_i(\vec{x}) = e_i$. Set $(f)'(O) = \{_{fr}\, (x_1, \ldots, x_n) \mid x_1 \in_{fr} (g_1)'(O), \ldots, x_n \in_{fr} (g_n)'(O)\}$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g_i)'(O)) = e_i$. Thus $(g_i)'(O) \approx p(e_i)$. By definition, $p((e_1, \ldots, e_n)) = \{_{fr}\, (x_1, \ldots, x_n) \mid x_1 \in_{fr} p(e_1), \ldots, x_n \in_{fr} p(e_n)\}$. Then $(f)'(O) \approx p((e_1, \ldots, e_n))$. Hence $q((f)'(O)) = (e_1, \ldots, e_n)$. So the middle square commutes.

Case $f(\vec{x}) = \{\}$. Set $(f)'(O) = \{_{fr}\, (0, 0, \vec{0})\}$.

Case $f(\vec{x}) = \{e\}$. Let $g(\vec{x}) = e$. Set $(f)'(O) = \{_{fr}\, (1, 1, x) \mid x \in_{fr} (g)'(O)\}$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g)'(O)) = e$. Thus $(g)'(O) \approx p(e)$. By definition, $p(\{e\}) = \{_{fr}\, (1, 1, x) \mid x \in_{fr} p(e)\}$. Then $(f)'(O) \approx p(\{e\})$. Hence $q((f)'(O)) = \{e\}$. So the middle square commutes.

Case $f(\vec{x}) = \mathit{empty}_{fr}\ e$. Let $g(\vec{x}) = e$. Set $(f)'(O) = \{_{fr}\, 1 \mid (0, \vec{0}) \in_{fr} (g)'(O)\} \cup_{fr} \{_{fr}\, 0 \mid (1, x) \in_{fr} (g)'(O)\}$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g)'(O)) = e$. Thus $(g)'(O) \approx p(e)$. Thus $e$ is empty iff $(g)'(O) = \{_{fr}\, (0, \vec{0})\}$. Thus $q((f)'(O)) = \mathit{empty}_{fr}\ e$. So the middle square commutes.

The following variations of the emptiness test are used in subsequent cases:

$$\mathit{myempty}(X) = \mathit{empty}_{fr}\, \{_{fr}\, 0 \mid 0 \in_{fr} (\mathit{empty}_{fr})'(X)\}$$
$$\mathit{myempty}'(X) = \mathit{empty}_{fr}\, \{_{fr}\, 1 \mid (1, x) \in_{fr} X\}$$
Case $f(\vec{x}) = e_1 \cup e_2$. Let $g_i(\vec{x}) = e_i$. Set $(f)'(O) =$ if $\mathit{myempty}((g_1)'(O))$ then (if $\mathit{myempty}((g_2)'(O))$ then $\{_{fr}\, \vec{0}\}$ else $(g_2)'(O)$) else (if $\mathit{myempty}((g_2)'(O))$ then $(g_1)'(O)$ else $(g_1)'(O) \cup_{fr} A(O)$), where $A(O) = \{_{fr}\, (1, i + k + 1, x) \mid (1, i, x) \in_{fr} (g_2)'(O), k = \max\{_{fr}\, j \mid (1, j, y) \in_{fr} (g_1)'(O)\}\}$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g_i)'(O)) = e_i$. There are four subcases. The subcases where either $e_1$ or $e_2$ is empty are trivial. So suppose $e_1$ and $e_2$ are both not empty. First note that $A(O) \approx (g_2)'(O)$ and thus $q(A(O)) = e_2$. Next observe that for any $(1, i, x)$ in $(g_1)'(O)$ and $(1, j, y)$ in $A(O)$, it is the case that $i < j$. Thus for any $h$ such that one of the sets $X = \{_{fr}\, (1, i, x) \mid (1, i, x) \in_{fr} (f)'(O), i = h\}$, $Y = \{_{fr}\, (1, i, x) \mid (1, i, x) \in_{fr} (g_1)'(O), i = h\}$, $Z = \{_{fr}\, (1, i, x) \mid (1, i, x) \in_{fr} A(O), i = h\}$ is not empty, it is the case that $X = Y$ or $X = Z$. Consequently, for any $o$ in $q((f)'(O))$, we have $o$ in $e_1 \cup e_2$ and vice versa. That is, $q((f)'(O)) = e_1 \cup e_2$. So the middle square commutes.
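The role of the shifted identifiers in the union case can be seen on a small example. The sketch below works with explicit finite encodings of two nonempty sets of reals, each row being a triple (tag, identifier, value); the names `groups`, `decode_reals` and `union_encoded` are hypothetical, and the code only illustrates the renumbering, not the full translation.

```python
def groups(enc):
    """Map each identifier to the set of payloads tagged with it."""
    by_id = {}
    for tag, i, x in enc:
        if tag == 1:
            by_id.setdefault(i, set()).add(x)
    return by_id


def decode_reals(enc):
    """Decode a well-formed encoding of a finite set of reals, where each
    identifier labels exactly one value."""
    return frozenset(next(iter(vals)) for vals in groups(enc).values())


def union_encoded(g1, g2):
    """Union of two nonempty encodings: identifiers of g2 are shifted past
    the largest identifier k used in g1, as in the definition of A(O)."""
    k = max(i for (_tag, i, _x) in g1)
    shifted = frozenset((1, i + k + 1, x) for (_tag, i, x) in g2)
    return g1 | shifted


g1 = frozenset({(1, 1, 5.0), (1, 2, 6.0)})   # encodes {5.0, 6.0}
g2 = frozenset({(1, 1, 6.0), (1, 2, 7.0)})   # encodes {6.0, 7.0}

naive = g1 | g2                # identifier 1 now labels both 5.0 and 6.0
merged = union_encoded(g1, g2)  # identifiers 1, 2, 4, 5
```

Without the shift, one identifier would label two different elements and the result would no longer be a valid encoding; with it, every identifier still labels a single element, and the element 6.0 simply appears under two identifiers.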
Case $f(\vec{x}) = \bigcup\{e_1 \mid y \in e_2\}$. Let $g_1(\vec{x}, y) = e_1$ and $g_2(\vec{x}) = e_2$. Set $(f)'(O) =$ if $\mathit{myempty}((g_2)'(O))$ then $\{_{fr}\, (0, 0, \vec{0})\}$ else if $\mathit{myempty}'(A(O))$ then $\{_{fr}\, (0, 0, \vec{0})\}$ else $A(O)$, where

$$A(O) = \bigcup\{_{fr}\, B(O, i) \mid (1, i, y) \in_{fr} (g_2)'(O)\}$$
$$B(O, i) = \{_{fr}\, (1, k \cdot i + h + 1, w) \mid (1, h, w) \in_{fr} C(O, i), k = \max\{_{fr}\, h \mid (1, j, u) \in_{fr} (g_2)'(O), j < i, (1, h, v) \in_{fr} C(O, j)\}\}$$
$$C(O, i) = (g_1)'(\{_{fr}\, (z, u) \mid z \in_{fr} O, (1, j, u) \in_{fr} (g_2)'(O), i = j\})$$

This is the most complex case. Suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g_2)'(O)) = e_2$. Thus $(g_2)'(O) \approx p(e_2)$. Now there are two subcases.

For the first subcase, suppose $e_2$ is empty. Then $\mathit{myempty}((g_2)'(O))$ is true. Then $q((f)'(O)) = \{\} = f(\vec{x})$. So the middle square commutes in this subcase.

For the second subcase, we assume that $e_2$ is not empty. Then $\mathit{myempty}((g_2)'(O))$ is false. By the hypothesis on $g_2$, we know that for each $i$ such that the set $y_i = \{_{fr}\, u \mid (1, j, u) \in_{fr} (g_2)'(O), i = j\}$ is not empty, $q(y_i)$ is an element $o_i$ of $e_2$. Moreover, there is one such $i$ for each element of $e_2$. Then by the hypothesis on $g_1$, we have $q(C(O, i)) = g_1(\vec{x}, o_i)$ for each such $i$. It is also obvious that $q(B(O, i)) = g_1(\vec{x}, o_i)$ for each such $i$, provided $g_1(\vec{x}, o_i)$ is not empty. Note that if $g_1(\vec{x}, o_i)$ is empty, then $B(O, i)$ is also empty, as opposed to being a singleton zero tuple. However, $B(O, i)$ has an advantage over $C(O, i)$ because the numbers it uses to identify the elements of $g_1(\vec{x}, o_i)$ are distinct from those of $B(O, j)$ whenever $i \neq j$. To see this, suppose $k = \max\{_{fr}\, h \mid (1, j, u) \in_{fr} (g_2)'(O), j < i, (1, h, v) \in_{fr} C(O, j)\}$. Then $k$ is the maximum identifier that is used to identify elements in $g_1(\vec{x}, o_j)$, for $j < i$. This $k$ exists because $g_1(\vec{x}, o_j)$ is finite for each $o_j$ in $e_2$. Then $k \cdot i + 1$ is greater than the cardinality of the union of $g_1(\vec{x}, o_j)$ for $j < i$. We now have two subsubcases.

For the first subsubcase, suppose $g_1(\vec{x}, o_i)$ is empty for each $o_i$ in $e_2$. Then $B(O, i)$ is empty for all such $o_i$. Then $A(O)$ is also empty. Then $\mathit{myempty}'(A(O))$ is true. Then $q((f)'(O)) = \{\} = f(\vec{x})$. So the middle square commutes in this subsubcase.

For the second subsubcase, we assume that there are $o_1$, ..., $o_n$ in $e_2$ such that $g_1(\vec{x}, o_i)$ is not empty and $f(\vec{x}) = g_1(\vec{x}, o_1) \cup \cdots \cup g_1(\vec{x}, o_n)$. Then $(f)'(O) = A(O) = B(O, o_1) \cup_{fr} \cdots \cup_{fr} B(O, o_n)$. Then $q((f)'(O)) = f(\vec{x})$. This finishes the final subsubcase.
Case $f(\vec{x}) = (e_1 = e_2)$. Let $g_i(\vec{x}) = e_i$. Set $(f)'(O) =$ if $\mathit{empty}_{fr}\, \{_{fr}\, 1 \mid x \in_{fr} (g_1)'(O), y \in_{fr} (g_2)'(O), x = y\}$ then $\{_{fr}\, \mathit{false}\}$ else $\{_{fr}\, \mathit{true}\}$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g_i)'(O)) = e_i$. So $(g_i)'(O) \approx p(e_i)$. Since $e_i : \mathbb{R}$, we have $(g_i)'(O) = \{_{fr}\, e_i\}$. Then it is obvious that $q((f)'(O)) = (e_1 = e_2)$. So the middle square commutes.

Case $f(\vec{x}) = \mathit{empty}\ e$. Let $g(\vec{x}) = e$. Set $(f)'(O) = \{_{fr}\, \mathit{empty}_{fr}\, \{_{fr}\, 1 \mid (1, i, x) \in_{fr} (g)'(O)\}\}$.

Case $f(\vec{x}) = \mathit{if}\ e_1\ \mathit{then}\ e_2\ \mathit{else}\ e_3$. Let $g_i(\vec{x}) = e_i$. Set $(f)'(O) =$ if $\mathit{empty}_{fr}\, \{_{fr}\, 1 \mid 0 \in_{fr} (g_1)'(O)\}$ then $(g_2)'(O)$ else $(g_3)'(O)$. For this case, suppose $q(O) = \vec{x}$. Thus $O \approx p(\vec{x})$. By hypothesis, $q((g_1)'(O)) = e_1$. Thus $(g_1)'(O) \approx p(e_1)$. Since $e_1 : \mathbb{B}$, we have $(g_1)'(O) = \{_{fr}\, 1\}$ if $e_1$ is true and $(g_1)'(O) = \{_{fr}\, 0\}$ if $e_1$ is false. Then $\mathit{empty}_{fr}\, \{_{fr}\, 1 \mid 0 \in_{fr} (g_1)'(O)\}$ is true iff $e_1$ is true. Then it follows by the hypothesis on $e_2$ and $e_3$ that $q((f)'(O)) = \mathit{if}\ e_1\ \mathit{then}\ e_2\ \mathit{else}\ e_3$. So the middle square commutes.
Case $f(\vec{x}) = \{_{fr}\, \}$. Set $(f)'(O) = \{_{fr}\, \vec{0} \}$.

Case $f(\vec{x}) = \{_{fr}\, e \}$. Let $g(\vec{x}) = e$. Set $(f)'(O) = \{_{fr}\, (1, x) \mid x \in_{fr} (g)'(O) \}$. For this case, suppose $q(O) = \vec{x}$. Thus $O$ corresponds to $p(\vec{x})$. By hypothesis, $q((g)'(O)) = e$. Thus $(g)'(O)$ corresponds to $p(e)$. Since $e : R \times \cdots \times R$, we know that $(g)'(O) = \{_{fr}\, e \}$. Thus $q((f)'(O)) = \{_{fr}\, e \}$. So the middle square commutes.
Case $f(\vec{x}) = e_1 \cup_{fr} e_2$. Let $g_i(\vec{x}) = e_i$. Set $(f)'(O)$ = if $myempty((g_1)'(O))$ then (if $myempty((g_2)'(O))$ then $\{_{fr}\, \vec{0} \}$ else $(g_2)'(O)$) else (if $myempty((g_2)'(O))$ then $(g_1)'(O)$ else $(g_1)'(O) \cup_{fr} (g_2)'(O)$).
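The reason for the nested emptiness tests in the union case can be seen on a small finite model. This is a minimal sketch under stated assumptions: elements are single numbers, a set is flattened to rows `(1, x)`, the empty set is flattened to the single row `(0, 0)` standing in for the zero tuple, and `myempty` is assumed to hold exactly when no row carries tag 1. The helper names `encode`, `decode` and `translate_union` are hypothetical.

```python
ZERO = frozenset({(0, 0)})  # assumed stand-in for the singleton zero tuple


def myempty(rows):
    # Assumption: an encoded set is "empty" when it has no row tagged 1.
    return all(tag == 0 for (tag, _) in rows)


def encode(values):
    # Tag every element with 1; an empty set becomes the zero tuple.
    return frozenset((1, v) for v in values) or ZERO


def decode(rows):
    # Plays the role of q: forget the tags and drop the zero tuple.
    return {v for (tag, v) in rows if tag == 1}


def translate_union(a, b):
    # Same branching as the translation of e1 U e2 in the text.
    if myempty(a):
        return ZERO if myempty(b) else b
    return a if myempty(b) else a | b


left = encode([1.0, 2.0])
right = encode([])
merged = translate_union(left, right)
```

A plain union of the two encodings would leave the row `(0, 0)` mixed in with genuine elements whenever one side is empty; the branching keeps the zero tuple only when the whole result is empty.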
Case $f(\vec{x}) = \bigcup \{_{fr}\, e_1 \mid y \in_{fr} e_2 \}$. Let $g_1(\vec{x}, y) = e_1$ and $g_2(\vec{x}) = e_2$. Set $(f)'(O)$ = if $myempty((g_2)'(O))$ then $\{_{fr}\, \vec{0} \}$ else if $myempty'(A(O))$ then $\{_{fr}\, (0, 0, \vec{0}) \}$ else $A(O)$, where $A(O) = \bigcup \{_{fr}\, \text{if } myempty(B(O, y)) \text{ then } \{_{fr}\, \} \text{ else } B(O, y) \mid (1, y) \in_{fr} (g_2)'(O) \}$, and where $B(O, y) = (g_1)'(\{_{fr}\, (z, y) \mid z \in_{fr} O \})$.

For this case, suppose $q(O) = \vec{x}$. Thus $O$ corresponds to $p(\vec{x})$. By hypothesis, $q((g_2)'(O)) = e_2$. Thus $(g_2)'(O)$ corresponds to $p(e_2)$. Now there are two subcases. For the first subcase, suppose $e_2$ is empty. Then $myempty((g_2)'(O))$ is true. Then $q((f)'(O)) = \{_{fr}\, \} = \bigcup \{_{fr}\, e_1 \mid y \in_{fr} e_2 \}$. So the middle square commutes in this subcase. For the second subcase, we assume that $e_2$ is not empty. Then $myempty((g_2)'(O))$ is false and $(g_2)'(O) = \{_{fr}\, (1, x) \mid x \in_{fr} e_2 \}$ is forced. It is clear that $q(\{_{fr}\, (z, y) \mid z \in_{fr} O \}) = (\vec{x}, y)$ for each $y$ in $e_2$. By hypothesis, $q(B(O, y)) = g_1(\vec{x}, y)$ for each $y$ in $e_2$.

We now have two subsubcases. For the first subsubcase, suppose $g_1(\vec{x}, y)$ is empty for each $y$ in $e_2$. Then $myempty(B(O, y))$ is true for each $y$ in $e_2$. Then $myempty'(A(O))$ is true. Then $q((f)'(O)) = \{_{fr}\, \} = f(\vec{x})$. So the middle square commutes in this subsubcase. For the second subsubcase, we assume that there are $y_1, \ldots, y_n$ in $e_2$ such that $g_1(\vec{x}, y_i)$ is not empty and $f(\vec{x}) = g_1(\vec{x}, y_1) \cup_{fr} \cdots \cup_{fr} g_1(\vec{x}, y_n)$. Then $(f)'(O) = A(O) = B(O, y_1) \cup_{fr} \cdots \cup_{fr} B(O, y_n)$. Then $q((f)'(O)) = f(\vec{x})$. This finishes the final subsubcase. So the middle square commutes.
Case $f(\vec{x}) = e_1 \odot e_2$, where $\odot$ is either $+$, $-$, $\cdot$, or $\div$. Let $g_i(\vec{x}) = e_i$. Set $(f)'(O) = \{_{fr}\, x \odot y \mid x \in_{fr} (g_1)'(O),\ y \in_{fr} (g_2)'(O) \}$. For this case, suppose $q(O) = \vec{x}$. Thus $O$ corresponds to $p(\vec{x})$. By hypothesis, $q((g_i)'(O)) = e_i$. Thus $(g_i)'(O)$ corresponds to $p(e_i)$. Since $e_i : R$, we must have $(g_i)'(O) = \{_{fr}\, e_i \}$. Then $q((f)'(O)) = q(\{_{fr}\, e_1 \odot e_2 \}) = e_1 \odot e_2$. So the middle square commutes.
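The arithmetic case lifts a binary operator to singleton sets by a comprehension over both operands. As a minimal sketch, assuming a real number is flattened to a one-element Python set (the function name `translate_arith` is hypothetical):

```python
import operator


def translate_arith(op, g1p, g2p):
    """Mimic: { x op y | x in g1'(O), y in g2'(O) }."""
    # Both inputs are singletons when the operands have real type,
    # so the comprehension produces exactly one value.
    return frozenset(op(x, y) for x in g1p for y in g2p)


two = frozenset({2.0})
three = frozenset({3.0})
total = translate_arith(operator.add, two, three)
product = translate_arith(operator.mul, two, three)
```

Because each operand encoding is a singleton, the result is again a singleton holding the value of the original expression, which is the content of the commuting-square argument for this case.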
Case $f(\vec{x}) = R$. Set $(f)'(O) = \{_{fr}\, (1, x) \mid x \in_{fr} R \}$.
Case $f(\vec{x}) = \bigcup \{_{fr}\, e_1 \mid y \in e_2 \}$. Let $g_1(\vec{x}, y) = e_1$ and $g_2(\vec{x}) = e_2$. Set $(f)'(O)$ = if $myempty((g_2)'(O))$ then $\{_{fr}\, (0, \vec{0}) \}$ else if $myempty'(A(O))$ then $\{_{fr}\, (0, \vec{0}) \}$ else $A(O)$, where $A(O) = \bigcup \{_{fr}\, \text{if } myempty(B(O, i)) \text{ then } \{_{fr}\, \} \text{ else } B(O, i) \mid (1, i, y) \in_{fr} (g_2)'(O) \}$, and where $B(O, i) = (g_1)'(\{_{fr}\, (z, u) \mid z \in_{fr} O,\ (1, j, u) \in_{fr} (g_2)'(O),\ i = j \})$.

For this case, suppose $q(O) = \vec{x}$. Thus $O$ corresponds to $p(\vec{x})$. By hypothesis, $q((g_2)'(O)) = e_2$. Thus $(g_2)'(O)$ corresponds to $p(e_2)$. We have two subcases. The first is when $e_2$ is empty. Then $(g_2)'(O)$ is a singleton zero tuple. Then $myempty((g_2)'(O))$ is true. Then $(f)'(O) = \{_{fr}\, (0, \vec{0}) \}$. Thus $q((f)'(O)) = \{_{fr}\, \} = f(\vec{x})$. So the middle square commutes in this subcase. For the second subcase, we assume that $e_2$ is not empty. Then $myempty((g_2)'(O))$ is false. By the hypothesis on $g_2$, we know that for each $i$ such that the set $y_i = \{_{fr}\, u \mid (1, j, u) \in_{fr} (g_2)'(O),\ i = j \}$ is not empty, $q(y_i)$ is an element $o_i$ of $e_2$. Moreover, there is one such $i$ for each element of $e_2$. Then by the hypothesis on $g_1$, we have $q(B(O, i)) = g_1(\vec{x}, o_i)$ for each such $i$.

We now have two subsubcases. For the first subsubcase, suppose $g_1(\vec{x}, o_i)$ is empty for each such $i$. Then $myempty(B(O, i))$ is true for each such $i$. Then $myempty'(A(O))$ is true. Then $q((f)'(O)) = \{_{fr}\, \} = f(\vec{x})$. So the middle square commutes in this subsubcase. For the second subsubcase, we assume that there are $o_1, \ldots, o_n$ in $e_2$ such that $g_1(\vec{x}, o_i)$ is not empty and $f(\vec{x}) = g_1(\vec{x}, o_1) \cup_{fr} \cdots \cup_{fr} g_1(\vec{x}, o_n)$. Then $(f)'(O) = A(O) = B(O, o_1) \cup_{fr} \cdots \cup_{fr} B(O, o_n)$. Then $q((f)'(O)) = f(\vec{x})$. This finishes the final subsubcase. So the middle square commutes. $\Box$
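The construction of $A(O)$ and $B(O, i)$ for a union over a finite set can be traced on a small finite model. The sketch below rests on several assumptions that are not taken from the text: rows of $(g_2)'(O)$ are triples `(1, i, u)` with `i` the identifier of an element, translated functions are ordinary Python callables, `myempty` holds when no row carries tag 1, and `myempty'` is simplified to a plain emptiness check. All names are hypothetical.

```python
ZERO = frozenset({(0, 0)})  # assumed stand-in for the zero tuple (0, 0-vector)


def myempty(rows):
    # Assumption: "empty" means no row is tagged with 1.
    return all(row[0] == 0 for row in rows)


def big_union(O, g1p, g2p):
    rows = g2p(O)
    if myempty(rows):
        return ZERO
    # One identifier i per element of the encoded set e2.
    ids = {i for (tag, i, u) in rows if tag == 1}

    def B(i):
        # Extend every environment row z with the rows u of element i,
        # then run the translated body g1' on that extended environment.
        env = frozenset(
            (z, u) for z in O for (tag, j, u) in rows if tag == 1 and j == i
        )
        return g1p(env)

    parts = [B(i) for i in ids]
    # A(O): union of the non-empty B(O, i); empty ones contribute nothing.
    A = frozenset().union(*(b for b in parts if not myempty(b)))
    return A if A else ZERO


O = frozenset({("env",)})
g2p = lambda env: frozenset({(1, 1, 10.0), (1, 2, 20.0)})    # encodes {10, 20}
g1p = lambda env: frozenset((1, u + 1) for (_, u) in env)    # body {y + 1}
flattened = big_union(O, g1p, g2p)
```

Grouping the rows of $(g_2)'(O)$ by identifier is what lets the body be evaluated once per element of $e_2$, and replacing empty results by nothing (instead of a zero tuple) keeps the final union free of spurious rows.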