
EXTENDED ORDER-GENERIC QUERIES

OLEG V. BELEGRADEK, ALEXEI P. STOLBOUSHKIN, AND MICHAEL A. TAITSLIN

Abstract. We consider relational databases organized over an ordered domain with some additional relations; a typical example is the ordered domain of rational numbers together with the operation of addition. The focus of our study is the first-order (FO) queries that are invariant under order-preserving "permutations"; such queries are called order-generic. It has recently been discovered that for some domains order-generic FO queries fail to express more than pure order queries. For example, every order-generic FO query over rational numbers with + can be rewritten without +. For some other domains, however, this is not the case. We provide very general conditions on the FO theory of the domain that ensure the collapse of order-generic extended FO queries to pure order queries over this domain: the Pseudo-finite Homogeneity Property and a stronger Isolation Property. We further distinguish one broad class of domains satisfying the Isolation Property, the so-called quasi-o-minimal domains. This class includes all the o-minimal domains, but also the ordered group of integer numbers, the ordered semigroup of natural numbers, and some other domains. An important difference between this paper and the recent series of related papers is that we generalize all the notions to the case of finitely representable database states, as opposed to finite states, and develop a general lifting technique that, essentially, allows us to extend any result of the kind we are interested in from finite to finitely representable states. We show, however, that these results cannot be transferred to arbitrary infinite states.

Date: April 5, 1998.

1991 Mathematics Subject Classification. Primary 68P15; Secondary 03C40, 03C52.

Key words and phrases. Pseudo-finite homogeneity property, isolation property, quasi-o-minimal structure, local genericity, finite database state, finitely representable database state.

A part of this work was done while Oleg Belegradek was visiting the Fields Institute for Research in Mathematical Sciences in Toronto (January-March 1997). This work of Alexei P. Stolboushkin was partially supported by NSF Grant CCR 9403809. A part of this research was carried out while M. A. Taitslin was visiting UCLA (partially supported by NSF Grant CCR 9403809), DIMACS and Princeton (partially supported by a grant from Princeton University). The work of this author was partially supported by the Russian Foundation of Basic Research (project code: 96-01-00086). M. A. Taitslin is the corresponding author. The address for correspondence: 62 Mozhaiskogo St., apt. 265, Tver, Russia 170043.

Contents

1. Introduction
1.1. Infinite domains
1.2. Finitely representable relations
1.3. Ordered domains and generic queries
1.4. Collapse results
2. Preliminaries
3. Impossibility of translation over arbitrary states
4. Canonical representation for finitely representable relations
5. Collapse of extended locally generic queries
6. Open questions
6.1. Problem 1
6.2. Problem 2
6.3. Problem 3
6.4. Problem 4
6.5. Problem 5
6.6. Comments
Acknowledgments
References


1. Introduction

1.1. Infinite domains. In the relational model of databases introduced by E. F. Codd, a database state is thought of as a finite collection of relations between elements. For example, a father-son relation can be represented in the form of one binary relation (or a two-column table). The names of the relations and their arities are fixed and are called a database scheme. Particular information stored in the relations of a given scheme is called a database state. As we acquire more and more information about fathers and sons, the database states change, but the scheme (one binary relation) does not. Database relations (tables) are always going to be finite. However, it is often convenient to assume that there is an infinite domain (for example, the integers, the rational numbers, or the strings) such that the data elements are chosen from this domain. Functions and relations defined over the entire domain, like $<$ and $+$, may also be used in querying. For example, if the language of first-order logic FO is used as the query language, its formulas may use database relations as well as the domain relations, while variables range over the entire domain. A study of databases over domains equipped with an additional structure ("constraint databases") and of the expressive power of the corresponding query languages ("constraint query languages") was started by Kanellakis et al. [KKR90, KKR95].

1.2. Finitely representable relations. The database relations are finite, but answers yielded by relational queries may or may not be finite. This makes the traditional relational model not closed, in the sense that the output of queries is of a different nature than the input. Kanellakis et al. [KKR90, KKR95] considered the real ordered fields and groups as domains and observed that, since the first-order theories of these admit elimination of quantifiers, the answers to first-order queries can be represented as quantifier-free first-order formulas. Then, if we allow database relations to be arbitrary relations representable by quantifier-free first-order formulas to begin with, the relational model so modified becomes closed in the above sense. The quantifier-free definable relations are called finitely representable (f.r. for short); the term is due to [GS94]. Finitely representable databases are a logical choice, because finitely representable relations appear as results of queries dealing with finite relations anyway, and they are also a natural choice in many applications, say, in geographical databases (cf. [KKR90, KKR95]) or spatial databases ([PVV94]).

1.3. Ordered domains and generic queries. The original notion of generic query [CH80] referred to the $=$-generic queries over finite database states, that is, the queries (over finite states) which are preserved under arbitrary permutations of the domain. Some practically interesting queries, say, graph properties, are indeed $=$-generic. The expressive power of pure FO with respect to generic queries is, however, severely limited; a classical example is the inexpressibility of the parity query asserting that the cardinality of a finite relation in the database scheme is even. One way to try to enhance the expressive power of the query language is to allow domain functions and relations, or givens, to be used in the queries, as mentioned above. The simplest example is the relation $<$ of linear order.
Throwing in such givens obviously increases the expressive power of FO, but what is often not obvious is whether any new generic queries become expressible. Yu. Gurevich [Gur90] showed that there are $=$-generic queries that are FO expressible over finite states with $<$ [...] $> b_i$ and $v < b_{i+1}$ such that in $[b_i, u]$ and $[v, b_{i+1}]$ there are no $P$-separable pairs of elements. If $u > b_{i+1}$ or $v \le b_i$, we are done. If $b_i < v \le u < b_{i+1}$, the pairs $b_i, v$ and $v, b_{i+1}$ are $P$-inseparable, and hence the pair $b_i, b_{i+1}$ is $P$-inseparable, too. In the case $b_i < u < v < b_{i+1}$ the pair $u, v$ is $P$-inseparable by Lemma 4.2; since the pairs $b_i, u$ and $v, b_{i+1}$ are $P$-inseparable, the pair $b_i, b_{i+1}$ is $P$-inseparable. Analogous arguments show the $P$-inseparability of each of the pairs $x, b_1$ and $b_n, y$. A contradiction.

Denote by $\mathrm{Inv}(P)$ the least $\partial P$-invariant relation containing $P$. Clearly, $\mathrm{Inv}(P)$ consists of all $k$-tuples $\bar b$ for which there is $\bar a \in P$ such that $a_i$ and $b_i$ are positioned the same way with respect to $\partial P$, for all $i$. By Lemma 4.1, $\mathrm{Inv}(P)$ is a simple relation over $\partial P$.

Lemma 4.5. $P = \mathrm{Inv}(P) \cap D$.

Proof. We need to prove that there are no $\bar a \in P$ and $\bar b \in D \setminus P$ such that $a_i$ and $b_i$ are positioned the same way with respect to $\partial P$, for all $i$. Suppose there is such a counterexample pair $\bar a, \bar b$. Let $i_1 \in J_1, \dots, i_s \in J_s$; we can assume that $x_{i_1} < \dots < x_{i_s}$ for $\bar x \in D$. As $\bar a \neq \bar b$, there is $m$ such that $a_{i_m} \neq b_{i_m}$, but $a_{i_l} = b_{i_l}$ for $1 \le l < m$. Choose a counterexample pair $\bar a, \bar b$ with the largest possible $m$. As $a_{i_m}$ and $b_{i_m}$ are positioned the same way with respect to $\partial P$, there are no points of $\partial P$ in the closed interval between them. So, by Lemma 4.4, $a_{i_m}$ and $b_{i_m}$ are $P$-inseparable.

Suppose $a_{i_m} < b_{i_m}$. Let $\bar c = (c_1, \dots, c_k)$ be the result of replacing, in the tuple $\bar b$, the elements $b_j$ with $a_{i_m}$, for all $j \in J_m$. Clearly, $\bar c \in D$. Due to the $P$-inseparability of $a_{i_m}$ and $b_{i_m}$, we have $\bar c \notin P$. The elements $a_i$ and $c_i$ are positioned the same way with respect to $\partial P$, for all $i$, so $\bar a, \bar c$ is a counterexample pair. Since $a_{i_l} = c_{i_l}$ for $1 \le l \le m$, this contradicts the maximality of $m$.

Now suppose $a_{i_m} > b_{i_m}$. Let $\bar c = (c_1, \dots, c_k)$ be the result of replacing, in the tuple $\bar a$, the elements $a_j$ with $b_{i_m}$, for all $j \in J_m$. Clearly, $\bar c \in D$. Due to the $P$-inseparability of $a_{i_m}$ and $b_{i_m}$, we have $\bar c \in P$. The elements $c_i$ and $b_i$ are positioned the same way with respect to $\partial P$, for all $i$, so $\bar c, \bar b$ is a counterexample pair. Since $c_{i_l} = b_{i_l}$ for $1 \le l \le m$, this again contradicts the maximality of $m$.
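To make the phrase "positioned the same way with respect to $\partial P$" concrete, here is a minimal Python sketch over the rationals. It assumes $\partial P$ is given as a finite sorted list `B` and that $P$ is available as a finite list of sample tuples (in the text $P$ is an arbitrary finitely representable relation). The helper names `position`, `same_position` and `in_inv` are hypothetical; they simply follow the characterization of $\mathrm{Inv}(P)$ stated before Lemma 4.5.

```python
from bisect import bisect_left
from fractions import Fraction  # exact rational points for the examples


def position(x, B):
    """Region of x relative to the sorted finite set B = [b_1 < ... < b_n].

    Regions are numbered 0..2n: 2j means b_j < x < b_{j+1}
    (with b_0 = -infinity, b_{n+1} = +infinity), and 2j+1 means x == b_{j+1}.
    """
    j = bisect_left(B, x)          # number of elements of B strictly below x
    if j < len(B) and B[j] == x:
        return 2 * j + 1           # x coincides with b_{j+1}
    return 2 * j                   # x lies strictly between b_j and b_{j+1}


def same_position(a, b, B):
    """True if a_i and b_i are positioned the same way w.r.t. B for every i."""
    return len(a) == len(b) and all(
        position(x, B) == position(y, B) for x, y in zip(a, b)
    )


def in_inv(b, P, B):
    """Membership of b in Inv(P), per the text's characterization, for a finite sample P."""
    return any(same_position(a, b, B) for a in P)


if __name__ == "__main__":
    B = [Fraction(1), Fraction(3)]
    print(in_inv((Fraction(-5), Fraction(5, 2)), [(0, 2)], B))
    print(in_inv((1, 2), [(0, 2)], B))
```

With `B = [1, 3]`, both 0 and -5 lie below 1, and both 2 and 5/2 lie strictly between 1 and 3, so `(-5, 5/2)` is in $\mathrm{Inv}(\{(0, 2)\})$; the tuple `(1, 2)` is not, because 1 sits on a point of `B` while 0 does not.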

Any $k$-ary relation $P$ is the disjoint union of all $P \cap D$, where $D$ runs over the set of $E_k$-classes. Obviously, if $P$ is finitely representable, every such $P \cap D$ is equal to $S \cap D$ for some simple relation $S$. So Lemma 4.5 implies the following.

Corollary 4.6. Any finitely representable $k$-ary relation $P$ is the disjoint union of all $\mathrm{Inv}(P \cap D) \cap D$, where $D$ runs over the set of equivalence classes of $E_k$.

Let $P$ be a finitely representable $k$-ary relation. Denote by $\partial P$ the union of all $\partial(P \cap D)$. As $\mathrm{Inv}(P \cap D)$ is defined over $\partial(P \cap D)$, we have the following.

Corollary 4.7. Any finitely representable relation $P$ is defined over $\partial P$.

Thus, with every finitely representable relation $P$ we have associated a certain canonical finite set of parameters $\partial P$ over which the relation is defined; the relation $P$ is, in a sense, reduced to a finite family $\{\mathrm{Inv}(P \cap D)\}$ of simple relations over the set $\partial P$. Moreover, the set $\partial P$ and the family $\{\mathrm{Inv}(P \cap D)\}$ can be found uniformly in $P$ (by means of certain FO queries), and $P$ can be uniformly recovered from the set $\partial P$ and the family $\{\mathrm{Inv}(P \cap D)\}$ (by means of a certain FO query).

Now we are going to find a canonical representation for simple relations. Let $S$ be a simple $k$-ary relation over a finite set $B$. For a non-empty $B$, we call a $k$-cell $I_1 \times \dots \times I_k$ $B$-minimal if every $I_j$ can be defined by a formula of one of the following forms:
$$x = b_i, \qquad x < b_1, \qquad b_n < x, \qquad b_i < x < b_{i+1},$$

where $B = \{b_1, \dots, b_n\}$ and $b_1 < \dots < b_n$. The only $\emptyset$-minimal $k$-cell is, by definition, the cell $U^k$. Obviously, the set of $B$-minimal $k$-cells is finite. Clearly, the $B$-minimal $k$-cells are pairwise disjoint. Moreover, if a $k$-cell $C'$ is defined over $B$ and a $k$-cell $C$ is $B$-minimal, then $C \subseteq C'$ provided $C \cap C' \neq \emptyset$. It follows that $S$ can be decomposed into a disjoint union of $B$-minimal $k$-cells, namely, into the disjoint union of all $B$-minimal $k$-cells which are contained in $S$. Note that some of the $B$-minimal $k$-cells can be empty.

We show how to encode the simple relation $S$ by a finite family of relations on the finite set $B$. Every 1-cell over $B$ is defined by a formula of one of the following forms:

(0) $x = x$;

(1) $x = b$;

(2) $x < b$;

(3) $b < x$;

(4) $b < x < b'$;

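where $b, b' \in B$. For $i < 5$, denote by $n_i$ the number of constants from $B$ in the formula $(i)$; so $n_0 = 0$, $n_1 = n_2 = n_3 = 1$, and $n_4 = 2$. For any $\varepsilon = (\varepsilon_1, \dots, \varepsilon_k) \in 5^k$, we are going to associate with $S$ and $B$ a relation $S_\varepsilon$ on $B$ of arity $n_\varepsilon = n_{\varepsilon_1} + \dots + n_{\varepsilon_k}$. Let an $n_\varepsilon$-tuple of variables $\bar y = (y_1, \dots, y_{n_\varepsilon})$ be the concatenation of tuples $\bar y_1, \dots, \bar y_k$, where the length of $\bar y_i$ is $n_{\varepsilon_i}$. For $i = 1, \dots, k$, denote by $\varphi_i(x_i, \bar y_i)$ the formula

- $x_i = x_i$ if $\varepsilon_i = 0$,
- $x_i = v$ if $\varepsilon_i = 1$ and $\bar y_i$ is $v$,
- $x_i < v$ if $\varepsilon_i = 2$ and $\bar y_i$ is $v$,
- $x_i > v$ if $\varepsilon_i = 3$ and $\bar y_i$ is $v$,
- $u < x_i < v$ if $\varepsilon_i = 4$ and $\bar y_i$ is the pair $(u, v)$.

This formula just says that $x_i$ belongs to the 1-cell of type $(\varepsilon_i)$ defined by the parameters $\bar y_i$. Denote by $\varphi_\varepsilon(\bar x, \bar y)$ the conjunction of all the $\varphi_i$'s; this formula says that $\bar x$ belongs to the $k$-cell $C_\varepsilon(\bar y) = I_1 \times \dots \times I_k$, where $I_i$ is the 1-cell of type $(\varepsilon_i)$ defined by the parameters $\bar y_i$. For an $n_\varepsilon$-tuple $\bar b$ in $U$, we define $S_\varepsilon(\bar b)$ to be true if all elements of $\bar b$ are in $B$, and the $k$-cell $C_\varepsilon(\bar b)$ is $B$-minimal and is contained in $S$. Clearly, the $B$-minimality of the $k$-cell means exactly that, for $\varepsilon_i \neq 1$, the interval $\varphi_i(U, \bar b_i)$ has no common points with $B$.

It is easy to see that the relations $S_\varepsilon$ can be uniformly obtained from $S$ and $B$ by means of certain FO queries. As $S$ is the union of all $k$-cells $C_\varepsilon(\bar b)$ for which $S_\varepsilon(\bar b)$ holds ($\varepsilon \in 5^k$, $\bar b \in B^{n_\varepsilon}$), one can uniformly recover $S$ from $B$ and the family $\{S_\varepsilon\}_{\varepsilon \in 5^k}$ by means of a certain FO query, namely, the query saying that, for one of the $\varepsilon$'s, there is an $n_\varepsilon$-tuple $\bar b$ in $B$ such that both $S_\varepsilon(\bar b)$ and $\varphi_\varepsilon(\bar x, \bar b)$ hold. Later we will need the following observation concerning the definition of this recovering query.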
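The encoding of a simple relation by the family $\{S_\varepsilon\}$, and its recovery, can be sketched in Python over the rationals. Assumptions not taken from the text: $S$ is supplied as a Python predicate on $k$-tuples that is guaranteed to be a simple relation over the finite set `B` (so its truth value is constant on each $B$-minimal $k$-cell, and testing one representative point per cell suffices), and the names `minimal_cells`, `encode`, `in_cell` and `decode` are hypothetical. Types 0 to 4 and the parameter counts $n_0, \dots, n_4$ follow the list of 1-cell forms above.

```python
from fractions import Fraction
from itertools import product

# n_i from the text: number of parameters of a 1-cell of type (i), i = 0..4.
N_PARAMS = (0, 1, 1, 1, 2)


def minimal_cells(B):
    """B-minimal 1-cells as (type, params, representative point) triples.

    Type 0 (x = x) is used only when B is empty; otherwise the cells are
    x = b_i (type 1), x < b_1 (type 2), b_n < x (type 3), b_i < x < b_{i+1} (type 4).
    """
    B = sorted(Fraction(b) for b in B)
    if not B:
        return [(0, (), Fraction(0))]
    cells = [(1, (b,), b) for b in B]
    cells.append((2, (B[0],), B[0] - 1))
    cells.append((3, (B[-1],), B[-1] + 1))
    cells += [(4, (u, v), (u + v) / 2) for u, v in zip(B, B[1:])]
    return cells


def encode(S, B, k):
    """Return {eps: S_eps}: parameter tuples whose B-minimal k-cell lies in S."""
    family = {}
    for cells in product(minimal_cells(B), repeat=k):
        if S(tuple(rep for _, _, rep in cells)):  # one test per cell suffices
            eps = tuple(t for t, _, _ in cells)
            params = tuple(p for _, ps, _ in cells for p in ps)  # y_1 ... y_k concatenated
            family.setdefault(eps, set()).add(params)
    return family


def in_cell(x, t, params):
    """The formula phi_i: x lies in the 1-cell of type t with the given parameters."""
    if t == 0:
        return True
    if t == 1:
        return x == params[0]
    if t == 2:
        return x < params[0]
    if t == 3:
        return x > params[0]
    u, v = params
    return u < x < v


def decode(family, x):
    """x is in S iff S_eps(b) and phi_eps(x, b) hold for some eps and b."""
    for eps, tuples in family.items():
        for params in tuples:
            pos, inside = 0, True
            for xi, t in zip(x, eps):
                n = N_PARAMS[t]
                inside = inside and in_cell(xi, t, params[pos:pos + n])
                pos += n
            if inside:
                return True
    return False
```

For example, with `B = [0, 1]` and `S = lambda p: (0 < p[0] < 1 and p[1] == 1) or p[0] < 0` (a union of products of $B$-cells), `encode(S, B, 2)` stores the single parameter tuple `(0, 1, 1)` under the type vector `(4, 1)`, and `decode` answers membership queries such as `(Fraction(1, 2), 1)` from the encoding alone. Each stored tuple has length $n_\varepsilon$, matching the arity in the text.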

Observation. Suppose $A = \{a_1, \dots, a_m\} \subseteq U$ with $a_1 < \dots < a_m$, and the $R_\varepsilon$ are arbitrary finite relations of arity $n_\varepsilon$ on $A$. Then it is easy to write down a quantifier-free formula $\theta(\bar x, z_1, \dots, z_m)$ in the pure order language depending only on the isomorphism type of the finite structure $\mathcal{A} = (A, R_\varepsilon, <$ [...] $> |L|$, every two elementarily equivalent saturated structures of power $\kappa$ are isomorphic. Here $|L|$ denotes the cardinality of $L$. A structure $M$ of power $\kappa$ is called special if $M$ is the union of a family $\{M_\mu : \mu \text{ is a cardinal} < \kappa\}$, where $M_\mu \preceq M_\nu \preceq M$ for $\mu < \nu < \kappa$, and each $M_\mu$ is $\mu^+$-saturated. Here $\mu^+$, as usual, denotes the least cardinal greater than $\mu$. Every two elementarily equivalent special structures of the same power are isomorphic. For any infinite $L$-structure $M$ and any cardinal $\kappa =$ [...], there exists a special $N \succeq M$ of power $\kappa$. Here $\kappa > |L|, |M|$ [...], and $|M|$ is the cardinality of $M$. It is easy to construct cardinals $\kappa$ with $\kappa =$ [...] of arbitrarily large cofinality.

Theorem 5.1. For any universe $U$ and any Boolean extended $\sigma$-query $\varphi$, the following conditions are equivalent:
1. there is a restricted $\sigma$-query which is equivalent to $\varphi$ over finite database states over $U$;
2. $\varphi$ is generic for pseudo-finite states over $V$, for all $V \succeq U$;
3. for some uncountable power $\kappa$ with $\kappa =$ [...], the query $\varphi$ is generic over pseudo-finite states over a special model $V \succeq U$ with $|V| = \kappa$.

The following proof works not only for $L_0 = \{<$ [...] $> |L| + \aleph_0$, and $\mathrm{cf}(\kappa) > \lambda$. Let $V \succeq U$ be a special model of power $\kappa$. Let $I$ be an infinite $L$-indiscernible sequence in $V$. Suppose a state $(r, r')$ over $I$ is pseudo-finite in $V$ and $r$ can be transformed to $r'$ by a partial $L_0$-isomorphism $g$ in $V$, whose domain is $A$, the active domain of $r$, and whose range is $A'$, the active domain of $r'$. We need to show that $\varphi(\sigma)$ holds in $(V, r)$ iff $\varphi(\sigma')$ holds in $(V, r')$. We may assume that $(V, A, A', g)$ is $\lambda$-saturated. (Indeed, consider a special model $(V_0, r_0, r_0', g_0, I_0)$ of power $\kappa$ elementarily equivalent to $(V, r, r', g, I)$. It suffices to prove the claim for $(V_0, r_0, r_0', g_0, I_0)$; but it is $\lambda$-saturated as $\mathrm{cf}(\kappa) > \lambda$.) Using a Fraïssé-Ehrenfeucht game, we will show that $g$ is an $L(\sigma)$-elementary map from $(V, r)$ to $(V, r')$. Due to the $L$-indiscernibility of $I$, the map $g$ is an $L$-elementary map and, in particular, a partial $L(\sigma)$-isomorphism. Therefore, due to the Pseudo-finite Homogeneity Property, to complete the proof of the theorem, using Theorem 5.2, it suffices to prove the following lemmas.

Lemma 5.5. The active domain of any pseudo-finite database state is a pseudo-finite set.

Lemma 5.6. Let $A$ be a pseudo-finite set in $V$, and $a \in V$. Then $A \cup \{a\}$ is a pseudo-finite set.

Lemma 5.7. If $h\colon C \to D$ is a partial isomorphism in $V$, and $M = (V, C, D, h)$ is $\lambda$-saturated, then $M' = (V, C \cup \{c\}, D \cup \{d\}, h \cup \{(c, d)\})$ is $\lambda$-saturated, for any $c, d \in V$.

Proof of Lemma 5.5. Consider the database scheme $\tau = \{P\}$, where $P$ is a unary relation name. For any $L(\tau)$-sentence $\psi$, there is an $L(\sigma)$-sentence $\chi$ such that $(V, s) \models \chi$ iff $(V, AD(s)) \models \psi$, for any $\sigma$-state $s$. Suppose a state $s$ is pseudo-finite in $V$, and $\psi \in F(V, \tau)$. Since the active domain of any finite state is finite, we have $(V, r) \models \chi$ for all finite $\sigma$-states $r$. So $(V, s) \models \chi$, and hence $(V, AD(s)) \models \psi$.

Proof of Lemma 5.6. Consider the database scheme $\tau = \{P\}$, where $P$ is a unary relation name. Let $\psi \in F(V, \tau)$. Let $\psi'(x)$ be the result of replacing every occurrence of $P(y)$ in $\psi$ with $P(y) \lor y = x$, where $x$ is a new variable. Then $\forall x\, \psi'(x)$ belongs to $F(V, \tau)$ and so holds in $(V, A)$. Hence $\psi$ holds in $(V, A \cup \{a\})$. Thus, $A \cup \{a\}$ is pseudo-finite.

Proof of Lemma 5.7. $M'$ is definable in the $\lambda$-saturated structure $M$ with parameters $c, d$.

Let $A_0$ be $A$, $B_0$ be $B$, and $g_0$ be $g$. It suffices to prove that Duplicator has a winning strategy in the game. The desired winning strategy is to ensure that, after each round $i$ of the game, $A_i$ and $B_i$ are pseudo-finite sets in $V$, $g_i\colon A_i \to B_i$ is an elementary map in $V$ with $(V, A_i, B_i, g_i)$ $\lambda$-saturated, $g \subseteq g_i$, and $g_i$ is a partial $L(\sigma)$-isomorphism from $(V, r)$ to $(V, r')$.
Suppose Spoiler starts a new round $i+1$ and chooses an element $a_{i+1} \in V$ (or $b_{i+1} \in V$). By the Pseudo-finite Homogeneity Property, there is $b_{i+1} \in V$ (correspondingly, $a_{i+1} \in V$) such that $g_{i+1} = g_i \cup \{(a_{i+1}, b_{i+1})\}$ is an elementary map in $V$. Then Duplicator chooses $b_{i+1}$ (correspondingly, $a_{i+1}$). Let $A_{i+1} = A_i \cup \{a_{i+1}\}$ and $B_{i+1} = B_i \cup \{b_{i+1}\}$. By the definition of $g$, $g_{i+1}$ is a partial $L(\sigma)$-isomorphism from $(V, r)$ to $(V, r')$. By Lemma 5.6, $A_{i+1}$ and $B_{i+1}$ are pseudo-finite sets. By Lemma 5.7, $(V, A_{i+1}, B_{i+1}, g_{i+1})$ is $\lambda$-saturated. The collapse result is proved.
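In the simplest domain, $(\mathbb{Q}, <)$ with finite sets, the extension step that the Pseudo-finite Homogeneity Property hands to Duplicator reduces to the classical back-and-forth step for dense linear orders without endpoints. The Python sketch below is only that illustration, not the general argument (which works in a saturated model with pseudo-finite sets). The functions `extend`, `play_round` and `is_order_preserving` are hypothetical names, and a finite partial map is represented as a `dict`.

```python
from bisect import bisect_left
from fractions import Fraction


def extend(g, a):
    """Return b such that g plus the pair (a, b) is still order-preserving on Q.

    g is a finite strictly increasing partial map (dict) between rationals.
    """
    if a in g:
        return g[a]
    dom = sorted(g)
    if not dom:
        return Fraction(a)                 # any point will do
    j = bisect_left(dom, a)                # number of domain points below a
    if j == 0:
        return g[dom[0]] - 1               # a is below the whole domain
    if j == len(dom):
        return g[dom[-1]] + 1              # a is above the whole domain
    # a lies strictly between dom[j-1] and dom[j]; density gives a point in between
    return (g[dom[j - 1]] + g[dom[j]]) / Fraction(2)


def play_round(g, x, spoiler_side):
    """Spoiler picks x in the domain ('left') or range ('right'); Duplicator answers."""
    if spoiler_side == "left":
        g[x] = extend(g, x)
    else:
        inverse = {v: k for k, v in g.items()}
        g[extend(inverse, x)] = x
    return g


def is_order_preserving(g):
    """Check that g is strictly increasing (hence also injective)."""
    items = sorted(g.items())
    return all(b1 < b2 for (_, b1), (_, b2) in zip(items, items[1:]))
```

Starting from `g = {0: 10, 1: 20}`, a Spoiler move `1/2` on the left is answered by `15`; moves outside the current domain or range are answered just beyond the extreme values, so the map stays order-preserving after every round.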

Now we introduce a certain property of complete theories which is strictly stronger than the Pseudo-finite Homogeneity Property, and so ensures the collapse result, too. We say that a complete theory $T$ has the Isolation Property if there is a cardinal $\lambda$ such that, for any pseudo-finite set $A$ and any element $a$ in a model of $T$, there is $A_0 \subseteq A$ with $|A_0| < \lambda$ such that $\mathrm{tp}(a/A_0)$ isolates $\mathrm{tp}(a/A)$.

Theorem 5.8. The Isolation Property implies the Pseudo-finite Homogeneity Property.

Proof. We show that a cardinal $\lambda$ witnessing that $T$ has the Isolation Property also witnesses that $T$ has the Pseudo-finite Homogeneity Property. Let $V$ be a $\lambda$-saturated model of $T$, let $A, B$ be pseudo-finite sets in $V$, and let $a \in V$. Let $h\colon A \to B$ be an $L$-elementary map in $V$. We show that there is $b \in V$ such that $h \cup \{(a, b)\}$ is an $L$-elementary map in $V$. Choose $A_0 \subseteq A$ with $|A_0| < \lambda$ such that $p_0 = \mathrm{tp}(a/A_0)$ isolates $p = \mathrm{tp}(a/A)$. Since the map $h$ is elementary, $h(p)$ is a type over $B$, and $h(p_0)$ isolates $h(p)$. (For a set $q(x)$ of formulas over $A$ we denote by $h(q)$ the set $\{\psi(x, h(\bar c)) : \psi(x, \bar c) \in q\}$.) As $V$ is $\lambda$-saturated, there is $b \in V$ realizing $h(p_0)$ and hence $h(p)$. So $h \cup \{(a, b)\}$ is an $L$-elementary map.

We will show that any theory with the Isolation Property is unstable. To prove the instability of a theory, it suffices to show that there exists an infinite indiscernible sequence in one of its models which is not an indiscernible set (see [She90], Theorem II.2.13).

Theorem 5.9. Any theory with the Isolation Property is unstable.

Proof. First we show that, for any complete $L$-theory $T$ with infinite models, there is an infinite indiscernible sequence in a model of $T$ whose members form a pseudo-finite set. Consider the infinite set $\Gamma$ of first-order sentences of the signature $L \cup \{<$ [...] $> y$, for every linear combination $y$ of $x_1, \dots, x_n$. For a new unary relation name $Q$, let $F(T, Q)$ be the set of all $L(Q)$-sentences which hold in all $(M, X)$, where $X$ is a finite set in a model $M$ of $T_{ds}$.
Consider the signature $L'$ which is obtained by adjoining to $L(Q)$ a set of new constant symbols $\{c, c_i : i < \lambda\}$. Consider the set $\Gamma$ of $L'$-sentences which is the union of $F(T, Q)$, $\{Q(c_i) : i < \lambda\}$, and all $p_n(c, c_{i_1}, \dots, c_{i_n})$, $q_n(c, c_{i_1}, \dots, c_{i_n})$, for $i_1 < \dots < i_n < \lambda$. Suppose $(M, A, a, a_i)_{i <$ [...] $> c$, for any linear combination of the $a_i$'s. In particular, $|A| > \lambda$. We will show that $\mathrm{tp}(a/A)$ is not isolated by $\mathrm{tp}(a/A_0)$, for any proper subset $A_0$ of $A$. Let $r(x)$ be the union of all $p_n(x, b_1, \dots, b_n)$ and $q_n(x, b_1, \dots, b_n)$, where $n < \omega$ and $b_1, \dots, b_n$ are distinct elements of $A_0$. Clearly, $r(x) \subseteq \mathrm{tp}(a/A_0)$; moreover, $r(x)$ isolates $\mathrm{tp}(a/A_0)$ as $T_{ds}$ admits quantifier elimination. We have $\neg P(x - a') \in \mathrm{tp}(a/A)$ for any $a' \in A \setminus A_0$. Therefore it suffices to prove that $r(x) \cup \{P(x - a')\}$ is consistent. Since $A$ is pseudo-finite, there is $d \in M$ with $d > |y|$ for every $y \in A$. As $M$ is saturated over $d$, there is $b \in M$ with $b > \alpha d$ for all $\alpha \in F$. Since $P + a'$ is dense, there is $e \in P + a'$ with $e > b$. Then $e$ realizes $r(x)$. Indeed, let $b_1, \dots, b_n \in A_0$. Then $p_n(e, b_1, \dots, b_n)$ holds because $e \in P + a'$ and $p_n(a', b_1, \dots, b_n)$ holds. We have
$$\alpha_1 b_1 + \dots + \alpha_n b_n \le (|\alpha_1| + \dots + |\alpha_n|)\, d < b < e$$
for any $\alpha_i \in F$; therefore $q_n(e, b_1, \dots, b_n)$ holds. So, to complete the proof, it suffices to show that $\Gamma$ is finitely satisfiable. To prove the latter, we show that, for every $n < \omega$, the set
$$\Gamma_n = F(T, Q) \cup \{Q(c_i) : i < n\} \cup p_n(c, c_0, \dots, c_{n-1}) \cup q_n(c, c_0, \dots, c_{n-1})$$
has a model. Consider an $\aleph_0$-saturated model $(V, V_0)$ of $T_{ds}$. Due to the $\aleph_0$-saturation of $(V, V_0)$, we can choose $c_0, \dots, c_{n-1}$ which are linearly independent over $V_0$. It suffices to show that there is $c$ such that $p_n(c, c_0, \dots, c_{n-1})$ and $q_n(c, c_0, \dots, c_{n-1})$ hold, because then we can take $\{c_0, \dots, c_{n-1}\}$ as $Q$.

Let $d$ be the greatest element among $|c_0|, \dots, |c_{n-1}|$. Due to the saturation of $(V, V_0)$ over $d$, there is $b \in V$ with $b > \alpha d$ for all $\alpha \in F$. Clearly, for every $c$, if $c > b$ then $q_n(c, c_0, \dots, c_{n-1})$ holds. For a subset $S$ of $F$, consider the set of formulas
$$\{\neg P(x - (\alpha_0 c_0 + \dots + \alpha_{n-1} c_{n-1})) : \alpha_0, \dots, \alpha_{n-1} \in S\} \cup \{x > b\}.$$
Since $V_0$ is of infinite index and dense in $V$, this set is realized in $(V, V_0)$ for every finite $S$. Since $(V, V_0)$ is saturated over $b$, there is $c \in V$ with $c > b$ such that $p_n(c, c_0, \dots, c_{n-1})$ holds, and we are done.

The following picture presents our collapse results.

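[Figure: diagram of the collapse results, relating divisible ordered Abelian groups, $T_{ds}$, $T_{dt}$, o-minimality, the ordered semigroup of natural numbers, the ordered group of integer numbers, quasi-o-minimality, the Isolation Property, and the Pseudo-finite Homogeneity Property.]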
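As a sanity check on the Isolation Property, consider $(\mathbb{Q}, <)$: a pure dense order, which is o-minimal and hence, per the abstract, among the domains satisfying the Isolation Property. By quantifier elimination, the type of an element $a$ over a finite set $A$ is determined by its cut, and that cut is already fixed by at most two nearest neighbours of $a$ in $A$, so a cardinal of 3 suffices there. The Python sketch below checks this for finite $A$ only; the helpers `cut` and `isolating_subset` are hypothetical names, and pseudo-finite sets in nonstandard models are not modelled.

```python
from bisect import bisect_left
from fractions import Fraction


def cut(x, B):
    """Position of x relative to each element of B: -1 (below), 0 (equal), 1 (above)."""
    return tuple((x > b) - (x < b) for b in B)


def isolating_subset(a, A):
    """A subset A0 of the finite set A with |A0| <= 2 whose cut determines a's cut over A.

    In a dense linear order this is a itself (if a is in A) or its nearest
    neighbours in A; the formula b1 < x < b2 (or x = b) then isolates the type.
    """
    A = sorted(A)
    j = bisect_left(A, a)
    if j < len(A) and A[j] == a:
        return [a]
    return A[max(j - 1, 0):j + 1]     # lower and/or upper neighbour


if __name__ == "__main__":
    print(isolating_subset(Fraction(5, 2), [1, 4, 9]))
```

For `A = [1, 4, 9]` and `a = 5/2`, the subset is `[1, 4]`: any point with the same cut over `{1, 4}` lies strictly between 1 and 4 and therefore has the same cut over all of `A`.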

6. Open questions

6.1. Problem 1. Further work needs to be done for integer numbers. For instance, the authors are under the impression that an effective translation algorithm for locally generic queries over (Z,