Incremental Maintenance of Nested Relational Views

Jixue Liu

Millist Vincent

Mukesh Mohania

School of Computer and Information Science The University of South Australia Email: fj.liu, vincent, [email protected]

Abstract

Incremental view maintenance techniques are required for many new types of data models that are increasingly used in industry. One of these models is the nested relational model, which is used for modelling complex objects in databases. In this paper we derive a group of expressions for incrementally evaluating query expressions in the nested relational model. We also present an algorithm to propagate base relation updates to a materialized view when the view is defined as a complex query.

Keywords: view maintenance, data warehousing, nested databases, incremental computation.

1. Introduction

Efficient maintenance of materialized views is important for improving the performance of database systems, especially data warehouses. To maintain materialized views, one can either recompute the views from scratch or maintain them incrementally. The incremental method is generally considered to be less expensive [10, 3, 5], since the size of an update to the source data is generally small in relation to the size of the source data. To maintain a view incrementally, one computes the new view using the updates to the source data and the old view. For example, let the view $V$ be defined in the flat relational model (using set semantics) as $V = R_1 \bowtie R_2$. For an insertion $\Delta R_1$ to $R_1$, the incremental technique calculates the change to $V$ as $\Delta V = \Delta R_1 \bowtie R_2$ and computes the new view, $V^{new}$, by $V^{new} = V^{old} \cup \Delta V$ (where $V^{old}$ equals $R_1 \bowtie R_2$) [10, 5]. This expression is called an incremental propagation expression (or incremental expression (IE) for short) for the join operator.

Incremental expressions for updating materialized views depend on the data model and query language. Up to now, incremental equations have been derived for the models of flat relations [10], bags [3], and temporal data [16]. Incremental equations for the nested relational model, on the other hand, have not been studied. The nested relational model is important because of its use in modelling complex objects, a feature that has been incorporated in several commercial database systems such as Oracle8 and Illustra [13]. The nested model has also been used in data warehouses to model complex semantics [2], where incremental view maintenance has a critical impact on system performance [15]. Further, the nested relational model is an important subclass of the object-relational model, a model that has been predicted to become the industry standard within the next few years [13]. Motivated by these needs, we derive IEs and develop view maintenance algorithms for the nested relational model in this paper.

Several different nested relational models have been proposed, depending on whether null values are permitted [8], whether empty sets are permitted [1], and what data manipulation operators are required [11, 12]. The model we use in this paper is the one proposed in [1], called the Verso model. We adopt this model because of its flexibility in supporting empty sets, its assumption that relations are in partitioned normal form (which has clearer semantics than general nested relations), and its ability to allow partial updates. However, many of the principles discussed in this paper are also applicable to other nested data models.

There are four main contributions of this paper. Firstly, we derive incremental expressions for the data manipulation operators in the Verso model. Interestingly, these expressions differ significantly from those derived for the flat relational model [10]. Secondly, we propose an algorithm to propagate base relation updates to a materialized view when the view is defined as a complex nested relational algebra expression. Thirdly, we extend the join operator in the Verso model to join two relations with more general semantics and derive incremental equations for it. Lastly, we investigate the simplification of the IEs when insertions satisfy the strict disjointedness property (the key values of the inserted tuples are disjoint from the key values of the tuples in the relation) and deletions satisfy the containment property (the deleted tuples are a subset of the relation).

The rest of this paper is organized as follows. Section 2 surveys the related work in the area. Section 3 gives the notation used in the paper. Section 4 presents the Verso operators and extends the definition of the join operator given in [1]. We derive incremental equations in Section 5 and propose our algorithm for maintaining complex views in Section 6. In Section 7, we demonstrate that the IEs simplify under the assumption that the updates satisfy the strict containment or disjointedness property. Section 8 contains some concluding comments.
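The flat relational join example from the Introduction can be sketched in a few lines of Python. This is only an illustration under assumed data: the relations, attribute layout, and the helper name `join` are hypothetical and are not taken from the paper.

```python
# Flat relations as Python sets of tuples (set semantics).
# R1(a, b) and R2(b, c) are joined on the shared attribute b.
# All names and data below are illustrative assumptions.
def join(r1, r2):
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

r1 = {(1, "x"), (2, "y")}
r2 = {("x", 10), ("y", 20), ("z", 30)}
v_old = join(r1, r2)              # materialized view V_old = R1 join R2

delta_r1 = {(3, "z")}             # insertion into R1
delta_v = join(delta_r1, r2)      # change to the view: insertion joined with R2
v_new = v_old | delta_v           # V_new = V_old union delta_V

# The incremental result agrees with recomputation from scratch.
assert v_new == join(r1 | delta_r1, r2)
```

Only the inserted tuples are joined with $R_2$; the old view is reused unchanged, which is where the saving over recomputation comes from.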

2. Related work

The problem of developing incremental equations has been addressed by several investigators for different data models. In the flat relational model, [10] derived incremental equations for the five basic operators of the relational algebra. The paper assumed that the updates to base relations are disjoint from the base relations if they are insertions, and are contained in the base relations if they are deletions. They presented a complete set of propagation rules and an incremental computation algorithm to obtain the change to a view defined as an arbitrary relational algebra expression. They also claimed that their incremental algorithm had the property that any tuples that it computed to be inserted into the view were disjoint from the view and any tuples to be deleted were currently in the view. However, this claim was recently shown in [4] to be incorrect, and a new version of the algorithm was developed and proven to possess the desired properties.

For the relational model with duplicates (bags), [3] derived a set of incremental propagation expressions for eight bag operators: selection, projection, additive union, minus, minimum intersection, maximum union, duplicate elimination, and Cartesian product. The incremental computation of aggregate functions was also addressed in the paper. Besides the study of incremental equations, [5] proposed counting algorithms which use duplicate semantics to accomplish incremental computations of relational queries and datalog queries.

In the temporal data model, [16] presented two groups of incremental expressions depending on whether the time-instant reducible operator or the time-interval operator is used. The operators in incremental equations for the time-interval operator include union, difference, projection, selection, and join, while the operators for the time-instant operator contain selection and join. Their temporal views are derived from non-temporal relations.

In the nested relational model, [7] proposed incremental calculation of nested relations using the counting algorithm of [5]. The paper discussed referenced complex objects in nested relations. In that model, if the values for an attribute are sets, the names of the sets are used in the relation instead of the sets themselves, and the sets are modelled in other stand-alone relations. The paper used extraction functions to eliminate the relation names that are variables of other relations. It also used a structure named nested descriptors in log files to record updates that happened to all relations over the same schema. However, the issue of developing incremental equations for nested operators was not addressed in that paper.

3. Notation

Let $U$ denote a universal set of atomic attribute names. Each atomic attribute $A_i \in U$ has a domain denoted $Dom(A_i)$. A nested relational schema has the form $R = A_1 \ldots A_m (R_1) \ldots (R_n)$, where $m \ge 1$, $n \ge 0$, all $A_i \in U$, and all $R_j$ are other nested relational schemas, called structured attributes of $R$. An attribute is said to be at level $l$ if there are $l$ layers of brackets outside it. For example, $A_1$ is at level 0 and $R_1$ is at level 1. The set of atomic attributes at level 0 is called the root of a schema. The domain of schema $R$ is given by $Dom(R) = Dom(A_1) \times \ldots \times Dom(A_m) \times P(Dom(R_1)) \times \ldots \times P(Dom(R_n))$, where $P(Dom(R_j))$ is the power set of $Dom(R_j)$. A nested relation $r$ on $R$ is a set of elements (tuples) from $Dom(R)$. Suppose $x$ is a tuple in $r$. Then the atomic value of $x$ for $A_i$ and the structured value (a subrelation) for $(R_j)$ are denoted $x[A_i]$ and $x[(R_j)]$ respectively. We define $r[A_i]$ by $r[A_i] = \{x[A_i] \mid x \in r\}$. We take the assumption from the Verso algebra that for all $i = 1, \ldots, m$, $x[A_i]$ is not null, and any $x[(R_j)]$ can be the empty set $\emptyset$.

Example 3.1 This example shows a nested relation and its schema. A student (Name) can study many subjects (Subj), each of which started in year Year. For each subject and year, there are several test types (Test_type) and each test type has a corresponding Mark. A student can also have many addresses (Addr). These semantics can be modelled in a nested schema Stud = Name(Subj Year(Test_type Mark))(Addr), where Mark is at level 2 and Addr and Subj are at level 1. Table 1 illustrates a nested relation $r$ on schema Stud. The student Jack has two addresses and has taken two subjects which started in different years. Every subject has two marks. □

Table 1. A nested relation r on schema Stud

Name   (Subj   Year   (Test_type   Mark))   (Addr)
Jack    DB     98      SQL         81        Adelaide
                       exam        70        Sydney
        Java   97      proj        60
                       exam        70

Definition 3.1 (Prime Subschema) Let $R = A_1 \ldots A_m (R_1) \ldots (R_n)$ and $S = B_1 \ldots B_p (S_1) \ldots (S_q)$ ($0 \le q \le n$). $S$ is a prime subschema of $R$, denoted by $S \subseteq_p R$, if (i) $p = m$ and $\forall i\,(A_i = B_i)$; (ii) for each $S_k$ ($k \in [1, \ldots, q]$), there exists an $R_j$ ($j \in [1, \ldots, n]$) such that $S_k$ is a prime subschema of $R_j$. □

Example 3.2 Let $S_1$ = Name(Addr). Then $S_1 \subseteq_p$ Stud, where Stud is shown in Example 3.1. □

Definition 3.2 (Subschema) Let $R$ and $S$ be two schemas. $S$ is a subschema of $R$, denoted $S \subseteq R$, if there exists a structured attribute $R_k$ at some level of $R$ such that $S \subseteq_p R_k$. □

Example 3.3 Let $S_2$ = Subj Year(Test_type Mark). Then $S_2 \subseteq$ Stud, where Stud is shown in Example 3.1. □
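Definitions 3.1 and 3.2 translate directly into two recursive checks. The sketch below is a hedged illustration: the `Schema` class and function names are hypothetical, and it assumes (following the tree view of schemas and the usage in Example 4.2) that a schema also counts as a subschema of itself when it is a prime subschema of the whole schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Schema:
    root: tuple            # atomic attribute names at level 0
    children: tuple = ()   # structured attributes (nested schemas)

def is_prime_subschema(s, r):
    # Definition 3.1: same root, and every child of s is a prime
    # subschema of some child of r.
    if s.root != r.root:
        return False
    return all(any(is_prime_subschema(sk, rj) for rj in r.children)
               for sk in s.children)

def is_subschema(s, r):
    # Definition 3.2, plus the assumption that r itself is a candidate.
    if is_prime_subschema(s, r):
        return True
    return any(is_subschema(s, rk) for rk in r.children)

# Stud = Name(Subj Year(Test_type Mark))(Addr) from Example 3.1
marks = Schema(("Test_type", "Mark"))
subj = Schema(("Subj", "Year"), (marks,))
addr = Schema(("Addr",))
stud = Schema(("Name",), (subj, addr))

s1 = Schema(("Name",), (addr,))   # Example 3.2
s2 = subj                         # Example 3.3

print(is_prime_subschema(s1, stud))  # prime subschema of Stud
print(is_prime_subschema(s2, stud))  # roots differ, so not prime
print(is_subschema(s2, stud))        # but a subschema at level 1
```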

In this paper, we distinguish between a prime subschema and a subschema. This is in contrast to the Verso model. In the Verso algebra, a subschema $S$ of $R$ means that $S$ and $R$ have the same root. So, a Verso subschema is equivalent to a prime subschema in this paper. Here, viewing schemas as trees in the standard fashion [8], we define $S$ to be a subschema of $R$ if $S$ is a subtree of $R$.

To simplify the notation, for a schema $R = A_1 \ldots A_m (R_1) \ldots (R_n)$, we let $A = A_1 \ldots A_m$ and/or $(R) = (R_1) \ldots (R_n)$ when no particular $R_j$ is concerned. If $r$ is on $R$ and $x$ is a tuple in $r$, then $x[(R)] = \langle x[(R_1)], \ldots, x[(R_n)] \rangle$. $x[(R)] = \emptyset$ means $\forall i \in [1, \ldots, n]$, $x[(R_i)] = \emptyset$, and $x[(R)] \ne \emptyset$ means $\exists i\,(x[(R_i)] \ne \emptyset)$. For a unary operator $\varepsilon$, $\varepsilon(x[(R)])$ denotes the tuple $\langle \varepsilon(x[(R_1)]), \ldots, \varepsilon(x[(R_n)]) \rangle$. For two tuples $x_1$ and $x_2$ in $r$ and for a binary operator $\odot$, $x_1[(R)] \odot x_2[(R)]$ denotes $\langle (x_1[(R_1)] \odot x_2[(R_1)]), \ldots, (x_1[(R_n)] \odot x_2[(R_n)]) \rangle$. We also abbreviate 'and' as a comma ',' when we discuss conditions. For example, $\{x \mid x \in r, x \in s\}$ means $x \in r$ and $x \in s$.

In the Verso model, relations also satisfy the partitioned normal form (PNF) property [6]. A nested relation $r$ is recursively defined to be in PNF if for every pair of distinct tuples $x_1$ and $x_2$ in $r$, $x_1[A] \ne x_2[A]$, and for all $x \in r$, $x[(R_1)], \ldots, x[(R_n)]$ satisfy the PNF property. We also refer to the set of atomic attributes in the schema of a PNF relation as a key.
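The recursive PNF condition can be checked mechanically. The following is a minimal sketch under an assumed encoding that is not part of the paper: a nested relation is a list of pairs `(key, subrelations)`, where `key` is the tuple of atomic values and `subrelations` is a tuple of nested relations in the same encoding.

```python
def is_pnf(relation):
    # Keys (atomic attribute values) must be unique within the relation...
    keys = [key for key, _ in relation]
    if len(keys) != len(set(keys)):
        return False
    # ...and every subrelation of every tuple must itself be in PNF.
    return all(is_pnf(sub) for _, subs in relation for sub in subs)

# The relation r of Table 1 in the assumed encoding.
r = [
    (("Jack",), (
        [  # (Subj Year (Test_type Mark))
            (("DB", 98), ([(("SQL", 81), ()), (("exam", 70), ())],)),
            (("Java", 97), ([(("proj", 60), ()), (("exam", 70), ())],)),
        ],
        [  # (Addr)
            (("Adelaide",), ()),
            (("Sydney",), ()),
        ],
    )),
]

# Two top-level tuples sharing the key "Jack" violate PNF.
not_pnf = r + [(("Jack",), ([], []))]

print(is_pnf(r), is_pnf(not_pnf))
```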

4. Operators on Verso relations

In the Verso model, the following data manipulation operators were presented in [1]: the union operator ($r \cup s$), the difference operator ($r - s$), the intersection operator ($r \cap s$), the selection operator ($\sigma_c(r)$), the projection operator ($\pi_S(r)$ where $S \subseteq_p R$), and the join operator ($r \bowtie_r s$ where $r$ and $s$ have the same root). The reader is referred to [1] or [9] for precise definitions of these operators. We now present two new operators which increase the flexibility of data manipulation. The first operator, called the expansion operator, allows us to extend a relation defined over a prime subschema to the whole schema by padding out the relation with empty sets. More precisely, we have the following definition.

Definition 4.1 (Expansion Operator) Let $S = A(S_1) \ldots (S_m)$ be a prime subschema of $R = A(R_1) \ldots (R_n)$. Let $s$ be a relation defined over $S$. The expansion of $s$ to schema $R$ is a relation over $R$, denoted by $\mathrm{expand}_R(s)$, defined recursively by:

$\mathrm{expand}_R(s) = \{x \mid \exists v \in s,\ x[A] = v[A],\ \forall i \in [1 \ldots n]\ (\ (\exists j \in [1 \ldots m]\,(S_j \subseteq_p R_i),\ x[(R_i)] = \mathrm{expand}_{R_i}(v[(S_j)]))$ or $(\nexists j \in [1 \ldots m]\,(S_j \subseteq_p R_i),\ x[(R_i)] = \emptyset)\ )\}$ □

Example 4.1 Let $s = \{\langle Tony, \{\langle Perth \rangle, \langle Adelaide \rangle\} \rangle\}$ be a relation on schema $S$ = Name(Addr). Let Stud be the schema described in Example 3.1. Then $\mathrm{expand}_{Stud}(s) = \{\langle Tony, \emptyset, \{\langle Perth \rangle, \langle Adelaide \rangle\} \rangle\}$. □
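The expansion operator of Definition 4.1 can be sketched as a short recursive function. The encoding is an assumption made for illustration only: schemas are `Schema` objects, a relation is a list of `(key, subrelations)` pairs with subrelations ordered like the schema's structured attributes, and all names (`Schema`, `expand`, `is_prime_subschema`) are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Schema:
    root: tuple            # atomic attribute names at level 0
    children: tuple = ()   # structured attributes (nested schemas)

def is_prime_subschema(s, r):
    return s.root == r.root and all(
        any(is_prime_subschema(sk, rj) for rj in r.children)
        for sk in s.children)

def expand(rel, s, r):
    """Expand relation `rel` over prime subschema `s` to schema `r`."""
    out = []
    for key, subs in rel:
        new_subs = []
        for rc in r.children:
            # Find the structured attribute of s that maps onto rc, if any.
            match = [j for j, sc in enumerate(s.children)
                     if is_prime_subschema(sc, rc)]
            if match:
                j = match[0]
                new_subs.append(expand(subs[j], s.children[j], rc))
            else:
                new_subs.append([])   # pad with the empty set
        out.append((key, tuple(new_subs)))
    return out

# Example 4.1: S = Name(Addr), Stud = Name(Subj Year(Test_type Mark))(Addr)
addr = Schema(("Addr",))
stud = Schema(("Name",),
              (Schema(("Subj", "Year"), (Schema(("Test_type", "Mark")),)), addr))
s_schema = Schema(("Name",), (addr,))
s = [(("Tony",), ([(("Perth",), ()), (("Adelaide",), ())],))]

expanded = expand(s, s_schema, stud)
print(expanded)
```

In the result, Tony's tuple carries an empty subject subrelation followed by the two addresses, matching Example 4.1.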

Next, we extend the Verso join operator. The original Verso join operator (we refer to it as a root-join, or r-join) joins two relations if their schemas have the same root. Our extension removes this limitation. The extended operator can join two relations if the root of the schema of one relation is a node in the other schema. The extended operator outputs a relation in PNF if its inputs are in PNF.

Definition 4.2 (Joinable schemas) Let $S = A_S(S_1) \ldots (S_m)$ and $R = A_R(R_1) \ldots (R_n)$ be two schemas. Then $R$ and $S$ are joinable schemas if there exists a schema $T = A_T(T_1) \ldots (T_p)$ such that (1) all attributes of $T$ are composed of those in $R$ and $S$; (2) both $R$ and $S$ are subschemas (not necessarily prime) of $T$. We call $T$ the joined schema of $R$ and $S$. The common attribute sets are called joints. □

Example 4.2 Let Stud = Name(Subj Year(Test_type Mark))(Addr) be as given in Example 3.1. Let SubjLect = Subj Year(Lect) store the lecturers for every subject started in year Year. There is a schema $T$ = Name(Subj Year(Test_type Mark)(Lect))(Addr) such that Stud $\subseteq_p T$ and SubjLect $\subseteq T$. So, Stud and SubjLect are joinable and the joint is {Subj Year}. □

Definition 4.3 (Join operator) Let $S = A_S(S_1) \ldots (S_m)$ and $R = A_R(R_1) \ldots (R_n)$ be two joinable schemas with joined schema $T = A_T(T_1) \ldots (T_p)$. Let $r$ and $s$ be relations over $R$ and $S$ respectively. The join of $r$ and $s$ is the relation over $T$, denoted by $r \bowtie s$, defined recursively by

(i) $r \bowtie s = \{x \mid x \in r$ and $x \in s\}$, if $R = S = T = A_R$, i.e. flat;

(ii) $r \bowtie s = \{x \mid$
  ($A_T = A_R = A_S$, $\exists u \in r$, $\exists v \in s$, $x[A_T] = u[A_R] = v[A_S]$,
    $x[(T_i)] = u[(R_j)] \bowtie v[(S_k)]$, if $R_j \subseteq_p T_i$, $S_k \subseteq_p T_i$, or
    $x[(T_i)] = u[(R_j)]$, if $R_j \subseteq_p T_i$, $\forall S_k \not\subseteq_p T_i$, or
    $x[(T_i)] = v[(S_k)]$, if $\forall R_j \not\subseteq_p T_i$, $S_k \subseteq_p T_i$)
  or ($A_T = A_R \ne A_S$, $\exists u \in r$, $x[A_T] = u[A_R]$,
    $x[(T_i)] = u[(R_j)] \bowtie s$, if $S \subseteq R_j$, or
    $x[(T_i)] = u[(R_j)]$, if $S \not\subseteq R_j$)
  or ($A_T = A_S \ne A_R$, $\exists v \in s$, $x[A_T] = v[A_S]$,
    $x[(T_i)] = r \bowtie v[(S_k)]$, if $R \subseteq S_k$, or
    $x[(T_i)] = v[(S_k)]$, if $R \not\subseteq S_k$) $\}$ □

Example 4.3 Let $r$ and Stud be as described in Example 3.1. Let SubjLect be as described in Example 4.2. Let $sj$ be the relation defined on SubjLect and shown in (a) of Table 2. The result of the join operation on $r$ and $sj$ is shown in (b) of Table 2. Since there are no lecturers for the Java subject in $sj$, Java is not in the join result. Kaven taught in a different year and so his name is not in the result either. □

Table 2. Relation sj and join rj = r ⋈ sj

(a) sj

Subj   Year   (Lect)
DB     98      Ben
               Tom
DB     96      Kaven

(b) rj = r ⋈ sj

Name   (Subj   Year   (Test_type   Mark)   (Lect))   (Addr)
Jack    DB     98      SQL         81       Ben       Sydney
                       exam        70       Tom       Adelaide

5. Incremental equations for the operators of the nested model

In this section, we derive incremental equations for the nested operators defined in Section 4. We assume that the update to a relation is a full tuple update, i.e. the updating tuples and the relation have the same schema. Otherwise, if the schema of the update is a prime subschema of the schema of the updated relation, we assume that the expansion operator has been applied to expand the updating tuples into full tuples. We also note that we do not assume that insertions satisfy the disjointedness property (the inserted tuples are disjoint from the relation) or that deletions satisfy the containment property (the deleted tuples are contained in the relation). We avoid these assumptions in this section in order to derive incremental equations under the most general conditions possible, since the containment and disjointedness properties are sometimes too restrictive. In Section 7 we demonstrate that the incremental equations simplify considerably if we assume containment and disjointedness. Proofs of the equations in this section are given in [9].

We first give a general overview of what we aim to derive in this section. We aim to derive expressions of the form $op_u(r \cup \Delta r) = f(op_u(r), r, \Delta r)$ and $op_u(r - \Delta r) = f(op_u(r), r, \Delta r)$ in the case of a unary query operator $op_u$, and $op_b(r \cup \Delta r, s) = f(op_b(r, s), r, s, \Delta r)$ and $op_b(r - \Delta r, s) = f(op_b(r, s), r, s, \Delta r)$ in the case of a binary operator $op_b$. In this notation $r$ and $s$ are called base relations, $\Delta r$ is called the update (or increment) to the base relation, and $f$ is a function. We call $op_u(r)$ (or $op_b(r, s)$) the old view, the evaluation of $op_u(r \cup \Delta r)$ (or $op_b(r \cup \Delta r, s)$) recomputation, and the evaluation of $f(op_u(r), r, \Delta r)$ (or $f(op_b(r, s), r, s, \Delta r)$) incremental computation. The aim of deriving an IE is to find a cheaper (incremental) way to compute a query when a base relation is updated, by computing the RHS of the IE rather than the LHS. It is particularly desirable if the RHS of the IE for an operator takes a simple form in which the old view $op_u(r)$ (or $op_b(r, s)$) is combined, by union or difference, with a term computed from the increment $\Delta r$. We call this form of IE the standard form. The advantage of this form is that if the size of the increment is small, then in general it is much more efficient to compute the new view incrementally than by recomputation. Standard IEs may not exist for some operators, but in some cases we can still derive IEs in the limited standard form, which means a standard form with some conditions attached.

5.1. Incremental equations for the expansion operator

IEs for the expansion operator are in the standard form and are given in the next theorem.

Theorem 5.1 Let $S$ be a prime subschema of a schema $R$ and let $r$ and $s$ be two instances over $S$. Then
(i) $\mathrm{expand}_R(r \cup s) = \mathrm{expand}_R(r) \cup \mathrm{expand}_R(s)$
(ii) $\mathrm{expand}_R(r - s) = \mathrm{expand}_R(r) - \mathrm{expand}_R(s)$ □

5.2. Incremental equations for the intersection operator

The next lemma indicates that the IE for the intersection operator with union is in the standard form. However, the IE for the intersection operator with difference cannot be in the standard form, since $(r \cap t) - (s \cap t)$ is contained in $(r - s) \cap t$, but not the other way around.

Lemma 5.1 Let $R$ be a schema and $r$, $s$, and $t$ be instances over $R$. Then
(i) $(r \cup s) \cap t = (r \cap t) \cup (s \cap t)$
(ii) $(r - s) \cap t = ((r \cap t) - (s \cap t)) \cup ((r - s) \cap t)$. □

If we place restrictions on the relations, then it is possible to derive a standard IE for the intersection operator, as shown in the following result.

Lemma 5.2 $(r - s) \cap t = (r \cap t) - (s \cap t)$, if recursively $\exists u \in r$, $\exists v \in s$, $\exists w \in t$, $u[A] = v[A] = w[A]$, $\exists i\,(u[(R_i)] - v[(R_i)] \ne \emptyset,\ (u[(R_i)] \cap w[(R_i)]) - (v[(R_i)] \cap w[(R_i)]) \ne \emptyset)$. □

We now use Example 5.1 to show that a standard IE is not valid unless the relations satisfy the conditions of Lemma 5.2.

Example 5.1 Let $r$, $s$ and $t$ be relations over schema $R = A(B)$, shown in Table 3. Comparing recomputation with the incrementally computed view, we see that $(r - s) \cap t \ne (r \cap t) - (s \cap t)$, since the first tuples in $r$, $s$ and $t$ violate the condition. □

Table 3. Relations for Example 5.1

r                    s                    t
A    (B)             A    (B)             A    (B)
a1   {b1, b2}        a1   {b1, b3}        a1   {b1}
a2   {b2, b3}        a2   {b1, b2}        a2   {b2, b3}
a3   {b4}            a3   {b3, b4}        a3   {b4}
a4   {b5}                                 a4   {b5}

(r − s) ∩ t          (r ∩ t) − (s ∩ t)
A    (B)             A    (B)
a1   {}              a2   {b3}
a2   {b3}            a4   {b5}
a4   {b5}

A closer inspection of this example and the previous results indicates that the failure to derive a standard IE for the intersection operator is caused by the tuples in $r$ and $s$ with the same key values. If we treat these tuples separately, then we can derive an IE in general standard form that is more efficient than recomputation.

Theorem 5.2 Let $R$ be a schema and $r$, $s$, and $t$ be instances over $R$. Then
(i) $(r \cup s) \cap t = (r \cap t) \cup (s \cap t)$
(ii) $(r - s) \cap t = \sigma_{A \notin s[A]}(r \cap t) \cup ((\sigma_{A \in s[A]}(r)) - s) \cap t$. □
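Equation (ii) of Theorem 5.2 can be checked against the data of Table 3 with a small sketch. The operator semantics below are an assumption inferred from Table 3 for the one-level schema $A(B)$ only (the precise Verso definitions are in [1] and [9]): union and intersection act key-wise on subrelations, and difference drops a tuple whose key occurs in both relations when its subrelation becomes empty. All function names are hypothetical.

```python
# One-level nested relations over A(B): dict mapping key -> frozenset of B values.
def rel(d):
    return {k: frozenset(v) for k, v in d.items()}

def union(r, s):
    return {k: r.get(k, frozenset()) | s.get(k, frozenset())
            for k in r.keys() | s.keys()}

def intersection(r, s):
    # Keys present in both; the subrelation may become empty.
    return {k: r[k] & s[k] for k in r.keys() & s.keys()}

def difference(r, s):
    out = {}
    for k, b in r.items():
        if k not in s:
            out[k] = b            # key untouched by s: keep the tuple
        elif b - s[k]:
            out[k] = b - s[k]     # keep only if something survives
    return out

def select_keys(r, keys, member):
    # sigma_{A in keys}(r) if member is True, sigma_{A not in keys}(r) otherwise.
    return {k: b for k, b in r.items() if (k in keys) == member}

r = rel({"a1": {"b1", "b2"}, "a2": {"b2", "b3"}, "a3": {"b4"}, "a4": {"b5"}})
s = rel({"a1": {"b1", "b3"}, "a2": {"b1", "b2"}, "a3": {"b3", "b4"}})
t = rel({"a1": {"b1"}, "a2": {"b2", "b3"}, "a3": {"b4"}, "a4": {"b5"}})

recomputed = intersection(difference(r, s), t)                 # (r - s) ∩ t
naive = difference(intersection(r, t), intersection(s, t))     # (r ∩ t) - (s ∩ t)

z1 = select_keys(intersection(r, t), s.keys(), member=False)
z2 = intersection(difference(select_keys(r, s.keys(), member=True), s), t)
incremental = union(z1, z2)                                    # RHS of Theorem 5.2 (ii)

print(recomputed == incremental, recomputed == naive)
```

Under these assumed semantics the naive standard form loses the tuple for a1, while the right-hand side of Equation (ii) reproduces the recomputed result and reuses the old view $r \cap t$ for the key a4, which does not occur in $s$.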

The next example shows the usage of Equation (ii) of the theorem. The example also shows that using Equation (ii) is more efficient than recomputation.

Example 5.2 Let $r$, $s$ and $t$ be as in Table 3. The results computed using Equation (ii) of Theorem 5.2 are given in Table 4. We see that $z_1 \cup z_2$ is the same as the recomputation $(r - s) \cap t$ in Table 3. We note that we did not recompute the intersection of the tuple $\langle a_4, \{b_5\} \rangle$ in $r$ and $t$. □

Table 4. Usage of Equation (ii) of Theorem 5.2

r ∩ t                s[A]       z1               z2               z1 ∪ z2
A    (B)             A          A    (B)         A    (B)         A    (B)
a1   {b1}            a1         a4   {b5}        a1   {}          a1   {}
a2   {b2, b3}        a2                          a2   {b3}        a2   {b3}
a3   {b4}            a3                                           a4   {b5}
a4   {b5}

where $z_1 = \sigma_{A \notin s[A]}(r \cap t)$ and $z_2 = ((\sigma_{A \in s[A]}(r)) - s) \cap t$.

5.3. Incremental equations for the selection operator

The next lemma shows that it is not possible to derive a standard IE for the selection operator.

Lemma 5.3 Let $r$ and $s$ be two relations over schema $R = A(R_1) \ldots (R_n)$. Let $c = (c_{key}, Q_1(c_1), \ldots, Q_n(c_n))$ be a selection condition, where each $Q_i$ denotes the quantifier applied to the $i$-th subrelation. Then $\sigma_c(r)$ may not be contained in $\sigma_c(r \cup s)$ and $\sigma_c(r - s)$. □

This lemma is supported by the next example, which indicates that a standard form of IE for the selection operator is impossible.

Example 5.3 This example shows that $\sigma_c(r)$ is not always contained in $\sigma_c(r \cup s)$. Let the selection condition be $c = (\exists(B), \nexists(C))$. The relations $r$, $s$, $\sigma_c(r)$, and $\sigma_c(r \cup s)$ are shown in Table 5. Obviously, $\sigma_c(r)$ is not contained in $\sigma_c(r \cup s)$. □

Table 5. Relations for Example 5.3

r                        s                          σc(r ∪ s)
A    (B)    (C)          A    (B)         (C)       A    (B)         (C)
a1   {b1}   {}           a1   {b1}        {c2}      a5   {b5, b6}    {}
a5   {b5}   {}           a5   {b5, b6}    {}

σc(r)                    σc(r − s)
A    (B)    (C)          A    (B)    (C)
a1   {b1}   {}           a1   {b1}   {}
a5   {b5}   {}

However, as shown in the following result, we can derive a limited standard form for selection IEs if we impose restrictions on $r$ and $\Delta r$.

Lemma 5.4
(i) $\sigma_c(r \cup s) = \sigma_c(r) \cup \sigma_c(s)$ iff, whenever $\exists u \in r$ and $v \in s$ with $u[A] = v[A]$, then recursively $\forall i \in [1, \ldots, n]$, $Q_i(\sigma_{c_i}(u[(R_i)]) \cup \sigma_{c_i}(v[(R_i)])) = true \iff Q_i(\sigma_{c_i}(u[(R_i)])) = true$ and $Q_i(\sigma_{c_i}(v[(R_i)])) = true$.
(ii) $\sigma_c(r - s) = \sigma_c(r) - \sigma_c(s)$ iff, whenever $\exists u \in r$ and $v \in s$ with $u[A] = v[A]$, then recursively $\forall i \in [1, \ldots, n]$, $Q_i(\sigma_{c_i}(u[(R_i)]) - \sigma_{c_i}(v[(R_i)])) = true \iff Q_i(\sigma_{c_i}(u[(R_i)])) = true$ and $Q_i(\sigma_{c_i}(v[(R_i)])) = true$, and $\exists i\,(u[(R_i)] - v[(R_i)] \ne \emptyset \iff \sigma_{c_i}(u[(R_i)]) - \sigma_{c_i}(v[(R_i)]) \ne \emptyset)$. □

Example 5.4 This example shows the importance of the condition in Equation (i). Let $R = A(B)(C)$ and let $r$ and $s$ be two instances over $R$ as described in Table 6. We see that $\sigma_c(r \cup s) \ne \sigma_c(r) \cup \sigma_c(s)$. This is because the update $s$ does not satisfy the condition:

$\underbrace{\nexists(\{c_1\} \cup \{\})}_{false} = \underbrace{\nexists(\{\}) \wedge \nexists(\{c_1\})}_{false} \ne true$

In this case, we have to use equation (i) to compute the correct answer. □

Table 6. Relations for Example 5.4

r                        s                        σc(r ∪ s)        σc(r) ∪ σc(s)
A    (B)    (C)          A    (B)    (C)          A  (B)  (C)      A    (B)    (C)
a1   {b1}   {c1}         a1   {b2}   {}           ∅                a1   {b2}   {}

The failure to derive a standard IE for the selection operator is caused by the tuples in $r$ and $\Delta r$ which have the same key values. The next theorem shows that a general standard form IE can be derived if the tuples in $r$ and $\Delta r$ having the same key values are recomputed while the other tuples are computed incrementally. The IEs in the theorem can be applied without limitation.

Theorem 5.3
(i) $\sigma_c(r \cup s) = \sigma_{A \notin s[A]}(\sigma_c(r)) \cup \sigma_c(\sigma_{A \in s[A]}(r) \cup s)$
(ii) $\sigma_c(r - s) = \sigma_{A \notin s[A]}(\sigma_c(r)) \cup \sigma_c(\sigma_{A \in s[A]}(r) - s)$ □

5.4. Incremental equations for the projection operator

Lemma 5.5 Let $S = A(S_1) \ldots (S_m)$ be a prime subschema of $R = A(R_1) \ldots (R_n)$. Let $r$ and $s$ be instances over $R$. Then, the following two equations hold.
(i) $\pi_S(r \cup s) = \pi_S(r) \cup \pi_S(s)$
(ii) $\pi_S(r - s) = (\pi_S(r) - \pi_S(s)) \cup \pi_S(r - s)$. □

Equation (ii) shows that the IE for difference cannot be in the standard form. A limited standard form for difference is given in the next lemma.

Lemma 5.6 $\pi_T(r - s) = \pi_T(r) - \pi_T(s)$, if recursively $\exists u \in r$, $\exists v \in s$, $u[A] = v[A]$, $\exists j \in [1 \ldots m]$, $u[(R_j)] - v[(R_j)] \ne \emptyset \implies (\pi_{S_j} u[(R_j)] - \pi_{S_j} v[(R_j)]) \ne \emptyset$. □

Example 5.5 Let $R = A(B)(C)$ and $T = A(B)$. Let $r$ and $s$ be two instances over $R$ shown in Table 7. We see that $\pi_T(r - s) \ne \pi_T(r) - \pi_T(s)$ because in the first tuples of $r$ and $s$ the projected subrelations are equal, $\{b_1\} = \{b_1\}$, so their difference is empty, which violates the attached condition. □

Table 7. Relations for Example 5.5

r                                  s
A    (B)         (C)               A    (B)         (C)
a1   {b1}        {c1}              a1   {b1}        {c2}
a2   {b1, b2}    {c1, c2}          a2   {b1, b2}    {c1, c2}
a3   {b2, b3}    {c1, c3}          a3   {b3}        {c3}

πT(r − s)                          πT(r) − πT(s)
A    (B)                           A    (B)
a1   {}                            a3   {b2}
a3   {b2}

A general standard form of IEs for the projection operator is given in the following theorem.

Theorem 5.4 Let $S = A(S_1) \ldots (S_m)$ be a prime subschema of $R = A(R_1) \ldots (R_n)$. Let $r$ and $s$ be instances over $R$. Then, the following two equations hold.
(i) $\pi_S(r \cup s) = \pi_S(r) \cup \pi_S(s)$
(ii) $\pi_S(r - s) = \sigma_{A \notin s[A]}(\pi_S(r)) \cup \pi_S(\sigma_{A \in s[A]}(r) - s)$. □

5.5. Incremental equations for the r-join operator

Lemma 5.7 Let $R$ and $S$ be two r-joinable schemas. Let $r$ and $s$ be instances over $R$ and $t$ over $S$. The following two equations hold.
(i) $(r \cup s) \bowtie_r t = (r \bowtie_r t) \cup (s \bowtie_r t)$
(ii) $(r - s) \bowtie_r t = ((r \bowtie_r t) - (s \bowtie_r t)) \cup ((r - s) \bowtie_r t)$ □

A limited standard form for Equation (ii) is $(r - s) \bowtie_r t = (r \bowtie_r t) - (s \bowtie_r t)$ if $r$ and $s$ do not have overlapping key values at the top level. Example 5.6 shows the importance of the condition in the limited standard form.

Example 5.6 Figure 1 shows schemas $R$, $S$, and their joined schema $T$. There are relations $r$ and $s$ over $R$ and $t$ over $S$; the relation $s$ is the update, labelled $\Delta r$ in the figure. The joined results $(r - s) \bowtie_r t$ and $(r \bowtie_r t) - (s \bowtie_r t)$ are given in the figure. Since $r$ and $s$ have overlapping key values, $(r - s) \bowtie_r t$ and $(r \bowtie_r t) - (s \bowtie_r t)$ are not equivalent. □

R = A (B) (C)                      S = A (C) (D)
r:    a1   {b1}        {c1}        t:    a1   {c1}   {d1}
      a2   {b1}        {c2, c3}          a2   {c1}   {d2}
      a3   {b1, b2}    {c1}              a3   {c1}   {d3}
Δr:   a1   {b1}        {c1}
      a2   {b1}        {c3}
      a3   {b2, b3}    {c1}

T = A (B) (C) (D)
(r − Δr) ⋈r t:               a2   {}     {}   {d2}
                             a3   {b1}   {}   {d3}
(r ⋈r t) − (Δr ⋈r t):        a3   {b1}   {}   {}

Figure 1. Update of r-join in Example 5.6

The general standard form of IEs for the r-join operator is given in Theorem 5.5.

Theorem 5.5 Let $R$ and $S$ be two r-joinable schemas. Let $r$ and $s$ be instances over $R$ and $t$ over $S$. The following two equations hold.
(i) $(r \cup s) \bowtie_r t = (r \bowtie_r t) \cup (s \bowtie_r t)$
(ii) $(r - s) \bowtie_r t = \sigma_{A \notin s[A]}(r \bowtie_r t) \cup (\sigma_{A \in s[A]}(r) - s) \bowtie_r t$ □

5.6. Incremental equations for the join operator

Lemma 5.8 Let $R$ and $S$ be two joinable schemas and let $r$ and $s$ be instances over $R$ and $t$ over $S$. Then the following two equations hold.
(i) $(r \cup s) \bowtie t = (r \bowtie t) \cup (s \bowtie t)$
(ii) $(r - s) \bowtie t = ((r \bowtie t) - (s \bowtie t)) \cup ((r - s) \bowtie t)$. □

Equation (ii) can be expressed in the limited standard form of Lemma 5.9 if restrictions are placed on the update $s$.

Lemma 5.9 $(r - s) \bowtie t = (r \bowtie t) - (s \bowtie t)$ if $r$ and $s$ do not have overlapping key values at level 0. □

Example 5.7 This example (see Figure 2) shows the importance of the condition in Lemma 5.9. Two relations $r$ and $s$ are created over schema $R$ and another relation $t$ is set up over $S$. Both sides of the equation in Lemma 5.9 are calculated with the given relations. The results indicate that the two sides are not equivalent.

There are two reasons why $(r - s) \bowtie t$ (LHS) and $(r \bowtie t) - (s \bowtie t)$ (RHS) are not equivalent in our example. The first is that, on the joint node $C$, the tuples in the relations being differenced have overlapping key values. To see this, note that the tuple $\langle c_1, \{d_1\}, \{e_2, e_3\} \rangle$ for subrelation $(C(D)(E))$ in $r$ and the tuple $\langle c_1, \{d_1\}, \{e_3\} \rangle$ in $s$ have the same key value $c_1$. These two tuples, after first taking the difference and then joining, lead to the tuple $\langle c_1, \{\}, \{\}, \{f_1\} \rangle$ appearing in $(r - s) \bowtie t$. However, by first joining and then taking the difference, the result of these two tuples becomes empty in $(r \bowtie t) - (s \bowtie t)$.

The second reason is that, on a node above the joint node, the key values of tuples from $r$ and $s$ overlap, and from the joint node $C$ downwards the joining result is empty. This can be seen from the second tuple $\langle a_2, \{b_2\}, \{\ldots\} \rangle$ of $r$ and the second tuple $\langle a_2, \{b_2\}, \{\ldots\} \rangle$ of $s$. By first taking the difference and then joining, they become $\langle a_2, \{\}, \{\} \rangle$ on the LHS. However, by first joining and then taking the difference, they become empty on the RHS.

To make the LHS and RHS equivalent, recomputation has to be used, as described in equation (ii), to add the missing tuples to the RHS. □

A general standard form for the join operator is given in Theorem 5.6.

Theorem 5.6 Let $R$ and $S$ be two joinable schemas and let $r$ and $s$ be instances over $R$ and $t$ over $S$. Then the following two equations hold.
(i) $(r \cup s) \bowtie t = (r \bowtie t) \cup (s \bowtie t)$
(ii) $(r - s) \bowtie t = \sigma_{A \notin s[A]}(r \bowtie t) \cup (\sigma_{A \in s[A]}(r) - s) \bowtie t$ □

[Figure 2: relations r and Δr over schema R = A (B) (C (D) (E)), relation t over schema S = C (E) (F), and the results (r − Δr) ⋈ t and (r ⋈ t) − (Δr ⋈ t) over the joined schema T = A (B) (C (D) (E) (F)).]

Figure 2. Update of join in Example 5.7

6. Incremental maintenance of complex views

In this section, we present an algorithm for the incremental maintenance of views expressed as complex queries, using the incremental equations derived previously. Although our technique can handle both insertions and deletions, for simplicity of exposition we assume that the updates to the base relations are insertions. Our technique first represents the query expression for the view as an expression (or operator) tree. In this representation, the leaves of the tree are the base relations, the interior nodes are the query operators, and the root of the tree represents the final view. Our technique then computes the change to the view in a bottom-up fashion, starting with the changes to the leaf nodes and then propagating the changes upwards in the expression tree using the incremental equations derived previously. To be more precise, we can express the technique as follows ($ir_i$ represents the intermediate relation corresponding to node $i$ in the tree; $ir_i'$ and $ir_i''$ are the intermediate relations corresponding to the child nodes of node $i$).

Algorithm 6.1 (Maintaining views with complex queries)

Input: A set of insertions $\Delta r_1, \ldots, \Delta r_n$ to the base relations and the height $h$ of the tree.
Output: Change to the view.

For d = 1 to h do
    for each node i of depth d do
        if $\Delta ir_i'$ or $\Delta ir_i''$ is non-empty, then
            compute $\Delta ir_i$ according to the IEs in Section 5.
        endif;
    endfor;
endfor. □

Example 6.1 Let $r$ and its schema be as described in Example 3.1. Let $sj$ and its schema SubjLect be as described in Example 4.3. Let a query be to list the name of each student, the subject started in 1998, and the lecturers for that subject if the student gets all marks over 69 for it. The result schema would be StudLect = Name(Subj Year(Lect)). The query can be expressed as $View = \pi_{Name(Subj\ Year(Lect))}((\sigma_{\exists(Year=98,\ \exists(\forall Mark>69))}(r)) \bowtie sj)$. After computation, the view instance is given in Table 8.

Table 8. The materialized view View before update

Name   (Subj   Year   (Lect))
Jack    DB     98      Ben
                       Tom

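An insertion $\Delta r$ to $r$ is given in Table 9. The update is propagated through the following steps.

• $\Delta V_3 = \sigma_{\exists(Year=98,\ \exists(\forall Mark>69))}(\Delta r) = \{\langle Sean, \{\langle DB\ 98\ \{\langle exam\ 90 \rangle\} \rangle\}, \{\} \rangle\}$
• $\Delta V_2 = \Delta V_3 \bowtie sj = \{\langle Sean, \{\langle DB\ 98\ \{\langle exam\ 90 \rangle\}, \{\langle Ben \rangle, \langle Tom \rangle\} \rangle\}, \{\} \rangle\}$
• $\Delta V_1 = \pi_{Name(Subj\ Year(Lect))}(\Delta V_2) = \{\langle Sean, \{\langle DB\ 98\ \{\langle Ben \rangle, \langle Tom \rangle\} \rangle\} \rangle\}$
• $View = View \cup \Delta V_1$ □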

Table 9. An update Δr to r

Name   (Subj   Year   (Test_type   Mark))   (Addr)
Sean    DB     98      exam        90        ∅

Table 10. The materialized view View after update

Name   (Subj   Year   (Lect))
Jack    DB     98      Ben
                       Tom
Sean    DB     98      Ben
                       Tom
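The bottom-up propagation of Algorithm 6.1 can be sketched as a recursive walk over an expression tree. To stay short, the sketch below makes strong simplifying assumptions that are not in the paper: it uses flat relations (sets of tuples) rather than Verso relations, handles insertions only, uses unary operators that distribute over union, and recomputes intermediate relations on demand instead of storing them. The node classes, `evaluate`, `propagate`, and the sample data are all hypothetical, loosely modelled on Example 6.1.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Leaf:
    name: str                      # base relation name

@dataclass
class Unary:
    op: Callable[[set], set]       # assumed to distribute over union
    child: object

@dataclass
class Join:
    left: object
    right: object
    on: Callable[[tuple, tuple], bool]

def evaluate(node, base):
    if isinstance(node, Leaf):
        return base[node.name]
    if isinstance(node, Unary):
        return node.op(evaluate(node.child, base))
    l, r = evaluate(node.left, base), evaluate(node.right, base)
    return {x + y for x in l for y in r if node.on(x, y)}

def propagate(node, base, deltas):
    """Change at `node` caused by insertions `deltas` (name -> set)."""
    if isinstance(node, Leaf):
        return deltas.get(node.name, set())
    if isinstance(node, Unary):
        d = propagate(node.child, base, deltas)
        return node.op(d) if d else set()
    dl = propagate(node.left, base, deltas)
    dr = propagate(node.right, base, deltas)
    if not dl and not dr:
        return set()               # nothing changed below this node
    l, r = evaluate(node.left, base), evaluate(node.right, base)
    pairs = [(dl, r), (l, dr), (dl, dr)]
    return {x + y for a, b in pairs for x in a for y in b if node.on(x, y)}

# Flattened stand-ins for r (stud) and sj (lect).
stud = {("Jack", "DB", 98, 81), ("Jack", "Java", 97, 60)}
lect = {("DB", 98, "Ben"), ("DB", 98, "Tom"), ("DB", 96, "Kaven")}
select = lambda rel: {t for t in rel if t[2] == 98 and t[3] > 69}
project = lambda rel: {(t[0], t[1], t[2], t[6]) for t in rel}
view = Unary(project, Join(Unary(select, Leaf("stud")), Leaf("lect"),
                           on=lambda x, y: x[1:3] == y[0:2]))

base = {"stud": stud, "lect": lect}
old_view = evaluate(view, base)
delta_stud = {("Sean", "DB", 98, 90)}
delta_view = propagate(view, base, {"stud": delta_stud})
new_view = old_view | delta_view

assert new_view == evaluate(view, {"stud": stud | delta_stud, "lect": lect})
```

A nested-relational version would replace the set operations with the Verso operators and apply the IEs of Section 5 at each node; the traversal order stays the same.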

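7. IEs under assumptions of Disjointedness and Containment

In this section we derive IEs when we assume that the updates satisfy the containment and disjointedness conditions. We first define these terms precisely.

Definition 7.1 (containment) Let $s$ and $r$ be two relations. $s$ is defined to be contained in $r$ if $s \subseteq r$. □

Definition 7.2 (disjointedness) Let $s$ and $r$ be two relations on schema $R = A(R_1) \ldots (R_n)$. $s$ and $r$ are defined to be disjoint if $s[A] \cap r[A] = \emptyset$. □

Let $\Delta r$ be an update to relation $r$ in a view definition. $\Delta r$ is either an insertion or a deletion. When an inserted $\Delta r$ is strictly disjoint from $r$ and a deleted $\Delta r$ is exactly contained in $r$, the following equations are correct.

$(r \cup \Delta r) \cap s = (r \cap s) \cup (\Delta r \cap s)$
$(r - \Delta r) \cap s = (r \cap s) - (\Delta r \cap s)$
$\sigma_c(r \cup \Delta r) = \sigma_c(r) \cup \sigma_c(\Delta r)$
$\sigma_c(r - \Delta r) = \sigma_c(r) - \sigma_c(\Delta r)$
$\pi_S(r \cup \Delta r) = \pi_S(r) \cup \pi_S(\Delta r)$
$\pi_S(r - \Delta r) = \pi_S(r) - \pi_S(\Delta r)$
$(r \cup \Delta r) \bowtie_r s = (r \bowtie_r s) \cup (\Delta r \bowtie_r s)$
$(r - \Delta r) \bowtie_r s = (r \bowtie_r s) - (\Delta r \bowtie_r s)$
$(r \cup \Delta r) \bowtie s = (r \bowtie s) \cup (\Delta r \bowtie s)$
$(r - \Delta r) \bowtie s = (r \bowtie s) - (\Delta r \bowtie s)$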

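$(r \cup \Delta r) \cup s = (r \cup s) \cup \Delta r$
$(r - \nabla r) \cup s = (r \cup s) - (\nabla r - s)$
$r \cup (s \cup \Delta s) = (r \cup s) \cup \Delta s$
$r \cup (s - \nabla s) = (r \cup s) - (\nabla s - r)$
$(r \cup \Delta r) - s = (r - s) \cup (\Delta r - s)$
$(r - \nabla r) - s = (r - s) - \nabla r$
$r - (s \cup \Delta s) = (r - s) - \Delta s$
$r - (s - \nabla s) = (r - s) \cup (\nabla s \cap r)$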
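The union and difference equations above can be sanity-checked on a small instance. The sketch below is an assumption-laden illustration rather than a proof: flat Python sets stand in for relations, the operators are read as ordinary set union, difference and intersection, and the nested operators of the paper are not modelled. All variable names are hypothetical.

```python
# Flat sets stand in for relations (an assumption); nested operators are not modelled.
r = {1, 2, 3, 4}
s = {3, 4, 5}
ins_r = {6, 7}   # insertion to r, disjoint from r
del_r = {2, 3}   # deletion from r, contained in r
ins_s = {8}      # insertion to s, disjoint from s
del_s = {4, 5}   # deletion from s, contained in s

# The assumptions of Section 7: insertions are disjoint, deletions are contained.
preconditions_hold = (
    ins_r.isdisjoint(r) and del_r <= r and ins_s.isdisjoint(s) and del_s <= s
)

# Each entry pairs the recomputed left-hand side with the incremental right-hand side.
checks = {
    "union_ins_left": ((r | ins_r) | s, (r | s) | ins_r),
    "union_del_left": ((r - del_r) | s, (r | s) - (del_r - s)),
    "union_ins_right": (r | (s | ins_s), (r | s) | ins_s),
    "union_del_right": (r | (s - del_s), (r | s) - (del_s - r)),
    "diff_ins_left": ((r | ins_r) - s, (r - s) | (ins_r - s)),
    "diff_del_left": ((r - del_r) - s, (r - s) - del_r),
    "diff_ins_right": (r - (s | ins_s), (r - s) - ins_s),
    "diff_del_right": (r - (s - del_s), (r - s) | (del_s & r)),
}

failed = [name for name, (lhs, rhs) in checks.items() if lhs != rhs]
```

On this instance `failed` is empty. Note that the right-hand sides only combine the old result (`r | s` or `r - s`) with terms built from the update, which is what makes them incremental.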

8. Discussion

In this section we highlight some important features of the results derived in this paper and discuss the differences between incremental expressions for flat relations and for nested relations.

Firstly, we note that some (but not all) of the incremental equations result in efficiency gains, since the expressions do not involve recomputing the new view. This is the case when the update is an insertion and the operator is expansion, intersection, projection, r-join, or join. However, the incremental equations for some operators are not as efficient as recomputation. Selection, for example, is a more complicated operator, and in general the expressions we have derived do not avoid view recomputation unless restrictions are placed upon the update. A similar situation occurs with the flat relational projection operator, where the incremental equation involves view recomputation [10]. This highlights the fact that some views cannot be maintained efficiently if the only information available is the view itself. This has led to the development of other techniques, which use counts [5] or auxiliary relations [14] to improve efficiency by avoiding view recomputation. Similarly, one expects that views involving nested operators can be maintained more efficiently if more information than just the view is stored.

In comparing the equations derived in this paper with those in [10] for the flat relational model, one notes that not only are the equations in this paper generally more complex, but the symmetry shown in the equations of [10] is also absent from the expressions for the nested operators. For example, in flat relations the incremental equations for selections and joins have the same form for insertions and deletions: $\sigma_c(r \cup s) = \sigma_c(r) \cup \sigma_c(s)$ and $(r \cup s) \bowtie t = (r \bowtie t) \cup (s \bowtie t)$ for insertions, with $-$ in place of $\cup$ for deletions. This symmetry is absent in the equations derived in Section 5.
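The remark on counts can be illustrated for flat projection. The sketch below is in the spirit of the counting idea attributed to [5], not a reproduction of that algorithm: the class `CountedProjection`, its methods and the sample rows are hypothetical, base rows are assumed to be distinct, and deletions are assumed to be contained in the base relation. Each projected tuple carries the number of base rows that derive it, so a deletion removes a view tuple only when its count drops to zero, without consulting the base relation.

```python
from collections import Counter


def project_row(row, attrs):
    """Project one row (a dict) onto the given attribute names."""
    return tuple(row[a] for a in attrs)


class CountedProjection:
    """Flat projection view that stores a derivation count per projected tuple."""

    def __init__(self, rows, attrs):
        self.attrs = list(attrs)
        self.counts = Counter(project_row(row, self.attrs) for row in rows)

    def insert(self, rows):
        for row in rows:
            self.counts[project_row(row, self.attrs)] += 1

    def delete(self, rows):
        for row in rows:
            key = project_row(row, self.attrs)
            self.counts[key] -= 1
            if self.counts[key] <= 0:  # last derivation gone: drop the view tuple
                del self.counts[key]

    def view(self):
        return set(self.counts)


base = [
    {"name": "Jack", "subj": "DB"},
    {"name": "Jack", "subj": "AI"},
    {"name": "Sean", "subj": "DB"},
]
names = CountedProjection(base, ["name"])
names.delete([{"name": "Jack", "subj": "AI"}])
after_first = names.view()   # Jack keeps one remaining derivation
names.delete([{"name": "Sean", "subj": "DB"}])
after_second = names.view()  # Sean's only derivation has been removed
```

Without the counts, deciding whether `("Jack",)` survives the first deletion would require looking at the remaining base rows, which is the recomputation that the extra stored information avoids.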

References

[1] S. Abiteboul and N. Bidoit, Non first normal form relations: An algebra allowing data restructuring. Journal of Computer and System Sciences, 33(3):361-393, 1986.
[2] S. Dekeyser, B. Kuijpers and J. Paredaens, Nested Data Cubes for OLAP. LNCS 1552, Advances in Database Technologies, 1998.
[3] T. Griffin and L. Libkin, Incremental maintenance of views with duplicates. Proc. of 1995 SIGMOD Conf., pages 328-339, 1995.
[4] T. Griffin, L. Libkin and H. Trickey, An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions. IEEE Transactions on Knowledge and Data Engineering, 9(3):508-511, 1997.
[5] A. Gupta and I. S. Mumick, Maintaining views incrementally. Proc. of 1993 SIGMOD Conf., 1993.
[6] G. Hulin, On Restructuring Nested Relations in Partitioned Normal Form. Proc. of 1990 VLDB Conf., pages 626-637, 1990.
[7] A. Kawaguchi, D. F. Lieuwen, et al., View maintenance in nested data model. Proc. of the Workshop on Materialized Views: Techniques and Applications, Canada, pages 72-83, 1996.
[8] M. Levene, The nested universal relation database model. LNCS 595, Springer-Verlag, Berlin, 1992.
[9] J. Liu, M. Vincent and M. Mohania, Update propagation in nested databases. Tech. Report CIS-98-011, School of Computer and Information Science, Uni. of South Australia, 1998.
[10] X. Qian and G. Wiederhold, Incremental recomputation of active relational expressions. IEEE Transactions on Knowledge and Data Engineering, 3(3):337-341, 1991.
[11] M. A. Roth and J. E. Kirkpatrick, Algebras for Nested Relations. Data Engineering Bulletin, 11(3):39-47, 1988.
[12] M. A. Roth, H. F. Korth and A. F. Silberschatz, Null values in nested relational databases. Acta Informatica, 26(7):615-642, 1989.
[13] M. Stonebraker, Object management in POSTGRES using procedures. International Workshop on Object-Oriented Database Systems, pages 66-72, 1986.
[14] M. Vincent, M. K. Mohania and Y. K. Kambayashi, A Self Maintainable View Maintenance Technique for Data Warehouses. Proc. of 8th Intl. Conf. on Management of Data, Madras, India, pages 7-22, 1997.
[15] J. Widom, Research problems in data warehousing. 4th CIKM, 1995.
[16] J. Yang and J. Widom, Maintaining Temporal Views Over Non-Historical Information Sources For Data Warehousing. Proc. of the 6th Intl. Conf. on Extending Database Technology (EDBT '98), pages 389-403, 1998.