Schema Conversion from Relation to XML with Semantic Constraints

Teng Lv 1,2 and Ping Yan 1,†
1 College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China
2 Teaching and Research Section of Computer, Artillery Academy, Hefei 230031, China
E-mail: [email protected], [email protected]

Abstract
This paper studies the schema conversion from relational schemas to XML DTDs. As functional dependencies play an important role in the schema conversion process, the concept of functional dependency for XML DTDs is proposed to preserve the semantics implied by functional dependencies and keys of relational schemas. A conversion method called NeT-FD (Nesting-Based Translation with Functional Dependencies) is proposed to convert relational schemas to XML DTDs in the presence of functional dependencies and keys. The method preserves the semantics implied by functional dependencies and keys of relational schemas and can convert multiple tables to XML DTDs at the same time.
1. Introduction
XML (eXtensible Markup Language) [1] has become one of the primary standards for data exchange and representation on the World Wide Web and is widely used in many fields. Nevertheless, a large amount of data is still stored in and managed by relational database systems such as Oracle, Sybase, and SQL Server. Efficient methods for converting relational data to XML data are therefore needed in order to take advantage of XML. As DTDs (Document Type Definitions) are still the most frequently used schemas for XML documents [2], we use DTDs as the schemas of XML documents. Schema semantics plays a very important role in schema conversion, and the functional dependencies of a relational schema are an important representation of semantic information. A conversion method from relational schemas to XML schemas should therefore take into account the semantics implied by functional dependencies and keys, and the resulting XML schemas should represent these semantics in a similar way.
1.1 Related work
Functional dependencies for XML. Some definitions of XML functional dependencies [3~5] do not deal properly with cross-branch situations in XML trees: there is no need to compare the value equality of nodes across certain branches when determining whether an XML FD is satisfied, and doing so can even be ambiguous. Ref. [6] also gave a definition of XML functional dependencies, which only considers the string values of attributes and elements of XML documents. The definition of XML functional dependencies proposed in this paper improves on the related definitions in the following two aspects: (a) It captures the characteristics of XML structure, differentiates between global and local functional dependencies for XML, and deals with the cross-branch situations of XML functional dependencies. (b) It considers not only the string values of attributes and elements but also the elements themselves of XML documents.
† Corresponding author. E-mail: [email protected].
Relational schema to XML schema conversion. An intuitive method called FT (Flat Translation) [7] converts tables and attributes of relational schemas to elements and attributes of XML DTDs, respectively. Two other methods, NeT (Nesting-based Translation Algorithm) and CoT (Constraints-based Translation Algorithm) [8, 9], can handle more complicated relational schemas than FT. Ref. [10] proposed a method that considers some semantics. Ref. [11] proposed an automatic generator for translating ER models to XML DTDs, which mainly considers cardinality constraints, composite attributes, multi-valued attributes, weak entities, and strong entities. However, all of them omit the semantics implied by the functional dependencies of relational schemas, which are a very important representation of semantic information. Ref. [12] converted relational schemas to XML schemas through denormalization, joining the normalized relations into tables that are mapped into DOMs. The DOMs are then integrated into a user-specified XML document tree, which is finally mapped into an XML DTD.
Ref.[13] translated relational schemas into an Extended Entity Relationship (EER) Model, which is then mapped to an XML Schema Definition Language (XSD) Graph as an XML conceptual schema. The XSD Graph will be finally mapped into XSD as an XML logical schema. We do not apply DOM, ER model, or EER model in our work, so these approaches are orthogonal to our work presented in this paper. Furthermore, we mainly
focus on functional dependencies and keys in relational schemas when converting relational schemas to XML DTDs. The converted XML DTDs preserve these semantic constraints as functional dependencies over the DTDs, which is the major contribution of our work.
XML schema to relational schema conversion. Another related problem is converting XML schemas to relational schemas. Ref. [14] gave a partially supervised method for extracting relations from XML documents that lack consistent schemas such as DTDs. Some studies [15~17] also proposed XML-to-relational mapping frameworks based on annotations in an XML schema, which consider different mapping methods but not functional dependencies. For mapping XML DTDs to relational schemas, several methods [18~21] have been proposed, but none of them considers the semantics of DTDs in the mapping process. Two methods that map XML DTDs to relational schemas while preserving the semantics implied by XML keys, cardinality constraints, etc. are given in Refs. [9, 22]; however, they do not consider the semantics implied by functional dependencies over DTDs. The methods proposed in Refs. [23~25] focus on the semantics implied by functional dependencies over DTDs when converting DTDs to relational schemas.
Main contributions. In this paper, we give the definition of functional dependencies over XML DTDs to represent the semantics of DTDs. Based on these semantics, we propose a conversion method called NeT-FD (Nesting-based Translation with Functional Dependencies), built on NeT, to convert relational schemas to XML DTDs in the presence of functional dependencies and keys.
NeT-FD significantly improves on NeT in preserving the semantics implied by functional dependencies of relational schemas. A further improvement over NeT is that NeT-FD can convert multiple tables to XML DTDs.
Organization. The rest of the paper is organized as follows. Section 2 gives some notations and the definition of functional dependencies for XML as preliminaries. Section 3 presents the conversion method NeT-FD, which converts relational schemas to XML DTDs with functional dependencies and keys. Section 4 gives an example to illustrate the application of NeT-FD. Section 5 concludes the paper.
2. Notations
We give the definition of a relational schema based on its counterpart in Ref. [7].
Definition 1. A relational schema R is defined to be R=(T, C, M, N, ∆), where (1) T is a finite set of table names; (2) C is a finite set of column names; (3) M is a mapping from a table name t ∈ T to a finite set of column names, where each column name c ∈ C; (4) N is a mapping from a column name c ∈ C to a column type definition α::=(baseType, unique, null, domain, default), where baseType is an atomic base type defined by the RDBMS (Relational Database Management System), unique is true or false according to whether the value of c is unique, null is true or false according to whether the value of c can be null, domain is the domain type of c or ε if not known, and default is the default value of c or ε if not known; (5) ∆ is a finite set of FDs (functional dependencies), keys, and foreign keys over R.
We give the nest operation on a table following the ideas of Refs. [7, 26].
Definition 2. Let T1 be a table with n tuples (t1, t2, …, tn) and column set C. Suppose c is a column of C (c ∈ C), and let C/c denote the subset of C from which c is excluded. Let ti(c) and ti(C/c) denote the value of column c and the set of values of the columns C/c in tuple ti (i=1, 2, …, n), respectively. Then Nestc(T1)={t1(c), t2(c), …, ti(c) | t1(C/c)=t2(C/c)=…=ti(C/c), i ∈ [1, n]}, i.e., the tuples with the same values on columns C/c are nested on column c. If every set {t1(c), t2(c), …, ti(c) | t1(C/c)=t2(C/c)=…=ti(C/c), i ∈ [1, n]} has only one item, then Nestc(T1) fails; otherwise it succeeds.
We give the definitions of DTD, path, and XML tree following Ref. [5].
Definition 3. A DTD (Document Type Definition) is defined to be D=(E, A, P, R, r), where (1) E is a finite set of element types; (2) A is a finite set of attributes; (3) P is a mapping from E to element type definitions. For each τ ∈ E, P(τ) is a regular expression ξ defined as
ξ ::= S | ε | τ′ | ξ|ξ | ξ,ξ | ξ*, where S denotes the string type, ε is the empty sequence, τ′ ∈ E, and “|”, “,” and “*” denote union (or choice), concatenation, and Kleene closure, respectively; (4) R is a mapping from E to the power set of A; (5) r ∈ E is called the element type of the root.
A path p in D=(E, A, P, R, r) is defined to be p=ω1.….ωn, where (1) ω1=r; (2) ωi ∈ P(ωi-1), i ∈ [2,…,n-1]; (3) ωn ∈ P(ωn-1) if ωn ∈ E and P(ωn)≠∅, or ωn=S if ωn ∈ E and P(ωn)=∅, or ωn ∈ R(ωn-1) if ωn ∈ A. Let paths(D)={p | p is a path in D}. last(p) denotes the last symbol of path p, and p-last(p) denotes the remaining path of p after excluding last(p). For two paths p, q ∈ paths(D), p ⊆Path q (or q ⊇Path p) denotes that path p is a prefix of path q or that they are the same path; p ⊂Path q (or q ⊃Path p) denotes that path p is a prefix of path q but they are not the same path.
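The nest operation of Definition 2 can be illustrated with a small Python sketch. The function name `nest`, the representation of a table as a list of dictionaries, and the sample rows are assumptions made for illustration only; they do not come from the paper.

```python
from collections import defaultdict


def nest(rows, columns, c):
    """Nest `rows` (a list of dicts) on column `c`.

    Tuples that agree on every column other than `c` are merged and their
    `c` values are collected into a list. Returns (nested_rows, succeeded);
    `succeeded` is False when every group holds a single value.
    """
    rest = [col for col in columns if col != c]
    groups = defaultdict(list)  # key: values on C/c, value: collected c values
    for row in rows:
        key = tuple(row[col] for col in rest)
        if row[c] not in groups[key]:
            groups[key].append(row[c])
    nested = [dict(zip(rest, key), **{c: values}) for key, values in groups.items()]
    succeeded = any(len(values) > 1 for values in groups.values())
    return nested, succeeded


# Hypothetical sample data.
columns = ["sno", "name", "gender", "course"]
rows = [
    {"sno": 1, "name": "Ann", "gender": "female", "course": "DB"},
    {"sno": 1, "name": "Ann", "gender": "female", "course": "XML"},
    {"sno": 2, "name": "Bob", "gender": "male", "course": "DB"},
]
nested_rows, succeeded = nest(rows, columns, "course")
print(succeeded, nested_rows)
```

On this sample, nesting on `course` merges the two tuples of student 1, so the operation succeeds, whereas nesting on `sno` leaves every group with a single value and therefore fails.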
Definition 4. Let D=(E, A, P, R, r). An XML tree T conforming to D (denoted by T|=D) is defined to be T=(V, lab, ele, att, val, root), where (1) V is a finite set of vertexes; (2) lab is a mapping from V to E ∪ A; (3) ele is a partial function from V to V* such that for any v ∈ V, ele(v)=[v1, …, vn] if lab(v1), …, lab(vn) is defined in P(lab(v)); (4) att is a partial function from V to A such that for any v ∈ V, att(v)=R(lab(v)) if lab(v) ∈ E and R(lab(v)) is defined in D; (5) val is a partial function from V to S such that for any v ∈ V, val(v) is defined if P(lab(v))=S or lab(v) ∈ A; (6) root ∈ V is called the root of T, and lab(root)=r.
Given a DTD D and an XML tree T|=D, a path p in T is defined to be p=v1.….vn, where (1) v1=root; (2) vi ∈ ele(vi-1), i ∈ [2,…,n-1]; (3) vn ∈ ele(vn-1) if lab(vn) ∈ E, or vn ∈ att(vn-1) if lab(vn) ∈ A, or vn=S if P(lab(vn-1))=S. Let paths(T)={p | p is a path in T}. If n is a node in an XML tree T|=D and p is a path in D, then n[[p]] denotes the set of last nodes of path p passing through node n. Specifically, root[[p]] is simplified as [[p]]. A path p is denoted as p(n) if its last node is node n. For paths p1, p2, …, pn in T, the maximal common prefix is denoted as p1∩p2∩…∩pn, which is also a path in T.
We give the definitions of value equality of two nodes and of functional dependency for XML [25]. Intuitively, two nodes are value equal iff the two sub-trees rooted at the two nodes are identical.
Definition 5. Two nodes x and y of an XML tree are value equal, denoted as x =v y, iff
(1) lab(x)=lab(y); // Nodes x and y have the same label.
(2) If x, y ∈ A or x=y=S, then val(x)=val(y); // Nodes x and y have the same value if they are attributes or the string type.
Else (i.e., x, y ∈ E): // Nodes x and y are elements.
(2.1) If ∃ a ∈ att(x), then ∃ b ∈ att(y) such that a =v b, and vice versa; // The attributes of nodes x and y are value equal.
(2.2) If ele(x)=v1,…,vk, then ele(y)=w1,…,wk and vi =v wi for every i ∈ [1,k], and vice versa. // The sub-elements of nodes x and y are value equal.
Definition 6. An XML functional dependency (xFD) fxml over a given DTD D has the form (Sh,[Sx1,…,Sxn]→[Sy1,…,Sym]), where (1) Sh ∈ paths(D) is called the header path of fxml, which defines the scope of fxml over D. The last symbol of path Sh is an element name, i.e., last(Sh) ∈ E. If Sh≠∅ and Sh≠r, then fxml is called a local xFD, which means that the scope of fxml is the sub-tree rooted at last(Sh); otherwise, fxml is called a global xFD, which means that the scope of fxml is the whole of D. (2) [Sx1,…,Sxn] is called the left path of fxml. For i=1,…,n, it is the case that Sxi ∈ paths(D), Sxi ⊇Path Sh, Sxi≠∅, and last(Sxi) ∈ E∪A∪S. (3) [Sy1,…,Sym] is called the right path of fxml. For j=1,…,m, it is the case that Syj ∈ paths(D), Syj ⊇Path Sh, Syj≠∅, and last(Syj) ∈ E∪A∪S.
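The value equality of Definition 5 is a recursive comparison of sub-trees, which the following Python sketch illustrates. The `Node` class is a hypothetical simplification that is not part of the paper: the string content of an element is stored directly in `value` instead of in a separate S node, and attributes are modelled as nodes as well.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    value: Optional[str] = None                      # set for attributes and string content
    attrs: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def value_equal(x: Node, y: Node) -> bool:
    # (1) Same label.
    if x.label != y.label:
        return False
    # (2) Same value for attributes and string-valued nodes.
    if x.value != y.value:
        return False
    # (2.1) Every attribute of x has a value-equal attribute in y, and vice versa.
    if not all(any(value_equal(a, b) for b in y.attrs) for a in x.attrs):
        return False
    if not all(any(value_equal(a, b) for a in x.attrs) for b in y.attrs):
        return False
    # (2.2) Sub-elements are compared position by position.
    if len(x.children) != len(y.children):
        return False
    return all(value_equal(v, w) for v, w in zip(x.children, y.children))


# Hypothetical sample nodes.
s1 = Node("student", attrs=[Node("sno", "1")],
          children=[Node("course", "DB"), Node("course", "XML")])
s2 = Node("student", attrs=[Node("sno", "1")],
          children=[Node("course", "DB"), Node("course", "XML")])
print(value_equal(s1, s2))
```

Because clause (2.2) compares the sub-element lists index by index, two nodes whose sub-elements appear in a different order are not value equal in this sketch.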
Definition 7. For an XML tree T|=D and an xFD fxml: (Sh,[Sx1,…,Sxn]→[Sy1,…,Sym]), we say that T satisfies fxml (denoted as T|=fxml) iff for any nodes H ∈ [[Sh]] (let H=root if Sh=∅) and X1, X2 ∈ H[[Sx1∩…∩Sxn]] in T, if X1[[Sx1]]=v X2[[Sx1]], …, X1[[Sxn]]=v X2[[Sxn]], then for any nodes Y1, Y2 ∈ H[[Sy1∩…∩Sym]] with H(p(X1)∩p(Y1)), H(p(X2)∩p(Y2)) ∈ H[[Sx1∩…∩Sxn∩Sy1∩…∩Sym]], it is the case that Y1[[Sy1]]=v Y2[[Sy1]], …, Y1[[Sym]]=v Y2[[Sym]].
For fxml: (Sh,[Sx1,…,Sxn]→[Sy1,…,Sym]) over D, the right path of fxml can always be divided into a set of single paths, so fxml can be represented as the following m xFDs fxml j: (Sh,[Sx1,…,Sxn]→[Syj]), where j=1,…,m. We introduce the header path of an xFD in Definition 6 and deal properly with the cross-branch situation in XML trees in Definition 7 by specifying the strict order of the nodes considered in an xFD.
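The shape of an xFD in Definition 6 and the decomposition of its right path into m single-path xFDs can be sketched as follows. The class `XFD`, the helper names, the dotted-string encoding of paths, and the `school.class` header are hypothetical choices for illustration; the sketch covers only the syntactic conditions, not the satisfaction test of Definition 7.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class XFD:
    header: str                 # header path Sh; "" stands for Sh = ∅
    left: Tuple[str, ...]       # left paths Sx1, ..., Sxn
    right: Tuple[str, ...]      # right paths Sy1, ..., Sym


def extends(path: str, header: str) -> bool:
    """True if `header` is a prefix of `path` or both are the same path."""
    return header == "" or path == header or path.startswith(header + ".")


def is_well_formed(fd: XFD) -> bool:
    """Every left and right path is non-empty and extends the header path."""
    paths = fd.left + fd.right
    return all(p != "" and extends(p, fd.header) for p in paths)


def is_local(fd: XFD, root: str) -> bool:
    """Local xFD: the header is neither empty nor the root."""
    return fd.header not in ("", root)


def split_right(fd: XFD) -> List[XFD]:
    """Represent fd as m xFDs, one per right path."""
    return [XFD(fd.header, fd.left, (p,)) for p in fd.right]


fd = XFD("", ("student.sno",), ("student.name", "student.gender"))
for part in split_right(fd):
    print(part)
```

An xFD with header `school.class`, for example, would be local for a DTD rooted at `school`, and each of its left and right paths would have to start with `school.class`.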
3. NeT-FD: converting relational schemas to XML DTDs with keys and functional dependencies
Given a relational schema R=(T, C, M, N, ∆), it can be converted to a DTD D=(E, A, P, R, r) by the following conversion method NeT-FD:
1. If there is more than one table in T, build a root element type r.
2. Each table t ∈ T is converted to an element type e in D; let E=E ∪ {e}, and let P(r)={P(r), e} if there is a root element type r.
3. For each table t ∈ T, apply nest(t) on each column. Suppose the final nested table is t(c1, …, ck-1, ck, …, cn) and the nesting succeeded on the set of columns {c1, …, ck-1}, where ci ∈ C, i=1, …, n.
(3.1) If k=1, then do one of the following:
(3.1.1) (Element-preferred mode) Each column ci is converted to an element type ei, P(e)={P(e), ei}, and let E=E ∪ {ei}. Go to step 4.
(3.1.2) (Attribute-preferred mode) Each column ci is converted to an attribute type ai, R(e)=R(e) ∪ {ai}, and let A=A ∪ {ai}. Go to step 4.
(3.2) Otherwise, do:
(3.2.1) For each column cj ∈ {ck, …, cn}, do one of the following:
(3.2.1.1) cj is converted to an attribute type aj when cj has a domain or a default value, R(e)=R(e) ∪ {aj}, and let A=A ∪ {aj}. If N(cj)’s null is true, then aj is implied; otherwise, aj is required. The domain/default value of cj is applied to aj. Go to step 3.2.2.
(3.2.1.2) cj is converted to an element type ej, and let E=E ∪ {ej}. If N(cj)’s null is true, then P(e)={P(e), ej?}; otherwise P(e)={P(e), ej}. Go to step 3.2.2.
(3.2.2) Each column ci ∈ {c1, …, ck-1} is converted to an element type ei, and let E=E ∪ {ei}. If N(ci)’s null is true, then P(e)={P(e), ei*}; otherwise P(e)={P(e), ei+}.
4. The set of xFDs ∑ over D includes: for each FD f ∈ ∆ of the form (ti.ci1,…,ti.cim)→(ti.cj1,…,ti.cjn), the xFD [r.ei.ci1,…,r.ei.cim]→[r.ei.cj1,…,r.ei.cjn] is in ∑, where ei denotes the element type converted from table ti. Furthermore, if the key of ti is (ck1,…,ckl), then the xFD [r.ei.ck1,…,r.ei.ckl]→[r.ei] is also in ∑, where ci1,…,cim, cj1,…,cjn, ck1,…,ckl ∈ M(ti).
Brief explanations:
(1) Step 1 constructs a root element for all other elements when more than one table is to be converted. NeT-FD can convert multiple tables to XML DTDs simply by making each table a sub-element of the root element r in the resulting DTD.
(2) Step 2 constructs an individual element type for each table.
(3) Step 3 deals with the columns of each table by applying the nest operator to each table. In detail, step 3.1 considers the special case in which the nest operator failed; there are two preferred modes, i.e., the attribute-preferred mode and the element-preferred mode. Step 3.2 treats the case in which the nest operator succeeded. It is more natural and rational to convert a column of a relational table to an attribute in a DTD when the column has a domain or a default value, as in step 3.2.1.1. Step 3.2.2 deals with the columns on which the nest operator succeeded.
(4) Step 4 converts the set of FDs in ∆ to the set of xFDs ∑ over D.
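The steps above can be sketched in Python as follows, using the Student table of Section 4 as input. This is an illustrative reading of the method under several assumptions that are not stated in the paper: the names `Column`, `Table`, `net_fd`, and the root name `db` are hypothetical; the set of nested columns is supplied as input instead of being computed from data; table elements may repeat under the root; the nested columns come first in the content model; a nullable column always yields an optional element; a default value replaces #IMPLIED/#REQUIRED; and, when nesting succeeded, a column without domain or default becomes an element or an attribute depending on the preferred mode.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Column:
    null: bool = False                       # may the column be null?
    domain: Optional[Sequence[str]] = None   # enumerated domain, if known
    default: Optional[str] = None            # default value, if known


@dataclass
class Table:
    name: str
    columns: Dict[str, Column]
    nested: List[str]                        # columns on which nesting succeeded
    key: Tuple[str, ...] = ()
    fds: List[Tuple[Tuple[str, ...], Tuple[str, ...]]] = field(default_factory=list)


def attr_decl(name: str, col: Column) -> str:
    kind = "(" + "|".join(col.domain) + ")" if col.domain else "CDATA"
    if col.default is not None:
        use = '"%s"' % col.default           # assumption: a default replaces #IMPLIED/#REQUIRED
    else:
        use = "#IMPLIED" if col.null else "#REQUIRED"
    return "%s %s %s" % (name, kind, use)


def net_fd(tables: List[Table], root: str = "db", attribute_preferred: bool = False):
    dtd, xfds, declared = [], [], set()
    has_root = len(tables) > 1                                   # step 1
    if has_root:
        # assumption: every table element may repeat under the root
        dtd.append("<!ELEMENT %s (%s)>" % (root, ", ".join(t.name.lower() + "*" for t in tables)))
    for t in tables:
        e = t.name.lower()                                       # step 2
        prefix = root + "." + e if has_root else e
        content, attrs, leaves = [], [], []
        for c in t.nested:                                       # step 3.2.2
            content.append(c + ("*" if t.columns[c].null else "+"))
            leaves.append(c)
        for c, col in t.columns.items():                         # steps 3.1 and 3.2.1
            if c in t.nested:
                continue
            constrained = bool(col.domain) or col.default is not None
            if attribute_preferred or (bool(t.nested) and constrained):
                attrs.append(attr_decl(c, col))
            else:
                content.append(c + ("?" if col.null else ""))
                leaves.append(c)
        dtd.append("<!ELEMENT %s %s>" % (e, "(" + ", ".join(content) + ")" if content else "EMPTY"))
        if attrs:
            dtd.append("<!ATTLIST %s %s>" % (e, " ".join(attrs)))
        for c in leaves:
            if c not in declared:                                # declare each leaf element once
                declared.add(c)
                dtd.append("<!ELEMENT %s (#PCDATA)>" % c)
        for lhs, rhs in t.fds:                                   # step 4
            xfds.append(([prefix + "." + c for c in lhs], [prefix + "." + c for c in rhs]))
        if t.key:
            xfds.append(([prefix + "." + c for c in t.key], [prefix]))
    return dtd, xfds


student = Table(
    name="Student",
    columns={
        "sno": Column(),
        "name": Column(null=True),
        "gender": Column(null=True, domain=("male", "female")),
        "course": Column(),
    },
    nested=["course"],
    key=("sno", "course"),
    fds=[(("sno",), ("name", "gender"))],
)
for mode in (False, True):
    dtd, xfds = net_fd([student], attribute_preferred=mode)
    print("\n".join(dtd))
    print(xfds)
```

The printed declarations are what this sketch produces under the assumptions listed above and are not the authors' listings. In both modes the xFDs use paths that start at `student`, because a single table needs no root element.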
4. An example of NeT-FD
In this section, we give an example to illustrate the application of the conversion method NeT-FD of Section 3. Consider the following table: Student(sno, name, gender, course), where the student number (sno) is unique, and a student can choose more than one course. The relational schema for table Student is R=(T, C, M, N, ∆), where T={Student}, C={sno, name, gender, course}, M(Student)={sno, name, gender, course}, and suppose N(sno)=(integer, true, false, ε, ε), N(name)=(string, false, true, ε, ε), N(gender)=(string, false, true, {male, female}, ε), N(course)=(string, false, false, ε, ε), and ∆={{sno, course} is the key of Student, sno→name, sno→gender}. The final nested table for table Student is Student(course+, sno, name, gender), where the superscript “+” means that the nesting succeeded on the attribute course. The following shows the final DTD corresponding to table Student in the element-preferred mode of NeT-FD:
[DTD listing missing from the source text]
The following shows the final DTD corresponding to table Student in the attribute-preferred mode of NeT-FD:
[DTD listing missing from the source text]
The set of xFDs ∑ over D includes: xFD1: [student.sno, student.course]→[student] and xFD2: [student.sno]→[student.name, student.gender]. The above two DTDs differ in how they treat columns sno and name. In the element-preferred mode DTD, columns sno and name are converted into elements sno and name, while in the attribute-preferred mode DTD, they are converted into attributes sno and name. Which DTD (element-preferred or attribute-preferred) is more suitable depends on application-specific requirements.
5. Conclusions
We have presented a method called NeT-FD for converting relational schemas to XML DTDs in the presence of keys and functional dependencies. It preserves the semantics implied by the functional dependencies and keys of relational schemas and can convert multiple tables to XML DTDs. We chose DTDs rather than other XML schema languages such as XML Schema as the starting point for studying the mapping from relations to XML for the following reasons: (1) XML Schema and DTD are structurally similar, and the structure of both can be represented as a tree model as described in the paper. (2) FDs over XML Schema can also be represented as relationships between paths, like those over DTDs defined in the paper. The concepts and the method for mapping relational schemas to DTDs used in the paper can therefore be generalized to XML Schema with some modifications to the related formal definitions.
6. Acknowledgment This work is supported by Natural Science Foundation of Anhui Province (No.070412057), National Natural Science Foundation of China (No. 60563001), College Science and Research Plan Project of Xinjiang (No.XJEDU2004S04), and Foundation of Xinjiang University (No.QN040101).
7. References
[1] T. Bray, J. Paoli, et al., “Extensible Markup Language (XML) 3rd edition”, http://www.w3.org/TR/REC-xml.
[2] B. Choi, “What are real DTDs like”, Proc. of the 5th Workshop on the Web and Databases (WebDB’02), Madison, Wisconsin, USA, ACM Press, 2002, pp.43-48.
[3] M. W. Vincent and J. Liu, “Functional dependencies for XML”, Proc. of 5th Asian-Pacific Web Conference (APWeb’03), LNCS 2642, Springer, 2003, pp.22-34.
[4] M. Arenas and L. Libkin, “A normal form for XML documents”, Proceedings of Symposium on Principles of Database Systems (PODS’02), ACM Press, Madison, WI, 2002, pp.85-96.
[5] W. Fan and L. Libkin, “On XML integrity constraints in the presence of DTDs”, Proceedings of Symposium on Principles of Database Systems (PODS’01), ACM Press, Madison, WI, 2001, pp.114-125.
[6] M. L. Lee, T. W. Ling, and W. L. Low, “Designing functional dependencies for XML”, Proceedings of 8th Conference on Extending Database Technology (EDBT’02), LNCS 2287, Springer, 2002, pp.124-141.
[7] D. Lee, M. Mani, F. Chiu, and W. W. Chu, “Nesting-based relational-to-XML schema translation”, Proc. of the 4th International Workshop on the Web and Database Systems (WebDB’01), 2001, pp.61-66.
[8] D. Lee, M. Mani, F. Chiu, and W. W. Chu, “NeT and CoT: Translating relational schemas to XML schemas using semantic constraint”, Proc. of the 2002 ACM International Conference on Information and Knowledge Management (CIKM’02), ACM Press, Madison, WI, 2002, pp.282-291.
[9] D. Lee, M. Mani, and W. W. Chu, “Schema conversion methods between XML and relational models”, Knowledge Transformation for the Semantic Web, Frontiers in Artificial Intelligence and Applications, Vol. 95, IOS Press, 2003, pp.1-17.
[10] T. Lv and P. Yan, “Mapping relational schemas to XML DTDs with constraints”, Proc. of the 1st International Multi-Symposium on Computer and Computational Sciences (IMSCCS’06), Volume 2, IEEE Computer Society Press, 2006, pp.528-533.
[11] C. Kleiner and U. Lipeck, “Automatic generation of XML DTDs from conceptual database schemas”, GI Jahrestagung (1) 2001, pp.396-405.
[12] J. Fong, H. K. Wong, and Z. Cheng, “Converting relational database into XML documents with DOM”, Information and Software Technology, 2003, 45(6), pp.335-355.
[13] J. Fong and S. K. Cheung, “Translating relational schema into XML schema definition with data semantic preservation and XSD graph”, Information and Software Technology, 2005, 47(7), pp.437-462.
[14] E. Agichtein, C. T. Howard Ho, V. Josifovski, and J. Gerhardt, “Extracting relations from XML documents”, Proceedings of the First International Workshop on XML Schema and Data Management (XSDM2003), LNCS 2814, Springer, 2003, pp.390-401.
[15] S. Amer-Yahia, F. Du, and J. Freire, “A generic and flexible framework for mapping XML documents into relations”, Technical report, OGI/OHSU, 2004.
[16] P. Bohannon, J. Freire, P. Roy, and J. Simeon, “From XML-Schemas to relations: a cost-based approach to XML storage”, Proceedings of the 18th International Conference on Data Engineering (ICDE’02), IEEE Computer Society Press, 2002, pp.564-80.
[17] P. Bohannon, J. Freire, J. R. Haritsa, M. Ramanath, P. Roy, and J. Simeon, “Bridging the XML-relational divide with LegoDB: a demonstration”, Proceedings of the 19th International Conference on Data Engineering (ICDE’03), IEEE Computer Society, 2003, pp.759-760.
[18] A. Deutsch, M. Fernandez, and D. Suciu, “Storing semistructured data with STORED”, Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD’99), ACM Press, 1999, pp.431-442.
[19] D. Florescu and D. Kossmann, “Storing and querying XML data using an RDBMS”, Bulletin of the Technical Committee on Data Engineering, 1999, pp.431-442.
[20] S. Lu, Y. Sun, M. Atay, and F. Fotouhi, “A new inlining algorithm for mapping XML DTDs to relational schemas”, ER workshops 2003, LNCS 2814, Springer, 2003, pp.366-377.
[21] J. Shanmugasundaram, K. Tufte, et al., “Relational databases for querying XML documents: limitations and opportunities”, Proc. of the 25th VLDB Conference, Morgan Kaufmann Publisher, 1999, pp.302-314.
[22] Y. Chen, S. B. Davidson, and Y. Zheng, “Constraints preserving XML storage in relations”, Technical Report MS-CIS-02-04, University of Pennsylvania, 2002.
[23] T. Lv, P. Yan, and Z. Wang, “Mapping DTD to relational schema with constraints”, Journal of Computer Science, 2004, 31(10S.), pp.443-444, 457.
[24] T. Lv, Q. Huang, and P. Yan, “Mapping XML DTDs to relational schemas in the presence of functional dependencies over DTDs”, Proc. of the 10th Joint International Computer Conference (JICC2004), International Academic Publishers, Beijing, China, 2004, pp.242-246.
[25] T. Lv and P. Yan, “Mapping DTDs to relational schemas with semantic constraints”, Information and Software Technology, 2006, 48(4), pp.245-252.
[26] G. Jaeschke and H. J. Schek, “Remarks on the algebra of non-first normal form relations”, Proc. of the 1st ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS 1982), ACM Press, 1982, pp.124-138.