XML-Based Applications Using XML Schema
Martin Kempa and Volker Linnemann
Institut für Informationssysteme, Universität zu Lübeck, Osterweide 8, 23562 Lübeck, Germany
{kempa,linnemann}@ifis.mu-luebeck.de
Abstract. Xml Schemas provide a generalization of Document Type Definitions for describing the validity of a set of Xml documents. There is a growing number of applications that deal with Xml documents in various respects. One area of programs is concerned with analyzing Xml documents arriving, for example, over the internet. Another rapidly expanding area is best described by the term Xml generators. Xml generators usually are part of a WWW system, for example generators for Xml documents serving as views of databases. Although Xml Schemas provide a concise means for describing the syntax of correct Xml documents in a specific domain, Xml generators usually treat the Xml documents as unstructured strings or, in the context of the Document Object Model, as trees the nodes of which belong to an unspecific Element-interface. The syntactical correctness, i.e. the validity of the generated Xml documents, cannot be guaranteed at compile time but must be tested at runtime. This means that, in general, there is no ultimate proof that an Xml generator generates only valid documents according to an underlying Xml schema. This paper addresses this problem by introducing a new distinct interface for each element defined within an Xml schema. Each interface extends the Element-interface of the Document Object Model. This mechanism provides a generalization of a previous approach by the authors, which was based on the weaker concept of Document Type Definitions.
1 Introduction
Xml Schemas [24] provide a generalization of Document Type Definitions [1,22] for describing the validity of a set of Xml documents. There is a growing number of applications that deal with Xml documents in various respects. One area of programs is concerned with analyzing Xml documents arriving, for example, over the internet. Another rapidly expanding area is best described by the term Xml generators. Xml generators usually are part of a WWW system, for example generators for Xml documents serving as views of databases. A special class of Xml generators is given by Html generators because the Hypertext Markup Language Html [19] has been redefined as a special Xml application [23]. Although Xml Schemas provide a concise means for describing the syntax of correct Xml documents in a specific domain, Xml generators usually treat the Xml documents as unstructured strings or, in the context of the Document Object Model, as trees the nodes of which belong to an unspecific Element-interface. The syntactical correctness, i.e. the validity of the generated Xml documents, cannot be guaranteed at compile time but must be tested at runtime. This means that, in general, there is no ultimate proof that an Xml generator generates only valid documents according to an underlying Xml schema.

A.B. Chaudhri et al. (Eds.): EDBT 2002 Workshops, LNCS 2490, pp. 67–90, 2002. © Springer-Verlag Berlin Heidelberg 2002

Experiences in our media archive project [6,7] show that current tools like Java Server Pages [18,10] or JavaScript [16,9] for generating Html pages or Xml structures are not adequate because the syntactical correctness, i.e. the validity of generated Html pages and Xml structures, is not guaranteed statically by the program. Instead, the validity must be “proven” dynamically by appropriate test runs. The following Java Server Page illustrates this problem.
<TITLE> A Simple Server Page
Besides being quite cumbersome, the language mechanisms of Java Server Pages do not guarantee the validity of the generated Html phrase. For example, changing the program to
<TITLE> A Wrong Server Page
still results in a correct Java Server Page in the sense that the Server Page processor and the Java compiler accept the program although the program does not generate correct Html. When using Java Server Pages, problems of this
kind have to be found dynamically by appropriate test runs. Similar problems arise with other server page mechanisms like PHP [5] or Informix Webdriver [11] or JavaScript on the client side.

In previous work [14] we addressed this problem on the basis of Document Type Definitions (DTDs) [1,22]. In the meantime, DTDs have become somewhat outdated because their capabilities for describing the document structure on the basis of regular expressions are rather limited. Nowadays, DTDs are being replaced by Xml Schemas [24], a mechanism which is much more general than DTDs and provides finer structuring mechanisms. Therefore, we address the problem of guaranteeing the validity of Xml documents according to an Xml schema. The goal is to develop language constructs that allow the compiling system to find errors within the generated structures concerning the validity with respect to an underlying Xml schema. In other words, the language constructs are defined such that the validity of all generated structures is guaranteed without any test runs.

The basic idea is to introduce an interface for every syntactic construct. In the Xml setting this means introducing an interface for every element defined in an underlying Xml schema. Values of this interface can be inserted in other constructs only in places where the corresponding element is allowed according to the underlying Xml schema. This mechanism guarantees the syntactical correctness of all generated documents statically.

Xml Schemas allow deeply nested content models. In this context the problem of naming the corresponding interface structures arises. For this purpose, a naming scheme is proposed in this paper.

This paper is organized as follows. We start with a short description of Xml and the Document Object Model (Dom) in Sect. 2. In Sect. 3 we introduce our Validating Document Object Model (V-Dom) as an extension of the Dom. As opposed to Dom, trees in V-Dom are guaranteed to be correct according to an underlying Xml schema. This is accomplished by defining an interface for each element type. Trees according to these interfaces are statically allowed only in places which are intended for them by the underlying schema. A naming scheme is introduced and we outline how to cope with the additional facilities of Xml Schema in V-Dom. Section 4 introduces Parametric Xml, called P-Xml. P-Xml allows generating Xml structures, more specifically V-Dom structures, that are valid according to the underlying Xml schema. In contrast to plain V-Dom, P-Xml allows generating Xml structures with much more clarity by using text oriented Xml with parameters instead of tree oriented V-Dom. Section 5 gives an example based on Wml, the markup language for the presentation of hypertext content on mobile phones. Section 6 gives some implementation details, and Sect. 7 discusses related work before we conclude the paper with Sect. 8.
2 Extensible Markup Language and Document Object Model
This section presents the basic concepts of Xml and its application programming interface Dom. A detailed introduction to Xml is given in [1]. Documents in Xml mainly consist of elements. Elements are enclosed by start tags and end tags. Elements are nested and a distinguished root element always contains the whole document. The content of an element can be a nested structure, character data, or a mixture of these. Moreover, elements can be augmented by attributes. The values of the attributes are assigned in the start tag of an element. Figure 1 presents a document of a purchase order, taken from the Xml Schema Primer [24]. It describes a purchase order generated by a home products ordering and billing application. The purchase order document consists of a main element purchaseOrder (1-31) and the subelements shipTo (2-8), billTo (9-15), comment (16), and items (17-30). These subelements (except comment (16)) in turn contain other subelements, and so on, until a subelement such as USPrice (21,27) contains a number rather than any subelements.

 1 <purchaseOrder orderDate="1999-10-20">
 2   <shipTo country="US">
 3     <name>Alice Smith</name>
 4     <street>123 Maple Street</street>
 5     <city>Mill Valley</city>
 6     <state>CA</state>
 7     <zip>90952</zip>
 8   </shipTo>
 9   <billTo country="US">
10     <name>Robert Smith</name>
11     <street>8 Oak Avenue</street>
12     <city>Old Town</city>
13     <state>PA</state>
14     <zip>95819</zip>
15   </billTo>
16   <comment>Hurry, my lawn is going wild</comment>
17   <items>
18     <item partNum="872-AA">
19       <productName>Lawnmower</productName>
20       <quantity>1</quantity>
21       <USPrice>148.95</USPrice>
22       <comment>Confirm this is electric</comment>
23     </item>
24     <item partNum="926-AA">
25       <productName>Baby Monitor</productName>
26       <quantity>1</quantity>
27       <USPrice>39.98</USPrice>
28       <shipDate>1999-05-21</shipDate>
29     </item>
30   </items>
31 </purchaseOrder>

Fig. 1. Purchase Order document.
In addition to specifying the nested structure of documents, Xml provides the use of a language description, sometimes called schema, which allows describing a set of similar documents. This set of similar documents is called a markup language. The language description is defined by a Document Type Definition (Dtd) or alternatively by an Xml schema [24]. Because the content of the elements can consist of elements again, the language description specifies content models for element types describing which elements can appear in the content of an element type, how often, and in which order. Moreover, the possible attributes for element types are determined as well as their range of values. Elements that contain subelements or carry attributes are said to have complex types, whereas elements that contain numbers (and strings, and dates, etc.) but do not contain any subelements are said to have simple types. Some elements have attributes; attributes always have simple types.

As a short example of an Xml schema, Figures 2 and 3 show the purchase order schema defining a purchase order markup language. The purchase order schema consists of a schema element (1) and a variety of subelements, most notably element, complexType, and simpleType, which determine the appearance of elements and their content in instance documents. The element type purchaseOrder is declared (8) of type PurchaseOrderType. The element type comment is declared of type string (9). For the complex type PurchaseOrderType (10) a content model is defined (11-23). It consists of a sequence of the element types shipTo, billTo, comment and items. Additionally the attribute orderDate is declared (22). Analogously the complex type USAddress is defined (24-33), which has a sequence of name, street, city, state and zip element types as content model (25-31) and the attribute country (32). The complex type definition of Items (34-55) introduces an anonymous type definition for element type item (36-53). Similarly an anonymous type definition for simple types is possible (57-61).

For documents of a markup language specified by Xml two properties are important. The weaker one is the property of being well-formed, which requires that the logical structure of a document obeys a bracket-like order: for every start tag a corresponding end tag at the same level has to exist. The stronger property is that of being valid, which requires that all declarations of the language definition, the Dtd or the Xml Schema, are respected. According to these definitions, the content of an element has to match the content model of its element type and its attribute values have to match the declared attribute types. In our example the document in Fig. 1 fulfills the requirements given in the schema of Figures 2 and 3. Therefore it is valid according to the given schema.

The definition of the Document Object Model (Dom) [21] targets the standardization of programming interfaces for Xml-based applications. It specifies the logical structure of a document, the way to access the content data, and the way to manipulate it. The logical structure of a document in Dom is called structure model and conforms to a tree-like representation of the document. One important characteristic of the object representation is that it is determined by a state and a behavior.
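The difference between the two properties defined above, being well-formed and being valid, can be illustrated with a small Java sketch. It is not part of the approach presented in this paper; it merely uses the standard JAXP parsing and validation classes of the Java platform. The class name DocumentCheck and the reduced schema, which declares only a quantity element with the restriction shown in Fig. 3, are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, introduced only for this illustration.
class DocumentCheck {
    // Assumed mini-schema: a single quantity element restricted as in Fig. 3.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='quantity'>"
      + "<xsd:simpleType><xsd:restriction base='xsd:positiveInteger'>"
      + "<xsd:maxExclusive value='100'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "</xsd:element></xsd:schema>";

    // Well-formed: the document can be parsed at all (tags are properly nested).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    // Valid: the document additionally respects the declarations of the schema.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String unbalanced = "<quantity>1";            // end tag missing
        String tooLarge = "<quantity>250</quantity>"; // violates maxExclusive
        String fine = "<quantity>1</quantity>";
        for (String xml : new String[] {unbalanced, tooLarge, fine}) {
            System.out.println(xml + " well-formed=" + isWellFormed(xml)
                + " valid=" + isValid(xml));
        }
    }
}
```

Both checks happen while the program runs, which is exactly the situation the remainder of the paper tries to improve on.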
 1 <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
 2   <xsd:annotation>
 3     <xsd:documentation xml:lang="en">
 4       Purchase order schema for Example.com.
 5       Copyright 2000 Example.com. All rights reserved.
 6     </xsd:documentation>
 7   </xsd:annotation>
 8   <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
 9   <xsd:element name="comment" type="xsd:string"/>
10   <xsd:complexType name="PurchaseOrderType">
11     <xsd:sequence>
12       <xsd:choice>
13         <xsd:sequence>
14           <xsd:element name="shipTo" type="USAddress"/>
15           <xsd:element name="billTo" type="USAddress"/>
16         </xsd:sequence>
17         <xsd:element name="singleUSAddress" type="USAddress"/>
18       </xsd:choice>
19       <xsd:element ref="comment" minOccurs="0"/>
20       <xsd:element name="items" type="Items"/>
21     </xsd:sequence>
22     <xsd:attribute name="orderDate" type="xsd:date"/>
23   </xsd:complexType>
24   <xsd:complexType name="USAddress">
25     <xsd:sequence>
26       <xsd:element name="name" type="xsd:string"/>
27       <xsd:element name="street" type="xsd:string"/>
28       <xsd:element name="city" type="xsd:string"/>
29       <xsd:element name="state" type="xsd:string"/>
30       <xsd:element name="zip" type="xsd:decimal"/>
31     </xsd:sequence>
32     <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
33   </xsd:complexType>

Fig. 2. Purchase Order Schema (part 1).
A fragment of the document shown in Fig. 1 is represented in Dom as illustrated in Fig. 4. The root node stands for the element purchaseOrder and refers to the four child nodes acting for the elements shipTo, billTo, comment and items in the document. Analogous to every element in the document, a corresponding tree-like structure of objects is created. Dom as an object interface supporting documents of any markup language has a very general definition. Every document consists of nested elements containing suitable attributes, as shown in Fig. 4. Addressing every document in the same way is a disadvantage if only valid documents of a given schema should be processed and generated. Unfortunately a document not corresponding to the requirements of the underlying schema can be created without difficulties in Dom. Invalid documents usually cannot be detected until runtime requiring extensive testing.
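The following Java sketch illustrates this weakness of plain Dom. It uses the standard org.w3c.dom and JAXP validation interfaces; the class name DomValidityDemo and the reduced schema (a shipTo element that must contain name followed by street) are assumptions made for this illustration only. Because appendChild accepts any node, the compiler cannot reject a tree whose children are in the wrong order; the error only shows up when the tree is validated at runtime.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, introduced only for this illustration.
class DomValidityDemo {
    // Assumed reduced schema: shipTo contains name followed by street.
    static final String XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='shipTo'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='name' type='xsd:string'/>"
      + "<xsd:element name='street' type='xsd:string'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Builds <shipTo> with the given children via generic Dom calls and
    // reports whether the resulting tree is valid. Any failure, including
    // a configuration problem, is reported as "not valid" in this sketch.
    static boolean isValidShipTo(String... childNames) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element shipTo = doc.createElementNS(null, "shipTo");
            doc.appendChild(shipTo);
            for (String childName : childNames) {
                // The generic Element interface accepts any child in any order.
                Element child = doc.createElementNS(null, childName);
                child.setTextContent("text");
                shipTo.appendChild(child);
            }
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            schema.newValidator().validate(new DOMSource(doc));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("name, street: " + isValidShipTo("name", "street"));
        System.out.println("street, name: " + isValidShipTo("street", "name"));
    }
}
```

Both calls in main compile without complaint, although only the first one builds a tree that the assumed schema accepts.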
34   <xsd:complexType name="Items">
35     <xsd:sequence>
36       <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
37         <xsd:complexType>
38           <xsd:sequence>
39             <xsd:element name="productName" type="xsd:string"/>
40             <xsd:element name="quantity">
41               <xsd:simpleType>
42                 <xsd:restriction base="xsd:positiveInteger">
43                   <xsd:maxExclusive value="100"/>
44                 </xsd:restriction>
45               </xsd:simpleType>
46             </xsd:element>
47             <xsd:element name="USPrice" type="xsd:decimal"/>
48             <xsd:element ref="comment" minOccurs="0"/>
49             <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/>
50           </xsd:sequence>
51           <xsd:attribute name="partNum" type="SKU" use="required"/>
52         </xsd:complexType>
53       </xsd:element>
54     </xsd:sequence>
55   </xsd:complexType>
56
57   <xsd:simpleType name="SKU">
58     <xsd:restriction base="xsd:string">
59       <xsd:pattern value="\d{3}-[A-Z]{2}"/>
60     </xsd:restriction>
61   </xsd:simpleType>
62 </xsd:schema>

Fig. 3. Purchase Order Schema (part 2).
3 Validating Document Object Model
This section describes our extension to Dom named V-Dom. The idea of V-Dom is described in [13], where we present a document object model depending on the underlying Dtd. With Validating Dom (V-Dom) we present an extension to Dom that overcomes its essential limitations. The idea is to generate strictly typed object interfaces for every markup language which is specified by a language description. In contrast to similar approaches [4] and [8], these generated interfaces are an extension of Dom. The strict typing ensures the static properties of validity at compile time for the document or document fragment being processed by an Xml-based application. Thereby, in applications using V-Dom, errors can be detected significantly earlier than in Dom-based applications, where invalid documents are not recognized until runtime. This idea is inspired by [15] where context-free grammars and derivation trees are used to ensure the syntactical correctness of programs produced by program generators. This can be nicely applied to the context of Xml-based applications.
Fig. 4. Document fragment represented in Dom.
The goal is to extend the Dom interfaces in such a way that the static characteristics of the validity can be ensured at compile time. In order to do this, the schema specifying a markup language is used to generate an extended interface for every element type in the language. The extended interfaces contain specific methods to ensure the content model of the element types as well as attributes and attribute values. The fact that the specification of Dom already defines such a specialization for Html, albeit one that provides specific methods only for attributes, shows that this is a natural extension of Dom. As in our approach, the Dom specialization for Html specifies a specific interface specializing the Dom interface Element for every element type being defined in the Dtd for Html. We illustrate the transformation by the following example of a complex type definition.
<complexType name="PurchaseOrderType">
  <sequence>
    <choice>
      <element name="singAddr" type="USAddress"/>
      <element name="twoAddr" type="twoAddress"/>
    </choice>
    <element ref="comment" minOccurs="0"/>
    <element name="items" type="Items"/>
  </sequence>
</complexType>
Consider the complex type definition of PurchaseOrderType which has a sequence group as content model. The first component of the sequence is a choice expression, which has two alternatives, namely a singAddr element or a twoAddr element. The second and third components in the sequence are simply an optional comment element followed by an items element. We can generate a corresponding interface to PurchaseOrderType using union types. Figure 5 shows this interface. Three attributes singAddrORtwoAddr, comment and items (6-8) are declared, which stand for the three components of the sequence group.¹ To reflect the choice group the union type singAddrORtwoAddrGroup (2-5) is generated.

1 interface PurchaseOrderTypeType { ...
2   typedef union singAddrORtwoAddrGroup
3     switch (enum singAddrORtwoAddrST(singAddr,twoAddr)){
4       case singAddr: singAddrElement singAddr;
5       case twoAddr: twoAddrElement twoAddr; }
6   attribute singAddrORtwoAddrGroup singAddrORtwoAddr;
7   attribute commentElement comment;
8   attribute itemsElement items;
9 }

Fig. 5. PurchaseOrderTypeType with union type.

¹ Analogous to Dom we note the interface in IDL, stressing that our transformation is independent of a programming language.

Using this approach leads to multiple extension problems. One familiar difficulty comes up when we want to extend one choice group in the complex type by a new alternative. We extend our example to the following type definition, for instance.

<complexType name="PurchaseOrderType">
  <sequence>
    <choice>
      <element name="singAddr" type="USAddress"/>
      <element name="twoAddr" type="twoAddress"/>
      <element name="multAddr" type="multAddress"/>
    </choice>
    <element ref="comment" minOccurs="0"/>
    <element name="items" type="Items"/>
  </sequence>
</complexType>
In program code using the generated interfaces, every piece of code that accesses the content of the attribute representing the choice group, singAddrORtwoAddr in the example above, has to be extended by another case branch. We can solve this problem by generating separate interfaces for the sequence and choice groups using interface inheritance instead of union types.

Another problem arises through the chosen naming scheme for unnamed group expressions, which depends on the nested subexpressions of the unnamed expression. We call this naming scheme synthesized naming. Applying this naming scheme to our extended example results in the new name singAddrORtwoAddrORmultAddr for the choice group in the schema. This means that we cannot continue using our already written code, because all type names have to be changed from singAddrORtwoAddr to singAddrORtwoAddrORmultAddr. We can avoid this by generating interface names depending on the defining complex type name rather than on the choice alternatives. We call that naming scheme inherited naming. If we define this naming scheme recursively, we obtain the following names in our example. The entire expression is named PurchaseOrderTypeC, the first element of the sequence, the choice group, PurchaseOrderTypeCC1, the second element, the comment element, PurchaseOrderTypeCC2 and the third element, the items element, PurchaseOrderTypeCC3. Recursively the singAddr in the choice expression gets the name PurchaseOrderTypeCC1C1 and the twoAddr element the name PurchaseOrderTypeCC1C2. Using this naming scheme allows the extension of choice groups without a change of the derived type names. Unfortunately this naming scheme is not appropriate for sequence groups. If we extend a sequence expression, we obtain the old name as well. But this time the type and values really change, i.e. a new name is desired. Thus the naming scheme for sequences has to be synthesized naming.
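The two naming schemes described above can be sketched in a few lines of Java. This is only an illustration of the naming rules as stated in the text, not the implementation used by V-Dom; the class NamingSchemes, the Particle representation of a content model, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of synthesized and inherited naming.
class NamingSchemes {
    // A particle of a content model: an element (no children) or a group.
    static final class Particle {
        final String label;            // element name, or "sequence" / "choice"
        final List<Particle> children;

        Particle(String label, Particle... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Synthesized naming of a choice group: the alternatives joined by "OR".
    static String synthesized(Particle choice) {
        return choice.children.stream()
                .map(child -> child.label)
                .collect(Collectors.joining("OR"));
    }

    // Inherited naming: the whole expression is named typeName + "C",
    // the i-th component of an expression named N is named N + "C" + i.
    static List<String> inherited(String typeName, Particle root) {
        List<String> names = new ArrayList<>();
        collect(typeName + "C", root, names);
        return names;
    }

    private static void collect(String name, Particle particle, List<String> names) {
        names.add(name + " = " + particle.label);
        for (int i = 0; i < particle.children.size(); i++) {
            collect(name + "C" + (i + 1), particle.children.get(i), names);
        }
    }

    // Content model of PurchaseOrderType with the given choice alternatives.
    static Particle purchaseOrderContent(String... alternatives) {
        Particle[] choiceChildren = Arrays.stream(alternatives)
                .map(alternative -> new Particle(alternative))
                .toArray(Particle[]::new);
        return new Particle("sequence",
                new Particle("choice", choiceChildren),
                new Particle("comment"),
                new Particle("items"));
    }

    public static void main(String[] args) {
        Particle before = purchaseOrderContent("singAddr", "twoAddr");
        Particle after = purchaseOrderContent("singAddr", "twoAddr", "multAddr");
        // The synthesized name changes when an alternative is added ...
        System.out.println(synthesized(before.children.get(0)));
        System.out.println(synthesized(after.children.get(0)));
        // ... whereas the inherited name of the choice group stays the same.
        inherited("PurchaseOrderType", after).forEach(System.out::println);
    }
}
```

Under these assumptions the choice group keeps the inherited name PurchaseOrderTypeCC1 after multAddr is added, while its synthesized name grows from singAddrORtwoAddr to singAddrORtwoAddrORmultAddr.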
As we can see, we have to merge both naming schemes, depending on the group expression to be named. If we have a choice group we use inherited naming, and if we have a sequence group or a list expression² we use synthesized naming. Applying this idea we get the following interfaces for our example. Figure 6 shows that the interface name PurchaseOrderTypeCC1Group appears because of the inherited naming scheme. Additionally we now use interface inheritance to reflect the choice group, as mentioned above. For this reason the super type PurchaseOrderTypeCC1Group is declared (2), from which the interfaces singAddrElement and twoAddrElement inherit (3,4). Another problem, which we cannot solve with either naming scheme, arises when a new element is inserted in the middle of a sequence expression. In this case the names of nested choice expressions which appear after the inserted element change as well. For this reason we suggest explicit naming by using named group declarations.

² A list expression is a group with an attribute maxOccurs > 1.
1 interface PurchaseOrderTypeType {
2   interface PurchaseOrderTypeCC1Group {}
3   interface singAddrElement: PurchaseOrderTypeCC1Group { attribute USAddressType content;}
4   interface twoAddrElement: PurchaseOrderTypeCC1Group { attribute twoAddressType content;}
5   interface itemsElement { attribute ItemsType content;}
6   attribute PurchaseOrderTypeCC1Group PurchaseOrderTypeCC1;
7   attribute commentElement comment;
8   attribute itemsElement items;
9 }
Fig. 6. Purchase order type V-Dom interface.
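To make the effect of Fig. 6 concrete, the following Java sketch shows one possible rendering of these interfaces. The mapping from IDL to Java chosen here, the use of String as content type, and the class ChoiceByInheritance are assumptions made for the illustration; MultAddrElement is added to show that a new alternative does not affect existing code.

```java
// Hypothetical Java rendering of the interfaces of Fig. 6.
class ChoiceByInheritance {
    // Super type of all alternatives of the choice group (inherited naming).
    interface PurchaseOrderTypeCC1Group {}

    static final class SingAddrElement implements PurchaseOrderTypeCC1Group {
        final String content; // stands in for USAddressType in this sketch
        SingAddrElement(String content) { this.content = content; }
    }

    static final class TwoAddrElement implements PurchaseOrderTypeCC1Group {
        final String content; // stands in for twoAddressType in this sketch
        TwoAddrElement(String content) { this.content = content; }
    }

    // A later extension of the choice group: no existing type name changes.
    static final class MultAddrElement implements PurchaseOrderTypeCC1Group {
        final String content;
        MultAddrElement(String content) { this.content = content; }
    }

    static final class CommentElement {
        final String content;
        CommentElement(String content) { this.content = content; }
    }

    static final class ItemsElement {}

    static final class PurchaseOrderTypeType {
        final PurchaseOrderTypeCC1Group purchaseOrderTypeCC1;
        final CommentElement comment; // null models minOccurs="0"
        final ItemsElement items;

        PurchaseOrderTypeType(PurchaseOrderTypeCC1Group purchaseOrderTypeCC1,
                              CommentElement comment, ItemsElement items) {
            this.purchaseOrderTypeCC1 = purchaseOrderTypeCC1;
            this.comment = comment;
            this.items = items;
        }
    }

    // Code written against the group type keeps working for new alternatives.
    static String chosenAlternative(PurchaseOrderTypeType order) {
        return order.purchaseOrderTypeCC1.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        PurchaseOrderTypeType order = new PurchaseOrderTypeType(
                new MultAddrElement("several addresses"), null, new ItemsElement());
        System.out.println(chosenAlternative(order));
        // The following line would be rejected by the compiler, because an
        // items element is not an alternative of the choice group:
        // new PurchaseOrderTypeType(new ItemsElement(), null, new ItemsElement());
    }
}
```

The design choice is the one argued for in the text: the choice group is represented by a common super type, so that placing a wrong element into the group becomes a type error rather than a runtime validation failure.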
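<group name="AddressGroup">
  <choice>
    <element name="singAddr" type="USAddress"/>
    <element name="twoAddr" type="twoAddress"/>
  </choice>
</group>
<complexType name="PurchaseOrderType">
  <sequence>
    <group ref="AddressGroup"/>
    <element ref="comment" minOccurs="0"/>
    <element name="items" type="Items"/>
  </sequence>
</complexType>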
For example, this declaration yields a named interface AddressGroup as a super type of the interfaces singAddrElement and twoAddrElement.

When given an Xml Schema, V-Dom creates object oriented interfaces as follows. First the given schema is transformed into its normal form. The normal form is defined on element declarations, type definitions and group definitions.
1. Element type declarations are in normal form if they have a named type as content model.
2. Complex type definitions are in normal form if they have no nested group expressions as content. Unnamed types are converted to named types; a type name is generated.
3. Every unnamed nested group expression has to be expressed by a separate named group definition; a group name is generated.
Our example in normal form looks as follows.

<element name="singAddr" type="USAddress"/>
<element name="twoAddr" type="twoAddress"/>
<sequence>
<element ref="comment" minOccurs="0"/>
<element name="items" type="Items"/>
Second, V-Dom transforms the normalized schema into object oriented interfaces by the following rules.
1. Element declarations in Xml Schema are mapped to interfaces. Because all element declarations have a named type as content, the interface has only one attribute of this type. For example, the interface purchaseOrderElement is created for the element type purchaseOrder.
2. Interfaces are created for type definitions that are declared in the Xml schema. The definition is used to assemble their content. In our example the interface PurchaseOrderTypeType is introduced for the complex type PurchaseOrderType.
3. Group definitions are mapped to interfaces.
4. Content models of type sequence are transformed into separate attributes for every sequence element. For instance, the attributes comment and items are created for the elements of the sequence in the content model of PurchaseOrderType.
5. Content models of type list (maxOccurs > 1) are mapped to attributes of a generated list interface. The generated list interfaces are specializations of a generic list interface.³ Occurrence constraints are, as we can see, one restriction concerning the general validity. The resulting interface does not allow checking statically whether the number of elements matches the value of the occurrence attributes as required by the schema.
6. Content models of type choice are transformed into an attribute. This attribute has the type of the super type of all choice alternatives. In our example the interface PurchaseOrderTypeCC1Group is the super type of the interfaces singAddrElement and twoAddrElement, which correspond to the alternatives in the original choice group.
7. Attributes are mapped to attributes of suitable type. Attribute groups are normalized by mapping their definition to attributes.
8. The Xml Schema simple types are mapped to primitive types.

If we look at our example in Fig. 1 in the representation of the generated interfaces of V-Dom, we get the instance illustrated in Fig. 7. In contrast to Dom, the static validity of the object hierarchy is ensured by the specialization of the object interfaces.

³ We use parametric polymorphism as mentioned in [3] as an extension to IDL to illustrate the generic list interface.
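Rule 5 and the remark on occurrence constraints can be illustrated by the following Java sketch, which uses Java generics in place of the parametric polymorphism assumed for IDL. The classes ListMapping, OccurrenceList, ItemElement and ItemList as well as their methods are hypothetical and serve only as an illustration. The element type of the list is checked by the compiler, whereas the number of elements can only be checked while the program runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a generic list interface and its specialization.
class ListMapping {
    // Generic list: the element type is checked statically, the
    // occurrence constraints (minOccurs, maxOccurs) only at runtime.
    static class OccurrenceList<E> {
        private final int minOccurs;
        private final int maxOccurs; // Integer.MAX_VALUE models "unbounded"
        private final List<E> elements = new ArrayList<>();

        OccurrenceList(int minOccurs, int maxOccurs) {
            this.minOccurs = minOccurs;
            this.maxOccurs = maxOccurs;
        }

        // Returns false instead of adding when maxOccurs would be exceeded.
        boolean add(E element) {
            if (elements.size() >= maxOccurs) {
                return false;
            }
            return elements.add(element);
        }

        // Runtime check of the lower bound; the upper bound is kept by add.
        boolean satisfiesOccurrence() {
            return elements.size() >= minOccurs;
        }

        List<E> asList() {
            return Collections.unmodifiableList(elements);
        }
    }

    static final class ItemElement {
        final String productName;
        ItemElement(String productName) { this.productName = productName; }
    }

    // Specialization for <element name="item" minOccurs="0" maxOccurs="unbounded">.
    static final class ItemList extends OccurrenceList<ItemElement> {
        ItemList() { super(0, Integer.MAX_VALUE); }
    }

    public static void main(String[] args) {
        ItemList items = new ItemList();
        items.add(new ItemElement("Lawnmower"));
        items.add(new ItemElement("Baby Monitor"));
        // items.add("Lawnmower"); // rejected by the compiler: wrong element type
        System.out.println(items.asList().size() + " items, occurrence ok: "
                + items.satisfiesOccurrence());
    }
}
```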
Fig. 7. Document fragment represented in V-Dom.
In the following we briefly describe how we tackle the additional facilities of Xml Schema compared to Dtds, like type extension, type restriction, substitution groups, abstract elements, and abstract types. Currently we do not handle identity constraints and wildcards. All groups are treated like sequence groups.

Xml Schema introduces type extension for complex types. In V-Dom we can reflect this relation simply by inheritance. In the example below a complex type Address is defined. In the definition of type USAddress we extend the base type Address by two more elements.

<complexType name="Address">
  <sequence>
    <element name="name" type="string"/>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="Address">
      <sequence>
        <element name="state" type="string"/>
        <element name="zip" type="string"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Using type extension, Xml Schema allows placing elements of type USAddress at a location where an element of type Address is expected. In V-Dom this behavior is reflected by inheritance. The following interfaces are generated in V-Dom from the previous example.

interface AddressType {
  attribute nameElement name;
  attribute streetElement street;
  attribute cityElement city;
  ...
}
interface USAddressType: AddressType {
  attribute stateElement state;
  attribute zipElement zip;
  ...
}
The interface USAddressType inherits from the interface AddressType. Conforming to the concept of inheritance, instances of the subtype are allowed at locations where objects of the super type are required.

Additionally Xml Schema introduces type restriction for complex and simple types. In our opinion type restriction is nothing else than a specific sort of inheritance, where some values of the super type are restricted to a subtype. This approach is already known in programming languages from Modula 2 [26], where base types could be restricted to certain subrange types. To enforce the restricted values, validation checks at runtime are necessary.

Another mechanism which Xml Schema provides is called substitution group. It allows elements to be substituted for other elements. In the following example we declare two elements shipComment and customerComment and assign them to a substitution group with head element comment.

<element name="shipComment" type="string" substitutionGroup="comment"/>
<element name="customerComment" type="string" substitutionGroup="comment"/>
In this example the two elements can be used anywhere that comment could be used. In V-Dom this can be expressed by inheritance as well. The corresponding V-Dom interfaces for the example above are as follows.

interface shipCommentElement: CommentElement {
  String content;
}
interface customerCommentElement: CommentElement {
  String content;
}
In this model we can use objects of type shipCommentElement or customerCommentElement without problems in places where an object comment is required. Additionally in Xml Schema elements or types can be declared to be abstract. If an element is declared abstract, only elements of the substitution group can appear in an instance document. If a type is declared abstract, only derived types of this type can appear in an instance document. These restrictions conform nicely to the inheritance approach in V-Dom if we declare the corresponding interfaces of the abstract elements and types to be abstract too. In the rest of this paper we assume that a mapping from V-Dom to Java classes is given.
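As an illustration of how such a mapping might look, the following Java sketch renders type extension, substitution groups and type restriction with ordinary class inheritance. It is a sketch under simplifying assumptions, not the mapping used by V-Dom: element contents are represented by plain strings, and the class InheritanceMapping with its members is hypothetical.

```java
// Hypothetical Java rendering of extension, substitution and restriction.
class InheritanceMapping {
    // Type extension: USAddressType inherits from AddressType.
    static class AddressType {
        final String name, street, city;
        AddressType(String name, String street, String city) {
            this.name = name;
            this.street = street;
            this.city = city;
        }
    }

    static final class USAddressType extends AddressType {
        final String state, zip;
        USAddressType(String name, String street, String city,
                      String state, String zip) {
            super(name, street, city);
            this.state = state;
            this.zip = zip;
        }
    }

    // Substitution group: members inherit from the head element.
    // If comment were declared abstract in the schema, this class
    // would be declared abstract as well.
    static class CommentElement {
        final String content;
        CommentElement(String content) { this.content = content; }
    }

    static final class ShipCommentElement extends CommentElement {
        ShipCommentElement(String content) { super(content); }
    }

    static final class CustomerCommentElement extends CommentElement {
        CustomerCommentElement(String content) { super(content); }
    }

    // Type restriction (quantity in Fig. 3): a positive integer below 100.
    // The restricted range cannot be expressed in the Java type and is
    // therefore enforced by a check at runtime.
    static boolean isAllowedQuantity(int value) {
        return value >= 1 && value < 100;
    }

    static final class QuantityType {
        final int value;
        QuantityType(int value) {
            if (!isAllowedQuantity(value)) {
                throw new IllegalArgumentException("quantity out of range: " + value);
            }
            this.value = value;
        }
    }

    // Accepts the base type, hence also every extension of it.
    static String cityOf(AddressType address) { return address.city; }

    // Accepts the head element, hence also every member of its group.
    static String textOf(CommentElement comment) { return comment.content; }

    public static void main(String[] args) {
        System.out.println(cityOf(new USAddressType(
                "Alice Smith", "123 Maple Street", "Mill Valley", "CA", "90952")));
        System.out.println(textOf(new ShipCommentElement("Hurry")));
        System.out.println(isAllowedQuantity(100));
    }
}
```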
4 Parametric XML
The object model described in the previous section provides a facility to process documents by using a programming language. It is based on the abstract syntax of the markup language. The dynamic creation of documents has to be done by nested application of constructors and methods of the model. For Xml-based applications, this way of programming is too tedious. It turned out that a more page-oriented programming technique is more appropriate. This programming view led to the development of several script-like extensions of Html for this purpose, like PHP or Informix Webdriver. The enhancements enrich Html by programming language constructs. This allows generating page fragments at runtime at the server and combining them with an Html base frame to form a complete Html page. These pages are also possible in Java Server Pages, although the concepts of Java Server Pages go beyond this programming style. In Fig. 8 a part of a typical Java Server Page is shown as an example. The page, which is taken from our media archive project [6,7], generates the current directory in the media structure of the archive. The directory is presented as a select element and each subdirectory is expressed in an option element. The dynamic aspect is performed by a for-loop.

<select name="directories"> ..

Fig. 8. Java Server Page example in Wml.

To simplify the handling of Xml-based applications, we enrich a Java program using V-Dom interfaces by so-called Xml constructors. Xml constructors are expressions which return a newly created element object of the corresponding V-Dom interface. The syntax is expressed in Xml notation. These constructors basically allow the use of ordinary Xml document fragments. In the following example we declare a variable s of type shipToElement, which is a generated V-Dom interface.

shipToElement s;
s = <shipTo country="US">
      <name>Alice Smith</name>
      <street>123 Maple Street</street>
      <city>Mill Valley</city>
      <state>CA</state><zip>90952</zip>
    </shipTo>;
After the declaration, the introduced constructor is used. A shipTo element consisting of a start tag, an end tag, and five nested elements (name, street, city, state, and zip) as content is assigned to the object variable s.

Another extension we introduce is the possibility to use variables holding V-Dom objects inside Xml constructors. Inside an Xml constructor, a V-Dom element variable can be used as an Xml element variable. The variable is marked by enclosing it in $ signs. A variable is allowed only in places where the underlying Xml schema permits an element of the corresponding type. Using this notation, we can create and manipulate a V-Dom object somewhere in the program and use it elsewhere in an element constructor in such a way that validity according to the underlying schema is guaranteed statically.

  shipToElement s;
  nameElement n;
  n = <name>Alice Smith</name>;
  s = <shipTo country="US">
        $n$
        <street>123 Maple Street</street>
        <city>Mill Valley</city>
        <state>CA</state>
        <zip>90952</zip>
      </shipTo>;
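For comparison, the shipTo fragment can also be built with the untyped Dom interfaces from the standard Java library. The following is only a sketch to illustrate the problem that Xml constructors address; the class and helper names (UntypedDomSketch, addChild, buildShipTo, newDocument) are our own assumptions and do not appear in V-Dom or P-Xml. Because every node has the general Element interface, the compiler cannot reject a child element that the schema does not allow.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch: building the shipTo fragment with the untyped Dom API.
class UntypedDomSketch {

    // Creates <tag>text</tag> below parent and returns the new element.
    static Element addChild(Document doc, Element parent, String tag, String text) {
        Element child = doc.createElement(tag);
        child.setTextContent(text);
        parent.appendChild(child);
        return child;
    }

    // Builds the shipTo element by nested method calls.
    static Element buildShipTo(Document doc) {
        Element shipTo = doc.createElement("shipTo");
        shipTo.setAttribute("country", "US");
        addChild(doc, shipTo, "name", "Alice Smith");
        addChild(doc, shipTo, "street", "123 Maple Street");
        addChild(doc, shipTo, "city", "Mill Valley");
        addChild(doc, shipTo, "state", "CA");
        addChild(doc, shipTo, "zip", "90952");
        return shipTo;
    }

    // Creates an empty Dom document using the default JAXP parser.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element shipTo = buildShipTo(doc);
        // Every node is just an Element, so the compiler also accepts a child
        // that the purchase order schema does not allow at this place.
        addChild(doc, shipTo, "option", "not part of the schema");
        System.out.println(shipTo.getChildNodes().getLength()); // prints 6
    }
}
```

The invalid option child is only detected if the resulting document is validated at runtime, which is the situation the statically checked constructors are meant to avoid.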
The short example declares two variables s and n of the generated V-Dom interfaces shipToElement and nameElement. Values are then assigned to n and s by using the newly defined constructors. The variable n appears in the document fragment assigned to s, which means that the current value of n is inserted there.

P-Xml programs are validated statically by a preprocessor, which is itself generated from the language description (Fig. 9).

  Xml Schema --> Preprocessor Generator --generates--> Preprocessor
  Java program (P-Xml) --> Preprocessor --> Java program (V-Dom)

Fig. 9. The validation process.

The preprocessor parses the Xml constructors and validates them against the underlying document description, the schema. Note that this is done statically, without having to run the Java program. The constructors are replaced by suitable V-Dom code, which consists of V-Dom constructors and content-setting method calls. For the first example above, the constructor is replaced by the following generated code.

  shipToElement s;
  s = purchaseOrderDocument.createShipTo(
        purchaseOrderDocument.createUSAddress(
          purchaseOrderDocument.createName("Alice Smith"),
          purchaseOrderDocument.createStreet("123 Maple Street"),
          purchaseOrderDocument.createCity("Mill Valley"),
          purchaseOrderDocument.createState("CA"),
          purchaseOrderDocument.createZip("90952")));
  s.setCountry("US");
The shipTo object of the generated V-Dom interface is created using the V-Dom constructor createShipTo. According to the schema, the content of shipTo is declared to be of type USAddress. Therefore, the parameter of the constructor createShipTo has the V-Dom interface USAddressType. Because the complex type USAddress in the purchase order schema requires a sequence of five elements, its constructor has one parameter for each element. The types of these parameters correspond to the element types in the sequence. Therefore, we have to create V-Dom objects for the elements name, street, city, state, and zip. Afterwards, the attribute country is set to its value. After running the preprocessor, the resulting program code uses only V-Dom methods to process Xml documents. Therefore, the validity of the documents is guaranteed.
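The way typed constructors turn schema violations into compile-time errors can be illustrated with a small stand-alone Java sketch. It is not the code produced by the V-Dom generator: all classes below are hand-written stand-ins whose names merely follow the interface names used in this paper, plain constructors are used instead of factory methods on a document object, and serialization to a string (without any escaping) is added only to make the result visible.

```java
// Hypothetical stand-ins for generated V-Dom types (not the generated code itself).
class TypedElementsSketch {

    // Base class for elements with plain text content; no escaping in this sketch.
    static abstract class TextElement {
        private final String tag;
        private final String content;

        TextElement(String tag, String content) {
            this.tag = tag;
            this.content = content;
        }

        String toXml() {
            return "<" + tag + ">" + content + "</" + tag + ">";
        }
    }

    // One distinct Java type per element declared in the schema.
    static final class NameElement extends TextElement {
        NameElement(String content) { super("name", content); }
    }
    static final class StreetElement extends TextElement {
        StreetElement(String content) { super("street", content); }
    }
    static final class CityElement extends TextElement {
        CityElement(String content) { super("city", content); }
    }
    static final class StateElement extends TextElement {
        StateElement(String content) { super("state", content); }
    }
    static final class ZipElement extends TextElement {
        ZipElement(String content) { super("zip", content); }
    }

    // The complex type requires a sequence of five elements: one parameter each.
    static final class USAddressType {
        private final String xml;

        USAddressType(NameElement name, StreetElement street, CityElement city,
                      StateElement state, ZipElement zip) {
            this.xml = name.toXml() + street.toXml() + city.toXml()
                     + state.toXml() + zip.toXml();
        }

        String toXml() { return xml; }
    }

    // The content of shipTo must be a USAddressType; country is an attribute.
    static final class ShipToElement {
        private final USAddressType content;
        private String country;

        ShipToElement(USAddressType content) { this.content = content; }

        void setCountry(String country) { this.country = country; }

        String toXml() {
            String attr = (country == null) ? "" : " country=\"" + country + "\"";
            return "<shipTo" + attr + ">" + content.toXml() + "</shipTo>";
        }
    }

    // Mirrors the structure of the generated code for the shipTo example.
    static ShipToElement example() {
        ShipToElement s = new ShipToElement(new USAddressType(
                new NameElement("Alice Smith"),
                new StreetElement("123 Maple Street"),
                new CityElement("Mill Valley"),
                new StateElement("CA"),
                new ZipElement("90952")));
        s.setCountry("US");
        return s;
    }

    public static void main(String[] args) {
        System.out.println(example().toXml());
        // A wrong nesting is rejected by the Java compiler, for example:
        // new ShipToElement(new NameElement("Alice Smith"));   // does not compile
    }
}
```

In this sketch the order and the types of the five address elements are fixed by the constructor signature, so a program that compiles cannot build a shipTo element with missing or misplaced children.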
5
Example in WML
This section illustrates our approach with a more extensive example. We assume that V-Dom generates the interfaces WMLPElement, WMLSelectElement and WMLOptionElement for a given Wml schema. Figure 10 shows the definition of a dynamic Wml page which generates the same pages as the example in Fig. 8. Although the code should be self-explanatory, we add some remarks. As additional declarations, three variables s, o and p are introduced, whose types are the V-Dom interfaces WMLSelectElement, WMLOptionElement and WMLPElement (3-5). These V-Dom interfaces are generated for Wml. A select element including one option element is assigned to variable s using an Xml constructor (12-14). The option element is built by using variable parentDir (13). Variables of type String can be used as shorthand for objects of the Dom interface Text. The select element is extended by further elements of type WMLOptionElement (21), which are created by Xml constructors (18-20) within the for-loop (15-22). These elements use the entries of the String array subDirs as their content, again in place of Text objects. In the last step, element p is produced by an Xml constructor (23-28) using the element variable s.

The generated code after running the preprocessor appears in Fig. 11. The Xml constructor for select is replaced by suitable V-Dom method calls (12-14). Note that an auxiliary variable has to be declared to set the attribute value. Similarly, the constructors for option (18-19) and p (22-26) have been replaced by their V-Dom counterparts.
6
Implementation
The preprocessor generator is implemented in a straightforward manner. The preprocessor generator parses the language description. The result is the abstract syntax tree of the language description which is used for producing the preprocessor source code.
 1 String[] subDirs;
 2 String parentDir, currentDir, subDir;
 3 WMLPElement p;
 4 WMLSelectElement s;
 5 WMLOptionElement o;

 6 subDirs = mdmo.getChilds(1);
 7 currentDir = mdmo.getFullPath();
 8 parentDir = currentDir.substring(0,
 9     currentDir.length()-mdmo.getName().length()-1);
10 if (parentDir.trim().equals(""))
11   parentDir = "/workspace";

12 s = <select name="directories">
13       <option value=$parentDir$>..</option>
14     </select>;
15 for (int i = 0; i < subDirs.length; i++)
16 {
17   subDir = currentDir + "/" + subDirs[i];
18   o = <option value=$subDir$>
19         $subDirs[i]$
20       </option>;
21   s.add(o);
22 } // for
23 p = <p>
24       <b>$currentDir$</b>
25       <br/>
26       $s$
27       <br/>
28     </p>;

Fig. 10. Example in Parametric Xml.
The generated preprocessor source code is mainly an Xml parser that is restricted to one markup language and generates the code for constructing the V-Dom objects. A parser generator can be used for this purpose, so that the result of the preprocessor generator is a source file for a parser generator. The generated grammar is built using an algorithm from [2] that constructs deterministic finite automata from regular expressions. The actions of the grammar rules in the generated source file are the generated V-Dom constructor and V-Dom method calls. Additional context checks are necessary to recognize attributes and all groups, which we treat in similar ways.

A generated P-Xml preprocessor works in three phases. The first phase parses every Xml constructor, which is in fact an Xml document fragment extended by V-Dom object variables. Additionally, the variable declarations of V-Dom objects have to be analyzed to obtain their interface types. The result of this phase is usually a parse tree in an abstract syntax or, alternatively, a V-Dom tree in the Xml context. This structure is passed to the second phase, in which the required V-Dom constructors and method applications are generated for every node and every object in the tree. This code is stored at the corresponding nodes. In the last phase, the attributed V-Dom tree is traversed to merge the resulting code, which replaces the given Xml constructor.
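The role of deterministic finite automata in checking content models can be illustrated by a small Java sketch. It makes several assumptions: the class ContentModelDfa and its methods are our own illustrative names, the automata are written down by hand instead of being derived from regular expressions by the algorithm of [2], and the content model of select is simplified to "one or more option elements". The sketch only checks a sequence of child element names; attributes and all groups are not covered.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: a deterministic finite automaton over child element names.
class ContentModelDfa {
    private final Map<Integer, Map<String, Integer>> transitions = new HashMap<>();
    private final Set<Integer> accepting = new HashSet<>();

    // Adds the transition from --tag--> to and returns this for chaining.
    ContentModelDfa transition(int from, String tag, int to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(tag, to);
        return this;
    }

    // Marks a state as accepting and returns this for chaining.
    ContentModelDfa accept(int state) {
        accepting.add(state);
        return this;
    }

    // Runs the automaton from start state 0 over the child element names.
    boolean accepts(List<String> childTags) {
        int state = 0;
        for (String tag : childTags) {
            Map<String, Integer> row = transitions.get(state);
            Integer next = (row == null) ? null : row.get(tag);
            if (next == null) {
                return false; // no transition: the sequence violates the content model
            }
            state = next;
        }
        return accepting.contains(state);
    }

    // Content model of USAddress: the sequence name, street, city, state, zip.
    static ContentModelDfa usAddress() {
        return new ContentModelDfa()
                .transition(0, "name", 1)
                .transition(1, "street", 2)
                .transition(2, "city", 3)
                .transition(3, "state", 4)
                .transition(4, "zip", 5)
                .accept(5);
    }

    // Simplified content model of select (assumption): option+.
    static ContentModelDfa select() {
        return new ContentModelDfa()
                .transition(0, "option", 1)
                .transition(1, "option", 1)
                .accept(1);
    }

    public static void main(String[] args) {
        System.out.println(usAddress().accepts(
                Arrays.asList("name", "street", "city", "state", "zip"))); // true
        System.out.println(usAddress().accepts(Arrays.asList("name", "city"))); // false
        System.out.println(select().accepts(
                Arrays.asList("option", "option", "option"))); // true
    }
}
```

In a preprocessor, a check of this kind would run while an Xml constructor is parsed, so that a violation is reported before the Java program is compiled or executed.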
 1 String[] subDirs;
 2 String parentDir, currentDir, subDir;
 3 WMLPElement p;
 4 WMLSelectElement s;
 5 WMLOptionElement o;

 6 subDirs = mdmo.getChilds(1);
 7 currentDir = mdmo.getFullPath();
 8 parentDir = currentDir.substring(0,
 9     currentDir.length()-mdmo.getName().length()-1);
10 if (parentDir.trim().equals(""))
11   parentDir = "/workspace";

12 WMLOptionElement _o = WMLDocument.createOption("..");
13 _o.setValue(parentDir);
14 s = WMLDocument.createSelect(_o);
15 for (int i = 0; i < subDirs.length; i++)
16 {
17   subDir = currentDir + "/" + subDirs[i];
18   o = WMLDocument.createOption(subDirs[i]);
19   o.setValue(subDir);
20   s.add(o);
21 } // for
22 p = WMLDocument.createP();
23 p.add(WMLDocument.createB(currentDir));
24 p.add(WMLDocument.createBr());
25 p.add(s);
26 p.add(WMLDocument.createBr());

Fig. 11. Example in Java using V-Dom.
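The second and third phase of the preprocessor, which turn a parsed Xml constructor into V-Dom calls of the kind shown in Fig. 11, can be sketched as follows. This is a strongly simplified illustration under our own assumptions, not the generated preprocessor: the names ConstructorCodeGenSketch, Node, expression and generate are hypothetical, the tree is built by hand instead of by a parser, all children are passed as arguments of a single create call, only literal attributes of the root element are handled, and neither schema validation nor string escaping is performed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: generating V-Dom-style Java code from a parsed Xml constructor.
class ConstructorCodeGenSketch {

    // Result of phase 1: an element, a text literal, or a $variable$.
    static final class Node {
        final String tag;       // element name, null for text and variables
        final String text;      // literal text, null otherwise
        final String variable;  // Java expression between $...$, null otherwise
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        private Node(String tag, String text, String variable) {
            this.tag = tag;
            this.text = text;
            this.variable = variable;
        }

        static Node element(String tag, Node... kids) {
            Node n = new Node(tag, null, null);
            n.children.addAll(Arrays.asList(kids));
            return n;
        }

        static Node text(String text) { return new Node(null, text, null); }

        static Node variable(String name) { return new Node(null, null, name); }

        Node attr(String name, String value) {
            attributes.put(name, value);
            return this;
        }
    }

    static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Phase 2: the code for one node is an expression built from its children.
    static String expression(String factory, Node n) {
        if (n.variable != null) return n.variable;
        if (n.text != null) return "\"" + n.text + "\"";
        List<String> args = new ArrayList<>();
        for (Node child : n.children) {
            args.add(expression(factory, child));
        }
        return factory + ".create" + capitalize(n.tag) + "(" + String.join(", ", args) + ")";
    }

    // Phase 3: merge the code into statements that replace "target = <...>;".
    static String generate(String target, String factory, Node root) {
        StringBuilder out = new StringBuilder();
        out.append(target).append(" = ").append(expression(factory, root)).append(";\n");
        for (Map.Entry<String, String> a : root.attributes.entrySet()) {
            out.append(target).append(".set").append(capitalize(a.getKey()))
               .append("(\"").append(a.getValue()).append("\");\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Node select = Node.element("select", Node.variable("o")).attr("name", "directories");
        System.out.print(generate("s", "WMLDocument", select));
        // prints:
        // s = WMLDocument.createSelect(o);
        // s.setName("directories");
    }
}
```

Attributes of nested elements are not handled here; as noted for Fig. 11, they require auxiliary variables so that the setter can be called before the element is passed to its parent.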
7
Related Work
The differences between our approach and existing tools and languages for developing WWW applications have been discussed in Sect. 1. Independently of these tools and languages, a number of similar approaches have been presented that support Xml in existing programming languages. These approaches differ mainly in how they manage the structure of Xml documents.

One direction is to allow access to arbitrary Xml documents. For this purpose, very general programming interfaces [21,12], sometimes called low-level bindings, have been developed. They are widely accepted and supported. This kind of interface is the only standardized and language-independent way of processing Xml. The major disadvantage of this approach is the expensive validation at run-time.

Recently, a series of proposals [4,8,20], called high-level bindings, have been presented. These approaches assume that all processed documents follow a given structure, the language description (usually a Dtd or an Xml schema). This description is used to map the document structure onto language types, which directly reproduce the semantics intended by the language description. They provide no facilities to cope with constant Xml document fragments. Constant fragments therefore have to be formulated either by nested constructor or method calls or by parsing fixed documents, called marshaling. The first procedure is tedious for the programmer; the second one needs validation at run-time. Additionally, these bindings have been developed only for specific programming languages and are far from becoming a standard.

A third approach [17] can be classified between the two previous ones. It uses language types to express the document structure but needs validation at run-time for verification.

The mechanism we use to guarantee correctness in P-Xml is similar to an idea introduced in the setting of program generators about 20 years ago [15]. The basic idea of that work was to introduce a data type for each nonterminal symbol of a context-free grammar. So-called generating expressions allow the program generator to insert values of these data types in places where the corresponding nonterminal symbol is allowed according to the underlying grammar. This mechanism statically guarantees the syntactical correctness of all generated programs.

Additionally, it is worth noting that our approach can easily be coupled with XQuery [25], extending XQuery to a typed query language.
8
Concluding Remarks
This paper investigates the problem of guaranteeing the validity of Xml documents that are generated by an Xml generator program. As the underlying mechanism for describing validity we use Xml Schemas, which provide a generalization of DTDs. The mechanisms V-Dom and P-Xml are defined and generalized to Xml Schemas, making it possible to write Xml-generating programs, e.g. server pages, that are guaranteed to generate only Xml documents that are valid according to an underlying Xml schema. No test runs are necessary to "prove" the validity. This is accomplished by introducing interfaces that correspond in a one-to-one manner to the element types of the Xml schema. Each interface extends the general Element-interface in Dom. The corresponding object model is called Validating Dom, abbreviated V-Dom. P-Xml is an extension that allows Xml to be generated by using an Xml-like notation instead of calling the corresponding methods in V-Dom. Using P-Xml, it is no longer necessary to build the V-Dom tree manually in the program; the method calls that build a V-Dom tree are generated automatically.

In the future, we plan to investigate extensions to the upcoming standard query language XQuery [25] in such a way that a query which is applied to appropriate V-Dom objects can be guaranteed to result only in documents which are valid according to an underlying Xml schema. Of course, these extensions are not intended to restrict the flexibility of Xml in general. If there is no underlying Xml schema, there will be no schema for the query result, and nothing can be guaranteed as far as the validity of query results is concerned. But if there is an underlying Xml schema, the full potential of Xml schemas should be used in order to guarantee valid query results.
References

[1] Serge Abiteboul, Peter Buneman, and Dan Suciu. Data on the Web, From Relations to Semistructured Data and XML. Morgan Kaufmann Publishers, San Francisco, California, 2000.
[2] A.V. Aho, R. Sethi, and J.D. Ullman. Compilers - Principles, Techniques and Tools. Addison-Wesley Publishing Company, 1986.
[3] Suad Alagić. Type-checking OQL queries in the ODMG type system. ACM Transactions on Database Systems, 24(3):319–360, 3. September 1999.
[4] Chuck Altman. Using Dynamic XML, Developer's Guide. ObjectSpace, Inc., Dallas, Texas, third edition, 1997-1999. Software version 1.1.
[5] Stig Saether Bakken, Alexander Aulbach, Egon Schmid, Jim Winstead, Lars Torben Wilson, Rasmus Lerdorf, Zeev Suraski, Andrei Zmievski, and Jouni Ahto. PHP Manual. PHP Documentation Group, August 2001. Edited by Stig Saether Bakken and Egon Schmid.
[6] Ralf Behrens. A Grammar-Based Model for XML Schema Integration. In British National Conference on Data Bases (BNCOD), pages 172–190, 2000.
[7] Ralf Behrens and Volker Linnemann. XML-basierte Informationsmodellierung am Beispiel eines Medienarchivs für die Lehre. Technical Report A-00-20, Schriftenreihe der Institute für Informatik/Mathematik, Medizinische Universität zu Lübeck, Dezember 2000. Available at http://www.ifis.mu-luebeck.de/public (in German).
[8] Borland. XML Application Developer's Guide, JBuilder. Borland Software Corporation, Scotts Valley, CA, 1997, 2001. Version 5.
[9] ECMA Standardizing Information and Communication Systems. ECMAScript Language Specification. Standard ECMA-262, ftp://ftp.ecma.ch/ecma-st/Ecma262.pdf, December 1999.
[10] Duane K. Fields and Mark A. Kolb. Web Development with Java Server Pages, A practical guide for designing and building dynamic web services. Manning Publications Co., 32 Lafayette Place, Greenwich, CT 06830, 2000.
[11] Informix Press. Informix Web DataBlade Module User's Guide. Informix Software, Inc., 4100 Bohannon Drive, Menlo Park, CA 94025-1032, May 1997. Version 3.3.
[12] JDOM Project. JDOM FAQ. http://www.jdom.org/docs/faq.html.
[13] Martin Kempa. VDOM: Dokumentenmodell für XML-basierte World-Wide-Web-Anwendungen. In Gunter Saake and Kai-Uwe Sattler, editors, Proceedings GI-Workshop Internet-Datenbanken, Berlin, pages 47–56. Otto-von-Guericke-Universität Magdeburg, 19. September 2000. Preprint Nr. 12 (in German).
[14] Martin Kempa and Volker Linnemann. V-DOM and P-XML – Towards A Valid Programming Of XML-based Applications. In Akmal B. Chaudhri and Awais Rashid, editors, OOPSLA '01 Workshop on Objects, XML and Databases, Tampa Bay, Florida, USA, October 2001.
[15] Volker Linnemann. Context-free grammars and derivation trees in ALGOL 68. In Proceedings International Conference on ALGOL68, Amsterdam, pages 167–182, 1981.
[16] Netscape Communications Corporation. JavaScript 1.1 Language Specification. http://www.netscape.com/eng/javascript/index.html, 1997.
[17] Oracle Corporation, Redwood City, CA 94065, USA. Oracle9i, Application Developer's Guide - XML, Release 1 (9.0.1), June 2001. Shelley Higgins, Part Number A88894-01.
[18] Eduardo Pelegrí-Llopart and Larry Cable. Java Server Pages Specification, Version 1.1. Java Software, Sun Microsystems, 30. November 1999.
[19] Dave Raggett, Arnaud Le Hors, and Ian Jacobs. HTML 4.0 Specification. Recommendation, http://www.w3.org/TR/REC-html40-971218/, 18. December 1997. W3Consortium.
[20] Sun Microsystems, Inc. The Java Architecture for XML Binding, User Guide. http://www.sun.com, May 2001.
[21] W3Consortium. Document Object Model (DOM) Level 1 Specification, Version 1.0. Recommendation, http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/, 1. October 1998.
[22] W3Consortium. Extensible Markup Language (XML) 1.0. Recommendation, http://www.w3.org/TR/1998/REC-xml-19980210/, 10. February 1998.
[23] W3Consortium. XHTML 1.0: The Extensible HyperText Markup Language, A Reformulation of HTML 4.0 in XML 1.0. Recommendation, http://www.w3.org/TR/2000/REC-xhtml1-20000126/, 26. January 2000.
[24] W3Consortium. XML Schema Part 0: Primer. Recommendation, http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/, 2. May 2001.
[25] W3Consortium. XQuery 1.0: An XML Query Language. Working Draft, http://www.w3.org/TR/2001/WD-xquery-20011220/, 20. December 2001.
[26] Niklaus Wirth. Algorithmen und Datenstrukturen mit Modula-2. Leitfäden und Monographien der Informatik. B. G. Teubner Stuttgart, 4. edition, 1986. ISBN 3-519-02260-5.
A
Generated V-DOM Interfaces
The following interfaces are generated from the simplified purchase order schema in Figures 2 and 3 using the previous transformations.

  interface purchaseOrderElement {
    attribute PurchaseOrderTypeType content;
  }

  interface commentElement { attribute string content; }

  interface PurchaseOrderTypeType {
    interface shipToElement { attribute USAddressType content; }
    interface billToElement { attribute USAddressType content; }
    interface itemsElement { attribute ItemsType content; }

    attribute shipToElement shipTo;
    attribute billToElement billTo;
    attribute commentElement comment;
    attribute itemsElement items;

    attribute Date orderDate;
  }

  interface USAddressType {
    interface nameElement { attribute string content; }
    interface streetElement { attribute string content; }
    interface cityElement { attribute string content; }
    interface stateElement { attribute string content; }
    interface zipElement { attribute decimal content; }

    attribute nameElement name;
    attribute streetElement street;
    attribute cityElement city;
    attribute stateElement state;
    attribute zipElement zip;

    attribute NMToken country;
  }

  interface ItemsType {
    interface itemElement {
      interface productNameElement { attribute string content; }
      interface quantityElement {
        interface resPositiveInteger: positiveInteger { ... }
        attribute resPositiveInteger content;
      }
      interface USPriceElement { attribute decimal content; }
      interface shipDateElement { attribute date content; }

      attribute productNameElement productName;
      attribute quantityElement quantity;
      attribute USPriceElement USPrice;
      attribute commentElement comment;
      attribute shipDateElement shipDate;

      attribute SKU partNum;
    }

    attribute list<itemElement> itemList;
  }

  interface SKU: string { ... }