
The Role of Class Dependencies in Designing Ontology-Based Databases

Chedlia Chakroun, Ladjel Bellatreche, and Yamine Ait-Ameur

LISI/ENSMA - Poitiers University, Futuroscope, France
{chakrouc,bellatreche,yamine}@ensma.fr

Abstract. A growing number of applications produce and manipulate large volumes of ontological data. Managing these data efficiently requires scalable solutions, and ontology-based databases (OBDB) are one of them. An OBDB stores both ontological data and the ontology describing their meaning in the same repository. Several architectures supporting OBDB have been proposed by academia and by industrial DBMS vendors. Unfortunately, no methodology is available for designing such OBDB. To overcome this limitation, this paper proposes to scale up traditional database design approaches to OBDB. Our approach covers both the conceptual and logical modeling phases. It assumes the availability of a domain ontology composed of primitive (canonical) and defined (non-canonical) concepts. Dependencies among properties and classes are captured and exploited to define a normalized logical model. A prototype implementing our design methodology on the OBDB OntoDB is outlined.

1 Introduction

Roughly speaking, ontologies have been introduced in information systems as knowledge models that provide definitions and descriptions of the concepts of a target application domain [14,16,18,19]. As a result, the amount of ontological data manipulated and produced by applications in various domains has increased. This is mainly due to three factors: (i) the development of domain ontologies, (ii) the availability of software tools for building, editing and deploying ontologies and (iii) the existence of ontology model formalisms (OWL, PLIB, etc.), which have considerably contributed to the emergence of ontology-based applications. In this situation, managing, storing and querying these data, and reasoning on them, require software tools capable of handling these activities. Moreover, given the wide usage of such data, the need for a systematic and consistent development process has appeared, one that takes into account the central objectives of persistence, scalability and high performance of the applications. Persistence of ontological data [7] has been advocated by academic and industrial researchers. As a consequence, a new type of database, named ontology-based database (OBDB), has emerged, dedicated to storing and managing ontological data. An OBDB is a database model that allows both ontologies and their instances to be stored in a single repository [9].

R. Meersman, T. Dillon, and P. Herrero (Eds.): OTM 2011 Workshops, LNCS 7046, pp. 444–453, 2011. © Springer-Verlag Berlin Heidelberg 2011

Since an OBDB is a database, it should be designed according to the classical design process dedicated to database development, identified in the ANSI/X3/SPARC architecture. Originally, this architecture did not recommend the use of ontologies as a domain knowledge model; it refers instead to the notion of a dictionary. In [10], an extension of this architecture was proposed that gives ontologies their place in the life cycle of the database schema design process. But the reality is different. Indeed, when exploring the literature, we find that most research efforts have concentrated on the physical design phase, where various storage models for ontological data were proposed. OntoDB [9], Sesame [4], Oracle [8], DLDB2 [15], etc. are examples of such systems. Note that each system is designed according to a given architecture and has its own storage model.

Three main architectures have been proposed. In the first architecture, the ontology and its associated data are stored in an RDF triples structure: (subject, predicate, object). There is no identified separation between ontology and data. This architecture is proposed in Oracle [8], etc. In the second architecture, the ontology and its ontological data are stored independently in two different schemas. Sesame is an example of such an architecture. The storage model used for the ontology is based on RDFS, whereas data may be represented using different storage models. These two architectures share a common property: the ontology model (RDF or RDFS) is fixed. Unlike logical models in the ANSI/X3/SPARC architecture, their physical structure is static and does not evolve according to the stored ontological data model. The third architecture extends the second one by adding a new part, called the meta-schema part. This part offers flexibility in modeling the ontology part, since the latter is represented as an instance of the meta-schema.
Indeed, it may allow: (1) generic access to the ontology part, (2) the evolution of the ontology model in use by adding non-functional properties (functional dependencies [2], Web services, etc.) and (3) the storage of different ontology models (OWL, PLIB, etc.). This third architecture is proposed in OntoDB [9], with PLIB as the ontology model. For instance, OntoDB [9] uses a horizontal storage model, where a single table is associated with each ontology class, with one column per property. Unlike the previous architectures, where the ontology model is fixed, the third architecture supports the evolution of the ontology model through the meta-schema. To facilitate the exploitation of these OBDB systems, different query languages were proposed (OntoQL [12], SPARQL¹, etc.).

Given the rapid development of OBDB, an OBDB design methodology becomes a crucial issue for companies. To encourage designers to adopt such a methodology, it should scale up the traditional database design approaches by offering similar steps: conceptual, logical and physical. Currently, to design an OBDB from a given domain ontology, we have to choose an architecture, identify the relevant storage models and establish mappings between ontological concepts and the target entities of the chosen storage model. This design procedure suffers from (a) the presence of duplicated and inconsistent data when populating the target OBDB (lack of integrity constraints) and (b) the lack of data access transparency.

¹ http://www.w3.org/TR/rdf-sparql-query/
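To make the contrast between the storage models described above more concrete, the following Python sketch holds the same two instances once as (subject, predicate, object) triples, as in the first architecture, and once in a horizontal layout with one table per class and one column per property. The class name, property names, values and the function to_horizontal are hypothetical and are not taken from any of the cited systems; the sketch also assumes that every subject carries a type triple.

```python
# Hypothetical instances of a class "Student" with properties "name" and "age",
# stored as (subject, predicate, object) triples.
triples = [
    ("s1", "rdf:type", "Student"),
    ("s1", "name", "Ann"),
    ("s1", "age", 23),
    ("s2", "rdf:type", "Student"),
    ("s2", "name", "Bob"),
    ("s2", "age", 25),
]


def to_horizontal(triples, type_predicate="rdf:type"):
    """Pivot triples into one table per class, one column per property.

    Assumes every subject has exactly one type triple.
    """
    class_of = {s: o for s, p, o in triples if p == type_predicate}
    tables = {}
    for s, p, o in triples:
        if p == type_predicate:
            continue
        # tables[class][subject] is the row of that instance.
        row = tables.setdefault(class_of[s], {}).setdefault(s, {})
        row[p] = o
    return tables


horizontal = to_horizontal(triples)
```

In the triple layout, ontology and data share one structure; in the horizontal layout, the table structure follows the class definitions, which is why it has to evolve with the ontology.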

Recently, several studies recommending the use of ontologies to design traditional databases and data warehouses have appeared in the literature. The similarities between ontologies and conceptual models [18] and the reasoning capabilities offered by ontologies have motivated such studies. Indeed, [19] used linguistic ontologies to design traditional databases, while [14,16] used conceptual ontologies to design multidimensional data warehouses. We may observe that, in the last decade, data warehouses faced the same phenomenon as OBDBs: most of the studies focused on the physical design [6]. So, although conceptual design and requirement analysis are two key steps within the database and data warehouse design processes [11], they were neglected in the first era of OBDBs.

In classical databases, redundancy is addressed by normalizing the logical model obtained from a conceptual model. This normalization process exploits the available functional dependencies (FD) between properties. Recently, a couple of studies enriched ontology models with FD defined on properties [2,5]. In [2], we proposed a design methodology for OBDB by considering FD defined only on properties of canonical classes. In the context of ontologies, two types of classes may be distinguished: (1) canonical (CC) and (2) non-canonical (NCC) (see Section 2.1). The presence of non-canonical classes generates a new form of dependencies between classes [17].

The objectives of our work are (1) to exploit the definitions of classes to extract class dependencies (CD), (2) to exploit CD to identify CC and NCC, (3) to develop a methodology for designing OBDB, including the conceptual and logical models, in order to identify the persistent classes (CC) and the views (NCC) and (4) to deploy the methodology on a particular OBDB system. This paper is divided into five sections.
Section 2 describes the basic concepts related to dependencies between ontological concepts, a formal ontology model and the notion of dependency graph. The main steps of our methodology are presented in Section 3. Section 4 shows a validation of our methodology on the LUB ontology. Finally, Section 5 concludes the paper by summarizing the main results and suggesting future work.

2 Basic Concepts and Formalization

In this section, we present some concepts, definitions and formalizations related to (i) ontologies and (ii) the dependency graph exploiting class dependencies.

2.1 Taxonomy of Ontologies

A taxonomy of ontologies has been proposed in [13] where three layers are identified: (1) Conceptual Canonical Ontologies (CCO) can be considered as shared conceptual models containing the core classes concepts, (2) Non Conceptual Canonical Ontologies extend CCO by allowing the designers to introduce derived classes and concepts and (3) Linguistic Ontologies. This taxonomy helps designers to identify CC and NCC of an ontology as shown in the next example.


Example 1. Figure 1 represents an extended fragment of the Lehigh University Benchmark (LUB)², to which the University (U) class, defined as the union of the two CC PublicUniversity (PU) and PrivateUniversity (PRU) (U ≡ PU ∪ PRU), is added. Based on the ontology definitions, CC and NCC are identified as follows: NCC^LUB = {U, Student_Employee, MasterCourse, FrenchPrivateUniversity} and CC^LUB = C^LUB − NCC^LUB, where C^LUB represents all LUB classes. Since NCC are defined from CC, a dependency relation between CC and NCC may exist. For example, from the definition of U, the dependency (PU, PRU) ⟶ U is obtained. Note that other types of dependencies between ontological concepts (see the next section) have been studied in the literature.
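The identification in Example 1 amounts to a set difference plus one dependency per class definition. The Python sketch below illustrates this under stated assumptions: C_LUB lists only an illustrative subset of the classes (Figure 1 is not reproduced here), and the dictionary union_definitions is a hypothetical way of recording the definition U ≡ PU ∪ PRU.

```python
# Illustrative subset of the extended LUB classes (assumption: the real
# fragment in Figure 1 contains more classes than listed here).
C_LUB = {
    "PublicUniversity", "PrivateUniversity", "University",
    "Student", "Employee", "Student_Employee",
    "Course", "MasterCourse", "FrenchPrivateUniversity",
}

# Non-canonical classes named in Example 1.
NCC_LUB = {"University", "Student_Employee", "MasterCourse", "FrenchPrivateUniversity"}

# CC^LUB = C^LUB - NCC^LUB
CC_LUB = C_LUB - NCC_LUB

# Hypothetical encoding of the definition U = PU union PRU:
# defined class -> tuple of the classes it is built from.
union_definitions = {"University": ("PublicUniversity", "PrivateUniversity")}

# Each definition yields one dependency (operands) -> defined class.
dependencies = [(operands, defined) for defined, operands in union_definitions.items()]
```

Here dependencies contains the single pair corresponding to (PU, PRU) ⟶ U.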

Fig. 1. Extended LUB Ontology

Fig. 2. Dependencies Classification

2.2 Dependencies between Ontological Concepts

In this section, we show the interest of capturing dependencies between ontological concepts for designing OBDB. Our analysis of dependencies [2,5,17] gives rise to the following classification (Figure 2): instance-driven dependencies (IDD) and static dependencies (SD).

IDD are quite similar to functional dependencies (FD) in traditional databases. An IDD may concern either properties [5] (IDD_P) or classes [17] (IDD_C) of the ontology. In [5], the authors proposed a formal framework for handling FD constructors for any type of OWL ontology, in which three categories of FD are identified. In [2], we assumed the existence of FD involving simple properties of each ontology class. For instance, if we consider a class Student with the properties id and age, the FD id → age may be defined. In [17], the authors proposed an algorithm that exploits the inference capabilities of DL-Lite to discover FD among concepts of an ontology. A FD between two concepts C1 and C2 (C1 → C2) exists if each instance of C1 determines one and only one instance of C2. For instance, if we consider a role mastersDegreeFrom with domain Person and range University, the FD Person → University is defined.

SD are defined between classes based on their definitions (Example 1). A SD between two concepts Ci and Cj (Ci ⟶ Cj) exists if Cj can be derived whenever the definition of Ci is available. This definition is supported by a set of OWL³ constructors. For example, if we consider a class Student having level as one of its properties, a class MasterStudent may be defined as a Student with a master level, and the dependency Student ⟶ MasterStudent is obtained.

² The ontology and its class definitions described in OWL are available at: http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl
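As an illustration of an instance-driven dependency on properties such as id → age, the following Python sketch checks whether a dependency holds on a given population. It is only a check over hypothetical data; it is neither the formal framework of [5] nor the discovery algorithm of [17], and the function name fd_holds is an assumption.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in rows, equal values of the lhs properties
    always come with the same value of the rhs property."""
    seen = {}
    for row in rows:
        key = tuple(row[p] for p in lhs)
        value = row[rhs]
        # setdefault records the first rhs value seen for this key
        # and returns the recorded one on later occurrences.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instances of the class Student.
students = [
    {"id": 1, "age": 23},
    {"id": 2, "age": 25},
    {"id": 3, "age": 23},
]

id_determines_age = fd_holds(students, ["id"], "age")
age_determines_id = fd_holds(students, ["age"], "id")
```

On this population id → age holds, whereas age → id does not, because two students share the same age.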

2.3 A Formal Model for Ontologies

An ontology O may be defined as <C, P, Applic, Sub, PFD, CD>, where:
– C is the set of classes used to describe the concepts of a given domain.
– P is the set of all properties used to describe the instances of C.
– Applic is a function defined as Applic : C → 2^P. It associates with each class of O the properties that are applicable to each instance of this class.
– Sub is the subsumption relationship defined as Sub : C → 2^C, which associates with each class Ci ∈ C its direct subsumed classes⁴.
– PFD is a mapping from the powerset of P onto P (2^P → P) representing the IDD defined on the applicable properties of a class Ci ∈ C (IDD_P).
– CD is a mapping from the powerset of C onto C (2^C → C) representing either static dependencies (SD) or instance-driven dependencies defined on classes (IDD_C). CD = SD ∪ IDD_C, where:
  • IDD_C. Let I1, I2 be the populations of C1 and C2, respectively. IDD_C : C1 → C2 exists if for each ik ∈ I1 there exists a unique ij ∈ I2 such that ik determines ij.
  • SD. Let Ii, Ij be the populations of Ci and Cj, respectively. SD : Ci ⟶ Cj means that Ii determines Ij.

To facilitate the representation of dependencies between classes, we adopt a graphical representation discussed in the next section.
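The six components of the formal model can be mirrored by a small data structure. The Python sketch below is one possible encoding, not the one used in the prototype: the field types, the instance lub, the function iddc_holds and the sample links are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    C: set        # classes
    P: set        # properties
    Applic: dict  # class -> set of applicable properties (C -> 2^P)
    Sub: dict     # class -> set of directly subsumed classes (C -> 2^C)
    PFD: list = field(default_factory=list)  # pairs ((p1, ...), p)
    CD: list = field(default_factory=list)   # pairs ((C1, ...), C)


def iddc_holds(population_c1, determines):
    """IDD_C: C1 -> C2 holds if each instance of C1 determines exactly one
    instance of C2. `determines` maps an instance of C1 to the set of
    instances of C2 it is linked to."""
    return all(len(determines.get(i, ())) == 1 for i in population_c1)


# Hypothetical fragment built from the examples given in the text.
lub = Ontology(
    C={"Person", "Student", "University", "PublicUniversity", "PrivateUniversity"},
    P={"id", "age"},
    Applic={"Student": {"id", "age"}},
    Sub={"University": {"PublicUniversity", "PrivateUniversity"}},
    PFD=[(("id",), "age")],
    CD=[(("PublicUniversity", "PrivateUniversity"), "University")],
)

# Hypothetical mastersDegreeFrom links from persons to universities.
links = {"p1": {"u1"}, "p2": {"u1"}}
person_determines_university = iddc_holds(["p1", "p2"], links)
```

A person linked to two universities, or to none, would make the check fail, which matches the "one and only one" condition of the definition.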

2.4 A Formal Model for Dependency Graph

Dependencies between ontology classes may be represented by a directed graph G, called a dependency graph. Formally, G is defined as a pair (C^LO, A), where C^LO is the set of nodes and A the set of edges. An edge ak ∈ A between a pair of classes Ci and Cj (∈ C^LO) exists if a CD between Ci and Cj is established, i.e. Ci ⟶ Cj ∈ SD. Figure 3 presents an example of a dependency graph of the LUB ontology. A minimum coverage-like set of classes C+ may be deduced from G as a minimal set of CD from which all CD can be derived. Therefore, C+ is a subset of CD. Consider the dependency graph illustrated in Figure 3. Its minimum coverage-like set is C+ = {C2, C3 ⟶ C1; C1 ⟶ C3; C5, C6 ⟶ C4; C9, C8 ⟶ C7; C3 ⟶ C15}.
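A dependency graph of this kind can be held as an adjacency structure. The Python sketch below builds one from the dependencies listed in C+; the node set C1 to C15 is an assumption inferred from the text, since Figure 3 is not reproduced here, and build_graph is a hypothetical helper. Note that drawing one edge per left-hand class loses the grouping of compound left-hand sides such as (C2, C3), so the list c_plus remains the authoritative description.

```python
def build_graph(nodes, dependencies):
    """Adjacency sets of the dependency graph: an edge Ci -> Cj exists when
    Ci appears on the left-hand side of a dependency whose right-hand side is Cj."""
    edges = {n: set() for n in nodes}
    for lhs, rhs in dependencies:
        for c in lhs:
            edges[c].add(rhs)
    return edges


# Assumed node set: classes C1..C15.
nodes = [f"C{i}" for i in range(1, 16)]

# The dependencies of C+ as pairs (left-hand classes, right-hand class).
c_plus = [
    (("C2", "C3"), "C1"),
    (("C1",), "C3"),
    (("C5", "C6"), "C4"),
    (("C9", "C8"), "C7"),
    (("C3",), "C15"),
]

graph = build_graph(nodes, c_plus)
```

Classes with no incoming and no outgoing edge in graph are the ones that take part in no dependency.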

3 Our Proposal

In this section, we propose a complete methodology to design OBDB from an OWL domain ontology O respecting the formal ontology model.

³ http://www.w3.org/TR/owl-guide/
⁴ C1 subsumes C2 iff ∀x instance of C2, x is an instance of C1.

Fig. 3. Example of a dependency graph

Fig. 4. Steps of our approach

3.1 Different Phases of our Methodology

Inspired by the database design process, our methodology starts from a conceptual model to provide logical and physical models. It is a five-step method (Figure 4). It starts with the extraction of a local ontology (Step 1) and then identifies CC and NCC by exploiting CD (Step 2). It then defines, in parallel, a logical model for each CC (Step 3) and a placement of the NCC (Step 4). Finally, a logical model for the OBDB is generated (Step 5).

Step 1. The designer extracts a fragment of O, called the local ontology (LO), according to her/his requirements. The LO plays the role of conceptual model. Three extraction scenarios may occur: (1) LO = O means that O covers all the designer requirements. (2) LO ⊂ O means that O is rich enough to cover the designer requirements. (3) LO ⊇ O means that O does not fulfill all the designer requirements. The designer extracts from O a fragment corresponding to the requirements and enriches it by adding new concepts/properties/dependencies.

Step 2. Once the LO is extracted, an analysis is required to identify CC and NCC. Let C^LO, CC^LO and NCC^LO be the sets of all, canonical and non-canonical classes of the LO, respectively. A dependency graph is built from CD. This graph is used to determine the minimum set of CC. Our dependency graph is quite similar to the FD graph defined for classical databases to generate a minimum coverage and normalized tables [1]. The difference is that our nodes are classes, whereas the nodes of an FD graph are attributes. Based on this similarity, we adapt the algorithm of [1] to generate CC^LO (for lack of space, this algorithm is described in a technical report available at: http://www.lisi.ensma.fr/ftp/pub/documents/reports/2011/2011-LISI-3.pdf). The algorithm takes CD as input and generates CC. It starts by computing the isolated classes (C_Isolated: classes not involved in CD). These classes have to be canonical since they cannot be derived from other ones. Then, the minimum coverage-like set C+ is computed. It represents the minimum subset of basic CD from which all the others can be generated. C+ is then processed to eliminate cycles between classes. Finally, CC are deduced. For example, if we consider the dependency graph described in Figure 3, C_Isolated = {C10, C11, C12, C13, C14}. Then, a cycle between C1, C2 and C3 is identified (C1 ⟶ C3; C2, C3 ⟶ C1). Therefore, the CD C1 ⟶ C3 is eliminated. Finally, CC^LO is calculated as {C2, C3, C5, C6, C8, C9, C10, C11, C12, C13, C14}. Note that different sets of CC may be obtained. In this case, the designer may choose the most relevant set. This distinction between class types is important in order to optimize the data representation and reduce redundancy in the final OBDB.

Step 3. Based on CC^LO and NCC^LO, two scenarios are distinguished: (1) NCC^LO = ∅: only CC^LO exists. Then, the FD defined on their properties are used for normalization and for the definition of their primary keys [2]. (2) CC^LO ≠ ∅ and NCC^LO ≠ ∅: for each class in CC^LO, the mechanism of scenario 1 is applied. Then, for each class in NCC^LO, a relational view is computed. For example, let ncc_j ∈ NCC^LO be a NCC defined as the union of two CC cc1(p1, p2, p3) and cc2(p1, p2, p4); a view corresponding to ncc_j is defined as follows: ((SELECT p1, p2 FROM cc1) UNION (SELECT p1, p2 FROM cc2)). One of the advantages of using views to represent NCC is to ensure transparency in accessing data. A user may query an OBDB via these classes without worrying about their physical implementation.

Step 4. The C^LO may be defined without specifying the subsumption relationship. Thus, we propose to store all C^LO, regardless of their types (CC and NCC), in the OBDB, placing them in the subsumption hierarchy of classes computed by a reasoner such as Pellet [3]. For example, the classes PU and PRU (U ≡ PU ∪ PRU) will be subclasses of the University class.

Step 5. Now, the database administrator may choose any existing database architecture offering the storage of both the ontology and the ontological data.
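The identification of canonical classes in Step 2 can be sketched as follows. This Python code is a simplified greedy stand-in, not the adaptation of the algorithm of [1] described in the technical report: it takes the dependencies of C+ in the order given, drops any dependency that would close a cycle with those already kept, and treats every class that is still derivable as non-canonical. The function name canonical_classes and the node set C1 to C15 are assumptions.

```python
def canonical_classes(classes, dependencies):
    """Return (isolated, kept_dependencies, canonical).

    Each dependency is a pair (lhs, rhs): the tuple of classes lhs determines rhs.
    Greedy and order-dependent: a different order may yield a different result.
    """
    # Isolated classes take part in no dependency at all.
    involved = set()
    for lhs, rhs in dependencies:
        involved.update(lhs)
        involved.add(rhs)
    isolated = set(classes) - involved

    successors = {}  # class -> classes derived from it by kept dependencies

    def reaches(source, target):
        """Depth-first search along kept dependencies."""
        stack, seen = [source], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(successors.get(node, ()))
        return False

    kept = []
    for lhs, rhs in dependencies:
        # Skip a dependency if its right-hand side already leads back to
        # one of its left-hand classes, since keeping it would create a cycle.
        if any(reaches(rhs, c) for c in lhs):
            continue
        kept.append((lhs, rhs))
        for c in lhs:
            successors.setdefault(c, set()).add(rhs)

    derived = {rhs for _, rhs in kept}
    return isolated, kept, set(classes) - derived


classes = [f"C{i}" for i in range(1, 16)]
c_plus = [
    (("C2", "C3"), "C1"),
    (("C1",), "C3"),
    (("C5", "C6"), "C4"),
    (("C9", "C8"), "C7"),
    (("C3",), "C15"),
]
isolated, kept, cc = canonical_classes(classes, c_plus)
```

On this input the dependency C1 ⟶ C3 is the one dropped, and the resulting sets agree with the C_Isolated and CC^LO given in the example. Listing C1 ⟶ C3 first would keep it and drop C2, C3 ⟶ C1 instead, which is one way to see why different sets of CC may be obtained.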

4 Case Study

In this section, we present a case study of our design approach. We use OntoDB [9] as the storage model architecture for our physical design phase (Step 5) for two main reasons: (i) it belongs to the third architecture (Section 1), which allows us to enrich the meta-schema with dependencies; (ii) a prototype is available, it has been used in several industrial projects, and it is associated with a query language called OntoQL [12] (a SPARQL-like language) defined as an extension of SQL. OntoQL offers DDL statements for creating entities, classes and views. Our two algorithms, for determining CC and NCC and for normalizing each class using PFD, were implemented in Java. To validate our proposal, the deployment in OntoDB is carried out on three parts: meta-schema, ontology and data.

4.1 Meta-Schema Part Deployment

In this section, we show how dependencies and canonicity identification can be made persistent in OntoDB. The meta-schema of OntoDB contains two main tables, Entity and Attribute, encoding the meta-meta-model level of the MOF architecture. To conduct our validation, the initial kernel of OntoDB must support both ontological dependencies and canonicity. To do so, we extended the meta-schema of OntoDB by incorporating the dependency and canonicity concepts. First, we developed a meta-model describing both dependencies and the type of concepts (CC and NCC). Figure 5.(a) shows the UML model of this meta-model (MMPCD). The following OntoQL statements instantiate the tables Entity and Attribute (Figure 5.(b)) with the class dependency description:

CREATE ENTITY #CD.LP (#its.LP.Classes REF(#Class) ARRAY)
CREATE ENTITY #CD.RP (#its.RP.Class REF(#Class))
CREATE ENTITY #CD (#its.CD.RP REF(#CD.RP), #its.CD.LP REF(#CD.LP))

Fig. 5. (a) MMPCD of meta-model, (b) Extended OntoDB by PFDs and CDs

Fig. 6. Ontology part Deployment

Fig. 7. Data part Deployment in OntoDB

Once this model is encoded, the obtained extended meta-schema gives us the possibility to store ontologies with their dependencies and class types.

4.2 Ontology Part Deployment

Once the meta-schema is extended, the LO is created. After creating the structure of a class, its dependencies should be attached. For example, the dependency U ⟶ PU is defined by the following OntoQL statement:

INSERT INTO #CD (#its.CD.RP, #its.CD.LP)
VALUES ((SELECT #oid FROM #Class c WHERE c.#name = 'PU'),
        (SELECT #oid FROM #Class c WHERE c.#name = 'University'))
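For readers without access to OntoDB, the idea of attaching a dependency to stored classes can be approximated in a plain relational setting. The Python sketch below uses the standard sqlite3 module with a hypothetical three-table schema (onto_class, cd, cd_lp); it is not OntoQL and does not reflect the actual OntoDB meta-schema. It stores the same dependency as the statement above, with University as left part and PU as right part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relational counterpart of the #Class, #CD and #CD.LP entities.
conn.executescript("""
CREATE TABLE onto_class (oid INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cd (oid INTEGER PRIMARY KEY,
                 rp INTEGER NOT NULL REFERENCES onto_class(oid));
CREATE TABLE cd_lp (cd_oid INTEGER NOT NULL REFERENCES cd(oid),
                    class_oid INTEGER NOT NULL REFERENCES onto_class(oid));
""")
conn.executemany(
    "INSERT INTO onto_class (name) VALUES (?)",
    [("University",), ("PU",), ("PRU",)],
)


def add_dependency(conn, left, right):
    """Store the dependency left -> right, where left is a list of class names."""
    rp = conn.execute(
        "SELECT oid FROM onto_class WHERE name = ?", (right,)
    ).fetchone()[0]
    cd_oid = conn.execute("INSERT INTO cd (rp) VALUES (?)", (rp,)).lastrowid
    conn.executemany(
        "INSERT INTO cd_lp (cd_oid, class_oid) "
        "SELECT ?, oid FROM onto_class WHERE name = ?",
        [(cd_oid, name) for name in left],
    )
    return cd_oid


add_dependency(conn, ["University"], "PU")

# Read the stored dependencies back as (left class, right class) pairs.
rows = conn.execute("""
SELECT l.name, r.name
FROM cd
JOIN cd_lp ON cd_lp.cd_oid = cd.oid
JOIN onto_class AS l ON l.oid = cd_lp.class_oid
JOIN onto_class AS r ON r.oid = cd.rp
""").fetchall()
```

Keeping the left part in its own table allows a dependency to have several left-hand classes, which a single column could not express.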

Instantiation of dependencies between properties is handled in a similar way. To identify CC and NCC (Step 2), we use the algorithm described in Section 3.1. To ensure the subsumption relationships (Step 4), classes are placed in the ontology hierarchy of OntoDB (subclassOf table) using Pellet 1.5.2 [3]. Figure 6 presents the deployment results in the ontology part.

4.3 Data Part Deployment

In this part, we store the generated normalized logical model related to the ontology in use (Step 3). We exploit the PFD defined on each CC to generate normalized relations per CC. For NCC, views are generated. For instance, a view corresponding to the NCC University (U ≡ PU ∪ PRU) is generated as follows:

CREATE VIEW University AS ((SELECT * FROM PRU) UNION (SELECT * FROM PU))
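The generation of a view for a NCC defined as a union can be tried out on any SQL engine. The Python sketch below uses the standard sqlite3 module rather than OntoDB, with hypothetical tables PU(p1, p2, p3) and PRU(p1, p2, p4) that follow the shape of the example in Step 3. Since a UNION needs the same columns on both sides, the hypothetical helper union_view_ddl projects only the properties shared by all operand tables; SELECT * is only valid when the operand tables have identical columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical canonical classes stored as tables with partly shared properties.
conn.executescript("""
CREATE TABLE PU  (p1 TEXT, p2 TEXT, p3 REAL);
CREATE TABLE PRU (p1 TEXT, p2 TEXT, p4 REAL);
INSERT INTO PU  VALUES ('u1', 'a', 10.0);
INSERT INTO PRU VALUES ('u2', 'b', 5.0);
""")


def union_view_ddl(conn, view, tables):
    """Build a CREATE VIEW statement over the columns shared by all tables."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [
        [row[1] for row in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    ]
    shared = [c for c in columns[0] if all(c in other for other in columns[1:])]
    selects = " UNION ".join(
        f"SELECT {', '.join(shared)} FROM {t}" for t in tables
    )
    return f"CREATE VIEW {view} AS {selects}"


ddl = union_view_ddl(conn, "University", ["PU", "PRU"])
conn.execute(ddl)
rows = sorted(conn.execute("SELECT p1, p2 FROM University"))
```

A query on University then returns the instances of both tables without the user having to know how PU and PRU are stored.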

Figure 7 shows an example of the generated normalized logical model of the CC PublicUniversity and the NCC University. To date, the proposed methodology has been validated by a deployment on a third-type architecture with OntoDB. However, it can be deployed on any database system whose ontology model supports dependencies between ontological concepts. Our approach generates a normalized logical model from a given domain ontology. Once the logical modeling phase is completed, several scripts in different languages (SQL, OntoQL, XML, etc.) are generated and run on the appropriate database.

5 Conclusion

This paper presented a five-step methodology for the consistent design of an OBDB from the conceptual model to the logical model. The approach considers LOs as conceptual models and borrows formal techniques from graph theory for dependency analysis, from description logics reasoning for class placement and from relational database theory for creating relational views. To the best of our knowledge, this work is the only one that considers different types of dependencies between ontological concepts at different design levels of an integrated methodology. The successive refinements of our approach preserve the functionalities offered by the LO. The approach is based on formal models and is independent of the chosen OBDB architecture. A deployment of our methodology in a particular OBDB system (OntoDB) is proposed. Currently, we are studying the deployment of OBDB on different database architectures and storage models in order to cover the different phases of the life cycle of OBDBs.


References

1. Ausiello, G., D’Atri, A., Saccà, D.: Graph algorithms for functional dependency manipulation. Journal of the ACM 30(4), 752–766 (1983)
2. Bellatreche, L., Aït Ameur, Y., Chakroun, C.: A design methodology of ontology based database applications. Logic Journal of the IGPL (2010)
3. Evren, S., Bijan, P.: Pellet: An OWL DL reasoner. In: International Workshop on Description Logics (DL 2004), pp. 6–8 (2004)
4. Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In: Horrocks, I., Hendler, J. (eds.) ISWC 2002. LNCS, vol. 2342, pp. 54–68. Springer, Heidelberg (2002)
5. Calbimonte, J.P., Porto, F., Maria Keet, C.: Functional dependencies in OWL ABox. In: Brazilian Symposium on Databases (SBBD), pp. 16–30 (2009)
6. Chaudhuri, S., Narasayya, V.: Self-tuning database systems: A decade of progress. In: VLDB, pp. 3–14 (September 2007)
7. Chen, L., Martone, M.E., Gupta, A., Fong, L., Wong-Barnum, M.: OntoQuest: Exploring ontological data made easy. In: Proceedings of the International Conference on Very Large Databases, pp. 1183–1186 (2006)
8. Das, S., Chong, E.I., Eadon, G., Srinivasan, J.: Supporting ontology-based semantic matching in RDBMS. In: VLDB, pp. 1054–1065 (2004)
9. Dehainsala, H., Pierra, G., Bellatreche, L.: OntoDB: An Ontology-based Database for Data Intensive Applications. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 497–508. Springer, Heidelberg (2007)
10. Fankam, C., Jean, S., Bellatreche, L., Aït-Ameur, Y.: Extending the ANSI/SPARC Architecture Database with Explicit Data Semantics: An Ontology-based Approach. In: Morrison, R., Balasubramaniam, D., Falkner, K. (eds.) ECSA 2008. LNCS, vol. 5292, pp. 318–321. Springer, Heidelberg (2008)
11. Golfarelli, M., Rizzi, S.: Data Warehouse Design: Modern Principles and Methodologies. McGraw Hill (2009)
12. Jean, S., Aït Ameur, Y., Pierra, G.: Querying ontology based databases - the OntoQL proposal. In: Proceedings of the 18th International Conference on Software Engineering & Knowledge Engineering (SEKE 2006), pp. 166–171 (2006)
13. Jean, S., Pierra, G., Aït Ameur, Y.: Domain ontologies: A database-oriented analysis. In: WEBIST (1), pp. 341–351 (2006)
14. Nebot, V., Berlanga, R., Pérez, J.M., Aramburu, M.J., Pedersen, T.B.: Multidimensional Integrated Ontologies: A Framework for Designing Semantic Data Warehouses. In: Spaccapietra, S., Zimányi, E., Song, I.-Y. (eds.) Journal on Data Semantics XIII. LNCS, vol. 5530, pp. 1–36. Springer, Heidelberg (2009)
15. Pan, Z., Zhang, X., Heflin, J.: DLDB2: A scalable multi-perspective semantic web repository. In: International Conference on Web Intelligence, pp. 489–495 (2008)
16. Romero, O., Abelló, A.: Automating multidimensional design from ontologies. In: DOLAP, pp. 1–8 (2007)
17. Romero, O., Calvanese, D., Abelló, A., Rodriguez-Muro, M.: Discovering functional dependencies for multidimensional design. In: DOLAP, pp. 1–8 (2009)
18. Spyns, P., Meersman, R., Jarrar, M.: Data modelling versus ontology engineering. SIGMOD Record 31(4), 12–17 (2002)
19. Sugumaran, V., Storey, V.C.: The role of domain ontologies in database design: An ontology management and conceptual modeling environment. ACM Transactions on Database Systems 31(3), 1064–1094 (2006)