November 21, 2005

11:56

WSPC/INSTRUCTION FILE

paper

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems
© World Scientific Publishing Company

F-CUBE FACTORY: A FUZZY OLAP SYSTEM FOR SUPPORTING IMPRECISION

Miguel Delgado
Dpt. Computer Science and Artificial Intelligence, University of Granada, c/Periodista Daniel Saucedo Aranda, Granada, 18071, Spain
[email protected]

Carlos Molina∗
Dpt. Computer Science, University of Jaén, Campus Las Lagunillas, Jaén, 23071, Spain
[email protected]

Lázaro Rodríguez-Ariza
Dpt. Financial Economy and Accounting, University of Granada, Granada, 18071, Spain
[email protected]

Daniel Sánchez
Dpt. Computer Science and Artificial Intelligence, University of Granada, c/Periodista Daniel Saucedo Aranda, Granada, 18071, Spain
[email protected]

M. Amparo Vila
Dpt. Computer Science and Artificial Intelligence, University of Granada, c/Periodista Daniel Saucedo Aranda, Granada, 18071, Spain
[email protected]

Received (received date)
Revised (revised date)

The special needs of OLAP technology were the main reason for adopting a multidimensional view of the data. Modelling complex or ill-defined domains, or integrating data from semi-structured or non-structured sources (e.g. the Internet) or from sources with incompatible schemata, is complicated with crisp multidimensional models. In these situations we need a model able to manage the imprecision in the structures and data that results from the modelling and/or integration. If we want to use experts' knowledge in the analysis, we have to keep in mind that expert users are more comfortable using linguistic expressions than exact values. In this paper we present an extension of a fuzzy multidimensional model that supports the use of linguistic labels in the definition of the hierarchies, and the OLAP system that implements this model.

Keywords: Multidimensional Model, Imprecision, Linguistic label, OLAP

∗ Corresponding author

1. Introduction

Since the appearance of OLAP technology [6], different proposals have been made to support its special requirements. Two different approaches can be found in the literature. One of them is to extend the relational model to support the structures and operations typical of OLAP. The first proposal following this idea was made by Gray et al. [12]. Since then, more proposals have appeared [14], and most current relational systems include extensions to represent datacubes and operate over them. The other approach is to develop new models using a multidimensional view of the data. Many authors have proposed models of this kind [1,4,5,17].

In the early 70s, the need for flexible models and query languages to manage the ill-defined nature of information in DSS was identified [11]. Nowadays, the application of OLAP technology to other knowledge fields (e.g. medical data) and the use of semi-structured (e.g. XML) and non-structured (e.g. plain text) sources introduce new requirements for the models. Systems now need to manage imprecision in the data and need more flexible structures to represent the analysis domain. New models have appeared to manage incomplete datacubes [9] and to define facts using different levels in the dimensions [20], and some approaches use fuzzy logic [16,19].

One possibility to extend and improve the hierarchies of multidimensional schemes to handle these situations is to incorporate the knowledge of experts about those hierarchies, which is usually given in a linguistic manner. In many cases, however, the experts not only use fuzzy or imprecise classes in the concepts but also express the hierarchical relations themselves vaguely. To deal with this kind of hierarchy, in this paper we present a model able to deal with both fuzzy classes in the concepts and fuzzy relations among them. This model extends a fuzzy multidimensional model [19] which only considers vaguely defined concepts associated with the hierarchy levels.

To obtain a model as general as possible, three characteristics are desirable:

• The model ought to be able to consider crisp and fuzzy classes or relations at the same time in a concrete hierarchy.
• The model ought to be able to handle different label sets for concepts and classes at the same time in a concrete hierarchy. In particular, it ought to handle label sets with different granularity levels.
• The management of the hierarchy ought to be computationally efficient, because otherwise the model will be useless.

The paper is organized as follows. In the next section we summarize the main features of the previous fuzzy multidimensional model [19]. After that we introduce our proposal for a model able to consider both fuzzy concepts and fuzzy relations.


In section 4 the OLAP system that implements the model is presented. The last section is dedicated to conclusions and future work.

2. Fuzzy Multidimensional Model

Although there is no standard multidimensional model, we shall briefly introduce the common characteristics of the first models proposed in the literature. In classical multidimensional models, we can distinguish two different types of data: on the one hand, the facts being analyzed, and on the other, the dimensions, which are the context for the facts. Hierarchies may be defined in the dimensions [1,14,4,5]. The different levels of the dimensions allow us to access the facts at different levels of granularity. In order to do so, classical aggregation operators are needed (maximum, minimum, average, etc.). Other models, which do not define explicit hierarchies on the dimensions, use other mechanisms to change the detail level [17,7]. The model proposed by Gray et al. [12] uses a different approach. This model defines two extensions of the relational group by (rollup and cube) that are used to group the values during aggregation.

The models that define hierarchies normally use many-to-one relations, so one element in a level can only be grouped by a single value of each upper level in the hierarchy. This makes the final structure of a DataCube rigid and well defined, in the sense that, given two values of the same level in a dimension, the sets of facts relating to these values have an empty intersection.

The normal operations (roll-up, drill-down, dice, slice and pivot) are defined in almost all the models. Some of them define other operations so as to provide the end user with more functionality [1,17,7].

In this section we briefly introduce the fuzzy multidimensional model with explicit hierarchies. A more detailed description can be found in [19]. Here we only present the main concepts needed to understand the implemented model.

2.1. Fuzzy multidimensional structure

As we have mentioned, the fuzzy multidimensional model uses explicit hierarchies in the dimensions. These structures are defined using a set of levels and the relations between them, which define one or more hierarchies.

Definition 1. A dimension is a tuple $d = (l, \leq_d, l_\bot, l_\top)$ where $l = \{l_i\}$, $i = 1, ..., n$, so that each $l_i$ is a set of values $l_i = \{c_{i1}, ..., c_{in}\}$ and $l_i \cap l_j = \emptyset$ if $i \neq j$, and $\leq_d$ is a partial order relation between the elements of $l$ so that $l_i \leq_d l_k$ if $\forall c_{ij} \in l_i \Rightarrow \exists c_{kp} \in l_k \,/\, c_{ij} \subseteq c_{kp}$. $l_\bot$ and $l_\top$ are two elements of $l$ so that $\forall l_i \in l$, $l_\bot \leq_d l_i \leq_d l_\top$.

We call each element $l_i$ a level. To identify the level $l$ of the dimension $d$ we will use $d.l$. The two special levels $l_\bot$ and $l_\top$ will be called base level and top level, respectively. The partial order relation in a dimension is what gives the hierarchical relation between the levels. An example of a dimension on the ages can be found in Figure 1.


Fig. 1. Example of a hierarchy over ages

The domain of a dimension is the set of all the values that appear in all of its levels.

Definition 2. For each dimension $d$, the domain is $dom(d) = \bigcup_i l_i$.

In the example above, the domain of the dimension Age is $dom(Age) = \{1, ..., 100, Young, Adult, Old, Yes, No, All\}$.

Definition 3. For each $l_i$, the set

$$H_i = \{\, l_j \,/\, l_j \neq l_i \wedge l_j \leq_d l_i \wedge \neg\exists l_k \;\; l_j \leq_d l_k \leq_d l_i \,\} \qquad (1)$$

is called the set of children of the level $l_i$.

This set contains the levels which are directly below a certain level ($l_i$) in the hierarchy. Moreover, it gives the set of levels whose values or labels are generalized by the ones included in $l_i$. Using the same example of the dimension on the ages, the set of children of level All is $H_{All} = \{Group, Legal\ age\}$. In all the dimensions which we define, this set will always be empty for the base level (as the definition shows).

In the case of fuzzy hierarchies, an element can be related to more than one element in the upper level, and the degree of this relation is in the interval [0,1]. The kinship relation defines this degree of relationship.

Definition 4. For each pair of levels $l_i$ and $l_j$ such that $l_j \in H_i$, we have the relation

$$\mu_{ij} : l_i \times l_j \to [0, 1] \qquad (2)$$

and we call this the kinship relation. The degree of inclusion of the elements of a level in the elements of their parent levels can be defined using this relation. If we use only the values 0 and 1, and we only allow an element to be included with degree 1 in a unique element of each of its parent levels, this relation represents a crisp hierarchy. Following the example, the relation between the levels Legal age and Age is of this type. The kinship relation in this situation is

$$\mu_{LegalAge,Age}(Yes, x) = \begin{cases} 1 & \text{if } x \in [18, 100] \\ 0 & \text{in other case} \end{cases}$$

$$\mu_{LegalAge,Age}(No, x) = \begin{cases} 1 & \text{if } x \in [1, 17] \\ 0 & \text{in other case} \end{cases}$$

Fig. 2. Kinship relation between levels Group and Age
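As an illustration of Definitions 2 to 4, the Age dimension of the running example might be encoded as follows. This is only a minimal sketch: the dictionary layout and the names `LEVELS`, `CHILDREN`, `domain` and `mu_legal_age` are hypothetical and are not taken from the system described in the paper. The children of Group and Legal age are assumed from the paths used in the running example.

```python
# Hypothetical encoding of the Age dimension; all names are illustrative.
LEVELS = {
    "Age": list(range(1, 101)),        # base level
    "Group": ["Young", "Adult", "Old"],
    "Legal age": ["Yes", "No"],
    "All": ["All"],                    # top level
}

# Set of children H_i of each level (Definition 3): the levels directly below it.
CHILDREN = {
    "All": {"Group", "Legal age"},
    "Group": {"Age"},                  # assumed from the example hierarchy
    "Legal age": {"Age"},              # assumed from the example hierarchy
    "Age": set(),                      # the base level has no children
}


def domain(levels):
    """dom(d): union of the values of every level (Definition 2)."""
    values = set()
    for level_values in levels.values():
        values.update(level_values)
    return values


def mu_legal_age(label, x):
    """Crisp kinship relation between Legal age and Age: degrees are 0 or 1."""
    if label == "Yes":
        return 1.0 if 18 <= x <= 100 else 0.0
    if label == "No":
        return 1.0 if 1 <= x <= 17 else 0.0
    raise ValueError(f"unknown label: {label}")
```

Because the relation is crisp, every age is related with degree 1 to exactly one of the two labels, which is the condition stated above for a crisp hierarchy.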

If we relax these conditions and allow values in the interval [0,1] without any other limitation, we have a fuzzy hierarchical relation. This allows several hierarchical relations to be represented in a more intuitive way. An example can be seen in Figure 2, where we present the groups of ages according to linguistic labels. Furthermore, this fuzzy relation allows us to define hierarchies in which there is imprecision in the relationship between elements in different levels. In this situation, the value in the interval shows the degree of confidence in the relation.

Using the relation between elements in two consecutive levels, we can define the relation between each pair of values in different levels of a dimension.

Definition 5. For each pair of levels $l_i$ and $l_j$ of the dimension $d$ such that $l_j \leq_d l_i \wedge l_j \neq l_i$, the relation $\eta_{ij} : l_i \times l_j \to [0, 1]$ is defined as

$$\eta_{ij}(a, b) = \begin{cases} \mu_{ij}(a, b) & \text{if } l_j \in H_{l_i} \\ \bigoplus\limits_{l_k \in H_{l_i}} \bigoplus\limits_{c \in l_k} \left( \mu_{ik}(a, c) \otimes \eta_{kj}(c, b) \right) & \text{in other case} \end{cases} \qquad (3)$$

where ⊗ and ⊕ are a t-norm and a t-conorm, respectively, or operators from the families MOM and MAM defined by Yager [24], which include the t-norms and t-conorms, respectively. This relation is called the extended kinship relation.

This relation gives the degree of relation between two values in different levels of the same dimension. To obtain this value, it considers all the possible paths between the elements in the hierarchy. Each path is calculated by aggregating the kinship relations between elements in consecutive levels using a t-norm. The final value is then the aggregation of the results of all the paths using a t-conorm. As an example, we show how to calculate the value of $\eta_{All,Age}(All, 25)$. In this situation there are two different paths:

• All - Legal age - Age. Figure 3.a shows the two ways to get to 25 from All passing through the level Legal age. The result of this path is (1 ⊗ 1) ⊕ (1 ⊗ 0).
• All - Group - Age. This situation is very similar to the previous one. Figure 3.b shows the three different paths going through the level Group. The result of this path is (1 ⊗ 0.7) ⊕ (1 ⊗ 0.3) ⊕ (1 ⊗ 0).

Fig. 3. Example of the calculation of the extended kinship relation. a) path All - Legal age - Age b) path All - Group - Age

Now we have to aggregate these two values using a t-conorm to obtain the result. If we use the maximum as t-conorm and the minimum as t-norm, the result is

((1 ⊗ 1) ⊕ (1 ⊗ 0)) ⊕ ((1 ⊗ 0.7) ⊕ (1 ⊗ 0.3) ⊕ (1 ⊗ 0)) = (1 ⊕ 0) ⊕ (0.7 ⊕ 0.3 ⊕ 0) = 1 ⊕ 0.7 = 1

So the value of $\eta_{All,Age}(All, 25)$ is 1, which means that the age 25 is grouped by All in level All with degree 1.

Definition 6. We say that any pair $(h, \alpha)$ is a fact when $h$ is an m-tuple on the attribute domain which we want to analyze, and $\alpha \in [0, 1]$.

The value α controls the influence of the fact in the analysis. The imprecision of the data is managed by assigning an α value representing this imprecision. When we operate with the facts, the aggregation operators have to manage these values in the calculations. The arguments of the operator can be seen as a fuzzy bag [23,8], because they are a collection of values, each with a degree in the interval [0,1], that can be duplicated. The result of the aggregation has to be a fact too. So, in the fuzzy case, the definition of aggregation operators is the following.

Definition 7. Let $\tilde{B}(X)$ be the set of all the possible fuzzy bags defined using elements in $X$, $\tilde{P}(X)$ the fuzzy power set of $X$, and $D_x$ a numeric or natural domain. We define an aggregation operator $G$ as a function $G : \tilde{B}(D_x) \to \tilde{P}(D_x) \times [0, 1]$.

When we apply an aggregation operator, we summarize the information of a bag of values into a single value. It is not always possible to undo this operation. So, if we want to undo operations that reduce the level of detail in a DataCube, we need something to prevent this problem. For this purpose we define the object history, which stores the aggregation states of a DataCube.


Definition 8. An object of type history is the recursive structure

$$H^0 = \Omega, \qquad H^{n+1} = (A, l_b, F, G, H^n) \qquad (4)$$

where

• Ω is the recursive closure,
• $F$ is the fact set,
• $l_b$ is a set of levels $(l_{1b}, ..., l_{nb})$,
• $A$ is an application from $l_b$ to $F$ ($A : l_b \to F$),
• $G$ is an aggregation operator.
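The recursive shape of Definition 8 can be pictured as a linked list of aggregation states. The sketch below is only one possible reading: the class name `History`, its field names and the use of `None` to stand for Ω are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class History:
    """One stored state (A, l_b, F, G, H^n); None plays the role of Omega."""
    A: dict                                # application from coordinates to facts
    base_levels: tuple                     # l_b, the levels the facts are defined on
    facts: frozenset                       # F, the fact set
    G: Callable                            # aggregation operator that was applied
    previous: Optional["History"] = None   # H^n, the state stored before this one


def depth(history):
    """Number of stored aggregation states; Omega (None) has depth 0."""
    n = 0
    while history is not None:
        n += 1
        history = history.previous
    return n
```

Under this reading, undoing the most recent aggregation amounts to taking the stored `A`, `base_levels` and `facts` of the latest state and continuing with `previous` as the new history.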

This structure enables detail levels of the DataCube to be stored while it is operated on, so that it may be restored to a previous level of granularity.

Now we can define the structure of a fuzzy DataCube. A DataCube can be considered to be the union of a set of facts (the variables to analyze) and a set of dimensions (the context of the analysis). In order to relate the facts and dimensions, we need an application which, for each combination of values of the dimensions, gives the fact related to these coordinates in the multidimensional space defined by the dimensions. In addition to these DataCube features, there are also the levels which establish the detail level at which the facts are defined, and a history-type object to keep the aggregation states during the operations. The DataCube is therefore defined in the following way.

Definition 9. A DataCube is a tuple $C = (D, l_b, F, A, H)$ such that

• $D = (d_1, ..., d_n)$ is a set of dimensions,
• $l_b = (l_{1b}, ..., l_{nb})$ is a set of levels such that $l_{ib}$ belongs to $d_i$,
• $F = R \cup \emptyset$ where $R$ is the set of facts and ∅ is a special symbol,
• $H$ is an object of type history,
• $A$ is an application defined as $A : l_{1b} \times ... \times l_{nb} \to F$, giving the relation between the dimensions and the facts defined.

If for $\vec{a} = (a_1, ..., a_n)$, $A(\vec{a}) = \emptyset$, this means that no fact is defined for this combination of values. Normally, not all the combinations of level values have facts defined. This situation is shown by the symbol ∅ when the application $A$ is defined.

The basis of the analysis will be a DataCube defined at the most detailed level. We shall then refine the information while operating on the DataCube. Such a DataCube is called basic.

Definition 10. We say a DataCube is basic if $l_b = (l_{1\bot}, ..., l_{n\bot})$ and $H = \Omega$.

2.2. Operations

Once we have the structure of the multidimensional model, we need the operations to analyze the data in the DataCube. Over this structure we have defined the normal operations of the multidimensional model:


• Roll-up: go up in the hierarchies to reduce the detail level. In this operation we need to know the facts related to each value in the desired level. The set of facts is obtained using the kinship relations as follows:

Definition 11. For each value $c_{ij}$ belonging to $l_r$, we have the set

$$F_{c_{ij}} = \begin{cases} \bigcup\limits_{l_k \in H_{l_r}} F_{c_{kp}} \,/\, c_{kp} \in l_k \wedge \mu_{ik}(c_{ij}, c_{kp}) > 0 & \text{if } l_r \neq l_b \\ \{h \,/\, h \in F \wedge \exists \vec{c} \;\; A(\vec{c}) = h\} & \text{if } l_r = l_b \end{cases} \qquad (5)$$

where $\vec{c} = (c_{1b}, ..., c_{ij}, ..., c_{nb})$.

Once we have the facts for each value, we have to aggregate them to obtain a new fact according to the new detail level. The influence of each fact in the aggregation will depend on the relation of the fact with the value considered and on the α value assigned to the fact. So we need fuzzy operators for this process.
• Drill-down: go down in the hierarchies to increase the detail level. In this operation we use the history-type object described above. This structure stores the initial aggregation state when roll-up operations are applied, so using the information stored in it we can return to a previous detail level.
• Dice: project over the DataCube using a condition. In this operation we have to identify the values in the dimension that satisfy the condition or that are related to a value that satisfies the condition. This relation is obtained using the kinship relation. Once we reduce the values in the dimension, we have to eliminate the facts whose coordinates have been removed.
• Slice: reduce the dimensionality of the DataCube. When we apply this operation we eliminate one dimension of the DataCube, so we have to adapt the granularity of the facts using a fuzzy aggregation operator.
• Pivot: change the order of the dimensions. This operation does not affect the facts, only the order of the coordinates that define them.
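The roll-up step in the list above can be sketched for a single dimension as follows. Several things here are assumptions made only for illustration: the kinship degrees for ages 40 and 70 and all the facts are made-up data, the weight of a fact is taken as the minimum of its α value and its kinship degree, and a weighted average stands in for the fuzzy aggregation operator. The text above only requires that both degrees influence the aggregation.

```python
# Kinship degrees mu(group, age). The degrees 0.7 and 0.3 for age 25 are the ones
# used in the worked example (their assignment to Young and Adult is assumed);
# the degrees for the other ages are made up for illustration.
MU_GROUP_AGE = {
    ("Young", 25): 0.7, ("Adult", 25): 0.3,
    ("Adult", 40): 1.0,
    ("Adult", 70): 0.2, ("Old", 70): 0.8,
}

# Hypothetical base-level facts: age -> (value h, degree alpha).
FACTS = {25: (10.0, 1.0), 40: (20.0, 0.5), 70: (30.0, 1.0)}


def related_facts(group):
    """Facts whose base value is related to `group` with degree > 0 (Definition 11)."""
    related = []
    for (g, age), mu in MU_GROUP_AGE.items():
        if g == group and mu > 0 and age in FACTS:
            h, alpha = FACTS[age]
            related.append((h, alpha, mu))
    return related


def roll_up(group):
    """Aggregate the related facts; each fact is weighted by min(alpha, mu) (assumed)."""
    weighted = [(h, min(alpha, mu)) for h, alpha, mu in related_facts(group)]
    total = sum(w for _, w in weighted)
    if total == 0:
        return None  # no fact is defined for this value
    return sum(h * w for h, w in weighted) / total
```

With this data, `roll_up("Adult")` combines the three facts with weights 0.3, 0.5 and 0.2, so a fact that is only weakly related to Adult, or that has a low α, contributes less to the result.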

The properties of the operations have been studied in [19].

2.3. User view

We have presented a structure that manages imprecision by means of fuzzy logic. We need to use aggregation operators on fuzzy bags in order to apply some of the operations presented. Most of the methods previously documented give a fuzzy set as a result. As this can make the result difficult to understand and use in a decision process, we propose a two-layer model: one of the layers is the structure presented in the previous section; the other is defined on top of it, and its main objective is to hide the complexity of the model and provide the user with a more understandable result. In order to do so, we propose the use of a fuzzy summary operator that gives a more intuitive result but keeps as much information as possible. Using this type of operator, we shall define the user view.


Fig. 4. Graphical way to represent fuzzy numbers

Definition 12. Given a summary operator $M$, we define the user view of a DataCube $C = (D, l_b, F, A, H)$ using $M$ as the structure $C_M = (D, l_b, F_M, A_M)$ where $A_M(a_1, ..., a_n) = M(A(a_1, ..., a_n))$ and $F_M$ is the range of $A_M$.

We can define as many user views of a DataCube as the number of summary operators used. Therefore, each user can have their own user view of a DataCube, with the most intuitive view of the data according to their preferences. As an example of this type of operator, we can use the one proposed by Blanco et al. [3]. This operator proposes the use of the fuzzy number that best fits, in the sense of fuzziness, the fuzzy set or fuzzy bag. We can also use simpler operators such as the weighted average. As an example we shall apply both operators to the fuzzy bag {1/1, 1/2, 0.9/0.5, 0.8/2.3, 0.2/0.3, 0.1/2.5}:

• Linguistic summary. Using this operator, the result is (1,2,1,0.5), whose associated linguistic expression is "more or less between 1 and 2".
• Weighted average. In this situation, the value shown to the user is 1.4.

As can be seen, in both cases the user gets a more intuitive access to the results.

Giving an intuitive way to interpret the result is important, as shown by Codd et al. in the 11th OLAP product evaluation rule [6]. Most of the time the user will understand a graphic better than a table of results. Present systems use charts to show the result to the decision maker. In our model, providing a graphical representation is even more important, because interpreting fuzzy values is complicated even for experts in fuzzy logic.

We propose two methods to represent fuzzy numbers graphically as a user view. Both approaches are shown in Figure 4. In Figure 4.a the approach is to use a color gradient to represent the membership degree of the values. The other approach (Figure 4.b) consists of changing the width of a bar to represent the membership. Both can be used to build charts. An example is shown in Figure 5. This example represents fuzzy values related to crisp ones (the labels). In some situations, representing fuzzy values related to fuzzy labels can be interesting. This can be done following the first approach: we aggregate the membership values on both axes using a t-norm, and use the result to build the color gradient. Figure 6 shows an example of a chart where the labels are defined using linguistic labels.
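The weighted-average summary of the example fuzzy bag can be reproduced with a few lines. In this sketch each element of the bag is written as a (membership, value) pair, following the membership/value notation of the bag above; the function name `weighted_average` is only illustrative.

```python
def weighted_average(bag):
    """Average of the values of a fuzzy bag, weighted by their membership degrees."""
    total = sum(membership for membership, _ in bag)
    return sum(membership * value for membership, value in bag) / total


# The fuzzy bag {1/1, 1/2, 0.9/0.5, 0.8/2.3, 0.2/0.3, 0.1/2.5} as (membership, value) pairs.
bag = [(1.0, 1.0), (1.0, 2.0), (0.9, 0.5), (0.8, 2.3), (0.2, 0.3), (0.1, 2.5)]

summary = round(weighted_average(bag), 6)  # 1.4, the value shown to the user
```

The weighted sum of the values is 5.6 and the sum of the memberships is 4, which gives the value 1.4 quoted in the example.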


Fig. 5. Example of fuzzy charts

Fig. 6. Example of a chart with two fuzzy axes using different t-norms. A) product B) minimum C) Łukasiewicz
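The three t-norms named in Figure 6 are standard and can be written down directly. The sketch below shows how the membership degrees on the two fuzzy axes could be combined into the single degree that drives the color gradient; the function `cell_intensity` is a hypothetical helper, not part of the system described here.

```python
def t_product(a, b):
    """Product t-norm."""
    return a * b


def t_minimum(a, b):
    """Minimum t-norm."""
    return min(a, b)


def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def cell_intensity(mu_x, mu_y, t_norm):
    """Combine the membership degrees on both axes into one gradient degree."""
    return t_norm(mu_x, mu_y)
```

For two memberships of 0.5 the three t-norms give 0.25, 0.5 and 0 respectively, so the choice of t-norm changes how quickly the gradient fades away from the core of the labels.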

3. The Linguistic DataCube

One possibility to extend and improve the hierarchies of multidimensional schemes is to incorporate the knowledge of experts about those hierarchies, which is usually given in a linguistic manner. In many cases, however, the experts not only use fuzzy or imprecise classes in the concepts but also express the hierarchical relations themselves vaguely. Considering a hierarchy with linguistic assessments is equivalent to considering the kinship and the extended kinship relations to be linguistic. In the following, $\widetilde{[0, 1]}$ will denote the set of fuzzy numbers on the unit interval.


3.1. Aggregation operators

To aggregate labels with different granularity levels, in particular numerical values, three alternative approaches can be found:

• To transform all labels to the finest current granularity and to operate at this level by means of the appropriate arithmetic. The main problem with this approach is that the imprecision associated with the levels of coarser granularity is reduced when their labels are translated to more precise ones. When the finest current granularity level is the numerical one, this approach implies associating a characteristic value with every label or fuzzy subset by means of some procedure (the center of gravity of the fuzzy subset and the modal value are usual choices) and operating with "classical" arithmetic.
• To transform all labels to the coarsest current granularity and to operate at this level by means of the suitable arithmetic. With this approach, additional imprecision is added to the values with finer granularity. Extended operators (by means of the Extension Principle) have to be used anyway. This complicates the design and reduces the efficiency of the global procedures.
• To operate with the current labels by means of extended operators, applied to combine fuzzy subsets with different granularity levels (some of them possibly being numerical values). As in the previous case, this complicates the design and reduces the efficiency of the global procedures.

To cope with our particular problem of defining the operations, we will introduce an aggregation operator for the extended kinship relation that tries to avoid the above-mentioned drawbacks. It will be an extension of the well-known OWA operator by Yager [25].

Definition 13. Let $w$ be any n-dimensional weight vector. The function $OWA_w : R^n \to R$

$$OWA(a_1, ..., a_n) = \sum_{i=1}^{n} w_i a_{\sigma_i} \qquad (6)$$

where $\{\sigma_1, ..., \sigma_n\}$ is a permutation of $(1, ..., n)$ such that $a_{\sigma_{i-1}} \geq a_{\sigma_i}$ for all $i \in \{2, ..., n\}$, is called the Ordered Weighted Average (OWA) of dimension $n$.

It is well known that $OWA(a_1, ..., a_n)$ always lies in the interval $[Min(a_1, ..., a_n), Max(a_1, ..., a_n)]$, and its relative position in this interval depends on the weight values. When $(a_1, ..., a_n)$ are boolean, this property is equivalent to saying that $OWA(a_1, ..., a_n)$ lies in $[AND(a_1, ..., a_n), OR(a_1, ..., a_n)]$. The function $\alpha : [0, 1]^n \to [0, 1]$

$$\alpha(w) = \frac{1}{n-1} \sum_{i=1}^{n} (n - i) w_i \qquad (7)$$


measures the relative position of $OWA(a_1, ..., a_n)$ in the corresponding interval. If the value is near 1, the operator behaves like OR. On the other hand, if the value is near 0, the operator behaves like AND.

To extend the roll-up operation we need to aggregate linguistic labels, and we propose to use an extended OWA operator to do so. Two tools are needed to define such an operator: a method to order labels and the extended arithmetic operations for the underlying fuzzy numbers. Several definitions of linguistic OWA may be found in the literature [10,18]. The following definition establishes a very general framework for the characterization of a fuzzy OWA.

Definition 14. Let $w$ be any n-dimensional weight vector and $OM$ a ranking method for fuzzy numbers. The function $\widetilde{OWA}^{OM}_w : \tilde{R}^n \to \tilde{R}$ given by

$$\widetilde{OWA}^{OM}(\tilde{a}_1, ..., \tilde{a}_n) = \bigoplus_{i} w_i \tilde{a}_{\sigma_i} \qquad (8)$$

where $\{\sigma_1, ..., \sigma_n\}$ is a permutation of $(1, ..., n)$ such that $\tilde{a}_{\sigma_{i-1}} \leq_{OM} \tilde{a}_{\sigma_i}$ for $i \in \{2, ..., n\}$ and $\bigoplus$ is an extended addition, will be called the Fuzzy Ordered Weighted Average (FOWA).

For the aggregation in the roll-up operation we will use a restricted version of the FOWA above, associated with a weight vector with only two non-zero values. Concretely, we will use the operator associated with a weight vector $w$ such that $w_1 = \beta$, $w_n = 1 - \beta$, $w_i = 0$ for $i \in \{2, ..., n - 1\}$, β being a value in [0,1]. This operator will be denoted $A^{OM}_{\beta} : \widetilde{[0, 1]}^n \to \widetilde{[0, 1]}$.

It is not difficult to show that $A^{OM}_{\beta}$ verifies $\alpha(w) = \beta$, and thus β allows us to define the behavior of $A^{OM}_{\beta}$ according to the discussion above.

3.2. The linguistic DataCube

To use linguistic expressions in the hierarchical relations, the first element to modify is the kinship relation.

Definition 15. For each pair of levels $l_i$ and $l_j$ with $l_j \in H_i$ there exists a relation $\tilde{\mu}_{ij} : l_i \times l_j \to \widetilde{[0, 1]}$ which is called the kinship relation.

Now we have to change the definition of the extended kinship relation to be able to manage the labels used in the hierarchical relations. To calculate the value we use the operator $A^{OM}_{\beta}$ presented in the previous section.

Definition 16. For any levels $l_i$ and $l_j$ of dimension $d$ such that $l_j \leq_d l_i$ and $l_j \neq l_i$, the extended kinship relation $\tilde{\eta}_{ij} : l_i \times l_j \to \widetilde{[0, 1]}$ is given by

$$\tilde{\eta}_{ij}(a, b) = \begin{cases} \tilde{\mu}_{ij}(a, b) & \text{if } l_j \in H_{l_i} \\ A^{OM}_{\beta}(P_{l_1}, ..., P_{l_n}) & \text{in other case} \end{cases} \qquad (9)$$

where $l_k \in H_{l_i}$ and $P_{l_k} = A^{OM}_{\beta}(\delta_{c_1}, ..., \delta_{c_m})$ $\forall c_i \in l_k$, with $\delta_c = A^{OM}_{1-\beta}(\tilde{\mu}_{ik}(a, c), \tilde{\eta}_{kj}(c, b))$.
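To make the role of β concrete, the sketch below implements the crisp OWA of Definition 13, the measure α(w) of equation (7), and a crisp counterpart of the restricted operator with $w_1 = \beta$ and $w_n = 1 - \beta$. It works on plain numbers in [0,1] instead of fuzzy numbers, so neither the ranking method OM nor the extended addition is modelled; the function names are illustrative.

```python
def owa(weights, values):
    """Ordered Weighted Average: weights are applied to the values sorted in decreasing order."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * a for w, a in zip(weights, ordered))


def orness(weights):
    """alpha(w) = 1/(n-1) * sum_i (n - i) * w_i, with i starting at 1."""
    n = len(weights)
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


def a_beta(beta, values):
    """Crisp counterpart of the restricted operator: w_1 = beta, w_n = 1 - beta."""
    n = len(values)
    if n == 1:
        return values[0]
    weights = [beta] + [0.0] * (n - 2) + [1.0 - beta]
    return owa(weights, values)
```

On crisp values the restricted operator reduces to β times the maximum plus (1 - β) times the minimum, so β = 1 behaves like OR and β = 0 like AND, and `orness` applied to its weight vector returns β.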

According to this new characterization of the extended kinship relation, the definition of the facts related to a value in a dimension also changes.

Definition 17. For any value $c_{ij} \in l_i$ belonging to the domain of a dimension $d$, the set

$$\tilde{F}_{c_{ij}} = \begin{cases} \bigcup\limits_{l_k \in H_{l_r}} \tilde{F}_{c_{kp}} \,/\, c_{kp} \in l_k \wedge \tilde{\mu}_{ik}(c_{ij}, c_{kp}) \neq \tilde{0} & \text{if } l_r \neq l_b \\ \{h \,/\, h \in F \wedge \exists \vec{c} \;\; A(\vec{c}) = h\} & \text{if } l_r = l_b \end{cases} \qquad (10)$$

will be called the set of facts related to $c_{ij}$.

The operations that have to be adapted are the ones that use the hierarchical relations during their application. These operations are roll-up and dice.

When we apply roll-up we have to aggregate the fact values to adapt the detail level. As we have mentioned, the influence of each fact will depend on its associated α value and on its relation with the value in the level. We obtain a single value by aggregating both, and we apply a fuzzy aggregation operator that manages this value. When linguistic labels are used, we apply a defuzzification method to obtain the single value needed in the aggregation process. So, the definition of the roll-up operation is extended as follows.

Definition 18. Let $Df : \widetilde{[0, 1]} \to [0, 1]$ be any defuzzification method. The roll-up through dimension $d$, level $l_r$ ($l_r \neq l_\bot$), by the aggregation operator $G$ on the DataCube $C = (D, l_b, F, A, H)$ is another DataCube $C' = (D, l'_b, F', A', H')$ where

• $l'_b = (l_{1b}, ..., l_r, ..., l_{nb})$,
• $A'(c_{1b}, ..., c_{rj}, ..., c_{nb}) = G(\{(b, Df(A^{OM}_{1-\beta}(\alpha, \tilde{\eta}_{rb}(c_{rj}, c_{rb})))) \,/\, (b, \alpha) \in \tilde{F}_{c_{rj}} \wedge A(c_{1b}, ..., c_{rb}, ..., c_{nb}) = (b, \alpha)\})$,
• $F'$ is the range of $A'$,
• $H' = (A, l_b, F, G, H)$.

Dice is the other operation that has to be adapted to work with the linguistic hierarchies. Using the new extended kinship relation, its definition is the following.

Definition 19. The result of applying dice using the condition β over the level $l_r$ of dimension $d_i$ in the DataCube $C = (D, l_b, F, A, H)$ is another DataCube $C' = (D', l'_b, F', A', \Omega)$ where:

• $D' = \{d_1, ..., d'_i, ..., d_n\}$ where $d'_i = (l', \leq_{d_i}, l'_b, l_\top)$ with $l' = \{l_j \,/\, l_b \leq_{d_i} l_j\}$ and

$$d'_i.l'_j = \begin{cases} \{c_{jk} \,/\, c_{jk} \in l_j \wedge \beta(c_{jk})\} & \text{if } l'_j = l_r \\ \{c_{jk} \,/\, c_{jk} \in d_i.l_j \wedge \delta_{rj}(c_{jk})\} & \text{if } l'_j \leq_d l_r \\ \{c_{jk} \,/\, c_{jk} \in d_i.l_j \wedge \delta_{jr}(c_{jk})\} & \text{if } l_r \leq_d l'_j \end{cases}$$

where $\delta_{ij}(c) = \exists c_r \in l_r \;\; \beta(c_r) \wedge \tilde{\eta}_{ij}(c_r, c) \neq \tilde{0}$,
• $A'(c_{1b}, ..., c_{ib}, ..., c_{nb}) = (h, \alpha) \,/\, c_{1b} \in d'_1.l'_b \wedge ... \wedge c_{nb} \in d'_n.l'_b \wedge A(c_{1b}, ..., c_{nb}) = (h, \alpha)$,
• $F'$ is the range of $A'$.

Let us observe that in these definitions we are using β and 1 − β as dual values for aggregation. It is quite possible to substitute 1 − β by another β′, obtaining a valid characterization as well. The choice of β and β′ is a matter of design and will depend on the problem domain and the needs of the decision maker.

4. F-Cube Factory

The fuzzy multidimensional model and the linguistic extension presented have been implemented in an OLAP system prototype. The system is completely built in the Java language, and it was designed keeping in mind future extensions of the multidimensional model. The software currently implements three DataCube models:

• ROLAP model: the system can manage DataCubes using a relational database to store the DataCube and to obtain the data to build new DataCubes.
• MOLAP crisp model: DataCubes are also stored using a purely multidimensional structure implemented in Java.
• MOLAP fuzzy/linguistic model: this model implements the fuzzy and linguistic multidimensional model presented. It uses a MOLAP approach to manage the fuzzy DataCubes.

Although the system supports ROLAP and MOLAP models, we think it cannot be considered a Hybrid OLAP (HOLAP) system [22,2], because each DataCube is built using only one approach.

We can differentiate two main parts in the system: the server, which implements the main functionality, and the clients, which are the user's interface to the server functionality and try to give a simple and intuitive access to the DataCube. In the next sections we present some details of each part of the system.

4.1. F-Cube Factory server

The server architecture is shown in Figure 7. The most important modules in the server are these:

• DataCubes module: this module implements the three DataCube models previously mentioned.
It gives the rest of the modules homogeneous access to the multidimensional structure. One of its main functionalities is answering queries. Efficiency is very important because OLAP systems have to support ad hoc queries in a reasonable time. In the fuzzy DataCube this is even more important, because each query implies the aggregation of a large number of kinship relations. To improve efficiency, the system pre-computes the extended kinship relations from each level to the basic level. This task is carried out when the fuzzy DataCube is built. A DataCube is built once and then used for several queries, so the time spent aggregating the kinship relations is incurred at a point where the user does not suffer the delay. This module also includes the user views for the fuzzy DataCubes. Adding a new user view to the server is straightforward: one only needs to extend a Java class and register it in the server configuration. A user view is calculated only the first time the system needs the fact; the result is stored and reused whenever the system needs it again.

Fig. 7. Architecture of F-Cube Factory

• Aggregation functions module: this module interacts with the previous one when we want to change the detail level, which translates into a query. It implements the usual functions for crisp DataCubes (max, min, sum, average and count) as well as fuzzy ones, the latter using an adaptation of Rundensteiner and Bic's operators.

Definition 20. Let R be an operator defined by Rundensteiner and Bic [21], and $\tilde{F}$ a fuzzy bag over the facts. We define the operator $G_R$ as $G_R(\tilde{F}) = (R(\tilde{F}'), 1)$, where $\tilde{F}' = \{\alpha/h \text{ such that } (h, \alpha) \in \tilde{F}\}$.

Adding a new aggregation function is as easy as adding a user view.
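As a rough illustration of Definition 20, the following Python sketch wraps an arbitrary aggregation operator R so that it accepts a fuzzy bag of facts and returns a fact with membership degree 1. It is not taken from F-Cube Factory, which is written in Java. The names `g_r` and `weighted_mean` are hypothetical, repeated values are simply kept as separate entries (an assumption, since the merging of duplicates is not specified here), and `weighted_mean` is only a stand-in for R, not one of Rundensteiner and Bic's operators.

```python
from typing import Callable, List, Tuple

# A fact in the fuzzy bag: (h, alpha) = (fact value, membership degree).
Fact = Tuple[float, float]
# The re-expressed collection F': (alpha, h) pairs, written alpha/h in the text.
FuzzySet = List[Tuple[float, float]]


def g_r(r: Callable[[FuzzySet], float], fuzzy_bag: List[Fact]) -> Tuple[float, float]:
    """Sketch of G_R: rewrite the bag as alpha/h pairs, apply r, attach degree 1."""
    f_prime = [(alpha, h) for (h, alpha) in fuzzy_bag]
    return (r(f_prime), 1.0)


def weighted_mean(fs: FuzzySet) -> float:
    """Hypothetical stand-in for R (assumption): membership-weighted mean.

    This is NOT an operator from Rundensteiner and Bic; it only makes
    the sketch executable.
    """
    total = sum(alpha for alpha, _ in fs)
    if total == 0:
        raise ValueError("all membership degrees are zero")
    return sum(alpha * h for alpha, h in fs) / total


# Three facts with membership degrees 1.0, 0.5 and 0.5.
bag = [(10.0, 1.0), (20.0, 0.5), (40.0, 0.5)]
result = g_r(weighted_mean, bag)
print(result)  # (20.0, 1.0)
```

Passing R as a parameter mirrors the way the definition is stated for any operator R, so a different operator can be plugged in without changing `g_r`.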


Fig. 8. F-Cube Factory web client

• OLAM module: Data Mining methods are useful for improving the analysis processes. When an OLAP system is used to support the data, these processes are called On-Line Analytical Mining (OLAM) [13]. If fuzzy logic is used in the methods or in the DataCube structure, they are called Fuzzy OLAM [15]. We are currently developing Fuzzy OLAM techniques over the fuzzy/linguistic multidimensional model.
• Server API module: this module implements the API that gives access to all the functionality of the server. It is the access point for the clients.

4.2. F-Cube Factory client

The main objectives of the client are:

• The client has to be light enough to be used on a normal personal computer.
• Most importantly, it has to provide intuitive access to the server functionality.

The client is web based, so the user only needs to access a web site using a normal web browser. Figure 8 shows the user interface. The user only has to select the required options to access a DataCube, without needing to know any specific DML or DDL language. The DataCubes resulting from queries are shown to the user as tables (Figure 9) and charts (Figure 5 was built using this functionality) for all types of DataCubes.

Fig. 9. Tables for a DataCube: a) without user views; b) using the Linguistic Summary user view.

5. Conclusions and Future Work

The use of experts' knowledge may improve the analysis possibilities when using DataCubes. In many cases this knowledge will be given in a linguistic manner. The multidimensional model has to be able to manage concepts and the relations between them using this linguistic data. In this paper we have presented an extension of a multidimensional model that uses linguistic labels in the hierarchical relations. This extension implies the definition of a new aggregation operator, based on Yager's OWA, to obtain a completely functional DataCube model. We have also presented an OLAP system prototype that implements the fuzzy multidimensional model and the linguistic extension. The next step is to study data mining techniques on the proposed fuzzy/linguistic multidimensional model to improve the analysis possibilities, and to implement them in the system presented.

References

1. R. Agrawal, A. Gupta, and S. Sarawagi. Modeling multidimensional databases. Technical report, IBM Almaden Research Center, September 1995.
2. Barbara Dinter, Carsten Sapia, Markus Blaschka, and Gabriele Höfling. OLAP market and research: Initiating the cooperation. Journal of Computer and Information Management, 2(3), 1999.
3. Ignacio Blanco, Daniel Sánchez, José M. Serrano, and María A. Vila. A new proposal of aggregation functions: The linguistic summary. Lecture Notes in Computer Science, 2715:127–134, January 2003.
4. L. Cabibbo and R. Torlone. Querying multidimensional databases. In Proceedings of the 6th Int. Workshop on Database Programming Languages (DBPL6), Estes Park (USA), 1997.
5. Luca Cabibbo and Riccardo Torlone. A logical approach to multidimensional databases. In EDBT '98: Proceedings of the 6th International Conference on Extending Database Technology, pages 183–197, London, UK, 1998. Springer-Verlag.
6. E.F. Codd. Providing OLAP (On-line Analytical Processing) to user-analysts: An IT mandate. Technical report, E.F. Codd and Associates, 1993.


7. Anindya Datta and Helen Thomas. The cube data model: a conceptual model and algebra for on-line analytical processing in data warehouses. Decision Support Systems, 27:289–301, 1999.
8. Miguel Delgado, María J. Martín-Bautista, Daniel Sánchez, and María A. Vila. On a characterization of fuzzy bags. Lecture Notes in Computer Science, 2715:119–126, January 2003.
9. C. Dyreson. Information retrieval from an incomplete data cube. In Proceedings of the 22nd Int. Conf. on VLDB, pages 532–543, Istanbul (Turkey), 1996. Morgan Kaufmann Publishers.
10. F. Herrera, E. Herrera-Viedma, and J. L. Verdegay. Direct approach processes in group decision making using linguistic OWA operators. Fuzzy Sets and Systems, 79:165–176, 1996.
11. G.A. Gorry and M.S. Scott Morton. A framework for management information systems. Sloan Management Review, 13:50–70, 1971.
12. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, and M. Venkatrao. Data Cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Mining and Knowledge Discovery, 1:29–53, 1997.
13. Jiawei Han. Towards on-line analytical mining in large databases. SIGMOD Rec., 27(1):97–107, 1998.
14. R. Kimball. The Data Warehouse Toolkit. John Wiley & Sons, New York, 1996.
15. A. Laurent, B. Bouchon-Meunier, and A. Doucet. Towards fuzzy-OLAP mining. In Proc. Work. PKDD Database Support for KDD, pages 51–52, 2001.
16. Anne Laurent. Querying fuzzy multidimensional databases: unary operators and their properties. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 11(Supplement):31–45, 2003.
17. C. Li and X.S. Wang. A data model for supporting on-line analytical processing. In Proceedings of the 5th Int. Conf. on Information and Knowledge Management (CIKM), 1996.
18. L. Martinez-Lopez. Un Nuevo Modelo de Representación de Información Lingüística Basado en 2-Tuplas para la Agregación de Preferencias Lingüísticas. PhD thesis, Universidad de Granada, Granada, Spain, 1999.
19. Carlos Molina, Lázaro Rodríguez-Ariza, Daniel Sánchez, and M. Amparo Vila. A new fuzzy multidimensional model. Accepted in IEEE Transactions on Fuzzy Systems.
20. T.B. Pedersen, C.S. Jensen, and C.E. Dyreson. A foundation for capturing and querying complex multidimensional data. Information Systems, 26:383–483, 2001.
21. E.A. Rundensteiner and L. Bic. Aggregates in possibilistic databases. In Proceedings of the 15th Conf. on Very Large Databases (VLDB'89), pages 287–295, Amsterdam (The Netherlands), 1989.
22. C. Salka. Ending the ROLAP/MOLAP debate: usage based aggregation and flexible HOLAP. In 14th Int. Conf. on Data Engineering, page 180, 1998.
23. R. R. Yager. On the theory of bags. International Journal of General Systems, 13:23–37, 1986.
24. R. R. Yager. Aggregation operators and fuzzy systems modeling. Fuzzy Sets and Systems, 67:129–145, 1994.
25. R. R. Yager. On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Transactions on Systems, Man and Cybernetics, 18:183–190, 1988.