Information Sciences 121 (1999) 233–270

www.elsevier.com/locate/ins

Data summarization in relational databases through fuzzy dependencies

J.C. Cubero *, J.M. Medina, O. Pons, M.A. Vila

Dpto. Ciencias de la Computación e Inteligencia Artificial, Universidad de Granada, Avenida de Andalucia 38, E-18071 Granada, Spain

Received 1 January 1998; accepted 25 December 1998

Communicated by Henri Prade

Abstract

In this paper we deal with dependencies in relational databases which do not hold exactly, as classical functional dependencies do, but in a weaker sense; i.e., we consider relations which satisfy dependencies such as `people with similar age and height have similar weight'. We model this relationship through the concept of fuzzy dependency. We see that these dependencies imply some kind of fuzzy redundancy and, in order to avoid it, we propose a projection operator which partitions a relation $r$ into two projections, say $r_1$ and $r_2$, holding less information. We then replace the original relation by these projections. In this process we must guarantee that we can recover the data we had in the original relation; this is possible by using a special join operator applied to $r_1$ and $r_2$. We must also guarantee that we can test the fuzzy dependency for new entries to the database in the same way whether we consider the original relation $r$ or work with the projections $r_1$ and $r_2$. We also show that this definition of dependency maintains the good completeness properties of the classical case. © 1999 Elsevier Science Inc. All rights reserved.

Keywords: Fuzzy dependencies; Relational databases; Resemblance; Fuzzy projection; Fuzzy join

* Corresponding author. Fax: +34-958-243-317. E-mail address: [email protected] (J.C. Cubero).

0020-0255/99/$ - see front matter © 1999 Elsevier Science Inc. All rights reserved. PII: S0020-0255(99)00104-8


1. Preliminaries

1.1. Introduction

Over the last decade, several authors have addressed the treatment of non-crisp information in databases. Incomplete information has been studied in [17,19,20,23], and uncertainty in object data models in [27,32,35]. Logic and imprecise information have been studied in [16,24,31], whereas fuzzy [34] relational data models have been introduced in [3,25,29]. In this work we deal with the treatment of fuzzy dependencies [4,8,12,25,26,33] in fuzzy and non-fuzzy (crisp) relational databases.

The theory of normalization introduced by Codd (see [5,6,28]) is a systematic approach to designing the schema of a database properly. The main idea is that if a relation in a database satisfies a functional dependency (not implied by a primary key), then there can be redundancy and update problems. In order to avoid them, we can decompose the original relation into two (or more) new relations in such a way that we do not lose any information, i.e., the decomposition is lossless.

In real databases the design process is usually carried out correctly, so it is unusual to find such strict dependencies in the relations. Nevertheless, we can find soft dependencies like the following: `the length of the petal of a certain flower increases with the length of the sepal', or `the weight of a person depends (more or less) on his height and age'. In these cases, we propose a decomposition process which allows us to extract the information given by such dependencies and to compress the original data appearing in a table of a relational database. The idea is to tolerate some imprecision in the database (see [18,19] for other approaches to incomplete information in databases), which allows us to merge several tuples into a single one. Consider for instance the relation $r$ appearing in Fig. 1.
We model the imprecision by considering a set of labels attached to each attribute and replacing the corresponding data values by such labels, thus obtaining the relation $E(r)$ in Fig. 1. The decomposition of $r$ is given by the relations $r_1$ and $r_2$ in Fig. 2. As we can see, we have reduced some redundancy because the second and third tuples (the values for height and weight) have been merged into a single one

Fig. 1. Incorporation of imprecise data values.


Fig. 2. Decomposition of r into two projections.

in $r_2$. A special join operator is used to recover the tuples we had in $r$ (see Fig. 3). It is essential to quantify how much imprecision we can tolerate, in order to guarantee that fuzzy values such as `about 85 kg' are close enough to the original data value 86. To do this, we shall use a measure of resemblance among the data elements. We can summarize this process by saying that the decomposition of $r$ into $r_1$ and $r_2$ is characterized by:

• $r_2$ stores the information which is implied by the existence of the fuzzy dependency, whereas $r_1$ represents the other information appearing in $r$. For new entries into the database, we can test the fuzzy dependency just by looking at the tuples in $r_2$.
• The amount of information stored in $r_1$ and in $r_2$ is less than in $r$.
• The original data appearing in $r$ can be recovered (in fuzzy terms) through a fuzzy join of $r_1$ and $r_2$.

It is important to emphasize that linguistic labels such as `about 85 kg' are given by an expert. Thus, we should be able to discover some knowledge in the form of fuzzy rules, and after the decomposition process some fuzzy redundancy will be removed. This gives us a better understanding of our world, because the fuzzy dependency is isolated in a separate relation ($r_2$ in the example above). It is worth mentioning that future research should address how to find such labels automatically, so that the work of the expert decreases notably.

The organization of the work is as follows. In this section we give an overview of the basic elements of the relational database model and introduce the notion of resemblance relation, which will be used to relax the concept of functional dependency; we also give the notation used to treat fuzzy sets. The second section introduces the definition of fuzzy dependency for a fuzzy database, extending a resemblance relation to work with fuzzy values, and gives a projection operator which allows us to reduce the information appearing in such a fuzzy

Fig. 3. Recovery of r through the join of its projections.


database. In Section 3 we apply the concepts of the previous sections to the case of a crisp database: we can directly apply the definition of fuzzy dependency introduced in Section 2 or, if we want to compress the information even more, we can use another projection operator, although the price to pay is a more restricted definition of fuzzy dependency. We study its behavior regarding Armstrong's axioms and the possibility of testing the fuzzy dependency, when new tuples are inserted into the database, just by looking at the projected relation. Finally, in Section 4, we deal with the problem of defining a fuzzy join operator which allows us to recover almost the same information which appeared in the original relation. We quantify the amount of lost information for consequent attributes through the concept of granularity level.

1.2. Relational database model

We assume that the reader has a background in classical relational database theory, including normalization theory; hence, we only introduce the notation (see [28] for more details). From now on, we define a crisp data value as a usual tuple data value in a relation in first normal form. We shall use capital letters from the beginning of the alphabet to denote single attributes ($A, B, \ldots$). For compound attributes (sets of attributes) we shall use capital letters from the end of the alphabet ($X, Y, \ldots$). Each data value must belong to a domain $D_A$ attached to the attribute $A$: this domain represents, in general, a set of possible values for that attribute. Relations will be denoted by small letters such as $r, s$ and tuples by $t, u, \ldots$ If we want to emphasize that the attribute $A$ belongs to relation $r$, we shall write $r.A$. The value of an attribute $A$ in a tuple $t$ will be represented by $A(t)$. For instance, $r.X(t)$ is the set of values of tuple $t$ for relation $r$ and attributes $X$.
REL will denote a relational scheme (a set of attributes), so that any relation $r$ is an instance of a relational scheme REL (denoted by $REL(r)$ whenever we want to emphasize this connection). The symbol $\times$ stands for the usual Cartesian product operator, and $\Pi_Z(r)$ denotes the usual projection operator over $Z$, applied to the relation $r$. The natural join of two relations $r$ and $s$ with relational schemes $R$ and $S$ is given by

\[ r \bowtie s = \Pi_{R \cup (S - s.X)}\big(\sigma_{r.X = s.X}(r \times s)\big) = \Pi_{(R - r.X) \cup S}\big(\sigma_{r.X = s.X}(r \times s)\big), \tag{1} \]

where $\sigma_{r.X = s.X}$ is a selection operator which chooses those tuples belonging to $r \times s$ with equal $X$-values. The definition of the classical functional dependency is given by:

Definition 1. A relational scheme $REL = (A_i)_{i=1,\ldots,n}$ satisfies a (classical) functional dependency (f.d.) $X \to Y$ with $X, Y \subseteq REL$ if and only if every instance $r$ of REL satisfies the following condition:

\[ \forall t_1, t_2 \in r: \quad \text{if } X(t_1) = X(t_2) \text{ then } Y(t_1) = Y(t_2) \text{ must hold.} \]


$X$ represents the antecedent attributes and $Y$ the consequent attributes. If a relation in the database satisfies the f.d. $X \to Y$, then it should be decomposed into two new relations with relational schemes $XY$ and $XZ$, with $Z = REL(r) - XY$. To do this, the crisp projection operator is used, which merges the $XY$-values of those tuples with equal antecedent values (and therefore, by Definition 1, equal consequent values), thus obtaining a relation $r_2 = \Pi_{XY}(r)$ with fewer tuples than $r$. It may be proved (see [28] for instance) that this relation satisfies the f.d. and that, in addition, for new entries to the database we can test the dependency just by looking at the tuples in $\Pi_{XY}(r)$, which is computationally more efficient than comparing with the tuples in $r$. Furthermore, if we decompose the relation $r$ into

\[ r_1 = \Pi_{XZ}(r), \qquad r_2 = \Pi_{XY}(r), \tag{2} \]

then the natural join of both projections recovers exactly the original relation $r$, i.e., it is a lossless decomposition:

\[ r = r_1 \bowtie r_2. \tag{3} \]

A relation satisfying a functional dependency suffers from redundancy and updating problems (see [28] for more details). For instance, the following relation satisfies $X \to Y$, and stores several times the information that $x_1$ is related to $y_1$, $x_2$ to $y_2$, and so on:

\[ \begin{array}{ccc} Z & X & Y \\ \hline z_1 & x_1 & y_1 \\ z_2 & x_1 & y_1 \\ z_3 & x_2 & y_2 \\ z_4 & x_2 & y_2 \end{array} \]

In order to avoid these problems, the database manager has to choose appropriately the attributes that appear in each relation, i.e., he would not put attribute $Z$ together with $X$ and $Y$ in the same relation. Thus, he would design the database with two relations: one with scheme $ZX$ and the other with scheme $XY$. This process of normalization (see [5,6]) is carried out by the database manager before introducing data into the database. Nevertheless, if the design is not correct and a functional dependency exists, then Eq. (3) ensures that we can replace $r$ by its projections $r_1$ and $r_2$, isolating the information conveyed by the dependency in $r_2$ and the rest of the data in $r_1$. This is the interpretation we want to extend to the fuzzy case, when the dependency is not crisp but fuzzy.
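The classical decomposition described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: relations are lists of dictionaries, and the helper names `satisfies_fd`, `project` and `natural_join` are hypothetical, not part of the paper. The sketch checks Definition 1 on the small $Z, X, Y$ relation, builds the projections of Eq. (2) and verifies the lossless join of Eq. (3).

```python
def satisfies_fd(rows, X, Y):
    """Definition 1: tuples with equal X-values must have equal Y-values."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        # setdefault stores y the first time x is seen and returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


def project(rows, attrs):
    """Crisp projection over attrs, eliminating duplicate tuples."""
    out = []
    for t in rows:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r, s):
    """Natural join on the attributes shared by r and s."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in common):
                merged = {**t, **u}
                if merged not in out:
                    out.append(merged)
    return out


# The relation with scheme Z, X, Y shown above.
r = [
    {"Z": "z1", "X": "x1", "Y": "y1"},
    {"Z": "z2", "X": "x1", "Y": "y1"},
    {"Z": "z3", "X": "x2", "Y": "y2"},
    {"Z": "z4", "X": "x2", "Y": "y2"},
]

r1 = project(r, ["Z", "X"])        # Eq. (2): r1 = projection over XZ
r2 = project(r, ["X", "Y"])        # Eq. (2): r2 = projection over XY
recovered = natural_join(r1, r2)   # Eq. (3): r = r1 join r2
```

Here `r2` holds only two tuples, one per distinct $X$-value, which is the redundancy removed by the decomposition.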


1.3. Resemblance relations

Associated with each attribute $A_i$, we consider a relation denoted by $R_{A_i}$, or simply $R_i$ if there is no confusion, which gives the resemblance between crisp values in the corresponding domain $D_i$. This arbitrary but fixed relation is defined a priori by an expert, and allows us to smooth the concept of strict equality, which is used, for instance, in Definition 1 of functional dependency.

Definition 2. Let $D$ be a crisp domain. A resemblance relation is a binary relation $R : D \times D \to [0,1]$ satisfying the reflexive and symmetric properties

\[ R(x,x) = 1, \qquad R(x,y) = R(y,x). \]

The expert may define a resemblance function which yields very small resemblance values (say 0.2) for some argument values. So, associated with each attribute $A_i$, we also consider a fixed threshold denoted by $\theta_{A_i}$, or simply $\theta_i$ if there is no confusion, which gives the minimum degree of resemblance we require for two crisp values to be considered resemblant. Instead of using only one threshold for all the attributes, the expert may select different threshold values for different attributes: he can impose a hard resemblance by choosing a high value (say 0.9) or a softer resemblance with a smaller one (say 0.5). There is no constraint on the selection of $\theta_i$ except $\theta_i > 0$.

Definition 3. Let $D$ be a crisp domain, and $R$ a resemblance relation defined on $D \times D$. Two crisp values $x, y \in D$ are resemblant at level $\theta$ if and only if

\[ R(x,y) \ge \theta. \]

In order to emphasize the resemblance relation and the threshold attached to each attribute, we include them in the definition of the relational scheme as follows:

Definition 4. An extended scheme is a triple

\[ S = (REL, \vec{\theta}, \vec{R}), \]

where REL is a classical scheme, i.e., a set of attributes $(A_1, \ldots, A_n)$, $\vec{\theta}$ is the set of associated thresholds $(\theta_1, \ldots, \theta_n)$, and $\vec{R}$ the set of associated resemblance relations $(R_1, \ldots, R_n)$. For the sake of simplicity, we call the pair $(\theta_i, R_i)$ the extended scheme of an attribute $A_i$.
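As an illustration of Definitions 2 to 4, the following minimal Python sketch builds one possible resemblance relation and an extended scheme. The distance-based form of `R`, the scale values and the thresholds are assumptions chosen for the example; the paper leaves the relation and the thresholds to the expert.

```python
def make_distance_resemblance(scale):
    """Hypothetical resemblance R(x, y) = max(0, 1 - |x - y| / scale).

    It is reflexive (R(x, x) = 1) and symmetric, as Definition 2 requires.
    """
    def R(x, y):
        return max(0.0, 1.0 - abs(x - y) / scale)
    return R


def equality(x, y):
    """Strict equality seen as a resemblance relation (1 if equal, else 0)."""
    return 1.0 if x == y else 0.0


def resemblant(R, theta, x, y):
    """Definition 3: x and y are resemblant at level theta iff R(x, y) >= theta."""
    return R(x, y) >= theta


# Definition 4: each attribute carries its own pair (threshold, relation).
# The attribute names and numeric choices below are illustrative only.
extended_scheme = {
    "Height": (0.9, make_distance_resemblance(50.0)),
    "Weight": (0.5, make_distance_resemblance(20.0)),
    "Z": (1.0, equality),
}

theta_w, R_w = extended_scheme["Weight"]
theta_h, R_h = extended_scheme["Height"]
weights_resemblant = resemblant(R_w, theta_w, 86, 90)    # R = 0.8  >= 0.5
heights_resemblant = resemblant(R_h, theta_h, 185, 176)  # R = 0.82 <  0.9
```

Choosing a different threshold per attribute, as in the dictionary above, is what lets the expert demand a hard resemblance for one attribute and a softer one for another.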


Remark. A particular resemblance relation is Kronecker's delta, which is denoted by $\delta$ and defined by

\[ \delta(x,y) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{otherwise.} \end{cases} \tag{4} \]

It obviously satisfies the reflexive and symmetric properties. This relation models equality between two values, i.e., it returns 1 when applied to the same value and 0 otherwise. In fact, the definition of functional dependency could be rewritten as

\[ \forall t_1, t_2 \in r: \quad \text{if } \delta(X(t_1), X(t_2)) = 1 \text{ then } \delta(Y(t_1), Y(t_2)) = 1 \text{ must hold.} \]

Then we can formally suppose that every attribute has an associated resemblance relation. If we do not want to impose a non-trivial resemblance on the values of an attribute $A$, we simply suppose that $A$ has $\delta$ as its associated resemblance relation.

1.4. Fuzzy information

A resemblance relation allows us to weaken the concept of equality. On the other hand, we can relax the restriction of the first normal form in order to incorporate imprecise information in the data values appearing in the tuples of a relation. Fuzzy set theory is a very useful tool for representing and managing imprecise information. We refer the reader to Refs. [13,34] for an introduction to this issue. We briefly recall the relationship between fuzzy sets and possibility distributions and introduce the notation we are going to follow throughout this paper.

Let `$X$ is tall' be a vague proposition. A fundamental aspect of vague propositions in general is that the variable $X$ is not attached to a concrete value, but to a possibility distribution that associates a number in the interval $[0,1]$ with every possible value of $X$. In general, possibility distributions are identified with membership functions. That is, if an element $d$ (say 180 cm) belongs to a fuzzy set $F$ (say `tall') with degree $w$ (say 0.9), then the possibility that a variable $X$, whose value is known to be $F$, takes exactly the value $d$ is $\pi_X(d) = w$. This interpretation can be expressed as

\[ \pi_X(d) = \mu_F(d). \]

So, the membership function of a fuzzy set can be interpreted as a possibility distribution (see [14]). In particular, a precise fact (from now on, a crisp value) corresponds to the distribution that takes the value 0 for every element of the underlying domain except one, at which it takes the value 1.


1.4.1. Notation

Generic fuzzy values will be represented by the letters $F, G, \ldots$, and their associated membership functions by $\mu_F, \mu_G, \ldots$

\[ \mu_F : D \to [0,1], \]

where $D$ is a crisp domain. $\mu_F(x)$ represents the degree of membership of the crisp value $x \in D$ in the fuzzy set $F$, i.e., the extent to which $x$ is a good assignment of $F$. For instance, considering the domain of a property such as `Height', we can consider the fuzzy set $F =$ `tall', with an associated membership function given in Fig. 4 in Example 4.

The set of all the fuzzy subsets defined on a crisp domain $D$ will be denoted by $\tilde{P}(D)$. The $\alpha$-cut of a fuzzy value $F$ is the crisp set

\[ F_\alpha = \{x \in D \text{ such that } \mu_F(x) \ge \alpha\}. \]

As a particular case, the kernel (core) of a fuzzy set is the following set:

\[ \ker(F) = \{x \in D \text{ such that } \mu_F(x) = 1\}, \]

and we shall assume that $\ker(F) \neq \emptyset \ \ \forall F \in \tilde{P}(D)$. The fuzzy inclusion between two values $F$ and $F'$ is denoted by $F \subseteq F'$, and it is defined by

\[ F \subseteq F' \iff \mu_F(x) \le \mu_{F'}(x) \quad \forall x \in D. \]

The fuzzy union is given by $F \vee F'$, and it is a fuzzy value representing an OR connective (John's height is `tall' or `rather tall') with associated membership defined by

\[ \forall x \in D, \quad \mu_{F \vee F'}(x) = \max\{\mu_F(x), \mu_{F'}(x)\}. \]

We do not include the definition of fuzzy intersection because we do not use this operator. We consider a general definition of fuzzy relational database where a relation $r$ is a subset of $\prod_{i=1}^{n} \mathbf{D}_i$, where each $\mathbf{D}_i$ is either a crisp domain, say $D_i$, or a fuzzy domain, i.e., $\tilde{P}(D_i)$. Thus, the tuple values allowed are scalars, possibility

Fig. 4. Fuzzy linguistic labels.


distributions, etc. We do not deal with the case where each tuple has a membership degree in the relation itself. For instance, the following could be a legal tuple in a fuzzy database:

\[ (\text{`28993214'}, \text{`John Smith'}, 37, \text{high}), \]

where high is a fuzzy value defined on the crisp domain $D_{Height}$. Each tuple value $A_i(t)$ can be considered as a variable and the assignment $A_i(t) = F$ as a vague proposition. For instance, if Height$(t_{10})$ = `tall', then `the height of the person identified in tuple $t_{10}$ is tall' is a vague proposition. So, from now on, we can consider that, in general, we have possibility distributions as tuple values, identified by the corresponding membership functions.

It is worth mentioning that it is not necessary to build an ad hoc system to store fuzzy information: we propose using a general relational database management system and building special relations and dictionaries to manage fuzzy information (see [22] for the theoretical aspects, and [21] for implementation details), so that we can take advantage of these efficient management systems.
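The notation of Section 1.4.1 can be made concrete with a small Python sketch. It is only an illustration: the trapezoidal shapes, the breakpoints of the labels `tall` and `rather_tall`, and the sampled integer domain are assumptions of ours, not values taken from Fig. 4.

```python
def trapezoid(a, b, c, d):
    """Membership function rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    def mu(x):
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


def crisp(value):
    """A crisp value as a possibility distribution: 1 at value, 0 elsewhere."""
    return lambda x: 1.0 if x == value else 0.0


def alpha_cut(mu, domain, alpha):
    """Crisp set of the elements whose membership degree is at least alpha."""
    return [x for x in domain if mu(x) >= alpha]


def kernel(mu, domain):
    """Elements with membership degree exactly 1."""
    return [x for x in domain if mu(x) == 1.0]


def fuzzy_union(mu_f, mu_g):
    """Fuzzy union: pointwise maximum of the two membership functions."""
    return lambda x: max(mu_f(x), mu_g(x))


def included(mu_f, mu_g, domain):
    """Fuzzy inclusion: mu_f(x) <= mu_g(x) for every x in the domain."""
    return all(mu_f(x) <= mu_g(x) for x in domain)


domain = range(140, 221)                   # heights in cm, sampled (assumed)
tall = trapezoid(175, 185, 230, 230)       # hypothetical label
rather_tall = trapezoid(165, 175, 180, 190)  # hypothetical label
tall_or_rather_tall = fuzzy_union(tall, rather_tall)
```

With these assumed shapes, `tall(180)` is 0.5, the 0.5-cut of `tall` starts at 180 cm, and both labels are included in their union.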

2. Definition of fuzzy functional dependency

In this section we address the problem of defining a fuzzy functional dependency (f.f.d.), first in a crisp relational database and then in a fuzzy relational database. We also introduce a fuzzy projection operator which allows us to compress the information conveyed by the fuzzy dependency (in both cases). First, we introduce the basic properties that, in our interpretation, a fuzzy extension of functional dependency should satisfy:

(i) The definition of a classical functional dependency (Definition 1) must be a particular case of the definition of f.f.d. So, if a relation satisfies an f.d. then it must satisfy a particular case of f.f.d. (but not an f.f.d. in general). In our case, this will be achieved by appropriately choosing the resemblance relations and threshold levels in the definition of fuzzy dependency.
(iia) Non-resemblant antecedent values do not affect the dependency.
(iib) Consequent values must be sufficiently resemblant when antecedent values are sufficiently resemblant.
(iii) Each attribute has its specific properties; for instance, the degree of resemblance (the relation and the threshold) may differ from one attribute to another (see [31]). This fact must be considered when defining the f.f.d.

In [12], we reviewed the behavior of several approaches to the definition of f.f.d. regarding these properties (see also [2] for another general survey). Now, our objective is to introduce a definition of f.f.d. without violating the


requirements given by (i), (iia), (iib) and (iii). We first introduce the case of crisp databases.

2.1. Definition of fuzzy dependency for a crisp relational database

An immediate extension of Definition 1 of classical functional dependency (for crisp databases) is given by the following: if antecedent values are sufficiently resemblant (using the resemblance relation of Definition 3 associated with each attribute), then the consequent values are also expected to be sufficiently resemblant. If we denote by $X = (X_i)_i$ and $Y = (Y_j)_j$ the antecedent and consequent attributes respectively, then we can formulate this statement as

\[ \forall t_1, t_2 \in r: \quad \text{if } R_i(X_i(t_1), X_i(t_2)) \ge \alpha_i \ \ \forall i \quad \text{then } R_j(Y_j(t_1), Y_j(t_2)) \ge \beta_j \text{ must hold } \forall j. \tag{5} \]

As a particular case ($R_i$ being equality and $\alpha_i = 1$) we have the following relaxation of Definition 1:

\[ \forall t_1, t_2 \in r: \quad \text{if } X_i(t_1) = X_i(t_2) \ \ \forall i \quad \text{then } R_j(Y_j(t_1), Y_j(t_2)) \ge \beta_j \text{ must hold } \forall j. \tag{6} \]
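Eq. (5) translates almost literally into a pairwise check over the tuples of a crisp relation. The Python sketch below is a minimal illustration: the function names, the distance-based resemblance for `Weight` and its scale of 20 are assumptions of ours, and the data are the Height/Weight values of the relation used in Example 1.

```python
from itertools import combinations


def delta(x, y):
    """Kronecker's delta, Eq. (4)."""
    return 1.0 if x == y else 0.0


def distance_resemblance(scale):
    """Hypothetical resemblance: R(x, y) = max(0, 1 - |x - y| / scale)."""
    return lambda x, y: max(0.0, 1.0 - abs(x - y) / scale)


def satisfies_crisp_ffd(rows, antecedent, consequent):
    """Eq. (5). antecedent and consequent map attribute -> (R, threshold).

    Only unordered pairs of distinct tuples are tested: every R is symmetric,
    and a tuple compared with itself holds trivially because R is reflexive.
    """
    for t1, t2 in combinations(rows, 2):
        antecedents_resemblant = all(
            R(t1[a], t2[a]) >= alpha for a, (R, alpha) in antecedent.items()
        )
        if antecedents_resemblant and not all(
            R(t1[b], t2[b]) >= beta for b, (R, beta) in consequent.items()
        ):
            return False
    return True


relation = [
    {"Z": "z1", "Height": 185, "Weight": 86},
    {"Z": "z2", "Height": 185, "Weight": 90},
    {"Z": "z3", "Height": 176, "Weight": 79},
    {"Z": "z4", "Height": 192, "Weight": 99},
    {"Z": "z4", "Height": 194, "Weight": 95},
    {"Z": "z6", "Height": 160, "Weight": 65},
    {"Z": "z7", "Height": 160, "Weight": 66},
    {"Z": "z8", "Height": 174, "Weight": 72},
]

R_W = distance_resemblance(20.0)  # assumed resemblance for Weight
holds_soft = satisfies_crisp_ffd(
    relation, {"Height": (delta, 1.0)}, {"Weight": (R_W, 0.75)}
)
holds_hard = satisfies_crisp_ffd(
    relation, {"Height": (delta, 1.0)}, {"Weight": (R_W, 0.9)}
)
holds_classical = satisfies_crisp_ffd(
    relation, {"Height": (delta, 1.0)}, {"Weight": (delta, 1.0)}
)
```

With $\delta$ on the antecedent this is exactly the case of Eq. (6): only the pairs of tuples with equal heights constrain the result, and taking $\delta$ with threshold 1 on the consequent as well reduces the check to the classical dependency.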

Example 1. Let us consider the following relation:

\[ r = \begin{array}{ccc} Z & Height & Weight \\ \hline z_1 & 185 & 86 \\ z_2 & 185 & 90 \\ z_3 & 176 & 79 \\ z_4 & 192 & 99 \\ z_4 & 194 & 95 \\ z_6 & 160 & 65 \\ z_7 & 160 & 66 \\ z_8 & 174 & 72 \end{array} \tag{7} \]

Note that both the fourth and the fifth tuples have the same $Z$-value ($z_4$), i.e., $Z$ is not a key. This relation does not satisfy the classical dependency `Height' $\to$ `Weight'. But let us now consider Kronecker's delta as the resemblance relation for the antecedent attribute, i.e., $R_{Height} = \delta$, $\alpha = 1$, and an arbitrary resemblance relation $R_W$ for the consequent attribute. Then, the restrictions stated in Eq. (6) for a fuzzy dependency are given by

\[ R_W(86, 90) \ge \beta, \qquad R_W(65, 66) \ge \beta. \tag{8} \]


If such restrictions hold, we say that $r$ satisfies a fuzzy functional dependency between `Height' and `Weight', i.e., `Height' $\xrightarrow{(1,\beta)}$ `Weight'.

2.2. Definition of fuzzy dependency for a fuzzy relational database

The first step in defining fuzzy dependency for a fuzzy database is to extend Definition 3 of resemblance to work with fuzzy values. This can be accomplished through two possible extensions [9,11,25], namely a weak resemblance and a strong one.

Definition 5. Let us consider a domain $D$ of crisp values and a resemblance relation $R$ defined on $D \times D$. Two fuzzy values $F, F' \in \tilde{P}(D)$ are weakly resemblant at level $\alpha$, denoted by $F \approx_\alpha F'$, if and only if

\[ R^{w}(F, F') \ge \alpha \tag{9} \]

with

\[ R^{w} : \tilde{P}(D) \times \tilde{P}(D) \to [0,1], \qquad R^{w}(F, F') \triangleq \sup_{x,y} \big( R(x,y) \wedge \mu_F(x) \wedge \mu_{F'}(y) \big), \tag{10} \]

where $\wedge$ stands for the minimum between numeric values, and sup denotes the supremum. If we consider several domains $D_1 \times \cdots \times D_n$, fuzzy sets $F_i, F_i' \in \tilde{P}(D_i)$, and thresholds $\alpha_i$ for each domain $D_i$, then the definition of weak resemblance is applied to each component:

\[ F \approx_\alpha F' \iff F_i \approx_{\alpha_i} F_i' \quad \forall i = 1, \ldots, n, \]

where $F = (F_1, \ldots, F_n)$, $F' = (F_1', \ldots, F_n')$, and $\alpha = (\alpha_1, \ldots, \alpha_n)$.

Remark. The symbol $\triangleq$ reads as `is defined by'. On the other hand, the symbol $\approx$ should include the resemblance relation $R$ which is being extended (for instance, we could use the notation $\approx^{R}$); nevertheless, we omit $R$ for the sake of simplicity in our notation. Weak resemblance gives the extent to which some crisp element in a fuzzy value $F$ is resemblant to some crisp element in the other fuzzy value $F'$. If we consider $F$ as the value an attribute $A$ takes for a tuple $t$, i.e., $F = A(t)$ and $F' = A(t')$, then this measure is interpreted as the possibility that the value of $A$ for $t$ is in relation $R$ with the value of $A$ for $t'$ (see [25]).


Definition 6. Let us consider a domain $D$ of crisp values and a resemblance relation $R$ defined on $D \times D$. Two fuzzy values $G, G' \in \tilde{P}(D)$ are strongly resemblant at level $\beta$, denoted by $G \equiv_\beta G'$, if and only if

\[ R^{s}(G, G') \ge \beta \tag{11} \]

with

\[ R^{s} : \tilde{P}(D) \times \tilde{P}(D) \to [0,1], \qquad R^{s}(G, G') \triangleq \inf_{x,y} \big\{ R(x,y) \vee (1 - \mu_G(x)) \vee (1 - \mu_{G'}(y)) \big\}, \tag{12} \]

where $\vee$ stands for the maximum between numeric values, and inf denotes the infimum. If we consider several domains $D_1 \times \cdots \times D_m$, fuzzy sets $G_j, G_j' \in \tilde{P}(D_j)$, and thresholds $\beta_j$ for each domain $D_j$, then the definition of strong resemblance is applied to each component:

\[ G \equiv_\beta G' \iff G_j \equiv_{\beta_j} G_j' \quad \forall j = 1, \ldots, m, \]

where $G = (G_1, \ldots, G_m)$, $G' = (G_1', \ldots, G_m')$, and $\beta = (\beta_1, \ldots, \beta_m)$.

Remark. Strong resemblance gives the extent to which all the crisp elements in $G$ are resemblant to all the crisp elements in $G'$. If we consider $G$ as the value an attribute $B$ takes for a tuple $t$, i.e., $G = B(t)$ and $G' = B(t')$, then this measure is interpreted as the necessity that the value of $B$ for $t$ is in relation $R$ with the value of $B$ for $t'$ (see [25]). This measure of strong resemblance does not satisfy the reflexivity property, so we introduce the following definition.

Definition 7. Let $D$ be a crisp domain, $R$ a resemblance relation defined on $D \times D$, and $\beta$ a resemblance threshold. Under these hypotheses, a fuzzy value $G \in \tilde{P}(D)$ satisfies the level of granularity (or equivalently, the integrity restriction of granularity IR-G) if and only if

\[ G \equiv_\beta G. \tag{13} \]
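For fuzzy values with finite support, Eqs. (10) and (12) and the granularity test of Eq. (13) can be computed directly. The sketch below is an illustration under stated assumptions: a fuzzy value is a dictionary mapping crisp values to membership degrees (0 elsewhere), the function names are ours, and the distance-based `R` with scale 20 is hypothetical.

```python
def weak_resemblance(R, F, G):
    """Eq. (10): sup over x, y of min(R(x, y), mu_F(x), mu_G(y)).

    Elements outside the supports contribute 0, so only the supports are scanned.
    """
    return max(
        (min(R(x, y), mf, mg) for x, mf in F.items() for y, mg in G.items()),
        default=0.0,
    )


def strong_resemblance(R, F, G):
    """Eq. (12): inf over x, y of max(R(x, y), 1 - mu_F(x), 1 - mu_G(y)).

    Elements outside the supports contribute 1, so only the supports are scanned.
    """
    return min(
        (max(R(x, y), 1 - mf, 1 - mg) for x, mf in F.items() for y, mg in G.items()),
        default=1.0,
    )


def satisfies_granularity(R, G, beta):
    """Definition 7: G is strongly resemblant to itself at level beta."""
    return strong_resemblance(R, G, G) >= beta


def R(x, y):
    """Hypothetical distance-based resemblance on weights (scale 20)."""
    return max(0.0, 1.0 - abs(x - y) / 20.0)


crisp_86 = {86: 1.0}
either_86_or_90 = {86: 1.0, 90: 1.0}   # a fuzzy value with kernel {86, 90}

weak_self = weak_resemblance(R, either_86_or_90, either_86_or_90)
strong_self = strong_resemblance(R, either_86_or_90, either_86_or_90)
```

The value `either_86_or_90` shows why Definition 7 is needed: its weak resemblance to itself is 1, but its strong resemblance to itself is only $R(86, 90) = 0.8$ under the assumed `R`, so it satisfies the level of granularity for $\beta = 0.75$ but not for $\beta = 0.9$. A crisp value such as `crisp_86` always satisfies it.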

Remark. $\beta$ is considered as a fixed threshold associated with $D$. Therefore, we define the level of granularity without specifying the level $\beta$; otherwise, we should call it the $\beta$-level of granularity. An interesting particular case, which offers a great advantage in computational effort, arises if we consider trapezoidal fuzzy values (such as those represented in Fig. 4) and $R$ to be a classical distance relation. Then, the restrictions in Eqs. (9) and (11) become respectively (see [9]):


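\[ R\big(x^{left}_{F_\alpha}, x^{right}_{F'_\alpha}\big) \vee R\big(x^{right}_{F_\alpha}, x^{left}_{F'_\alpha}\big) \ge \alpha, \tag{14} \]

\[ R\big(x^{left}_{G_{1-\beta}}, x^{right}_{G'_{1-\beta}}\big) \wedge R\big(x^{right}_{G_{1-\beta}}, x^{left}_{G'_{1-\beta}}\big) \ge \beta, \tag{15} \]

where $x^{left}_{F_\alpha} = \min\{x;\ x \in F_\alpha\}$ and $x^{right}_{F_\alpha} = \max\{x;\ x \in F_\alpha\}$.

The following propositions state some useful properties of weak and strong resemblance, which will be used later. $F$ and $G$ stand for arbitrary fuzzy values.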

Proposition 1. Let us consider the resemblance relation represented by Kronecker's delta $\delta$ (see Eq. (4)), two arbitrary fuzzy sets $F, F' \in \tilde{P}(D)$ and a resemblance relation $R$. Then, the following properties are satisfied:
(i) $R^{w}(F, F') \ge \delta^{w}(F, F')$; $R^{s}(F, F') \ge \delta^{s}(F, F')$.
(ii) $\delta^{w}(F, F') = \sup_x \min\{\mu_F(x), \mu_{F'}(x)\}$, which is known as the height of the intersection of $F$ and $F'$.
(iii) we have

\[ R^{w}(x, F) = \mu_F(x) \quad \forall x \in D, \ \forall F \in \tilde{P}(D). \tag{16} \]

Proof. To prove (i), take into account that $R(x,y) \ge \delta(x,y) \ \forall x, y$, and thus

\[ \sup_{x,y} \min\{R(x,y), \mu_F(x), \mu_{F'}(y)\} \ge \sup_{x,y} \min\{\delta(x,y), \mu_F(x), \mu_{F'}(y)\}, \]

\[ \inf_{x,y} \{R(x,y) \vee (1 - \mu_F(x)) \vee (1 - \mu_{F'}(y))\} \ge \inf_{x,y} \{\delta(x,y) \vee (1 - \mu_F(x)) \vee (1 - \mu_{F'}(y))\}. \]

(ii) and (iii) are direct consequences of the definition of $\delta$ and weak resemblance. $\square$

The proofs of the following propositions can be found in [9].

Proposition 2. (i) If $F \subseteq F'$, $H \subseteq H'$ and $F \approx_\alpha H$, then $F' \approx_\alpha H'$. (ii) $F \approx_\alpha F' \ \ \forall F, F'$ such that $F \subseteq F'$.

Proposition 3. If $G \equiv_\beta G'$, $H \subseteq G$ and $H' \subseteq G'$, then $H \equiv_\beta H'$.

Proposition 4. If $G$ and $G'$ satisfy the level of granularity, and $G \equiv_\beta G'$, then: (i) $(G \vee G') \equiv_\beta G$,


(ii) $(G \vee G') \equiv_\beta G'$, (iii) $(G \vee G') \equiv_\beta (G \vee G')$, i.e., $G \vee G'$ satisfies the level of granularity.

Now we can state the following definition of fuzzy dependency, which extends Eq. (5) to work in a fuzzy database. The reader is referred to Ref. [10] for a complete study of such a dependency. We are going to introduce the basic definition and properties which will be used in the next sections. In the definition, we have isolated the trivial cases in which the consequent attributes are empty or have a non-empty intersection with the antecedent attributes.

Definition 8. Let us consider an extended scheme as introduced in Definition 4, $S = (REL, \vec{\theta}, \vec{R})$, two sets of attributes $X, Y \subseteq REL$ with $X \neq \emptyset$, and associated thresholds for $X$ and $Y$ given by $\alpha \subseteq \vec{\theta}$ and $\beta \subseteq \vec{\theta}$ respectively. Let $r$ be a relation in a fuzzy database satisfying scheme $S$. Then, $r$ satisfies an $(\alpha, \beta)$-f.f.d., denoted by $X \xrightarrow{(\alpha,\beta)} Y$, if and only if one and only one of the following propositions is true:

(i) Let us suppose $Y = \emptyset$. Then, it is always true that

\[ X \xrightarrow{(\alpha, m)} Y \quad \forall X \subseteq REL, \ \forall m \in [0,1]. \]

(ii) When $X \cap Y = \emptyset$, then $r$ must satisfy the following restrictions:
(iia) $Y(t)$ satisfies the level of granularity $\forall t \in r$,
(iib) we have

\[ \forall t_1, t_2 \in r: \quad \text{if } X(t_1) \approx_\alpha X(t_2) \text{ then } Y(t_1) \equiv_\beta Y(t_2) \text{ must hold.} \tag{17} \]

(iii) If $X \cap Y \neq \emptyset$, then we consider $Y' = Y - (X \cap Y) = Y - X$ and the definition of f.f.d. becomes

\[ r \text{ satisfies } X \xrightarrow{(\alpha,\beta)} Y \iff r \text{ satisfies } X \xrightarrow{(\alpha,\beta_{Y'})} Y', \]

where $\beta_{Y'}$ is the vector of resemblance thresholds associated with $Y'$.

Remark. It can be proved (see [10]) that this definition of fuzzy dependency satisfies Armstrong's axioms. On the other hand, when applied to crisp databases without ill-known values, Definition 8 is given by Eq. (5) in the general case, and by Eq. (6) when using Kronecker's delta as the antecedent resemblance relation and $\alpha_i = 1 \ \forall i$. If, in addition, we take Kronecker's delta as the consequent resemblance relation and $\beta_j = 1 \ \forall j$, then the classical definition of functional dependency is obtained as a particular case.


2.3. Fuzzy projection operator

In [10], we introduced a fuzzy projection operator which allows us to extract the information conveyed in a relation (in a crisp or in a fuzzy database) satisfying an f.f.d., in such a way that the fuzzy dependency still holds in the projection. In addition, when new tuples are inserted into the database, the dependency can be tested through the projected relation in the same way as if we had used the original relation. We briefly introduce this operator, which will be used later to define another fuzzy projection.

In the classical case, when a relation $r$ satisfies a dependency between $X$ and $Y$, the projection operator $\Pi_{XY}$ selects the $XY$-value of those tuples with equal $X$-values (and therefore with equal $Y$-values), so that $\Pi_{XY}(r)$ is a relation with fewer tuples than $r$. We can see this new relation as a set of rules extracted from the original relation $r$, thereby explaining the dependency. Now, we propose to merge two tuples whenever they have equal antecedent values. Two tuples $t$ and $t'$ with $X(t) = X(t')$ may now have $Y(t) \neq Y(t')$, but if the relation $r$ satisfies Definition 8, then we can ensure that $Y(t) \equiv_\beta Y(t')$, and therefore part (iii) of Proposition 4 implies that $Y(t) \vee Y(t')$ satisfies the level of granularity. This guarantees that we shall not obtain values that are too fuzzy (for instance `between 130 and 210 cm') when projecting a relation using fuzzy union (this is described in Theorem 1).

Definition 9. Let us consider an arbitrary relation $r$ in a fuzzy database with scheme $S = (REL, \vec{\theta}, \vec{R})$, $REL \supseteq XY$. We define the fuzzy projection operator, denoted by $\Pi^{X}_{XY}(r)$ (or simply $\Pi_Z(r)$ or $\Pi(r)$ if there is no confusion), as a kind of projection which merges through fuzzy union the $XY$-values of those tuples $\{t_i \in r\}$ with equal $X$-values, i.e., $X_h(t_j) = X_h(t_k) \ \ \forall X_h \in X, \ \forall t_j, t_k \in \{t_i\}$. The merging of $\{t_i\}$ over $XY = (X_1, \ldots, X_m, Y_1, \ldots, Y_k)$ is given by

\[ \big(X_1(t_k), \ldots, X_m(t_k), \vee_i Y_1(t_i), \ldots, \vee_i Y_k(t_i)\big), \]

where $\vee$ stands for fuzzy union. Therefore, $\Pi^{X}_{XY}(r)$ is a relation with scheme $(XY, \theta_X \theta_Y, R_X R_Y)$. From now on, we shall use the following notation:

\[ \Pi_Z \triangleq \Pi^{Z}_{Z} \quad \forall Z \subseteq REL; \]

$\Pi_Z$ is a projection over $Z$ which eliminates duplicate tuples, and is the classical projection operator when applied to crisp databases.

Example 2. This example shows how the $\Pi$ operator is applied to a relation with fuzzy values $a, a', b, b'$:


$r$ =

    A     B     ...
    a     b     ...
    a     b'    ...
    a'    b     ...
    a'    b     ...

$\mathcal{P}^{A}_{AB}(r)$ =

    A     B
    a     b ∨ b'
    a'    b

$\Pi_{AB}(r)$ =

    A     B
    a     b
    a     b'
    a'    b
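The merging step of Definition 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a fuzzy value is stored as a dictionary from domain elements to membership degrees, fuzzy union is taken as the pointwise maximum, and the concrete membership degrees chosen for $a, a', b, b'$ are hypothetical.

```python
def fuzzy_union(f, g):
    """Fuzzy union as the pointwise maximum of two membership functions."""
    return {x: max(f.get(x, 0.0), g.get(x, 0.0)) for x in set(f) | set(g)}


def freeze(value):
    """Hashable, order-independent key for a fuzzy value stored as a dict."""
    return tuple(sorted(value.items()))


def fuzzy_projection(rows):
    """Merge the Y-values of all rows that share exactly the same X-values.

    rows: list of (xs, ys) pairs; xs and ys are tuples of fuzzy values.
    Returns the merged rows in order of first appearance.
    """
    merged = {}
    for xs, ys in rows:
        key = tuple(freeze(x) for x in xs)
        if key in merged:
            _, old_ys = merged[key]
            ys = tuple(fuzzy_union(o, y) for o, y in zip(old_ys, ys))
        merged[key] = (xs, ys)
    return list(merged.values())


# Hypothetical fuzzy values standing in for a, a', b, b' of Example 2.
a, a2 = {1: 1.0, 2: 0.5}, {5: 1.0}
b, b2 = {10: 1.0, 11: 0.4}, {11: 1.0, 12: 0.6}
r = [((a,), (b,)), ((a,), (b2,)), ((a2,), (b,)), ((a2,), (b,))]
projected = fuzzy_projection(r)
```

With these inputs the four tuples collapse into two, one per distinct antecedent value, and the first of them carries the union of `b` and `b2` as its consequent.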

The next result shows that the $\mathcal{P}$ operator introduced in Definition 9 preserves the dependency introduced in Definition 8. The proof can be found in [10].

Theorem 1. Let us consider a relation $r$ under the same conditions as in Definition 8. Then, $r$ satisfies the f.f.d. $X \xrightarrow{(\alpha,\beta)} Y$ (Definition 8) if and only if the relation $\mathcal{P}^{X}_{XY}(r)$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$.

Example 3. Let us consider the relation $r$ given in Eq. (7) of Example 1, with $R_{Height} = \delta$, $\alpha = 1$, and an arbitrary resemblance relation $R_W$ for the consequent attribute. We have

$\mathcal{P}^{H}_{HW}(r)$ =

    Height    Weight
    185       86 ∨ 90
    176       79
    192       99
    194       95
    160       65 ∨ 66
    174       72

In this case, the original relation $r$ is a classical one, but the projection is a relation of a fuzzy database. Theorem 1 says that $r$ satisfies `Height' $\xrightarrow{(1,\beta)}$ `Weight' if and only if it is satisfied by the relation $\mathcal{P}^{H}_{HW}(r)$. We have chosen $\delta$ as antecedent resemblance relation. So, no pair of distinct tuples $t_1$, $t_2$ of the projection satisfies $H(t_1) \approx_{1} H(t_2)$, and thus we do not have to test Eq. (17) in Definition 8. We only have to test (iia) in Definition 8, i.e., the granularity level for the consequent attributes. Thus, the dependency restrictions for $\mathcal{P}^{H}_{HW}(r)$ are given by

$$(86 \vee 90) \cong_{\beta} (86 \vee 90), \qquad (65 \vee 66) \cong_{\beta} (65 \vee 66),$$

which are equivalent (due to Proposition 4) to those stated in Eq. (8). The next proposition is straightforward:


Proposition 5. Let us denote $\mathcal{P}^{V}_{VW}$ by $\mathcal{P}$. Then, we have:
· $\mathcal{P}(\mathcal{P}(r)) = \mathcal{P}(r)$.
· $\mathcal{P}^{V}_{VW}(\mathcal{P}^{VZ}_{VZW}(r)) = \mathcal{P}^{V}_{VW}(r)$.
· Let us consider two arbitrary non-empty subsets $r_1$, $r_2$ of tuples of $r$ with empty intersection, and such that $r = r_1 \cup r_2$, i.e., they are a partition of $r$. Then $\mathcal{P}(r) = \mathcal{P}(\mathcal{P}(r_1) \cup \mathcal{P}(r_2))$.

3. Rule based fuzzy dependencies

In the previous section we introduced a fuzzy projection (denoted by $\mathcal{P}$) to be applied to crisp or fuzzy databases. In particular, for a crisp database, the definition of fuzzy dependency is given by Eq. (5), and the $\mathcal{P}$ operator merges those tuples with equal antecedent values. In this section we propose another fuzzy criterion of projection and dependency for crisp databases, which allows us to merge more tuples than with the $\mathcal{P}$ operator, although it will be a more restricted concept.

Some authors have addressed the problem of defining the concept of functional dependency when there is incomplete information in the attributes appearing in the dependency (see [10] for a review of these works). One of the first solutions was given by Vassiliou [30], in the framework of a crisp database with unknown values (for a discussion of unknown and null values in the framework of fuzzy databases without dependencies, see [24]). He basically proposed the replacement of unknown values by those crisp values which do not break the dependency. This specialization approach can be extended to work with fuzzy databases, substituting the concept of replacement by that of the intersection of fuzzy values (see for instance [15,33]). In our opinion it is dangerous to replace a fuzzy value by a more precise value if we do not have further information. So, we propose a generalization approach which maintains the original lack of information given by the fuzzy data values and tries to construct fuzzy rules explaining the dependency (this approach has also been followed recently by Bosc et al. [2]).
The idea is to replace each data value in the crisp database relation $r$ by a fuzzy set containing it. Thus, we obtain a fuzzy database relation and we proceed to test the general Definition 8 of dependency on it. If the restriction holds, then we can apply the $\mathcal{P}$ operator to this fuzzy relation, and this is the fuzzy projection we propose. Let us emphasize that, in contrast with the specialization approach, the generalization approach can be applied to existing crisp databases, in the sense that our source is not a fuzzy relation but a crisp one.

3.1. The extension operator

Let us suppose we have a crisp relation $r$ and that there is an expert who constructs a set of fuzzy linguistic labels on each attribute $A_h$. If $A_h$ has $D_h$ as its domain, then we shall denote the set of linguistic labels as $L_{A_h}$ or $L_{D_h}$, or $L_h$ if there is no confusion. Each label $L \in L_D$ is a fuzzy set on $D$, i.e., $L_D \subseteq \tilde{\wp}(D)$, the family of fuzzy subsets of $D$. For instance (see Fig. 4),

$L_{Height}$ = {medium, tall, rather tall},
$L_{Weight}$ = {normal, about 85 kg, rather heavy}.

Each set $L_D$ must satisfy the following restriction:

Integrity Restriction of Separation (IR-S). A set of linguistic labels $L_D \subseteq \tilde{\wp}(D)$ satisfies the IR-S restriction if and only if the following condition is satisfied:

$$\forall L, L' \in L_D,\ L \neq L': \quad \ker(L) \cap \ker(L') = \emptyset. \qquad (18)$$

Then we replace each $A(t)$ by the label $L \in L_{D_A}$ whose kernel contains it (it is unique because of the restriction stated in Eq. (18)); if there is no such label, then $A(t)$ remains the same.

Definition 10. Let us consider a crisp relation $r$ with extended scheme $S = (REL, \vec{R}, \vec{\theta})$, $Z \subseteq REL$, and let $L_h$ be a set of linguistic labels associated to each $Z_h \in Z$, satisfying the IR-S restriction. We define the extension operator with respect to $Z$ and $L = \{L_h\}_h$ as an operator which, applied to the crisp relation $r$, yields a fuzzy relation denoted by $E^{L}_{Z}(r)$ (or simply $E(r)$ if there is no confusion), constructed in the following way: each tuple $t \in r$ is modified according to:
· If $\exists L \in L_h$ such that $Z_h(t) \in \ker(L)$, then we replace $Z_h(t)$ by $L$.
· Otherwise, $Z_h(t)$ remains the same.

Remark. Note that $E(r)$ represents an operator which, when applied to a relation $r$ in a crisp database, yields another relation in a fuzzy database. On the other hand, from now on, we shall add the set $L$ (it may be $L_h = \emptyset$ for some $h$) to the extended scheme of a relation, so that we have $S = (REL, \vec{\theta}, \vec{R}, L)$, where $L$ satisfies the IR-S restriction.

Remark. In this fuzzification process, we do not require the labels to cover the entire domain. Of course, this could be the case, and then the change of granularity would be uniform, in the sense that we would have only linguistic labels without mixing them with crisp data.

Example 4. Let us consider the relation $r$ given in Eq. (7), and the following fuzzy linguistic labels $L = \{L_{Height}, L_{Weight}\}$,

$L_{Height}$ = {medium, tall, rather tall},
$L_{Weight}$ = {normal, about 85 kg, rather heavy},

where

ker(medium) = [170, 175], ker(tall) = [185, 187], ker(rather tall) = [190, 195],
ker(normal) = [78, 81], ker(about 85 kg) = [84, 86], ker(rather heavy) = [95, 100],

with $\mu_{medium}(176) = 0.8$ (this membership degree will be used later, in another example). The extended relation is given by

$E_{HW}(r)$ =

    Height         Weight
    tall           about 85
    tall           90
    176            normal
    rather tall    rather heavy
    rather tall    rather heavy
    160            65
    160            66
    medium         72                (19)
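The extension step of Definition 10 can be reproduced on this example with a short Python sketch. It assumes each label is represented only by its kernel interval, which is all the extension operator uses; the function and variable names are illustrative, and the tuple list is the Height/Weight part of the relation of Eq. (7).

```python
HEIGHT_KERNELS = {"medium": (170, 175), "tall": (185, 187), "rather tall": (190, 195)}
WEIGHT_KERNELS = {"normal": (78, 81), "about 85": (84, 86), "rather heavy": (95, 100)}


def extend_value(value, kernels):
    """Replace a crisp value by the label whose kernel contains it, if any.

    Values that are already labels (strings) are left untouched, so that
    applying the operator twice changes nothing.
    """
    if isinstance(value, str):
        return value
    for label, (low, high) in kernels.items():
        if low <= value <= high:
            return label  # unique because kernels are pairwise disjoint (IR-S)
    return value


def extend(relation, kernels_per_column):
    """Apply extend_value column by column to every tuple of the relation."""
    return [
        tuple(extend_value(v, k) for v, k in zip(row, kernels_per_column))
        for row in relation
    ]


r = [(185, 86), (185, 90), (176, 79), (192, 99),
     (194, 95), (160, 65), (160, 66), (174, 72)]
extended = extend(r, (HEIGHT_KERNELS, WEIGHT_KERNELS))
```

The list `extended` has the same rows as Eq. (19); values such as 176 or 90, which lie outside every kernel, stay crisp.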

It can be seen that the substitution criterion in the definition of the extension operator is that the crisp value has to belong to the kernel of some fuzzy linguistic label. So, in this step we only consider the non-fuzzy part of the linguistic labels, i.e., we treat them as intervals. In Definition 12, however, we shall use the weak and strong resemblance relations, and so, in that second step, we shall consider all the elements in the fuzzy set, not only the values in the kernel. The following results are straightforward:

Proposition 6. The $E$ operator is a closed operator, i.e., if we denote $E_Z$ by $E$, then $E(r) = E(E(r))$.

Proposition 7. Let us consider a relation $r$ under the same conditions as in Definition 10, with $X \subseteq Z \subseteq REL$. Then, $E_Z(r)|_X = E_X(r)|_X$, where $s|_X$ stands for the restriction of the relation $s$ to the values of $X$, i.e., $s|_X = \{X(t) \mid t \in s\}$.

This result says that each tuple $t \in E_Z(r)$ is bijectively associated to a tuple $u \in E_X(r)$ with $X(t) = X(u)$.

Proposition 8. Let us consider a relation $r$ under the same conditions as in Definition 10. Let us denote $E_Z$ by $E$, and let us consider a partition $r = r_1 \cup r_2$ (as stated in Proposition 5). Then, we have $E(r) = E(r_1) \cup E(r_2)$.


3.2. Definition of rule based fuzzy dependency

In this section we introduce the definition of rule based fuzzy dependency between a set of antecedent attributes $X$ and a set of consequent attributes $Y$. In order to define this, we are going to use the extension operator $E_{XY}$ introduced in the previous section. As we can see in the example given in Eq. (19), when we construct $E_{XY}(r)$ there are several generalized tuples with the same $X$-values. We shall merge those tuples through fuzzy union and obtain a relation called $\widetilde{\mathcal{P}}^{X}_{XY}(r)$ (see Eq. (20)), which can be seen as a set of rules explaining the fuzzy dependency (if a person is tall, then he/she is heavy, ...). But we could obtain a rule like `if a person is tall, then he/she is light or heavy or very heavy'. Such a rule is rather uninformative, and would tell us that the weight does not fuzzily depend on the height. So, we impose a consistency condition on those rules, which, by definition, is the restriction of fuzzy dependency. We proceed to formalize this process:

Definition 11. Let us consider a crisp relation $r$ with extended scheme $S = (REL, \vec{R}, \vec{\theta}, L)$, and $XY \subseteq REL$. We define the fuzzy projection of $r$ over $XY$ and with respect to $X$ and $L$, denoted by $\widetilde{\mathcal{P}}^{X,L}_{XY}(r)$ (or simply $\widetilde{\mathcal{P}}^{X}_{XY}(r)$, if there is no confusion), as the following relation:

$$\widetilde{\mathcal{P}}^{X}_{XY}(r) = \mathcal{P}^{X}_{XY}(E_{XY}(r)).$$

As with the $\mathcal{P}$ operator, we shall use the following notation:

$$\widetilde{\Pi}_{Z}(r) \triangleq \Pi_{Z}(E_{Z}(r)),$$

i.e., in order to construct $\widetilde{\Pi}_{Z}(r)$, we consider the restriction of $r$ to $Z$, replace each $Z_h(t)$ by the linguistic label $L \in L_h$ with $Z_h(t) \in \ker(L)$ (if such a label exists), and then merge the tuples with the same $Z$-value.

Example 5. Let us consider the relation $r$ given in Eq. (7). The relation $E_{HW}(r)$ is shown in Eq. (19). Then, we have

$\widetilde{\mathcal{P}}^{H}_{HW}(r) = \mathcal{P}^{H}_{HW}(E_{HW}(r))$ =

    Height         Weight
    tall           90 ∨ about 85
    176            normal
    rather tall    rather heavy
    160            65 ∨ 66
    medium         72                (20)



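Proposition 9. Let us denote $\widetilde{\mathcal{P}}^{V}_{VW}$ by $\widetilde{\mathcal{P}}$, and $\mathcal{P}^{V}_{VW}$ by $\mathcal{P}$. Then we have $\widetilde{\mathcal{P}}(r) = E(\widetilde{\mathcal{P}}(r)) = \widetilde{\mathcal{P}}(\widetilde{\mathcal{P}}(r))$.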




Proof. Each tuple value $A(t)$ in the relation $\widetilde{\mathcal{P}}(r)$ satisfies one of the following situations:
1. It has not been replaced by any label. The set of labels is fixed, and thus it will not be replaced by any label when performing $E(\widetilde{\mathcal{P}}(r))$.
2. It is equal to some label. Thus, it remains unaltered in the relation $E(\widetilde{\mathcal{P}}(r))$.
3. It is equal to the fuzzy union of some labels. Again, as the set of labels is fixed, there will be no label containing such a fuzzy union, and thus the value remains unchanged.

Thus, we have $\widetilde{\mathcal{P}}(r) = E(\widetilde{\mathcal{P}}(r))$, or equivalently, $\mathcal{P}(E(r)) = E(\mathcal{P}(E(r)))$. Thus, $\mathcal{P}(E(\mathcal{P}(E(r)))) = \mathcal{P}(\mathcal{P}(E(r)))$. This second part is equal to $\mathcal{P}(E(r))$ by applying Proposition 5. We have proved the following: $\mathcal{P}(E(\mathcal{P}(E(r)))) = \mathcal{P}(E(r)) = E(\mathcal{P}(E(r)))$. □

Definition 12. Let us suppose the following conditions: let us consider an extended scheme $S = (REL, \vec{\theta}, \vec{R}, L)$, two sets of attributes $X, Y \subseteq REL$, with $X \neq \emptyset$, and associated thresholds for $X$ and $Y$ given by $\alpha \subseteq \vec{\theta}$ and $\beta \subseteq \vec{\theta}$ respectively. Let $r$ be a relation in a crisp database satisfying scheme $S$. Let us denote the fuzzy relation $\widetilde{\mathcal{P}}^{X}_{XY}(r)$ by $\tilde{r}$. Then, $r$ satisfies an $(\alpha, \beta)$-rule based fuzzy functional dependency (r.b.f.f.d.) with respect to $L$, denoted by $X \overset{(\alpha,\beta)}{\rightsquigarrow}_{L} Y$ or simply $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$, if and only if $\tilde{r}$ satisfies the $X \xrightarrow{(\alpha,\beta)} Y$ f.f.d. In other words, one of the following items must be satisfied:
(i) Let us suppose $Y = \emptyset$. Then, it is always true that $X \xrightarrow{(\alpha,m)} Y$ $\forall X \subseteq REL$, $\forall m \in [0,1]$.
(ii) When $X \cap Y = \emptyset$, then the following conditions must hold:
(iia) $Y(t)$ verifies the level of granularity $\forall t \in \tilde{r}$,
(iib) $\forall t, t' \in \tilde{r}$, $t \neq t'$, if $X(t) \approx_{\alpha} X(t')$, then $Y(t) \cong_{\beta} Y(t')$ must hold.
(iii) If $X \cap Y \neq \emptyset$, then we consider $Y' = Y - (X \cap Y) = Y - X$, and the definition of r.b.f.f.d. becomes

$$r \text{ satisfies } X \overset{(\alpha,\beta)}{\rightsquigarrow} Y \iff r \text{ satisfies } X \xrightarrow{(\alpha,\beta_{Y'})} Y',$$

where $\beta_{Y'}$ is the vector of resemblance thresholds associated to $Y'$.

Remark. We emphasize that, due to Theorem 1, testing (iia) and (iib) over $\tilde{r}$ is equivalent to testing both conditions over the relation $E(r)$. On the other hand, it is immediately seen that, if some label $L$ belonging to the set $L_j$ associated to a consequent attribute $Y_j \in Y$ does not satisfy the level of granularity imposed on $Y_j$, then condition (iia) can never be attained whenever there is some $t \in r$ such that $Y_j(t) \in \ker(L)$. Therefore, from now on, we can assume the following restriction:

$$\forall Y_j \in Y \text{ consequent attribute},\ \forall L \in L_j:\ L \text{ must satisfy the level of granularity imposed on } Y_j. \qquad (21)$$

The following propositions state some particular cases of Definition 12 of r.b.f.f.d.

Proposition 10. Let us consider a relation $r$ in a crisp database, under the same conditions as in Definition 12 of r.b.f.f.d. Let us suppose that $L_h = \emptyset\ \forall h$. Then, the definition of r.b.f.f.d. is given by replacing (ii) in Definition 12 with Eq. (5), i.e.,

$$\forall t_1, t_2 \in r:\ \text{if } R_i(X_i(t_1), X_i(t_2)) \geq \alpha_i\ \forall i, \text{ then } R_j(Y_j(t_1), Y_j(t_2)) \geq \beta_j \text{ must hold } \forall j.$$

Proof. When $F$ and $F'$ are crisp values, the weak and the strong extensions of $R$ to fuzzy values both coincide with $R(F, F')$. Thus, $F \approx_{\alpha_i} F'$ if and only if $R(F, F') \geq \alpha_i$. □

Proposition 11. Let us consider a relation $r$ in a crisp database, under the same conditions as in Definition 12 of r.b.f.f.d. Let us consider Kronecker's delta $\delta$ (Eq. (4)) as antecedent resemblance relation, i.e., $R_i = \delta\ \forall X_i \in X$, and $\alpha_i = 1\ \forall X_i \in X$. Then, the definition of r.b.f.f.d. is obtained from Definition 12 by removing (iib). So, we only have to test (iia) in Definition 12, i.e., the granularity level for the consequent attributes.

Proof. According to the definition of $\tilde{r}$, we merge those tuples in $E_{XY}(r)$ which have the same values in $X$, and therefore

$$\forall t, t' \in \tilde{r},\ t \neq t':\ \exists X_i \in X \text{ such that } X_i(t) \neq X_i(t').$$

Let us consider all the possible cases:
· Both $X_i(t)$ and $X_i(t')$ are crisp. Thus $\delta(X_i(t), X_i(t')) = 0$, which is less than 1.
· Both are fuzzy labels. Then, by applying Proposition 1, the weak resemblance degree induced by $\delta$ is $\sup_x \min\{\mu_{X_i(t)}(x), \mu_{X_i(t')}(x)\}$, which is less than 1 because of the IR-S restriction.
· One of them is fuzzy (say $X_i(t)$) and the other one is crisp (say $X_i(t')$). Then, by Proposition 1, the weak resemblance degree induced by $\delta$ is $\mu_{X_i(t)}(X_i(t'))$, which is again less than 1 because, otherwise, $X_i(t')$ would have been replaced by a fuzzy label when constructing $E_{XY}(r)$, and this is not the case. □


The result stated in the previous proposition drastically decreases the computing time when testing a fuzzy dependency. The reason is that we only have to project $r$ into $\widetilde{\mathcal{P}}^{X}_{XY}(r)$, which can be done in $O(n)$ time, $n$ being the number of tuples in the database, and then test whether each $Y(t)$, $t \in \widetilde{\mathcal{P}}^{X}_{XY}(r)$, satisfies the level of granularity, with another $O(n)$ algorithm. The final algorithm is therefore $O(n)$. Let us remark that, with an antecedent resemblance relation other than Kronecker's delta, the testing algorithm would be $O(n^2)$.

So, an important consequence of Proposition 11 is that we obtain the least restrictive definition of fuzzy dependency with regard to the antecedent values when $R_i = \delta$, $\alpha_i = 1\ \forall X_i \in X$. In this case, the semantics of the dependency is the following: those tuples with antecedent values belonging to the same label should have consequent values belonging to strongly resemblant labels. This is not the case for consequent attributes: if we consider Kronecker's delta as a resemblance relation for $Y$, then the restriction $Y(t) \cong^{\delta}_{\beta} Y(t')$ is a stronger requirement than if we had used any other resemblance relation (see part (i) of Proposition 1). Thus, our recommendation is to begin checking fuzzy dependencies with $R_i = \delta$, $\alpha_i = 1\ \forall X_i \in X$, because if such a dependency does not hold, then we are sure that $r$ does not satisfy an r.b.f.f.d. with any other selection of $R_i$ and $\alpha_i$.

Regarding the thresholds used in the definition of r.b.f.f.d., it is easy to see that if a relation satisfies an $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$ dependency, then it also satisfies an $X \xrightarrow{(\alpha',\beta')} Y$ dependency (with $\alpha'_i \geq \alpha_i$ and $\beta_j \geq \beta'_j$). So, we recommend that the expert first check a dependency with a low value of each $\beta_j$ (say $\beta_j = 0.6$). In this discussion we have supposed that an expert provides these thresholds, but they could be set by the system in the following way. Assume that $R_i = \delta$, $\alpha_i = 1\ \forall X_i \in X$. Then, each $\beta_j$ is computed by the system as the minimum degree of resemblance among the $Y_j$ values appearing in the relation.

Example 6. Let us consider the relation in Eq. (7), with the fuzzy projection given in Eq. (20), i.e.,

$r$ =

    Z     Height    Weight
    z1    185       86
    z2    185       90
    z3    176       79
    z4    192       99
    z5    194       95
    z6    160       65
    z7    160       66
    z8    174       72

$\widetilde{\mathcal{P}}^{H}_{HW}(r)$ =

    Height         Weight
    tall           90 ∨ about 85
    176            normal
    rather tall    rather heavy
    160            65 ∨ 66
    medium         72

We want to see if $r$ satisfies the r.b.f.f.d. `Height' $\overset{(\alpha,\beta)}{\rightsquigarrow}$ `Weight'. Let us test two cases:

· Let us consider $\alpha = 1$, $\beta = 0.6$, $R_{Height} = \delta$ (the Kronecker's delta given in Eq. (4)). Then, according to Proposition 11, the following restrictions must hold:

$$(90 \vee \text{about 85}) \cong_{\beta} (90 \vee \text{about 85}), \qquad \text{normal} \cong_{\beta} \text{normal},$$
$$\text{rather heavy} \cong_{\beta} \text{rather heavy}, \qquad (65 \vee 66) \cong_{\beta} (65 \vee 66).$$

The second and third restrictions hold because we assumed that the linguistic labels satisfy the level of granularity (stated in Eq. (21)). The fourth condition is equivalent to testing whether $R_W(65, 66) \geq \beta$. On the other hand, `about 85' has a trapezoidal membership function, so, if we consider $R_W$ to be an arbitrary distance relation, then, by applying Eq. (15), the first restriction becomes $R_W(x, 90) \geq \beta$, where $x$ is the minimum of the $(1 - \beta)$-cut of the fuzzy set `about 85'. As $\beta = 0.6$, we have $1 - \beta = 0.4$ and thus $x = 83.1$, so the restrictions to test are finally given by

$$R_W(83.1, 90) \geq 0.6, \qquad R_W(65, 66) \geq 0.6. \qquad (22)$$

If both restrictions are satisfied, then we can conclude that $r$ satisfies `Height' $\xrightarrow{(1,0.6)}$ `Weight'.

· Let us consider now $\alpha = 0.8$, $R_{Height} = \delta$, and $\beta = 0.6$. On the one hand, as we have seen in the previous case, part (iia) of Definition 12 of r.b.f.f.d. is given by Eq. (22). On the other hand, we must also compare the consequent values of those tuples with weakly resemblant antecedent values. In this case, there are only two such tuples (the second and the fifth one). The weak resemblance degree between 176 and `medium' is $\mu_{medium}(176) = 0.8 \geq \alpha$, where we have applied Eq. (16). So, in addition to the conditions stated in Eq. (22), we must also check that $72 \cong_{\beta}$ `normal', in order to conclude that $r$ satisfies the `Height' $\xrightarrow{(0.8,0.6)}$ `Weight' dependency.

Should the dependency not hold, we could proceed to test, for instance, whether the Weight depends on both the Age and the Height, which intuitively makes more sense than the first dependency. In general, the more attributes we include as antecedents, the greater the possibility of detecting a fuzzy dependency. But this process should be supervised by an expert in order to avoid finding trivial and uninteresting dependencies such as `every attribute except Weight determines Weight'.
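The linear-time test discussed above for $R_i = \delta$, $\alpha_i = 1$ can be sketched in Python for the simplest setting, a crisp relation without labels (the case of Proposition 10, whose restrictions for Eq. (7) are those of Eq. (8)). The sketch rests on assumptions that are not in the paper: `resemblance` is a hypothetical distance-based relation with an arbitrary `scale`, and the system-chosen threshold is read as the minimum resemblance among consequent values that share the same antecedent value.

```python
def resemblance(x, y, scale=10.0):
    """Hypothetical distance-based resemblance in [0, 1] (illustrative only)."""
    return max(0.0, 1.0 - abs(x - y) / scale)


def largest_beta(relation, resemblance):
    """Largest threshold beta for which the crisp dependency holds.

    relation: iterable of crisp (x, y) pairs. One pass groups tuples by equal
    antecedent; because the resemblance is assumed to decrease with distance,
    the least resemblant pair inside a group is (min y, max y), so only these
    two values are tracked per group.
    """
    extremes = {}
    for x, y in relation:
        low, high = extremes.get(x, (y, y))
        extremes[x] = (min(low, y), max(high, y))
    return min((resemblance(lo, hi) for lo, hi in extremes.values()), default=1.0)


def satisfies(relation, beta, resemblance):
    """True if every group of equal antecedents resembles at level beta."""
    return largest_beta(relation, resemblance) >= beta


# Height/Weight columns of the relation in Eq. (7).
r = [(185, 86), (185, 90), (176, 79), (192, 99),
     (194, 95), (160, 65), (160, 66), (174, 72)]
beta_max = largest_beta(r, resemblance)
```

Only the groups for heights 185 and 160 contain more than one weight, so `beta_max` is determined by the pairs (86, 90) and (65, 66), matching the two restrictions of Eq. (8). With labels present, the same grouping applies to $E_{XY}(r)$, but the per-group test must then use the strong resemblance between fuzzy values.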


3.3. Properties

We are going to analyze several properties that the definition of r.b.f.f.d. satisfies. First, it is easy to see that the definition of r.b.f.f.d. satisfies the basic properties (i)-(iii) introduced in Section 2 (following a similar reasoning to that in [12]). Second, in the next result, we see that if $r$ satisfies an r.b.f.f.d. with respect to $L$, then it satisfies an r.b.f.f.d. with respect to any other $L'$ contained in $L$.

Proposition 12. Let us consider two sets of linguistic labels $L = \{L_h\}_h$, $L' = \{L'_h\}_h$, where each $L_h$, $L'_h$ is associated to each $Z_h \in XY$. Let us suppose the following condition:

$$\forall h\ \forall L \in L_h\ \exists L' \in L'_h \text{ such that } L \supseteq L',$$

where $\supseteq$ stands for inclusion between fuzzy sets. Then, if a relation $r$ satisfies $X \overset{(\alpha,\beta)}{\rightsquigarrow}_{L} Y$, then $r$ satisfies $X \overset{(\alpha,\beta)}{\rightsquigarrow}_{L'} Y$.

Proof. Let us denote by $\tilde{r}^{L}$ and $\tilde{r}^{L'}$ the relations $\widetilde{\mathcal{P}}^{X,L}_{XY}(r)$ and $\widetilde{\mathcal{P}}^{X,L'}_{XY}(r)$, respectively. Let us test conditions (iia) and (iib) in Definition 12 over the relation $\tilde{r}^{L'}$. So, let us consider two arbitrary tuples $t'_1, t'_2 \in \tilde{r}^{L'}$ such that $X(t'_1) \approx_{\alpha} X(t'_2)$. By hypothesis and by definition of fuzzy projection we have

$$\exists t_1, t_2 \in \tilde{r}^{L} \text{ such that } XY(t'_1) \subseteq XY(t_1) \text{ and } XY(t'_2) \subseteq XY(t_2).$$

By applying Proposition 2 we find $X(t_1) \approx_{\alpha} X(t_2)$, and by the hypothesis of dependency we have $Y(t_1) \cong_{\beta} Y(t_2)$. Now, Proposition 3 implies $Y(t'_1) \cong_{\beta} Y(t'_2)$.

On the other hand, every tuple value $Y_j(t')$ with $t' \in \tilde{r}^{L'}$ satisfies the level of granularity, because there exists $t \in \tilde{r}^{L}$ such that $Y_j(t') \subseteq Y_j(t)$. As $Y_j(t)$ satisfies the level of granularity because of the hypothesis of dependency, we can apply Proposition 3 and obtain $Y_j(t) \cong_{\beta} Y_j(t) \Rightarrow Y_j(t') \cong_{\beta} Y_j(t')$. □

Proposition 13. Let us consider a relation $r$ in a crisp database under the same conditions as in Definition 12. If $r$ satisfies $X \overset{(\alpha,\beta)}{\rightsquigarrow}_{L} Y$, then $r$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$.

Proof. By applying Proposition 12 with $L' = \emptyset$ to $r$, we have that $r$ satisfies $X \overset{(\alpha,\beta)}{\rightsquigarrow}_{\emptyset} Y$. By Proposition 10, $r$ then satisfies Eq. (5), which is the definition of f.f.d. for a crisp database (see the remark after Definition 8). □

Remark. The converse of Proposition 13 is not true in general, because some values in a crisp relation $r$ could be resemblant while the corresponding labels (after the fuzzification step) turn out not to be resemblant. In other words, some fuzzy dependencies (like the f.f.d. stated in Eq. (5)) could be lost during the fuzzification process.


Example 7. Let us reconsider the relation in Eq. (7). We were looking for a fuzzy dependency between `Height' and `Weight'. Let us reconsider, as in Example 1, Kronecker's delta as antecedent resemblance relation and an arbitrary resemblance $R_W$ for the `Weight' attribute.

Being interested in testing the `Height' $\xrightarrow{(1,0.6)}$ `Weight' dependency, with $R_{Height} = \delta$, we saw in Eq. (8) that the following restrictions have to be satisfied:

$$R_W(86, 90) \geq \beta, \qquad R_W(65, 66) \geq \beta.$$

As we saw in Proposition 10, these are the same restrictions as for the definition of r.b.f.f.d. with $L_H = L_W = \emptyset$. Let us suppose that both restrictions hold. Then, we can state that $r$ satisfies `Height' $\xrightarrow{(1,0.6)}$ `Weight'. Now, if we are interested in seeing whether or not $r$ satisfies an r.b.f.f.d. with $L_H \neq \emptyset$ and $L_W \neq \emptyset$, we may consider for instance the labels defined in Fig. 4. Then, in Eq. (22) of Example 6 we saw that the restrictions of this dependency were

$$R_W(83.1, 90) \geq 0.6, \qquad R_W(65, 66) \geq 0.6,$$

which are more restrictive than $R_W(86, 90) \geq \beta$, $R_W(65, 66) \geq \beta$. If both restrictions in Eq. (22) hold, then $r$ satisfies the dependency `Height' $\xrightarrow{(1,0.6)}$ `Weight' with respect to the previous sets of linguistic labels.

On the other hand, let us suppose that the first restriction does not hold, i.e., $R_W(83.1, 90) < 0.6$. Then $r$ does not satisfy the considered r.b.f.f.d. Proposition 12 says that if a relation does not satisfy a dependency with respect to a given set of linguistic labels, then it does not satisfy a dependency with respect to any coarser set, i.e., a set of linguistic labels containing the given set of actual labels. So, if $r$ does not satisfy the considered r.b.f.f.d., we should not try another coarser partition, because we are guaranteed that this dependency will not hold. Instead, we could try another finer set $L'$ of linguistic labels, i.e., $L'$ satisfying the condition in Proposition 12 (a set of labels contained in the actual set).

Now, we proceed to see several properties regarding new updates to the database. Definition 12 of r.b.f.f.d. says that if we want to test whether a relation $r$ satisfies an r.b.f.f.d., we first construct the relation $\widetilde{\mathcal{P}}(r)$, and then test for an $(\alpha, \beta)$-fuzzy functional dependency. If it is not satisfied, then $\widetilde{\mathcal{P}}(r)$ must be deleted from memory. If it is satisfied, then we can store the relation $\widetilde{\mathcal{P}}(r)$ containing the rules which explain the dependency. All the updates regarding antecedent and consequent attributes can be accomplished directly on $\widetilde{\mathcal{P}}(r)$. In order to maintain the integrity constraints, these updates must not violate the $(\alpha, \beta)$-fuzzy functional dependency in $\widetilde{\mathcal{P}}(r)$ (i.e., the level of granularity and Eq. (17)). This is proven in the next results:


Theorem 2. Let us consider a relation $r$ under the same conditions as in Definition 12 of r.b.f.f.d., and let us denote by $\tilde{r}$ the fuzzy projection $\widetilde{\mathcal{P}}^{X}_{XY}(r)$. Let us consider a new tuple $t$. Then, we have
(i) $\widetilde{\mathcal{P}}^{X}_{XY}(\tilde{r} \cup XY(t)) = \widetilde{\mathcal{P}}^{X}_{XY}(r \cup t)$,
(ii) $r \cup t$ satisfies the r.b.f.f.d. $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$ if and only if $\tilde{r} \cup XY(t)$ satisfies $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$.

Proof. For the sake of simplicity, we shall omit $X$ and $XY$ when denoting $\mathcal{P}^{X}_{XY}$ and $\widetilde{\mathcal{P}}^{X}_{XY}$, and we shall denote $XY(t)$ by $p$. Part (ii) is a direct consequence of part (i), so we are going to prove (i). If we apply Proposition 8, we find $E(r \cup t) = E(r) \cup E(t)$, and thus

$$\mathcal{P}(E(r \cup t)) = \mathcal{P}(E(r) \cup E(t)) \iff \widetilde{\mathcal{P}}(r \cup t) = \mathcal{P}(E(r) \cup E(t)).$$

Proposition 5 implies

$$\mathcal{P}(E(r) \cup E(t)) = \mathcal{P}(\mathcal{P}(E(r)) \cup \mathcal{P}(E(t))) = \mathcal{P}(\tilde{r} \cup \mathcal{P}(E(t))).$$

As $t$ is a single tuple, we have $\mathcal{P}(E(t)) = E(p)$, so we have

$$\widetilde{\mathcal{P}}(r \cup t) = \mathcal{P}(E(r \cup t)) = \mathcal{P}(\tilde{r} \cup E(p)).$$

On the other hand, Proposition 9 says that $\tilde{r} = E(\tilde{r})$, and Proposition 8 implies the following:

$$\mathcal{P}(\tilde{r} \cup E(p)) = \mathcal{P}(E(\tilde{r}) \cup E(p)) = \mathcal{P}(E(\tilde{r} \cup p)) = \widetilde{\mathcal{P}}(\tilde{r} \cup p).$$

By chaining these equalities, we finally obtain $\widetilde{\mathcal{P}}(r \cup t) = \widetilde{\mathcal{P}}(\tilde{r} \cup p)$. □

In the next theorem we analyze the cost of dependency checking for new entries to the database.

Theorem 3. Let us consider a relation $r$ under the same conditions as in Definition 12 of r.b.f.f.d. and satisfying the r.b.f.f.d. $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$. Let us denote the fuzzy projection $\widetilde{\mathcal{P}}^{X}_{XY}(r)$ by $\tilde{r}$, and let us consider a new tuple $t$. Let us denote $XY(t)$ by $p$ and $E(p)$ by $\tilde{p}$. If the following condition is satisfied:

$$\forall u \in \tilde{r}:\ \text{if } X(\tilde{p}) \approx_{\alpha} X(u), \text{ then } Y(\tilde{p}) \cong_{\beta} Y(u) \text{ holds}, \qquad (23)$$

then, we have
(a) $\tilde{r} \cup p$ satisfies the r.b.f.f.d. $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$,
(b) $r \cup t$ satisfies the r.b.f.f.d. $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$.

Proof. To prove (a) we have to show that $\widetilde{\mathcal{P}}(\tilde{r} \cup p)$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$. In the proof of Theorem 2 we saw that

$$\widetilde{\mathcal{P}}(\tilde{r} \cup p) = \mathcal{P}(\tilde{r} \cup \tilde{p}).$$

Therefore $\widetilde{\mathcal{P}}(\tilde{r} \cup p)$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$ if and only if $\mathcal{P}(\tilde{r} \cup \tilde{p})$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$. Now, Theorem 1 implies that this is equivalent to $\tilde{r} \cup \tilde{p}$ satisfying $X \xrightarrow{(\alpha,\beta)} Y$. But $\tilde{p}$ is a single tuple (satisfying the level of granularity according to the hypothesis), and therefore $\tilde{r} \cup \tilde{p}$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$ if and only if $\tilde{r}$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$ and Eq. (23) holds. But $r$ satisfies $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$, and therefore $\tilde{r}$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$, and Eq. (23) holds according to the hypothesis. Then, we have proven that $\widetilde{\mathcal{P}}(\tilde{r} \cup p)$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$, and thus (a) is proven. Part (b) follows from Theorem 2. □

Remark. Eq. (23) says that we only have to compare the consequent values of the new entry with the consequent values of the projection $\widetilde{\mathcal{P}}^{X}_{XY}(r)$, whenever the antecedent values are weakly resemblant. The gain in computational effort is that there are fewer tuples in $\widetilde{\mathcal{P}}^{X}_{XY}(r)$ than in $r$. In any case, the testing algorithm is $O(n)$, with $n$ the number of tuples in the relation. The best case is given when using Kronecker's delta as antecedent resemblance relation and $\alpha = 1$. In that case, the condition $X(\tilde{p}) \approx_{\alpha} X(u)$ becomes $X(\tilde{p}) = X(u)$, and thus $u$ is unique or does not exist (because of the IR-S restriction). Then, the algorithm to test the dependency should stop as soon as such a $u$ is found, i.e., it would be $O(n)$ in the worst case.

Example 8. Let us consider the relation given in Eq. (7), with the fuzzy projection given in Eq. (20). Let us suppose $R_{Height} = \delta$, and let us consider a new entry to the database given by the following tuple:

$$t = (z_9, 186, 88).$$

Then, following the same notation as in Theorem 3, we have $p = HW(t) = (186, 88)$ and $\tilde{p} = E(p) = (\text{tall}, 88)$, so that if we consider the fuzzy projection given in Eq. (20), the restriction stated in Eq. (23) becomes the following:

$$88 \cong_{\beta} (90 \vee \text{`about 85'}) \text{ must hold},$$

which is equivalent to saying that $88 \vee 90 \vee$ `about 85' satisfies the level of granularity. If such a restriction holds, then we proceed to construct the following relation:

$\widetilde{\mathcal{P}}^{H}_{HW}(\widetilde{\mathcal{P}}^{H}_{HW}(r) \cup p)$ =

    Height         Weight
    tall           88 ∨ 90 ∨ about 85
    176            normal
    rather tall    rather heavy
    160            65 ∨ 66
    medium         72
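The best case described in the remark ($R_{Height} = \delta$, $\alpha = 1$) can be sketched as a dictionary lookup on the stored projection. This is a minimal illustration under assumptions: the projection is kept as a mapping from each extended height to the list of operands of its fuzzy union, labels are represented by their kernels only, and `strongly_resembles` is a hypothetical caller-supplied predicate standing in for the $\beta$-test of Eq. (23), which is not implemented here.

```python
HEIGHT_KERNELS = {"medium": (170, 175), "tall": (185, 187), "rather tall": (190, 195)}
WEIGHT_KERNELS = {"normal": (78, 81), "about 85": (84, 86), "rather heavy": (95, 100)}


def extend_value(value, kernels):
    """Replace a crisp number by the label whose kernel contains it, if any."""
    if isinstance(value, str):
        return value
    for label, (low, high) in kernels.items():
        if low <= value <= high:
            return label
    return value


def try_insert(projection, height, weight, strongly_resembles):
    """Check Eq. (23) for a new (height, weight) entry, then merge it.

    projection: dict mapping an extended height to the operands whose fuzzy
    union is the stored weight. Returns False, leaving the projection
    unchanged, when the predicate rejects the new consequent value.
    """
    key = extend_value(height, HEIGHT_KERNELS)
    new = extend_value(weight, WEIGHT_KERNELS)
    operands = projection.get(key)  # at most one stored tuple u can match
    if operands is None:
        projection[key] = [new]     # no matching antecedent: nothing to compare
        return True
    if not strongly_resembles(new, operands):
        return False
    if new not in operands:
        operands.append(new)
    return True


# The projection of Eq. (20), with unions kept as lists of operands.
projection = {"tall": [90, "about 85"], 176: ["normal"],
              "rather tall": ["rather heavy"], 160: [65, 66], "medium": [72]}
# Placeholder predicate that accepts everything, for illustration only.
accepted = try_insert(projection, 186, 88, lambda new, operands: True)
```

After the call, the entry for `tall` holds the operands 90, `about 85` and 88, as in the relation of Example 8. Using a hash-based mapping makes the lookup independent of the number of stored rules, which is one possible way of realizing the early stop mentioned in the remark.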


Let us point out that the value $z_9$ will be stored in another projection, as we shall see in the final section.

Let us now see the counterpart of Armstrong's axioms [1]. Let us point out that we proved in [10] that Armstrong's axioms are satisfied for Definition 8 of f.f.d. This fact will be used in this section. We shall consider that $\alpha, \beta, \gamma$ are the associated thresholds for $X, Y, Z$, respectively.

Reflexivity axiom: If $Y \subseteq X \subseteq REL$, then every relation satisfies the $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$ dependency.

Augmentation axiom: If a relation satisfies the $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$ dependency, and $Z \subseteq REL$, then it also satisfies the $XZ \xrightarrow{(\alpha\gamma,\beta\gamma)} YZ$ dependency.

Transitivity axiom: If a relation satisfies the $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$ and $Y \overset{(\beta,\gamma)}{\rightsquigarrow} Z$ dependencies, then it also satisfies the $X \overset{(\alpha,\gamma)}{\rightsquigarrow} Z$ dependency.

The proof of reflexivity and augmentation can be done in a similar way to the proof in [10]. Let us show the transitivity property. We are going to prove the most general case, considering that $X$, $Y$, and $Z$ share common attributes. We suppose that $r$ satisfies $X \overset{(\alpha,\beta)}{\rightsquigarrow} Y$ and $Y \overset{(\beta,\gamma)}{\rightsquigarrow} Z$, and we must prove that $r$ satisfies $X \overset{(\alpha,\gamma)}{\rightsquigarrow} Z$. So, we must prove that $\widetilde{\mathcal{P}}^{X}_{XZ}(r)$ satisfies $X \xrightarrow{(\alpha,\gamma)} Z$, or equivalently, by applying Theorem 1, show that $E_{XZ}(r)$ satisfies $X \xrightarrow{(\alpha,\gamma)} Z$. Let us consider the relation $s \triangleq E_{XYZ}(r)$. If we take into account Proposition 7, then we have

$$s|_{XZ} = E_{XZ}(r)|_{XZ},$$

so that $E_{XZ}(r)$ satisfies $X \xrightarrow{(\alpha,\gamma)} Z$ if and only if $s$ satisfies $X \xrightarrow{(\alpha,\gamma)} Z$. We are going to demonstrate this: according to the hypothesis, we know that $E_{XY}(r)$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$, and Proposition 7 implies that $s|_{XY} = E_{XY}(r)|_{XY}$, so that $s$ also satisfies $X \xrightarrow{(\alpha,\beta)} Y$. Following the same reasoning, we find that $s$ satisfies $Y \xrightarrow{(\beta,\gamma)} Z$. Now, the transitivity axiom is sound for the definition of f.f.d. (see [10]), and thus $s$ satisfies $X \xrightarrow{(\alpha,\gamma)} Z$.

Finally, Armstrong's axioms are complete for Definition 12 of dependency, i.e., if a relation satisfies a set $F$ of r.b.f.f.d., then any other r.b.f.f.d. that it satisfies can be deduced by applying only the reflexivity, augmentation and transitivity properties. The proof is a simple extension of the classical case, provided that we impose the following restriction:

$$\forall \text{ domain } D \text{ with associated threshold } \theta \text{ and resemblance } R,\ \exists x, y \in D \text{ such that } x \not\approx_{\theta} y \iff R(x, y) < \theta.$$

This guarantees that we can construct a relation with two tuples with non-resemblant antecedent values, which therefore violate the precondition of fuzzy dependency. The rest of the proof is the same as in the classical case (see for instance [28]).


4. Fuzzy loss less decompositions 4.1. Fuzzy decomposition of a crisp relation 

We have introduced the fuzzy projection operator $P$ in order to extract the rules explaining the dependency between two sets of attributes $X, Y$, thereby constructing the relation $r_2 = P^{X}_{XY}(r)$. Now we ask what happens to the information stored in the remaining attributes $Z$ of REL. We must project their values into another relation $r_1$ with scheme $XZ$, so that a fuzzy version of Eq. (3) can be proven: we shall prove that we can recover almost the same information we had in $r$, in the sense that we recover exactly the same antecedent values, while the consequent ones may now be fuzzier. This approach implies a loss of precision that the designer must accept. Whether he accepts it depends on the type of application he is faced with.

For instance, it is usual in manufacturing to make a product and then test it (usually with several tests) in order to see whether it fits the requirements. If so, the product is sold as type A. Otherwise, it is sold as a lower quality product of type B, C, etc., depending on the results of the tests. This policy is cheaper than having several distinct production lines: one for type A products, another for type B products, and so on. For instance, when we see a chip with 233 MHz (type B) and another one with 266 MHz (type A), it is usual that both were built on the same line, but only the second one passed all the tests.

The manufacturer should have a database with information about each product it builds. It need not store the exact test results, because an exact result cannot be guaranteed with physical devices. Thus, this is a domain where our approach may be adopted in order to represent this information. The initial database (data) stores an identification number of the product (ID), some information about it (which we include in a set of attributes called Z), the results of the tests (say attributes $T_1, T_2, T_3$), and the assigned type (attribute Type):

data =
    ID    Z    T1   T2   T3    Type
    id1   z1   20   y    190   A
    id2   z2   21   y    193   A
    id3   z3   31   n    220   B

We are interested in finding a dependency between the type and the tests. Following the idea of replacing the exact test results by wider values (in this case only for the consequent values, i.e., for the test results), and fixing $\alpha = 1$ and Kronecker's delta for the Type attribute (the antecedent one), we could find a fuzzy dependency between Type and the test results. We should then isolate it in a separate relation, obtaining the following decomposition:

    ID    Z    Type
    id1   z1   A
    id2   z2   A
    id3   z3   B

    Type   T1     T2   T3
    A      low    y    medium
    B      high   n    high
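The decomposition above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: relations are stored as lists of dictionaries, and the helper name `join_on_type` is hypothetical. Since $\alpha = 1$ and Kronecker's delta are used for Type, matching on the antecedent reduces to plain equality.

```python
# Toy version of the decomposition of `data`; values are taken from the
# example, while the storage layout and function name are assumptions.

# r1: one tuple per product, keeping only ID, Z and Type.
r1 = [
    {"ID": "id1", "Z": "z1", "Type": "A"},
    {"ID": "id2", "Z": "z2", "Type": "A"},
    {"ID": "id3", "Z": "z3", "Type": "B"},
]

# r2: one rule per Type, with linguistic labels for the test results.
r2 = [
    {"Type": "A", "T1": "low", "T2": "y", "T3": "medium"},
    {"Type": "B", "T1": "high", "T2": "n", "T3": "high"},
]


def join_on_type(left, right):
    """Attach to each tuple of `left` the rule of `right` with the same Type.

    With Kronecker's delta on Type, two values resemble each other only
    when they are equal, so a dictionary lookup is enough here.
    """
    rules = {rule["Type"]: rule for rule in right}
    return [{**t, **rules[t["Type"]]} for t in left if t["Type"] in rules]


recovered = join_on_type(r1, r2)
```

For id1 the join yields the labels low, y, medium instead of the original readings 20, y, 190, which is the loss of precision discussed in the text.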

When we want to recover the test results for a given product, say id1, we join both relations through the Type attribute and obtain (id1, A, z1, low, y, medium). This tuple differs from the one we had in data, (id1, z1, A, 20, y, 190), because the test results are now fuzzier. Again, it is important to emphasize that the original values in data were not exact measures, because completely exact physical devices do not exist.

It may be the case that there is no fuzzy dependency between Type and Tests. This happens, for instance, if the second tuple in relation data had n as the value for $T_2$, and $R(y, n) = 0$. In this case, however, we may find the converse fuzzy dependency, i.e., between Tests and Type. If such a dependency exists, then the decomposition would be the following:

    ID    Z    T1   T2   T3
    id1   z1   20   y    190
    id2   z2   21   n    193
    id3   z3   31   n    220

r2 =
    T1     T2   T3       Type
    low    y    medium   A
    low    n    medium   A
    high   n    high     B

We have isolated the attributes appearing in the fuzzy dependency in a relation although, as we have only three tuples, we are storing more values than before. Now, we can join the first tuple of the first projection with the first tuple of the second projection, because the values of the former are included in those of the latter, and we obtain the tuple (id1, z1, 20, y, 190, A), which is the same tuple we had in data. This join process is a kind of fuzzy join, which will be introduced in this section. On the other hand, we could have decided to replace the data values in the first projection by the linguistic labels. This would yield the following relation:

    ID    Z    T1     T2   T3
    id1   z1   low    y    medium
    id2   z2   low    n    medium
    id3   z3   high   n    high

and now we could join the first tuple of this relation with the first tuple of $r_2$ because they are the same, obtaining the tuple (id1, z1, low, y, medium, A), which is fuzzier than the tuple we had in data.
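The two alternatives for the first projection can be compared with a small Python sketch. The numeric intervals attached to the labels are assumptions made only for this illustration (the example does not fix them), and the names `KERNELS`, `included` and `inclusion_join` are hypothetical.

```python
# Assumed intervals for the linguistic labels (not given in the example).
KERNELS = {
    "T1": {"low": (15, 25), "high": (26, 40)},
    "T3": {"medium": (180, 200), "high": (201, 240)},
}
TESTS = ("T1", "T2", "T3")


def included(attr, value, stored):
    """True if `value` is included in the value `stored` in r2 for `attr`."""
    if value == stored:  # same crisp value or same label
        return True
    interval = KERNELS.get(attr, {}).get(stored)
    if interval is None or isinstance(value, str):
        return False
    low, high = interval
    return low <= value <= high


def inclusion_join(left, right):
    """Join tuples of `left` with the rules of `right` that include them."""
    return [
        {**t, "Type": u["Type"]}
        for t in left
        for u in right
        if all(included(a, t[a], u[a]) for a in TESTS)
    ]


r2 = [
    {"T1": "low", "T2": "y", "T3": "medium", "Type": "A"},
    {"T1": "low", "T2": "n", "T3": "medium", "Type": "A"},
    {"T1": "high", "T2": "n", "T3": "high", "Type": "B"},
]

# First alternative: the projection keeps the measured values.
crisp = [
    {"ID": "id1", "Z": "z1", "T1": 20, "T2": "y", "T3": 190},
    {"ID": "id2", "Z": "z2", "T1": 21, "T2": "n", "T3": 193},
    {"ID": "id3", "Z": "z3", "T1": 31, "T2": "n", "T3": 220},
]

# Second alternative: the projection stores the labels instead.
labelled = [
    {"ID": "id1", "Z": "z1", "T1": "low", "T2": "y", "T3": "medium"},
    {"ID": "id2", "Z": "z2", "T1": "low", "T2": "n", "T3": "medium"},
    {"ID": "id3", "Z": "z3", "T1": "high", "T2": "n", "T3": "high"},
]

crisp_join = inclusion_join(crisp, r2)
label_join = inclusion_join(labelled, r2)
```

Under these assumed intervals both joins assign the same Type to each product; they differ only in whether the test attributes carry the measured values or the labels.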


In general, when we have a relation with scheme $XYZ$ satisfying a fuzzy dependency between $X$ and $Y$, we propose two alternative ways to construct the projection over $XZ$: we may use $P_{XZ}(r)$ or $\tilde{P}_{XZ}(r)$. Formally, we have:

Definition 13. Let us consider a relation $r$ under the same conditions as in Definition 12 of r.b.f.f.d., with REL $= XYZ$. Let us suppose that $r$ satisfies the $(\alpha,\beta)$ r.b.f.f.d. $I$ between $X$ and $Y$. Then, the decomposition of $r$ with respect to $I$ is given by the following projections:

$$r_1 = P_{XZ}(r), \qquad r_2 = P^{X}_{XY}(r)$$

or alternatively by

$$r_1 = \tilde{P}_{XZ}(r), \qquad r_2 = P^{X}_{XY}(r).$$

Example 9. Let us consider the relation $r$ in Eq. (7). The fuzzy projection $r_2 = P^{H}_{HW}(r)$ was given in Eq. (20). Now, the other projection is given by one of these relations:

$r_1 = P_{ZH}(r) =$
    Z    Height
    z1   185
    z2   185
    z3   176
    z4   192
    z4   194
    z6   160
    z7   160
    z8   174

or alternatively

$r_1 = \tilde{P}_{ZH}(r) =$
    Z    Height
    z1   tall
    z2   tall
    z3   176
    z4   rather tall
    z6   160
    z7   160
    z8   medium                                   (24)

Our objective is to show that the decomposition given in Definition 13 is such that a fuzzy natural join of $r_1$ and $r_2$ recovers (in fuzzy terms) the original data appearing in $r$, i.e., it is a fuzzy lossless decomposition. We shall therefore be able to prove the fuzzy counterpart of Eq. (3). Thus, we proceed to introduce the notion of fuzzy natural join.

4.2. Fuzzy natural join

In the previous sections, we worked with a relation $r$ and we projected the tuples into $r_2 = P^{X}_{XY}(r)$ in such a way that the definition of fuzzy dependency guarantees that $r_2$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$. We can think of $r_2$ as a relation whose tuples represent rules with fuzzy antecedents ($X$) and consequents ($Y$), and the dependency $X \xrightarrow{(\alpha,\beta)} Y$ represents a consistency condition for these rules. The inputs to these rules are now given by the tuples in the other projection $r_1 = P_{XZ}(r)$ (or alternatively $r_1 = \tilde{P}_{XZ}(r)$). We need a criterion to determine which tuples of $r_1$ join, with respect to $X$, with the tuples in $r_2$, i.e., we need a fuzzy criterion of natural join (see [7,10] for an approach in fuzzy databases).

Let us suppose we have a tuple in $r_2$ of the kind $u$ = ('between 184 and 187 cm', 'heavy') and a tuple $t \in r_1$ with the information of Peter's height, $t$ = ('Peter', '185 cm'). Then, we can conclude that Peter's weight is 'heavy', i.e., the join of the tuples $t$ and $u$ results in ('Peter', '185 cm', 'heavy'). It is worth mentioning that, due to the IR-S restriction, each tuple in $r_1$ joins with exactly one tuple in $r_2$, so that we obtain an efficient way of computing the fuzzy join.

Formally, we define the fuzzy natural join between $r_1$ (with scheme $XZ$) and $r_2$ (with scheme $XY$) as follows:

$$r_1 \bowtie_{X} r_2 = P_{r_1.(XZ) \cup r_2.Y}\bigl(\sigma_{r_1.X \subseteq r_2.X}(r_1 \times r_2)\bigr),$$

where $\sigma_{r_1.X \subseteq r_2.X}(r_1 \times r_2)$ selects those tuples $t \in r_1$, $u \in r_2$ such that $X_i(t) \subseteq X_i(u)$ $\forall X_i \in X$, where $\subseteq$ stands for fuzzy inclusion. So, this join operator is not symmetric. When we use $r_1 = P_{XZ}(r)$, we have the following possibilities:

$$\begin{array}{l} X_i(t) \text{ is a crisp value and } X_i(t) \in \ker(X_i(u)), \\ X_i(t) \text{ is a crisp value and } X_i(t) = X_i(u). \end{array} \qquad (25)$$

When we use $r_1 = \tilde{P}_{XZ}(r)$, we have the following possibilities:

$$\begin{array}{l} X_i(t) \text{ is a crisp value equal to } X_i(u), \\ X_i(t) \text{ is a fuzzy value equal to } X_i(u). \end{array} \qquad (26)$$
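A minimal Python sketch of the join just defined, under stated assumptions: fuzzy values are modelled as trapezoids whose kernel is the interval of full membership, and fuzzy inclusion is reduced to the cases listed in Eqs. (25) and (26). The class `Trapezoid`, the functions `matches` and `fuzzy_natural_join`, the support limits 183 and 188, and the second rule of `r2` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Fuzzy value with support [a, d] and kernel [b, c] (membership 1)."""
    name: str
    a: float
    b: float
    c: float
    d: float

    def kernel_contains(self, x):
        return self.b <= x <= self.c


def matches(t_value, u_value):
    """Antecedent test for one attribute X_i, following Eqs. (25)-(26).

    A crisp value matches a fuzzy one if it lies in its kernel; otherwise
    the two values (both crisp or both fuzzy) must be equal.
    """
    if isinstance(u_value, Trapezoid) and not isinstance(t_value, Trapezoid):
        return u_value.kernel_contains(t_value)
    return t_value == u_value


def fuzzy_natural_join(r1, r2, x_attrs):
    """Join r1 (scheme XZ) with r2 (scheme XY); the roles are not symmetric."""
    result = []
    for t in r1:
        for u in r2:
            if all(matches(t[a], u[a]) for a in x_attrs):
                # Keep XZ from r1 and only the consequent attributes from r2.
                y_part = {k: v for k, v in u.items() if k not in x_attrs}
                result.append({**t, **y_part})
    return result


between = Trapezoid("between 184 and 187 cm", 183, 184, 187, 188)
r1 = [{"Name": "Peter", "Height": 185}]
r2 = [
    {"Height": between, "Weight": "heavy"},
    {"Height": 160, "Weight": "65 or 66"},
]

joined = fuzzy_natural_join(r1, r2, ["Height"])
swapped = fuzzy_natural_join(r2, r1, ["Height"])
```

Here `joined` contains Peter's tuple with his crisp height and the consequent 'heavy', while `swapped` is empty, which illustrates why the operator is not symmetric: a fuzzy antecedent is never included in a crisp one.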

Once we have introduced the notion of fuzzy natural join, we proceed to study the lossless decomposition issue in Theorem 4, using $r_1 = P_{XZ}(r)$. Theorem 5 states the same result but using $r_1 = \tilde{P}_{XZ}(r)$. The proof of the second theorem is exactly the same as that of Theorem 4, but taking into account Eq. (26).

Theorem 4. Let us consider a relation $r$ with an extended scheme

$$S = (XYZ, (\alpha, \beta, \gamma), (R_X, R_Y, R_Z), (L_X, L_Y))$$

satisfying the $(\alpha,\beta)$ r.b.f.f.d. $I$ between $X$ and $Y$. Then, the decomposition of $r$ with respect to $I$ given by

$$r_1 = P_{XZ}(r), \qquad r_2 = P^{X}_{XY}(r)$$

satisfies the following properties:
(i) $\forall t \in r \;\exists \bar{t} \in r_1 \bowtie_{X} r_2$ such that $XZ(t) = XZ(\bar{t})$, $Y(t) \subseteq Y(\bar{t})$,
(ii) $\forall \bar{t} \in (r_1 \bowtie_{X} r_2) \;\exists t \in r$ such that $XZ(t) = XZ(\bar{t})$, $Y(t) \subseteq Y(\bar{t})$,
(iii) The relation $r_1 \bowtie_{X} r_2$ satisfies the $(\alpha,\beta)$ r.b.f.f.d. $I$, and thus the $\beta$-level of granularity in $Y$.

Proof. Let us denote by $s$ the relation $E_{XY}(r_1 \bowtie_{X} r_2)$. Then, it is easy to check the following property:

$$\forall v \in s \;\exists u \in r_2 \text{ such that } XY(v) = XY(u), \qquad \forall u \in r_2 \;\exists v \in s \text{ such that } XY(v) = XY(u).$$

Since $r_2$ satisfies the dependency $X \xrightarrow{(\alpha,\beta)} Y$, $s$ also satisfies $X \xrightarrow{(\alpha,\beta)} Y$, and Theorem 1 implies that $P^{X}_{XY}(s)$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$. But this is equivalent to saying that $P^{X}_{XY}(r_1 \bowtie_{X} r_2)$ satisfies $X \xrightarrow{(\alpha,\beta)} Y$, which implies, by definition of r.b.f.f.d., that $r_1 \bowtie_{X} r_2$ satisfies the $(\alpha,\beta)$ r.b.f.f.d. $I$. On the other hand, (i) and (ii) are straightforward by applying the definitions of fuzzy projection, fuzzy join, and Eq. (25).

Remark. The first and second conditions of the previous theorem ensure that we recover almost the same information (without spurious tuples) as we had in $r$, i.e., if we had a tuple $t \in r$, we do not recover exactly the same values as in the classical case (Eq. (3)), but we can guarantee that $\exists \bar{t} \in r_1 \bowtie_{X} r_2$ such that $XZ(t) = XZ(\bar{t})$, $Y(t) \subseteq Y(\bar{t})$ (furthermore, all tuples in the join are of this type). On the other hand, the third condition guarantees the quality of such information in the following sense: the consequent values of each tuple $\bar{t}$ satisfy the same level of granularity ($\beta$) as the level of resemblance we allowed in $r.Y$, i.e., we do not obtain values that are too fuzzy in the attributes belonging to $Y$.

Theorem 5. Let us consider a relation $r$ under the same conditions as in Theorem 4. Then the decomposition of $r$ with respect to $I$ given by

$$r_1 = \tilde{P}_{XZ}(r), \qquad r_2 = P^{X}_{XY}(r)$$

satisfies the following properties:
· $\forall t \in r \;\exists \bar{t} \in r_1 \bowtie_{X} r_2$ such that $Z(t) = Z(\bar{t})$, $XY(t) \subseteq XY(\bar{t})$.
· $\forall \bar{t} \in (r_1 \bowtie_{X} r_2) \;\exists t \in r$ such that $Z(t) = Z(\bar{t})$, $XY(t) \subseteq XY(\bar{t})$.
· The relation $r_1 \bowtie_{X} r_2$ satisfies the $\beta$-level of granularity in $Y$.

Example 10. Let us consider the relation given in Eq. (7). Then, the projections are given in Eqs. (20) ($r_2 = P^{H}_{HW}(r)$) and (24) ($r_1 = P_{ZH}(r)$ or alternatively $r_1 = \tilde{P}_{ZH}(r)$). The fuzzy join of both projections is given by

$P_{ZH}(r) \bowtie_{H} P^{H}_{HW}(r) =$
    Z    Height   Weight
    z1   185      90 ∨ about 85
    z2   185      90 ∨ about 85
    z3   176      normal
    z4   192      rather heavy
    z4   194      rather heavy
    z6   160      65 ∨ 66
    z7   160      65 ∨ 66
    z8   174      72

or alternatively by

$\tilde{P}_{ZH}(r) \bowtie_{H} P^{H}_{HW}(r) =$
    Z    Height        Weight
    z1   tall          90 ∨ about 85
    z2   tall          90 ∨ about 85
    z3   176           normal
    z4   rather tall   rather heavy
    z6   160           65 ∨ 66
    z7   160           65 ∨ 66
    z8   174           72

If we compare these relations with the original relation given in Eq. (7), we see that we indeed obtain the results stated in Theorems 4 and 5, respectively. The difference is that we have one tuple fewer in the second case, because the fourth and fifth tuples in $r$ are covered by a single tuple ($z_4$, rather tall, rather heavy). It is important to emphasize that if the expert chooses this second decomposition of $r$, then Theorem 5 guarantees that he will obtain consequent values satisfying the granularity level. On the other hand, for the antecedent attributes, as there is no merging of different labels, he is guaranteed to obtain antecedent values satisfying the same granularity level he allowed in the sets of linguistic labels.

5. Conclusions

In this work we have introduced a smoother version of functional dependency in crisp relational databases. This allows us to detect relationships between attributes that are not detected by classical approaches, and to decompose a relation $r$ satisfying such a fuzzy dependency into two new relations, obtained as projections of $r$ with fewer tuples, thus avoiding some redundancy and saving computing time. In this decomposition process some information is lost in the sense that, following Theorem 4, when joining the projections we recover the same antecedent values we had in the original relation but fuzzier consequent values. In order to quantify the amount of information lost in a consequent attribute, we use a measure of granularity and a threshold $\beta$ which is fixed a priori by the expert. He can be sure that, roughly speaking, the values obtained after joining the projections are not fuzzier than $\beta$. It is necessary to allow the storage of fuzzy information, but this can be accomplished, as we showed in [21,22], using classical relational database systems. An open (data-mining) problem to solve in the future is the design of efficient algorithms to detect fuzzy dependencies without the help of an expert.

Acknowledgements

We thank the anonymous referees for their valuable comments and suggestions, which helped us to correct some mistakes and gave us the chance to improve the presentation of this work.

References

[1] W.W. Armstrong, Dependency structures of data base relationships, in: Proceedings of the IFIP Congress, North-Holland, Amsterdam, 1974, pp. 580–583.
[2] P. Bosc, D. Dubois, H. Prade, More results on functional dependencies and quotient operators in fuzzy databases, Technical Report IRIT/96-10-R, Institute de Recherche en Informatique de Toulouse, March 1996.
[3] B.P. Buckles, F.E. Petry, Extending the fuzzy database with fuzzy numbers, Information Sciences 34 (1984) 145–155.
[4] G. Chen, E.E. Kerre, J. Vandenbulcke, A computational algorithm for the ffd transitivity closure and a complete axiomatization of fuzzy functional dependence (ffd), International Journal of Intelligent Systems 9 (5) (1994) 421–440.
[5] E.F. Codd, Normalized data base structure: a brief tutorial, in: ACM SIGFIDET Workshop on Data Description, Access, and Control, San Diego, 1971.
[6] E.F. Codd, Further Normalization of the Data Base Relational Model, vol. 6, Courant Computer Science Symposia Series, Prentice-Hall, Englewood Cliffs, NJ, 1972.
[7] J.C. Cubero, The design of a fuzzy database, in: R. De Caluwe (Ed.), Lectures on Fuzziness and Data Bases, Third Series, Gent, Belgium, 1995.
[8] J.C. Cubero, J.M. Medina, O. Pons, M.A. Vila, Rules discovery in fuzzy relational databases, in: Conference of the North American Fuzzy Information Processing Society, NAFIPS'95, Maryland (USA), IEEE Computer Society Press, 1995, pp. 414–419.
[9] J.C. Cubero, J.M. Medina, O. Pons, M.A. Vila, Extensions of a resemblance relation, Fuzzy Sets and Systems 86 (2) (1997) 197–212.
[10] J.C. Cubero, J.M. Medina, O. Pons, M.A. Vila, Fuzzy loss less decompositions in databases, Fuzzy Sets and Systems 97 (2) (1998) 145–167.
[11] J.C. Cubero, J.M. Medina, M.A. Vila, Influence of granularity level in fuzzy functional dependencies, in: M. Clarke, R. Kruse, S. Moral (Eds.), Symbolic and Quantitative Approaches to Reasoning and Uncertainty, Lecture Notes in Computer Science, vol. 747, Springer, Berlin, 1993, pp. 73–78.
[12] J.C. Cubero, M.A. Vila, A new definition of fuzzy functional dependencies in fuzzy relational databases, International Journal of Intelligent Systems 9 (5) (1994) 441–448.
[13] D. Dubois, H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1979.
[14] D. Dubois, H. Prade, Possibility Theory, Plenum Press, New York, 1988.
[15] D. Dubois, H. Prade, Generalized dependencies in fuzzy databases, in: International Conference on Information Processing and Management of Uncertainty in Knowledge Based Systems, IPMU'92, Palma de Mallorca, Spain, 1992.
[16] H. Gallaire, J. Minker, J.M. Nicolas, Logic and databases: a deductive approach, ACM Computing Surveys 16 (2) (1984) 153–185.
[17] T. Imielinski, W. Lipski, Incomplete information in relational databases, Journal of ACM 31 (4) (1984).
[18] D. Laurent, N. Spyratos, Partition semantics for incomplete information in relational databases, in: SIGMOD, ACM Press, New York, 1988, pp. 66–73.
[19] W. Lipski Jr., On semantic issues connected with incomplete information databases, ACM Transactions on Database Systems 4 (3) (1979) 262–296.
[20] K.C. Liu, R. Sunderraman, A generalized relational model for indefinite and maybe information, IEEE Transactions on Knowledge and Data Engineering 3 (1) (1991) 65–76.
[21] J.M. Medina, J.C. Cubero, O. Pons, M.A. Vila, Towards the implementation of a generalized fuzzy relational database model, Fuzzy Sets and Systems 75 (1995) 273–289.
[22] J.M. Medina, O. Pons, M.A. Vila, GEFRED: a generalized model for fuzzy relational databases, Information Sciences 76 (1994) 87–109.
[23] A. Ola, G. Ozsoyoglu, Incomplete relational database models based on intervals, IEEE Transactions on Knowledge and Data Engineering 5 (2) (1993) 293–308.
[24] O. Pons, J.C. Cubero, J.M. Medina, M.A. Vila, Dealing with disjunctive and missing information in logic fuzzy databases, International Journal on Uncertainty and Fuzziness in Knowledge Based Systems 4 (2) (1996) 177–201.
[25] H. Prade, C. Testemale, Generalizing database relational algebra for the treatment of incomplete/uncertain information and vague queries, Information Sciences 34 (1984) 115–143.
[26] K. Raju, A. Majumdar, Fuzzy functional dependencies and loss less join decomposition on fuzzy relational database systems, ACM Transactions on Database Systems 13 (2) (1988) 129–166.
[27] K. Tanaka, M. Yoshikawa, K. Ishihara, Schema design, views and incomplete information in object-oriented databases, Journal of Information Processing 12 (3) (1989) 239–250.
[28] J.D. Ullman, Principles of Database and Knowledge-Base Systems, vol. I, Computer Science Press, Rockville, MD, 1988.
[29] M. Umano, Freedom-0: A fuzzy databases system, in: M.M. Gupta, E. Sanchez (Eds.), Fuzzy Information and Decision Processes, North Holland, Amsterdam, 1982, pp. 339–347.
[30] Y. Vassiliou, Functional dependencies and incomplete information, in: Proceedings Sixth International Conference on VLDB, 1980, pp. 260–269.
[31] M.A. Vila, J.C. Cubero, J.M. Medina, O. Pons, A logic approach to fuzzy relational databases, International Journal of Intelligent Systems 9 (5) (1994) 449–461.
[32] M.A. Vila, J.C. Cubero, J.M. Medina, O. Pons, A conceptual approach for dealing with imprecision and uncertainty in object-based data models, International Journal of Intelligent Systems 11 (1996) 791–806.
[33] L. Weiyi, The reduction of the fuzzy data domain and fuzzy consistent join, Fuzzy Sets and Systems 50 (1992) 89–96.
[34] L. Zadeh, Knowledge representation in fuzzy logic, IEEE Transactions on Knowledge and Data Engineering 1 (1) (1989) 89–100.
[35] R. Zicari, Incomplete information in object-oriented databases, Sigmod 19 (3) (1990) 5–16.