Technical Report 29-01, December 2001. Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, ITALY. URL = http://www.dis.uniroma1.it/pub/AI/papers/tr-29-01.pdf

Temporal Probabilistic Object Bases

Veronica Biazzo, Rosalba Giugno, Thomas Lukasiewicz, V.S. Subrahmanian

Abstract

There are numerous applications where we know that a certain event occurred during some time period, but we do not know exactly when that event occurred. Dyreson and Snodgrass have shown how this kind of temporal uncertainty can be handled in relational databases. In this paper, we propose two data models to handle temporal indeterminacy in object bases. The first model, which we call the explicit model, provides an extension of the relational algebra that explicitly considers all possibilities. This makes defining algebraic operations easy, but makes their implementation quite inefficient. The second model, which we call the implicit model, overcomes these deficiencies by proposing the intelligent use of constraints. This causes the model to be succinct. We also propose an implicit algebra on the implicit representation. We show that each implicit algebra operation precisely captures its explicit counterpart.

1 Introduction

There are numerous applications involving temporal indeterminacy. For instance, consider a commercial package delivery company (examples of companies in this broad class include UPS, FedEx, DHL, and many others). Such a company has detailed statistical information on how long packages take to get from one zip code to another, and often even more specific information (e.g., how long it takes for a package to get from one street address to another). A company expecting deliveries would like to have some statistical information about when the deliveries will arrive. An answer of the form “There is a 10-20% probability of the package being delivered between 9 am and 1 pm, and an 80-90% probability of it being delivered between 1 pm and 5 pm” is far more helpful to the company’s decision-making processes than the bland answer given today (“It will be delivered sometime today between 9 am and 5 pm”).

Temporal indeterminacy also arises in many other situations. Dyreson and Snodgrass [5] have identified numerous other applications where temporal indeterminacy is important. For example, radio-carbon dating efforts in archaeology are temporally indeterminate: a historical relic may be dated as “sometime between 500 and 400 BC.” Likewise, time-series prediction programs are also uncertain about when certain events will occur. There are literally hundreds of stock market prediction programs containing models of when stocks are expected to reach certain prices. When the results of such programs are stored in databases and subjected to querying, the need to handle temporal indeterminacy is even more acute.

In this paper, we propose, for the first time, a formal theoretical foundation for object bases containing temporal indeterminacy. As probabilities are the best known method for handling uncertain information, our model for indeterminacy (like that of Dyreson and Snodgrass [5]) is probabilistic. The organization and contributions of this paper are as follows.
Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, 95125 Catania, Italy. E-mail: {vbiazzo, giugno}@dmi.unict.it. Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Via Salaria 113, 00198 Roma, Italy. Institute for Advanced Computer Studies, Institute for Systems Research and Department of Computer Science, University of Maryland, College Park, Maryland 20742.

Figure 1: Package Example with probability assignment (class hierarchy over Package, Letter, Box, Tube, Priority, One_transfer, Express_saves, and Two_transfer)

In Section 2, we introduce some basic definitions in probability theory and temporal databases. An important definition introduced here is that of explicit values and implicit values; the latter are succinct representations of the former. In Section 3, we define the concept of a temporal probabilistic object base (TPOB for short). We define the important concepts of an explicit TPOB-instance and an implicit TPOB-instance, with the latter being a succinct representation of the former. In Section 4, we describe an explicit TPOB-algebra (e-algebra for short) that operates on explicit TPOB-instances. The advantage of the e-algebra is that its operations are relatively intuitive to define. However, it does not have an efficient implementation. In Section 5, we define an implicit TPOB-algebra (i-algebra for short) that operates on implicit TPOB-instances. We show that the i-algebra correctly implements the e-algebra (in other words, the answers produced by the i-algebra operations succinctly represent the answers produced by the corresponding e-algebra operations). As the i-algebra works on succinct representations, it has better computational properties. In Section 6, we compare our work with related work on temporal indeterminacy.

Section 7 contains directions for future work and concludes the paper.

The key contributions of this paper are: (1) the definition of implicit TPOB-instances, (2) the definition of the implicit algebra operations, and (3) the results stating that these implicit algebra operations are correct implementations of the explicit algebra operations. The importance of (3) cannot be overemphasized. For example, a simple statement such as “Package P will be delivered sometime between 9 am and 5 pm today” expands into 8 explicit statements if our chronon (i.e., the smallest temporal granularity used) is an hour, 480 explicit statements if our chronon is a minute, and 28,800 explicit statements if our chronon is a second. Thus, a single statement in the implicit algebra can capture huge amounts of explicit data in a succinct fashion. The ability to correctly manipulate this implicit data is critical to the efficiency of TPOB systems.

Our work builds directly on top of pioneering work by Dyreson and Snodgrass [5]. Our work differs from theirs in several ways. First, it applies to object bases, while theirs applies to relational databases. Second, we make no independence assumptions between events: the user’s query can explicitly encode her knowledge of the dependencies between events, if any. Third, our work introduces an algebra, while their work defines an SQL extension. Fourth, we present formal definitions of important notions like coherence and consistency and show that under appropriate assumptions, all our operations preserve coherence and consistency.
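The expansion counts quoted above follow from dividing the eight-hour delivery window by the chronon. A minimal Python sketch of that arithmetic is given below; the function name `explicit_statement_count` and the calendar date are illustrative assumptions, not part of the paper.

```python
from datetime import datetime, timedelta


def explicit_statement_count(start, end, chronon):
    """Number of chronon-sized steps in the half-open window [start, end)."""
    return (end - start) // chronon


# Hypothetical delivery day; only the 9 am to 5 pm window matters here.
window_start = datetime(2001, 12, 1, 9, 0)
window_end = datetime(2001, 12, 1, 17, 0)

counts = {
    "hour": explicit_statement_count(window_start, window_end, timedelta(hours=1)),
    "minute": explicit_statement_count(window_start, window_end, timedelta(minutes=1)),
    "second": explicit_statement_count(window_start, window_end, timedelta(seconds=1)),
}
print(counts)
```

The finer the chronon, the larger the explicit representation, while the implicit statement stays a single constraint.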


2 Basic Definitions

In this section, we recapitulate some basic definitions. In particular, we recall the notion of a calendar. We then define classical types and their values, and introduce probabilistic types and their explicit and implicit values. Finally, we describe the concept of a probabilistic strategy.

2.1 Calendars

We now recapitulate the concept of a calendar due to Kraus et al. [15]. A calendar consists of a linear temporal hierarchy of time units and a validity predicate specifying valid time points.

Definition 2.1 (time unit) A time unit consists of a name and a time-value set. For example, the time unit named […] has the time-value set […].

Definition 2.2 (linear temporal hierarchy) A linear temporal hierarchy is a finite set of distinct time units with a linear order $\sqsubseteq$ among them. For example, $\mathit{hour} \sqsubseteq \mathit{minute}$ is a linear temporal hierarchy.

Definition 2.3 (time point) Let $H = T_1 \sqsubseteq \cdots \sqsubseteq T_n$ be a linear temporal hierarchy. A time point over $H$ is a tuple $(t_1, \ldots, t_n)$, where each $t_i$ is a time-value in the time-value set of $T_i$. We use $<_H$ to denote the usual lexicographic order on all time points over $H$, which is defined by $(t_1, \ldots, t_n) <_H (t'_1, \ldots, t'_n)$ iff some $i \in \{1, \ldots, n\}$ exists such that $t_j = t'_j$ for all $j \in \{1, \ldots, i-1\}$ and $t_i < t'_i$. We use $\le_H$ to denote the reflexive closure of $<_H$. For example, […] is a time point over $H =$ […].

Definition 2.4 (calendar) A calendar $C$ consists of a linear temporal hierarchy $H$ and a validity predicate. The validity predicate specifies a non-empty set of valid time points over $H$. A calendar is finite if the set of all its valid time points is finite. In the rest of this paper, all calendars are assumed to be finite unless specified otherwise.

Intuitively, […] and […] are time points over $H =$ […]. The validity predicate may now characterize the former as valid and the latter as invalid. The reader interested in how to specify validity predicates may consult [15].
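Definition 2.3's lexicographic order and the validity predicate of Definition 2.4 can be sketched in Python as follows. The time-value sets (hours 0 to 23, minutes 0 to 59) and the business-hours validity predicate are hypothetical choices made only for illustration; the paper leaves both to the calendar designer.

```python
from itertools import product

# Hypothetical time units for the hierarchy hour ⊑ minute: (name, time-value set).
HIERARCHY = [("hour", range(24)), ("minute", range(60))]


def is_time_point(t):
    """A time point has one time-value per time unit, drawn from that unit's set."""
    return len(t) == len(HIERARCHY) and all(
        value in values for value, (_, values) in zip(t, HIERARCHY)
    )


def precedes(t1, t2):
    """Strict lexicographic order: the first differing component decides."""
    for a, b in zip(t1, t2):
        if a != b:
            return a < b
    return False  # equal time points are not strictly ordered


def is_valid(t):
    """Hypothetical validity predicate: only 9:00 to 16:59 counts as valid."""
    return is_time_point(t) and 9 <= t[0] < 17


# A finite calendar is characterized by its finite set of valid time points.
calendar = [t for t in product(*(values for _, values in HIERARCHY)) if is_valid(t)]
```

Python's built-in tuple comparison implements the same lexicographic order, so `precedes(t1, t2)` agrees with `t1 < t2` for tuples of equal length.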

2.2 Types and Values

This section introduces types, and values associated with types. It is divided into three parts: classical types and values, probabilistic types, and explicit and implicit values of probabilistic types.

2.2.1 Classical Types and Values

Every classical type $\tau$ is associated with a domain, denoted $\mathit{dom}(\tau)$, which specifies the set of values of $\tau$. In this paper, we assume that […]

[…] An explicit value of a probabilistic type $[A_1{:}\,\tau_1, \ldots, A_k{:}\,\tau_k]$ is of the form $[A_1{:}\,v_1, \ldots, A_k{:}\,v_k]$, where $v_1, \ldots, v_k$ are either values or explicit values of $\tau_1, \ldots, \tau_k$.

Example 2.11 An example of an explicit value of the atomic probabilistic type $[[\mathit{time}]]$, where $\mathit{time}$ is the calendar over the linear temporal hierarchy $\mathit{hour} \sqsubseteq \mathit{minute}$, is a set of two pairs, each consisting of a time point and a probability interval […]. An explicit value of the atomic probabilistic type $[[\mathit{string}]]$ is $\{(\text{New York}, [\ldots]), (\text{Washington}, [\ldots])\}$.

Let $A$ be an attribute and $v = \{(v_1, [l_1, u_1]), \ldots, (v_n, [l_n, u_n])\}$ be an explicit value of an atomic probabilistic type. Then, “$A{:}\,v$” intuitively says that “The probability that $A$ has the value $v_i$ lies in the interval $[l_i, u_i]$”. We assume that the events “$A{:}\,v_i$” are exhaustive and pairwise mutually exclusive, which implies the following notion of consistency for explicit values of probabilistic types.

Definition 2.12 (compactness and consistency) An explicit value $v = \{(v_1, [l_1, u_1]), \ldots, (v_n, [l_n, u_n])\}$ of an atomic probabilistic type is compact iff $v_1, \ldots, v_n$ are pairwise distinct. We say $v$ is consistent iff $v$ is compact and $\sum_{i=1}^{n} l_i \le 1 \le \sum_{i=1}^{n} u_i$. An explicit value $v$ of a probabilistic type is compact (resp., consistent) iff all contained explicit values of atomic probabilistic types are compact (resp., consistent).

2.2.4 Implicit Values of Probabilistic Types

We now introduce implicit values of probabilistic types.
These are implicit representations of explicit values; we generalize a constraint-based approach due to Dekhtyar et al. [2].
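The compactness and consistency conditions of Definition 2.12 are easy to check mechanically. The sketch below assumes an explicit value of an atomic probabilistic type is represented as a list of `(value, (l, u))` pairs; this representation, the function names, and the sample intervals are illustrative assumptions rather than anything fixed by the paper.

```python
def is_compact(value):
    """Compact: no data value appears in more than one pair."""
    items = [v for v, _ in value]
    return len(items) == len(set(items))


def is_consistent(value, eps=1e-9):
    """Consistent: compact, and sum of lower bounds <= 1 <= sum of upper bounds."""
    lower = sum(l for _, (l, _) in value)
    upper = sum(u for _, (_, u) in value)
    return is_compact(value) and lower <= 1 + eps and upper >= 1 - eps


# Hypothetical intervals; the concrete numbers are not taken from the paper.
origin = [("New York", (0.3, 0.6)), ("Washington", (0.4, 0.7))]
too_low = [("New York", (0.1, 0.3)), ("Washington", (0.2, 0.4))]
duplicated = [("New York", (0.5, 0.5)), ("New York", (0.5, 0.5))]

print(is_consistent(origin), is_consistent(too_low), is_compact(duplicated))
```

The small tolerance `eps` only guards against floating-point rounding when the bounds sum to exactly 1.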

Temporal and Data Constraints. Using a temporal constraint, we can implicitly define a set of valid time points (namely the solutions of that constraint) w.r.t. a given calendar. In contrast, a data constraint specifies a set of data values from a totally ordered domain. We now define the syntax of temporal and data constraints.

Definition 2.13 (temporal constraint) Let $\tau$ be a calendar with linear temporal hierarchy $H = T_1 \sqsubseteq \cdots \sqsubseteq T_n$. An atomic temporal constraint for $\tau$ has one of the following forms:

$(T_i \mathbin{\mathit{op}} v_i)$, where $\mathit{op}$ belongs to $\{\le, <, =, \ne, >, \ge\}$ and $v_i$ is a time-value in the time-value set of time unit $T_i$. We call $(T_i \mathbin{\mathit{op}} v_i)$ an atomic time-value constraint.

$(t_1 \sim t_2)$, where $t_1, t_2 \in \mathit{dom}(\tau)$ and $t_1 \le_H t_2$. We call $(t_1 \sim t_2)$ an atomic time-interval constraint. We use $(t_1)$ to abbreviate $(t_1 \sim t_1)$.

A temporal constraint for $\tau$ is a Boolean combination of atomic temporal constraints for $\tau$ (that is, constructed from atomic temporal constraints by using the Boolean operators $\neg$, $\wedge$, and $\vee$).

Definition 2.14 (data constraint) Let $\tau$ be a classical type with totally ordered domain $\mathit{dom}(\tau)$. An atomic data constraint for $\tau$ is either of the form $(D \mathbin{\mathit{op}} v)$, where $\mathit{op} \in \{\le, <, =, \ne, >, \ge\}$ and $v \in \mathit{dom}(\tau)$, or of the form $(v_1 \sim v_2)$, where $v_1, v_2 \in \mathit{dom}(\tau)$ with $v_1 \le v_2$. We use $(v_1)$ to abbreviate $(v_1 \sim v_1)$. A data constraint for $\tau$ is a Boolean combination of atomic data constraints for $\tau$.

We now define the semantics of temporal and data constraints, that is, the set of time points and data values, respectively, that they specify.

Definition 2.15 (solution to a temporal constraint) Let $\tau$ be a calendar with linear temporal hierarchy $H = T_1 \sqsubseteq \cdots \sqsubseteq T_n$. A time point $x = (x_1, \ldots, x_n) \in \mathit{dom}(\tau)$ is a solution to an atomic temporal constraint $(T_i \mathbin{\mathit{op}} v_i)$ (resp., $(t_1 \sim t_2)$), denoted $x \models (T_i \mathbin{\mathit{op}} v_i)$ (resp., $x \models (t_1 \sim t_2)$), iff $x_i \mathbin{\mathit{op}} v_i$ (resp., $t_1 \le_H x \le_H t_2$).
We inductively extend the notion of solutions to all temporal constraints by: $x \models \neg C_1$ iff it is not the case that $x \models C_1$; $x \models (C_1 \wedge C_2)$ iff $x \models C_1$ and $x \models C_2$; $x \models (C_1 \vee C_2)$ iff $x \models C_1$ or $x \models C_2$.

Definition 2.16 (solution to a data constraint) Let $\tau$ be any classical type with totally ordered domain $\mathit{dom}(\tau)$. A value $x \in \mathit{dom}(\tau)$ is a solution to $(D \mathbin{\mathit{op}} v)$ (resp., $(v_1 \sim v_2)$), denoted $x \models (D \mathbin{\mathit{op}} v)$ (resp., $x \models (v_1 \sim v_2)$), iff $x \mathbin{\mathit{op}} v$ (resp., $v_1 \le x \le v_2$). […]

The most widely used probability distribution function (pdf) is the uniform distribution: the uniform distribution over $X$, denoted $u_X$, is the function $u_X : X \to [0,1]$ defined by $u_X(x) = 1/n$ for all $x \in X$. The following are some other standard probability distribution functions. Here, we additionally assume that $X$ is totally ordered by $x_i < x_j$ iff $i < j$ for all $i, j \in \{1, \ldots, n\}$:

The geometric distribution over $X$ for $0 < p < 1$, denoted $g_{X,p}$, is the function $g_{X,p} : X \to [0,1]$ defined by $g_{X,p}(x_i) = p \cdot (1-p)^{i}$ for all $x_i \in X$.

The binomial distribution over $X$ for $0 < p < 1$, denoted $b_{X,p}$, is the function $b_{X,p} : X \to [0,1]$ defined by $b_{X,p}(x_i) = \binom{n}{i} \cdot p^{i} \cdot (1-p)^{n-i}$ for all $x_i \in X$.

The Poisson distribution over $X$ for $\lambda > 0$, denoted $\mathit{po}_{X,\lambda}$, is the function $\mathit{po}_{X,\lambda} : X \to [0,1]$ defined by $\mathit{po}_{X,\lambda}(x_i) = e^{-\lambda} \cdot \lambda^{i} / i!$ for all $x_i \in X$.
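For a finite set $X = \{x_1, \ldots, x_n\}$ indexed as above, the four distribution functions can be tabulated directly. The sketch below is an illustration only: the function names are hypothetical, each function returns a list whose entry at position $i-1$ is the value assigned to $x_i$, and the formulas are transcribed as they are stated in this section (with the index $i$ running from 1 to $n$).

```python
from math import comb, exp, factorial


def uniform(n):
    """u_X(x_i) = 1/n for every x_i."""
    return [1 / n for _ in range(n)]


def geometric(n, p):
    """g_{X,p}(x_i) = p * (1 - p)**i, as stated above."""
    return [p * (1 - p) ** i for i in range(1, n + 1)]


def binomial(n, p):
    """b_{X,p}(x_i) = C(n, i) * p**i * (1 - p)**(n - i)."""
    return [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(1, n + 1)]


def poisson(n, lam):
    """po_{X,lambda}(x_i) = e**(-lambda) * lambda**i / i!."""
    return [exp(-lam) * lam ** i / factorial(i) for i in range(1, n + 1)]
```

Only the uniform distribution is guaranteed to sum to 1 over a finite $X$ with this indexing; in an implicit tuple the values are in any case scaled by the bounds $l$ and $u$.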

Implicit Values of Probabilistic Types. In order to define implicit values of probabilistic types, we start by defining implicit tuples, a concept borrowed from [2].

Definition 2.19 (implicit tuple) Let $\tau$ be either a calendar or a classical type with a totally ordered domain. An implicit tuple (or i-tuple) for $\tau$ is a 5-tuple $(C, D, l, u, \delta)$, where $C, D$ are constraints for $\tau$ with $\emptyset \subset \mathit{sol}(C) \subseteq \mathit{sol}(D)$, $l, u$ are reals with $0 \le l \le u \le 1$, and $\delta$ is a distribution function over $\mathit{sol}(D)$. If $\mathit{sol}(C) = \mathit{sol}(D)$, we use $(\#)$ to abbreviate $C$.

We next give a formal definition of implicit values of probabilistic types.

Definition 2.20 (implicit values of probabilistic types) We define implicit values of probabilistic types by induction as follows: An implicit value of an atomic probabilistic type $[[\tau]]$ is a finite set of implicit tuples for $\tau$. An implicit value of a probabilistic type $[A_1{:}\,\tau_1, \ldots, A_k{:}\,\tau_k]$ is of the form $[A_1{:}\,v_1, \ldots, A_k{:}\,v_k]$, where $v_1, \ldots, v_k$ are either values or implicit values of $\tau_1, \ldots, \tau_k$.

Example 2.21 An implicit value of the atomic probabilistic type $[[\mathit{time}]]$, where $\mathit{time}$ is the calendar over the linear temporal hierarchy $\mathit{hour} \sqsubseteq \mathit{minute}$, is a set $v$ consisting of a single i-tuple of the form $((\#), (t_1 \sim t_2), l, u, u)$, with the uniform distribution as its last component […].

Every implicit value $v$ of an atomic probabilistic type $[[\tau]]$ has an equivalent explicit value $\varepsilon(v)$, which is defined by $\varepsilon(v) = \{(v', [l \cdot \delta(v'), u \cdot \delta(v')]) \mid \exists (C, D, l, u, \delta) \in v : v' \in \mathit{sol}(C)\}$.

Example 2.22 Let us reconsider the implicit value $v$ of Example 2.21. Its explicit value $\varepsilon(v)$ is a set of five pairs, one for each time point in the interval […].

It is now easy to extend the notions of compactness and consistency to implicit values.

Definition 2.23 (compactness and consistency) An implicit value $v = \{(C_1, D_1, l_1, u_1, \delta_1), \ldots, (C_n, D_n, l_n, u_n, \delta_n)\}$ of an atomic probabilistic type is compact iff $\mathit{sol}(C_i \wedge C_j) = \emptyset$ for all $i, j$ […]
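The mapping from an implicit value to its equivalent explicit value (each solution of $C$ receives the interval $[l \cdot \delta(v'), u \cdot \delta(v')]$) can be sketched as follows. This is a minimal illustration under several assumptions that are not from the paper: constraints are modeled as Python predicates over a small hypothetical calendar of `(hour, minute)` pairs, `sol` enumerates solutions by brute force, and the bounds 0.4 and 0.8 are made-up numbers.

```python
from itertools import product

# Hypothetical finite calendar: every (hour, minute) pair of one day.
CALENDAR = list(product(range(24), range(60)))


def sol(constraint):
    """All time points of the calendar that satisfy the constraint."""
    return [t for t in CALENDAR if constraint(t)]


def interval(t1, t2):
    """Atomic time-interval constraint (t1 ~ t2) as a predicate."""
    return lambda t: t1 <= t <= t2


def uniform(points):
    """Uniform distribution over a finite list of points."""
    return {t: 1 / len(points) for t in points}


def explicit(implicit_value):
    """Expand a set of i-tuples (C, D, l, u, delta) into explicit pairs."""
    pairs = []
    for c, d, l, u, delta in implicit_value:
        dist = delta(sol(d))       # delta is a distribution over sol(D)
        for t in sol(c):           # sol(C) is a subset of sol(D)
            pairs.append((t, (l * dist[t], u * dist[t])))
    return pairs


# One i-tuple with C = D, i.e. the case abbreviated by (#) in the text.
window = interval((12, 0), (12, 4))
v = [(window, window, 0.4, 0.8, uniform)]
expanded = explicit(v)
```

A single i-tuple here stands for five explicit pairs; widening the interval grows the explicit value linearly while the implicit value stays one tuple.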
[…]

Table 2: Disjunction strategies (mutual exclusion, positive correlation, independence, ignorance)

[…] $= (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$ of Example 3.6 and the TPOB-instance $I = (\pi, \nu)$ over $S$ of Example 3.8. The assignments $\sigma'$ and $\nu'$ obtained by projecting $S$ and $I$ on $\mathbf{A} = \{\mathit{Origin}, \mathit{Time}\}$ are shown in Tables 10 and 11, respectively.

Table 10: $\sigma'$ resulting from projection

4.5 Extraction

To our knowledge, the extraction operation is one that has never been defined in probabilistic databases. This operation is unique to object bases because it allows for classes to be selected (and hence for other classes to be dropped) from the class hierarchy of a TPOB-schema.

Table 11: $\nu'$ resulting from projection

That is, the extraction operation removes classes from the class hierarchy and all objects in the dropped classes. We first define the extraction operation on TPOB-schemas.

Definition 4.24 (extraction on TPOB-schemas) Let $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$ be a TPOB-schema, and let $\mathbf{C}$ be a subset of $\mathcal{C}$. The extraction on $S$ with respect to $\mathbf{C}$, denoted $\Xi_{\mathbf{C}}(S)$, is defined as the TPOB-schema $S' = (\mathcal{C}', \sigma', \Rightarrow', \mathit{me}', \wp')$, where:

$\mathcal{C}' = \mathbf{C}$.

$\sigma'$ is the restriction of $\sigma$ to $\mathcal{C}'$.

$\Rightarrow'$ is a binary relation on $\mathcal{C}'$ such that for each $c_1, c_2 \in \mathcal{C}'$, it holds $c_1 \Rightarrow' c_2$ iff there exists some path $d_1 \Rightarrow d_2 \Rightarrow \cdots \Rightarrow d_k$ such that $d_1 = c_1$, $d_k = c_2$, and $d_2, \ldots, d_{k-1} \in \mathcal{C} - \mathcal{C}'$.

$\mathit{me}'(c) = \{(P \cap \mathcal{C}') \cup \widehat{P} \neq \emptyset \mid P \in \mathit{me}(c)\}$, where $\widehat{P} = \{d_1 \in \mathcal{C}' \mid d_1 \Rightarrow d_2 \Rightarrow \cdots \Rightarrow d_k$ for some $d_2, \ldots, d_{k-1} \in \mathcal{C} - \mathcal{C}'$, $d_k \in P - \mathcal{C}'\}$, for all $c \in \mathcal{C}'$.

For all $c_1, c_2 \in \mathcal{C}'$, we define $\wp'(c_1, c_2) = \prod_{i=1}^{k-1} \wp(d_i, d_{i+1})$, where the $d_i$'s are such that $d_1 \Rightarrow d_2 \Rightarrow \cdots \Rightarrow d_k$, $d_1 = c_1$, $d_k = c_2$, and $d_2, \ldots, d_{k-1} \in \mathcal{C} - \mathcal{C}'$.

The following example illustrates the use of the extraction operator on TPOB-schemas.

Example 4.25 Consider the fully inherited TPOB-schema $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$ of Example 3.6. The extraction on $S$ with respect to the set of classes $\mathbf{C} = \{\mathit{Package}, \mathit{Letter}, \mathit{Priority}, \mathit{One\_transfer}\}$ is given by the TPOB-schema $\Xi_{\mathbf{C}}(S) = (\mathcal{C}', \sigma', \Rightarrow', \mathit{me}', \wp')$, where:

$\mathcal{C}' = \{\mathit{Package}, \mathit{Letter}, \mathit{Priority}, \mathit{One\_transfer}\}$.

$\sigma'$ is given in Table 12.

$(\mathcal{C}', \Rightarrow')$ and $\wp'$ are given in Figure 2.

$\mathit{me}'(\mathit{Package}) = \{\{\mathit{Letter}, \mathit{One\_transfer}\}, \{\mathit{Priority}\}\}$, $\mathit{me}'(\mathit{Priority}) = \{\{\mathit{One\_transfer}\}\}$, and $\mathit{me}'(\mathit{Letter}) = \mathit{me}'(\mathit{One\_transfer}) = \emptyset$.
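The path-based part of the schema extraction (the new edge relation and the product of probabilities along dropped classes) can be sketched in Python. The sketch below is only an illustration under stated assumptions: an edge `(c, d)` in the dictionary stands for one edge of the class hierarchy from `c` to `d` with its probability, the function name `extract_edges` is hypothetical, at most one qualifying path is assumed between any two retained classes, and the toy hierarchy and its probabilities are made up rather than taken from the Package Example. The type assignment and the partitioning are omitted.

```python
def extract_edges(edges, keep):
    """Edges and probabilities among the retained classes.

    edges: {(c, d): probability} for the original hierarchy (a DAG).
    keep:  set of classes to retain.
    """
    successors = {}
    for (c, d), p in edges.items():
        successors.setdefault(c, []).append((d, p))

    new_edges = {}
    for start in keep:
        stack = [(start, 1.0)]
        while stack:
            node, prob = stack.pop()
            for nxt, p in successors.get(node, []):
                if nxt in keep:
                    # Path ends at a retained class: multiply along the path.
                    new_edges.setdefault((start, nxt), prob * p)
                else:
                    # Dropped class: keep walking through it.
                    stack.append((nxt, prob * p))
    return new_edges


# Hypothetical hierarchy: Vase -> Fragile -> Item, Book -> Item.
toy_edges = {("Vase", "Fragile"): 0.5, ("Fragile", "Item"): 0.4, ("Book", "Item"): 0.3}
result = extract_edges(toy_edges, {"Vase", "Item", "Book"})
```

Dropping `Fragile` reconnects `Vase` directly to `Item` with probability 0.5 × 0.4, which mirrors how a product such as 0.06 can appear on an edge after extraction.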

Table 12: Type assignment $\sigma'$ resulting from extraction

Our next step is to define the extraction operation on TPOB-instances.

Definition 4.26 (extraction on TPOB-instances) Let $I = (\pi, \nu)$ be a TPOB-instance over a TPOB-schema $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$, and let $\mathbf{C}$ be a set of classes from $\mathcal{C}$. The extraction on $I$ with respect to $\mathbf{C}$, denoted $\Xi_{\mathbf{C}}(I)$, is the TPOB-instance $(\pi', \nu')$ over the TPOB-schema $\Xi_{\mathbf{C}}(S) = (\mathcal{C}', \sigma', \Rightarrow', \mathit{me}', \wp')$, where:

$\pi'$ is the restriction of $\pi$ to $\mathcal{C}'$.

$\nu'$ is the restriction of $\nu$ to $\pi'(\mathcal{C}')$.

The following example illustrates the application of the extraction operator on TPOB-instances.

Figure 2: Class hierarchy and probability assignment resulting from extraction

Example 4.27 Consider the fully inherited TPOB-schema $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$ of Example 3.6 and the TPOB-instance $I = (\pi, \nu)$ over $S$ of Example 3.8. The extraction on $I$ with respect to the set of classes $\mathbf{C} = \{\mathit{Package}, \mathit{Letter}, \mathit{Priority}, \mathit{One\_transfer}\}$ results in a TPOB-instance $(\pi', \nu')$ over the TPOB-schema $\Xi_{\mathbf{C}}(S)$ in Example 4.25, where $\pi'$ is given by Table 13 and $\nu'$ is given by $\nu'(o) = \nu(o)$ for each oid $o$ in $\pi'(\mathcal{C}')$.

Table 13: $\pi'$ and $(\pi')^{\star}$ resulting from extraction

4.6 Natural Join

The operations we have presented thus far access only one TPOB-instance at a time. We now define the important concept of a natural join. Recall that for each class $c \in \mathcal{C}$ of a TPOB-schema $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$, the type $\sigma(c)$ is a probabilistic tuple type over $S$. Moreover, each oid $o \in \pi(c)$ that occurs in a TPOB-instance $I = (\pi, \nu)$ over $S$ is associated with a value of the probabilistic tuple type $\sigma(c)$. Each such $o$ may be written as the list of the values (possibly complex values) for the top-level attributes $A_1, \ldots, A_k$ […]

[…] $c_1 \in \mathcal{C}_1$ and $c_2 \in \mathcal{C}_2$, the common top-level attributes of $\sigma_1(c_1)$ and $\sigma_2(c_2)$ are associated with the same types in $\sigma_1(c_1)$ and $\sigma_2(c_2)$. We may now define the natural join of two natural-join-compatible TPOB-schemas.

Definition 4.29 (natural join of TPOB-schemas) Let $S_1 = (\mathcal{C}_1, \sigma_1, \Rightarrow_1, \mathit{me}_1, \wp_1)$ and $S_2 = (\mathcal{C}_2, \sigma_2, \Rightarrow_2, \mathit{me}_2, \wp_2)$ be two natural-join-compatible TPOB-schemas. The natural join of $S_1$ and $S_2$, denoted $S_1 \bowtie S_2$, is the TPOB-schema $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$, where:

$\mathcal{C} = \mathcal{C}_1 \times \mathcal{C}_2$.

For all $c = (c_1, c_2) \in \mathcal{C}$, the probabilistic tuple type $\sigma(c) = [A_1{:}\,\tau_1, \ldots, A_k{:}\,\tau_k]$ contains exactly all $A_i{:}\,\tau_i$ that belong to either the type $\sigma_1(c_1)$ or the type $\sigma_2(c_2)$.

The directed acyclic graph $(\mathcal{C}, \Rightarrow)$ is defined as follows. For all $c = (c_1, c_2), d = (d_1, d_2) \in \mathcal{C}$: $c \Rightarrow d$ iff $(c_1 \Rightarrow_1 d_1 \wedge c_2 = d_2)$ or $(c_1 = d_1 \wedge c_2 \Rightarrow_2 d_2)$.

The partitioning $\mathit{me}$ is given as follows. For all $c = (c_1, c_2) \in \mathcal{C}$: $\mathit{me}(c) = \{P_1 \times \{c_2\} \mid P_1 \in \mathit{me}_1(c_1)\} \cup \{\{c_1\} \times P_2 \mid P_2 \in \mathit{me}_2(c_2)\}$.

The probability assignment $\wp$ is defined as follows. For all $c = (c_1, c_2) \Rightarrow d = (d_1, d_2)$: $\wp(c, d) = \wp_1(c_1, d_1)$ if $c_2 = d_2$, and $\wp(c, d) = \wp_2(c_2, d_2)$ if $c_1 = d_1$.

The following example illustrates the natural join of TPOB-schemas via the Package Example.

Example 4.30 Let $S_1$ and $S_2$ be the TPOB-schemas from Examples 3.6 and 4.25, respectively. Then, $S_1 \bowtie S_2$ is the TPOB-schema $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$ partially shown in Table 14 and Figure 3.

Figure 3: Natural join of schemas

Table 14: Type assignment $\sigma$ resulting from natural join
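The class set, edge relation, and probability assignment of the joined schema can be built mechanically from the two input hierarchies. The following Python sketch illustrates this under stated assumptions: hierarchies are given as lists of classes plus `{(c, d): probability}` edge dictionaries, the function name `join_schemas` is hypothetical, the two small input hierarchies are made up for the example, and the type assignment and the partitioning are left out.

```python
def join_schemas(classes1, edges1, classes2, edges2):
    """Classes, edges, and edge probabilities of the natural join of two schemas."""
    classes = [(c1, c2) for c1 in classes1 for c2 in classes2]
    edges = {}
    # (c1, c2) => (d1, c2) whenever c1 =>_1 d1; probability taken from schema 1.
    for (c1, d1), p in edges1.items():
        for c2 in classes2:
            edges[((c1, c2), (d1, c2))] = p
    # (c1, c2) => (c1, d2) whenever c2 =>_2 d2; probability taken from schema 2.
    for (c2, d2), p in edges2.items():
        for c1 in classes1:
            edges[((c1, c2), (c1, d2))] = p
    return classes, edges


# Hypothetical two-class hierarchies with made-up probabilities.
classes_a = ["Package", "Priority"]
edges_a = {("Priority", "Package"): 0.3}
classes_b = ["Package", "Letter"]
edges_b = {("Letter", "Package"): 0.5}

joined_classes, joined_edges = join_schemas(classes_a, edges_a, classes_b, edges_b)
```

Each edge of the joined hierarchy changes exactly one component of the class pair, so every probability is copied unchanged from one of the two input schemas.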

To define the natural join of TPOB-instances, we first need some preliminary definitions. These include the concept of intersection of two explicit values $v_1$ and $v_2$ of the same type $\tau$, and the natural join of two explicit values $v_1$ and $v_2$ of two probabilistic tuple types $\tau_1$ and $\tau_2$, respectively.

Definition 4.31 (intersection of explicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$, or two explicit values of the same probabilistic type $\tau$. Let $\otimes$ be a conjunction strategy. The intersection of $v_1$ and $v_2$ under $\otimes$, denoted $v_1 \cap_{\otimes} v_2$, is inductively defined by:

If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 \cap_{\otimes} v_2 = v_1$.

If $\tau$ is an atomic probabilistic type and $X \neq \emptyset$, then $v_1 \cap_{\otimes} v_2 = X$, where $X = \{(v, \mu_1 \otimes \mu_2) \mid (v, \mu_1) \in v_1, (v, \mu_2) \in v_2\}$.

If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathbf{A}$ and all $v_1.A \cap_{\otimes} v_2.A$ are defined, then $(v_1 \cap_{\otimes} v_2).A = v_1.A \cap_{\otimes} v_2.A$ for all $A \in \mathbf{A}$.

Otherwise, $v_1 \cap_{\otimes} v_2$ is undefined.

The following example illustrates the above concept.

Example 4.32 Let $\mathit{time}$ be the standard calendar with respect to the linear temporal hierarchy $\mathit{hour} \sqsubseteq \mathit{minute}$, and let $\otimes$ be a conjunction strategy. Consider the values $v_1 = v_2 =$ […] and $v_3 =$ […] of the temporal atomic type $\mathit{time}$. Then, $v_1 \cap_{\otimes} v_2 = v_1$, while $v_1 \cap_{\otimes} v_3$ is undefined. Consider the following explicit values of the atomic probabilistic type $[[\mathit{time}]]$:

$v_1 =$ […], $v_2 =$ […], $v_3 =$ […].

Then, $v_1 \cap_{\otimes} v_2$ is undefined, while $v_1 \cap_{\otimes} v_3 =$ […], a set of two pairs.
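The classical and atomic cases of the intersection can be sketched in a few lines of Python. The sketch rests on assumptions that should be read as such: an explicit value of an atomic probabilistic type is modeled as a dictionary from data values to `(l, u)` intervals, `None` stands for "undefined", and the three conjunction strategies use the interval formulas commonly given in the probabilistic database literature (independence, ignorance, positive correlation); the corresponding table of this paper is not reproduced here, so these formulas are not quoted from it. The sample values are made up.

```python
def conj_independence(i1, i2):
    return (i1[0] * i2[0], i1[1] * i2[1])


def conj_ignorance(i1, i2):
    return (max(0.0, i1[0] + i2[0] - 1), min(i1[1], i2[1]))


def conj_positive_correlation(i1, i2):
    return (min(i1[0], i2[0]), min(i1[1], i2[1]))


def intersect(v1, v2, strategy):
    """Intersection of two values; returns None when it is undefined."""
    if isinstance(v1, dict) and isinstance(v2, dict):
        # Atomic probabilistic type: combine intervals of shared data values.
        common = {v: strategy(v1[v], v2[v]) for v in v1 if v in v2}
        return common or None
    # Classical type: defined only when both values coincide.
    return v1 if v1 == v2 else None


# Hypothetical explicit values over (hour, minute) time points.
a = {(12, 0): (0.4, 0.8), (12, 1): (0.2, 0.6)}
b = {(13, 0): (1.0, 1.0)}
c = {(12, 0): (0.5, 0.5), (12, 2): (0.5, 0.5)}

print(intersect(a, b, conj_independence))
print(intersect(a, c, conj_independence))
```

Passing the strategy as a function keeps the dependency assumption between the two events explicit at the call site, in line with the paper's point that no independence assumption is built in.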

We now come to our second preliminary definition, that of a natural join of two explicit values.

Definition 4.33 (natural join of explicit values) Let $v_1$ and $v_2$ be explicit values of probabilistic tuple types $\tau_1$ and $\tau_2$, respectively. Let $\otimes$ be a conjunction strategy. Let $\mathbf{A}_1$ and $\mathbf{A}_2$ be the top-level attributes of $\tau_1$ and $\tau_2$, respectively, and let $\mathbf{A} = \mathbf{A}_1 \cap \mathbf{A}_2$. Let all $A \in \mathbf{A}$ have the same types in $\tau_1$ and $\tau_2$. The natural join of $v_1$ and $v_2$ under $\otimes$, denoted $v_1 \bowtie_{\otimes} v_2$, is defined as follows:

$(v_1 \bowtie_{\otimes} v_2).A = v_i.A$ for all $A \in \mathbf{A}_i - \mathbf{A}$, $i \in \{1, 2\}$,

$(v_1 \bowtie_{\otimes} v_2).A = v_1.A \bowtie_{\otimes} v_2.A$ for all $A \in \mathbf{A}$.

If all $v_1.A \bowtie_{\otimes} v_2.A$ with $A \in \mathbf{A}$ are defined, then $v_1 \bowtie_{\otimes} v_2$ is defined.

We are now ready to define the natural join of two TPOB-instances.

Definition 4.34 (natural join of TPOB-instances) Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be two TPOB-instances over the natural-join-compatible TPOB-schemas $S_1 = (\mathcal{C}_1, \sigma_1, \Rightarrow_1, \mathit{me}_1, \wp_1)$ and $S_2 = (\mathcal{C}_2, \sigma_2, \Rightarrow_2, \mathit{me}_2, \wp_2)$, respectively. For $i \in \{1, 2\}$, let $\mathbf{A}_i$ denote the set of top-level attributes of $S_i$. Let $\otimes$ be a conjunction strategy. The natural join of $I_1$ and $I_2$ under $\otimes$, denoted $I_1 \bowtie_{\otimes} I_2$, is defined as the TPOB-instance $I = (\pi, \nu)$ over the TPOB-schema $S = S_1 \bowtie S_2$, where:

$\pi(c) = \{(o_1, o_2) \in \pi_1(c_1) \times \pi_2(c_2) \mid \nu_1(o_1) \bowtie_{\otimes} \nu_2(o_2)$ is defined$\}$, for all $c = (c_1, c_2) \in \mathcal{C}_1 \times \mathcal{C}_2$.

$\nu(o) = \nu_1(o_1) \bowtie_{\otimes} \nu_2(o_2)$, for all $o = (o_1, o_2) \in \pi(\mathcal{C}_1 \times \mathcal{C}_2)$.

Example 4.35 Let $S_1$ and $S_2$ be the TPOB-schemas given in Example 3.6 and produced in Example 4.25, respectively. Let $I_1$ and $I_2$ be the TPOB-instances over $S_1$ and $S_2$ produced in Examples 4.11 and 4.27, respectively. Then, the natural join of $I_1$ and $I_2$ under the conjunction strategy […] is the TPOB-instance $I = (\pi, \nu)$ over $S_1 \bowtie S_2$, where $\pi$ is given by $\pi((\mathit{Priority}, \mathit{Priority})) = \{$[…]$\}$ and $\pi(c) = \emptyset$ for all other classes $c$, and $\nu$ is given by Table 15.

Table 15: $\nu$ resulting from natural join

4.7 Cartesian Product and Conditional Join

In the above definition of natural join, if the sets $\mathbf{A}_1$ and $\mathbf{A}_2$ are disjoint, then the natural join is called Cartesian product and denoted by the symbol $\times$. The following condition describes when two TPOB-schemas $S_1$ and $S_2$ can be combined using Cartesian product.

Definition 4.36 (Cartesian-product-compatible TPOB-schemas) The TPOB-schemas $S_1 = (\mathcal{C}_1, \sigma_1, \Rightarrow_1, \mathit{me}_1, \wp_1)$ and $S_2 = (\mathcal{C}_2, \sigma_2, \Rightarrow_2, \mathit{me}_2, \wp_2)$ are Cartesian-product-compatible iff for all classes $c_1 \in \mathcal{C}_1$ and $c_2 \in \mathcal{C}_2$, the types $\sigma_1(c_1)$ and $\sigma_2(c_2)$ have disjoint sets of top-level attributes.

The conditional join operation combines values of two TPOB-instances that satisfy a probabilistic selection condition $\phi$. Let $I_1$ and $I_2$ be TPOB-instances over the Cartesian-product-compatible TPOB-schemas $S_1$ and $S_2$, respectively. The conditional join of $I_1$ and $I_2$ with respect to $\phi$, denoted $I_1 \bowtie_{\phi} I_2$, is the TPOB-instance $I_1 \bowtie_{\phi} I_2 = \sigma_{\phi}(I_1 \times I_2)$ over the TPOB-schema $S_1 \times S_2$.

Example 4.37 Let $S_1$ and $I_1$ be the TPOB-schema and the TPOB-instance, respectively, produced in Example 4.23. Let $S_2$ and $I_2$ be the TPOB-schema and the TPOB-instance obtained from $S_1$ and $I_1$, respectively, by renaming the attributes $\mathit{Origin}$ and $\mathit{Time}$ with $\mathit{Origin}$[…] and $\mathit{Time}$[…], respectively. The Cartesian product of $S_1$ and $S_2$ is the TPOB-schema $S = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$ partially shown in Table 16 and Figure 3. The conditional join of $I_1$ and $I_2$ with respect to $\phi = (\mathit{Origin} = \text{Rome} \wedge \mathit{Origin}$[…]$ = \text{Paris})[\ldots, 1]$ is the TPOB-instance $I = (\pi, \nu)$ over the TPOB-schema $S_1 \times S_2$, where $\pi$ is given by $\pi((\mathit{Priority}, \mathit{Two\_transfer})) = \{$[…]$\}$ and $\pi(c) = \emptyset$ for all other classes $c$, and $\nu$ is shown in Table 17.

Table 16: Type assignment $\sigma$ resulting from conditional join

Table 17: Value assignment $\nu$ resulting from conditional join

4.8 Intersection, Union, and Difference We finally define the operations of intersection, union, and difference on two TPOB-instances over the same TPOB-schema. We first describe intersection. Informally speaking, this operation intersects the sets of oids of two TPOB-instances, as well as the explicit values associated with each oid in both TPOB-instances. Definition 4.38 (intersection of TPOB-instances) Let Ä95OAÂ/9DXÅ9CG and ÄÂD5¼AÂ1ÂXÅÂNG be TPOB-instances  over the same TPOB-schema ¬´5ÎA^_B`¡9aÁ…ngb yÛG , and let be a conjunction strategy. The intersec 9 °*ã¹Ä , is the TPOB-instance AÂUXÅ%G over ¬ , where: tion of Ä9 and Ä under , denoted Ä+ “AdDG“58N$_RÉÂ=9NAdG:°®Â%ÂbAdDGMÒÅ9vAo$ G°*ã Å Ao$ G is defined  , for all dUR>_ .

= Å Ao$ G“5

Å9NAo$ G°*ãgÅÂbAo$ G

, for all $¨RÉ“A^_=G .

Example 4.39 Let ¬ be the TPOB-schema of Example 3.6. Let Ä9 and Ä be the TPOB-instances over ¬ given in Example 3.8 and produced in Example 4.14, respectively. The intersection of Ä9 and Ä under the  conjunction strategy IPO is the TPOB-instance Ą5 A ÂUX Å%G over ¬ , where  is given by “A3š:zWr ~zWr uo–GŸ5ø for all other classes d , and Å is shown in Table 18. Table 18: Î

Value assignment $\nu$ resulting from intersection

[Table 18 body is not recoverable from the extracted source.]


Likewise, the union operation intuitively computes the union of the sets of oids of two TPOB-instances, combined with the union of the two explicit values associated with each oid in both TPOB-instances. We first define the union of two explicit values of the same type.

Definition 4.40 (union of explicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$, or two explicit values of the same probabilistic type $\tau$, and let $\oplus$ be a disjunction strategy. The union of $v_1$ and $v_2$ under $\oplus$, denoted $v_1 \cup_\oplus v_2$, is inductively defined as follows:

• If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 \cup_\oplus v_2 = v_1$.

• If $\tau$ is an atomic probabilistic type, then

$$v_1 \cup_\oplus v_2 = \{(v, I_1) \in v_1 \mid v \in V_1 - V_2\} \,\cup\, \{(v, I_2) \in v_2 \mid v \in V_2 - V_1\} \,\cup\, \{(v, I_1 \oplus I_2) \mid (v, I_1) \in v_1,\ (v, I_2) \in v_2\},$$

where $V_1 = \{v \mid (v, I) \in v_1\}$ and $V_2 = \{v \mid (v, I) \in v_2\}$.

• If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathbf{A}$ and all $v_1.A \cup_\oplus v_2.A$ are defined, then $(v_1 \cup_\oplus v_2).A = v_1.A \cup_\oplus v_2.A$ for all $A \in \mathbf{A}$.

• Otherwise, $v_1 \cup_\oplus v_2$ is undefined.
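The inductive structure of Definition 4.40 can be sketched in Python as follows. The classes `Atomic` and `TupleValue`, the use of `None` for "undefined", and the particular disjunction strategy `disj_mutual_exclusion` (adding the bounds and capping them at 1) are assumptions of this sketch, not notation from the paper; any other disjunction strategy could be passed in instead.

```python
from typing import Any, Callable, Optional, Tuple

Interval = Tuple[float, float]


class Atomic(dict):
    """Explicit value of an atomic probabilistic type: value -> (l, u)."""


class TupleValue(dict):
    """Explicit value of a probabilistic tuple type: attribute -> value."""


def disj_mutual_exclusion(i1: Interval, i2: Interval) -> Interval:
    # Assumed disjunction strategy for mutually exclusive events.
    return (min(1.0, i1[0] + i2[0]), min(1.0, i1[1] + i2[1]))


def union_values(
    v1: Any, v2: Any, disj: Callable[[Interval, Interval], Interval]
) -> Optional[Any]:
    """Sketch of Definition 4.40; returns None where the union is undefined."""
    if isinstance(v1, Atomic) and isinstance(v2, Atomic):
        out = Atomic()
        for x in v1.keys() | v2.keys():
            if x in v1 and x in v2:
                out[x] = disj(v1[x], v2[x])           # value occurs in both
            else:
                out[x] = v1[x] if x in v1 else v2[x]  # value occurs in one only
        return out
    if isinstance(v1, TupleValue) and isinstance(v2, TupleValue):
        if v1.keys() != v2.keys():
            return None
        parts = {a: union_values(v1[a], v2[a], disj) for a in v1}
        if any(p is None for p in parts.values()):
            return None  # some attribute-level union is undefined
        return TupleValue(parts)
    if isinstance(v1, (Atomic, TupleValue)) or isinstance(v2, (Atomic, TupleValue)):
        return None  # values of different kinds
    return v1 if v1 == v2 else None  # classical values must coincide


t1 = TupleValue(dest="St. Louis", arrival=Atomic({9: (0.25, 0.5), 10: (0.25, 0.25)}))
t2 = TupleValue(dest="St. Louis", arrival=Atomic({10: (0.5, 0.5), 11: (0.125, 0.25)}))
merged = union_values(t1, t2, disj_mutual_exclusion)
```

Here the time point 10 occurs in both values, so its intervals are combined by the strategy, while 9 and 11 are carried over unchanged.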

We are now ready to define the union of two TPOB-instances.

Definition 4.41 (union of TPOB-instances) Let $\mathbf{I}_1 = (\pi_1, \nu_1)$ and $\mathbf{I}_2 = (\pi_2, \nu_2)$ be TPOB-instances over the same TPOB-schema $\mathbf{S} = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$. Let $\oplus$ be a disjunction strategy. The union of $\mathbf{I}_1$ and $\mathbf{I}_2$ under $\oplus$, denoted $\mathbf{I}_1 \cup_\oplus \mathbf{I}_2$, is the TPOB-instance $(\pi, \nu)$ over $\mathbf{S}$, where:

$$\pi(c) = (\pi_1(c) - \pi_2(c)) \,\cup\, (\pi_2(c) - \pi_1(c)) \,\cup\, \{\, o \in \pi_1(c) \cap \pi_2(c) \mid \nu_1(o) \cup_\oplus \nu_2(o) \text{ is defined} \,\}, \quad \text{for all } c \in \mathcal{C},$$

$$\nu(o) = \begin{cases} \nu_1(o) & \text{if } o \in \pi_1(c) - \pi_2(c), \\ \nu_2(o) & \text{if } o \in \pi_2(c) - \pi_1(c), \\ \nu_1(o) \cup_\oplus \nu_2(o) & \text{if } o \in \pi_1(c) \cap \pi_2(c). \end{cases}$$

We finally define the difference of two TPOB-instances.

Definition 4.42 (difference of explicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$, or two explicit values of the same probabilistic type $\tau$, and let $\ominus$ be a difference strategy. The difference of $v_1$ and $v_2$ under $\ominus$, denoted $v_1 -_\ominus v_2$, is inductively defined by:

• If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 -_\ominus v_2 = v_1$.

• If $\tau$ is an atomic probabilistic type, then

$$v_1 -_\ominus v_2 = \{(v, I_1) \in v_1 \mid v \in V_1 - V_2\} \,\cup\, \{(v, I_1 \ominus I_2) \mid (v, I_1) \in v_1,\ (v, I_2) \in v_2\},$$

where $V_1 = \{v \mid (v, I) \in v_1\}$ and $V_2 = \{v \mid (v, I) \in v_2\}$.

• If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathbf{A}$ and all $v_1.A -_\ominus v_2.A$ are defined, then $(v_1 -_\ominus v_2).A = v_1.A -_\ominus v_2.A$ for all $A \in \mathbf{A}$.

• Otherwise, $v_1 -_\ominus v_2$ is undefined.

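Definition 4.43 (difference of TPOB-instances) Let $\mathbf{I}_1 = (\pi_1, \nu_1)$ and $\mathbf{I}_2 = (\pi_2, \nu_2)$ be TPOB-instances over the same TPOB-schema $\mathbf{S} = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$. Let $\ominus$ be a difference strategy. The difference of $\mathbf{I}_1$ and $\mathbf{I}_2$ under $\ominus$, denoted $\mathbf{I}_1 -_\ominus \mathbf{I}_2$, is the TPOB-instance $(\pi, \nu)$ over $\mathbf{S}$, where:

$$\pi(c) = (\pi_1(c) - \pi_2(c)) \,\cup\, \{\, o \in \pi_1(c) \cap \pi_2(c) \mid \nu_1(o) -_\ominus \nu_2(o) \text{ is defined} \,\}, \quad \text{for all } c \in \mathcal{C},$$

$$\nu(o) = \begin{cases} \nu_1(o) & \text{if } o \in \pi_1(c) - \pi_2(c), \\ \nu_1(o) -_\ominus \nu_2(o) & \text{if } o \in \pi_1(c) \cap \pi_2(c). \end{cases}$$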
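A minimal Python sketch of Definitions 4.42 and 4.43, restricted to the case where every object carries a single atomic probabilistic value (so the value-level difference is always defined). The dictionary representation, the function names, and the strategy `diff_ignorance` are assumptions of this sketch; `diff_ignorance` uses the bounds on $P(A \wedge \neg B)$ that hold when nothing is known about the dependence between the two events.

```python
from typing import Callable, Dict, Set, Tuple

Interval = Tuple[float, float]
AtomicValue = Dict[int, Interval]  # value (e.g. a time point) -> (l, u)
Instance = Tuple[Dict[str, Set[str]], Dict[str, AtomicValue]]  # (pi, nu)
Strategy = Callable[[Interval, Interval], Interval]


def diff_ignorance(i1: Interval, i2: Interval) -> Interval:
    # Assumed difference strategy: bounds on P(A and not B) given only
    # P(A) in i1 and P(B) in i2, with no dependence assumptions.
    return (max(0.0, i1[0] - i2[1]), min(i1[1], 1.0 - i2[0]))


def diff_atomic(v1: AtomicValue, v2: AtomicValue, diff: Strategy) -> AtomicValue:
    """Atomic case of Definition 4.42: keep v1's values, adjust shared ones."""
    return {x: (diff(iv, v2[x]) if x in v2 else iv) for x, iv in v1.items()}


def diff_instances(inst1: Instance, inst2: Instance, diff: Strategy) -> Instance:
    """Sketch of Definition 4.43 for atomic values only."""
    (pi1, nu1), (pi2, nu2) = inst1, inst2
    pi: Dict[str, Set[str]] = {}
    nu: Dict[str, AtomicValue] = {}
    for c, oids1 in pi1.items():
        oids2 = pi2.get(c, set())
        pi[c] = set(oids1)  # the atomic difference is always defined here
        for o in oids1:
            nu[o] = diff_atomic(nu1[o], nu2[o], diff) if o in oids2 else nu1[o]
    return pi, nu


inst1 = ({"Package": {"o1", "o2"}},
         {"o1": {9: (0.5, 0.75), 10: (0.25, 0.25)}, "o2": {9: (0.5, 0.5)}})
inst2 = ({"Package": {"o1"}}, {"o1": {9: (0.25, 0.5)}})
pi, nu = diff_instances(inst1, inst2, diff_ignorance)
```

Objects that occur only in the first instance (`o2`) keep their values, and for shared objects only the values present in both explicit values are adjusted by the strategy.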


5 The Implicit Algebra

The explicit algebra described in the preceding section suffers from two problems. First, TPOB-instances can be very large. As we can see from Table 7, a probability must be associated with each time point involved. Merely saying that a given package will arrive at St. Louis sometime between 5:30pm and 6:30pm may (if we reason at a minute-by-minute level) require 60 time points to be specified (Table 7 only shows a couple of time points). Second, because explicit TPOB-instances are so large, the cost of executing the operations on them is also potentially high. In this section, we alleviate these problems by defining TPOB algebraic operations on implicit TPOB-instances. These implicit operations correctly implement their explicit counterparts defined in Section 4.
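To make the size argument concrete, the following Python sketch expands one implicit entry into its explicit counterpart. It is a simplification under stated assumptions: the entry is reduced to a set of time points, bounds $l$ and $u$, and a distribution $\delta$; each time point $t$ is assumed to receive the interval $[l \cdot \delta(t), u \cdot \delta(t)]$; and the names `expand`, `window`, and `uniform` are hypothetical.

```python
from fractions import Fraction

# Minutes of the day from 5:30pm up to (but excluding) 6:30pm: 60 time points.
window = range(17 * 60 + 30, 18 * 60 + 30)


def uniform(t):
    """Assumed distribution: every time point in the window is equally likely."""
    return Fraction(1, len(window))


def expand(time_points, l, u, delta):
    """Explicit view: one (time point -> probability interval) pair per point."""
    return {t: (l * delta(t), u * delta(t)) for t in time_points}


# One implicit entry: "arrives in the window with probability between 0.6 and 0.9".
implicit = (window, Fraction(3, 5), Fraction(9, 10), uniform)
explicit = expand(*implicit)
```

The single implicit entry corresponds to 60 explicit pairs, each with the interval [1/100, 3/200]; refining the granularity to seconds would multiply the explicit size by 60 while the implicit entry stays the same.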

5.1 Selection In order to define the selection operation for implicit TPOB-instances, it is sufficient to define how to evaluate path expressions and how to assess the probability that an implicit value satisfies an atomic selection condition. The valuation of selection conditions, the satisfaction of probabilistic selection conditions, and the selection on implicit TPOB-instances are then defined in the same way as in Section 4.1. Definition 5.1 (valuation of path expressions) Let ô be a path expression for the probabilistic type j . The valuation of ô under an implicit value Š of j , denoted Š1 ô , is defined as follows: If Šè5և ƒ9 ˆ Š9DD…ƒŒ† ˆ Šb†N‰ and ô85ÁƒMIDé , then Š% ôÓ5#ŠIé . If ŠÔ5hA.fM9D…È!9D…¤.9DE&=9D÷v9yGCvNA.f¨†…È膅¤µ†E&1†÷y†bG and ôi5 ‡ ‡€éŒ‰ ‰ , then Š1 ôi5hA.fM9N…È!9D…¤W9DE&Û9 ,

÷ 9 BéGCNNA.f¨†…È膅¤µ†E&1†÷y†BéG . We call such sets generalized implicit values of j . Š% ô is undefined otherwise.

Definition 5.2 (valuation of atomic selection conditions) Let ÄÊ5çA ÂUX Å%G be an implicit TPOB-instance over the TPOB-schema ¬e5´A^_B`¡9a>…ngbbyÛG , and let $_RÓA^_ÛG . Let  be the disjunction strategy for mutual exclusion. The probabilistic valuation with respect to Ä and $ , denoted ë#ì…l’í î ï ð , is the following partial mapping from the set of all atomic selection conditions to the set of all closed subintervals of ‡ ¥D|‰ : ë#ì…l’í î ï ð AB+°-’AdGEGÃ5և ncñ0ò=Ab[ß àNAáNGDAo$ GEGC…ncóoß:Ab[ß àNAáNGDAo$ GEG°‰ . Let ô be a path expression for the type of $ . If ÅÛAo$bGC ô is a value of a classical type, then define ôÖ5

AEA.ûOGCNADÅ=Ao$ GEGCDDy娅ôXG , else if Å=Ao$ GC ô is a generalized implicit value of an atomic probabilistic type, then define ôÓ5 Å=Ao$ GC ô . Otherwise, ô is undefined. öø÷ ë#ì…l’í î ï ð

† if ô is defined —I ¹:9 êyI undefined otherwise,

Aoôê»]ŠG“5

where ê9yvBê† are the intervals ‡ ¤ºC÷Aù;GCE&Xºy÷Aù;G°‰ such that A.f;…È[…¤FE&Û÷NCޓG@Rÿô , ùÔR„Ü…l Ý.A.fG , and ùtÞO»UŠ , if ô is defined. Note that ëSì…l’í î ï ð Aoôæ»]ŠHG is undefined, if some ùXtÞè»]Š is undefined.

For each +ŒR¦ , let ô:I be a path expression for the type of $ . If Å=Ao$ GC ôÛI is a value of a classical type, then define ôHIŸ5 AEA.ûOGCNADÅ=Ao$ GEGCDDyåÃ…ô:I…G , else if Å=Ao$ GC ôÛI is a generalized implicit value of an atomic probabilistic type, then define ô I 5 Å=Ao$ GC ô I . Otherwise, ô I is undefined. Then, ö ÷ ë#ìl’í‚î ï ð

Aoô“91»ãXô:ÂNG“5

† if ô 9 and ô  are defined ·I ¹:9 ê I undefined otherwise, 

where ê9DNBê† is the list of all intervals ‡ ¤.9=º÷v9vABŠ9|GCE&=9=º÷v9vABŠ9|G°‰ ‡ ¤µÂUºC÷yÂbABŠbÂNGCE&1“º÷y ABŠÂNG°‰ such that A.fÃI3…ȼIJ…¤µI3E&%I3÷|I3CÞ@IWG@RÇôHI , ŠIRÜ…l ݰA.fÃI°G , and Š9Dtޝ9H»“ŠÂtÞ¡Â , if ô/9 and ôm are defined. Observe that ë#셒 l í î ï ð Aoô’9]o» ã[ô:ÂDG is undefined, if some Š9ytޝ9%»UŠÂtÞ/ is undefined. The following result shows that the selection on implicit TPOB-instances correctly implements its counterpart on explicit TPOB-instances. That is, the mapping ü commutes with `  . 22

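Theorem 5.3 (correctness of selection) Let $\mathbf{I} = (\pi, \nu)$ be an implicit TPOB-instance over a TPOB-schema $\mathbf{S} = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$, and let $\phi$ be a probabilistic selection condition. Then,

$$\sigma_\phi(\varepsilon(\mathbf{I})) = \varepsilon(\sigma_\phi(\mathbf{I})).$$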

5.2 Restricted Selection In order to define the restricted selection operation on implicit TPOB-instances, it is sufficient to define restricted selection on implicit values. The restricted selection on implicit TPOB-instances is then defined in the same way as in Section 4.2. Definition 5.4 (restricted selection on implicit values) Let j be a probabilistic tuple type. Let ä be of the form ôŸtf , where ô is a path expression for j , and f is a constraint. Let Š be an implicit value of j . The restricted selection on Š with respect to ä , denoted ` ABŠG , is defined by: If ŠË5 ‡ ƒ9 ˆ Š9N…ƒSI ˆ ŠIFD…ƒS† ˆ ІN‰ , ŠI is a value of a classical type, ô 5 ƒSI , and ŠIR„ÜEl Ý.A.fG , then `  ABŠHG’5#Š . If ŠÇ5 ‡ ƒ 9mˆ Š 9 N…ƒ Iˆ Š I v…ƒS† ˆ Šb†N‰ , Š I is an implicit value of an atomic probabilistic type, and ô85ìƒSI , then `  ABŠHG’5և ƒ9 ˆ Š9DD…ƒMI ˆ ŠIBQ²v…ƒS† ˆ Šb†‰ , where

ŠI Q 5

A.fXÅOf Q …ÈÇ…¤JE&:÷GŒÒA.f Q …ÈÇ…¤JE&:÷G@R(ŠIF%ÜEl Ý.A.fÅ f Q GŒ5Á ½ ø“

If Š#5· ƒ9 ˆ Š9Dv…ƒMI ˆ ŠIJv…ƒS† ˆ ІN‰ , Š I is an implicit value of a probabilistic tuple type, and   ABŠI°CG D…ƒŒ† ˆ Šb†N‰ . ô85ìƒSI3Þé , then ` ABŠHG’5և ƒ9 ˆ Š9DD…ƒMI ˆ ` Otherwise, `  ABŠHG is undefined. The next theorem shows that the restricted selection on implicit TPOB-instances correctly implements its counterpart on explicit TPOB-instances. That is, the mapping ü commutes with `  . Theorem 5.5 (correctness of restricted selection) Let Ä be an implicit TPOB-instance over a TPOBschema ¬…ngbbyÛG . Let ä be an expression of the form ôÕtf , where ô is a path expression for all `“A dDG with dŒ« R _ , and f is a constraint. Then, `

$$\sigma_\phi(\varepsilon(\mathbf{I})) = \varepsilon(\sigma_\phi(\mathbf{I})).$$

5.3 Renaming To define renaming on implicit TPOB-instances, we need to define renaming on implicit values, which is then extended to implicit TPOB-instances in the same way as in Section 4.3. Definition 5.6 (renaming on constraints) Let f be a constraint for the classical type j , and let  be a renaming condition for j . The renaming on f with respect to  , denoted ÷  A.fG , is obtained from f by replacing every value ŠI in f by ÷XABŠI°G .  Definition 5.7 (renaming on implicit values) Let  be a renaming condition of the form ô ô Q for the probabilistic tuple type j 55#Š 9  ƒH° 㠊   ƒ for all ƒYR)% . Otherwise, Š9+°*㠊 is undefined. The next result shows that the join on implicit TPOB-instances correctly implements its counterpart on explicit TPOB-instances. That is, the mapping ü commutes with = > ã . Theorem 5.10 (correctness of natural join) Let Ä9 and Ä be implicit TPOB-instances over the natural join-compatible TPOB-schemas ¬:9 and ¬/ , respectively. Let be a conjunction strategy. Then,

$$\varepsilon(\mathbf{I}_1) \bowtie_\otimes \varepsilon(\mathbf{I}_2) = \varepsilon(\mathbf{I}_1 \bowtie_\otimes \mathbf{I}_2).$$

5.5 Intersection, Union, and Difference

To define intersection, union, and difference, we need to define the intersection, union, and difference of implicit values, which are then extended to implicit TPOB-instances in the same way as in Section 4.8. The intersection of implicit values is given by Definition 5.9, while the union and difference of implicit values are defined below.

Definition 5.11 (union of implicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$ or two implicit values of the same probabilistic type $\tau$, and let $\oplus$ be a disjunction strategy. The union of $v_1$ and $v_2$ under $\oplus$, denoted $v_1 \cup_\oplus v_2$, is inductively defined as follows:

• If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 \cup_\oplus v_2 = v_1$.

If j is an atomic probabilistic type, then Š 9 2 ç

Š  5

A.f 9 ÅeÄ^f ]  …È 9 …¤ 9 E& 9 ÷ 9 GÕÒA.f 9 …È 9 …¤ 9 E& 9 ÷ 9 G@RŠ 9 %܅l Ý.A.f 9 ÅeÄ^f ]  G;5Á ½ ø2 A.f¨Â’ÅeÄ fM ½ ø2 ] 9y…ÈèÂ…¤²ÂE&1Â÷yÂNGÕÒA.f¨Â…ÈèÂ…¤²ÂE&1Â÷yÂNG@RŠÂ%܅l Ý.A.f¨Â’ÅeÄ fM ] 9CG;5Á AEA.ûOGCNABŠHGC…¤FE&ÛyåG;Ò ý=A.fM9y…È!9D…¤.9DE&=9D÷v9|G@RŠ9NA.f¨Â…ÈèÂ…¤²ÂE&1Â÷yÂNG@R„Šb ˆ Š!ReÜEl ݰA.f 9 Å,f  GC:‡ ¤FE&%‰¡5և ¤ 9 º÷ 9 ABŠHGCE& 9 º÷ 9 ABŠHG° ‰ Á‡ ¤  º÷  ABŠGCE&  º÷  ABŠG°‰W“

where $\overline{C}_i$, $i \in \{1, 2\}$, denotes the logical disjunction of all $C_i$ such that $(C_i, D_i, l_i, u_i, \delta_i) \in v_i$.

• If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathbf{A}$ and all $v_1.A \cup_\oplus v_2.A$ are defined, then $(v_1 \cup_\oplus v_2).A = v_1.A \cup_\oplus v_2.A$ for all $A \in \mathbf{A}$.

• Otherwise, $v_1 \cup_\oplus v_2$ is undefined.

Definition 5.12 (difference of implicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$ or two implicit values of the same probabilistic type $\tau$, and let $\ominus$ be a difference strategy. The difference of $v_1$ and $v_2$ under $\ominus$, denoted $v_1 -_\ominus v_2$, is inductively defined as follows:

• If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 -_\ominus v_2 = v_1$.

If j is an atomic probabilistic type, then Š 9 \ X Š  5

A.f 9 ÅeÄ^f ]  …È 9 …¤ 9 E& 9 ÷ 9 GÕÒA.f 9 …È 9 …¤ 9 E& 9 ÷ 9 G@RŠ 9 %܅l Ý.A.f 9 ÅeÄ^f ]  G;5Á ½ ø2 AEA.ûOGCNABŠHGC…¤FE&ÛyåG;Ò ý=A.fM9y…È!9D…¤.9DE&=9D÷v9|G@RŠ9NA.f¨Â…ÈèÂ…¤²ÂE&1Â÷yÂNG@R„Šb ˆ Š!ReÜEl ݰA.fM9Å,f¨ÂNGC:‡ ¤FE&%‰¡5և ¤W9=º÷v9vABŠHGCE&=9Ûº|÷v9DABŠHG° ‰ Á‡ ¤²ÂUº÷y ABŠGCE&@Â]º÷y ABŠG°‰W“

where $\overline{C}_i$, $i \in \{1, 2\}$, denotes the logical disjunction of all $C_i$ such that $(C_i, D_i, l_i, u_i, \delta_i) \in v_i$.

• If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathbf{A}$ and all $v_1.A -_\ominus v_2.A$ are defined, then $(v_1 -_\ominus v_2).A = v_1.A -_\ominus v_2.A$ for all $A \in \mathbf{A}$.

• Otherwise, $v_1 -_\ominus v_2$ is undefined.

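The following theorem shows that the intersection, union, and difference of implicit TPOB-instances correctly implement their counterparts on explicit TPOB-instances. That is, the mapping $\varepsilon$ commutes with $\cap_\otimes$, $\cup_\oplus$, and $-_\ominus$, respectively.

Theorem 5.13 (correctness of intersection, union, and difference) Let $\mathbf{I}_1$ and $\mathbf{I}_2$ be two implicit TPOB-instances over the same TPOB-schema $\mathbf{S}$, and let $\otimes$ (resp., $\oplus$, $\ominus$) be a conjunction (resp., disjunction, difference) strategy. Then,

$$\varepsilon(\mathbf{I}_1) \cap_\otimes \varepsilon(\mathbf{I}_2) = \varepsilon(\mathbf{I}_1 \cap_\otimes \mathbf{I}_2) \qquad (1)$$

$$\varepsilon(\mathbf{I}_1) \cup_\oplus \varepsilon(\mathbf{I}_2) = \varepsilon(\mathbf{I}_1 \cup_\oplus \mathbf{I}_2) \qquad (2)$$

$$\varepsilon(\mathbf{I}_1) -_\ominus \varepsilon(\mathbf{I}_2) = \varepsilon(\mathbf{I}_1 -_\ominus \mathbf{I}_2) \qquad (3)$$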

5.6 Projection, Extraction, Cartesian Product, and Conditional Join The operations of projection, extraction, Cartesian product, and conditional join for implicit TPOB-instances are defined in exactly the same way as their counterparts for explicit TPOB-instances.

5.7 Compression Functions

The implicit operations of natural join, intersection, union, and difference may generate implicit TPOB-instances that contain a large number of implicit tuples. Adopting an idea from [2], we now define compression functions through which such implicit TPOB-instances can be made more compact.

Definition 5.14 (compression function) Let $\tau$ be an atomic probabilistic type. A compression function $\kappa$ for $\tau$ is a function that maps every implicit value $v$ of $\tau$ to an implicit value $\kappa(v)$ of $\tau$ such that (i) $|\kappa(v)| \le |v|$, and (ii) there exists a bijection between $\varepsilon(v)$ and $\varepsilon(\kappa(v))$ that maps each $(x, [l, u]) \in \varepsilon(v)$ to a pair $(x, [l, u']) \in \varepsilon(\kappa(v))$ such that $l \le u' \le u$.

Example 5.15 Let $\tau$ be an atomic probabilistic type. The same-distribution compression function maps every implicit value $v$ of $\tau$ to the implicit value $\kappa(v)$, which is obtained from $v$ by iteratively replacing any two distinct $(C_1, D_1, l, u, \delta), (C_2, D_2, l, u, \delta) \in v$ with $\mathit{sol}(D_1) = \mathit{sol}(D_2)$ by $(C_1 \vee C_2, D_1, l, u, \delta)$.

We now define the compression of implicit values of probabilistic types. Here, we assume that for every atomic probabilistic type $\tau$, we have some compression function $\kappa_\tau$.
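The same-distribution compression of Example 5.15 can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the paper: constraints are replaced by their solution sets (so $C_1 \vee C_2$ becomes a set union), the distribution is identified by a name, and the helper `expand` assumes that the name `"u"` stands for the uniform distribution over $\mathit{sol}(D)$.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

# (sol(C), sol(D), l, u, name of the distribution)
ImplicitTuple = Tuple[FrozenSet[int], FrozenSet[int], Fraction, Fraction, str]


def compress_same_distribution(value: List[ImplicitTuple]) -> List[ImplicitTuple]:
    """Merge implicit tuples that agree on sol(D), l, u, and the distribution."""
    merged: Dict[tuple, FrozenSet[int]] = {}
    for c, d, l, u, delta in value:
        key = (d, l, u, delta)
        # C1 or C2 corresponds to the union of the two solution sets.
        merged[key] = merged.get(key, frozenset()) | c
    return [(c, d, l, u, delta) for (d, l, u, delta), c in merged.items()]


def expand(value: List[ImplicitTuple]) -> set:
    """Explicit view, assuming the distribution is uniform over sol(D)."""
    return {(t, (l / len(d), u / len(d))) for c, d, l, u, _ in value for t in c}


D = frozenset(range(10))
v = [
    (frozenset({0, 1}), D, Fraction(1, 2), Fraction(1), "u"),
    (frozenset({5}), D, Fraction(1, 2), Fraction(1), "u"),
    (frozenset({7}), D, Fraction(1, 4), Fraction(1, 2), "u"),
]
compressed = compress_same_distribution(v)
```

In this toy case the first two tuples are merged, so the compressed value has two tuples instead of three, while its explicit expansion is unchanged, in line with conditions (i) and (ii) of Definition 5.14.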

Definition 5.16 (compression of implicit values) Let $v$ be either a value of a classical type $\tau$, or an implicit value of a probabilistic type $\tau$. The compression of $v$, denoted $\kappa(v)$, is inductively defined as follows:

• If $\tau$ is a classical type, then $\kappa(v) = v$.

• If $\tau$ is an atomic probabilistic type, then $\kappa(v) = \kappa_\tau(v)$.

• If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathbf{A}$, then $\kappa(v).A = \kappa(v.A)$ for all $A \in \mathbf{A}$.

We finally define the compression of implicit TPOB-instances.

Definition 5.17 (compression of implicit TPOB-instances) Let $\mathbf{I} = (\pi, \nu)$ be a TPOB-instance over the TPOB-schema $\mathbf{S} = (\mathcal{C}, \sigma, \Rightarrow, \mathit{me}, \wp)$. The compression of $\mathbf{I}$, denoted $\kappa(\mathbf{I})$, is defined as the TPOB-instance $(\pi, \nu')$ over $\mathbf{S}$, where $\nu'(o) = \kappa(\nu(o))$ for all $o \in \pi(\mathcal{C})$.

5.8 Preservation of Consistency and Coherence

We now show that all our explicit algebraic operators defined in Section 4 preserve consistency and coherence of schemas and instances. If the input TPOB-schemas (resp., TPOB-instances) are consistent (resp., coherent), then the output TPOB-schemas (resp., TPOB-instances) are also consistent (resp., coherent). This also shows that all our implicit algebraic operators given in Section 5 preserve consistency and coherence of schemas and instances, respectively, as they correctly implement their explicit counterparts.

The explicit operators of selection, restricted selection, intersection, union, and difference trivially preserve consistency of schemas, as the input TPOB-schemas coincide with the output TPOB-schemas. Projection and renaming also preserve consistency of schemas, as they only modify type assignments. The following result shows that extraction and natural join, and thus also Cartesian product and conditional join, preserve consistency of schemas.

Theorem 5.18 Let $\mathbf{S}$ be a TPOB-schema, and let $C$ be a set of classes from $\mathbf{S}$. Let $\mathbf{S}_1$ and $\mathbf{S}_2$ be two join-compatible TPOB-schemas.

(a) If $\mathbf{S}$ is consistent, then the TPOB-schema obtained from $\mathbf{S}$ by extraction with respect to $C$ is consistent.

(b) If : ¬ 9 and ¡ ¬  are consistent, then  ¬ 9 = > / ¬  is consistent. We now concentrate on the preservation of coherence. Recall that the coherence of a TPOB-instance ÄÛ5ØA ÂUX  Å%G over a TPOB-schema ¬5TA^_UB`¡9aì…ngby=G depends on  , _ , a , ngb , and  . The explicit algebraic operations of selection, restricted selection, intersection, union, and difference preserve coherence of instances, as they do not modify the input TPOB-schemas and they may only modify the input TPOBinstances by removing objects and changing value assignments to objects. Similarly, projection and renaming preserve coherence of instances, as they may only modify type and value assignments to classes and objects, respectively. The result below shows that natural join, and thus also Cartesian product and conditional join, preserve coherence of instances. Moreover, it shows that extraction preserves coherence of instances, when we do not remove any characteristic classes. Theorem 5.19 Let Ä , Ä9 , and Ä be TPOB-instances over the TPOB-schemas ¬ , ¬:9 , and ¬/ , respectively. Let Ä;5ÚA ÂUXÅ%G and ¬5TA^_UB`¡9aÁ…ngbbyÛG . Let + úÈ_ such that GdURÃ_ØÒd is characteristic for b[ß\àNAoGyAo$ G for some SRa+ and some $_RÓAb+èGXúc+ . Let ¬9 and ¬/ be join-compatible. (a) If Ä is coherent, then , - AÄNG is coherent. (b) If Ä9 and Ä are coherent, then Ä9 = > Ä Â is coherent.

6 Related Work There is quite extensive work in the literature on temporal databases and temporal object-oriented databases; we refer especially to the recent surveys [19, 10] and the books [23, 22].


Probabilistic extensions to relational databases are also well-explored in the literature; see especially [16, 7] for more background and a detailed discussion of recent work on probabilistic relational databases. Recently, more complex data models have been extended by probabilistic uncertainty in a number of papers. In particular, Eiter et al. [7] presented an approach that adds probabilistic uncertainty to complex value relational databases, while Kornatzky and Shimony [11, 12] and Eiter et al. [6] described approaches to probabilistic object-oriented databases. Our approach in this paper is a temporal extension of the model by Eiter et al. [6]. Additionally, the present paper newly introduces an implicit data model and an implicit algebra, which is shown to correctly implement its explicit counterpart, and which can be more efficiently realized. Moreover, the two operations of restricted selection and extraction are newly introduced here.

Even though the areas of temporal and probabilistic databases are both well-explored, there is very little work on the integration of temporal reasoning and probabilistic databases. In particular, Dyreson and Snodgrass in their pioneering work [5] and subsequently Dekhtyar et al. [2] presented approaches to temporal indeterminacy in relational databases based on probabilistic uncertainty:

• Dyreson and Snodgrass [5] extend the SQL data model and query language by probabilistic uncertainty on time points. They add indeterminate temporal attributes (which have indeterminate instants as associated values) to SQL. Indeterminate instants are intervals of time points with associated probability distributions. The SQL query language is extended by a construct to define the ordering plausibility, which is an integer between 1 and 100 that specifies to which degree the result of an SQL query should contain uncertain answers (where 1 means that any possible answer to a query is desired, while 100 says that only definite answers to a query are desired). Moreover, there is a construct to define the correlation credibility, which specifies simple modifications of the probability distributions in the base relations before evaluating the selection condition in SQL queries. Dyreson and Snodgrass also describe efficient data structures and query processing algorithms for their approach. Our work in this paper differs from theirs in several ways. First, we present an extension of object-oriented databases, while their approach is an extension of relational databases. Second, we make no independence assumptions between events (the user's query can explicitly encode her knowledge of the dependencies between events, if any), while Dyreson and Snodgrass assume that all indeterminate events are probabilistically independent from each other. Third, our work introduces an algebra, while their work defines an SQL extension. Fourth, we present formal definitions of important notions like coherence and consistency and show that, under appropriate assumptions, our operations all preserve coherence and consistency. Fifth, we allow for interval probabilities over solution sets of temporal constraints, while their work allows only for precise point probabilities over intervals of time points.

• Dekhtyar et al. [2] extend the relational data model and algebra by temporal indeterminacy based on probabilities. They define a theoretical annotated temporal algebra on large annotated relations, and a temporal probabilistic algebra on succinct temporal probabilistic relations. They show that the latter efficiently and correctly implements the former. They also report on timings of the temporal probabilistic algebra in a prototype implementation. Our work in this paper, especially the idea of having an explicit algebra on large instances, which is efficiently and correctly implemented by an implicit algebra on succinct instances, is inspired by Dekhtyar et al.'s work. Our work, however, is an extension of the much richer object-oriented data model and algebra, as compared to the relational algebra. Our work may be viewed as a generalization of theirs.

To our knowledge, there has been no work to date on temporal probabilistic object-oriented databases. There is other work on nonprobabilistic temporal indeterminacy in databases, which is less related to our work. In particular, Snodgrass [21] models indeterminacy using a model that is based on a three-valued logic. Dutta [4] and Dubois and Prade [3] propose a fuzzy logic approach to temporal indeterminacy, while Koubarakis [14, 13] and Brusoni et al. [1] suggest approaches based on constraints. Gadia et al. [8] introduce partial temporal databases, which are based on partial temporal elements.


7 Conclusions

Dyreson and Snodgrass [5], followed subsequently by Dekhtyar et al. [2], have argued persuasively that there are numerous real-world applications where temporal uncertainty abounds. In this paper, we have used a simple example tracking shipments carried, for instance, by commercial carriers. Many other examples abound: stock market models making predictions of stock prices involve temporal probabilities specifying when a stock will reach a specific price. Archaeological databases containing radio-carbon dating of historical artifacts invariably involve temporal uncertainty as well. Programs tracking the behavior of parts on a factory floor and predicting when they will need to be serviced and/or replaced also involve temporal uncertainty.

The fact that many of these applications also involve object models should come as no surprise. Descriptions of three-dimensional historical artifacts are often stored using object models. Maintaining information about machine parts often includes design information, drawings, and manuals that are often represented with object models as well.

In this paper, we have made a first attempt to deal with temporal uncertainty in object-based systems. We have provided two models. The first is an explicit model where a probability is associated with each time point. As temporal granularity gets finer and finer, this model gets more and more impractical to use. For this explicit model, we provide an algebra (e-algebra) that extends the relational algebra. To avoid the problems associated with the e-algebra, we introduce a succinct implicit algebra (i-algebra). We define operators for the i-algebra. We show that each operator in the i-algebra correctly implements the corresponding operator in the e-algebra without computing the entire explicit representation. Thus, the i-algebra operators "work" on a compact implicit representation of a much larger explicit representation.

There are numerous directions for future research. Building physical cost models and cost-based query optimizers for TPOBs is a major challenge that must be addressed if applications such as the package and stock market examples are to scale up for heavy-duty use. Building mechanisms to update such databases poses yet another challenge. Building view creation and maintenance algorithms provides a third challenge. Developing an implementation of (the implicit version of) TPOBs poses a fourth major challenge, as it will provide a testbed for all the algorithms resulting from the other problems mentioned here.

Acknowledgements. This work was supported by the Army Research Lab under contract number DAAL0197K0135, the Army Research Office under grant number DAAD190010484, by DARPA/RL contract number F306029910552, by the ARL CTA on Advanced Decision Architectures, by a DFG grant, and by a Marie Curie Individual Fellowship of the European Community.

Appendix A. Proofs for Section 5 For the proof of Theorem 5.3, we need the following lemma, which says that the valuation of path expressions under implicit values correctly implements the valuation of path expressions under explicit values. Here, the mapping e is extended to generalized implicit values as follows: Every generalized implicit value fhgjiKkmlYnporqsntoru ntowv3nto÷xntoyCzo{{{popkmlZ| , qa|}oru~|Howv|Ko÷|KoyCz€ is associated with the generalized explicit value eKk1fz.g‚iKk1f á ø for all classes ²‡7Ÿ ƒ . We next show C2. Consider two classes ²xntor²Ö‡4Ÿ ƒ such that ²xnZ¡ ƒ Ÿ implies e ƒ k†²tzÌgÁ Ö ¡ ËËËR¡ }| exists such that n gA² n , K|gA² Ö , and Ö o{{{xo }| n ‡7ŸBö Ÿ ƒ . ² Ö . That is, some path n ¡ î As eHk Rn‰z ŽeKk }ÖNz AËËË QeHk }| z , it thus follows eKk†²xn‰z QeHk†²Öpz . We now prove that C3 holds. Let ²xntor²Ö‡4Ÿ$ƒ be two distinct classes that belong to the same cluster ƒ ‡Ã¢a£ ƒ k†²tz for some ²‡Ÿ ƒ . That is, there exists a cluster އ¢£}k†²z such that, for ° ‡ai}ÀMoÓK€ , either ² È belongs to or ² È is a proper subclass of a class in . As C2 and C3 hold for e , it thus follows that eKk†²xn‰zðseKk†²ÖpzCg>ø . This shows that C3 holds. We finally prove C4. Consider two classes ²pntor²Öñ‡ Ÿ$ƒ such that ²xnÌ¡šƒ²Ö . That is, some path RnÌ¡ }֌¡ ËËË¡ }| exists È:| É î n n ¤Äk È o È n‰z . such that Rnˆgزxn , }|!gزÖ , and KÖxo{{{xo }| n ‡BŸBö Ÿ ƒ . Moreover, it holds ¤ ƒ k†²xntor²ÖpzÐg î As C4 holds for e , it follows that … eHk È z…œg ¤Äk È o È n zËx… eKk È n z… for all ° ‡6i}ÀMo{{{NoßaöQÀN€ . This shows È:| É î n n ¤Äk È o È n‰zËN… eKk†²Öxz… , that is, … etƒmk†²xn‰z… g‚¤ƒ†k†²xntor²ÖpzËN… etƒmk†²Öpz… . This proves C4. that … eHk†²xnz…g ž ž ž ž Ö3gúk–Ÿor o¡Aor¢a£o¤3z . Let enÝ (b) Let n$gúk–Ÿnor Snto¡ nor¢a£no¤n‰z , Ö3gúk–ŸRÖ ,  ÖMo¡šÖNor¢a£pÖNo¤Öpz , and n ž ž Ÿn(ý Óÿ and exÖ6ÝRŸÖ^ýÓÿ be models of n and Ö , respectively. Let the mapping esÝRŸÑýÛÓÿ , where n Ö , be defined as follows: ŸBg Ÿ n Ÿ Ö and ’g

            



$$\varepsilon(c) = \varepsilon_1(c_1) \times \varepsilon_2(c_2), \quad \text{for all } c = (c_1, c_2) \in \mathcal{C}.$$

ž We now show that e is a model of . We first prove C1. Since enxk†²xn‰z gh á ø for all classes ²xn)‡cŸ$n and epÖ k†²Öpzñg á ø for all classes ²Ös‡ÃŸRÖ , we get eHk†²tzsgq á ø for all classes ²s‡ÑŸ . We next show C2 and C4. Let ²Œghk†²xntor²Öxzo Bgjk "nto }Öpz‡ Ÿ with ²Œ¡ . Without loss of generality, we can assume that ²xnÌ¡ n "n and ž Ö . Since e n is a model of n , it holds that e n k†² n z Æe n k n z and … e n k†² n z…Äg ¤ n k†² n o n z?Ë… e n k n z… . ² Ö g eKk z and … eKk†²tz…Rg³¤Äk†²No zÄËR… eKk Hz… . We finally prove C3. Let ²No B‡BŸ Hence, it immediately follows eKk†²tz be two distinct classes that belong to the same cluster ‡ 𢐣}k–ŸSz . Without loss of generality, we can ¢a£Mnpk–Ÿ$n‰z and that ²ÖÌg KÖ . Since e n is a model assume that ²xnto Rn‡*Ÿ$n belong to the same cluster Yn‡ ž of n , it holds that e nxk†²pn‰z3ð4enpk RnzÄgÁø . Thus, eKk†²tzSð4eHk Hz.g>ø . ˜ ž Proof of Theorem 5.19. (a) Let ™Bg k†›œo z and ûCü