Technical Report 29-01, December 2001. Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, ITALY. URL = http://www.dis.uniroma1.it/pub/AI/papers/tr-29-01.pdf
Temporal Probabilistic Object Bases

Veronica Biazzo
Rosalba Giugno
Thomas Lukasiewicz
V.S. Subrahmanian
Abstract
There are numerous applications where we know that a certain event occurred during some time period, but we do not know exactly when that event occurred. Dyreson and Snodgrass have shown how this kind of temporal uncertainty can be handled in relational databases. In this paper, we propose two data models to handle temporal indeterminacy in object bases. The first model, which we call the explicit model, provides an extension of the relational algebra that explicitly considers all possibilities. This makes defining algebraic operations easy, but makes their implementation quite inefficient. The second model, which we call the implicit model, overcomes these deficiencies through the intelligent use of constraints, which makes the model succinct. We also propose an implicit algebra on the implicit representation, and we show that each implicit algebra operation precisely captures its explicit counterpart.
1 Introduction
There are numerous applications involving temporal indeterminacy. For instance, consider a commercial package delivery company (examples of companies in this broad class include UPS, FedEx, DHL, and many others). Such a company has detailed statistical information on how long packages take to get from one zip code to another, and often even more specific information (e.g., how long it takes for a package to get from one street address to another). A company expecting deliveries would like statistical information about when the deliveries will arrive: an answer of the form “There is a 10-20% probability of the package being delivered between 9 am and 1 pm, and an 80-90% probability of it being delivered between 1 pm and 5 pm” is far more helpful to the company’s decision-making processes than the bland answer given today (“It will be delivered sometime today between 9 am and 5 pm”).

Temporal indeterminacy also arises in many other situations. Dyreson and Snodgrass [5] have identified numerous other applications where temporal indeterminacy is important. For example, radiocarbon dating efforts in archaeology are temporally indeterminate: a historical relic may be dated as “sometime between 500 and 400 BC.” Likewise, time-series prediction programs are uncertain about when certain events will occur. There are literally hundreds of stock market prediction programs containing models of when stocks are expected to reach certain prices. When the results of such programs are stored in databases and subjected to querying, the need to handle temporal indeterminacy is even more acute.

In this paper, we propose, for the first time, a formal theoretical foundation for object bases containing temporal indeterminacy. As probability theory is the best-known method for handling uncertain information, our model for indeterminacy (like that of Dyreson and Snodgrass [5]) is probabilistic. The organization and contributions of this paper are as follows.
Dipartimento di Matematica e Informatica, Università di Catania, Viale A. Doria 6, 95125 Catania, Italy. E-mail: {vbiazzo, giugno}@dmi.unict.it.
Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Via Salaria 113, 00198 Roma, Italy. E-mail: [email protected].
Institute for Advanced Computer Studies, Institute for Systems Research and Department of Computer Science, University of Maryland, College Park, Maryland 20742. E-mail: [email protected].
Figure 1: Package Example with probability assignment (classes: Package, Letter, Box, Tube, Priority, One_transfer, Express_saves, Two_transfer)
In Section 2, we introduce some basic definitions in probability theory and temporal databases. An important definition introduced here is that of explicit values and implicit values; the latter are succinct representations of the former. In Section 3, we define the concept of a temporal probabilistic object base (TPOB for short). We define the important concepts of an explicit TPOB-instance and an implicit TPOB-instance, the latter being a succinct representation of the former. In Section 4, we describe an explicit TPOB-algebra (e-algebra for short) that operates on explicit TPOB-instances. The advantage of the e-algebra is that its operations are relatively intuitive to define; however, it does not have an efficient implementation. In Section 5, we define an implicit TPOB-algebra (i-algebra for short) that operates on implicit TPOB-instances. We show that the i-algebra correctly implements the e-algebra (in other words, the answers produced by the i-algebra operations succinctly represent the answers produced by the corresponding e-algebra operations). As the i-algebra works on succinct representations, it has better computational properties. In Section 6, we compare our work with related work on temporal indeterminacy.
Section 7 contains directions for future work and concludes the paper.
The key contributions of this paper are: (1) the definition of implicit TPOB-instances, (2) the definition of the implicit algebra operations, and (3) the results stating that these implicit algebra operations are correct implementations of the explicit algebra operations. The importance of (3) cannot be overemphasized. For example, a simple statement such as “Package P will be delivered sometime between 9 am and 5 pm today” expands into 8 explicit statements if our chronon (i.e., the smallest temporal granularity used) is an hour, 480 explicit statements if our chronon is a minute, and 28,800 explicit statements if our chronon is a second. Thus, a single statement in the implicit algebra can capture huge amounts of explicit data in a succinct fashion, and the ability to correctly manipulate this implicit data is critical to the efficiency of TPOB systems.

Our work builds directly on the pioneering work of Dyreson and Snodgrass [5], but differs from it in several ways. First, it applies to object bases, while theirs applies to relational databases. Second, we make no independence assumptions between events; the user’s query can explicitly encode her knowledge of the dependencies between events, if any. Third, our work introduces an algebra, while theirs defines an SQL extension. Fourth, we present formal definitions of important notions like coherence and consistency, and show that under appropriate assumptions, all our operations preserve coherence and consistency.
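The chronon arithmetic above can be checked with a few lines of Python. This sketch assumes that the window from 9 am to 5 pm is treated as a half-open interval, so that an 8-hour window contains exactly 8 hourly chronons; the helper name `explicit_count` is illustrative only.

```python
from datetime import timedelta

# Length of the window "between 9 am and 5 pm".
WINDOW = timedelta(hours=8)

def explicit_count(window: timedelta, chronon: timedelta) -> int:
    """Number of explicit time points (chronons) in a half-open window."""
    return window // chronon  # timedelta // timedelta yields an int

counts = {
    "hour": explicit_count(WINDOW, timedelta(hours=1)),
    "minute": explicit_count(WINDOW, timedelta(minutes=1)),
    "second": explicit_count(WINDOW, timedelta(seconds=1)),
}
print(counts)  # {'hour': 8, 'minute': 480, 'second': 28800}
```

Treating the window as closed instead would add one chronon at each granularity; the implicit representation stays a single statement either way.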
2 Basic Definitions
In this section, we recapitulate some basic definitions. In particular, we recall the notion of a calendar. We then define classical types and their values, and introduce probabilistic types and their explicit and implicit values. Finally, we describe the concept of a probabilistic strategy.
2.1 Calendars
We now recapitulate the concept of a calendar due to Kraus et al. [15]. A calendar consists of a linear temporal hierarchy of time units and a validity predicate specifying valid time points.

Definition 2.1 (time unit) A time unit consists of a name and a time-value set. For example, the time unit named […] has the time-value set […].

Definition 2.2 (linear temporal hierarchy) A linear temporal hierarchy is a finite set of distinct time units with a linear order among them. For example, […] is a linear temporal hierarchy.

Definition 2.3 (time point) Let $H = u_1 \sqsupseteq \cdots \sqsupseteq u_n$ be a linear temporal hierarchy. A time point over $H$ is a tuple $(t_1, \ldots, t_n)$, where each $t_i$ is a time-value in the time-value set of $u_i$. We use $<_H$ to denote the usual lexicographic order on all time points over $H$, which is defined by $(t_1, \ldots, t_n) <_H (t'_1, \ldots, t'_n)$ iff some $i \in \{1, \ldots, n\}$ exists such that $t_j = t'_j$ for all $j \in \{1, \ldots, i-1\}$ and $t_i < t'_i$. We use $\leq_H$ to denote the reflexive closure of $<_H$. For example, […] is a time point over $H = […]$.

Definition 2.4 (calendar) A calendar consists of a linear temporal hierarchy $H$ and a validity predicate. The validity predicate specifies a non-empty set of valid time points over $H$. A calendar is finite if the set of all its valid time points is finite. In the rest of this paper, all calendars are assumed to be finite unless specified otherwise.

Intuitively, […] and […] are time points over $H = […]$. The validity predicate may now characterize the former as valid and the latter as invalid. The reader interested in how to specify validity predicates may consult [15].
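As a rough illustration of Definitions 2.1 to 2.4, the following Python sketch models a finite calendar. The hierarchy year, month, day, the year range 2000 to 2001, and the validity predicate (a day must exist in its month) are hypothetical choices made for this sketch, not taken from [15]. Python compares tuples lexicographically, which is exactly the order $<_H$ of Definition 2.3.

```python
import calendar
from itertools import product

# Hypothetical linear temporal hierarchy year ⊒ month ⊒ day (coarsest first);
# each time unit is a name together with a finite time-value set.
HIERARCHY = [
    ("year", range(2000, 2002)),
    ("month", range(1, 13)),
    ("day", range(1, 32)),
]

def time_points(hierarchy):
    """All tuples (t_1, ..., t_n) with t_i taken from the i-th time-value set."""
    return product(*(values for _, values in hierarchy))

def is_valid(point):
    """Hypothetical validity predicate: the day must exist in that month."""
    year, month, day = point
    return day <= calendar.monthrange(year, month)[1]

valid = [p for p in time_points(HIERARCHY) if is_valid(p)]

# Tuple comparison in Python is lexicographic, i.e., it realizes <_H.
assert (2000, 12, 31) < (2001, 1, 1)
print(len(valid))  # 731 (366 days in 2000 plus 365 in 2001)
print(is_valid((2001, 2, 28)), is_valid((2001, 2, 30)))  # True False
```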
2.2 Types and Values
This section introduces types, and values associated with types. It is divided into three parts: classical types and values, probabilistic types, and explicit and implicit values of probabilistic types.

2.2.1 Classical Types and Values
Every classical type $\tau$ is associated with a domain, denoted $\mathit{dom}(\tau)$, which specifies the set of values of $\tau$. In this paper, we assume that […]

[…] An explicit value of a probabilistic type $[A_1\colon \tau_1, \ldots, A_k\colon \tau_k]$ is of the form $[A_1\colon v_1, \ldots, A_k\colon v_k]$, where $v_1, \ldots, v_k$ are either values or explicit values of $\tau_1, \ldots, \tau_k$, respectively.

Example 2.11 An example of an explicit value of the atomic probabilistic type […], where […] is the calendar over the linear temporal hierarchy […], is […]. An explicit value of the atomic probabilistic type […] is […].

Let $A$ be an attribute and $v = \{(v_1, [l_1, u_1]), \ldots, (v_n, [l_n, u_n])\}$ be an explicit value of an atomic probabilistic type. Then, “$A\colon v$” intuitively says that “the probability that $A$ has the value $v_i$ lies in the interval $[l_i, u_i]$”. We assume that the events “$A = v_i$” are exhaustive and pairwise mutually exclusive, which implies the following notion of consistency for explicit values of probabilistic types.

Definition 2.12 (compactness and consistency) An explicit value $v = \{(v_1, [l_1, u_1]), \ldots, (v_n, [l_n, u_n])\}$ of an atomic probabilistic type is compact iff $v_1, \ldots, v_n$ are pairwise distinct. We say $v$ is consistent iff $v$ is compact and $\sum_{i=1}^{n} l_i \leq 1 \leq \sum_{i=1}^{n} u_i$. An explicit value of a probabilistic type is compact (resp., consistent) iff all contained explicit values of atomic probabilistic types are compact (resp., consistent).

2.2.4 Implicit Values of Probabilistic Types
We now introduce implicit values of probabilistic types. These are implicit representations of explicit values; we generalize a constraint-based approach due to Dekhtyar et al. [2].
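The two conditions of Definition 2.12 are easy to check mechanically. In the Python sketch below, an explicit value of an atomic probabilistic type is assumed to be a list of `(value, (l, u))` pairs, and exact fractions avoid rounding artifacts in the sums. The sample value mirrors the delivery example from the introduction (10-20% before 1 pm, 80-90% after); the slot labels are hypothetical.

```python
from fractions import Fraction as F

def is_compact(value):
    """Definition 2.12: the values v_1, ..., v_n are pairwise distinct."""
    vs = [v for v, _ in value]
    return len(vs) == len(set(vs))

def is_consistent(value):
    """Compact, and the sum of lower bounds <= 1 <= the sum of upper bounds."""
    if not is_compact(value):
        return False
    low = sum(l for _, (l, _) in value)
    high = sum(u for _, (_, u) in value)
    return low <= 1 <= high

# Hypothetical slot labels with the probability intervals from Section 1.
delivery = [("9am-1pm", (F(1, 10), F(2, 10))), ("1pm-5pm", (F(8, 10), F(9, 10)))]
print(is_consistent(delivery))  # True, since 0.9 <= 1 <= 1.1
```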
Temporal and Data Constraints. Using a temporal constraint, we can implicitly define a set of valid time points (namely, the solutions of that constraint) w.r.t. a given calendar. In contrast, a data constraint specifies a set of data values from a totally ordered domain. We now define the syntax of temporal and data constraints.

Definition 2.13 (temporal constraint) Let $\tau$ be a calendar with linear temporal hierarchy $H = u_1 \sqsupseteq \cdots \sqsupseteq u_n$. An atomic temporal constraint for $\tau$ has one of the following forms:
$(u_i \,\theta\, t_i)$, where $\theta \in \{\leq, <, =, \neq, >, \geq\}$ and $t_i$ is a time-value in the time-value set of time unit $u_i$. We call $(u_i \,\theta\, t_i)$ an atomic time-value constraint.
$(t_1 \sim t_2)$, where $t_1, t_2 \in \mathit{dom}(\tau)$ and $t_1 \leq_H t_2$. We call $(t_1 \sim t_2)$ an atomic time-interval constraint. We use $(t_1)$ to abbreviate $(t_1 \sim t_1)$.
A temporal constraint for $\tau$ is a Boolean combination of atomic temporal constraints for $\tau$ (that is, it is constructed from atomic temporal constraints by using the Boolean operators $\neg$, $\wedge$, and $\vee$).

Definition 2.14 (data constraint) Let $\tau$ be a classical type with totally ordered domain $\mathit{dom}(\tau)$. An atomic data constraint for $\tau$ is either of the form $(X \,\theta\, v)$, where $\theta \in \{\leq, <, =, \neq, >, \geq\}$ and $v \in \mathit{dom}(\tau)$, or of the form $(v_1 \sim v_2)$, where $v_1, v_2 \in \mathit{dom}(\tau)$ with $v_1 \leq v_2$. We use $(v_1)$ to abbreviate $(v_1 \sim v_1)$. A data constraint for $\tau$ is a Boolean combination of atomic data constraints for $\tau$.

We now define the semantics of temporal and data constraints, that is, the set of time points and data values, respectively, that they specify.

Definition 2.15 (solution to a temporal constraint) Let $\tau$ be a calendar with linear temporal hierarchy $H = u_1 \sqsupseteq \cdots \sqsupseteq u_n$. A time point $s = (s_1, \ldots, s_n) \in \mathit{dom}(\tau)$ is a solution to an atomic temporal constraint $(u_i \,\theta\, t_i)$ (resp., $(t_1 \sim t_2)$), denoted $s \models (u_i \,\theta\, t_i)$ (resp., $s \models (t_1 \sim t_2)$), iff $s_i \,\theta\, t_i$ (resp., $t_1 \leq_H s \leq_H t_2$).
We inductively extend the notion of solutions to all temporal constraints by:
$s \models \neg C_1$ iff it is not the case that $s \models C_1$;
$s \models (C_1 \wedge C_2)$ iff $s \models C_1$ and $s \models C_2$;
$s \models (C_1 \vee C_2)$ iff $s \models C_1$ or $s \models C_2$.

Definition 2.16 (solution to a data constraint) Let $\tau$ be any classical type with totally ordered domain $\mathit{dom}(\tau)$. A value $s \in \mathit{dom}(\tau)$ is a solution to $(X \,\theta\, v)$ (resp., $(v_1 \sim v_2)$), denoted $s \models (X \,\theta\, v)$ (resp., $s \models (v_1 \sim v_2)$), iff $s \,\theta\, v$ (resp., $v_1 \leq s \leq v_2$). […]

[…] The most widely used pdf is the uniform distribution: the uniform distribution over $\Omega$, denoted $u_\Omega$, is the function $u_\Omega\colon \Omega \to [0, 1]$ defined by $u_\Omega(x) = 1/n$ for all $x \in \Omega$. The following are some other standard probability distribution functions. Here, we additionally assume that $\Omega$ is totally ordered by $x_i < x_j$ iff $i < j$, for all $i, j \in \{1, \ldots, n\}$:
The geometric distribution over $\Omega$ for $0 < p < 1$, denoted $\mathit{geo}_{\Omega,p}$, is the function $\mathit{geo}_{\Omega,p}\colon \Omega \to [0, 1]$ defined by $\mathit{geo}_{\Omega,p}(x_i) = p \cdot (1-p)^{i}$ for all $x_i \in \Omega$.
The binomial distribution over $\Omega$ for $0 < p < 1$, denoted $\mathit{bin}_{\Omega,p}$, is the function $\mathit{bin}_{\Omega,p}\colon \Omega \to [0, 1]$ defined by $\mathit{bin}_{\Omega,p}(x_i) = \binom{n}{i} \cdot p^{i} \cdot (1-p)^{n-i}$ for all $x_i \in \Omega$.
The Poisson distribution over $\Omega$ for $\lambda > 0$, denoted $\mathit{poi}_{\Omega,\lambda}$, is the function $\mathit{poi}_{\Omega,\lambda}\colon \Omega \to [0, 1]$ defined by $\mathit{poi}_{\Omega,\lambda}(x_i) = e^{-\lambda} \cdot \lambda^{i} / i!$ for all $x_i \in \Omega$.
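For concreteness, three of the distribution functions above can be written directly in Python. This sketch indexes elements by their position $i$ in the total order rather than by the elements $x_i$ themselves, uses only the standard library, and the function names are illustrative.

```python
import math

def uniform(omega):
    """u_Ω: each element of the finite set Ω gets probability 1/n, n = |Ω|."""
    n = len(omega)
    return {x: 1 / n for x in omega}

def binomial(i, n, p):
    """Binomial weight of the i-th element: C(n, i) * p^i * (1 - p)^(n - i)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def poisson(i, lam):
    """Poisson weight of the i-th element: e^(-λ) * λ^i / i!."""
    return math.exp(-lam) * lam**i / math.factorial(i)

print(uniform(["a", "b", "c", "d"]))  # {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}
```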
Implicit Values of Probabilistic Types. In order to define implicit values of probabilistic types, we start by defining implicit tuples, a concept borrowed from [2].

Definition 2.19 (implicit tuple) Let $\tau$ be either a calendar or a classical type with a totally ordered domain. An implicit tuple (or i-tuple) for $\tau$ is a 5-tuple $(C, D, l, u, \delta)$, where $C$ and $D$ are constraints for $\tau$ with $\emptyset \neq \mathit{sol}(C) \subseteq \mathit{sol}(D)$, $l$ and $u$ are reals with $0 \leq l \leq u \leq 1$, and $\delta$ is a distribution function over $\mathit{sol}(D)$. If $\mathit{sol}(C) = \mathit{sol}(D)$, we use $(\ast)$ to abbreviate $C$.

We next give a formal definition of implicit values of probabilistic types.

Definition 2.20 (implicit values of probabilistic types) We define implicit values of probabilistic types by induction as follows:
An implicit value of an atomic probabilistic type $\tau$ is a finite set of implicit tuples for $\tau$.
An implicit value of a probabilistic type $[A_1\colon \tau_1, \ldots, A_k\colon \tau_k]$ is of the form $[A_1\colon v_1, \ldots, A_k\colon v_k]$, where $v_1, \ldots, v_k$ are either values or implicit values of $\tau_1, \ldots, \tau_k$, respectively.

Example 2.21 An implicit value of the atomic probabilistic type […], where […] is the calendar over the linear temporal hierarchy […], is $I = \{[\ldots]\}$. It consists of a single i-tuple whose first constraint is abbreviated as $(\ast)$, whose second constraint is an atomic time-interval constraint, and whose distribution function is the uniform distribution.

Every implicit value $I$ of an atomic probabilistic type has an equivalent explicit value $\varepsilon(I)$, which is defined by
$\varepsilon(I) = \{(x, [l \cdot \delta(x), u \cdot \delta(x)]) \mid (C, D, l, u, \delta) \in I,\ x \in \mathit{sol}(C)\}$.

Example 2.22 Let us reconsider the implicit value $I$ of Example 2.21. Its explicit value $\varepsilon(I)$ is given by […].

It is now easy to extend the notions of compactness and consistency to implicit values.

Definition 2.23 (compactness and consistency) An implicit value $I = \{(C_1, D_1, l_1, u_1, \delta_1), \ldots, (C_n, D_n, l_n, u_n, \delta_n)\}$ of an atomic probabilistic type is compact iff $\mathit{sol}(C_i \wedge C_j) = \emptyset$ for all $i, j \in \{1, \ldots, n\}$ with $i \neq j$. […]
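The equivalent explicit value of an implicit value is what makes the representation succinct: one i-tuple stands for one explicit pair per solution of $C$. The Python sketch below assumes a toy calendar whose time points are the hours 0 to 23 of a single day, models constraints as Python predicates (rather than the constraint syntax of Definitions 2.13 and 2.14), and uses exact fractions; all names are illustrative.

```python
from fractions import Fraction

# Hypothetical calendar: the hours 0..23 of one day (hour chronon).
DOMAIN = range(24)

def sol(constraint):
    """Solutions of a constraint; constraints are modeled as predicates."""
    return [t for t in DOMAIN if constraint(t)]

def uniform_over(points):
    """Uniform distribution over a finite list of time points."""
    n = len(points)
    return lambda t: Fraction(1, n)

def expand(tuples):
    """Equivalent explicit value of a set of i-tuples (C, D, l, u, δ):
    every solution t of C yields the pair (t, [l·δ(t), u·δ(t)])."""
    explicit = []
    for C, D, l, u, delta in tuples:
        assert set(sol(C)) <= set(sol(D)), "an i-tuple needs sol(C) ⊆ sol(D)"
        for t in sol(C):
            explicit.append((t, (l * delta(t), u * delta(t))))
    return explicit

def window(t):
    """Delivery window 9:00 to 17:00, as a half-open range of hours."""
    return 9 <= t < 17

# "Delivered in the window with probability in [0.8, 1]", spread uniformly.
implicit_value = [(window, window, Fraction(4, 5), Fraction(1),
                   uniform_over(sol(window)))]
result = expand(implicit_value)
print(len(result), result[0])  # 8 (9, (Fraction(1, 10), Fraction(1, 8)))
```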
Table 2: Disjunction strategies (mutual exclusion, positive correlation, independence, ignorance)

[…] $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$ of Example 3.6 and the TPOB-instance $I = (\pi, \nu)$ over $S$ of Example 3.8. The assignments $\tau'$ and $\nu'$ obtained by projecting $S$ and $I$ on […] are shown in Tables 10 and 11, respectively.
Table 10: $\tau'$ resulting from projection
4.5 Extraction
To our knowledge, the extraction operation has never been defined in probabilistic databases. This operation is unique to object bases because it allows classes to be selected (and hence other classes to be dropped) from the class hierarchy of a TPOB-schema.
Table 11: $\nu'$ resulting from projection
That is, the extraction operation removes classes from the class hierarchy, together with all objects in the dropped classes. We first define the extraction operation on TPOB-schemas.

Definition 4.24 (extraction on TPOB-schemas) Let $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$ be a TPOB-schema, and let $X$ be a subset of $C$. The extraction on $S$ with respect to $X$, denoted $\mathrm{ext}_X(S)$, is defined as the TPOB-schema $S' = (C', \tau', \Rightarrow', \mathit{me}', \wp')$, where:
$C' = X$.
$\tau'$ is the restriction of $\tau$ to $C'$.
$\Rightarrow'$ is a binary relation on $C'$ such that for each $c_1, c_2 \in C'$, it holds $c_1 \Rightarrow' c_2$ iff there exists some path $d_1 \Rightarrow d_2 \Rightarrow \cdots \Rightarrow d_k$ such that $d_1 = c_1$, $d_k = c_2$, and $d_2, \ldots, d_{k-1} \in C \setminus C'$.
$\mathit{me}'(c) = \{(P \cap C') \cup P_{\Rightarrow} \mid P \in \mathit{me}(c)\} \setminus \{\emptyset\}$, where $P_{\Rightarrow} = \{d_1 \in C' \mid d_1 \Rightarrow e_2 \Rightarrow \cdots \Rightarrow e_k$ for some $e_2, \ldots, e_{k-1} \in C \setminus C'$ and $e_k \in P \setminus C'\}$, for all $c \in C'$.
For all $c_1, c_2 \in C'$ with $c_1 \Rightarrow' c_2$, we define $\wp'(c_1, c_2) = \prod_{i=1}^{k-1} \wp(d_i, d_{i+1})$, where the $d_i$'s are such that $d_1 \Rightarrow d_2 \Rightarrow \cdots \Rightarrow d_k$, $d_1 = c_1$, $d_k = c_2$, and $d_2, \ldots, d_{k-1} \in C \setminus C'$.

The following example illustrates the use of the extraction operator on TPOB-schemas.

Example 4.25 Consider the fully inherited TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$ of Example 3.6. The extraction on $S$ with respect to the set of classes $X = \{\mathit{Package}, \mathit{Letter}, \mathit{Priority}, \mathit{One\_transfer}\}$ is given by the TPOB-schema $\mathrm{ext}_X(S) = (C', \tau', \Rightarrow', \mathit{me}', \wp')$, where:
$C' = \{\mathit{Package}, \mathit{Letter}, \mathit{Priority}, \mathit{One\_transfer}\}$.
$\tau'$ is given in Table 12.
$(C', \Rightarrow')$ and $\wp'$ are given in Figure 2.
$\mathit{me}'(\mathit{Package}) = […]$, $\mathit{me}'(\mathit{Priority}) = \{\{\mathit{One\_transfer}\}\}$, and $\mathit{me}'(\mathit{Letter}) = \mathit{me}'(\mathit{One\_transfer}) = \emptyset$.
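The path-product construction of $\Rightarrow'$ and $\wp'$ in Definition 4.24 can be sketched as a depth-first walk that passes through dropped classes. This Python sketch assumes that edges are stored as a dictionary from `(subclass, superclass)` pairs to probabilities, uses hypothetical class names A to E and hypothetical probabilities, assumes the class graph is acyclic and that the relevant path between two retained classes is unique, and does not compute $\mathit{me}'$.

```python
from fractions import Fraction as F

def extract_schema(edges, keep):
    """Sketch of ⇒' and ℘' from Definition 4.24.

    edges maps (subclass, superclass) pairs of ⇒ to their ℘ value; keep is C'.
    Returns {(c1, c2): prob} for every c1 ⇒' c2, where prob multiplies ℘
    along the path c1 ⇒ ... ⇒ c2 whose inner classes all lie in C − C'.
    """
    supers = {}
    for (sub, sup), p in edges.items():
        supers.setdefault(sub, []).append((sup, p))

    result = {}

    def walk(start, node, prob):
        for sup, p in supers.get(node, []):
            if sup in keep:
                result[(start, sup)] = prob * p  # reached a retained class
            else:
                walk(start, sup, prob * p)       # pass through a dropped class

    for c in keep:
        walk(c, c, 1)
    return result

# Hypothetical hierarchy: D is dropped, so E ⇒ D ⇒ C collapses to E ⇒' C.
edges = {("B", "A"): F(1, 2), ("C", "A"): F(3, 10),
         ("D", "C"): F(3, 5), ("E", "D"): F(1, 2)}
print(extract_schema(edges, {"A", "B", "C", "E"}))
```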
Table 12: Type assignment $\tau'$ resulting from extraction
Our next step is to define the extraction operation on TPOB-instances.

Definition 4.26 (extraction on TPOB-instances) Let $I = (\pi, \nu)$ be a TPOB-instance over a TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$, and let $X$ be a set of classes from $C$. The extraction on $I$ with respect to $X$, denoted $\mathrm{ext}_X(I)$, is the TPOB-instance $(\pi', \nu')$ over the TPOB-schema $\mathrm{ext}_X(S) = (C', \tau', \Rightarrow', \mathit{me}', \wp')$, where:
$\pi'$ is the restriction of $\pi$ to $C'$.
$\nu'$ is the restriction of $\nu$ to $\pi'(C')$.
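Definition 4.26 amounts to two restrictions. In this Python sketch, $\pi$ is assumed to be a dictionary from class names to sets of oids and $\nu$ a dictionary from oids to values; the class names, oids, and values are hypothetical.

```python
def extract_instance(pi, nu, keep):
    """Restrict π to the retained classes and ν to the surviving oids."""
    pi2 = {c: set(oids) for c, oids in pi.items() if c in keep}
    live = set().union(*pi2.values())  # π'(C'): all oids of retained classes
    nu2 = {o: v for o, v in nu.items() if o in live}
    return pi2, nu2

pi = {"A": {"o1"}, "B": {"o2"}, "D": {"o3"}}
nu = {"o1": "v1", "o2": "v2", "o3": "v3"}
print(extract_instance(pi, nu, {"A", "B"}))
# ({'A': {'o1'}, 'B': {'o2'}}, {'o1': 'v1', 'o2': 'v2'})
```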
The following example illustrates the application of the extraction operator on TPOB-instances.
Figure 2: Class hierarchy and probability assignment resulting from extraction (classes: Package, Priority, Letter, One_transfer)
Example 4.27 Consider the fully inherited TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$ of Example 3.6 and the TPOB-instance $I = (\pi, \nu)$ over $S$ of Example 3.8. The extraction on $I$ with respect to the set of classes $X = \{\mathit{Package}, \mathit{Letter}, \mathit{Priority}, \mathit{One\_transfer}\}$ results in a TPOB-instance $(\pi', \nu')$ over the TPOB-schema $\mathrm{ext}_X(S)$ of Example 4.25, where $\pi'$ is given by Table 13 and $\nu'$ is given by $\nu'(o) = \nu(o)$ for all $o \in \pi'(C')$.

Table 13: $\pi'$ resulting from extraction
4.6 Natural Join
The operations we have presented thus far access only one TPOB-instance at a time. We now define the important concept of a natural join. Recall that for each class $c \in C$ of a TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$, the type $\tau(c)$ is a probabilistic tuple type over $S$. Moreover, each oid $o \in \pi(c)$ that occurs in a TPOB-instance $I = (\pi, \nu)$ over $S$ is associated with a value of the probabilistic tuple type $\tau(c)$. Each such $o$ may be written as the list of the values (possibly complex values) for the top-level attributes $A_1, \ldots$ […] $c_1 \in C_1$ and $c_2 \in C_2$, the common top-level attributes of $\tau_1(c_1)$ and $\tau_2(c_2)$ are associated with the same types in $\tau_1(c_1)$ and $\tau_2(c_2)$. We may now define the natural join of two natural-join-compatible TPOB-schemas.

Definition 4.29 (natural join of TPOB-schemas) Let $S_1 = (C_1, \tau_1, \Rightarrow_1, \mathit{me}_1, \wp_1)$ and $S_2 = (C_2, \tau_2, \Rightarrow_2, \mathit{me}_2, \wp_2)$ be two natural-join-compatible TPOB-schemas. The natural join of $S_1$ and $S_2$, denoted $S_1 \bowtie S_2$, is the TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$, where:
$C = C_1 \times C_2$.
For all $c = (c_1, c_2) \in C$, the probabilistic tuple type $\tau(c) = [A_1\colon \tau'_1, \ldots, A_l\colon \tau'_l]$ contains exactly all $A_i\colon \tau'_i$ that belong to either the type $\tau_1(c_1)$ or the type $\tau_2(c_2)$.
The directed acyclic graph $(C, \Rightarrow)$ is defined as follows. For all $c = (c_1, c_2), d = (d_1, d_2) \in C$: $c \Rightarrow d$ iff ($c_1 \Rightarrow_1 d_1$ and $c_2 = d_2$) or ($c_1 = d_1$ and $c_2 \Rightarrow_2 d_2$).
The partitioning $\mathit{me}$ is given as follows. For all $c = (c_1, c_2) \in C$: $\mathit{me}(c) = \{P_1 \times \{c_2\} \mid P_1 \in \mathit{me}_1(c_1)\} \cup \{\{c_1\} \times P_2 \mid P_2 \in \mathit{me}_2(c_2)\}$.
Figure 3: Natural join of schemas (classes shown include (Package, Package), (Priority, Package), (Tube, Package), (One_transfer, Package), (Tube, Letter), (Priority, Letter), (One_transfer, Letter))
The probability assignment $\wp$ is defined as follows. For all $c = (c_1, c_2) \Rightarrow d = (d_1, d_2)$: $\wp(c, d) = \wp_1(c_1, d_1)$ if $c_2 = d_2$, and $\wp(c, d) = \wp_2(c_2, d_2)$ if $c_1 = d_1$.

The following example illustrates the natural join of TPOB-schemas via the Package Example.

Example 4.30 Let $S_1$ and $S_2$ be the TPOB-schemas from Examples 3.6 and 4.25, respectively. Then, $S_1 \bowtie S_2$ is the TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$ partially shown in Table 14 and Figure 3.

Table 14: Type assignment $\tau$ resulting from natural join
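The class-level part of Definition 4.29 (classes, the subclass relation, and the probability assignment) can be sketched in a few lines of Python. The encoding of $\Rightarrow_i$ and $\wp_i$ as a dictionary from `(subclass, superclass)` pairs to probabilities, and the class names and probabilities in the usage example, are assumptions of this sketch; $\tau$ and $\mathit{me}$ are omitted.

```python
from fractions import Fraction as F
from itertools import product

def join_schemas(classes1, edges1, classes2, edges2):
    """Sketch of C, ⇒ and ℘ from Definition 4.29 (τ and me are omitted).

    edges_i maps (subclass, superclass) pairs of ⇒_i to their ℘_i value.
    (c1, c2) ⇒ (d1, d2) iff (c1 ⇒_1 d1 and c2 = d2) or (c1 = d1 and c2 ⇒_2 d2);
    ℘ is inherited from the component that moves.
    """
    classes = set(product(classes1, classes2))
    edges = {}
    for (c1, d1), p in edges1.items():
        for c2 in classes2:
            edges[((c1, c2), (d1, c2))] = p
    for (c2, d2), p in edges2.items():
        for c1 in classes1:
            edges[((c1, c2), (c1, d2))] = p
    return classes, edges

# Hypothetical two-class schemas.
C, E = join_schemas({"A", "B"}, {("B", "A"): F(3, 10)},
                    {"X", "Y"}, {("Y", "X"): F(1, 2)})
print(len(C), len(E))  # 4 4
```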
To define the natural join of TPOB-instances, we first need some preliminary definitions. These include the concept of intersection of two explicit values $v_1$ and $v_2$ of the same type $\tau$, and the natural join of two explicit values $v_1$ and $v_2$ of two probabilistic tuple types $\tau_1$ and $\tau_2$, respectively.

Definition 4.31 (intersection of explicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$, or two explicit values of the same probabilistic type $\tau$. Let $\otimes$ be a conjunction strategy. The intersection of $v_1$ and $v_2$ under $\otimes$, denoted $v_1 \cap_\otimes v_2$, is inductively defined by:
If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 \cap_\otimes v_2 = v_1$.
If $\tau$ is an atomic probabilistic type and $w \neq \emptyset$, then $v_1 \cap_\otimes v_2 = w$, where $w = \{(v, J_1 \otimes J_2) \mid (v, J_1) \in v_1,\ (v, J_2) \in v_2\}$ and $J_1, J_2$ range over probability intervals.
If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathcal{A}$ and all $v_1.A \cap_\otimes v_2.A$ are defined, then $(v_1 \cap_\otimes v_2).A = v_1.A \cap_\otimes v_2.A$ for all $A \in \mathcal{A}$.
Otherwise, $v_1 \cap_\otimes v_2$ is undefined.

The following example illustrates the above concept.

Example 4.32 Let […] be the standard calendar with respect to the linear temporal hierarchy […], and let $\otimes$ be a conjunction strategy. Consider the values $v_1 = v_2 = […]$ and $v_3 = […]$ of the temporal atomic type […]. Then, $v_1 \cap_\otimes v_2 = v_1$, while $v_1 \cap_\otimes v_3$ is undefined. Consider the following explicit values of the atomic probabilistic type […]: $v_1 = […]$, $v_2 = […]$, and $v_3 = […]$. Then, $v_1 \cap_\otimes v_2$ is undefined, while $v_1 \cap_\otimes v_3 = […]$.
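The recursion of Definition 4.31 maps naturally onto a small Python function. In this sketch, atomic explicit values are dictionaries `{value: (l, u)}`, tuple values are dictionaries `{attribute: value}`, two marker classes tell them apart, and `None` stands for "undefined". As an assumption, the conjunction strategy used is independence, $[l_1 l_2, u_1 u_2]$; the attribute names and arrival hours are hypothetical.

```python
from fractions import Fraction as F

class Atomic(dict):
    """Explicit value of an atomic probabilistic type: {value: (l, u)}."""

class Tup(dict):
    """Explicit value of a probabilistic tuple type: {attribute: value}."""

def con_independence(a, b):
    """Assumed conjunction strategy (independence): [l1*l2, u1*u2]."""
    return (a[0] * b[0], a[1] * b[1])

def intersect(v1, v2, conj=con_independence):
    """v1 ∩⊗ v2 as in Definition 4.31; None stands for "undefined"."""
    if isinstance(v1, Tup) and isinstance(v2, Tup):
        parts = {a: intersect(v1[a], v2[a], conj) for a in v1}
        return None if any(p is None for p in parts.values()) else Tup(parts)
    if isinstance(v1, Atomic) and isinstance(v2, Atomic):
        w = Atomic({v: conj(v1[v], v2[v]) for v in v1.keys() & v2.keys()})
        return w if w else None  # an empty w makes the result undefined
    return v1 if v1 == v2 else None  # classical values must coincide

# Hypothetical arrival hours with probability intervals.
arr1 = Atomic({9: (F(1, 2), F(1, 2)), 10: (F(1, 2), F(1, 2))})
arr2 = Atomic({10: (F(4, 5), F(1)), 11: (F(1, 5), F(1, 5))})
print(intersect(Tup(dest="Rome", arrival=arr1), Tup(dest="Rome", arrival=arr2)))
# {'dest': 'Rome', 'arrival': {10: (Fraction(2, 5), Fraction(1, 2))}}
```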
We now come to our second preliminary definition, that of a natural join of two explicit values.

Definition 4.33 (natural join of explicit values) Let $v_1$ and $v_2$ be explicit values of probabilistic tuple types $\tau_1$ and $\tau_2$, respectively. Let $\otimes$ be a conjunction strategy. Let $\mathcal{A}_1$ and $\mathcal{A}_2$ be the top-level attributes of $\tau_1$ and $\tau_2$, respectively, and let $\mathcal{A} = \mathcal{A}_1 \cap \mathcal{A}_2$. Let all $A \in \mathcal{A}$ have the same types in $\tau_1$ and $\tau_2$. The natural join of $v_1$ and $v_2$ under $\otimes$, denoted $v_1 \bowtie_\otimes v_2$, is defined as follows. If all $v_1.A \cap_\otimes v_2.A$ with $A \in \mathcal{A}$ are defined, then $v_1 \bowtie_\otimes v_2$ is defined by:
$(v_1 \bowtie_\otimes v_2).A = v_i.A$ for all $A \in \mathcal{A}_i \setminus \mathcal{A}$, $i \in \{1, 2\}$;
$(v_1 \bowtie_\otimes v_2).A = v_1.A \cap_\otimes v_2.A$ for all $A \in \mathcal{A}$.

We are now ready to define the natural join of two TPOB-instances.

Definition 4.34 (natural join of TPOB-instances) Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be two TPOB-instances over the natural-join-compatible TPOB-schemas $S_1 = (C_1, \tau_1, \Rightarrow_1, \mathit{me}_1, \wp_1)$ and $S_2 = (C_2, \tau_2, \Rightarrow_2, \mathit{me}_2, \wp_2)$, respectively. For $i \in \{1, 2\}$, let $\mathcal{A}_i$ denote the set of top-level attributes of $S_i$. Let $\otimes$ be a conjunction strategy. The natural join of $I_1$ and $I_2$ under $\otimes$, denoted $I_1 \bowtie_\otimes I_2$, is defined as the TPOB-instance $I = (\pi, \nu)$ over the TPOB-schema $S = S_1 \bowtie S_2$, where:
$\pi(c) = \{(o_1, o_2) \in \pi_1(c_1) \times \pi_2(c_2) \mid \nu_1(o_1) \bowtie_\otimes \nu_2(o_2)$ is defined$\}$, for all $c = (c_1, c_2) \in C_1 \times C_2$.
$\nu(o) = \nu_1(o_1) \bowtie_\otimes \nu_2(o_2)$, for all $o = (o_1, o_2) \in \pi(C_1 \times C_2)$.

Example 4.35 Let $S_1$ and $S_2$ be the TPOB-schemas given in Example 3.6 and produced in Example 4.25, respectively. Let $I_1$ and $I_2$ be the TPOB-instances over $S_1$ and $S_2$ produced in Examples 4.11 and 4.27, respectively. Then, $I_1 \bowtie_\otimes I_2$, where $\otimes$ is the conjunction strategy […], is the TPOB-instance $I = (\pi, \nu)$ over $S_1 \bowtie S_2$, where $\pi((\mathit{Priority}, \mathit{Priority}))$ contains a single pair of identical oids, $\pi(c) = \emptyset$ for all other classes $c$, and $\nu$ is given by Table 15.
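Definitions 4.33 and 4.34 can be sketched together in Python. Here the per-attribute intersection is passed in as a parameter `meet` that returns `None` when undefined; the usage example only has classical attributes, so `meet` is plain equality. The dictionary encodings of $\pi$ and $\nu$, the class names, oids, and attribute names are hypothetical.

```python
from itertools import product

def natural_join(v1, v2, meet):
    """v1 ⋈⊗ v2 for tuple values given as {attribute: value} dicts.

    `meet(a, b)` stands in for ∩⊗ on a shared attribute and returns None when
    undefined; attributes that occur in only one tuple are copied unchanged.
    """
    common = v1.keys() & v2.keys()
    joined = {a: x for a, x in v1.items() if a not in common}
    joined.update({a: x for a, x in v2.items() if a not in common})
    for a in common:
        m = meet(v1[a], v2[a])
        if m is None:
            return None  # the join is undefined
        joined[a] = m
    return joined

def join_instances(pi1, nu1, pi2, nu2, meet):
    """π(c1, c2) keeps the oid pairs whose values join; ν maps a pair to its join."""
    pi, nu = {}, {}
    for (c1, oids1), (c2, oids2) in product(pi1.items(), pi2.items()):
        pi[(c1, c2)] = set()
        for o1, o2 in product(oids1, oids2):
            v = natural_join(nu1[o1], nu2[o2], meet)
            if v is not None:
                pi[(c1, c2)].add((o1, o2))
                nu[(o1, o2)] = v
    return pi, nu

def equal_or_none(a, b):
    """Classical values only: the intersection is defined iff they are equal."""
    return a if a == b else None

pi, nu = join_instances({"A": {"o1"}}, {"o1": {"dest": "Rome", "weight": 2}},
                        {"B": {"p1", "p2"}},
                        {"p1": {"dest": "Rome", "carrier": "X"},
                         "p2": {"dest": "Oslo", "carrier": "Y"}},
                        equal_or_none)
print(pi)  # {('A', 'B'): {('o1', 'p1')}}
```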
Table 15: $\nu$ resulting from natural join
4.7 Cartesian Product and Conditional Join
In the above definition of natural join, if the sets $\mathcal{A}_1$ and $\mathcal{A}_2$ are disjoint, then the natural join is called Cartesian product and denoted by the symbol $\times$. The following condition describes when two TPOB-schemas $S_1$ and $S_2$ can be combined using Cartesian product.

Definition 4.36 (Cartesian-product-compatible TPOB-schemas) The TPOB-schemas $S_1 = (C_1, \tau_1, \Rightarrow_1, \mathit{me}_1, \wp_1)$ and $S_2 = (C_2, \tau_2, \Rightarrow_2, \mathit{me}_2, \wp_2)$ are Cartesian-product-compatible iff for all classes $c_1 \in C_1$ and $c_2 \in C_2$, the types $\tau_1(c_1)$ and $\tau_2(c_2)$ have disjoint sets of top-level attributes.
The conditional join operation combines values of two TPOB-instances that satisfy a probabilistic selection condition $\phi$. Let $I_1$ and $I_2$ be TPOB-instances over the Cartesian-product-compatible TPOB-schemas $S_1$ and $S_2$, respectively. The conditional join of $I_1$ and $I_2$ with respect to $\phi$, denoted $I_1 \bowtie_\phi I_2$, is the TPOB-instance $I_1 \bowtie_\phi I_2 = \sigma_\phi(I_1 \times I_2)$ over the TPOB-schema $S_1 \times S_2$.

Example 4.37 Let $S_1$ and $I_1$ be the TPOB-schema and the TPOB-instance, respectively, produced in Example 4.23. Let $S_2$ and $I_2$ be the TPOB-schema and the TPOB-instance obtained from $S_1$ and $I_1$, respectively, by renaming the attributes […] and […] with […] and […], respectively. The Cartesian product of $S_1$ and $S_2$ is the TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$ partially shown in Table 16 and Figure 3. The conditional join of $I_1$ and $I_2$ with respect to $\phi = […]$ is the TPOB-instance $I = (\pi, \nu)$ over the TPOB-schema $S_1 \times S_2$, where $\pi((\mathit{Priority}, \mathit{Two\_transfer}))$ contains a single pair of oids, $\pi(c) = \emptyset$ for all other classes $c$, and $\nu$ is shown in Table 17.

Table 16: Type assignment $\tau$ resulting from conditional join
Table 17: Value assignment $\nu$ resulting from conditional join
4.8 Intersection, Union, and Difference
We finally define the operations of intersection, union, and difference on two TPOB-instances over the same TPOB-schema. We first describe intersection. Informally speaking, this operation intersects the sets of oids of two TPOB-instances, as well as the explicit values associated with each oid in both TPOB-instances.

Definition 4.38 (intersection of TPOB-instances) Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be TPOB-instances over the same TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$, and let $\otimes$ be a conjunction strategy. The intersection of $I_1$ and $I_2$ under $\otimes$, denoted $I_1 \cap_\otimes I_2$, is the TPOB-instance $(\pi, \nu)$ over $S$, where:
$\pi(c) = \{o \in \pi_1(c) \cap \pi_2(c) \mid \nu_1(o) \cap_\otimes \nu_2(o)$ is defined$\}$, for all $c \in C$.
$\nu(o) = \nu_1(o) \cap_\otimes \nu_2(o)$, for all $o \in \pi(C)$.
Example 4.39 Let $S$ be the TPOB-schema of Example 3.6. Let $I_1$ and $I_2$ be the TPOB-instances over $S$ given in Example 3.8 and produced in Example 4.14, respectively. The intersection of $I_1$ and $I_2$ under the conjunction strategy […] is the TPOB-instance $I = (\pi, \nu)$ over $S$, where $\pi$ is given by $\pi(\mathit{Priority}) = […]$ and $\pi(c) = \emptyset$ for all other classes $c$, and $\nu$ is shown in Table 18.
Table 18: $\nu$ resulting from intersection
Likewise, the union operation intuitively computes the union of the sets of oids of two TPOB-instances, combined with the union of the two explicit values associated with each oid in both TPOB-instances. We first define the union of two explicit values of the same type.

Definition 4.40 (union of explicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$, or two explicit values of the same probabilistic type $\tau$, and let $\oplus$ be a disjunction strategy. The union of $v_1$ and $v_2$ under $\oplus$, denoted $v_1 \cup_\oplus v_2$, is inductively defined as follows:
If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 \cup_\oplus v_2 = v_1$.
If $\tau$ is an atomic probabilistic type, then $v_1 \cup_\oplus v_2 = \{(v, J_1) \in v_1 \mid v \in V_1 \setminus V_2\} \cup \{(v, J_2) \in v_2 \mid v \in V_2 \setminus V_1\} \cup \{(v, J_1 \oplus J_2) \mid (v, J_1) \in v_1,\ (v, J_2) \in v_2\}$, where $V_1 = \{v \mid (v, J) \in v_1\}$ and $V_2 = \{v \mid (v, J) \in v_2\}$.
If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathcal{A}$ and all $v_1.A \cup_\oplus v_2.A$ are defined, then $(v_1 \cup_\oplus v_2).A = v_1.A \cup_\oplus v_2.A$ for all $A \in \mathcal{A}$.
Otherwise, $v_1 \cup_\oplus v_2$ is undefined.
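The atomic case of Definition 4.40 is sketched below in Python, together with four interval disjunction strategies named after the columns of Table 2. As an assumption, these are the standard interval strategies from the ProbView line of work on probabilistic databases; they may differ in detail from the entries of Table 2. Atomic explicit values are encoded as dictionaries `{value: (l, u)}`, and the sample values are hypothetical.

```python
from fractions import Fraction as F

# Assumed ProbView-style disjunction strategies; a and b are intervals (l, u).
def dis_ignorance(a, b):
    return (max(a[0], b[0]), min(1, a[1] + b[1]))

def dis_independence(a, b):
    return (a[0] + b[0] - a[0] * b[0], a[1] + b[1] - a[1] * b[1])

def dis_positive_correlation(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]))

def dis_mutual_exclusion(a, b):
    return (min(1, a[0] + b[0]), min(1, a[1] + b[1]))

def union_atomic(v1, v2, disj):
    """v1 ∪⊕ v2 for atomic explicit values {value: (l, u)} (Definition 4.40)."""
    out = {v: i for v, i in v1.items() if v not in v2}        # values only in v1
    out.update({v: i for v, i in v2.items() if v not in v1})  # values only in v2
    out.update({v: disj(v1[v], v2[v]) for v in v1.keys() & v2.keys()})
    return out

v1 = {9: (F(1, 5), F(2, 5)), 10: (F(1, 2), F(1, 2))}
v2 = {10: (F(1, 2), F(1, 2)), 11: (F(1, 10), F(1, 10))}
print(union_atomic(v1, v2, dis_independence)[10])  # (Fraction(3, 4), Fraction(3, 4))
```

Only the interval attached to the shared value 10 depends on the chosen strategy; values occurring in just one operand are copied unchanged.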
We are now ready to define the union of two TPOB-instances.

Definition 4.41 (union of TPOB-instances) Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be TPOB-instances over the same TPOB-schema $S = (C, \tau, \Rightarrow, \mathit{me}, \wp)$. Let $\oplus$ be a disjunction strategy. The union of $I_1$ and $I_2$ under $\oplus$, denoted $I_1 \cup_\oplus I_2$, is the TPOB-instance $(\pi, \nu)$ over $S$, where:
$\pi(c) = (\pi_1(c) \setminus \pi_2(c)) \cup (\pi_2(c) \setminus \pi_1(c)) \cup \{o \in \pi_1(c) \cap \pi_2(c) \mid \nu_1(o) \cup_\oplus \nu_2(o)$ is defined$\}$, for all $c \in C$.
For all $c \in C$ and $o \in \pi(c)$: $\nu(o) = \nu_1(o)$ if $o \in \pi_1(c) \setminus \pi_2(c)$; $\nu(o) = \nu_2(o)$ if $o \in \pi_2(c) \setminus \pi_1(c)$; and $\nu(o) = \nu_1(o) \cup_\oplus \nu_2(o)$ if $o \in \pi_1(c) \cap \pi_2(c)$.

We finally define the difference of two TPOB-instances.
Definition 4.42 (difference of explicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$, or two explicit values of the same probabilistic type $\tau$, and let $\ominus$ be a difference strategy. The difference of $v_1$ and $v_2$ under $\ominus$, denoted $v_1 -_\ominus v_2$, is inductively defined by: If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 -_\ominus v_2 = v_1$.
If $\tau$ is an atomic probabilistic type, then
$$v_1 -_\ominus v_2 = \{(x, \iota_1) \in v_1 \mid x \in X_1 - X_2\} \cup \{(x, \iota_1 \ominus \iota_2) \mid (x, \iota_1) \in v_1,\ (x, \iota_2) \in v_2\},$$
where $X_1 = \{x \mid (x, \iota) \in v_1\}$ and $X_2 = \{x \mid (x, \iota) \in v_2\}$. If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathcal{A}$, and all $v_1.A -_\ominus v_2.A$ are defined, then $(v_1 -_\ominus v_2).A = v_1.A -_\ominus v_2.A$ for all $A \in \mathcal{A}$.
Otherwise, $v_1 -_\ominus v_2$ is undefined.
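In the same spirit, here is a hedged sketch of the atomic case of Definition 4.42, using the same hypothetical dictionary encoding. The difference strategy shown, $[\max(0, l_1 - u_2), \min(u_1, 1 - l_2)]$ for ignorance, is an assumed example of $\ominus$ and not a strategy fixed by this report.

```python
from typing import Callable, Dict, Hashable, Tuple

Interval = Tuple[float, float]
ExplicitValue = Dict[Hashable, Interval]  # hypothetical encoding: x -> (l, u)


def minus_ignorance(i1: Interval, i2: Interval) -> Interval:
    """Difference under ignorance: [max(0, l1 - u2), min(u1, 1 - l2)]."""
    return (max(0.0, i1[0] - i2[1]), min(i1[1], 1.0 - i2[0]))


def explicit_difference(v1: ExplicitValue, v2: ExplicitValue,
                        minus: Callable[[Interval, Interval], Interval]) -> ExplicitValue:
    """Atomic case of Definition 4.42: values only in v2 are dropped."""
    return {x: (minus(i1, v2[x]) if x in v2 else i1) for x, i1 in v1.items()}


v1 = {1: (0.25, 0.5), 2: (0.5, 0.75)}
v2 = {2: (0.125, 0.25), 3: (0.375, 0.5)}
print(explicit_difference(v1, v2, minus_ignorance))
# {1: (0.25, 0.5), 2: (0.25, 0.75)}
```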
Definition 4.43 (difference of TPOB-instances) Let $I_1 = (\pi_1, \nu_1)$ and $I_2 = (\pi_2, \nu_2)$ be TPOB-instances over the same TPOB-schema $S = (C, \tau, \Rightarrow, me, \wp)$. Let $\ominus$ be a difference strategy. The difference of $I_1$ and $I_2$ under $\ominus$, denoted $I_1 -_\ominus I_2$, is the TPOB-instance $(\pi, \nu)$ over $S$, where:
$$\pi(c) = (\pi_1(c) - \pi_2(c)) \cup \{o \in \pi_1(c) \cap \pi_2(c) \mid \nu_1(o) -_\ominus \nu_2(o) \text{ is defined}\}, \quad \text{for all } c \in C,$$
$$\nu(o) = \begin{cases} \nu_1(o) & \text{if } o \in \pi_1(c) - \pi_2(c), \\ \nu_1(o) -_\ominus \nu_2(o) & \text{if } o \in \pi_1(c) \cap \pi_2(c). \end{cases}$$
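Definitions 4.41 and 4.43 lift the value-level operations to TPOB-instances. The following Python sketch assumes a hypothetical encoding of an instance $(\pi, \nu)$ as a pair of dictionaries (class name to set of oids, oid to value) over the same set of classes, together with a value combiner that returns `None` when the combined value is undefined. It illustrates the set algebra on $\pi$ and $\nu$ rather than a complete implementation.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

# Hypothetical encoding of a TPOB-instance (pi, nu):
#   pi: class name -> set of oids, nu: oid -> value.
Instance = Tuple[Dict[str, Set[Hashable]], Dict[Hashable, object]]
# A value combiner returns None when the combined value is undefined.
Combiner = Callable[[object, object], Optional[object]]


def instance_union(i1: Instance, i2: Instance, union: Combiner) -> Instance:
    """Definition 4.41, assuming both instances list the same classes."""
    (pi1, nu1), (pi2, nu2) = i1, i2
    pi, nu = {}, {}
    for c in pi1:
        only1, only2 = pi1[c] - pi2[c], pi2[c] - pi1[c]
        both = {o for o in pi1[c] & pi2[c] if union(nu1[o], nu2[o]) is not None}
        pi[c] = only1 | only2 | both
        for o in only1:
            nu[o] = nu1[o]
        for o in only2:
            nu[o] = nu2[o]
        for o in both:
            nu[o] = union(nu1[o], nu2[o])
    return pi, nu


def instance_difference(i1: Instance, i2: Instance, minus: Combiner) -> Instance:
    """Definition 4.43, under the same assumptions."""
    (pi1, nu1), (pi2, nu2) = i1, i2
    pi, nu = {}, {}
    for c in pi1:
        only1 = pi1[c] - pi2[c]
        both = {o for o in pi1[c] & pi2[c] if minus(nu1[o], nu2[o]) is not None}
        pi[c] = only1 | both
        for o in only1:
            nu[o] = nu1[o]
        for o in both:
            nu[o] = minus(nu1[o], nu2[o])
    return pi, nu


def same_or_undefined(a: object, b: object) -> Optional[object]:
    """Classical-type rule: defined only when both values coincide."""
    return a if a == b else None


i1 = ({"Package": {"o1", "o2", "o4"}}, {"o1": "A", "o2": "B", "o4": "E"})
i2 = ({"Package": {"o2", "o3", "o4"}}, {"o2": "C", "o3": "D", "o4": "E"})
print(instance_union(i1, i2, same_or_undefined))
print(instance_difference(i1, i2, same_or_undefined))
```

Here `o2` is dropped from both results because its two classical values disagree, so the combined value is undefined.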
5 The Implicit Algebra The explicit algebra described in the preceding section suffers from several problems. First, TPOB-instances can be very large. As Table 7 shows, a probability must be associated with each time point involved. Merely saying that a given package will arrive at St. Louis sometime between 5:30 pm and 6:30 pm may (if we reason at a minute-by-minute level) require 60 time points to be specified (Table 7 only shows a couple of time points). Second, because explicit TPOB-instances are large, the cost of executing the operations on them is also potentially high. In this section, we alleviate these problems by defining TPOB algebraic operations on implicit TPOB-instances. These implicit operations correctly implement their explicit counterparts defined in Section 4.
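To make the size argument concrete, the sketch below expands a single implicit tuple into explicit pairs. It assumes, purely for illustration, that the constraints $c$ and $D$ are given by ranges of minutes since midnight, that $\delta$ is the uniform distribution over $sol(D)$, and that each $x \in sol(c)$ yields the pair $(x, [L \cdot \delta(x), U \cdot \delta(x)])$, which is how we read the intervals used in Definition 5.2. The function name `expand_tuple` is hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

Interval = Tuple[Fraction, Fraction]


def expand_tuple(sol_c: range, sol_d: range, low: Fraction, up: Fraction) -> Dict[int, Interval]:
    """Explicit pairs (x, [L*delta(x), U*delta(x)]) for one implicit tuple.

    Assumption: delta is uniform over sol(D) and zero outside it.
    """
    domain = set(sol_d)
    result = {}
    for x in sol_c:
        d = Fraction(1, len(domain)) if x in domain else Fraction(0)
        result[x] = (low * d, up * d)
    return result


# "Arrives at St. Louis between 5:30 pm and 6:30 pm": minutes 1050..1109.
window = range(17 * 60 + 30, 18 * 60 + 30)
implicit = (window, window, Fraction(9, 10), Fraction(1))  # one implicit tuple
explicit = expand_tuple(*implicit)
print(len(explicit))   # 60
print(explicit[1050])  # (Fraction(3, 200), Fraction(1, 60))
```

One implicit tuple of four components stands for 60 explicit pairs at minute granularity, and the gap grows linearly as the granularity gets finer.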
5.1 Selection In order to define the selection operation for implicit TPOB-instances, it is sufficient to define how to evaluate path expressions and how to assess the probability that an implicit value satisfies an atomic selection condition. The valuation of selection conditions, the satisfaction of probabilistic selection conditions, and the selection on implicit TPOB-instances are then defined in the same way as in Section 4.1. Definition 5.1 (valuation of path expressions) Let $P$ be a path expression for the probabilistic type $\tau$. The valuation of $P$ under an implicit value $v$ of $\tau$, denoted $v.P$, is defined as follows: If $v = [A_1\colon v_1, \ldots, A_k\colon v_k]$ and $P = A_i.R$, then $v.P = v_i.R$. If $v = \{(c_1, D_1, L_1, U_1, \delta_1), \ldots, (c_k, D_k, L_k, U_k, \delta_k)\}$ and $P = R$, then $v.P = \{(c_1, D_1, L_1, U_1, \delta_1, R), \ldots, (c_k, D_k, L_k, U_k, \delta_k, R)\}$. We call such sets generalized implicit values of $\tau$. Otherwise, $v.P$ is undefined.
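A minimal sketch of Definition 5.1 follows, assuming hypothetical Python encodings: probabilistic tuple values are dictionaries, atomic implicit values are lists of 5-tuples $(c, D, L, U, \delta)$, and paths are lists of attribute names. Classical values are treated as opaque, so only the empty path is evaluated on them; undefined valuations return `None`.

```python
from typing import Any, Optional, Sequence

# Hypothetical encodings (not from the report):
#   probabilistic tuple value -> dict {attribute: value}
#   atomic implicit value     -> list of implicit tuples (c, D, L, U, delta)
#   classical value           -> anything else (treated as opaque here)


def valuate(v: Any, path: Sequence[str]) -> Optional[Any]:
    """Valuation v.P of Definition 5.1; returns None where v.P is undefined."""
    if isinstance(v, dict):
        if not path or path[0] not in v:
            return None  # P must start with a top-level attribute A_i
        return valuate(v[path[0]], path[1:])  # v.(A_i.R) = v_i.R
    if isinstance(v, list):
        rest = tuple(path)  # remaining path R into the classical values
        return [t + (rest,) for t in v]  # generalized implicit value
    return v if not path else None  # classical value: only the empty path


arrival = [(range(1050, 1110), range(1050, 1110), 0.9, 1.0, "uniform")]
pkg = {"dest": "St. Louis", "arrival": arrival}
print(valuate(pkg, ["arrival"]))
print(valuate(pkg, ["dest"]))    # St. Louis
print(valuate(pkg, ["weight"]))  # None
```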
Definition 5.2 (valuation of atomic selection conditions) Let $I = (\pi, \nu)$ be an implicit TPOB-instance over the TPOB-schema $S = (C, \tau, \Rightarrow, me, \wp)$, and let $o \in \pi(C)$. Let $\oplus$ be the disjunction strategy for mutual exclusion. The probabilistic valuation with respect to $I$ and $o$, denoted $prob_{I,o}$, is the following partial mapping from the set of all atomic selection conditions to the set of all closed subintervals of $[0, 1]$:
$prob_{I,o}(in(c))$ is the closed interval whose endpoints are the minimum and the maximum probability that $o$ belongs to $c$.
Let $P$ be a path expression for the type of $o$. If $\nu(o).P$ is a value of a classical type, then define $\hat{P}$ as the generalized implicit value that represents $\nu(o)$ with certainty and carries the path $P$; else if $\nu(o).P$ is a generalized implicit value of an atomic probabilistic type, then define $\hat{P} = \nu(o).P$. Otherwise, $\hat{P}$ is undefined. Then,
$$prob_{I,o}(P \,\theta\, v) = \begin{cases} \bigoplus_{i=1}^{n} \iota_i & \text{if } \hat{P} \text{ is defined}, \\ \text{undefined} & \text{otherwise}, \end{cases}$$
where $\iota_1, \ldots, \iota_n$ are the intervals $[L \cdot \delta(x), U \cdot \delta(x)]$ such that $(c, D, L, U, \delta, R) \in \hat{P}$, $x \in sol(c)$, and $x.R \,\theta\, v$, if $\hat{P}$ is defined. Note that $prob_{I,o}(P \,\theta\, v)$ is undefined if some $x.R \,\theta\, v$ is undefined.
For each $i \in \{1, 2\}$, let $P_i$ be a path expression for the type of $o$. If $\nu(o).P_i$ is a value of a classical type, then define $\hat{P}_i$ as the generalized implicit value that represents $\nu(o)$ with certainty and carries the path $P_i$; else if $\nu(o).P_i$ is a generalized implicit value of an atomic probabilistic type, then define $\hat{P}_i = \nu(o).P_i$. Otherwise, $\hat{P}_i$ is undefined. Then,
$$prob_{I,o}(P_1 \,\theta_\otimes\, P_2) = \begin{cases} \bigoplus_{i=1}^{n} \iota_i & \text{if } \hat{P}_1 \text{ and } \hat{P}_2 \text{ are defined}, \\ \text{undefined} & \text{otherwise}, \end{cases}$$
where $\iota_1, \ldots, \iota_n$ is the list of all intervals $[L_1 \cdot \delta_1(x_1), U_1 \cdot \delta_1(x_1)] \otimes [L_2 \cdot \delta_2(x_2), U_2 \cdot \delta_2(x_2)]$ such that $(c_i, D_i, L_i, U_i, \delta_i, R_i) \in \hat{P}_i$ and $x_i \in sol(c_i)$ for $i \in \{1, 2\}$, and $x_1.R_1 \,\theta\, x_2.R_2$, if $\hat{P}_1$ and $\hat{P}_2$ are defined. Observe that $prob_{I,o}(P_1 \,\theta_\otimes\, P_2)$ is undefined if some $x_1.R_1 \,\theta\, x_2.R_2$ is undefined. The following result shows that the selection on implicit TPOB-instances correctly implements its counterpart on explicit TPOB-instances. That is, the mapping $\varepsilon$ commutes with $\sigma$.
Theorem 5.3 (correctness of selection) Let $I = (\pi, \nu)$ be an implicit TPOB-instance over a TPOB-schema $S = (C, \tau, \Rightarrow, me, \wp)$, and let $\phi$ be a probabilistic selection condition. Then, $\sigma_\phi(\varepsilon(I)) = \varepsilon(\sigma_\phi(I))$.
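The interval computation for $prob_{I,o}(P\,\theta\,v)$ in Definition 5.2 can be sketched as follows. The sketch assumes that every element of $\hat{P}$ has an empty remaining path $R$, that $\delta$ is uniform over $sol(D)$, and that the empty combination under $\oplus$ is $[0, 0]$. These are simplifying assumptions for illustration, and `prob_theta` is a hypothetical name.

```python
import operator
from fractions import Fraction
from typing import Callable, Iterable, Tuple

Interval = Tuple[Fraction, Fraction]


def or_mutual_exclusion(i1: Interval, i2: Interval) -> Interval:
    """Disjunction for mutually exclusive events: [min(1, l1 + l2), min(1, u1 + u2)]."""
    one = Fraction(1)
    return (min(one, i1[0] + i2[0]), min(one, i1[1] + i2[1]))


def prob_theta(p_hat: Iterable[tuple], theta: Callable[[int, int], bool], value: int) -> Interval:
    """Sketch of prob(P theta v) for a generalized implicit value p_hat."""
    total: Interval = (Fraction(0), Fraction(0))  # assumed neutral start [0, 0]
    for sol_c, sol_d, low, up, rest in p_hat:
        assert rest == (), "sketch handles only the empty remaining path R"
        for x in sol_c:
            if theta(x, value):  # x.R theta v, with R empty
                d = Fraction(1, len(sol_d)) if x in sol_d else Fraction(0)
                total = or_mutual_exclusion(total, (low * d, up * d))
    return total


window = range(1050, 1110)  # 5:30 pm up to 6:29 pm, in minutes since midnight
p_hat = [(window, window, Fraction(4, 5), Fraction(9, 10), ())]
print(prob_theta(p_hat, operator.lt, 1080))  # (Fraction(2, 5), Fraction(9, 20))
```

Because $\oplus$ is the disjunction strategy for mutual exclusion, the per-point bounds simply add up (capped at 1): the 30 minutes before 6:00 pm each contribute $[\frac{4}{5} \cdot \frac{1}{60}, \frac{9}{10} \cdot \frac{1}{60}]$.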
5.2 Restricted Selection In order to define the restricted selection operation on implicit TPOB-instances, it is sufficient to define restricted selection on implicit values. The restricted selection on implicit TPOB-instances is then defined in the same way as in Section 4.2. Definition 5.4 (restricted selection on implicit values) Let $\tau$ be a probabilistic tuple type. Let $\phi$ be a restricted selection condition consisting of a path expression $P$ for $\tau$ and a constraint $c$. Let $v$ be an implicit value of $\tau$. The restricted selection on $v$ with respect to $\phi$, denoted $\sigma_\phi(v)$, is defined by:
If $v = [A_1\colon v_1, \ldots, A_k\colon v_k]$, $v_i$ is a value of a classical type, $P = A_i$, and $v_i \in sol(c)$, then $\sigma_\phi(v) = v$.
If $v = [A_1\colon v_1, \ldots, A_k\colon v_k]$, $v_i$ is an implicit value of an atomic probabilistic type, and $P = A_i$, then $\sigma_\phi(v) = [A_1\colon v_1, \ldots, A_i\colon v'_i, \ldots, A_k\colon v_k]$, where
$$v'_i = \{(c \wedge c', D, L, U, \delta) \mid (c', D, L, U, \delta) \in v_i,\ sol(c \wedge c') \neq \emptyset\}.$$
If $v = [A_1\colon v_1, \ldots, A_k\colon v_k]$, $v_i$ is an implicit value of a probabilistic tuple type, $P = A_i.P'$, and $\sigma_{\phi'}(v_i)$ is defined, where $\phi'$ consists of $P'$ and $c$, then $\sigma_\phi(v) = [A_1\colon v_1, \ldots, A_i\colon \sigma_{\phi'}(v_i), \ldots, A_k\colon v_k]$.
Otherwise, $\sigma_\phi(v)$ is undefined. The next theorem shows that the restricted selection on implicit TPOB-instances correctly implements its counterpart on explicit TPOB-instances. That is, the mapping $\varepsilon$ commutes with $\sigma$. Theorem 5.5 (correctness of restricted selection) Let $I$ be an implicit TPOB-instance over a TPOB-schema $S = (C, \tau, \Rightarrow, me, \wp)$. Let $\phi$ be a restricted selection condition consisting of a path expression $P$ for all $\tau(d)$ with $d \in C$ and a constraint $c$. Then, $\sigma_\phi(\varepsilon(I)) = \varepsilon(\sigma_\phi(I))$.
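The atomic case of Definition 5.4 amounts to conjoining the selection constraint with each tuple's constraint and dropping tuples without solutions. The sketch below assumes that constraints are represented extensionally by their finite solution sets, so that $sol(c \wedge c')$ is a set intersection; this encoding and the name `restrict` are illustrative assumptions.

```python
from fractions import Fraction
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: an implicit tuple is (sol_c, sol_d, L, U), with the
# constraints c and D given by their solution sets and delta implicitly
# uniform over sol_d.
ImplicitTuple = Tuple[FrozenSet[int], FrozenSet[int], Fraction, Fraction]


def restrict(v: List[ImplicitTuple], sol_c: FrozenSet[int]) -> List[ImplicitTuple]:
    """v'_i of Definition 5.4: conjoin c with each tuple's constraint, drop empty ones."""
    result = []
    for c_prime, d, low, up in v:
        conj = sol_c & c_prime  # sol(c AND c') as a set intersection
        if conj:  # keep only if sol(c AND c') is nonempty
            result.append((conj, d, low, up))  # D, L, U (and delta) unchanged
    return result


evening = frozenset(range(1050, 1110))
night = frozenset(range(1200, 1260))
v = [(evening, evening, Fraction(1, 2), Fraction(1, 2)),
     (night, night, Fraction(1, 2), Fraction(1, 2))]
before_six = frozenset(range(0, 1080))
print(len(restrict(v, before_six)))  # 1
```

$D$, $L$, $U$, and $\delta$ are left untouched, so the surviving time points keep exactly the probability intervals they had before the restriction.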
5.3 Renaming To define renaming on implicit TPOB-instances, we need to define renaming on implicit values, which is then extended to implicit TPOB-instances in the same way as in Section 4.3. Definition 5.6 (renaming on constraints) Let $c$ be a constraint for the classical type $\tau$, and let $N$ be a renaming condition for $\tau$. The renaming on $c$ with respect to $N$, denoted $\rho_N(c)$, is obtained from $c$ by replacing every value $v$ in $c$ by $\rho_N(v)$. Definition 5.7 (renaming on implicit values) Let $N$ be a renaming condition of the form $P \leftarrow P'$ for the probabilistic tuple type $\tau$
$(v_1 \cap_\otimes v_2).A = v_1.A \cap_\otimes v_2.A$ for all $A \in \mathcal{A}$. Otherwise, $v_1 \cap_\otimes v_2$ is undefined. The next result shows that the join on implicit TPOB-instances correctly implements its counterpart on explicit TPOB-instances. That is, the mapping $\varepsilon$ commutes with $\bowtie_\otimes$. Theorem 5.10 (correctness of natural join) Let $I_1$ and $I_2$ be implicit TPOB-instances over the natural join-compatible TPOB-schemas $S_1$ and $S_2$, respectively. Let $\otimes$ be a conjunction strategy. Then,
$$\varepsilon(I_1) \bowtie_\otimes \varepsilon(I_2) = \varepsilon(I_1 \bowtie_\otimes I_2).$$
5.5 Intersection, Union, and Difference To define intersection, union, and difference, we need to define the intersection, union, and difference of implicit values, which are then extended to implicit TPOB-instances in the same way as in Section 4.8. The intersection of implicit values is given by Definition 5.9, while the union and difference of implicit values are defined below. Definition 5.11 (union of implicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$ or two implicit values of the same probabilistic type $\tau$, and let $\oplus$ be a disjunction strategy. The union of $v_1$ and $v_2$ under $\oplus$, denoted $v_1 \cup_\oplus v_2$, is inductively defined as follows: If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 \cup_\oplus v_2 = v_1$.
If $\tau$ is an atomic probabilistic type, then $v_1 \cup_\oplus v_2$ is the union of the two sets
$$\{(c_1 \wedge \neg c_2^\vee, D_1, L_1, U_1, \delta_1) \mid (c_1, D_1, L_1, U_1, \delta_1) \in v_1,\ sol(c_1 \wedge \neg c_2^\vee) \neq \emptyset\},$$
$$\{(c_2 \wedge \neg c_1^\vee, D_2, L_2, U_2, \delta_2) \mid (c_2, D_2, L_2, U_2, \delta_2) \in v_2,\ sol(c_2 \wedge \neg c_1^\vee) \neq \emptyset\},$$
and of the set that contains, for all $(c_1, D_1, L_1, U_1, \delta_1) \in v_1$, $(c_2, D_2, L_2, U_2, \delta_2) \in v_2$, and $x \in sol(c_1 \wedge c_2)$, an implicit tuple representing the single solution $x$ with the probability interval
$$[L, U] = [L_1 \cdot \delta_1(x), U_1 \cdot \delta_1(x)] \oplus [L_2 \cdot \delta_2(x), U_2 \cdot \delta_2(x)],$$
where $c_i^\vee$, $i \in \{1, 2\}$, denotes the logical disjunction of all $c_i$ such that $(c_i, D_i, L_i, U_i, \delta_i) \in v_i$.
If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathcal{A}$, and all $v_1.A \cup_\oplus v_2.A$ are defined, then $(v_1 \cup_\oplus v_2).A = v_1.A \cup_\oplus v_2.A$ for all $A \in \mathcal{A}$.
Otherwise, $v_1 \cup_\oplus v_2$ is undefined.
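The following Python sketch checks the atomic case of Definition 5.11 against Definition 4.40 on a toy example, in the spirit of Theorem 5.13 (2). It assumes extensional constraints (finite solution sets), $\delta$ uniform over $sol(D)$, pairwise disjoint $sol(c)$ within each value, and it represents the tuple for a single shared point $x$ as $(\{x\}, \{x\}, L, U)$, so that $\delta(x) = 1$. All of these encodings and names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Dict, FrozenSet, List, Tuple

Interval = Tuple[Fraction, Fraction]
# Hypothetical encoding: (sol_c, sol_d, L, U); delta is uniform over sol_d.
ImplicitTuple = Tuple[FrozenSet[int], FrozenSet[int], Fraction, Fraction]


def delta(sol_d: FrozenSet[int], x: int) -> Fraction:
    return Fraction(1, len(sol_d)) if x in sol_d else Fraction(0)


def expand(v: List[ImplicitTuple]) -> Dict[int, Interval]:
    """epsilon(v), assuming the sol(c) of the tuples in v are pairwise disjoint."""
    return {x: (low * delta(d, x), up * delta(d, x)) for c, d, low, up in v for x in c}


def or_me(i1: Interval, i2: Interval) -> Interval:
    return (min(Fraction(1), i1[0] + i2[0]), min(Fraction(1), i1[1] + i2[1]))


def implicit_union(v1: List[ImplicitTuple], v2: List[ImplicitTuple],
                   disj: Callable[[Interval, Interval], Interval]) -> List[ImplicitTuple]:
    """Atomic case of Definition 5.11 over extensional constraints."""
    cover1 = frozenset().union(*(t[0] for t in v1))  # sol(c_1^or)
    cover2 = frozenset().union(*(t[0] for t in v2))  # sol(c_2^or)
    result = [(c - cover2, d, lo, up) for c, d, lo, up in v1 if c - cover2]
    result += [(c - cover1, d, lo, up) for c, d, lo, up in v2 if c - cover1]
    for c1, d1, l1, u1 in v1:
        for c2, d2, l2, u2 in v2:
            for x in c1 & c2:
                low, up = disj((l1 * delta(d1, x), u1 * delta(d1, x)),
                               (l2 * delta(d2, x), u2 * delta(d2, x)))
                point = frozenset({x})  # tuple whose only solution is x
                result.append((point, point, low, up))  # delta(x) = 1 on {x}
    return result


def explicit_union(e1: Dict[int, Interval], e2: Dict[int, Interval], disj) -> Dict[int, Interval]:
    out = dict(e1)
    for x, i2 in e2.items():
        out[x] = disj(e1[x], i2) if x in e1 else i2
    return out


a, b = frozenset(range(0, 4)), frozenset(range(2, 6))
v1 = [(a, a, Fraction(1), Fraction(1))]
v2 = [(b, b, Fraction(1, 2), Fraction(1, 2))]
implicit = implicit_union(v1, v2, or_me)
print(expand(implicit) == explicit_union(expand(v1), expand(v2), or_me))  # True
print(expand(implicit)[2])  # (Fraction(3, 8), Fraction(3, 8))
```

Keeping $D_1$ and $\delta_1$ in the leftover tuples is what makes the per-point intervals of the points outside the overlap unchanged.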
Definition 5.12 (difference of implicit values) Let $v_1$ and $v_2$ be either two values of the same classical type $\tau$ or two implicit values of the same probabilistic type $\tau$, and let $\ominus$ be a difference strategy. The difference of $v_1$ and $v_2$ under $\ominus$, denoted $v_1 -_\ominus v_2$, is inductively defined as follows: If $\tau$ is a classical type and $v_1 = v_2$, then $v_1 -_\ominus v_2 = v_1$.
If $\tau$ is an atomic probabilistic type, then $v_1 -_\ominus v_2$ is the union of the set
$$\{(c_1 \wedge \neg c_2^\vee, D_1, L_1, U_1, \delta_1) \mid (c_1, D_1, L_1, U_1, \delta_1) \in v_1,\ sol(c_1 \wedge \neg c_2^\vee) \neq \emptyset\}$$
and of the set that contains, for all $(c_1, D_1, L_1, U_1, \delta_1) \in v_1$, $(c_2, D_2, L_2, U_2, \delta_2) \in v_2$, and $x \in sol(c_1 \wedge c_2)$, an implicit tuple representing the single solution $x$ with the probability interval
$$[L, U] = [L_1 \cdot \delta_1(x), U_1 \cdot \delta_1(x)] \ominus [L_2 \cdot \delta_2(x), U_2 \cdot \delta_2(x)],$$
where $c_i^\vee$, $i \in \{1, 2\}$, denotes the logical disjunction of all $c_i$ such that $(c_i, D_i, L_i, U_i, \delta_i) \in v_i$. If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathcal{A}$ and all $v_1.A -_\ominus v_2.A$ are defined, then $(v_1 -_\ominus v_2).A = v_1.A -_\ominus v_2.A$ for all $A \in \mathcal{A}$.
Otherwise, $v_1 -_\ominus v_2$ is undefined.
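A companion sketch for the atomic case of Definition 5.12 follows, under the same hypothetical extensional encoding (finite solution sets, $\delta$ uniform over $sol(D)$, singleton tuples $(\{x\}, \{x\}, L, U)$ for overlap points), with the ignorance strategy $[\max(0, l_1 - u_2), \min(u_1, 1 - l_2)]$ assumed as an example of $\ominus$.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[Fraction, Fraction]
ImplicitTuple = Tuple[FrozenSet[int], FrozenSet[int], Fraction, Fraction]  # (sol_c, sol_d, L, U)


def delta(sol_d: FrozenSet[int], x: int) -> Fraction:
    """Assumed uniform distribution over sol(D)."""
    return Fraction(1, len(sol_d)) if x in sol_d else Fraction(0)


def expand(v: List[ImplicitTuple]) -> Dict[int, Interval]:
    return {x: (low * delta(d, x), up * delta(d, x)) for c, d, low, up in v for x in c}


def minus_ignorance(i1: Interval, i2: Interval) -> Interval:
    """[max(0, l1 - u2), min(u1, 1 - l2)]."""
    return (max(Fraction(0), i1[0] - i2[1]), min(i1[1], 1 - i2[0]))


def implicit_difference(v1: List[ImplicitTuple], v2: List[ImplicitTuple], minus) -> List[ImplicitTuple]:
    """Atomic case of Definition 5.12 over extensional constraints."""
    cover2 = frozenset().union(*(t[0] for t in v2))  # sol(c_2^or)
    result = [(c - cover2, d, lo, up) for c, d, lo, up in v1 if c - cover2]
    for c1, d1, l1, u1 in v1:
        for c2, d2, l2, u2 in v2:
            for x in c1 & c2:
                low, up = minus((l1 * delta(d1, x), u1 * delta(d1, x)),
                                (l2 * delta(d2, x), u2 * delta(d2, x)))
                point = frozenset({x})
                result.append((point, point, low, up))
    return result


a, b = frozenset(range(0, 4)), frozenset(range(2, 6))
v1 = [(a, a, Fraction(1), Fraction(1))]
v2 = [(b, b, Fraction(1, 2), Fraction(1, 2))]
diff = expand(implicit_difference(v1, v2, minus_ignorance))
print(sorted(diff))  # [0, 1, 2, 3]
print(diff[2])       # (Fraction(1, 8), Fraction(1, 4))
```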
The following theorem shows that the intersection, union, and difference of implicit TPOB-instances correctly implement their counterparts on explicit TPOB-instances. That is, the mapping $\varepsilon$ commutes with $\cap_\otimes$, $\cup_\oplus$, and $-_\ominus$, respectively. Theorem 5.13 (correctness of intersection, union, and difference) Let $I_1$ and $I_2$ be two implicit TPOB-instances over the same TPOB-schema $S$, and let $\otimes$ (resp., $\oplus$, $\ominus$) be a conjunction (resp., disjunction, difference) strategy. Then,
$$\varepsilon(I_1) \cap_\otimes \varepsilon(I_2) = \varepsilon(I_1 \cap_\otimes I_2) \qquad (1)$$
$$\varepsilon(I_1) \cup_\oplus \varepsilon(I_2) = \varepsilon(I_1 \cup_\oplus I_2) \qquad (2)$$
$$\varepsilon(I_1) -_\ominus \varepsilon(I_2) = \varepsilon(I_1 -_\ominus I_2) \qquad (3)$$
5.6 Projection, Extraction, Cartesian Product, and Conditional Join The operations of projection, extraction, Cartesian product, and conditional join for implicit TPOB-instances are defined in exactly the same way as their counterparts for explicit TPOB-instances.
5.7 Compression Functions The implicit operations of natural join, intersection, union, and difference may generate implicit TPOB-instances that contain a large number of implicit tuples. Adopting an idea from [2], we now define compression functions through which such implicit TPOB-instances can be made more compact. Definition 5.14 (compression function) Let $\tau$ be an atomic probabilistic type. A compression function $\kappa$ for $\tau$ is a function that maps every implicit value $v$ of $\tau$ to an implicit value $\kappa(v)$ of $\tau$ such that (i) $|\kappa(v)| \le |v|$, and (ii) there exists a bijection between $\varepsilon(v)$ and $\varepsilon(\kappa(v))$ that maps each $(x, [l, u]) \in \varepsilon(v)$ to a pair $(x, [l', u']) \in \varepsilon(\kappa(v))$ such that $l \le l' \le u' \le u$. Example 5.15 Let $\tau$ be an atomic probabilistic type. The same-distribution compression function $\kappa$ maps every implicit value $v$ of $\tau$ to the implicit value $\kappa(v)$, which is obtained from $v$ by iteratively replacing any two distinct $(c_1, D_1, L, U, \delta), (c_2, D_2, L, U, \delta) \in v$ with $sol(D_1) = sol(D_2)$ by $(c_1 \vee c_2, D_1, L, U, \delta)$. We now define the compression of implicit values of probabilistic types. Here, we assume that for every atomic probabilistic type $\tau$, we have some compression function $\kappa_\tau$.
Definition 5.16 (compression of implicit values) Let $v$ be either a value of a classical type $\tau$, or an implicit value of a probabilistic type $\tau$. The compression of $v$, denoted $\kappa(v)$, is inductively defined as follows: If $\tau$ is a classical type, then $\kappa(v) = v$. If $\tau$ is an atomic probabilistic type, then $\kappa(v) = \kappa_\tau(v)$. If $\tau$ is a probabilistic tuple type over the set of top-level attributes $\mathcal{A}$, then $\kappa(v).A = \kappa(v.A)$ for all $A \in \mathcal{A}$. We finally define the compression of implicit TPOB-instances. Definition 5.17 (compression of implicit TPOB-instances) Let $I = (\pi, \nu)$ be a TPOB-instance over the TPOB-schema $S = (C, \tau, \Rightarrow, me, \wp)$. The compression of $I$, denoted $\kappa(I)$, is defined as the TPOB-instance $(\pi, \nu')$ over $S$, where $\nu'(o) = \kappa(\nu(o))$ for all $o \in \pi(C)$.
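The same-distribution compression of Example 5.15 can be sketched by grouping tuples that agree on $sol(D)$, $L$, and $U$ and taking the disjunction (here, the union of solution sets) of their constraints. The sketch assumes extensional constraints and that $\delta$ is determined by $sol(D)$ (uniform), so equal $sol(D)$ implies equal $\delta$; merging all members of a group at once has the same effect as the iterative pairwise replacement in the example. The names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: (sol_c, sol_d, L, U); delta is uniform over sol_d, so
# tuples with the same sol(D), L, and U also share the same distribution.
ImplicitTuple = Tuple[FrozenSet[int], FrozenSet[int], Fraction, Fraction]


def same_distribution_compress(v: List[ImplicitTuple]) -> List[ImplicitTuple]:
    """Merge tuples agreeing on sol(D), L, U (and hence delta) into (c1 OR c2, D, L, U)."""
    groups: Dict[tuple, FrozenSet[int]] = defaultdict(frozenset)
    for c, d, low, up in v:
        groups[(d, low, up)] = groups[(d, low, up)] | c  # sol(c1 OR c2) is the union
    return [(c, d, low, up) for (d, low, up), c in groups.items()]


def expand(v: List[ImplicitTuple]) -> Dict[int, Tuple[Fraction, Fraction]]:
    """epsilon(v) under the same assumptions (tuples' sol(c) pairwise disjoint, within sol(D))."""
    return {x: (low / len(d), up / len(d)) for c, d, low, up in v for x in c}


dom = frozenset(range(6))
v = [(frozenset({0, 1}), dom, Fraction(1, 2), Fraction(1)),
     (frozenset({2, 3}), dom, Fraction(1, 2), Fraction(1)),
     (frozenset({4, 5}), dom, Fraction(1, 3), Fraction(1))]
compressed = same_distribution_compress(v)
print(len(compressed))                  # 2
print(expand(compressed) == expand(v))  # True
```

The equality check illustrates condition (ii) of Definition 5.14 with identical intervals, and the shorter list illustrates condition (i).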
5.8 Preservation of Consistency and Coherence We now show that all our explicit algebraic operators defined in Section 4 preserve consistency and coherence of schemas and instances: if the input TPOB-schemas (resp., TPOB-instances) are consistent (resp., coherent), then the output TPOB-schemas (resp., TPOB-instances) are also consistent (resp., coherent). Since our implicit algebraic operators given in Section 5 correctly implement their explicit counterparts, this also shows that they preserve consistency and coherence of schemas and instances, respectively. The explicit operators of selection, restricted selection, intersection, union, and difference trivially preserve consistency of schemas, as the input TPOB-schemas coincide with the output TPOB-schemas. Projection and renaming also preserve consistency of schemas, as they only modify type assignments. The following result shows that extraction and natural join, and thus also Cartesian product and conditional join, preserve consistency of schemas. Theorem 5.18 Let $S$ be a TPOB-schema, and let $C'$ be a set of classes from $S$. Let $S_1$ and $S_2$ be two join-compatible TPOB-schemas.
(a) If $S$ is consistent, then the extraction of $C'$ from $S$ is consistent.
(b) If $S_1$ and $S_2$ are consistent, then $S_1 \bowtie S_2$ is consistent.
We now concentrate on the preservation of coherence. Recall that the coherence of a TPOB-instance $I = (\pi, \nu)$ over a TPOB-schema $S = (C, \tau, \Rightarrow, me, \wp)$ depends on $\pi$, $C$, $\Rightarrow$, $me$, and $\wp$. The explicit algebraic operations of selection, restricted selection, intersection, union, and difference preserve coherence of instances, as they do not modify the input TPOB-schemas and they may only modify the input TPOB-instances by removing objects and changing value assignments to objects. Similarly, projection and renaming preserve coherence of instances, as they may only modify type and value assignments to classes and objects, respectively. The result below shows that natural join, and thus also Cartesian product and conditional join, preserve coherence of instances. Moreover, it shows that extraction preserves coherence of instances when we do not remove any characteristic classes. Theorem 5.19 Let $I$, $I_1$, and $I_2$ be TPOB-instances over the TPOB-schemas $S$, $S_1$, and $S_2$, respectively. Let $I = (\pi, \nu)$ and $S = (C, \tau, \Rightarrow, me, \wp)$. Let $C' \subseteq C$ be such that every class $c \in C$ that is characteristic for some $d \in C'$ and some $o \in \pi(C')$ also belongs to $C'$. Let $S_1$ and $S_2$ be join-compatible. (a) If $I$ is coherent, then the extraction of $C'$ from $I$ is coherent. (b) If $I_1$ and $I_2$ are coherent, then $I_1 \bowtie I_2$ is coherent.
6 Related Work There is quite extensive work in the literature on temporal databases and temporal object-oriented databases; we refer especially to the recent surveys [19, 10] and the books [23, 22].
Probabilistic extensions to relational databases are also well explored in the literature; see especially [16, 7] for more background and a detailed discussion of recent work on probabilistic relational databases. Recently, more complex data models have been extended by probabilistic uncertainty in a number of papers. In particular, Eiter et al. [7] presented an approach that adds probabilistic uncertainty to complex value relational databases, while Kornatzky and Shimony [11, 12] and Eiter et al. [6] described approaches to probabilistic object-oriented databases. Our approach in this paper is a temporal extension of the model by Eiter et al. [6]. Additionally, the present paper newly introduces an implicit data model and an implicit algebra, which is shown to correctly implement its explicit counterpart and which can be realized more efficiently. Moreover, the two operations of restricted selection and extraction are newly introduced here. Even though the areas of temporal and probabilistic databases are both well explored, there is very little work on the integration of temporal reasoning and probabilistic databases. In particular, Dyreson and Snodgrass in their pioneering work [5] and subsequently Dekhtyar et al. [2] presented approaches to temporal indeterminacy in relational databases based on probabilistic uncertainty:
• Dyreson and Snodgrass [5] extend the SQL data model and query language by probabilistic uncertainty on time points. They add indeterminate temporal attributes (which have indeterminate instants as associated values) to SQL. Indeterminate instants are intervals of time points with associated probability distributions.
The SQL query language is extended by a construct to define the ordering plausibility, which is an integer between 1 and 100 that specifies the degree to which the result of an SQL query should contain uncertain answers (where 1 means that any possible answer to a query is desired, while 100 means that only definite answers are desired). Moreover, there is a construct to define the correlation credibility, which specifies simple modifications of the probability distributions in the base relations before the selection condition in SQL queries is evaluated. Dyreson and Snodgrass also describe efficient data structures and query processing algorithms for their approach. Our work in this paper differs from theirs in several ways. First, we present an extension of object-oriented databases, while their approach is an extension of relational databases. Second, we make no independence assumptions between events (the user’s query can explicitly encode her knowledge of the dependencies between events, if any), while Dyreson and Snodgrass assume that all indeterminate events are probabilistically independent of each other. Third, our work introduces an algebra, while their work defines an SQL extension. Fourth, we present formal definitions of important notions like coherence and consistency and show that, under appropriate assumptions, all our operations preserve coherence and consistency. Fifth, we allow for interval probabilities over solution sets of temporal constraints, while their work allows only for precise point probabilities over intervals of time points.
• Dekhtyar et al. [2] extend the relational data model and algebra by temporal indeterminacy based on probabilities. They define a theoretical annotated temporal algebra on large annotated relations, and a temporal probabilistic algebra on succinct temporal probabilistic relations. They show that the latter efficiently and correctly implements the former.
They also report on timings of the temporal probabilistic algebra in a prototype implementation. Our work in this paper, especially the idea of having an explicit algebra on large instances, which is efficiently and correctly implemented by an implicit algebra on succinct instances, is inspired by Dekhtyar et al.’s work. Our work, however, is an extension of the much richer object-oriented data model and algebra, as compared to the relational algebra. Our work may be viewed as a generalization of theirs. To our knowledge, there has been no work to date on temporal probabilistic object-oriented databases. There is other work on nonprobabilistic temporal indeterminacy in databases, which is less related to our work. In particular, Snodgrass [21] models indeterminacy using a model that is based on a three-valued logic. Dutta [4] and Dubois and Prade [3] propose a fuzzy logic approach to temporal indeterminacy, while Koubarakis [14, 13] and Brusoni et al. [1] suggest approaches based on constraints. Gadia et al. [8] introduce partial temporal databases, which are based on partial temporal elements.
7 Conclusions Dyreson and Snodgrass [5], followed subsequently by Dekhtyar et al. [2], have argued persuasively that there are numerous real-world applications where temporal uncertainty abounds. In this paper, we have used a simple example tracking shipments carried, for instance, by commercial carriers. Many other examples exist: stock market models making predictions of stock prices involve temporal probabilities specifying when a stock will reach a specific price. Archaeological databases containing radio-carbon dating of historical artifacts invariably involve temporal uncertainty as well. Programs tracking the behavior of parts on a factory floor and predicting when they will need to be serviced and/or replaced also involve temporal uncertainty. It should come as no surprise that many of these applications also involve object models. Descriptions of three-dimensional historical artifacts are often stored using object models. Information about machine parts often includes design information, drawings, and manuals that are often represented with object models as well. In this paper, we have made a first attempt to deal with temporal uncertainty in object-based systems. We have provided two models. The first is an explicit model where a probability is associated with each time point. As temporal granularity gets finer, this model becomes increasingly impractical to use. For this explicit model, we provide an algebra (e-algebra) that extends the relational algebra. To avoid the problems associated with the e-algebra, we introduce a succinct implicit algebra (i-algebra) and define its operators. We show that each operator in the i-algebra correctly implements the corresponding operator in the e-algebra without computing the entire explicit representation. Thus, the e-algebra operators can, in effect, “work” on a compact implicit representation of a much larger explicit representation. There are numerous directions for future research.
Building physical cost models and cost-based query optimizers for TPOBs is a major challenge that must be addressed if applications such as the package and stock market examples are to scale up for heavy-duty use. Building mechanisms to update such databases poses yet another challenge. Building view creation and maintenance algorithms provides a third challenge. Developing an implementation of (the implicit version of) TPOBs poses a fourth major challenge, as it will provide a testbed for all the algorithms resulting from the other problems mentioned here. Acknowledgements. This work was supported by the Army Research Lab under contract number DAAL0197K0135, the Army Research Office under grant number DAAD190010484, by DARPA/RL contract number F306029910552, by the ARL CTA on Advanced Decision Architectures, by a DFG grant, and by a Marie Curie Individual Fellowship of the European Community.
Appendix A. Proofs for Section 5 For the proof of Theorem 5.3, we need the following lemma, which says that the valuation of path expressions under implicit values correctly implements the valuation of path expressions under explicit values. Here, the mapping $\varepsilon$ is extended to generalized implicit values as follows: Every generalized implicit value $v = \{(c_1, D_1, L_1, U_1, \delta_1, R), \ldots, (c_k, D_k, L_k, U_k, \delta_k, R)\}$ is associated with the generalized explicit value $\varepsilon(v)$
$\varepsilon(c) \neq \emptyset$ for all classes $c \in C'$. We next show C2. Consider two classes $c_1, c_2 \in C'$ such that $c_1 \Rightarrow' c_2$. That is, some path $d_1 \Rightarrow d_2 \Rightarrow \cdots \Rightarrow d_k$ exists such that $d_1 = c_1$, $d_k = c_2$, and $d_2, \ldots, d_{k-1} \notin C'$. As $\varepsilon(d_1) \subseteq \varepsilon(d_2) \subseteq \cdots \subseteq \varepsilon(d_k)$, it thus follows that $\varepsilon(c_1) \subseteq \varepsilon(c_2)$. We now prove that C3 holds. Let $c_1, c_2 \in C'$ be two distinct classes that belong to the same cluster $M' \in me'(c)$ for some $c$. That is, there exists a cluster $M \in me(c)$ such that, for $i \in \{1, 2\}$, either $c_i$ belongs to $M$ or $c_i$ is a proper subclass of a class in $M$. As C2 and C3 hold for $\varepsilon$, it thus follows that $\varepsilon(c_1) \cap \varepsilon(c_2) = \emptyset$. This shows that C3 holds. We finally prove C4. Consider two classes $c_1, c_2 \in C'$ such that $c_1 \Rightarrow' c_2$. That is, some path $d_1 \Rightarrow d_2 \Rightarrow \cdots \Rightarrow d_k$ exists such that $d_1 = c_1$, $d_k = c_2$, and $d_2, \ldots, d_{k-1} \notin C'$. Moreover, it holds that $\wp'(c_1, c_2) = \prod_{i=1}^{k-1} \wp(d_i, d_{i+1})$. As C4 holds for $\varepsilon$, it follows that
$$|\varepsilon(d_i)| = \wp(d_i, d_{i+1}) \cdot |\varepsilon(d_{i+1})|$$
for all $i \in \{1, \ldots, k-1\}$. This shows that $|\varepsilon(c_1)| = \prod_{i=1}^{k-1} \wp(d_i, d_{i+1}) \cdot |\varepsilon(c_2)|$, that is,
$$|\varepsilon'(c_1)| = \wp'(c_1, c_2) \cdot |\varepsilon'(c_2)|.$$
This proves C4.
(b) Let $S_1 = (C_1, \tau_1, \Rightarrow_1, me_1, \wp_1)$, $S_2 = (C_2, \tau_2, \Rightarrow_2, me_2, \wp_2)$, and $S_1 \bowtie S_2 = S = (C, \tau, \Rightarrow, me, \wp)$. Let $\varepsilon_1\colon C_1 \to 2^{O_1}$ and $\varepsilon_2\colon C_2 \to 2^{O_2}$ be models of $S_1$ and $S_2$, respectively. Let the mapping $\varepsilon\colon C \to 2^{O}$, where $O = O_1 \times O_2$, be defined as follows:
$$\varepsilon(c) = \varepsilon_1(c_1) \times \varepsilon_2(c_2) \quad \text{for all } c = (c_1, c_2) \in C.$$
We now show that $\varepsilon$ is a model of $S$. We first prove C1. Since $\varepsilon_1(c_1) \neq \emptyset$ for all classes $c_1 \in C_1$ and $\varepsilon_2(c_2) \neq \emptyset$ for all classes $c_2 \in C_2$, we get $\varepsilon(c) \neq \emptyset$ for all classes $c \in C$. We next show C2 and C4. Let $c = (c_1, c_2)$ and $d = (d_1, d_2)$ with $c \Rightarrow d$. Without loss of generality, we can assume that $c_1 \Rightarrow_1 d_1$ and $c_2 = d_2$. Since $\varepsilon_1$ is a model of $S_1$, it holds that $\varepsilon_1(c_1) \subseteq \varepsilon_1(d_1)$ and
$$|\varepsilon_1(c_1)| = \wp_1(c_1, d_1) \cdot |\varepsilon_1(d_1)|.$$
Hence, it immediately follows that $\varepsilon(c) \subseteq \varepsilon(d)$ and
$$|\varepsilon(c)| = \wp(c, d) \cdot |\varepsilon(d)|.$$
We finally prove C3. Let $c, d \in C$ be two distinct classes that belong to the same cluster of $S$. Without loss of generality, we can assume that $c_1$ and $d_1$ belong to the same cluster of $S_1$ and that $c_2 = d_2$. Since $\varepsilon_1$ is a model of $S_1$, it holds that $\varepsilon_1(c_1) \cap \varepsilon_1(d_1) = \emptyset$. Thus, $\varepsilon(c) \cap \varepsilon(d) = \emptyset$. Proof of Theorem 5.19. (a) Let $I = (\pi, \nu)$ and