Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns

Alan Nash, Department of Mathematics
Bertram Ludäscher, San Diego Supercomputer Center
University of California, San Diego

Processing Queries under Limited Access Patterns, EDBT’04 – p. 1/2

Data Integration

• Problem: Integrate sources with limited query capabilities.
• Example: Global-as-View (GAV) integration:
  • Given: query Q against global view GV over source schemas Si:
      Q(x̄) ← QGV(x̄, ȳ)
  • Find: query plan P with subqueries Qi against sources Si with limited access patterns:
      Q(x̄) ← P(Q1, ..., Qn)(x̄, ȳ)

Web Service Composition

• Problem: Declarative composition of web services into larger web service "workflows" or "dataflows."
• Idea: Specify composite web service plans as declarative queries, then do query planning over sources with limited access patterns.
• Example: Given web service "relations" WS1(in y, out z) and WS2(in x, out y), the declarative query
    Q(x, z) ← WS2(x, y), WS1(y, z)
  becomes the web service plan
    x --WS2--> y --WS1--> z.
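
The plan x --WS2--> y --WS1--> z can be read as a nested loop in which each service is called only with values that are already known. The following Python sketch illustrates this under stated assumptions: the lookup tables `WS2_DATA` and `WS1_DATA` and their contents are hypothetical stand-ins for remote services, and x is assumed to be supplied by the caller.

```python
# Hypothetical lookup tables standing in for the two remote services.
WS2_DATA = {"sd": ["ucsd", "sdsc"]}                      # WS2(in x, out y)
WS1_DATA = {"ucsd": ["math"], "sdsc": ["hpc", "data"]}   # WS1(in y, out z)


def ws2(x):
    """WS2(in x, out y): all y values returned for a given x."""
    return WS2_DATA.get(x, [])


def ws1(y):
    """WS1(in y, out z): all z values returned for a given y."""
    return WS1_DATA.get(y, [])


def q(x):
    """Q(x, z) <- WS2(x, y), WS1(y, z): call WS2 first, then feed each y to WS1."""
    return {(x, z) for y in ws2(x) for z in ws1(y)}
```

Calling WS1 before WS2 is not possible here, because WS1 needs y as input and only WS2 produces it.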

Access Patterns

Assume you have the following interface, which requires you to enter an author, a title, or a subject:

  author | title | subject | publisher

We model this as a relation B(a, t, s, p) with three access patterns: iooo, oioo, ooio. Here i marks an input slot (a value must be supplied) and o marks an output slot (a value is returned).
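
One way to picture such a source is as a function that rejects any call not covered by an access pattern. The sketch below is illustrative only: the rows in `BOOKS` and the helper names `matching_pattern` and `call_source` are hypothetical and do not come from the talk.

```python
# Hypothetical rows of B(author, title, subject, publisher).
BOOKS = [
    ("Knuth", "The Art of Computer Programming", "Algorithms", "Addison-Wesley"),
    ("Knuth", "The TeXbook", "Typesetting", "Addison-Wesley"),
    ("Aho", "Compilers", "Compilers", "Addison-Wesley"),
]

# The three access patterns: 'i' = value must be supplied, 'o' = value is returned.
PATTERNS = ["iooo", "oioo", "ooio"]


def matching_pattern(args):
    """Return the first pattern whose input slots are all supplied, or None."""
    for pattern in PATTERNS:
        if all(arg is not None for arg, mode in zip(args, pattern) if mode == "i"):
            return pattern
    return None


def call_source(author=None, title=None, subject=None, publisher=None):
    """Simulate the form: refuse calls that match no access pattern."""
    args = (author, title, subject, publisher)
    if matching_pattern(args) is None:
        raise ValueError("no access pattern permits this call")
    # Any extra supplied values simply act as filters on the returned rows.
    return [row for row in BOOKS
            if all(a is None or a == v for a, v in zip(args, row))]
```

With this model, `call_source(author="Knuth")` is allowed by pattern iooo, while a call that supplies only a publisher is rejected.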

Queries Under Access Patterns

What kinds of queries can we answer?
• List all books
• Title and publisher for non-'Springer' books by 'Knuth'
• Books by 'Knuth' or 'Aho'

Classes of Queries

• Q1(a, t, s, p) ← B(a, t, s, p)
• Q2(t, p) ← ¬B('Knuth', t, s, 'Springer'), B('Knuth', t, s, p)
• Q3(a, t, p) ← B('Knuth', t, s, p)
  Q3(a, t, p) ← B('Aho', t, s, p)
• Q4(a) ← B(a, t, s, p), L(t), C(b, t)
  Q4(a) ← B(a, t, s, p), L(t), ¬C(a, t)

These are CQ, CQ¬, UCQ, and UCQ¬ respectively.

Executable Queries

• A UCQ¬ query is executable if every variable appears first, positively, in an output slot in the body.
• Q3 can be annotated as follows:
    Q3(a, t, p) ← B^iooo('Knuth', t, s, p)
    Q3(a, t, p) ← B^iooo('Aho', t, s, p)
• Q3 is executable; the rest are not.
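
The definition can be checked mechanically by scanning a rule body from left to right. The sketch below is one possible reading in Python. The tuple encoding of literals, the quoted-string convention for constants, and the names `is_var` and `is_executable` are assumptions made for illustration; only rule bodies are examined.

```python
# Access patterns per relation, as on the slides.
PATTERNS = {"B": ["iooo", "oioo", "ooio"]}


def is_var(term):
    """Assumed convention: constants are written with quotes, e.g. "'Knuth'"."""
    return not term.startswith("'")


def is_executable(body, patterns):
    """Check one rule body, given as a list of (relation, args, positive) literals."""
    bound = set()  # variables bound by earlier positive literals
    for relation, args, positive in body:
        variables = {t for t in args if is_var(t)}
        if positive:
            # Some pattern must have every input slot filled by a constant
            # or by a variable that is already bound.
            callable_now = any(
                all(mode == "o" or not is_var(t) or t in bound
                    for t, mode in zip(args, pattern))
                for pattern in patterns.get(relation, [])
            )
            if not callable_now:
                return False
            bound |= variables
        elif not patterns.get(relation) or not variables <= bound:
            # A negated literal may only test values that are already bound.
            return False
    return True


Q1 = [("B", ("a", "t", "s", "p"), True)]
Q2 = [("B", ("'Knuth'", "t", "s", "'Springer'"), False),
      ("B", ("'Knuth'", "t", "s", "p"), True)]
Q3_KNUTH = [("B", ("'Knuth'", "t", "s", "p"), True)]
```

A UCQ¬ query would be checked by applying `is_executable` to each of its rules. On the bodies above, only `Q3_KNUTH` passes.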

Orderable Queries

• A UCQ¬ query is orderable if its subgoals can be reordered to get an executable query.
• Q2 can be reordered to get
    Q2′(t, p) ← B^iooo('Knuth', t, s, p), ¬B^iooo('Knuth', t, s, 'Springer')
• Q2 and Q3 are orderable; Q1 and Q4 are not.
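
Because bindings only grow as subgoals are executed, a greedy strategy that repeatedly picks any subgoal that can run is enough to find an executable order when one exists. The following Python sketch assumes the same hypothetical encoding as a list of (relation, args, positive) literals with quoted constants; `can_run` and `reorder` are illustrative names.

```python
PATTERNS = {"B": ["iooo", "oioo", "ooio"], "L": ["o"], "C": ["ii"]}


def is_var(term):
    """Assumed convention: constants are written with quotes, e.g. "'Knuth'"."""
    return not term.startswith("'")


def can_run(literal, bound, patterns):
    """True if the literal can be executed given the variables bound so far."""
    relation, args, positive = literal
    if not positive:
        # Negated literals only test values, so every variable must be bound.
        return bool(patterns.get(relation)) and all(
            t in bound for t in args if is_var(t))
    return any(
        all(mode == "o" or not is_var(t) or t in bound
            for t, mode in zip(args, pattern))
        for pattern in patterns.get(relation, [])
    )


def reorder(body, patterns):
    """Return an executable ordering of the body, or None if there is none."""
    remaining, ordered, bound = list(body), [], set()
    progress = True
    while remaining and progress:
        progress = False
        for literal in list(remaining):
            if can_run(literal, bound, patterns):
                remaining.remove(literal)
                ordered.append(literal)
                bound |= {t for t in literal[1] if is_var(t)}
                progress = True
    return ordered if not remaining else None


Q2 = [("B", ("'Knuth'", "t", "s", "'Springer'"), False),
      ("B", ("'Knuth'", "t", "s", "p"), True)]
Q4_RULE1 = [("B", ("a", "t", "s", "p"), True),
            ("L", ("t",), True),
            ("C", ("b", "t"), True)]
```

Here `reorder(Q2, PATTERNS)` moves the positive literal to the front, while `reorder(Q4_RULE1, PATTERNS)` returns `None` because b is never bound for C^ii.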

Feasible Queries

• A UCQ¬ query is feasible if it is equivalent to an executable UCQ¬ query.
• Q4 is equivalent to
    Q4′(a) ← L^o(t), B^oioo(a, t, s, p)
• Q2, Q3, and Q4 are feasible; Q1 is not.

The Answerable Part

Access patterns: B^iooo, B^oioo, B^ooio, L^o, C^ii

• Q4(a) ← B(a, t, s, p), L(t), C(b, t)
  Q4(a) ← B(a, t, s, p), L(t), ¬C(a, t)

Both rules of ans(Q4) are built up one subgoal at a time, tracking the variables bound so far:

• Bindings: none
    ans(Q4)(a) ←
    ans(Q4)(a) ←
• Add L^o(t). Bindings: t
    ans(Q4)(a) ← L^o(t)
    ans(Q4)(a) ← L^o(t)
• Add B^oioo(a, t, s, p). Bindings: t, a, s, p
    ans(Q4)(a) ← L^o(t), B^oioo(a, t, s, p)
    ans(Q4)(a) ← L^o(t), B^oioo(a, t, s, p)
• Add ¬C^ii(a, t) to the second rule; C(b, t) stays out of the first rule because b is never bound. Bindings: t, a, s, p
    ans(Q4)(a) ← L^o(t), B^oioo(a, t, s, p)
    ans(Q4)(a) ← L^o(t), B^oioo(a, t, s, p), ¬C^ii(a, t)

ans(Q) can be computed efficiently (in quadratic time).

The Answerable Part

For Q ∈ CQ¬:
• A subgoal is Q-answerable if there is an executable query E including that subgoal and subgoals from Q.
• If Q is unsatisfiable, then ans(Q) := false. Otherwise, ans(Q) is the query given by the Q-answerable subgoals in Q.
• Head of ans(Q): variables in the head of Q and in the body of ans(Q).

If Q ∈ UCQ¬ with Q = Q1 ∪ ... ∪ Qk, then ans(Q) := ans(Q1) ∪ ... ∪ ans(Qk).
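
For a single CQ¬ rule, the answerable subgoals can be collected by repeatedly sweeping over the body and keeping every subgoal that can run with the current bindings. The sketch below is a minimal illustration, not the authors' implementation. It assumes a hypothetical encoding of literals as (relation, args, positive) tuples with quoted constants, and it treats a rule as unsatisfiable only when it contains a literal together with its exact negation.

```python
PATTERNS = {"B": ["iooo", "oioo", "ooio"], "L": ["o"], "C": ["ii"]}


def is_var(term):
    """Assumed convention: constants are written with quotes, e.g. "'Knuth'"."""
    return not term.startswith("'")


def can_run(literal, bound, patterns):
    """True if the literal can be executed given the variables bound so far."""
    relation, args, positive = literal
    if not positive:
        return bool(patterns.get(relation)) and all(
            t in bound for t in args if is_var(t))
    return any(
        all(mode == "o" or not is_var(t) or t in bound
            for t, mode in zip(args, pattern))
        for pattern in patterns.get(relation, [])
    )


def answerable_part(body, patterns):
    """Return the answerable subgoals in an executable order, or None for 'false'."""
    positives = {(r, a) for r, a, pos in body if pos}
    if any((r, a) in positives for r, a, pos in body if not pos):
        return None  # assumed unsatisfiability test: R(x) and not R(x) together
    remaining, ordered, bound = list(body), [], set()
    progress = True
    while progress:  # at most len(body) sweeps, each over at most len(body) subgoals
        progress = False
        for literal in list(remaining):
            if can_run(literal, bound, patterns):
                remaining.remove(literal)
                ordered.append(literal)
                bound |= {t for t in literal[1] if is_var(t)}
                progress = True
    return ordered


B_LIT = ("B", ("a", "t", "s", "p"), True)
L_LIT = ("L", ("t",), True)
Q4_RULE1 = [B_LIT, L_LIT, ("C", ("b", "t"), True)]
Q4_RULE2 = [B_LIT, L_LIT, ("C", ("a", "t"), False)]
```

On the two rules of Q4 this yields L, B for the first rule and L, B, ¬C for the second. For a UCQ¬ query, the function would be applied to each rule in turn.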

Query Containment

• Q1 is contained in Q2 (Q1 ⊑ Q2) if, for every database D, Q1(D) ⊆ Q2(D).
• Q1 is equivalent to Q2 (Q1 ≡ Q2) if Q1 ⊑ Q2 and Q2 ⊑ Q1.
• Checking containment of CQ or UCQ is NP-complete.
• Checking containment of CQ¬ or UCQ¬ is Π^P_2-complete.
• Π^P_2 := coNP^NP. That is, Π^P_2 is what can be computed in coNP with access to an NP oracle.
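
For the negation-free case only, containment can be decided with the classical homomorphism test: Q1 ⊑ Q2 holds exactly when the terms of Q2 can be mapped onto terms of Q1 so that the head of Q2 maps to the head of Q1 and every body atom of Q2 lands on a body atom of Q1. The brute-force Python sketch below illustrates that test. It does not handle negation, so it is not the Π^P_2 procedure, and the (head, body) encoding with quoted constants is an assumption of this example.

```python
def is_var(term):
    """Assumed convention: constants are written with quotes, e.g. "'Knuth'"."""
    return not term.startswith("'")


def extend(mapping, sources, targets):
    """Extend the mapping so each source term maps to its target; None on conflict."""
    mapping = dict(mapping)
    for src, dst in zip(sources, targets):
        if is_var(src):
            if mapping.setdefault(src, dst) != dst:
                return None
        elif src != dst:
            return None  # a constant may only map to itself
    return mapping


def contained_in(q1, q2):
    """True iff q1 is contained in q2, for negation-free conjunctive queries."""
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    def search(index, mapping):
        if index == len(body2):
            return True
        relation, args = body2[index]
        for relation1, args1 in body1:
            if relation1 == relation and len(args1) == len(args):
                extended = extend(mapping, args, args1)
                if extended is not None and search(index + 1, extended):
                    return True
        return False

    start = extend({}, head2, head1)
    return start is not None and search(0, start)


# Books by 'Knuth' versus all books, both projected on title and publisher.
KNUTH = (("t", "p"), [("B", ("'Knuth'", "t", "s", "p"))])
ALL = (("t", "p"), [("B", ("a", "t", "s", "p"))])
```

The search tries every assignment of body atoms in the worst case, which is consistent with the NP-completeness stated above.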

Outline

• We introduced access patterns.
• We defined executable, orderable, and feasible. Executable and orderable are syntactic notions. Feasible is a semantic notion that depends on equivalence.
• We have shown how to compute ans(Q), the "answerable part" of Q.
• We present our main results.
• We show how to compute the underestimate query Q^u and the overestimate query Q^o.
• We show how to use Q^u and Q^o at runtime.

Summary of Definitions

• A UCQ¬ query is safe if every variable appears positively in the body.
• A UCQ¬ query is executable if every variable appears first, positively, in an output slot in the body.
• A UCQ¬ query is orderable if its subgoals can be reordered to get an executable query.
• A UCQ¬ query is feasible if it is equivalent to an executable UCQ¬ query.
• ans(Q) is the answerable part of Q.

The Feasibility Problem

Given Q ∈ UCQ¬, determine whether Q is feasible.
• Known: NP-complete for CQ and UCQ.
• We show: Π^P_2-complete for CQ¬ and UCQ¬.

Results

Assume Q, E ∈ UCQ¬.
• ans(Q) is executable.
• Q is feasible iff ans(Q) ≡ Q.
• That is, UCQ¬ feasibility reduces to containment.
• Feasibility of CQ¬ and UCQ¬: Π^P_2-complete.
• If Q ⊑ E and E is executable, then Q ⊑ ans(Q) ⊑ E. That is, if there is a minimal executable query containing Q, it is equivalent to ans(Q).

Compile-time vs. Runtime

• At compile time (we have the query Q but no database D) we can check whether Q is feasible.
• At runtime (we have the query Q and a database D) we can check whether we have Q(D), regardless of whether Q is feasible.
• If we do not have exactly Q(D), we can often quantify how close we are.

Under- and Overestimates

• Access patterns: S^o, R^oo, B^ii, T^oo
• Q(x, y) ← ¬S(y), R(x, y), B(x, z)
  Q(x, y) ← T(x, y)
• Underestimate:
    Q^u(x, y) ← T(x, y)
• Overestimate:
    Q^o(x, y) ← R(x, y), ¬S(y)
    Q^o(x, y) ← T(x, y)

Example

• Q(x, y) ← ¬S(y), R(x, y), B(x, z)
  Q(x, y) ← T(x, y)
• Q^u(x, y) ← T(x, y)
• Q^o(x, y) ← R(x, y), ¬S(y)
  Q^o(x, y) ← T(x, y)

[Figure: sample instances of T, R, S, and B together with the resulting answers Q^u, Q^o, Δ, and Q.]
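
To see how the two estimates bracket the true answer, the queries can be evaluated directly on small relations. The instances below are hypothetical and are not the values from the original figure. The true answer is computed only for comparison, since it needs a scan of B that the pattern B^ii does not allow.

```python
# Hypothetical instances (not the values shown in the talk).
T = {(1, 2)}
R = {(1, 2), (3, 4), (5, 6), (7, 8)}
S = {6}
B = {(3, 9)}  # reachable only through B^ii, i.e. with both arguments known

# Q^u(x, y) <- T(x, y)
q_under = set(T)

# Q^o(x, y) <- R(x, y), not S(y)   and   Q^o(x, y) <- T(x, y)
q_over = {(x, y) for (x, y) in R if y not in S} | T

# Tuples that may or may not belong to the answer.
delta = q_over - q_under

# Reference only: Q(x, y) <- not S(y), R(x, y), B(x, z)   and   Q(x, y) <- T(x, y)
q_true = {(x, y) for (x, y) in R
          if y not in S and any(bx == x for (bx, _) in B)} | T

print("underestimate:", sorted(q_under))
print("overestimate: ", sorted(q_over))
print("delta:        ", sorted(delta))
```

On this data the underestimate is strictly smaller than the true answer, which in turn is strictly smaller than the overestimate.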

Algorithm Answer*

procedure ANSWER*(Q)
  (Q^u, Q^o) := PLAN*(Q)
  ans_u := ANSWER(Q^u, D)
  ans_o := ANSWER(Q^o, D)
  Δ := ans_o \ ans_u
  output ans_u
  if Δ = ∅ then
    output "answer is complete"
  else
    output "answer not known complete;"
    output "possibly part of the answer:"
    output Δ
    if Δ has no null values then
      output "answer at least" |ans_u| / |ans_o| "complete"
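
The reporting part of this procedure is easy to sketch once the two answer sets have been computed. The Python function below is a hypothetical rendering: it takes ans_u and ans_o as sets of tuples, represents null as `None`, and returns a dictionary rather than printing, all of which are choices made for this example.

```python
def answer_star(ans_u, ans_o):
    """Compare the under- and overestimate answers, following the ANSWER* steps."""
    delta = ans_o - ans_u
    report = {
        "certain": ans_u,        # tuples known to be in the answer
        "possible": delta,       # tuples that might be in the answer
        "complete": not delta,   # True when the two estimates coincide
        "ratio": None,           # lower bound on completeness, when known
    }
    has_null = any(value is None for row in delta for value in row)
    if delta and not has_null:
        # ans_o is non-empty here because delta is a non-empty subset of it.
        report["ratio"] = len(ans_u) / len(ans_o)
    return report
```

For instance, with one certain tuple and one additional possible tuple that contains no null, the reported ratio is 0.5, meaning the answer is at least half complete.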

Algorithm Plan*

procedure PLAN*(Q)
  for i := 1 to n do
    A_i := ANSWERABLE(Q_i, P)
    U_i := Q_i \ A_i
    Q^u_i := A_i if U_i = ∅, ⊥ otherwise
    v̄ := x̄ \ vars(A_i)
    Q^o_i := A_i and (v̄ = null)
  Q^u := Q^u_1 ∨ ... ∨ Q^u_n
  Q^o := Q^o_1 ∨ ... ∨ Q^o_n
  output Q^u, Q^o
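
The planning step can be sketched in Python by splitting each rule into its answerable and unanswerable subgoals. This is a minimal illustration under several assumptions: rules are encoded as (head, body) pairs with (relation, args, positive) literals and quoted constants, null is represented as `None`, the unsatisfiability check is omitted, and the function names are hypothetical.

```python
PATTERNS = {"S": ["o"], "R": ["oo"], "B": ["ii"], "T": ["oo"]}


def is_var(term):
    """Assumed convention: constants are written with quotes, e.g. "'Knuth'"."""
    return not term.startswith("'")


def can_run(literal, bound, patterns):
    """True if the literal can be executed given the variables bound so far."""
    relation, args, positive = literal
    if not positive:
        return bool(patterns.get(relation)) and all(
            t in bound for t in args if is_var(t))
    return any(
        all(mode == "o" or not is_var(t) or t in bound
            for t, mode in zip(args, pattern))
        for pattern in patterns.get(relation, [])
    )


def split_answerable(body, patterns):
    """Return (answerable subgoals in executable order, unanswerable subgoals)."""
    remaining, ordered, bound = list(body), [], set()
    progress = True
    while progress:
        progress = False
        for literal in list(remaining):
            if can_run(literal, bound, patterns):
                remaining.remove(literal)
                ordered.append(literal)
                bound |= {t for t in literal[1] if is_var(t)}
                progress = True
    return ordered, remaining


def plan_star(rules, patterns):
    """Return (under, over), each a list of (head, body) rules."""
    under, over = [], []
    for head, body in rules:
        answerable, unanswerable = split_answerable(body, patterns)
        if not unanswerable:
            under.append((head, answerable))  # otherwise Q^u_i is false and adds nothing
        covered = {t for _, args, _ in answerable for t in args if is_var(t)}
        # Head variables not produced by the answerable part are set to null.
        over_head = tuple(v if v in covered else None for v in head)
        over.append((over_head, answerable))
    return under, over


Q = [
    (("x", "y"), [("S", ("y",), False), ("R", ("x", "y"), True), ("B", ("x", "z"), True)]),
    (("x", "y"), [("T", ("x", "y"), True)]),
]
UNDER, OVER = plan_star(Q, PATTERNS)
```

On the example query, `UNDER` contains only the T rule, and `OVER` contains the rule R(x, y), ¬S(y) together with the T rule, matching the under- and overestimates shown earlier.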

Algorithm Feasible*

procedure FEASIBLE*(Q)
  (Q^u, Q^o) := PLAN*(Q)
  if Q^u = Q^o then
    return true
  else if Q^o contains null then
    return false
  else
    return Q^o ⊑ Q

Summary

• Feasibility of UCQ¬: Π^P_2-complete.
• Feasibility of CQ¬: Π^P_2-complete.
• ans(Q): minimal executable query containing Q.
• Unified algorithm for CQ, UCQ, CQ¬, UCQ¬.
• Runtime approximations.

Future Work

• Beyond UCQ¬: FO (PODS 2004)
• Integrity constraints
• Views with access patterns
• Recursion
