Incremental View Maintenance for openCypher Queries

4th openCypher Implementers Meeting

Gábor Szárnyas, József Marton

MODEL-DRIVEN ENGINEERING
• Primarily for designing critical systems
• Models are first-class citizens during development
  o SysML / requirements, statecharts, etc.
  o Validation and code generation techniques for correctness

Technology: Eclipse Modeling Framework (EMF)
• Originally started at IBM as an implementation of the Object Management Group's (OMG) Meta Object Facility (MOF)
• i.e. an object-oriented model
• i.e. a property graph-like structure with a metamodel

MODEL VALIDATION
• Implemented with model queries
• Models are typed, attributed graphs

Complex graph queries
• Typical queries
  o Get two components connected by a particular edge
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s)
  o Check if two objects are reachable
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s)
  o Property checks
    MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y)

RAILWAY NETWORK MODEL
[Figure: railway network with route 1 and route 2, sensors A, B and C, segments, and a switch with switchPosition «diverging» and «straight».]
[Figure: graph pattern with nodes route: Route, swP: SwitchPosition, sw: Switch and sensor: Sensor, and edges :FOLLOWS, :TARGET, :MONITORED_BY and :REQUIRES.]

MATCH (route:Route)
  -[:FOLLOWS]->(swP:SwitchPosition)
  -[:TARGET]->(sw:Switch)
  -[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw

G. Szárnyas, B. Izsó, I. Ráth, D. Varró: The Train Benchmark: cross-technology performance evaluation of continuous model queries. Software and Systems Modeling, 2017


INCREMENTAL VIEW MAINTENANCE (IVM)
In many use cases…
• queries are static
• data changes slowly
-> views can be maintained incrementally
Graph applications
• model validation
• simulation
• recommendation systems
• fraud detection

INGRAPH: IVM ON PROPERTY GRAPHS
Idea: map to relational algebra and use standard IVM techniques
• Challenging aspects
  o Property graph data model
  o Cypher language
• Formalise the language in relational algebra
• Use nested relational algebra -> closed on operations
Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks)
Gábor Szárnyas, József Marton, Dániel Varró: Formalising openCypher Graph Queries in Relational Algebra. ADBIS 2017

INGRAPH / GRAPH TO NESTED RELATIONS

INGRAPH / NESTED RELATIONAL ALGEBRA OPS

INGRAPH
• ingraph uses a procedural IVM approach: the Rete algorithm.
  o Build caches for each operator
  o Maintain caches upon changes
  o Supports 15+ out of 25 LDBC BI queries
  o Details to be published in a conference paper
  o Extensible, but very heavy on memory
• The rest of the talk focuses on the algebraic approach.
Gábor Szárnyas, József Marton et al.: Incremental View Maintenance on Property Graphs. arXiv preprint, to be available in the first week of June

Delta Queries for openCypher

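DELTA QUERIES AT A GLANCE
• Evaluate query $Q$ on the graph $G$ to get $Q(G)$.
• For each of the changes $\Delta G_1, \Delta G_2, \ldots$, evaluate $\Delta Q$.
$Q(G),\ \Delta Q(\Delta G_1),\ \Delta Q(\Delta G_2),\ \ldots \Rightarrow Q(G + \Delta G_1 + \Delta G_2 + \cdots)$
$Q$ and $\Delta Q$ are calculated by the same engine.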
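The loop sketched on this slide can be illustrated with a small Python model. This is only a sketch under assumptions: the `Edge` type, the selection query `q` and the insert-only change sets are hypothetical stand-ins, chosen because the delta query of a selection is simply the same filter applied to the change.

```python
from typing import Iterable, NamedTuple


class Edge(NamedTuple):
    source: int
    type: str
    target: int


def q(graph: Iterable[Edge]) -> set[Edge]:
    """Q: select the edges of type 'a' (a deliberately simple, hypothetical query)."""
    return {e for e in graph if e.type == "a"}


def delta_q(inserted: Iterable[Edge]) -> set[Edge]:
    """Delta query of a selection: apply the same filter to the inserted edges only."""
    return q(inserted)


graph = {Edge(1, "a", 2), Edge(2, "b", 3)}
view = q(graph)  # Q(G), evaluated once on the initial graph

changes = [{Edge(3, "a", 4)}, {Edge(4, "b", 5), Edge(5, "a", 6)}]
for delta_g in changes:
    view |= delta_q(delta_g)  # maintain the view from the change alone
    graph |= delta_g          # the graph itself is updated as well

# The maintained view equals the query recomputed on the final graph.
assert view == q(graph)
```

Joins and antijoins need more elaborate delta queries than this selection, which is what the rest of the talk derives.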

IMPLEMENTATION: TRIGGERS IN NEO4J
• Event-driven programming in databases
• Neo4j: TransactionEventHandler interface
  o afterCommit(TransactionData data, T state)
  o beforeCommit(TransactionData data)
  o TransactionData contains $\Delta G$: createdNodes, deletedNodes, …
• Only the updated state of the graph is accessible.
• GraphAware framework: ImprovedTransactionData API
  o Get properties and labels/types of deleted elements
Michal Bachman: Neo4j Improved Transaction Event API. 2014
Max de Marzi: Triggers in Neo4j. 2015

DERIVING DELTA QUERIES
Idea: given query $Q$, derive delta queries $\Delta Q$ and $\nabla Q$, which define positive and negative changes, respectively.
But: most IVM techniques are defined for relational algebra.
Notation:
• $R$: relation
• $\Delta R$: positive changes
• $\nabla R$: negative changes
• $R^m$: maintained relation of $R$ ⇒ $R^m = R - \nabla R + \Delta R$
• "−" denotes set minus ($\setminus$), "+" denotes set union ($\cup$)
Example with schema (a, b):
• $R = \{(1, 2), (3, 4), (5, 6)\}$
• $\nabla R = \{(3, 4)\}$
• $\Delta R = \{(7, 8)\}$
• $R^m = \{(1, 2), (5, 6), (7, 8)\}$
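As a minimal sketch of this notation, the maintained relation can be computed with Python sets. The function name `maintain` is hypothetical; the data is the (a, b) example of the slide.

```python
def maintain(r: set, nabla_r: set, delta_r: set) -> set:
    """Return R^m = R - nabla R + Delta R ('-' is set minus, '+' is set union)."""
    return (r - nabla_r) | delta_r


r = {(1, 2), (3, 4), (5, 6)}
nabla_r = {(3, 4)}  # negative changes
delta_r = {(7, 8)}  # positive changes

r_m = maintain(r, nabla_r, delta_r)
print(sorted(r_m))  # [(1, 2), (5, 6), (7, 8)]

# Rearranged form, R = R^m - Delta R + nabla R. It holds here because the
# deleted tuples were in R and the inserted tuples were not.
assert (r_m - delta_r) | nabla_r == r
```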

RELATIONAL ALGEBRA FOR CYPHER
• Query plans in Neo4j ≅ relational algebra + Expand/VarExpand.
• Expand is essentially a natural join.
• Natural join: $r \bowtie s$
• Semijoin: $r \ltimes s = \pi_R(r \bowtie s)$
• Antijoin: $r \,\overline{\ltimes}\, s = r \setminus (r \ltimes s)$
• Left outer join: $r ⟕ s \cong (r \bowtie s) \cup (r \,\overline{\ltimes}\, s)$ // plus nulls

Andrés Taylor: Neo4j Cypher implementation. First openCypher Implementers Meeting, 2017

RELATIONAL ALGEBRA FOR CYPHER
Example graph: (1)-[:r]->(2), (4)-[:r]->(5), (2)-[:s]->(3), (2)-[:s]->(6)

Natural join: $r \bowtie s$
MATCH (v1)-[:r]->(v2)-[:s]->(v3) RETURN *
  v1 | v2 | v3
  1  | 2  | 3
  1  | 2  | 6

Semijoin: $r \ltimes s$
MATCH (v1)-[:r]->(v2) WHERE (v2)-[:s]->() RETURN *
  v1 | v2
  1  | 2

Antijoin: $r \,\overline{\ltimes}\, s$
MATCH (v1)-[:r]->(v2) WHERE NOT (v2)-[:s]->() RETURN *
  v1 | v2
  4  | 5

Left outer join: $r ⟕ s$
MATCH (v1)-[:r]->(v2) OPTIONAL MATCH (v2)-[:s]->(v3) RETURN *
  v1 | v2 | v3
  1  | 2  | 3
  1  | 2  | 6
  4  | 5  | null
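The four operators can be reproduced on the example graph with a few lines of Python. This is an illustrative sketch only: edges are modelled as plain tuples, the function names are hypothetical, and the join attribute is fixed to v2.

```python
# Edge relations of the example graph: r(v1, v2) and s(v2, v3).
r = {(1, 2), (4, 5)}
s = {(2, 3), (2, 6)}


def natural_join(r, s):
    """r ⋈ s: combine the tuples that agree on v2."""
    return {(v1, v2, v3) for (v1, v2) in r for (w2, v3) in s if v2 == w2}


def semijoin(r, s):
    """r ⋉ s: the tuples of r that have at least one match in s."""
    keys = {v2 for (v2, _) in s}
    return {(v1, v2) for (v1, v2) in r if v2 in keys}


def antijoin(r, s):
    """Antijoin: the tuples of r without any match in s, i.e. r minus (r ⋉ s)."""
    return r - semijoin(r, s)


def left_outer_join(r, s):
    """r ⟕ s: the natural join plus the unmatched tuples of r, padded with None."""
    return natural_join(r, s) | {(v1, v2, None) for (v1, v2) in antijoin(r, s)}


assert natural_join(r, s) == {(1, 2, 3), (1, 2, 6)}
assert semijoin(r, s) == {(1, 2)}
assert antijoin(r, s) == {(4, 5)}
assert left_outer_join(r, s) == {(1, 2, 3), (1, 2, 6), (4, 5, None)}
```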

DERIVING DELTA QUERIES
X. Qian, G. Wiederhold: Incremental Recomputation of Active Relational Expressions. TKDE 1991
T. Griffin, L. Libkin, H. Trickey: An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions. TKDE 1997
T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995
T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998

DELTA QUERIES
• The seminal paper
• $\Delta$/$\nabla$ delta queries for joins, selections, projections, etc.
• Bag semantics
T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995

DELTA QUERIES
• Semijoins, antijoins, outer joins
• Set semantics
• Later publications, e.g. Zhou and Larson's ICDE '07 paper, improved these
T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998

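EXAMPLE QUERY #1
[Figure: path pattern (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4).]
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
• $a(v_1, v_2)$
• $b(v_2, v_3)$
• $c(v_3, v_4)$
Relational algebra expression: $a \bowtie b \bowtie c$
$\Delta(a \bowtie b \bowtie c)$
$= (a \bowtie b)^m \bowtie \Delta c + \Delta(a \bowtie b) \bowtie c^m$
$= (a \bowtie b)^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
$= a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
$\nabla(a \bowtie b \bowtie c)$ is derived similarly.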
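The three-term result of this derivation can be checked against recomputation on a toy instance. The sketch below assumes insert-only changes and set semantics; `join3` and the data are hypothetical, and the edge relations are plain sets of (source, target) pairs.

```python
from itertools import product


def join3(a, b, c):
    """a ⋈ b ⋈ c over edge relations a(v1, v2), b(v2, v3), c(v3, v4)."""
    return {
        (v1, v2, v3, v4)
        for (v1, v2), (w2, v3), (w3, v4) in product(a, b, c)
        if v2 == w2 and v3 == w3
    }


# Old state and positive changes (the inserted edges are new to each relation).
a, b, c = {(1, 2)}, {(2, 3)}, {(3, 4)}
da, db, dc = {(0, 2)}, {(2, 5)}, {(5, 6), (3, 7)}

# Maintained relations: only these are available after the update.
am, bm, cm = a | da, b | db, c | dc

# Delta(a ⋈ b ⋈ c) = a^m ⋈ b^m ⋈ Δc + a^m ⋈ Δb ⋈ c^m + Δa ⋈ b^m ⋈ c^m
delta = join3(am, bm, dc) | join3(am, db, cm) | join3(da, bm, cm)

# The delta contains exactly the tuples gained by recomputing from scratch.
assert delta == join3(am, bm, cm) - join3(a, b, c)
```

The three terms overlap, for example (0, 2, 5, 6) appears in all of them, which is harmless under set semantics.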

EXAMPLE QUERY #1
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
• $a(v_1, v_2)$
• $b(v_2, v_3)$
• $c(v_3, v_4)$
$\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$

UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4

$pcs -> pass lists of nodes/edges as parameters
// This only works in embedded mode, see neo4j/issues/10239

POSITIVE DELTA QUERY FOR $a \bowtie b \bowtie c$
// r1 = a⋈b⋈Δc
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r2 = a⋈Δb⋈c
UNWIND $pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r3 = Δa⋈b⋈c
UNWIND $pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4

POSITIVE DELTA QUERY FOR $a \bowtie b \bowtie c$
Long WITH chains are cumbersome -> patterns + list comprehensions.

WITH
  [pc IN $pcs | // r1 = a⋈b⋈Δc
    [(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) | [v1, v2, v3, v4]]] +
  [pb IN $pbs | // r2 = a⋈Δb⋈c
    [(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] +
  [pa IN $pas | // r3 = Δa⋈b⋈c
    [(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]]
  AS rs
// each element of rs is the list of matches for one changed edge
UNWIND rs AS matches
UNWIND matches AS r
RETURN r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4

EXAMPLE QUERY #2
MATCH (route:Route)
  -[:FOLLOWS]->(swP:SwitchPosition)
  -[:TARGET]->(sw:Switch)
  -[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw

The same pattern with generic names:
MATCH (v1)
  -[:a]->(v2)
  -[:b]->(v3)
  -[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
[Figure: the two graph patterns side by side; route, swP, sw and sensor correspond to v1, v2, v3 and v4, and :FOLLOWS, :TARGET, :MONITORED_BY and :REQUIRES correspond to :a, :b, :c and :d.]

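NEGATIVE CONDITIONS
[Figure: path pattern (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) with a :d edge from v1 to v4.]
MATCH (v1)
  -[:a]->(v2)
  -[:b]->(v3)
  -[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
⇒ $a \bowtie b \bowtie c \,\overline{\ltimes}\, d$
• $a(v_1, v_2)$
• $b(v_2, v_3)$
• $c(v_3, v_4)$
• $d(v_1, v_4)$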

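DELTA QUERIES FOR JOINS AND ANTIJOINS
Natural join
• $\Delta(S \bowtie T) = \Delta S \bowtie T^m + S^m \bowtie \Delta T$
• $\nabla(S \bowtie T) = \nabla S \bowtie T + S \bowtie \nabla T$
Antijoin
• $\Delta(S \,\overline{\ltimes}\, T) = (S - \nabla S) \ltimes (\nabla T \,\overline{\ltimes}\, T^m) + \Delta S \,\overline{\ltimes}\, T^m$
• $\nabla(S \,\overline{\ltimes}\, T) = (S - \nabla S) \ltimes (\Delta T \,\overline{\ltimes}\, T) + \nabla S \,\overline{\ltimes}\, T$
[Slide annotations on these formulas: Expression 1, Expression 2.]
Only $S^m$ and $T^m$ are available.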

SUBEXPRESSIONS
1. $\Delta T \,\overline{\ltimes}\, T = ?$
• Consider $R_1 \,\overline{\ltimes}\, R_2$, where $R_1$ and $R_2$ both have schema $R$.
• $R_1 \,\overline{\ltimes}_\theta\, R_2 = R_1 - \pi_R(R_1 \bowtie_\theta R_2) = R_1 - (R_1 \bowtie_\theta R_2)$
• If $\theta$ defines equality on all attributes of $R$, the theta join ($\bowtie_\theta$) becomes a natural join, which is an intersection for relations with the same schema.
• $R_1 \bowtie_\theta R_2 = R_1 \bowtie R_2 = R_1 \cap R_2$
• $R_1 - (R_1 \bowtie_\theta R_2) = R_1 - (R_1 \cap R_2) = R_1 - R_2$
⇒ (∗) $R_1 \,\overline{\ltimes}_\theta\, R_2 = R_1 - R_2$
2. $R^m = R - \nabla R + \Delta R$ ⇒ (∗∗) $R = R^m - \Delta R + \nabla R$

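DELTAS FOR ANTIJOINS
Based on Griffin and Kumar's '98 paper.
(∗) $R_1 \,\overline{\ltimes}_\theta\, R_2 = R_1 - R_2$
(∗∗) $R = R^m - \Delta R + \nabla R$

$\Delta(S \,\overline{\ltimes}\, T)$
$= (S - \nabla S) \ltimes (\nabla T \,\overline{\ltimes}\, T^m) + \Delta S \,\overline{\ltimes}\, T^m$
$= (S - \nabla S) \ltimes \nabla T + \Delta S \,\overline{\ltimes}\, T^m$ by (∗), as $\nabla T - T^m = \nabla T$: deleted tuples are not in $T^m$
$= (S^m - \Delta S) \ltimes \nabla T + \Delta S \,\overline{\ltimes}\, T^m$ by (∗∗)

$\nabla(S \,\overline{\ltimes}\, T)$
$= (S - \nabla S) \ltimes (\Delta T \,\overline{\ltimes}\, T) + \nabla S \,\overline{\ltimes}\, T$
$= (S - \nabla S) \ltimes \Delta T + \nabla S \,\overline{\ltimes}\, T$ by (∗), as $\Delta T - T = \Delta T$: inserted tuples were not in $T$
$= (S^m - \Delta S) \ltimes \Delta T + \nabla S \,\overline{\ltimes}\, (T^m - \Delta T + \nabla T)$ by (∗∗)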
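The final forms of both antijoin deltas, which use only the maintained relations and the change sets, can be sanity-checked against recomputation. The sketch below rests on assumptions: set semantics, a join condition that covers all attributes of T (so that (∗) applies), and change sets that are disjoint in the usual way. The relation layout and function names are hypothetical: S tuples are (v1, v4, payload) and T tuples are (v1, v4).

```python
def semijoin(s, t):
    """S ⋉ T: tuples of S whose (v1, v4) prefix occurs in T."""
    return {row for row in s if row[:2] in t}


def antijoin(s, t):
    """Antijoin: tuples of S whose (v1, v4) prefix does not occur in T."""
    return s - semijoin(s, t)


# Old state.
s = {(1, 4, "x"), (2, 5, "y"), (3, 6, "z")}
t = {(1, 4), (3, 6)}

# Changes: nabla = deleted tuples, delta = inserted tuples.
nabla_s, delta_s = {(3, 6, "z")}, {(7, 8, "w"), (9, 9, "u")}
nabla_t, delta_t = {(1, 4)}, {(2, 5), (9, 9)}

# Maintained relations, R^m = R - nabla R + Delta R.
s_m = (s - nabla_s) | delta_s
t_m = (t - nabla_t) | delta_t

# Positive delta: (S^m - ΔS) ⋉ ∇T, plus the new S tuples without a match in T^m.
pos = semijoin(s_m - delta_s, nabla_t) | antijoin(delta_s, t_m)
# Negative delta: (S^m - ΔS) ⋉ ΔT, plus the deleted S tuples without a match in old T.
neg = semijoin(s_m - delta_s, delta_t) | antijoin(nabla_s, (t_m - delta_t) | nabla_t)

old_view, new_view = antijoin(s, t), antijoin(s_m, t_m)
assert pos == new_view - old_view
assert neg == old_view - new_view
assert (old_view - neg) | pos == new_view
```

On this instance the positive delta is {(1, 4, "x"), (7, 8, "w")} and the negative delta is {(2, 5, "y")}.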

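NEGATIVE CONDITIONS
[Figure: path pattern (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4) with a :d edge from v1 to v4.]
• $a(v_1, v_2)$
• $b(v_2, v_3)$
• $c(v_3, v_4)$
• $d(v_1, v_4)$
$\Delta(a \bowtie b \bowtie c \,\overline{\ltimes}\, d) = \left((a \bowtie b \bowtie c)^m - \Delta(a \bowtie b \bowtie c)\right) \ltimes \nabla d + \Delta(a \bowtie b \bowtie c) \,\overline{\ltimes}\, d^m$
• $(a \bowtie b \bowtie c)^m \ltimes \nabla d = (a^m \bowtie b^m \bowtie c^m) \ltimes \nabla d$
  Pushdown $\nabla d$: $(a^m \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)$
• $\Delta(a \bowtie b \bowtie c) \ltimes \nabla d$, with $\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$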

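NEGATIVE CONDITIONS
• $a(v_1, v_2)$
• $b(v_2, v_3)$
• $c(v_3, v_4)$
• $d(v_1, v_4)$
$\Delta(a \bowtie b \bowtie c \,\overline{\ltimes}\, d) = (R_1 - (R_2 + R_3 + R_4)) + R_5 + R_6 + R_7$, where the terms correspond to s1…s4 and r1…r7 in the Cypher query:
• $S_1 = a^m \ltimes \nabla d$
• $S_2 = \Delta a \ltimes \nabla d$
• $S_3 = c^m \ltimes \nabla d$
• $S_4 = \Delta c \ltimes \nabla d$
• $R_1 = S_1 \bowtie b^m \bowtie S_3$
• $R_2 = S_1 \bowtie b^m \bowtie S_4$
• $R_3 = S_1 \bowtie \Delta b \bowtie S_3$
• $R_4 = S_2 \bowtie b^m \bowtie S_3$
• $R_5 = (a^m \bowtie b^m \bowtie \Delta c) \,\overline{\ltimes}\, d^m$
• $R_6 = (a^m \bowtie \Delta b \bowtie c^m) \,\overline{\ltimes}\, d^m$
• $R_7 = (\Delta a \bowtie b^m \bowtie c^m) \,\overline{\ltimes}\, d^m$
[Slide note, symbols lost in extraction: "…, where … ∩ … is a single vertex, because … and … represent edges."]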

POSITIVE DELTA QUERY FOR $a \bowtie b \bowtie c \,\overline{\ltimes}\, d$
// pes, nes: lists of inserted and deleted edges
WITH [] AS pes, [] AS nes
WITH
  [pe IN pes WHERE type(pe) = 'a' | pe] AS pas,
  [pe IN pes WHERE type(pe) = 'b' | pe] AS pbs,
  [pe IN pes WHERE type(pe) = 'c' | pe] AS pcs,
  [pe IN pes WHERE type(pe) = 'd' | pe] AS pds,
  [ne IN nes WHERE type(ne) = 'a' | ne] AS nas,
  [ne IN nes WHERE type(ne) = 'b' | ne] AS nbs,
  [ne IN nes WHERE type(ne) = 'c' | ne] AS ncs,
  [ne IN nes WHERE type(ne) = 'd' | ne] AS nds
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds,
  [nd IN nds | startNode(nd)] AS nd_v1s,
  [nd IN nds | endNode(nd)] AS nd_v4s

// calculating s1s...s4s
// s1s: (a⋉∇d)
UNWIND nd_v1s AS v1
MATCH (v1)-[:a]->(v2)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s,
  collect({v1: v1, v2: v2}) AS s1s
// s2s: (Δa⋉∇d)
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)
WHERE v1 IN nd_v1s
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s,
  collect({v1: v1, v2: v2}) AS s2s
// s3s: (c⋉∇d)
UNWIND nd_v4s AS v4
MATCH (v3)-[:c]->(v4)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s,
  collect({v3: v3, v4: v4}) AS s3s
// s4s: (Δc⋉∇d), v4 must be the end node of a deleted d edge
UNWIND pcs AS pc
MATCH (v3)-[pc]->(v4)
WHERE v4 IN nd_v4s
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s,
  collect({v3: v3, v4: v4}) AS s4s

// calculating r1...r7
// r1: (a⋉∇d) ⋈ b ⋈ (c⋉∇d)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
  s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
  collect([v1, v2, v3, v4]) AS r1
// r2: -(a⋉∇d) ⋈ b ⋈ (Δc⋉∇d)
UNWIND s1s AS s1
UNWIND s4s AS s4
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1,
  s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1,
  collect([v1, v2, v3, v4]) AS r2
// r3: -(a⋉∇d) ⋈ Δb ⋈ (c⋉∇d)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2,
  s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
MATCH (v2)-[b:b]->(v3)
WHERE b IN pbs
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2,
  collect([v1, v2, v3, v4]) AS r3
// r4: -(Δa⋉∇d) ⋈ b ⋈ (c⋉∇d)
UNWIND s2s AS s2
UNWIND s3s AS s3
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3,
  s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3,
  collect([v1, v2, v3, v4]) AS r4
// r5: a ⋈ b ⋈ Δc, antijoined with d
UNWIND pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4,
  collect([v1, v2, v3, v4]) AS r5
// r6: a ⋈ Δb ⋈ c, antijoined with d
UNWIND pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5,
  collect([v1, v2, v3, v4]) AS r6
// r7: Δa ⋈ b ⋈ c, antijoined with d
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH r1, r2, r3, r4, r5, r6,
  collect([v1, v2, v3, v4]) AS r7

// (r1 - (r2 + r3 + r4)) + r5 + r6 + r7, following the derivation
WITH r1, r2 + r3 + r4 AS rn, r5 + r6 + r7 AS rd
RETURN [r IN r1 WHERE NOT r IN rn] + rd AS results

POSITIVE DELTA QUERY FOR $a \bowtie b \bowtie c \,\overline{\ltimes}\, d$
• Workaround: knowing the change workload helps
  o Only consider changes in $\Delta d$ and $\nabla d$
  o Query is cleaner and much more efficient

UNWIND $nds AS nd
WITH startNode(nd) AS v1, endNode(nd) AS v4
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4

• This can outperform recomputing the query from scratch.
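The idea behind this workaround can be mimicked in Python to see why it is cheap: the search starts from the endpoints of the deleted d edges instead of scanning all paths. This is a sketch under the assumption that only d edges are deleted and a, b and c stay unchanged; the helper names and the data are hypothetical.

```python
from collections import defaultdict


def adjacency(edges):
    """Index a set of (source, target) pairs by source node."""
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)
    return out


def query(a, b, c, d):
    """Full evaluation: a-b-c paths from v1 to v4 without a d edge from v1 to v4."""
    a_out, b_out, c_out = adjacency(a), adjacency(b), adjacency(c)
    return {
        (v1, v2, v3, v4)
        for v1 in list(a_out)
        for v2 in a_out[v1]
        for v3 in b_out.get(v2, ())
        for v4 in c_out.get(v3, ())
        if (v1, v4) not in d
    }


def positive_delta(a, b, c, nabla_d):
    """Paths that appear in the result because their d edge was deleted."""
    a_out, b_out, c_out = adjacency(a), adjacency(b), adjacency(c)
    result = set()
    for v1, v4 in nabla_d:  # start from the endpoints of each deleted d edge
        for v2 in a_out.get(v1, ()):
            for v3 in b_out.get(v2, ()):
                if v4 in c_out.get(v3, ()):
                    result.add((v1, v2, v3, v4))
    return result


a, b, c = {(1, 2), (5, 6)}, {(2, 3), (6, 7)}, {(3, 4), (7, 8)}
d = {(1, 4), (5, 8)}
nabla_d = {(1, 4)}
d_m = d - nabla_d

delta = positive_delta(a, b, c, nabla_d)
assert delta == query(a, b, c, d_m) - query(a, b, c, d)
```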

LIST OPERATIONS IN CYPHER
Instead of subqueries, use chained queries and combine lists.

WITH [1, 2, 3] AS xs, [2] AS ys
RETURN
  xs + ys AS append,
  [x IN xs WHERE NOT x IN ys] AS subtraction,
  [x IN xs WHERE x IN ys] AS intersection

WITH [1, 1, 2, 2, 3] AS xs
RETURN reduce(acc = [], x IN xs |
  acc + CASE x IN acc WHEN false THEN [x] ELSE [] END) AS unique

Get unique list in openCypher
WITH [1, 1, 2, 2, 3] AS xs
UNWIND xs AS x
RETURN collect(DISTINCT x) AS unique

DELTA QUERIES IN CYPHER
Delta queries are complex. Features that would be nice:
• Subqueries // pattern comprehensions go some way
• Named subqueries // help reusability
• Subtracting lists // related: CIR-2017-180
• Use collection elements or function results for matching
  Desired:
  MATCH (n) WITH collect(n) AS ns MATCH (ns[0]) RETURN *
  Current workaround:
  MATCH (n) WITH collect(n) AS ns WITH ns[0] AS x MATCH (x) RETURN *

These are probably too much to ask. -> recommended approach: compile directly to query plans.

CHALLENGES FOR PROPERTY GRAPH QUERIES
Data model
• NF2 (Non-First Normal Form): maps, lists
• No schema (schema-optionality)
• Graph structure
Queries
• Nulls, antijoins and left outer joins
• Updates on property values
• Aggregates on aggregates, non-distributive functions
• Ordering + skip/limit
• Reachability queries

CHALLENGES FOR PROPERTY GRAPH QUERIES
Data model
• NF2
• No schema
• Graph
Queries
• Nulls
• Updates
• Aggregates
• Ordering
• Reachability

Decades of research -> 2 long surveys

A. Gupta, I. S. Mumick: Materialized Views. MIT Press, 1999
R. Chirkova, J. Yang: Materialized Views. Foundations and Trends in Databases, 2012

OUR SURVEY OF RELATED IVM TECHNIQUES

DBTOASTER
• Shows the scale of the problem
• Relational data model and SQL queries
• R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.
• Approach
  o Queries over an algebraic ring
  o Higher-order recursive IVM
• Compiler in OCaml
• Backend with code generation for C++, Scala/Spark
Christoph Koch et al.: DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. VLDB Journal 2014

FUTURE DIRECTIONS
• Work out derivation rules for Expand/VarExpand, …
• Automate delta query derivation
• Integrate into Neo4j
• Run performance experiments
  o Train Benchmark (set semantics)
  o LDBC Social Network Benchmark's BI workload (bag semantics)

Short news

LDBC BENCHMARKS
• Social Network Benchmark
  o Business Intelligence workload published
  o openCypher reference implementation
  o Next goal: full conference paper
• Graphalytics
  o Competition is online at graphalytics.org
  o Neo4j implementation (using the Graph Algorithms library) WIP
Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.: An early look at the LDBC Social Network Benchmark's BI Workload. GRADES-NDA at SIGMOD, 2018
ldbc/ldbc_snb_implementations
graphalytics-platforms-neo4j/pull/6

GRAPH ANALYTICS ON THE PANAMA PAPERS
• Network science approach: multidimensional graph metrics from social network analysis, biology, physics, etc.
• Our work originally targeted software and system models.
• Progress in 2018
  o Q1: implemented adapters for Neo4j and CSV
  o Q2 goal: analyse the Panama Papers using these metrics
Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró: Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics. MODELS 2016
ftsrg/model-analyzer

MAPPING CYPHER TO SQL
• Evaluate graph queries in an RDB, similar to ORM
• Approaches
  o Cytosm: Cypher to SQL Mapper / gTop: graph topology
  o GraphGen: extracting graphs from RDBs
  o Ongoing work to map TCK to SQLite
B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne: Cytosm: Declarative Property Graph Queries Without Data Migration. GRADES 2017
cytosm/cytosm
K. Xirogiannopoulos, V. Srinivas, A. Deshpande: GraphGen: Adaptive Graph Processing using Relational Databases. GRADES 2017
KonstantinosX/graphgen-project

NEO4J APOC LIBRARY
• CSV loader that follows the schema of the neo4j-import tool
• Goal
  o Use headers to generate LOAD CSV commands.
  o 1st pass: CALL apoc.import.csv.node(file, labels, …)
  o 2nd pass: CALL apoc.import.csv.relationship(file, type, …)
• Result
  o Many corner cases -> ~700 LOC + tests
  o Covers most use cases, but is very slow
  o APOC PR pending
neo4j-apoc-procedures/pull/581
neo4j-documentation/pull/121

SCOPING FOR OPENCYPHER
[Figure: scoping example with variables p, c and x.]
• Xtext grammar for the Slizaa software analysis workbench
• Progress in 2018
  o Q1: scope analyser implemented for Cypher grammar of M05
  o Q2 goal: update to M10
slizaa-opencypher-xtext/issues/7