4th openCypher Implementers Meeting
Incremental View Maintenance for openCypher Queries
Gábor Szárnyas, József Marton
MODEL-DRIVEN ENGINEERING
Primarily for designing critical systems
Models are first-class citizens during development
  o SysML / requirements, statecharts, etc.
  o Validation and code generation techniques for correctness
Technology: Eclipse Modeling Framework (EMF)
  o Originally started at IBM as an implementation of the Object Management Group's (OMG) Meta Object Facility (MOF)
  o i.e. an object-oriented model
  o i.e. a property graph-like structure with a metamodel
MODEL VALIDATION
Implemented with model queries
Models are typed, attributed graphs
Complex graph queries
Typical queries
  o Get two components connected by a particular edge
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E]->(s)
  o Check if two objects are reachable
    MATCH (r:R)…(s:S) WHERE NOT (r)-[:E1|E2*]->(s)
  o Property checks
    MATCH (r:R)-->(s:S) WHERE r.a = 'x' OR (s:Y)
RAILWAY NETWORK MODEL
[Figure: railway network with route 1 and route 2, sensors A, B and C, segments, and a switch with switchPosition «diverging» and «straight».]
[Figure: query pattern. route:Route -[:FOLLOWS]-> swP:SwitchPosition -[:TARGET]-> sw:Switch -[:MONITORED_BY]-> sensor:Sensor, with a :REQUIRES edge from route to sensor.]
MATCH (route:Route)
  -[:FOLLOWS]->(swP:SwitchPosition)
  -[:TARGET]->(sw:Switch)
  -[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
G. Szárnyas, B. Izsó, I. Ráth, D. Varró: The Train Benchmark: cross-technology performance evaluation of continuous model queries. Software and Systems Modeling, 2017
INCREMENTAL VIEW MAINTENANCE (IVM)
In many use cases…
  o queries are static
  o data changes slowly
  -> views can be maintained incrementally
Graph applications
  o model validation
  o simulation
  o recommendation systems
  o fraud detection
INGRAPH: IVM ON PROPERTY GRAPHS
Idea: map to relational algebra and use standard IVM techniques
Challenging aspects
  o Property graph data model
  o Cypher language
Formalise the language in relational algebra
Use nested relational algebra -> closed on operations
Prototype tool: ingraph (OCIM1, OCIM2, GraphConnect talks)
Gábor Szárnyas, József Marton, Dániel Varró: Formalising openCypher Graph Queries in Relational Algebra. ADBIS 2017
INGRAPH / GRAPH TO NESTED RELATIONS
INGRAPH / NESTED RELATIONAL ALGEBRA OPS
INGRAPH
ingraph uses a procedural IVM approach: the Rete algorithm.
  o Build caches for each operator
  o Maintain caches upon changes
  o Supports 15+ out of 25 LDBC BI queries
  o Details to be published in a conference paper
  o Extensible, but very heavy on memory
The rest of the talk focuses on the algebraic approach.
Gábor Szárnyas, József Marton et al.: Incremental View Maintenance on Property Graphs. arXiv preprint will be available in the 1st week of June
Delta Queries for openCypher
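DELTA QUERIES AT A GLANCE
Changes: $\Delta G_1, \Delta G_2, \ldots$
Evaluate query $Q$ on $G$; then, for each $\Delta G_i$, evaluate $\Delta Q$.
$Q(G),\ \Delta Q(\Delta G_1),\ \Delta Q(\Delta G_2),\ \ldots \;\Rightarrow\; Q(G + \Delta G_1 + \Delta G_2 + \cdots)$
$Q$ and $\Delta Q$ are calculated by the same engine.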
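As a rough illustration of the scheme above, the following Python sketch maintains a view over a toy graph. The edge representation, the query `q` and the helper `delta_q` are hypothetical and chosen only for this example; the sketch covers insertions only.

```python
# Hypothetical toy setting (not from the talk): the graph is a set of
# (source, type, target) edges and Q returns the pairs joined by an "E" edge.

def q(graph):
    """Evaluate the query from scratch over the whole graph."""
    return {(s, t) for (s, typ, t) in graph if typ == "E"}

def delta_q(delta_graph):
    """Evaluate the delta query over the inserted edges only."""
    # For this selection-like query the delta query has the same shape as Q,
    # so the same evaluation logic ("the same engine") can be reused.
    return q(delta_graph)

graph = {(1, "E", 2), (2, "F", 3)}
view = q(graph)                                          # Q(G), computed once

changes = [{(3, "E", 4)}, {(4, "F", 5), (5, "E", 6)}]    # ΔG1, ΔG2
for delta_graph in changes:
    view |= delta_q(delta_graph)                         # fold ΔQ(ΔGi) into the view
    graph |= delta_graph                                 # G + ΔGi

recomputed = q(graph)                                    # Q(G + ΔG1 + ΔG2)
```

After the loop, `view` and `recomputed` hold the same set, although `q` was run over the full graph only once during maintenance.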
IMPLEMENTATION: TRIGGERS IN NEO4J
Event-driven programming in databases
Neo4j: TransactionEventHandler interface
  o afterCommit(TransactionData data, T state)
  o beforeCommit(TransactionData data)
  o TransactionData contains $\Delta G$: createdNodes, deletedNodes, …
Only the updated state of the graph is accessible.
GraphAware framework: ImprovedTransactionData API
  o Get properties and labels/types of deleted elements
Michal Bachman: Neo4j Improved Transaction Event API. 2014
Max de Marzi: Triggers in Neo4j. 2015
DERIVING DELTA QUERIES
Idea: given query $Q$, derive delta queries $\Delta Q$ and $\nabla Q$, which define positive and negative changes, respectively.
But: most IVM techniques are defined for relational algebra.
Notation:
  o $R$: relation
  o $\Delta R$: positive changes
  o $\nabla R$: negative changes
  o $R^m$: maintained relation of $R$ $\Rightarrow R^m = R - \nabla R + \Delta R$
  o "−" denotes set minus (∖), "+" denotes set union (∪)
Example over attributes (a, b):
  $R = \{(1,2), (3,4), (5,6)\}$, $\nabla R = \{(3,4)\}$, $\Delta R = \{(7,8)\}$ $\Rightarrow R^m = \{(1,2), (5,6), (7,8)\}$
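The notation can be tried out with plain Python sets. This is a minimal sketch; the tuple values are illustrative, and it assumes that deleted tuples come from the relation while inserted tuples are new to it.

```python
# Relation R over attributes (a, b), with negative and positive changes.
# Assumptions for this sketch: nabla_R ⊆ R and delta_R ∩ R = ∅.
R = {(1, 2), (3, 4), (5, 6)}
nabla_R = {(3, 4)}        # negative changes (deleted tuples)
delta_R = {(7, 8)}        # positive changes (inserted tuples)

# Maintained relation: R^m = R − ∇R + ΔR ("−" is set minus, "+" is union).
R_m = (R - nabla_R) | delta_R

# Under the assumptions above the step can be undone: R = R^m − ΔR + ∇R.
R_restored = (R_m - delta_R) | nabla_R
```

Here `R_m` is `{(1, 2), (5, 6), (7, 8)}` and `R_restored` equals the original `R`.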
RELATIONAL ALGEBRA FOR CYPHER
Query plans in Neo4j ≅ relational algebra + Expand/VarExpand. Expand is essentially a natural join.
  o Natural join: $r \bowtie s$
  o Semijoin: $r \ltimes s = \pi_R(r \bowtie s)$
  o Antijoin: $r \,\overline{\ltimes}\, s = r \setminus (r \ltimes s)$
  o Left outer join: $r ⟕ s \cong (r \bowtie s) \cup (r \,\overline{\ltimes}\, s)$ // plus nulls
Andrés Taylor: Neo4j Cypher implementation. First openCypher Implementers Meeting, 2017
RELATIONAL ALGEBRA FOR CYPHER
Example graph: (1)-[:r]->(2), (4)-[:r]->(5), (2)-[:s]->(3), (2)-[:s]->(6)
Natural join: $r \bowtie s$
  MATCH (v1)-[:r]->(v2)-[:s]->(v3) RETURN *
  Result (v1, v2, v3): (1, 2, 3), (1, 2, 6)
Semijoin: $r \ltimes s$
  MATCH (v1)-[:r]->(v2) WHERE (v2)-[:s]->()
  Result (v1, v2): (1, 2)
Antijoin: $r \,\overline{\ltimes}\, s$
  MATCH (v1)-[:r]->(v2) WHERE NOT (v2)-[:s]->()
  Result (v1, v2): (4, 5)
Left outer join: $r ⟕ s$
  MATCH (v1)-[:r]->(v2) OPTIONAL MATCH (v2)-[:s]->(v3)
  Result (v1, v2, v3): (1, 2, 3), (1, 2, 6), (4, 5, null)
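The four operators can be mimicked over edge relations with Python set comprehensions. The sketch below assumes a small example graph with edges (1)-[:r]->(2), (4)-[:r]->(5), (2)-[:s]->(3) and (2)-[:s]->(6); `None` stands in for Cypher's null.

```python
# Edge relations of the assumed example graph.
r = {(1, 2), (4, 5)}   # r(v1, v2)
s = {(2, 3), (2, 6)}   # s(v2, v3)

# Natural join on the shared attribute v2.
natural_join = {(v1, v2, v3) for (v1, v2) in r for (w, v3) in s if v2 == w}

# Semijoin: tuples of r that have at least one join partner in s.
semijoin = {(v1, v2) for (v1, v2) in r if any(v2 == w for (w, _) in s)}

# Antijoin: tuples of r without any join partner in s.
antijoin = r - semijoin

# Left outer join: the natural join plus unmatched r tuples padded with null.
left_outer_join = natural_join | {(v1, v2, None) for (v1, v2) in antijoin}
```

The antijoin is defined through the semijoin, mirroring $r \setminus (r \ltimes s)$, and the left outer join reuses both results.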
DERIVING DELTA QUERIES
X. Qian, G. Wiederhold: Incremental Recomputation of Active Relational Expressions. TKDE 1991
T. Griffin, L. Libkin, H. Trickey: An Improved Algorithm for the Incremental Recomputation of Active Relational Expressions. TKDE 1997
T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995
T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998
DELTA QUERIES
The seminal paper
  o $\Delta$/$\nabla$ delta queries for joins, selections, projections, etc.
  o Bag semantics
T. Griffin, L. Libkin: Incremental Maintenance of Views with Duplicates. SIGMOD 1995
DELTA QUERIES
Semijoins, antijoins, outer joins
  o Set semantics
  o Later publications, e.g. Zhou-Larson's ICDE'07 paper, improved these
T. Griffin, B. Kumar: Algebraic Change Propagation for Semijoin and Outerjoin Queries. SIGMOD Record 1998
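EXAMPLE QUERY #1
[Figure: pattern (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)]
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
Relations: $a(v_1, v_2)$, $b(v_2, v_3)$, $c(v_3, v_4)$
Relational algebra expression: $a \bowtie b \bowtie c$
$\Delta(a \bowtie b \bowtie c)$
  $= (a \bowtie b)^m \bowtie \Delta c + \Delta(a \bowtie b) \bowtie c^m$
  $= (a \bowtie b)^m \bowtie \Delta c + (a^m \bowtie \Delta b) \bowtie c^m + (\Delta a \bowtie b^m) \bowtie c^m$
  $= a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
$\nabla(a \bowtie b \bowtie c)$ is derived similarly.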
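The final form of the derivation can be checked on a small instance. The sketch below uses made-up edge sets and a hypothetical helper `join3`; it covers insertions only and assumes set semantics, so the negative delta is not exercised.

```python
def join3(a, b, c):
    """a(v1, v2) ⋈ b(v2, v3) ⋈ c(v3, v4) as a set of 4-tuples."""
    return {(v1, v2, v3, v4)
            for (v1, v2) in a
            for (w2, v3) in b if w2 == v2
            for (w3, v4) in c if w3 == v3}

# Old relations and positive changes (illustrative values only).
a, b, c = {(1, 2)}, {(2, 3)}, {(3, 4)}
da, db, dc = {(5, 2)}, {(2, 6)}, {(6, 7), (3, 8)}

# Maintained relations: with no deletions, R^m = R + ΔR.
a_m, b_m, c_m = a | da, b | db, c | dc

# Δ(a⋈b⋈c) = a^m⋈b^m⋈Δc + a^m⋈Δb⋈c^m + Δa⋈b^m⋈c^m
delta = join3(a_m, b_m, dc) | join3(a_m, db, c_m) | join3(da, b_m, c_m)

old_view = join3(a, b, c)
new_view = join3(a_m, b_m, c_m)
```

On this instance `delta` contains exactly the five tuples of `new_view - old_view`, so adding it to the old view gives the same result as recomputing the join.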
EXAMPLE QUERY #1
$\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$
The first term, $a^m \bowtie b^m \bowtie \Delta c$, as a Cypher query:
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
$pcs -> pass lists of nodes/edges as parameters
// This only works in embedded mode, see neo4j/issues/10239
POSITIVE DELTA QUERY FOR $a \bowtie b \bowtie c$
// r1 = a⋈b⋈Δc
UNWIND $pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r2 = a⋈Δb⋈c
UNWIND $pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
UNION ALL
// r3 = Δa⋈b⋈c
UNWIND $pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
POSITIVE DELTA QUERY FOR $a \bowtie b \bowtie c$
Long WITH chains are cumbersome -> patterns + list comprehensions.
WITH
  // r1 = a⋈b⋈Δc
  [pc IN $pcs | [(v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4) | [v1, v2, v3, v4]]] +
  // r2 = a⋈Δb⋈c
  [pb IN $pbs | [(v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]] +
  // r3 = Δa⋈b⋈c
  [pa IN $pas | [(v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4) | [v1, v2, v3, v4]]]
  AS rss
// each element of rss is the list of matches for one changed edge
UNWIND rss AS rs
UNWIND rs AS r
RETURN r[0] AS v1, r[1] AS v2, r[2] AS v3, r[3] AS v4
EXAMPLE QUERY #2
[Figure: Train Benchmark pattern. route:Route -[:FOLLOWS]-> swP:SwitchPosition -[:TARGET]-> sw:Switch -[:MONITORED_BY]-> sensor:Sensor, with a :REQUIRES edge from route to sensor.]
MATCH (route:Route)
  -[:FOLLOWS]->(swP:SwitchPosition)
  -[:TARGET]->(sw:Switch)
  -[:MONITORED_BY]->(sensor:Sensor)
WHERE NOT (route)-[:REQUIRES]->(sensor)
RETURN route, sensor, swP, sw
The same query with abstract names:
[Figure: pattern (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4), with a :d edge from v1 to v4.]
MATCH (v1)
  -[:a]->(v2)
  -[:b]->(v3)
  -[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
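NEGATIVE CONDITIONS
[Figure: pattern (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4), with a :d edge from v1 to v4.]
MATCH (v1)
  -[:a]->(v2)
  -[:b]->(v3)
  -[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
RETURN v1, v2, v3, v4
Relations: $a(v_1, v_2)$, $b(v_2, v_3)$, $c(v_3, v_4)$, $d(v_1, v_4)$
$\Rightarrow (a \bowtie b \bowtie c) \,\overline{\ltimes}\, d$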
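DELTA QUERIES FOR JOINS AND ANTIJOINS
Natural join
  $\Delta(S \bowtie T) = \Delta S \bowtie T^m + S^m \bowtie \Delta T$
  $\nabla(S \bowtie T) = \nabla S \bowtie T + S \bowtie \nabla T$
Antijoin
  $\Delta(S \,\overline{\ltimes}\, T) = (S - \nabla S) \ltimes (\nabla T \,\overline{\ltimes}\, T^m) + \Delta S \,\overline{\ltimes}\, T^m$
  $\nabla(S \,\overline{\ltimes}\, T) = (S - \nabla S) \ltimes (\Delta T \,\overline{\ltimes}\, T) + \nabla S \,\overline{\ltimes}\, T$
  Expression 1: $\nabla T \,\overline{\ltimes}\, T^m$
  Expression 2: $\Delta T \,\overline{\ltimes}\, T$
Only $S^m$ and $T^m$ are available.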
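SUBEXPRESSIONS
1. $\Delta T \,\overline{\ltimes}\, T = ?$
   Consider $R_1 \,\overline{\ltimes}_\theta\, R_2$, where $R_1$ and $R_2$ both have schema $R$.
   $R_1 \,\overline{\ltimes}_\theta\, R_2 = R_1 - \pi_R(R_1 \bowtie_\theta R_2) = R_1 - (R_1 \bowtie_\theta R_2)$
   If $\theta$ defines equality on all attributes of $R$, the theta join ($\bowtie_\theta$) becomes a natural join, which is an intersection for relations with the same schema.
   $R_1 \bowtie_\theta R_2 = R_1 \bowtie R_2 = R_1 \cap R_2$
   $R_1 - (R_1 \bowtie_\theta R_2) = R_1 - (R_1 \cap R_2) = R_1 - R_2$
   $\Rightarrow$ (*) $R_1 \,\overline{\ltimes}_\theta\, R_2 = R_1 - R_2$
2. $R^m = R - \nabla R + \Delta R$
   $\Rightarrow$ (**) $R = R^m - \Delta R + \nabla R$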
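DELTAS FOR ANTIJOINS
Based on Griffin-Kumar's '98 paper.
Positive changes:
  $\Delta(S \,\overline{\ltimes}\, T) = (S - \nabla S) \ltimes (\nabla T \,\overline{\ltimes}\, T^m) + \Delta S \,\overline{\ltimes}\, T^m$
  by (*), as $\nabla T - T^m = \nabla T$: $= (S - \nabla S) \ltimes \nabla T + \Delta S \,\overline{\ltimes}\, T^m$
  by (**): $= (S^m - \Delta S) \ltimes \nabla T + \Delta S \,\overline{\ltimes}\, T^m$
Negative changes:
  $\nabla(S \,\overline{\ltimes}\, T) = (S - \nabla S) \ltimes (\Delta T \,\overline{\ltimes}\, T) + \nabla S \,\overline{\ltimes}\, T$
  by (*), as $\Delta T - T = \Delta T$: $= (S - \nabla S) \ltimes \Delta T + \nabla S \,\overline{\ltimes}\, T$
  by (**): $= (S^m - \Delta S) \ltimes \Delta T + \nabla S \,\overline{\ltimes}\, (T^m - \Delta T + \nabla T)$
Both results refer only to $S^m$, $T^m$ and the change sets.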
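A small numeric check of the simplified antijoin deltas, as a sketch under stated assumptions: the relation contents and the helpers `semijoin`/`antijoin` are made up for illustration, T's whole schema (v1, v4) is the join key (the setting in which (*) applies), and the old T is recovered via (**) as $T^m - \Delta T + \nabla T$.

```python
def semijoin(S, T):
    """Tuples (v1, x, v4) of S whose key (v1, v4) occurs in T."""
    return {s for s in S if (s[0], s[2]) in T}

def antijoin(S, T):
    """Tuples of S whose key (v1, v4) does not occur in T."""
    return S - semijoin(S, T)

# Old state (illustrative values only).
S = {(1, "p", 4), (2, "q", 5), (3, "r", 6), (10, "u", 11)}
T = {(1, 4), (3, 6)}

# Change sets: deletions come from the old state, insertions are new.
nabla_S, delta_S = {(2, "q", 5)}, {(7, "s", 8), (9, "t", 9)}
nabla_T, delta_T = {(1, 4)}, {(7, 8), (10, 11)}

# Maintained relations: the only full relations available after the update.
S_m = (S - nabla_S) | delta_S
T_m = (T - nabla_T) | delta_T

# Δ(S antijoin T) = (S^m − ΔS) ⋉ ∇T + ΔS antijoin T^m
pos = semijoin(S_m - delta_S, nabla_T) | antijoin(delta_S, T_m)

# ∇(S antijoin T) = (S^m − ΔS) ⋉ ΔT + ∇S antijoin (T^m − ΔT + ∇T)
neg = semijoin(S_m - delta_S, delta_T) | antijoin(nabla_S, (T_m - delta_T) | nabla_T)

old_view = antijoin(S, T)
new_view = antijoin(S_m, T_m)
```

On this instance `pos` equals `new_view - old_view` and `neg` equals `old_view - new_view`, so applying both to the old view yields the recomputed one.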
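NEGATIVE CONDITIONS
[Figure: pattern (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4), with a :d edge from v1 to v4.]
Relations: $a(v_1, v_2)$, $b(v_2, v_3)$, $c(v_3, v_4)$, $d(v_1, v_4)$
$\Delta((a \bowtie b \bowtie c) \,\overline{\ltimes}\, d)$
  $= ((a \bowtie b \bowtie c)^m - \Delta(a \bowtie b \bowtie c)) \ltimes \nabla d + \Delta(a \bowtie b \bowtie c) \,\overline{\ltimes}\, d^m$
  $= (a^m \bowtie b^m \bowtie c^m) \ltimes \nabla d - \Delta(a \bowtie b \bowtie c) \ltimes \nabla d + \Delta(a \bowtie b \bowtie c) \,\overline{\ltimes}\, d^m$
Pushdown of $\nabla d$: $(a^m \bowtie b^m \bowtie c^m) \ltimes \nabla d = (a^m \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)$
where $\Delta(a \bowtie b \bowtie c) = a^m \bowtie b^m \bowtie \Delta c + a^m \bowtie \Delta b \bowtie c^m + \Delta a \bowtie b^m \bowtie c^m$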
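NEGATIVE CONDITIONS
Fully expanded, with $\nabla d$ pushed down:
$\Delta((a \bowtie b \bowtie c) \,\overline{\ltimes}\, d) = R_1 - R_2 - R_3 - R_4 + R_5 + R_6 + R_7$
  $R_1 = (a^m \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)$
  $R_2 = (a^m \ltimes \nabla d) \bowtie b^m \bowtie (\Delta c \ltimes \nabla d)$
  $R_3 = (a^m \ltimes \nabla d) \bowtie \Delta b \bowtie (c^m \ltimes \nabla d)$
  $R_4 = (\Delta a \ltimes \nabla d) \bowtie b^m \bowtie (c^m \ltimes \nabla d)$
  $R_5 = (a^m \bowtie b^m \bowtie \Delta c) \,\overline{\ltimes}\, d^m$
  $R_6 = (a^m \bowtie \Delta b \bowtie c^m) \,\overline{\ltimes}\, d^m$
  $R_7 = (\Delta a \bowtie b^m \bowtie c^m) \,\overline{\ltimes}\, d^m$
Shared subexpressions:
  $S_1 = a^m \ltimes \nabla d$, $S_2 = \Delta a \ltimes \nabla d$, $S_3 = c^m \ltimes \nabla d$, $S_4 = \Delta c \ltimes \nabla d$
Each of these semijoins matches on a single vertex, because its operands represent edges: $a$ and $d$ share only $v_1$, while $c$ and $d$ share only $v_4$.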
POSITIVE DELTA QUERY FOR $(a \bowtie b \bowtie c) \,\overline{\ltimes}\, d$
WITH [] AS pes, [] AS nes
WITH
  [pe IN pes WHERE type(pe) = 'a' | pe] AS pas,
  [pe IN pes WHERE type(pe) = 'b' | pe] AS pbs,
  [pe IN pes WHERE type(pe) = 'c' | pe] AS pcs,
  [pe IN pes WHERE type(pe) = 'd' | pe] AS pds,
  [ne IN nes WHERE type(ne) = 'a' | ne] AS nas,
  [ne IN nes WHERE type(ne) = 'b' | ne] AS nbs,
  [ne IN nes WHERE type(ne) = 'c' | ne] AS ncs,
  [ne IN nes WHERE type(ne) = 'd' | ne] AS nds
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds,
  [nd IN nds | startNode(nd)] AS nd_v1s,
  [nd IN nds | endNode(nd)] AS nd_v4s

// calculating s1s...s4s
// s1s: (a⋉∇d)
UNWIND nd_v1s AS v1
MATCH (v1)-[:a]->(v2)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s,
  collect({v1: v1, v2: v2}) AS s1s
// s2s: (Δa⋉∇d)
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)
WHERE v1 IN nd_v1s
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s,
  collect({v1: v1, v2: v2}) AS s2s
// s3s: (c⋉∇d)
UNWIND nd_v4s AS v4
MATCH (v3)-[:c]->(v4)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s,
  collect({v3: v3, v4: v4}) AS s3s
// s4s: (Δc⋉∇d)
UNWIND pcs AS pc
MATCH (v3)-[pc]->(v4)
WHERE v4 IN nd_v4s
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s,
  collect({v3: v3, v4: v4}) AS s4s

// calculating r1...r7
// r1: (a⋉∇d) ⋈ b ⋈ (c⋉∇d)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
  s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s,
  collect([v1, v2, v3, v4]) AS r1
// r2: -(a⋉∇d) ⋈ b ⋈ (Δc⋉∇d)
UNWIND s1s AS s1
UNWIND s4s AS s4
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1,
  s1.v1 AS v1, s1.v2 AS v2, s4.v3 AS v3, s4.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1,
  collect([v1, v2, v3, v4]) AS r2
// r3: -(a⋉∇d) ⋈ Δb ⋈ (c⋉∇d)
UNWIND s1s AS s1
UNWIND s3s AS s3
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2,
  s1.v1 AS v1, s1.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
MATCH (v2)-[b:b]->(v3)
WHERE b IN pbs
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2,
  collect([v1, v2, v3, v4]) AS r3
// r4: -(Δa⋉∇d) ⋈ b ⋈ (c⋉∇d)
UNWIND s2s AS s2
UNWIND s3s AS s3
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3,
  s2.v1 AS v1, s2.v2 AS v2, s3.v3 AS v3, s3.v4 AS v4
WHERE (v2)-[:b]->(v3)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3,
  collect([v1, v2, v3, v4]) AS r4
// r5: (a ⋈ b ⋈ Δc) antijoin d
UNWIND pcs AS pc
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[pc]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4,
  collect([v1, v2, v3, v4]) AS r5
// r6: (a ⋈ Δb ⋈ c) antijoin d
UNWIND pbs AS pb
MATCH (v1)-[:a]->(v2)-[pb]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5,
  collect([v1, v2, v3, v4]) AS r6
// r7: (Δa ⋈ b ⋈ c) antijoin d
UNWIND pas AS pa
MATCH (v1)-[pa]->(v2)-[:b]->(v3)-[:c]->(v4)
WHERE NOT (v1)-[:d]->(v4)
WITH pas, nas, pbs, nbs, pcs, ncs, pds, nds, nd_v1s, nd_v4s, s1s, s2s, s3s, s4s, r1, r2, r3, r4, r5, r6,
  collect([v1, v2, v3, v4]) AS r7

WITH r1 + r5 + r6 + r7 AS rp, r2 + r3 + r4 AS rn
RETURN [r IN rp WHERE NOT r IN rn] AS results
POSITIVE DELTA QUERY FOR $(a \bowtie b \bowtie c) \,\overline{\ltimes}\, d$
Workaround: knowing the change workload helps
  o Only consider changes in $\Delta d$ and $\nabla d$
  o Query is cleaner and much more efficient
UNWIND $nds AS nd
WITH startNode(nd) AS v1, endNode(nd) AS v4
MATCH (v1)-[:a]->(v2)-[:b]->(v3)-[:c]->(v4)
RETURN v1, v2, v3, v4
This can outperform recomputing the query from scratch.
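The idea behind the workaround can be sketched in Python: when only d edges are deleted, each deleted edge (v1, v4) fixes both endpoints of the path, so only paths between those endpoints need to be searched. The edge sets and the helper `paths_between` are hypothetical, and the sketch assumes set semantics for d (no second d edge remains between the same pair).

```python
def paths_between(a, b, c, v1, v4):
    """All (v1, v2, v3, v4) matches of a ⋈ b ⋈ c with fixed endpoints."""
    return {(v1, v2, v3, v4)
            for (s1, v2) in a if s1 == v1
            for (s2, v3) in b if s2 == v2
            for (s3, t3) in c if s3 == v3 and t3 == v4}

# Illustrative graph: two a-b-c paths from vertex 1 to vertex 4.
a = {(1, 2), (1, 5)}
b = {(2, 3), (5, 3)}
c = {(3, 4)}
nabla_d = {(1, 4)}     # the deleted d edges, i.e. the $nds parameter

# Positive delta of the view: paths whose blocking d edge was just deleted.
delta = set()
for (v1, v4) in nabla_d:
    delta |= paths_between(a, b, c, v1, v4)
```

The work done is proportional to the neighbourhood of the deleted edges rather than to the size of the whole graph, which is where the speed-up over recomputation would come from.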
LIST OPERATIONS IN CYPHER
Instead of subqueries, use chained queries and combine lists.
WITH [1, 2, 3] AS xs, [2] AS ys
RETURN
  xs + ys AS append,
  [x IN xs WHERE NOT x IN ys] AS subtraction,
  [x IN xs WHERE x IN ys] AS intersection
Get unique list with reduce
WITH [1, 1, 2, 2, 3] AS xs
RETURN reduce(acc = [], x IN xs |
  acc + CASE x IN acc WHEN false THEN [x] ELSE [] END) AS unique
Get unique list in openCypher
WITH [1, 1, 2, 2, 3] AS xs
UNWIND xs AS x
RETURN collect(DISTINCT x) AS unique
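For readers who think in a general-purpose language, the same list idioms look as follows in Python. This is only an analogy to the Cypher expressions above, using the same input values.

```python
from functools import reduce

xs, ys = [1, 2, 3], [2]

append = xs + ys                                   # like xs + ys in Cypher
subtraction = [x for x in xs if x not in ys]       # [x IN xs WHERE NOT x IN ys]
intersection = [x for x in xs if x in ys]          # [x IN xs WHERE x IN ys]

# Order-preserving de-duplication, mirroring the reduce(...) expression:
# an element is appended to the accumulator only if it is not there yet.
dups = [1, 1, 2, 2, 3]
unique = reduce(lambda acc, x: acc + ([x] if x not in acc else []), dups, [])
```

As in Cypher, `append` keeps duplicates (bag union), while `subtraction` removes every occurrence of the elements found in `ys`.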
DELTA QUERIES IN CYPHER
Delta queries are complex. Features that would be nice:
  o Subqueries // pattern comprehensions go some way
  o Named subqueries // help reusability
  o Subtracting lists // related: CIR-2017-180
  o Use collection elements or function results for matching
Desired:
MATCH (n)
WITH collect(n) AS ns
MATCH (ns[0])
RETURN *
Currently required:
MATCH (n)
WITH collect(n) AS ns
WITH ns[0] AS x
MATCH (x)
RETURN *
These are probably too much to ask.
-> recommended approach: compile directly to query plans.
CHALLENGES FOR PROPERTY GRAPH QUERIES
Data model
  o NF2 (Non-First Normal Form): maps, lists
  o No schema (schema-optionality)
  o Graph structure
Queries
  o Nulls, antijoins and left outer joins
  o Updates on property values
  o Aggregates on aggregates, non-distributive functions
  o Ordering + skip/limit
  o Reachability queries
Decades of research -> 2 long surveys
A. Gupta, I. S. Mumick: Materialized Views. MIT Press, 1999
R. Chirkova, J. Yang: Materialized Views. Foundations and Trends in Databases, 2012
OUR SURVEY OF RELATED IVM TECHNIQUES
DBTOASTER
Shows the scale of the problem
Relational data model and SQL queries
R&D for ~5 years @ EPFL, Johns Hopkins, Cornell, etc.
Approach
  o Queries over an algebraic ring
  o Higher-order recursive IVM
Compiler in OCaml
Backend with code generation for C++, Scala/Spark
Christoph Koch et al.: DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views. VLDB Journal 2014
FUTURE DIRECTIONS
Work out derivation rules for Expand/VarExpand, …
Automate delta query derivation
Integrate with Neo4j
Run performance experiments
  o Train Benchmark (set semantics)
  o LDBC Social Network Benchmark's BI workload (bag semantics)
Short news
LDBC BENCHMARKS
Social Network Benchmark
  o Business Intelligence workload published
  o openCypher reference implementation
  o Next goal: full conference paper
Graphalytics
  o Competition is online at graphalytics.org
  o Neo4j implementation (using the Graph Algorithms library) WIP
Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton et al.: An early look at the LDBC Social Network Benchmark's BI Workload. GRADES-NDA at SIGMOD, 2018
ldbc/ldbc_snb_implementations
graphalytics-platforms-neo4j/pull/6
GRAPH ANALYTICS ON THE PANAMA PAPERS
Network science approach: multidimensional graph metrics from social network analysis, biology, physics, etc.
Our work originally targeted software and system models.
Progress in 2018
  o Q1: implemented adapters for Neo4j and CSV
  o Q2 goal: analyse the Panama Papers using these metrics
Gábor Szárnyas, Zsolt Kővári, Ágnes Salánki, Dániel Varró: Towards the Characterization of Realistic Models: Evaluation of Multidisciplinary Graph Metrics. MODELS 2016
ftsrg/model-analyzer
MAPPING CYPHER TO SQL
Evaluate graph queries in an RDB - similar to ORM
Approaches
  o Cytosm: Cypher to SQL Mapper / gTop: graph topology
  o GraphGen: extracting graphs from RDBs
  o Ongoing work to map TCK to SQLite
B. A. Steer, A. Alnaimi, M. Lotz, F. Cuadrado, L. Vaquero, J. Varvenne: Cytosm: Declarative Property Graph Queries Without Data Migration. GRADES 2017
cytosm/cytosm
K. Xirogiannopoulos, V. Srinivas, A. Deshpande: GraphGen: Adaptive Graph Processing using Relational Databases. GRADES 2017
KonstantinosX/graphgen-project
NEO4J APOC LIBRARY
CSV loader that follows the schema of the neo4j-import tool
Goal
  o Use headers to generate LOAD CSV commands.
  o 1st pass: CALL apoc.import.csv.node(file, labels, …)
  o 2nd pass: CALL apoc.import.csv.relationship(file, type, …)
Result
  o Many corner cases -> ~700 LOC + tests
  o Covers most use cases, but is very slow
  o APOC PR pending
neo4j-apoc-procedures/pull/581
neo4j-documentation/pull/121
SCOPING FOR OPENCYPHER
Xtext grammar for the Slizaa software analysis workbench
Progress in 2018
  o Q1: scope analyser implemented for Cypher grammar of M05
  o Q2 goal: update to M10
slizaa-opencypher-xtext/issues/7