Property graphs with time
Julia Stoyanovich, joint work with Vera Moffitt
Drexel University, Philadelphia, PA, USA
stoyanovich.org
October 25, 2017
openCypher Meetup
[Figure: panels labeled 2007, 2008, 2009, 2010, 2011]
https://www.kenedict.com/apples-internal-innovation-network-unraveled-part-1-evolving-networks/
https://arxiv.org/abs/1709.06176
Exploratory analysis of evolving graphs
• Which nodes are showing an increasing popularity trend?
• At what time scale can interesting trends be observed?
• Have any changes in network connectivity been observed?
• How can multiple data sources be used jointly to complement or corroborate information about network evolution?
Goal
Principled and systematic support for usable, scalable, and extensible analysis of evolving graphs
Are Alice and Bill connected?
… by a path? TNGP
Snapshot reducibility
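
A minimal sketch of snapshot reducibility, assuming a toy evolving graph whose edges carry half-open validity intervals [start, end): the temporal query returns, for each time instant, the answer that the ordinary non-temporal query gives on the snapshot at that instant. The data and the names `edges`, `snapshot`, `connected`, and `connected_over_time` are illustrative assumptions and are not taken from Portal.

```python
from collections import defaultdict, deque

# Hypothetical evolving graph: (source, target, start, end), valid on [start, end).
edges = [
    ("Alice", "Cathy", 1, 4),
    ("Cathy", "Bill", 3, 6),
    ("Alice", "Bill", 7, 8),
]

def snapshot(edges, t):
    """Undirected adjacency sets of the graph as it exists at time instant t."""
    adj = defaultdict(set)
    for u, v, start, end in edges:
        if start <= t < end:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def connected(adj, src, dst):
    """Ordinary (non-temporal) reachability by breadth-first search."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

def connected_over_time(edges, src, dst, instants):
    """Temporal answer: the snapshot answer, computed independently per instant."""
    return {t: connected(snapshot(edges, t), src, dst) for t in instants}

result = connected_over_time(edges, "Alice", "Bill", range(1, 9))
```

In this toy data Alice and Bill are connected only at instants 3 and 7, once through Cathy and once directly.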
Are Alice and Bill connected?
… by a journey?
… by a path that persists over >2 time instants?
extended snapshot reducibility
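
The two questions above look across snapshots, so they cannot be answered one instant at a time. A hedged sketch follows. It takes a journey to mean a path whose edges may exist at different times, provided they are traversed in non-decreasing time order. It takes a persistent path to mean one fixed path whose edges all coexist for a given number of consecutive instants. Both readings are assumptions, since the slide does not spell out the definitions. The data and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges (source, target, start, end), valid on [start, end).
edges = [
    ("Alice", "Cathy", 1, 3),
    ("Cathy", "Bill", 5, 9),
    ("Alice", "Dan", 2, 6),
    ("Dan", "Bill", 3, 6),
]

def has_journey(edges, src, dst):
    """True if dst is reachable by traversing edges at non-decreasing times."""
    earliest = {src: float("-inf")}   # earliest instant at which each node is reached
    changed = True
    while changed:                    # relax until no arrival time improves
        changed = False
        for u, v, start, end in edges:
            for a, b in ((u, v), (v, u)):      # edges are treated as undirected
                if a not in earliest:
                    continue
                t = max(earliest[a], start)    # wait at a until the edge exists
                if t < end and t < earliest.get(b, float("inf")):
                    earliest[b] = t
                    changed = True
    return dst in earliest

def persistent_path(edges, src, dst, min_instants):
    """True if one fixed path exists during min_instants consecutive instants."""
    for t in sorted({start for _, _, start, _ in edges}):
        # Keep only edges alive during the whole window [t, t + min_instants).
        adj = defaultdict(set)
        for u, v, start, end in edges:
            if start <= t and t + min_instants <= end:
                adj[u].add(v)
                adj[v].add(u)
        seen, stack = {src}, [src]
        while stack:
            node = stack.pop()
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        if dst in seen:
            return True
    return False

via_cathy = edges[:2]   # Alice-Cathy ends before Cathy-Bill begins
```

With only the route through Cathy, a journey from Alice to Bill exists although no single snapshot contains a path, and there is no journey in the reverse direction. The route through Dan persists over instants 3, 4, and 5, so `persistent_path(edges, "Alice", "Bill", 3)` holds.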
TGraph: an evolving property graph
TGA: Temporal Graph Algebra
• Temporal variants of standard graph operators + novel time-specific operators
• Compositional: TGraph (or a pair of TGraphs) as input, TGraph as output
• Operations maintain model integrity
  - graph integrity at each time instant: no dangling edges, a node/edge appears at most once
  - temporal integrity: semantics of temporal operations are automatically enforced (formally: point semantics)
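
The graph integrity conditions can be checked mechanically. The sketch below assumes a simple interval-stamped encoding, with nodes as (id, start, end) and edges as (id, source, target, start, end) on half-open intervals. This encoding and the function names are illustrative assumptions, not Portal's actual representation.

```python
from collections import defaultdict

def overlapping_ids(records):
    """Ids with two validity intervals sharing an instant (id appears twice)."""
    by_id = defaultdict(list)
    for rec_id, start, end in records:
        by_id[rec_id].append((start, end))
    bad = set()
    for rec_id, spans in by_id.items():
        spans.sort()
        # After sorting by start, any overlap shows up between neighbours.
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            if next_start < prev_end:
                bad.add(rec_id)
    return bad

def covered(spans, start, end):
    """True if the union of half-open spans covers [start, end)."""
    reach = start
    for s, e in sorted(spans):
        if s > reach:          # gap before this span: coverage is broken
            break
        reach = max(reach, e)
    return reach >= end

def dangling_edges(nodes, edges):
    """Edge ids that exist at some instant when an endpoint does not."""
    lifetimes = defaultdict(list)
    for node_id, start, end in nodes:
        lifetimes[node_id].append((start, end))
    return {
        edge_id
        for edge_id, src, dst, start, end in edges
        if not (covered(lifetimes[src], start, end)
                and covered(lifetimes[dst], start, end))
    }

# Hypothetical data: Bill's lifetime is split in two; Cathy appears only at 4.
nodes = [("Alice", 1, 10), ("Bill", 1, 5), ("Bill", 5, 10), ("Cathy", 4, 9)]
edges = [("e1", "Alice", "Bill", 2, 8), ("e2", "Alice", "Cathy", 3, 4)]
```

Here `e1` is valid because Bill's two adjacent intervals jointly cover [2, 8), while `e2` dangles because Cathy does not yet exist at instant 3.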
TGA operations
• trim
• temporal versions of
  - vertex-map, edge-map
  - subgraph, path
  - aggregate messages
  - union, intersection, difference (binary)
• snapshot analytics
  - PageRank, connected components, …
  - Pregel
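
As an illustration of the simplest operator in the list, the sketch below reads trim as restricting a TGraph to a time window by clipping every validity interval. This reading, the tuple encoding with the interval in the last two positions, and the function name are assumptions made for the example.

```python
def trim(records, lo, hi):
    """Clip every validity interval to [lo, hi); drop records outside the window."""
    out = []
    for *fields, start, end in records:
        s, e = max(start, lo), min(end, hi)
        if s < e:                      # keep only non-empty intervals
            out.append((*fields, s, e))
    return out

# Hypothetical TGraph: nodes (id, start, end), edges (id, src, dst, start, end).
nodes = [("Alice", 1, 10), ("Bill", 6, 9)]
edges = [("e1", "Alice", "Bill", 6, 8)]

trimmed_nodes = trim(nodes, 2, 6)
trimmed_edges = trim(edges, 2, 6)
```

Applying the same window to nodes and edges keeps the result free of dangling edges, so the output is again a valid TGraph and can feed the next operator.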
TGA operations
• node creation
  • based on temporal window: temporal zoom
  • attribute-based: structural zoom
• edge creation
Structural zoom
add university nodes Drexel and CMU, and edges between students and these universities
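
A minimal sketch of this example, assuming student records that carry a school attribute and a half-open validity interval. Each distinct school value becomes a new node, and each record yields a student-to-school edge with the same interval. Giving a university node the union of its students' intervals is a design assumption of this sketch. The student names and the functions `merge` and `add_university_nodes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical student nodes: (id, school, start, end), valid on [start, end).
students = [
    ("Alice", "Drexel", 1, 5),
    ("Bob", "Drexel", 3, 8),
    ("Cathy", "CMU", 2, 4),
    ("Cathy", "Drexel", 6, 9),
]

def merge(spans):
    """Coalesce overlapping or adjacent half-open intervals."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def add_university_nodes(students):
    """Create one node per school and one student-to-school edge per record."""
    spans = defaultdict(list)
    edges = []
    for student, school, start, end in students:
        spans[school].append((start, end))
        edges.append((student, school, start, end))
    nodes = [(school, s, e)
             for school in sorted(spans)
             for s, e in merge(spans[school])]
    return nodes, edges

university_nodes, enrollment_edges = add_university_nodes(students)
```

Because each edge inherits its student's interval and the university node covers all of them, no enrollment edge can outlive either endpoint.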
Structural zoom
Temporal zoom
coarsen taxi trip start-times into 10-min intervals
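
The coarsening step can be sketched with the standard library alone: each trip start time is rounded down to the start of its 10-minute window, and trips that share endpoints and window are aggregated. The zone names, the trip records, and the choice to aggregate by counting are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)

def window_start(ts, window=WINDOW):
    """Round a timestamp down to the start of its window (aligned to midnight)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // window) * window

# Hypothetical trips: (pickup zone, drop-off zone, start time).
trips = [
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 1, 30)),
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 8, 0)),
    ("Midtown", "JFK", datetime(2017, 10, 25, 9, 12, 45)),
    ("SoHo", "Midtown", datetime(2017, 10, 25, 9, 19, 59)),
]

def temporal_zoom(trips):
    """Count trips per (pickup, drop-off, 10-minute window)."""
    return Counter((src, dst, window_start(ts)) for src, dst, ts in trips)

zoomed = temporal_zoom(trips)
```

The first two trips fall into the 9:00 window and are merged into one edge with count 2, while the remaining two land in the 9:10 window.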
System architecture
[Architecture diagram. Components shown: Portal, Interactive Shell, Query Parser, Portal Runtime (optimizer, operators, etc.), System Catalog, Data Structures, SparkSQL, GraphX, Spark Runtime, and Workers (each with Spark Runtime and HDFS)]
Spark 2.0, interoperable with SparkSQL and with BigDatalog
Physical data representation
• On-disk: Apache Parquet
  - vertex / edge files
  - broken down into snapshot groups
  - each file sorted on start time followed by node / edge id
• In-memory:
  - nested relational (Vertex-Edge RDDs)
  - GraphX-based: RepresentativeGraphs (RG), One Graph (OG), HybridGraph (HG)
[Figure: graph on vertices 1, 2, 3 with elements labeled BitSet(p1,p2,p3,p4), BitSet(p2,p3,p4,p5), BitSet(p1,p2,p3,p4,p5), BitSet(p2,p3), BitSet(p5)]
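
One plausible reading of the BitSet labels in the figure, stated here as an assumption: in a single aggregated graph, each vertex or edge carries a bit set recording the periods p1 through p5 in which it exists. The sketch below encodes such sets as Python integers. The helper names and the assignment of labels to particular vertices and edges are hypothetical.

```python
PERIODS = ["p1", "p2", "p3", "p4", "p5"]

def bitset(*periods):
    """Encode a set of period names as an integer bit mask (bit i = PERIODS[i])."""
    mask = 0
    for p in periods:
        mask |= 1 << PERIODS.index(p)
    return mask

def periods_of(mask):
    """Decode a bit mask back into period names."""
    return [p for i, p in enumerate(PERIODS) if mask >> i & 1]

# Hypothetical assignment of the figure's labels to two vertices and one edge.
v1 = bitset("p1", "p2", "p3", "p4")
v2 = bitset("p2", "p3", "p4", "p5")
edge_12 = bitset("p2", "p3")

# An edge may exist only when both endpoints do: its bits must be a subset of each.
edge_is_valid = edge_12 & v1 & v2 == edge_12

# Periods in which both vertices exist: bitwise AND of the two masks.
shared = periods_of(v1 & v2)
```

Under this encoding, membership tests and per-period integrity checks reduce to single bitwise operations.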
Performance highlights
• 16-node OpenStack cluster
• Apache Spark 2.0
• 4 cores, 16 GB RAM per node
PageRank on wiki-talk
PageRank on nGrams
PageRank on Twitter
Aggregate messages on wiki-talk
Vertex-subgraph on wiki-talk
Portal vs. G*
average node degree, wiki-talk
Take-aways
• TGraph: a logical model of property graphs with time
• TGA: a compositional temporal graph algebra under point semantics
• Portal: a library on top of Apache Spark, interoperable with SparkSQL
• Ongoing work on a declarative language, multi-operator query optimization, benchmarking
• Planned open source release this Fall
References
• Temporal Graph Algebra, Moffitt & Stoyanovich, DBPL 2017.
• Zooming in on NYC taxi data with Portal, Stoyanovich, Gilbride & Moffitt, DSSG 2017 (arXiv).
• Towards sequenced semantics for evolving graphs, Moffitt & Stoyanovich, EDBT 2017.
• Towards a distributed infrastructure for evolving graph analytics, Moffitt & Stoyanovich, TempWeb 2016.
• Vera Moffitt’s Ph.D. thesis.
Thank you!