Property graphs with time

Julia Stoyanovich, joint work with Vera Moffitt
Drexel University, Philadelphia, PA, USA
stoyanovich.org

October 25, 2017

openCypher Meetup

[Figure: panels labeled 2007, 2008, 2009, 2010, 2011]

https://www.kenedict.com/apples-internal-innovation-network-unraveled-part-1-evolving-networks/


https://arxiv.org/abs/1709.06176


Exploratory analysis of evolving graphs

• Which nodes are showing an increasing popularity trend?
• At what time scale can interesting trends be observed?
• Have any changes in network connectivity been observed?
• How can multiple data sources be used jointly to complement or corroborate information about network evolution?


Goal

Principled and systematic support for usable, scalable and extensible analysis of evolving graphs


Are Alice and Bill connected?

… by a path? TNGP


Snapshot reducibility


Are Alice and Bill connected?

… by a journey?

… by a path that persists over >2 time instants?

extended snapshot reducibility
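The three readings of "Are Alice and Bill connected?" can be contrasted in a small Python sketch. It rests on assumptions that are not from the talk: edges are undirected tuples `(u, v, start, end)` valid at integer instants `start <= t < end`, a journey may wait at a node and crosses an edge instantaneously, "persists" means over consecutive instants, and the nodes Cathy, Dana and Eve are made up. Answering the path question independently on each snapshot is the snapshot-reducible reading; the persistence question has to inspect how long edges coexist, so it needs explicit access to time.

```python
from collections import deque

# Hypothetical data: (u, v, start, end), undirected, alive for start <= t < end.
EDGES = [
    ("Alice", "Cathy", 1, 3),
    ("Cathy", "Bill", 3, 5),
    ("Alice", "Dana", 1, 5),
    ("Dana", "Eve", 2, 5),
]

def snapshot(edges, t):
    """Non-temporal edge list of the graph at time instant t."""
    return [(u, v) for (u, v, s, e) in edges if s <= t < e]

def connected(pairs, a, b):
    """Plain breadth-first search on a non-temporal, undirected edge list."""
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = {a}, deque([a])
    while queue:
        x = queue.popleft()
        if x == b:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False

def path_per_instant(edges, a, b, instants):
    """Snapshot-reducible reading: run the non-temporal query on every snapshot."""
    return {t: connected(snapshot(edges, t), a, b) for t in instants}

def persistent_path(edges, a, b, instants, min_len):
    """Is there a path whose edges all stay alive for min_len consecutive instants?"""
    for t in instants:
        # Keep only edges alive at every instant of [t, t + min_len).
        stable = [(u, v) for (u, v, s, e) in edges if s <= t and t + min_len <= e]
        if connected(stable, a, b):
            return True
    return False

def journey(edges, a, b, t0):
    """Time-respecting reachability: edges are crossed at non-decreasing times."""
    arrival = {a: t0}  # earliest known arrival time per node
    changed = True
    while changed:
        changed = False
        for u, v, s, e in edges:
            for x, y in ((u, v), (v, u)):
                if x in arrival and arrival[x] < e:
                    t = max(arrival[x], s)  # wait at x until the edge appears
                    if t < arrival.get(y, float("inf")):
                        arrival[y] = t
                        changed = True
    return b in arrival
```

With this data, Alice and Bill are not connected by a path in any single snapshot, yet a journey exists (Alice to Cathy while that edge is alive, then Cathy to Bill from instant 3). Alice and Eve, by contrast, are joined by a path that persists over instants 2, 3 and 4.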


TGraph: an evolving property graph


TGA: Temporal Graph Algebra

• Temporal variants of standard graph operators + novel time-specific operators
• Compositional: TGraph (or a pair of TGraphs) as input, TGraph as output
• Operations maintain model integrity
  - graph integrity at each time instant: no dangling edges, a node/edge appears at most once
  - temporal integrity: semantics of temporal operations are automatically enforced (formally: point semantics)
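The graph-integrity condition can be made concrete with a short Python check. This is only an illustrative sketch, not Portal's implementation: the tuple layouts `(id, start, end)` for vertices and `(id, src, dst, start, end)` for edges, the half-open integer intervals, and the name `check_integrity` are all assumptions.

```python
from collections import defaultdict

def instants(start, end):
    """Expand a half-open interval [start, end) into its set of time instants."""
    return set(range(start, end))

def check_integrity(vertices, edges):
    """Return a list of violations of per-instant graph integrity.

    vertices: iterable of (vertex_id, start, end)
    edges:    iterable of (edge_id, src, dst, start, end)
    """
    problems = []

    # A vertex may appear at most once at any instant.
    vertex_alive = defaultdict(set)
    for vid, s, e in vertices:
        span = instants(s, e)
        if vertex_alive[vid] & span:
            problems.append(f"vertex {vid} appears more than once at some instant")
        vertex_alive[vid] |= span

    edge_alive = defaultdict(set)
    for eid, src, dst, s, e in edges:
        span = instants(s, e)
        # An edge may appear at most once at any instant.
        if edge_alive[eid] & span:
            problems.append(f"edge {eid} appears more than once at some instant")
        edge_alive[eid] |= span
        # No dangling edges: both endpoints must exist whenever the edge does.
        for endpoint in (src, dst):
            if not span <= vertex_alive[endpoint]:
                problems.append(
                    f"edge {eid} dangles: endpoint {endpoint} is absent at some instant"
                )
    return problems
```

For example, an edge alive over `[1, 4)` between a vertex alive over `[1, 5)` and one alive over `[2, 4)` is reported as dangling, because the second endpoint does not yet exist at instant 1.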


TGA operations

• trim
• temporal versions of
  - vertex-map, edge-map
  - subgraph, path
  - aggregate messages
  - union, intersection, difference - binary
• snapshot analytics
  - PageRank, connected components, … - Pregel


TGA operations

• node creation
  - based on temporal window: temporal zoom
  - attribute-based: structural zoom
• edge creation


Structural zoom

add university nodes Drexel and CMU, and edges between students and these universities
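A rough Python sketch of this kind of attribute-based node and edge creation follows. The student records, the `school` attribute, and the rule that a university node exists whenever at least one of its students exists are assumptions chosen for illustration; the names `coalesce` and `structural_zoom` are hypothetical and not Portal's API.

```python
def coalesce(intervals):
    """Merge overlapping or adjacent half-open intervals [start, end)."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def structural_zoom(students, attr):
    """Create one node per distinct value of attr, plus student-to-value edges.

    students: list of (student_id, properties_dict, start, end)
    Returns (new_nodes, new_edges), where
      new_nodes: list of (value, start, end)
      new_edges: list of (student_id, value, start, end)
    """
    spans = {}
    new_edges = []
    for sid, props, s, e in students:
        value = props.get(attr)
        if value is None:
            continue  # students without the attribute produce nothing
        new_edges.append((sid, value, s, e))
        spans.setdefault(value, []).append((s, e))
    # Assumption: a created node is alive whenever any contributing student is.
    new_nodes = [
        (value, s, e)
        for value, ivs in sorted(spans.items())
        for s, e in coalesce(ivs)
    ]
    return new_nodes, new_edges

# Hypothetical input: three students with validity intervals.
STUDENTS = [
    ("Alice", {"school": "Drexel"}, 1, 4),
    ("Bill", {"school": "Drexel"}, 3, 6),
    ("Cathy", {"school": "CMU"}, 2, 5),
]
NODES, EDGES_OUT = structural_zoom(STUDENTS, "school")
```

Because each new edge inherits its student's interval and each university node covers the union of those intervals, the result has no dangling edges by construction.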


Structural zoom


Temporal zoom

coarsen taxi trip start-times into 10-min intervals
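The coarsening step can be sketched in a few lines of Python. The trip schema `(trip_id, pickup_zone, dropoff_zone, start_time)`, the zone names, and the choice to aggregate by counting trips per window are assumptions for illustration; the slide only states that start times are coarsened into 10-minute intervals.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def window_start(ts, width):
    """Round a timestamp down to the start of its fixed-width window."""
    return ts - (ts - EPOCH) % width

def temporal_zoom(trips, width=timedelta(minutes=10)):
    """Count trips per (window start, pickup zone, dropoff zone).

    trips: iterable of (trip_id, pickup_zone, dropoff_zone, start_time)
    """
    counts = Counter()
    for _trip_id, pickup, dropoff, start in trips:
        counts[(window_start(start, width), pickup, dropoff)] += 1
    return counts

# Hypothetical trips; zone names are made up.
TRIPS = [
    ("t1", "Midtown", "SoHo", datetime(2017, 10, 25, 9, 3)),
    ("t2", "Midtown", "SoHo", datetime(2017, 10, 25, 9, 8)),
    ("t3", "Midtown", "SoHo", datetime(2017, 10, 25, 9, 12)),
]
ZOOMED = temporal_zoom(TRIPS)
```

Here the first two trips fall into the 09:00 window and the third into the 09:10 window, so the coarsened graph carries one edge per window rather than one per trip.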


System architecture

[Diagram components: Portal Interactive Shell; Query Parser; Portal Runtime (optimizer, operators, etc.); System Catalog; Data Structures; SparkSQL; GraphX; Spark Runtime; Workers, each with Spark Runtime and HDFS]

Spark 2.0, interoperable with SparkSQL and with BigDatalog


Physical data representation

• On-disk: Apache Parquet
  - vertex / edge files
  - broken down into snapshot groups
  - each file sorted on start time followed by node / edge id
• In-memory:
  - nested relational (Vertex-Edge RDDs)
  - GraphX-based: RepresentativeGraphs (RG), One Graph (OG), HybridGraph (HG)

[Figure: example graph with nodes 1, 2, 3, annotated with BitSet(p1,p2,p3,p4), BitSet(p2,p3,p4,p5), BitSet(p1,p2,p3,p4,p5), BitSet(p2,p3), BitSet(p5)]
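The BitSet annotations in the figure suggest a representation in which each node and edge is stored once, together with a bitset of the periods in which it exists. The Python sketch below illustrates that idea only; it is not Portal's GraphX code. The class name `OneGraphSketch`, the use of Python integers as bitsets, and the assignment of the figure's bitsets to particular nodes and edges are all assumptions.

```python
def to_bitset(periods):
    """Pack period indices (1 for p1, 2 for p2, ...) into an int used as a bitset."""
    bits = 0
    for p in periods:
        bits |= 1 << p
    return bits

class OneGraphSketch:
    """Each node and edge is stored once, tagged with the periods it is alive in."""

    def __init__(self):
        self.nodes = {}  # node id -> bitset of periods
        self.edges = {}  # (src, dst) -> bitset of periods

    def add_node(self, node_id, periods):
        self.nodes[node_id] = self.nodes.get(node_id, 0) | to_bitset(periods)

    def add_edge(self, src, dst, periods):
        bits = to_bitset(periods)
        both = self.nodes.get(src, 0) & self.nodes.get(dst, 0)
        # Reject edges that would dangle in some period.
        if bits & ~both:
            raise ValueError(f"edge ({src}, {dst}) exists in a period without both endpoints")
        self.edges[(src, dst)] = self.edges.get((src, dst), 0) | bits

    def snapshot(self, period):
        """Nodes and edges alive in one period, selected with a single bit test each."""
        mask = 1 << period
        nodes = sorted(n for n, b in self.nodes.items() if b & mask)
        edges = sorted(e for e, b in self.edges.items() if b & mask)
        return nodes, edges

# Assumed assignment of the figure's bitsets to three nodes and two edges.
G = OneGraphSketch()
G.add_node(1, [1, 2, 3, 4])
G.add_node(2, [1, 2, 3, 4, 5])
G.add_node(3, [2, 3, 4, 5])
G.add_edge(1, 2, [2, 3])
G.add_edge(2, 3, [5])
```

Extracting the graph for one period then reduces to a bitmask filter over the shared node and edge collections, so structure common to many periods is not duplicated per snapshot.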

Performance highlights

• 16-node OpenStack cluster
• Apache Spark 2.0
• 4 cores, 16 GB RAM per node


PageRank on wiki-talk


PageRank on nGrams


PageRank on Twitter


Aggregate messages on wiki-talk


Vertex-subgraph on wiki-talk


Portal vs. G*

average node degree, wiki-talk


Take-aways

• TGraph: a logical model of property graphs with time
• TGA: a compositional temporal graph algebra under point semantics
• Portal: a library on top of Apache Spark, interoperable with SparkSQL
• Ongoing work on a declarative language, multi-operator query optimization, benchmarking
• Planned open source release this Fall


References

• Temporal Graph Algebra, Moffitt & Stoyanovich, DBPL 2017.
• Zooming in on NYC taxi data with Portal, Stoyanovich, Gilbride & Moffitt, DSSG 2017 (arXiv).
• Towards sequenced semantics for evolving graphs, Moffitt & Stoyanovich, EDBT 2017.
• Towards a distributed infrastructure for evolving graph analytics, Moffitt & Stoyanovich, TempWeb 2016.
• Vera Moffitt’s Ph.D. thesis.


Thank you!
