Mix-It 2015 - OrientDB

GeeCON 2015

OrientDB - the 2nd generation of (Multi-Model) NoSQL
And why GraphDBs are the starting point of this revolution

Enrico Risa
Lead Enterprise Engineer, Orient Technologies LTD
Twitter: @wolf4ood
http://www.orientdb.com

Welcome to Big Data

“90% of the data 
 in the world today 
 has been created 
 in the last two years alone.” - IBM

Just Data

[Diagram: unconnected records: Frank (Customer), Order #134 (Order), John (Provider), Bruno (Provider), Commodore Amiga 1200 (Product), Monitor 40” (Product), Mouse (Product).]

Data by itself has little value; it’s the relationships between data that give it incredible value.

Relationships give data “meaning”

[Diagram: the same records, now connected: Frank (Customer) Makes Order #134 (Order); the order Has the Commodore Amiga 1200, Monitor 40” and Mouse (Product); the providers John and Bruno are linked to the products by Sells edges.]

Top NoSQL categories

Key/Value Databases, Document Databases, Column Databases, Graph Databases

Question: why are Graph Databases different?

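Why do most NoSQL products avoid managing relationships?

[Figure: three relational tables, Customer (ID, Name), CustomerAddress (ID, Address) and Address (ID, Location); the CustomerAddress join table links customer IDs such as 10 (John) to address IDs such as 24 (Milan) and 33 (London), with further rows for Mike and for Paris, Madrid and Moscow.]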

Is this familiar?

What’s wrong with JOIN?

Index Lookup: how does it work?

Imagine an Address Book where we want to find Luke’s phone number.

[Diagram: a balanced tree whose root A-Z splits into A-L and M-Z. A-L splits into A-D and E-L, and M-Z into M-R and S-Z; A-D splits into A-B and C-D, E-L into E-G and H-L, E-G into E-F and G, and H-L into H-J and K-L.]

Index algorithms are all similar and based on balanced trees.

Found! Luke sits under K-L, and this lookup took 5 steps (A-Z, A-L, E-L, H-L, K-L). With millions of indexed records the tree grows deeper, and the lookup has to be repeated for every relationship a query crosses.
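A minimal sketch of the lookup above, assuming a sorted Python sequence stands in for the balanced tree: binary search halves the remaining range at each step, just as each tree level halves the alphabet, so the step count grows with log2 N rather than with N. Real database indexes are B-trees with a large fan-out per node, so their depth is smaller still; the `indexed_lookup` helper and the names are illustrative.

```python
def indexed_lookup(sorted_keys, target):
    """Binary search over a sorted sequence.

    Returns (position, steps): steps counts the comparisons made, i.e. the
    number of tree levels visited. Position is -1 if the key is absent.
    """
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1   # continue in the right half (e.g. M-Z)
        else:
            hi = mid - 1   # continue in the left half (e.g. A-L)
    return -1, steps


address_book = sorted(["Anna", "Bruno", "Enrico", "Frank", "John", "Luke", "Mike", "Zoe"])
print(indexed_lookup(address_book, "Luke"))  # (5, 2)

# Worst case: a key larger than every entry walks the full depth of the tree.
for n in (1_000, 1_000_000):
    print(n, indexed_lookup(range(n), n)[1])  # prints "1000 10", then "1000000 20"
```

Growing the index from a thousand to a million keys only doubles the depth, but every join pays that depth again for each row it touches.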

Joins Kill Performance

[Figure: the Customer, CustomerAddress and Address tables again.]

Joins are executed every time you cross relationships. Querying millions of records while joining 3-4 tables could generate billions of combinations.

This is why database query performance suffers as the database increases in size: each index lookup costs O(log N).
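A minimal sketch of how a relational engine might resolve Customer → CustomerAddress → Address with an index nested-loop join (real query planners may pick hash or merge joins instead). Sorted lists searched with `bisect` stand in for B-tree indexes, and the rows are illustrative values shaped like the three tables on the slide; `build_index`, `probe` and `customer_locations` are hypothetical helpers. Each relationship crossed costs another O(log N) probe, and the work is redone on every query.

```python
from bisect import bisect_left, bisect_right


def build_index(rows, col):
    """Sorted (key, row position) pairs: a stand-in for a B-tree index on column `col`."""
    entries = sorted((row[col], pos) for pos, row in enumerate(rows))
    return [key for key, _ in entries], [pos for _, pos in entries]


def probe(index, key):
    """Positions of all rows whose indexed column equals `key` (two O(log N) binary searches)."""
    keys, positions = index
    return positions[bisect_left(keys, key):bisect_right(keys, key)]


def customer_locations(customers, customer_address, address, ca_index, addr_index):
    """Index nested-loop join Customer -> CustomerAddress -> Address.

    The relationship is not stored anywhere: it is recomputed with index
    probes on every call. Returns the joined rows and the number of probes.
    """
    rows, probes = [], 0
    for cust_id, name in customers:
        probes += 1
        for ca_pos in probe(ca_index, cust_id):
            addr_id = customer_address[ca_pos][1]
            probes += 1
            for a_pos in probe(addr_index, addr_id):
                rows.append((name, address[a_pos][1]))
    return rows, probes


# Illustrative rows shaped like the Customer / CustomerAddress / Address tables.
customers = [(10, "John"), (24, "Mike")]
customer_address = [(10, 24), (10, 33), (24, 18)]   # (customer ID, address ID)
address = [(24, "Milan"), (33, "London"), (18, "Paris")]

ca_index = build_index(customer_address, 0)   # index on CustomerAddress.ID
addr_index = build_index(address, 0)          # index on Address.ID
print(customer_locations(customers, customer_address, address, ca_index, addr_index))
# ([('John', 'Milan'), ('John', 'London'), ('Mike', 'Paris')], 5)
```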

RDBMS performance on traversal

[Chart: performance (y-axis) against database size (x-axis).]

How many of you have experienced this problem?

In a world that’s becoming more connected, we need a better way to store data and manage relationships.

Read: data is important, but relationships are even more fundamental today.

“A graph database is any storage system that provides index-free adjacency” - Marko Rodriguez (author of TinkerPop Blueprints)

Back to school: Graph Theory crash course

Basic Graph

Enrico --Visited--> Krakow

Property Graph Model*

Edges are directed, and Vertices and Edges can have properties:

Enrico {company: OrientTechnologies} --Visited {on: 2015}--> Krakow {people: 756,183}

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

1-N and N-M Relationships

Enrico --Visited {on: 2015}--> Krakow
Enrico --Worked {on: 2015}--> Krakow

An Edge connects only 2 vertices. Use multiple edges to represent 1-N and N-M relationships.

Congrats! This is your diploma in «Graph Theory»

Graph theory is so simple, yet so powerful.

How does a true* Graph Database manage relationships?

* A “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB.

Each element in the Graph has its own immutable Record ID

[Diagram: Enrico (Vertex, #13:55) --Visited (Edge, #22:11, on: 2015)--> Krakow (Vertex, #15:99)]

Connections use persistent pointers

[Diagram: Enrico (Vertex, #13:55) stores out = #22:11; the Visited edge (#22:11, on: 2015) stores out = #13:55 and in = #15:99; Krakow (Vertex, #15:99) stores in = #22:11.]
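A minimal in-memory sketch of the idea, not OrientDB's storage engine: a Python dict keyed by Record ID stands in for OrientDB's direct `cluster:position` addressing. When the edge is created, its RID is written into both vertices (under `out_Visited` / `in_Visited`, loosely following OrientDB's `out_<EdgeClass>` field naming), so a later traversal only dereferences stored RIDs. The `GraphStore` class and the `Person`/`City` class names are hypothetical.

```python
class GraphStore:
    """Toy graph store: every vertex and edge is a document addressed by its RID."""

    def __init__(self):
        self.records = {}  # RID -> document (a plain dict)

    def add_vertex(self, rid, cls, **props):
        self.records[rid] = {"@rid": rid, "@class": cls, **props}

    def add_edge(self, rid, cls, out_rid, in_rid, **props):
        # The edge stores the RIDs of both endpoints...
        self.records[rid] = {"@rid": rid, "@class": cls, "out": out_rid, "in": in_rid, **props}
        # ...and each endpoint stores the edge's RID: the link is written once, here.
        self.records[out_rid].setdefault("out_" + cls, []).append(rid)
        self.records[in_rid].setdefault("in_" + cls, []).append(rid)

    def out(self, rid, cls):
        """Vertices reached through outgoing `cls` edges: RID lookups only, no index search."""
        vertex = self.records[rid]
        return [self.records[self.records[e]["in"]] for e in vertex.get("out_" + cls, [])]


g = GraphStore()
g.add_vertex("#13:55", "Person", name="Enrico", company="OrientTechnologies")
g.add_vertex("#15:99", "City", name="Krakow", people=756_183)
g.add_edge("#22:11", "Visited", "#13:55", "#15:99", on=2015)

print([v["name"] for v in g.out("#13:55", "Visited")])  # ['Krakow']
```

The dict lookup is an average O(1) hash probe; the point is that its cost does not depend on how the other vertices are sorted or indexed.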


A Graph Database creates the relationship just once (when the edge is created), whereas an RDBMS computes the relationship every time you query the database.

When you move from an RDBMS to a Graph Database, each relationship traversal jumps from O(log N) to near O(1). With a Graph Database, traversal time is not affected by database size! This is huge in the Big Data age.
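A back-of-the-envelope reading of this claim, under stated assumptions: a traversal of k relationships over N records costs one index probe per hop in the RDBMS (a tree with fan-out b) and one pointer dereference per hop in the graph; c_1 and c_2 are unspecified per-step constants, and vertices with very many edges are ignored.

```latex
T_{\text{RDBMS}}(k, N) \approx k \, c_1 \log_b N,
\qquad
T_{\text{graph}}(k, N) \approx k \, c_2

% Worked example with b = 2 (binary search) and k = 3 hops:
% N = 10^6: \log_2 10^6 \approx 19.9, so about 3 \times 20 = 60 comparisons
% N = 10^9: \log_2 10^9 \approx 29.9, so about 3 \times 30 = 90 comparisons
% The graph traversal stays at 3 pointer dereferences in both cases.
```

Only the graph term is independent of N; a realistic B-tree fan-out makes the RDBMS term smaller than these binary-search figures, but it still grows with the data.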

Graph Databases Easily Manage Complex Relationships

[Diagram: John, linked by Lives in and Likes edges to a small graph of genres (Thriller, Comedy), movies (Pulp Fiction, Mr Bean) and theaters (Theater A, San José; Theater B, NYC; Theater C).]

No costs to traverse relationships:
• Recommendation engines
• Social Applications
• Spatial Apps
• Master Data Management
• Information Clustering

GraphDB Database Quadrant

[Chart: Relationships Complexity (y-axis) against Data Complexity (x-axis), placing Key/Value, Column, Document, Relational and Graph databases.]

These were 1st generation NoSQL products, where each tool was only good at a few use cases.

1st Generation NoSQL: Scenario

[Diagram: the Application uses Oracle (RDBMS) as its Primary DB alongside Redis or Memcache (Key/Value), MongoDB (DocDB) and Neo4j (GraphDB), with ETL moving data between them.]

1st Generation NoSQL: Fact

In more than 90% of use cases, NoSQL products are used as a second DBMS. How many of you use NoSQL?

And how many as the primary one?

1st Generation NoSQL: Problems
- No standard between NoSQL products
- Multiple vendors = multiple skills
- ETL + synchronization code is costly to write and maintain
- Performance and reliability are hard to predict

2nd Generation NoSQL is Multi-Model

What’s a Multi-Model DBMS?

[Diagram: overlapping Key/Value, Graph, Document and Object models.]

Multi-Model represents the intersection of multiple models in just one product.

With a Multi-Model DBMS:
- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and reliability are easy to test from the beginning

Relationships give data “meaning”

[Diagram: the order graph again with different records: Jill (Customer) Makes Order #134 (Order); the order Has the Commodore Amiga 1200, Monitor 40” and 3 Wheel Mouse (Product); the providers Bruno and Luca are linked to the products by Sells edges.]

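Multi-Model domain schema

[Diagram, legend: Vertex, Edge, Inherits. Actor extends V and has name: string and surname: string; Customer and Provider inherit from Actor. Customer Makes Order (number: int, date: datetime), Provider Sells Product, and Order Has Product; the Sells, Has and Product elements carry the properties price: decimal, name: string and qty: int.]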

Vertices and Edges are Documents

[Diagram: Frank --Makes--> Order, where the Frank vertex is itself a JSON document:]

{
  "@rid": "#12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}

General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and querying
• Developer friendly
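A minimal sketch in plain Python (not OrientDB's API) of what "schema-hybrid" means for the Frank document above: a few declared fields are enforced, while undeclared ones such as `phone` or the nested `details` are stored as-is. The `CUSTOMER_SCHEMA` dict and the `is_valid_hybrid` helper are hypothetical.

```python
import json

# The Frank vertex from the slide, as a JSON document.
FRANK_JSON = """
{
  "@rid": "#12:382", "@class": "Customer",
  "name": "Frank", "surname": "Raggio", "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
"""

# Hypothetical schema-full part of the Customer class: only these fields are enforced.
CUSTOMER_SCHEMA = {"name": str, "surname": str}


def is_valid_hybrid(doc, schema):
    """Declared fields must be present with the declared type;
    any other field (phone, nested details, ...) is accepted schema-less."""
    return all(isinstance(doc.get(field), ftype) for field, ftype in schema.items())


frank = json.loads(FRANK_JSON)
print(frank["details"]["city"])                            # London
print(is_valid_hybrid(frank, CUSTOMER_SCHEMA))             # True
print(is_valid_hybrid({"name": "Jill"}, CUSTOMER_SCHEMA))  # False: surname is missing
```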

Polymorphic queries

SELECT * FROM Customer
→ Frank (Customer)

SELECT * FROM Provider
→ Bruno (Provider), John (Provider)

SELECT * FROM Actor
→ Bruno (Provider), Frank (Customer), John (Provider)
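A minimal sketch of the same polymorphism in Python, assuming class inheritance stands in for OrientDB's class hierarchy: querying the parent class also returns records of every subclass. `select_from` is a hypothetical helper, not OrientDB's API, and the result order here simply follows insertion order.

```python
from dataclasses import dataclass


@dataclass
class Actor:        # parent vertex class from the schema slide (surname omitted for brevity)
    name: str


@dataclass
class Customer(Actor):
    pass


@dataclass
class Provider(Actor):
    pass


def select_from(cls, records):
    """Polymorphic SELECT: instances of `cls` and of every subclass of it."""
    return [r for r in records if isinstance(r, cls)]


records = [Customer("Frank"), Provider("Bruno"), Provider("John")]

for cls in (Customer, Provider, Actor):
    print(cls.__name__, [r.name for r in select_from(cls, records)])
# Customer ['Frank']
# Provider ['Bruno', 'John']
# Actor ['Frank', 'Bruno', 'John']
```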

Multi-Model complex domains schema

[Diagram, legend: Vertex, Edge, Inherits. Vertex classes: Account, MusicTaste, Genre, Band and Location. Edge classes: Likes, Plays and Performs.]

Multi-Model complex domains

[Diagram: Frank and John (Account) are connected by Likes edges to Indie and Rock (Genre) and to Snow Patrol (Band); Snow Patrol has a Plays edge to a genre and Performs at 123, 1st Street Austin, TX (Location) on April 7, 2015, 9pm-11.30pm.]
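A minimal sketch of a query this graph makes cheap: "which gigs match what an account likes?". Plain Python dicts of adjacency lists stand in for stored edges, and which genres Snow Patrol plays is an illustrative assumption; `incoming` and `gigs_for` are hypothetical helpers. Each hop follows stored links instead of joining tables.

```python
from collections import defaultdict

# Illustrative edges modeled on the slide; the exact genres are assumptions.
likes = {"Frank": ["Indie"], "John": ["Rock"]}   # Account -Likes-> Genre
plays = {"Snow Patrol": ["Indie", "Rock"]}        # Band -Plays-> Genre
performs = {"Snow Patrol": [("123, 1st Street Austin, TX", "April 7, 2015 9pm-11.30pm")]}


def incoming(edges):
    """Index edges by their target as well, as a graph DB does when an edge is created."""
    reverse = defaultdict(list)
    for src, targets in edges.items():
        for dst in targets:
            reverse[dst].append(src)
    return dict(reverse)


played_by = incoming(plays)  # Genre <-Plays- Band


def gigs_for(account):
    """Account -Likes-> Genre <-Plays- Band -Performs-> Location."""
    return [
        (band, genre, location, when)
        for genre in likes.get(account, [])
        for band in played_by.get(genre, [])
        for location, when in performs.get(band, [])
    ]


print(gigs_for("Frank"))
# [('Snow Patrol', 'Indie', '123, 1st Street Austin, TX', 'April 7, 2015 9pm-11.30pm')]
```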

Multi-Model Database Quadrant

[Chart: the same Relationships Complexity against Data Complexity quadrant, now with Multi-Model alongside Key/Value, Column, Document, Relational and Graph.]

Multi-Model Solutions

There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine. The “Graph” is only a layer on top of the engine. Under the hood they do JOINs, which means traversal time is affected by database size.

Meet OrientDB

The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs

With a true Graph, Document, Key/Value and Object Oriented engine

OrientDB features

[Table: feature comparison of OrientDB, MongoDB, Neo4j and MySQL (RDBMS). OrientDB is checked for every feature listed below; MongoDB, Neo4j and MySQL each cover only a subset.]

Features: Operational Database; Graph Database; Document Database; Object-Oriented Concepts; Schema-full, Schema-less, Schema mix; User and Role & Record Level Security; Record Level Locking; SQL; ACID Transactions; Relationships (Linked Documents); Custom Data Types; Embedded Documents; Multi-Master Zero Configuration Replication; Sharding; Server Side Functions; Native HTTP REST/JSON; Embeddable with No Restrictions

DEMO

API & Standards
• Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API
• SQL + extensions for graphs
• JDBC driver to connect any BI tool
• HTTP/JSON support
• Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more
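A minimal sketch of the HTTP/JSON route using only the Python standard library. The endpoint layout (`/query/<database>/sql/<query>/<limit>`), the default HTTP port 2480 and Basic authentication are assumed from OrientDB's HTTP API documentation and may differ between versions; the `demo` database and `admin` credentials are placeholders, and `orientdb_query_request` is a hypothetical helper.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def orientdb_query_request(host, database, sql, user, password, port=2480, limit=20):
    """Build (without sending) a GET request for OrientDB's HTTP query endpoint.

    Assumed URL layout: /query/<database>/sql/<query-text>/<limit>,
    with HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/query/{quote(database, safe='')}/sql/{quote(sql, safe='')}/{limit}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})


req = orientdb_query_request("localhost", "demo", "SELECT * FROM Customer", "admin", "admin")
print(req.full_url)  # http://localhost:2480/query/demo/sql/SELECT%20%2A%20FROM%20Customer/20
# To run it against a live server (the response is expected to hold a "result" array):
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["result"])
```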

Availability and Integrity

[Diagram: clients (C) connected to two Master Nodes kept in sync by Multi-master Replication.]

• Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions

Scalability and Performance

[Diagram: clients (C) connected to two Master Nodes, plus an AutoDiscovered Node joining the cluster.]

• Multi-Master Replication, Sharding and Auto-Discovery to simplify Ops
• 200k+ TPS on commodity hardware

Some numbers

70+ committers contributing to the product
1000s of users, from SMBs to Fortune 10 companies
50,000 downloads per month from 200+ countries
17+ years of research have been put into the product

A Bright Future

Graph DBMSs increased their popularity by 500% within the last 2 years. Document DBMSs are the 3rd fastest-growing category.

Some of Our Customers

Get Started for Free
OrientDB Community Edition is FREE for any purpose (Apache 2 license)
OrientDB Enterprise is Free for Development
Udemy Getting Started Training is ★★★★★ and Free
http://www.orientdb.com/getting-started

Thank you!
Enrico Risa
@wolf4ood
http://www.orientdb.com

Q/A