GeeCON 2015
OrientDB - the 2nd generation of (Multi-Model) NoSQL
And why Graph Databases are the starting point of this revolution
Enrico Risa, Lead Enterprise Engineer, Orient Technologies LTD
Twitter: @wolf4ood
http://www.orientdb.com
Welcome to Big Data
“90% of the data in the world today has been created in the last two years alone.” - IBM
Just Data
Commodore Amiga 1200 (Product), Monitor 40” (Product), Mouse (Product)
Frank (Customer), Order #134 (Order), John (Provider), Bruno (Provider)
Data by itself has little value; it’s the relationships between data that give it incredible value.
Relationships give data “meaning”
[Diagram: Frank (Customer) Makes Order #134 (Order); the order Has the Commodore Amiga 1200, Monitor 40” and Mouse (Products); John and Bruno (Providers) are linked to the products they Sell]
Top NoSQL categories
Key/Value Databases, Document Databases, Column Databases, Graph Databases
Question: why are Graph Databases different?
Why do most NoSQL products avoid managing relationships?
[Figure: three tables, Customer (ID, Name), CustomerAddress (ID, Address) and Address (ID, Location), linking the customers John and Mike to the locations Milan, London, Paris, Madrid and Moscow]
Is this familiar?
What’s wrong with JOIN?
Index Lookup: how does it work?
Imagine an Address Book where we want to find Luke’s phone number.
Index algorithms are all similar and based on balanced trees: each node splits its key range in two (A-Z into A-L and M-Z, A-L into A-D and E-L, E-L into E-G and H-L, H-L into H-J and K-L).
Found! Following A-Z → A-L → E-L → H-L → K-L, this lookup reached Luke in 5 steps. With millions of indexed records the tree gets deeper (about 20 levels for a balanced binary tree over one million keys), and a JOIN repeats this lookup for every row it matches.
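To make the step count concrete, here is a minimal Python sketch (not OrientDB or RDBMS code) that counts how many ranges a binary search over sorted keys inspects before it finds a target; each iteration corresponds to descending one level of a balanced binary tree. The function name `lookup_steps` is hypothetical.

```python
import math


def lookup_steps(sorted_keys, target):
    """Binary search over sorted_keys, returning how many ranges were
    inspected before target was found (None if it is absent). Each
    iteration corresponds to descending one level of a balanced tree."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            return steps
        if sorted_keys[mid] < target:
            lo = mid + 1  # continue in the upper half (like M-Z)
        else:
            hi = mid - 1  # continue in the lower half (like A-L)
    return None


for n in (1_000, 1_000_000):
    keys = list(range(n))
    worst_case_bound = math.floor(math.log2(n)) + 1
    print(n, lookup_steps(keys, n - 1), worst_case_bound)
```

Going from a thousand to a million keys only raises the worst case from 10 to 20 steps: a single lookup stays cheap, and the cost that grows is repeating it for every joined row.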
Joins Kill Performance
Every time you cross relationships, JOINs are executed.
Querying millions of records while joining 3-4 tables could generate billions of combinations.
This is why query performance suffers as the database increases in size: each JOIN step pays an O(log N) index lookup.
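A minimal Python sketch of an index nested-loop join across three tables shaped like Customer, CustomerAddress and Address. The sample rows are hypothetical, and `SortedIndex` is an illustrative stand-in for a B-tree (binary search via the standard `bisect` module) that counts its probes, so each relationship crossed shows up as another O(log N) lookup.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """A sorted list standing in for a B-tree index on one column.
    Each lookup is a binary search, i.e. O(log N)."""

    def __init__(self, rows, column):
        pairs = sorted(((row[column], row) for row in rows), key=lambda p: p[0])
        self.keys = [k for k, _ in pairs]
        self.rows = [r for _, r in pairs]
        self.probes = 0  # number of index lookups performed so far

    def lookup(self, value):
        self.probes += 1
        lo = bisect_left(self.keys, value)
        hi = bisect_right(self.keys, value)
        return self.rows[lo:hi]


# Hypothetical sample rows shaped like the Customer / CustomerAddress / Address tables.
customers = [{"id": 10, "name": "John"}, {"id": 24, "name": "Mike"}]
customer_address = [
    {"id": 10, "address": 24},
    {"id": 10, "address": 33},
    {"id": 24, "address": 44},
]
addresses = [
    {"id": 24, "location": "Milan"},
    {"id": 33, "location": "London"},
    {"id": 44, "location": "Moscow"},
]

ca_index = SortedIndex(customer_address, "id")
address_index = SortedIndex(addresses, "id")


def customer_locations():
    """Index nested-loop join: Customer -> CustomerAddress -> Address.
    Every hop across a relationship is a fresh index lookup."""
    result = []
    for customer in customers:
        for link in ca_index.lookup(customer["id"]):
            for address in address_index.lookup(link["address"]):
                result.append((customer["name"], address["location"]))
    return result
```

Two customers already cost 5 index probes here; with millions of rows on each side the probe count grows with the number of matches, and every probe walks a tree whose depth grows with table size.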
RDBMS performance on traversal
[Chart: performance (y-axis) against database size (x-axis)]
How many of you have experienced this problem?
In a world that’s becoming more connected, we need a better way to store data and manage relationships.
In other words: data is important, but relationships are even more fundamental today.
“A graph database is any storage system that provides index-free adjacency” - Marko Rodriguez (author of TinkerPop Blueprints)
Back to school: Graph Theory crash course
Basic Graph
Enrico --Visited--> Krakow
Property Graph Model*
Edges are directed; Vertices and Edges can have properties
Enrico {company: OrientTechnologies} --Visited {on: 2015}--> Krakow {people: 756,183}
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
1-N and N-M Relationships
Enrico --Visited {on: 2015}--> Krakow
Enrico --Worked {on: 2015}--> Krakow
An Edge connects only 2 vertices. Use multiple edges to represent 1-N and N-M relationships.
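A minimal Python sketch of the Property Graph Model: directed edges, properties on both vertices and edges, and two separate edges for two relationships between the same pair of vertices. The `Vertex`/`Edge` classes and the `out_edges` helper are hypothetical illustrations, not the Blueprints API.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed edge: it always connects exactly two vertices, out -> in."""
    label: str
    out_vertex: Vertex
    in_vertex: Vertex
    properties: dict = field(default_factory=dict)


enrico = Vertex("Enrico", {"company": "OrientTechnologies"})
krakow = Vertex("Krakow", {"people": 756_183})

# Two distinct relationships between the same pair of vertices need two edges.
edges = [
    Edge("Visited", enrico, krakow, {"on": 2015}),
    Edge("Worked", enrico, krakow, {"on": 2015}),
]


def out_edges(vertex, edges):
    """Edges leaving vertex (direction matters: Krakow has no outgoing edges)."""
    return [e for e in edges if e.out_vertex is vertex]
```

For simplicity `out_edges` scans the whole edge list; a true graph engine avoids that scan by storing each vertex's edges with the vertex itself.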
Congrats! This is your diploma in «Graph Theory»
Graph theory is so simple, yet so powerful
How does a true* Graph Database manage relationships?
* a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB
Each element in the Graph has its own immutable Record ID
Enrico (Vertex, #13:55)
Visited (Edge, #22:11), on: 2015
Krakow (Vertex, #15:99)
Connections use persistent pointers
Enrico (Vertex, #13:55): out = #22:11
Visited (Edge, #22:11), on: 2015: out = #13:55, in = #15:99
Krakow (Vertex, #15:99): in = #22:11
A Graph Database creates the relationship just once (when the edge is created), whereas an RDBMS computes the relationship every time you query the database.
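A minimal Python sketch of the pointer layout above, assuming a Record ID "#cluster:position" addresses a record slot directly. `Storage`, `save`, `load`, `parse_rid` and `visited_places` are hypothetical names, not OrientDB's API, and OrientDB's real storage format is far more involved; the point is only that following a stored RID needs no index search.

```python
def parse_rid(rid):
    """Split a Record ID such as '#13:55' into (cluster, position)."""
    cluster, position = rid.lstrip("#").split(":")
    return int(cluster), int(position)


class Storage:
    """Toy record store: clusters are lists indexed by position, so load(rid)
    jumps straight to a slot instead of searching an index."""

    def __init__(self):
        self.clusters = {}

    def save(self, rid, record):
        cluster, position = parse_rid(rid)
        slots = self.clusters.setdefault(cluster, [])
        if len(slots) <= position:
            slots.extend([None] * (position + 1 - len(slots)))
        slots[position] = record

    def load(self, rid):
        cluster, position = parse_rid(rid)
        return self.clusters[cluster][position]


db = Storage()
# Pointers are written once, when the edge is created.
db.save("#13:55", {"name": "Enrico", "out": ["#22:11"]})
db.save("#22:11", {"label": "Visited", "on": 2015, "out": "#13:55", "in": "#15:99"})
db.save("#15:99", {"name": "Krakow", "in": ["#22:11"]})


def visited_places(vertex_rid):
    """Traverse vertex -> edge -> vertex by dereferencing stored RIDs."""
    vertex = db.load(vertex_rid)
    places = []
    for edge_rid in vertex.get("out", []):
        edge = db.load(edge_rid)
        places.append(db.load(edge["in"])["name"])
    return places
```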
When you move from an RDBMS to a Graph Database, you jump from O(log N) to near O(1) per hop. With a Graph Database, traversal time is not affected by database size! This is huge in the Big Data age.
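As a back-of-the-envelope model (an assumption for illustration: a balanced binary index with one comparison per level on the RDBMS side, one stored-pointer dereference per hop on the graph side, and constant per-step costs c_1 and c_2), crossing h relationships in a database of N records costs roughly:

```latex
T_{\text{RDBMS}}(N, h) \approx h \cdot c_1 \log_2 N
\qquad\qquad
T_{\text{graph}}(N, h) \approx h \cdot c_2
```

With N = 10^6, log_2 N ≈ 20, so each hop through the index costs about 20 comparisons and every doubling of N adds one more, while the pointer hop costs the same regardless of N.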
Graph Databases Easily Manage Complex Relationships
[Diagram: John, connected by Lives in and Likes edges to NYC and to movies and genres: Pulp Fiction (Thriller), Mr Bean (Comedy); Theater A (San José), Theater B (NYC) and Theater C]
No JOIN cost to traverse relationships, which suits:
• Recommendation engines
• Social Applications
• Spatial Apps
• Master Data Management
• Information Clustering
GraphDB Database Quadrant
[Chart: Relationships Complexity (y-axis) against Data Complexity (x-axis), placing Key Value, Column, Document, Relational and Graph]
These were 1st generation NoSQL products, where each tool was only good at a few use cases
1st Generation NoSQL: Scenario
[Diagram: the Application uses Oracle (RDBMS) as its Primary DB, plus Redis or Memcache (Key/Value), MongoDB (DocDB) and Neo4j (GraphDB), kept in sync through ETL]
1st Generation NoSQL: Fact
In > 90% of use cases, NoSQL products are used as a secondary DBMS.
How many of you use NoSQL?
And how many of you use it as your primary DBMS?
1st Generation NoSQL: Problems
- No standard between NoSQL products
- Multiple vendors = multiple skills
- ETL + synchronization code is costly to write and maintain
- Performance and Reliability are hard to predict
2nd Generation NoSQL is Multi-Model
What’s a Multi-Model DBMS?
[Diagram: the Key/Value, Graph, Document and Object models overlapping]
Multi-Model represents the intersection of multiple models in just one product
- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and Reliability are easy to test from the beginning
Relationships give data “meaning”
[Diagram: Jill (Customer) Makes Order #134 (Order); the order Has the Commodore Amiga 1200, Monitor 40” and 3 Wheel Mouse (Products); Luca and Bruno (Providers) are linked to the products they Sell]
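Multi-Model domain schema
Legend: Vertex, Edge, Inherits
• V: base vertex class
• Actor (inherits V): name: string, surname: string
• Customer, Provider (inherit Actor)
• Order: number: int, date: datetime
• Product: name: string, price: decimal
• Makes (Edge): Customer → Order
• Sells (Edge): Provider → Product, price: decimal
• Has (Edge): Order → Product, qty: int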
Vertices and Edges are Documents
[Diagram: Frank --Makes--> Order]
{
  "@rid": "#12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": { "city": "London", "tags": "millennial" }
}
General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and querying
• Developer friendly
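A minimal Python sketch of treating a vertex as a plain JSON document, using the Customer record from this slide and the standard `json` module. The `get_path` helper is hypothetical (it is not OrientDB's query syntax), and the `newsletter` field is an invented example of a schema-less addition.

```python
import json

# The Customer vertex as a JSON document; @rid and @class are kept as plain keys.
frank_json = """
{
  "@rid": "#12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
"""

frank = json.loads(frank_json)


def get_path(document, dotted_path):
    """Read a nested field such as 'details.city' from a parsed document."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Schema-less: a new nested field is added to this one record only
# (hypothetical field, for illustration).
frank["details"]["newsletter"] = True
```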
Polymorphic queries
SELECT * FROM Customer → Frank (Customer)
SELECT * FROM Provider → Bruno (Provider), John (Provider)
SELECT * FROM Actor → Frank (Customer), Bruno (Provider), John (Provider)
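A minimal Python sketch of why these queries are polymorphic: a query on a class also returns records of its subclasses. Here Python class inheritance stands in for the Actor/Customer/Provider hierarchy, and `select_from` is a hypothetical emulation of `SELECT * FROM <class>`, not how OrientDB executes it.

```python
class Actor:
    """Base class, like the Actor vertex class in the schema."""

    def __init__(self, name):
        self.name = name


class Customer(Actor):
    pass


class Provider(Actor):
    pass


records = [Customer("Frank"), Provider("Bruno"), Provider("John")]


def select_from(cls, records):
    """Emulate 'SELECT * FROM <cls>': return every record whose class is cls
    or any subclass of it, which is what makes the query polymorphic."""
    return [r.name for r in records if isinstance(r, cls)]
```

`select_from(Actor, records)` returns all three names, matching the last query above.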
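Multi-Model complex domains schema
Legend: Vertex, Edge, Inherits
• Account, Location, MusicTaste (vertices); Genre and Band inherit MusicTaste
• Likes (Edge): Account → MusicTaste
• Plays (Edge): Band → Genre
• Performs (Edge): Band → Location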
Multi-Model complex domains
[Diagram: Frank and John (Accounts) connected by Likes edges to Indie (Genre), Rock (Genre) and Snow Patrol (Band); Snow Patrol Plays a Genre and Performs at 123, 1st Street Austin, TX (Location) on April 7, 2015, 9pm-11.30pm]
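A minimal Python sketch of a traversal over this kind of domain: Account -Likes-> MusicTaste, where a liked Genre is resolved to the Bands that Play it, then Band -Performs-> Location. The instance edges below (Frank Likes Indie, John Likes Snow Patrol, Snow Patrol Plays Indie) are assumptions for illustration, and adjacency dicts stand in for stored edge pointers; `gigs_for` is a hypothetical name.

```python
# Hypothetical instance data; the Likes/Plays targets are assumptions.
likes = {
    "Frank": ["Indie"],        # Account -Likes-> Genre
    "John": ["Snow Patrol"],   # Account -Likes-> Band (a Band is also a MusicTaste)
}
bands = {"Snow Patrol"}        # vertices of class Band
played_by = {"Indie": ["Snow Patrol"]}   # incoming Plays edges of each Genre
performs = {                   # Band -Performs-> Location, with the edge's date
    "Snow Patrol": [("123, 1st Street Austin, TX", "April 7, 2015 9pm-11.30pm")],
}


def gigs_for(account):
    """Find performances an account may care about by walking
    Likes -> (Genre <-Plays- Band | Band) -> Performs."""
    liked_bands = set()
    for taste in likes.get(account, []):
        if taste in bands:
            liked_bands.add(taste)  # liked a Band directly
        else:
            liked_bands.update(played_by.get(taste, []))  # a Genre: follow Plays backwards
    return [
        (band, location, when)
        for band in sorted(liked_bands)
        for location, when in performs.get(band, [])
    ]
```

Because Likes targets the MusicTaste superclass, one edge type covers both the Genre and the Band case.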
Multi-Model Database Quadrant
[Chart: the same Relationships Complexity vs Data Complexity quadrant, with Multi-Model added alongside Graph, Relational, Column, Key Value and Document]
Multi-Model Solutions
There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine. The “Graph” is only a layer on top of the engine. Under the hood they do JOINs, which means traversal time is affected by database size.
Meet OrientDB
The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs
With a true Graph, Document, Key/Value and Object Oriented engine
OrientDB features
[Table: feature comparison of OrientDB, MongoDB, Neo4j and MySQL (RDBMS); OrientDB is marked for every feature listed]
• Operational Database
• Graph Database
• Document Database
• Object-Oriented Concepts
• Schema-full, Schema-less, Schema mix
• User and Role & Record Level Security
• Record Level Locking
• SQL
• ACID Transaction
• Relationships (Linked Documents)
• Custom Data Types
• Embedded Documents
• Multi-Master Zero Configuration Replication
• Sharding
• Server Side Functions
• Native HTTP Rest/JSON
• Embeddable with No Restrictions
DEMO
API & Standards
• Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API
• SQL + extensions for graphs
• JDBC driver to connect any BI tool
• HTTP/JSON support
• Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more
Availability and Integrity
[Diagram: two Master Nodes, each serving several C nodes, linked by Multi-master Replication]
• Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions
Scalability and Performance
[Diagram: two Master Nodes and an AutoDiscovered Node, each serving several C nodes]
• Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops
• 200k+ TPS (transactions per second) on Commodity Hardware
Some numbers
• 70+ Committers contributing to the product
• 1000s of Users, from SMBs to Fortune 10 Companies
• 50,000 Downloads per Month from 200+ countries
• 17+ Years of Research put into the product
A Bright Future
Graph DBMSs increased their popularity by 500% within the last 2 years.
Document DBMSs are the 3rd fastest growing category.
Some of Our Customers
Get Started for Free
• OrientDB Community Edition is FREE for any purpose (Apache 2 license)
• OrientDB Enterprise is Free for Development
• Udemy Getting Started Training is ★★★★★ and Free
http://www.orientdb.com/getting-started
Thank you! Enrico Risa @wolf4ood http://www.orientdb.com
Q/A