Scaling Distributed Erlang
Zandra – Erlang/OTP - Ericsson
Public | 2015-03-26 | Page 1
Distributed Systems
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”
  - Leslie Lamport
Overview
  Distributed Erlang Today
  Future Plans/Considerations
  Scaling Distributed Erlang
Distributed Erlang
  Used for Elixir distribution too
  Multiple Interconnected Nodes
  Peer to Peer
  Fully Connected
What is a Node?
  Erlang VM/beam
  -name/-sname
  OS process
  Many concurrent processes
  Scheduling
  Load Balancing
  Message Passing
  ...
Message Passing
  Transparent (local/remote processes):
    send pid, message
  Not transparent (only local processes):
    send name, message
    send {name, node}, message
Local Registration
  Process.register(pid, name)
  Process.unregister(name)
  Process.whereis(name)
  send(name, message)
:global
  register_name(name, pid)
  re_register_name(name, pid)
  unregister_name(name)
  whereis_name(name)
  send(name, message)
  Node joins...
Will this scale?
  No...
  Works up to 32-50 nodes
  We are working on it...
Scalability plans
  Issue: Connections stay up
  Plan: Automatic disconnects
  Issue: Too many connections
  Plan: Avoid fully connected network
  Issue: Expensive when new nodes join
  Plan: Make node joins cheaper
  Issue: :global chats a lot
  Plan: Make :global less chatty
:global issues
  Fully Connected
  :global sets it up...
  Number of file descriptors is not unlimited
  Avoid it now: -connect_all false
  Plan: Change :global
Distributed Hash Table
  Distributed Key Value store
  Consistent Hashing
  Decentralized
  Scalable
  Fault Tolerant
Our choice: Kademlia
  O(log(n)) lookup & store time
  Easy to maintain
  Config info spread during lookups
Kademlia: Tree
[Figure: binary tree over the 4-bit ID space, 0000 = 0 to 1111 = 15, with nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15 as leaves]
  Nodes: Leaves in a binary search tree
  Node ID = hash(node_name)
  Good distribution
  Used to locate values
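The rule "Node ID = hash(node_name)" can be sketched in Python as follows. The slides do not say which hash function or ID width is used, so SHA-1 and the 4-bit width (chosen to match the 0000..1111 tree) are assumptions, and `node_id` is a hypothetical helper name.

```python
import hashlib

ID_BITS = 4  # assumption: 4-bit IDs, matching the 0000..1111 tree on the slides


def node_id(node_name: str, bits: int = ID_BITS) -> int:
    """Map a node name to an ID: hash it and keep the `bits` most significant bits."""
    digest = hashlib.sha1(node_name.encode("utf-8")).digest()  # assumption: SHA-1
    as_int = int.from_bytes(digest, "big")                     # 160-bit integer
    return as_int >> (len(digest) * 8 - bits)


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for name in ("a@host1", "b@host2", "c@host3"):
        print(name, format(node_id(name), "04b"))
```

Because the ID depends only on the name, every node computes the same ID for a given name without coordination, and a uniform hash spreads the IDs evenly over the tree.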
Kademlia: Subtrees
[Figure: the tree of nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15, divided into subtrees]
  Common Prefixes
  Distance
  Guarantee: know at least 1 node per subtree
  Can know up to K per subtree
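One way to picture "at least 1, up to K nodes per subtree" is a routing table with one bucket per subtree, where a subtree is identified by how many leading bits its IDs share with the owner's ID. The Python sketch below is illustrative only: `RoutingTable`, `common_prefix_len`, the 4-bit width and K = 2 are assumptions, not the planned OTP implementation.

```python
ID_BITS = 4  # assumption: 4-bit IDs, as in the slide figures
K = 2        # assumption: illustrative value; the slides only say "up to K"


def common_prefix_len(a: int, b: int, bits: int = ID_BITS) -> int:
    """Number of leading bits that a and b have in common."""
    for i in range(bits):
        shift = bits - 1 - i
        if ((a >> shift) & 1) != ((b >> shift) & 1):
            return i
    return bits


class RoutingTable:
    """Hypothetical routing table: one bucket of up to k contacts per subtree."""

    def __init__(self, own_id: int, k: int = K, bits: int = ID_BITS):
        self.own_id = own_id
        self.k = k
        self.bits = bits
        # buckets[p] holds contacts sharing exactly p leading bits with own_id
        self.buckets = [[] for _ in range(bits)]

    def add(self, other_id: int) -> bool:
        """Remember other_id if it is new and its bucket still has room."""
        if other_id == self.own_id:
            return False
        prefix = common_prefix_len(self.own_id, other_id, self.bits)
        bucket = self.buckets[prefix]
        if other_id in bucket or len(bucket) >= self.k:
            return False
        bucket.append(other_id)
        return True


if __name__ == "__main__":
    table = RoutingTable(own_id=7)
    for n in (0, 2, 4, 5, 6, 10, 11, 12, 14, 15):
        table.add(n)
    print(table.buckets)  # [[10, 11], [0, 2], [4, 5], [6]]
```

Node 7 keeps only two of the five nodes in the far half of the tree, but every contact in the buckets close to it, so the table stays small while nearby knowledge stays detailed.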
Kademlia: Routing Tables
[Figure: the tree of nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15, illustrating routing tables]
Kademlia: Routing Table
  XOR based distance: A xor B
  Node A: 0111 (= 7)
  Node B: 1011 (= 11)
  Distance: 1100
  Subtree number: 1000
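The worked example above can be checked with a few lines of Python. The function names are hypothetical; "subtree number" is taken here to mean the highest set bit of the XOR distance, which is what the slide's values (1100 giving 1000) suggest.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: bitwise XOR, read as an integer."""
    return a ^ b


def subtree_number(a: int, b: int) -> int:
    """Highest set bit of the XOR distance, as a bit mask (0 if a == b)."""
    d = a ^ b
    return 1 << (d.bit_length() - 1) if d else 0


if __name__ == "__main__":
    a, b = 0b0111, 0b1011                         # node A = 7, node B = 11
    print(format(xor_distance(a, b), "04b"))      # 1100
    print(format(subtree_number(a, b), "04b"))    # 1000
```

The distance is symmetric (A xor B equals B xor A), so A and B agree on how far apart they are.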
Kademlia: Node Lookup
  Finds the K nodes closest to a key
  Used for storing, lookups and joins
  :global.register_name(name, pid)
  :global.whereis_name(name)
  :global.send(name, message)
whereis_name (worst case, no replication)
  :global.whereis_name(:this_name)
  hash(:this_name) = 5
[Figure sequence: the lookup for key 5 shown step by step on the tree of nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15, with hops numbered 1 to 3]
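A lookup like the one for key 5 can be simulated in Python. This is a simplified sketch under stated assumptions: every node knows exactly one contact per non-empty subtree (the minimum guarantee), contacts are chosen as the nearest node in each subtree, there is no replication or parallelism, and the starting node 15 is picked for illustration. `build_tables` and `iterative_lookup` are hypothetical names.

```python
NODES = [0, 2, 4, 5, 6, 7, 10, 11, 12, 14, 15]  # node IDs from the slide figures
BITS = 4                                         # assumption: 4-bit IDs


def build_tables(nodes, bits=BITS):
    """Give every node one contact per non-empty subtree (nearest node in it)."""
    tables = {}
    for n in nodes:
        contacts = []
        for i in range(bits):
            # Subtree i of n: IDs whose highest bit differing from n is bit i.
            members = [m for m in nodes if m != n and (m ^ n).bit_length() - 1 == i]
            if members:
                contacts.append(min(members, key=lambda m: m ^ n))
        tables[n] = contacts
    return tables


def iterative_lookup(start, key, tables):
    """Return (closest node found, nodes consulted in order) for `key`."""
    path = [start]
    while True:
        here = path[-1]
        # The initiator asks `here` for the contact it knows closest to the key.
        nxt = min(tables[here] + [here], key=lambda m: m ^ key)
        if nxt == here:          # nobody closer is known: lookup is done
            return here, path
        path.append(nxt)


if __name__ == "__main__":
    tables = build_tables(NODES)
    print(iterative_lookup(15, 5, tables))  # (5, [15, 7, 5])
```

Each hop lands in a subtree that shares a longer prefix with the key, which is where the O(log(n)) bound on lookups comes from.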
Parallel Lookup
[Figure: lookup on the same tree with two requests issued in step 1]
Recursive Lookup
[Figure: lookup on the same tree with hops numbered 1 to 4]
Recursive vs iterative
  Recursive:
    most connections already up
    faster
  Iterative:
    updating routing tables
    parallel
“Owner” Node
  unregister_name
  re_register_name
  Caching
New Node joins
  1) Has to know one other node
  2) Looks up its own node ID
  3) Does one node lookup per subtree
  Populates its own routing table
  Spreads information about itself
Node 8 joins
[Figure sequence: the tree from 0000 = 0 to 1111 = 15 with new node 8 added to the existing nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15]
  1) Know one node: 11
  2) Lookup 8 (itself)
  3) One lookup per subtree. Here: 10, 13 and 3
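The three join steps can be sketched in Python for the "node 8 joins via node 11" example. This is a toy model, not the planned implementation: routing tables are plain sets, every existing node starts with one contact per non-empty subtree, and `build_tables`, `lookup` and `join` are hypothetical names. The default lookup targets (one bit of the new ID flipped per subtree) are also an assumption; the usage passes the slide's own targets 10, 13 and 3.

```python
BITS = 4  # assumption: 4-bit IDs, as in the slide figures


def build_tables(nodes):
    """Give each node one contact per non-empty subtree (nearest node in it)."""
    tables = {}
    for n in nodes:
        tables[n] = set()
        for i in range(BITS):
            members = [m for m in nodes if m != n and (m ^ n).bit_length() - 1 == i]
            if members:
                tables[n].add(min(members, key=lambda m: m ^ n))
    return tables


def lookup(start, key, tables, skip):
    """Greedy lookup from `start` towards `key`, never routing via node `skip`."""
    path = [start]
    while True:
        here = path[-1]
        candidates = (tables[here] | {here}) - {skip}
        nxt = min(candidates, key=lambda m: m ^ key)
        if nxt == here:
            return path
        path.append(nxt)


def join(new_id, bootstrap, tables, subtree_targets=None):
    """Join following the slide's three steps; returns the new node's contacts."""
    tables[new_id] = {bootstrap}                  # 1) has to know one other node
    tables[bootstrap].add(new_id)
    if subtree_targets is None:
        # Flipping bit i of new_id yields an ID inside subtree i.
        subtree_targets = [new_id ^ (1 << i) for i in range(BITS)]
    targets = [new_id] + list(subtree_targets)    # 2) own ID, 3) one per subtree
    for target in targets:
        start = min(tables[new_id], key=lambda m: m ^ target)
        for node in lookup(start, target, tables, skip=new_id):
            tables[new_id].add(node)              # populates its own routing table
            tables[node].add(new_id)              # spreads information about itself
    return tables[new_id]


if __name__ == "__main__":
    tables = build_tables([0, 2, 4, 5, 6, 7, 10, 11, 12, 14, 15])
    contacts = join(8, 11, tables, subtree_targets=[10, 13, 3])
    print(sorted(contacts))  # [2, 10, 11, 12, 15]
```

After the join, node 8 knows at least one node in each non-empty subtree, and every node it touched along the way has learned about node 8.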
Automatic disconnects
  Avoid too many connections
  Inactive connections brought down
  Can, should and will be fixed
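The idea of bringing down inactive connections can be illustrated with a small idle-timeout table in Python. The slides give no mechanism or timeout value, so `ConnectionTable`, its methods and the 60-second timeout in the usage are all hypothetical.

```python
class ConnectionTable:
    """Hypothetical bookkeeping for open connections to other nodes."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout  # seconds without traffic before disconnect
        self.last_used = {}               # node name -> time of last traffic

    def touch(self, node: str, now: float) -> None:
        """Record traffic to or from `node` (opening the connection if needed)."""
        self.last_used[node] = now

    def reap(self, now: float) -> list:
        """Drop connections idle for longer than idle_timeout; return their names."""
        idle = [n for n, t in self.last_used.items() if now - t > self.idle_timeout]
        for n in idle:
            del self.last_used[n]
        return idle


if __name__ == "__main__":
    conns = ConnectionTable(idle_timeout=60.0)
    conns.touch("a@host1", now=0.0)
    conns.touch("b@host2", now=50.0)
    print(conns.reap(now=70.0))  # ['a@host1']
```

A dropped connection is simply reopened by the next `touch`, so only node pairs that are actively talking hold a connection (and a file descriptor).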
Will this scale then?
  Research: yes
  Initial measurements: looks promising
  Left:
    More measurements
    Optimizations
    A lot more fun stuff!
Other Considerations
  RPC improvements
  Remote spawn_monitor
  EPMD in Erlang
  Protocol improvements (TCP/SSL)
  Fragment large messages
  Preserve Sub-term Sharing
Feedback welcome!
  Bugs/feature requests: bugs.erlang.org
  erlang-questions mailing list
  erlang.org/community
  Me: [email protected]
Thank you!