Scaling Distributed Erlang

Zandra – Erlang/OTP - Ericsson | Public | 2015-03-26

Distributed Systems

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” - Leslie Lamport

Overview

- Distributed Erlang Today
- Future Plans/Considerations
- Scaling Distributed Erlang

Distributed Erlang

- Used for Elixir distribution too
- Multiple interconnected nodes
- Peer to peer
- Fully connected

What is a Node?

- Erlang VM/beam
- -name/-sname
- OS process
- Many concurrent processes
- Scheduling
- Load balancing
- Message passing
- ...

Message Passing

- Transparent (local/remote processes):
  - send pid, message
- Not transparent (only local processes):
  - send name, message
  - send {name, node}, message
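The three send forms above can be tried on a single node. The following is a minimal sketch, not taken from the talk; the registered names `:demo_local` and `:demo_remote` are hypothetical, and `node()` stands in for a remote node name so that the script runs without a second node.

```elixir
# Minimal single-node sketch; run as a script, e.g. `elixir message_passing.exs`.
parent = self()

# Each helper process forwards the first message it gets to the parent.
forward_once = fn ->
  receive do
    msg -> send(parent, {:got, msg})
  end
end

# Wait for one forwarded message, or give up after a second.
await = fn ->
  receive do
    {:got, msg} -> msg
  after
    1_000 -> :timeout
  end
end

# 1) Transparent: the same call works for a local or a remote pid.
pid = spawn(forward_once)
send(pid, :via_pid)
via_pid = await.()

# 2) Not transparent: a bare name is resolved on the local node only.
Process.register(spawn(forward_once), :demo_local)
send(:demo_local, :via_name)
via_name = await.()

# 3) {name, node} names the node explicitly. node() keeps this runnable on
#    one node; a remote node name would go in its place.
Process.register(spawn(forward_once), :demo_remote)
send({:demo_remote, node()}, :via_tuple)
via_tuple = await.()

IO.inspect({via_pid, via_name, via_tuple})
```

Only the pid form is location transparent: the caller of the other two forms has to know on which node the name is registered.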

Local Registration

- Process.register(pid, name)
- Process.unregister(name)
- Process.whereis(name)
- Process.send(name, message)
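A short sketch of the local registration life cycle listed above. The name `:pinger` is hypothetical, and the sketch sends with `Kernel.send/2`; in current Elixir, `Process.send/3` takes an additional options list.

```elixir
# A process that answers one ping, then stays alive until told to stop,
# so that its name remains registered while we inspect it.
pid =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end

    receive do
      :stop -> :ok
    end
  end)

# Register the pid under a local name and look it up again.
Process.register(pid, :pinger)
found = Process.whereis(:pinger)

# Send to the registered name instead of the pid.
send(:pinger, {:ping, self()})

reply =
  receive do
    :pong -> :pong
  after
    1_000 -> :timeout
  end

# After unregistering, the name no longer resolves.
Process.unregister(:pinger)
after_unregister = Process.whereis(:pinger)

send(pid, :stop)
IO.inspect({found == pid, reply, after_unregister})
```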

:global

- register_name(name, pid)
- re_register_name(name, pid)
- unregister_name(name)
- whereis_name(name)
- send(name, message)
- Node joins...
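The same life cycle with the `:global` calls listed above, as a minimal sketch. The name `:demo_service` is hypothetical. The script runs on a single node; in a cluster the registration would be visible on all connected nodes.

```elixir
# A process that answers one request, then waits to be stopped.
pid =
  spawn(fn ->
    receive do
      {:hello, from} -> send(from, :world)
    end

    receive do
      :stop -> :ok
    end
  end)

# register_name/2 returns :yes on success and :no if the name is taken.
registered = :global.register_name(:demo_service, pid)

# whereis_name/1 returns the pid, or :undefined if the name is unknown.
found = :global.whereis_name(:demo_service)

# send/2 resolves the name and delivers the message to the registered pid.
:global.send(:demo_service, {:hello, self()})

reply =
  receive do
    :world -> :world
  after
    1_000 -> :timeout
  end

:global.unregister_name(:demo_service)
gone = :global.whereis_name(:demo_service)

send(pid, :stop)
IO.inspect({registered, found == pid, reply, gone})
```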

Will this scale?

- No...
- Works up to 32-50 nodes
- We are working on it...

Scalability plans

- Issue: Connections stay up
  Plan: Automatic disconnects
- Issue: Too many connections
  Plan: Avoid fully connected network
- Issue: Expensive when new nodes join
  Plan: Make node joins cheaper
- Issue: :global chats a lot
  Plan: Make :global less chatty

:global issues

- Fully connected
  - :global sets it up...
  - The number of file descriptors is not unlimited
- Avoid it now: -connect_all false
- Plan: Change :global


Distributed Hash Table

- Distributed key-value store
- Consistent hashing
- Decentralized
- Scalable
- Fault tolerant

Our choice: Kademlia

- O(log(n)) lookup & store time
- Easy to maintain
- Config info spread during lookups

Kademlia: Tree

[Figure: binary tree over the 4-bit ID space, from 0000 (= 0) to 1111 (= 15), with nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15 as leaves.]

- Nodes: leaves in a binary search tree
- Node ID = hash(node_name)
  - Good distribution
- Used to locate values
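A small sketch of "Node ID = hash(node_name)" for the 4-bit ID space used in the figures. The slides do not say which hash function is used; `:erlang.phash2/2` is an assumption chosen for brevity, and the node names are made up.

```elixir
defmodule NodeIdSketch do
  import Bitwise

  # Map a node name onto a `bits`-bit ID in 0..(2^bits - 1).
  # :erlang.phash2/2 is an assumption; the slides only say "hash(node_name)".
  def node_id(node_name, bits \\ 4) do
    :erlang.phash2(node_name, bsl(1, bits))
  end
end

# Hypothetical node names.
ids = for name <- [:"a@host", :"b@host", :"c@host"], do: {name, NodeIdSketch.node_id(name)}
IO.inspect(ids)
```

A real deployment would need a much larger ID space than 4 bits so that collisions between node names are unlikely.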

Kademlia: Subtrees

[Figure: the tree of nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15, divided into subtrees.]

- Common prefixes
- Distance
- Guarantee: know at least 1 node per subtree
- Can know up to K per subtree

Kademlia: Routing Tables

[Figure: routing tables in the tree of nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15.]

Kademlia: Routing Table

- XOR based distance: A xor B
  - Node A: 0111 (= 7)
  - Node B: 1011 (= 11)
  - Distance: 1100
  - Subtree number: 1000
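The arithmetic on this slide can be checked directly. The sketch below assumes, as the numbers suggest, that the subtree number is the highest set bit of the XOR distance; the module and function names are illustrative.

```elixir
defmodule XorSketch do
  import Bitwise

  # Kademlia distance between two IDs.
  def distance(a, b), do: bxor(a, b)

  # Highest set bit of the distance, read here as the "subtree number".
  # Returns 0 when both IDs are equal.
  def subtree(a, b) do
    case distance(a, b) do
      0 -> 0
      d -> highest_bit(d, 1)
    end
  end

  defp highest_bit(d, bit) when bit * 2 > d, do: bit
  defp highest_bit(d, bit), do: highest_bit(d, bit * 2)
end

a = 0b0111
b = 0b1011

IO.puts(Integer.to_string(XorSketch.distance(a, b), 2))  # 1100
IO.puts(Integer.to_string(XorSketch.subtree(a, b), 2))   # 1000
```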


Kademlia: Node Lookup

- K nodes closest to a key
- Storing, lookups and joins
  - :global.register_name(name, pid)
  - :global.whereis_name(name)
  - :global.send(name, message)
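"K nodes closest to a key" means closest by XOR distance. A minimal sketch, assuming global knowledge of all node IDs (which a real node does not have) and using the node IDs from the figures:

```elixir
defmodule ClosestSketch do
  import Bitwise

  # The k node IDs with the smallest XOR distance to `key`.
  def closest(node_ids, key, k) do
    node_ids
    |> Enum.sort_by(&bxor(&1, key))
    |> Enum.take(k)
  end
end

nodes = [0, 2, 4, 5, 6, 7, 10, 11, 12, 14, 15]

# For key 5 the distances are 5 -> 0, 4 -> 1, 7 -> 2, 6 -> 3, ...
IO.inspect(ClosestSketch.closest(nodes, 5, 3))  # [5, 4, 7]
```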

whereis_name (worst case, no replication)

- :global.whereis_name(:this_name)
- hash(:this_name) = 5

[Figure sequence: the lookup for key 5 proceeds in numbered steps through the tree of nodes 0, 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15.]
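The lookup for key 5 can be simulated in a few lines. This is a rough sketch under simplifying assumptions, not the planned :global implementation: every node knows exactly one contact per non-empty subtree (K = 1, here simply the first matching ID), there is no replication, and each hop moves to the closest contact known so far. The starting node 15 is an arbitrary choice.

```elixir
defmodule LookupSketch do
  import Bitwise

  @bits 4

  # Routing table of `self_id`: one contact per non-empty subtree (K = 1).
  # Subtree `bit` holds the IDs whose XOR distance to us has `bit` as its
  # highest set bit.
  def routing_table(self_id, all_ids) do
    0..(@bits - 1)
    |> Enum.map(fn bit ->
      Enum.find(all_ids, fn id ->
        d = bxor(self_id, id)
        d >= bsl(1, bit) and d < bsl(1, bit + 1)
      end)
    end)
    |> Enum.reject(&is_nil/1)
  end

  # Greedy lookup: ask the current node for its contact closest to `key`
  # and move on while that contact is strictly closer.
  # Returns {node_reached, nodes_contacted_in_order}.
  def lookup(current, key, tables, hops \\ []) do
    best = Enum.min_by(Map.fetch!(tables, current), &bxor(&1, key))

    if bxor(best, key) < bxor(current, key) do
      lookup(best, key, tables, [best | hops])
    else
      {current, Enum.reverse(hops)}
    end
  end
end

nodes = [0, 2, 4, 5, 6, 7, 10, 11, 12, 14, 15]
tables = Map.new(nodes, fn id -> {id, LookupSketch.routing_table(id, nodes)} end)

# Node 15 looks up key 5 and reaches it via nodes 0 and 4.
IO.inspect(LookupSketch.lookup(15, 5, tables))  # {5, [0, 4, 5]}
```

Each hop lands in the subtree that shares one more leading bit with the key, which is where the O(log(n)) bound comes from.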

Parallel Lookup

[Figure: lookup in the same tree, with two requests both labelled 1.]

Recursive Lookup

[Figure: lookup in the same tree, with numbered steps.]

Recursive vs. iterative

- Recursive
  - Most connections already up
  - Faster
- Iterative
  - Updating routing tables
  - Parallel


“Owner” Node

- unregister_name
- re_register_name
- Caching


New Node joins

1) Has to know one other node
2) Look up its own node ID
3) One node lookup per subtree

- Populates its own routing table
- Spreads information about itself

Node 8 joins

[Figure sequence: node 8 joins the tree of nodes 0 (0000), 2, 4, 5, 6, 7, 10, 11, 12, 14 and 15 (1111).]

1) Know one node: 11
2) Look up 8 (itself)
3) One lookup per subtree. Here: 10, 13 and 3
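Step 3 can be made concrete by listing the subtrees as seen from node 8 in the 4-bit ID space. The sketch below is illustrative only; the module name is hypothetical. Subtree `bit` is obtained by flipping that bit of the node's own ID and letting the lower bits vary.

```elixir
defmodule JoinSketch do
  import Bitwise

  # ID ranges of the subtrees seen from `id` in a `bits`-bit ID space,
  # ordered from the nearest subtree to the farthest.
  def subtrees(id, bits) do
    for bit <- 0..(bits - 1) do
      # Flip bit `bit` and clear all lower bits to get the first ID.
      base = id |> bxor(bsl(1, bit)) |> band(bnot(bsl(1, bit) - 1))
      base..(base + bsl(1, bit) - 1)
    end
  end
end

IO.inspect(JoinSketch.subtrees(8, 4))  # [9..9, 10..11, 12..15, 0..7]
```

The lookup targets 10, 13 and 3 from the slide fall into the ranges 10..11, 12..15 and 0..7 respectively.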

Automatic disconnects

- Avoid too many connections
- Inactive connections brought down
- Can, should and will be fixed

Will this scale then?

- Research: yes
- Initial measurements: look promising
- Left:
  - More measurements
  - Optimizations
  - A lot more fun stuff!

Other Considerations

- RPC improvements
- Remote spawn_monitor
- EPMD in Erlang
- Protocol improvements (TCP/SSL)
- Fragment large messages
- Preserve sub-term sharing

Feedback welcome!

- Bugs/feature requests: bugs.erlang.org
- erlang-questions mailing list
- erlang.org/community
- Me: [email protected]

Thank you!