Berlin 2015

Storage, Backup and Disaster Recovery in the Cloud
AWS Customer Case Study: HERE "Maps for Life"

Robert Schmid, Storage Business Development, AWS
Ali Abbas, Principal Architect, HERE

Case Study: AWS Customer HERE "Maps for Life: Satellite Imagery - S3"
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.

What we will cover in this session
• Amazon storage options
• Amazon Elastic File System
• Use cases (backup, archive, DR)
• Customer use case: HERE "Maps for Life, Satellite Imagery - S3"

S3 usage: 102% year-over-year increase in data transfer to and from S3 (Q4 2014 vs. Q4 2013, not including Amazon use)

Amazon S3 (Simple Storage Service)
• 99.999999999% durability
• $0.03 per GB-month ($360 per TB per year)

Amazon Glacier: low-cost archiving service
• 99.999999999% durability
• 3–5 hours data retrieval
• $0.01 per GB-month ($120 per TB per year)

Amazon EBS (Elastic Block Store)
• General Purpose (SSD): up to 16 TB, up to 10,000 IOPS, $0.10 per GB-month
• Provisioned IOPS (SSD): up to 16 TB, up to 20,000 IOPS, $0.125 per GB-month plus $0.065 per provisioned IOPS

AWS Storage Gateway
Your on-ramp to AWS cloud storage:
• Back up into Amazon S3
• Archive into Amazon Glacier
• iSCSI or VTL interface

Summary: AWS Storage Options
• Object storage (S3, Glacier)
• Elastic Block Store (EBS)
• Storage Gateway (iSCSI, VTL)
• Elastic File System for EC2 (EFS)

Introducing Amazon Elastic File System for EC2 instances: pilot availability later this summer in US-WEST (Oregon)

What is EFS?
• Fully managed file system for EC2 instances
• Provides standard file system semantics (NFSv4)
• Elastically grows and shrinks, up to petabyte scale
• Delivers performance for a wide variety of workloads
• Highly available and durable

1. Simple  2. Elastic  3. Scalable

Amazon Storage Use Cases: Backup, Archive, Disaster Recovery

Backup, Archive, Disaster Recovery

[Architecture diagram: a customer data center and a colocation data center (block and file storage, backup, archive, disaster recovery; customer/CSP assets) connect through storage gateways (AWS Storage Gateway, private storage for AWS) over the internet or AWS Direct Connect to Amazon S3 and Amazon Glacier in the AWS Cloud.]

AWS Customer Case Study: HERE "Maps for Life"
Ali Abbas, Principal Architect
• High-resolution satellite imagery
• Predictive analytics / machine learning
[email protected]
http://www.here.com

HERE Maps


HERE Drive

HERE Transit

HERE City Lens

Explore

Maps for Life: web and mobile apps available on Android/iOS/Windows Phone

Offline Maps
• Save the maps of your country or state on your phone
• Use your phone offline
• Explore anywhere without an internet connection

Unified Route Planning: your pocket sat nav
• Route alternatives
• Turn-by-turn navigation

Urban Navigation
• Step-by-step transit
• Turn-by-turn walk guidance

Personal Maps
• Collections
• Easy location sharing

Interactive Maps
• Train schedules
• Traffic incidents
• 3D maps

Reality Capture Processing
• Satellite/aerial delivery
• Enterprise businesses
• End-to-end user integration

• 99.99% availability, 99.999999999% durability
• High throughput and good performance for most use cases
• Good price ratio
• A design that simplifies creating integration pipelines

The Case for Satellite Imagery

Continuously increasing global coverage with a higher frequency of refresh

Billions of tiles
• Huge storage requirements due to high-resolution content across zoom levels
• A large number of small tiles to keep track of and deliver

Challenges
• Exponential growth rate (today some billions, tomorrow some trillions)
• Increasing data volume and refresh rate
• Maintaining low-latency requirements and service-level agreements

Behind the curtain
• A specialized spatial file system delivered tile imagery with sub-ms lookup time over the network
• Simple architecture with CDN caches and core sites (holding the full dataset)
• Remote sites had CDN-type caches with geospatial sharding placement algorithms
• Some cache regions occasionally suffered from inter-continental network latency due to non-optimized routing
• The scale of the data implies a massive storage infrastructure to maintain on top

[Architecture diagram of the previous setup: a Mercator-based sharding layer and an intelligent filter layer in front of a specialized adaptive spatial blob store; shared and singleton stores at the core sites, with caches at the remote sites.]

Given the success of S3 usage across HERE and the recent enhancements to the offering, we started to look at S3 to solve two main problems with one solution:
• Simplify the storage handling layer by removing the storage compute from our architecture and simplifying operations.
• Reduce the network latency from core data to our delivery instances by adding a core data presence in each region.

Satellite on S3
• Easy lifecycle management for recurring updates
• Big data storage requirements on demand (eases capacity planning)
• Easy pipeline integration with SQS/SNS for background jobs (see the sketch below)
• Good performance out of the box, but it did not fulfill our requirements: too much variation in response time (average ~150-300 ms)
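The SQS/SNS pipeline integration mentioned above can be wired up through S3 event notifications. A minimal sketch, assuming boto3; the bucket name, queue ARN, and key prefix are placeholders, not HERE's actual configuration:

```python
# Minimal sketch (assumed names): route S3 "object created" events for new tile
# uploads into an SQS queue so background jobs can process them asynchronously.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-satellite-tiles",  # hypothetical bucket
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:tile-ingest",  # placeholder ARN
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "tiles/"}]}
                },
            }
        ]
    },
)
```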


S3 Load Constraint

Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index. That is, Amazon S3 stores key names in alphabetical order. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
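To make the constraint concrete, the common workaround suggested by that documentation was to avoid sequential prefixes, for example by prepending a short hash to each key so that keys disperse across index partitions. A minimal illustration, using a hypothetical key scheme rather than HERE's actual one:

```python
# Illustration only: sequential tile keys share a long common prefix and tend to
# hit the same S3 index partition; prepending a short hash disperses keys.
import hashlib

def sequential_key(z, x, y):
    return "tiles/{}/{}/{}.png".format(z, x, y)      # e.g. tiles/15/18106/11272.png

def hashed_key(z, x, y, hash_len=4):
    base = "{}/{}/{}".format(z, x, y)
    prefix = hashlib.md5(base.encode()).hexdigest()[:hash_len]
    return "tiles/{}/{}.png".format(prefix, base)    # e.g. tiles/<hash>/15/18106/11272.png

print(sequential_key(15, 18106, 11272))
print(hashed_key(15, 18106, 11272))
```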



S3 Load Constraint + Satellite
Tile keys are stored lexicographically across S3 partitions. Satellite example (z/x/y tile ID → quadkey representation):

  15/18106/11272   →  302013232331232
  15/18089/11275   →  302013232321201
  17/72409/45094   →  30201323233033003

Each zoom level has 4^level_detail tiles, and a quadkey's length equals the level of detail of the corresponding tile.
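For reference, a quadkey is built by interleaving the bits of the tile's x and y coordinates, one base-4 digit per zoom level. The sketch below reproduces the example tile IDs above; note that matching the listed quadkeys requires flipping the y coordinate (TMS-style row numbering):

```python
def quadkey(z, x, y):
    """Convert a z/x/y tile ID to its quadkey (one base-4 digit per zoom level)."""
    y = (1 << z) - 1 - y  # flip y (TMS-style row numbering) so the slide examples match
    digits = []
    for bit in range(z - 1, -1, -1):
        digits.append(str(((x >> bit) & 1) | (((y >> bit) & 1) << 1)))
    return "".join(digits)

# Example tile IDs from the slide above:
assert quadkey(15, 18106, 11272) == "302013232331232"
assert quadkey(15, 18089, 11275) == "302013232321201"
assert quadkey(17, 72409, 45094) == "30201323233033003"
```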

S3 Load Constraint + Satellite
Alternative: instead of quadkeys, use a random hash and increase the base number.
Remaining problem: at satellite scale, the ratio of requests relative to the lexicographic overlap still produced by a random hash was significant and would not scale well. Performance was still unacceptable in light of our requirements, and billions of PUT requests would considerably increase the cost of recurring updates.


Better solution: reduce the number of files by packing tiles into binary blobs on S3, index the tiles inside the blobs, and access them with HTTP range requests (see the sketch below).
New challenge: managing updates becomes more complicated, more logic is required to distribute tiles inside the blobs, and, more importantly, the predicted index size was on the order of terabytes and growing, adding cost and complexity overhead.
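A minimal sketch of that access pattern, assuming boto3 and a hypothetical in-memory index that maps a quadkey to its blob object and byte range (the real index layout is an assumption here):

```python
# Illustration only: fetch a single tile out of a packed binary blob on S3
# with an HTTP Range request. Bucket, key, and index entries are hypothetical.
import boto3

s3 = boto3.client("s3")

# Hypothetical index: quadkey -> (blob object key, byte offset, tile length)
tile_index = {
    "302013232331232": ("blobs/302013.bin", 1048576, 24576),
}

def fetch_tile(qk):
    blob_key, offset, length = tile_index[qk]
    resp = s3.get_object(
        Bucket="example-satellite-tiles",                          # placeholder bucket
        Key=blob_key,
        Range="bytes={}-{}".format(offset, offset + length - 1),   # inclusive byte range
    )
    return resp["Body"].read()
```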


Back on the whiteboard…


New Pseudo-Quad Index




• A new compact O(1) data structure to work around the performance constraints of S3
• Minimizes the index size needed to keep track of tiles and random hashes
  - 194.605% size reduction in comparison to generic optimized hash tables
• Reduces and sets boundaries for proximity regions, causing better dispersion in the n-gram load-split algorithm used by S3
• Simplified imagery updates; geometrical consistency across all S3 buckets
• Performance: S3: >150-300 ms; S3 + PQI: