Berlin 2015
Storage, Backup and Disaster Recovery in the Cloud
Robert Schmid, Storage Business Development, AWS
Ali Abbas, Principal Architect, HERE

AWS Customer Case Study: HERE "Maps for Life: Satellite Imagery - S3"
What we will cover in this session
• Amazon storage options
• Amazon Elastic File System
• Use cases (Backup, Archive, DR)
• Customer use case: HERE "Maps for Life, Satellite Imagery - S3"
S3 usage: 102% year-over-year increase in data transfer to and from S3 (Q4 2014 vs Q4 2013, not including Amazon use)
Amazon S3 (Simple Storage Service)
• 99.999999999% durability
• $0.03 per GB-month ($360 per TB-year)
Amazon Glacier (low-cost archiving service)
• 99.999999999% durability
• 3–5 hours data retrieval
• $0.01 per GB-month ($120 per TB-year)
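The per-TB-year figures above follow directly from the per-GB-month prices. A minimal sketch of the conversion, assuming 1 TB = 1,000 GB and 12 billed months per year:

```python
import math

def tb_per_year(gb_per_month: float) -> float:
    """Convert a $/GB-month price to $/TB-year (1 TB = 1,000 GB, 12 months)."""
    return gb_per_month * 1_000 * 12

assert math.isclose(tb_per_year(0.03), 360)  # Amazon S3
assert math.isclose(tb_per_year(0.01), 120)  # Amazon Glacier
```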
Amazon EBS (Elastic Block Storage)

Volume type            | Max size    | Max IOPS    | Price
General Purpose (SSD)  | Up to 16 TB | 10,000 IOPS | $0.10 per GB-month
Provisioned IOPS (SSD) | Up to 16 TB | 20,000 IOPS | $0.125 per GB-month + $0.065 per provisioned IOPS
Amazon Storage Gateway
Your on-ramp to AWS cloud storage:
• Back up into S3
• Archive into Amazon Glacier
• iSCSI or VTL interface
Summary: AWS Storage Options
• Object storage (S3, Glacier)
• Elastic Block Storage (EBS)
• Storage Gateway (iSCSI, VTL)
• Elastic File System for EC2 (EFS)
Introducing Amazon Elastic File System for EC2 instances: pilot availability later this summer in US-WEST (Oregon)
What is EFS?
• Fully managed file system for EC2 instances
• Provides standard file system semantics (NFSv4)
• Grows and shrinks elastically, up to petabyte scale
• Delivers performance for a wide variety of workloads
• Highly available and durable

Simple, elastic, scalable
Amazon Storage Use Cases: Backup, Archive, Disaster Recovery
[Architecture diagram: block, file, backup, and archive data plus customer/CSP assets in the customer data center and a colocation data center flow through storage gateways (AWS SGW) and Private Storage for AWS, over AWS Direct Connect or the internet, into the AWS cloud (S3 and Glacier) for backup, archive, and disaster recovery.]
AWS Customer Case Study: HERE "Maps for Life"
Ali Abbas, Principal Architect
• High Resolution Satellite Imagery
• Predictive Analytics/Machine Learning
[email protected] | http://www.here.com
Maps for Life apps: HERE Maps, HERE Drive, HERE Transit, HERE City Lens, Explore
Web and mobile app available on Android/iOS/Windows Phone
Offline Map
• Save the maps of your country or state on your phone
• Use your phone offline
• Explore anywhere without an internet connection
Unified Route Planning (Pocket Nav Sat)
• Route alternatives
• Turn-by-turn navigation

Urban Navigation
• Step-by-step transit
• Turn-by-turn walk guidance
Personal Maps
• Collections
• Easy location sharing

Interactive Maps
• Train schedules
• Traffic incidents
• 3D maps
Reality Capture Processing
• Satellite/Aerial delivery
• Enterprise businesses
• End-to-end user integration
• 99.99% availability, 99.999999999% durability
• High throughput and good performance for most use cases
• Good price ratio
• Design simplifies creating integration pipelines
The case for Satellite Imagery
Continuously increasing global coverage with a higher frequency of refresh
Billions of tiles
• Huge storage requirements due to high-resolution content across zoom levels
• A large number of small tiles to keep track of and deliver

Challenges
• Exponential growth rate: today some billions, tomorrow some trillions (see the quick tile-count calculation below)
• Increasing data volume and refresh rate
• Maintaining low-latency requirements and service level agreements
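To put "billions of tiles" in perspective, here is a quick calculation based on the 4^level_of_detail tiles-per-zoom-level formula quoted later in the deck; the zoom levels chosen are simply the ones appearing in the slide's tile-ID examples, not an official coverage figure:

```python
# Number of tiles in a full Mercator tile pyramid at a given zoom level.
def tiles_at_zoom(level: int) -> int:
    return 4 ** level

print(f"zoom 15: {tiles_at_zoom(15):,} tiles")   # 1,073,741,824  (~1.1 billion)
print(f"zoom 17: {tiles_at_zoom(17):,} tiles")   # 17,179,869,184 (~17.2 billion)

# All levels up to zoom 17 combined:
print(f"levels 0-17: {sum(tiles_at_zoom(z) for z in range(18)):,} tiles")
```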
Behind the curtain
• Specialized spatial file system to deliver tile imagery with sub-millisecond lookup time over the network
• Simple architecture with CDN caches and core sites (holding the full dataset)
• Remote sites had CDN-type caches with geospatial sharding placement algorithms
• Some cache regions occasionally suffered from inter-continental network latency due to non-optimized routing
• The scale of the data implies a massive storage infrastructure to maintain on top
[Architecture diagram: a Mercator-based sharding layer and an intelligent filter layer sit in front of a specialized (adaptive) spatial blob store; the core holds the singleton store, and a shared store feeds the caches.]
Given the success of S3 usage across HERE and recent enhancements to the offering, we started to look at S3 to solve two main problems with one solution:
• Simplify the storage handling layer by removing the storage compute from our architecture and simplifying operations.
• Reduce the network latency from core data to our delivery instances by adding core data presence in each region.
Satellite on S3
• Easy life-cycle management for recurring updates
• Big data store requirements on demand (eases capacity planning)
• Easy pipeline integration with SQS/SNS for background jobs (a sketch follows below)
• Good performance out of the box, but it did not fulfill our requirements: too much variation in response time, ~150–300 ms on average
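As an illustration of the SQS/SNS pipeline-integration point above, here is a minimal sketch (not HERE's actual pipeline) that consumes S3 event notifications from an SQS queue with boto3 and hands each uploaded object to a background job; the queue name and the direct S3-to-SQS notification setup are assumptions:

```python
import json
import boto3

sqs = boto3.resource("sqs")
queue = sqs.get_queue_by_name(QueueName="tile-updates")  # hypothetical queue

while True:
    # Long-poll for S3 ObjectCreated notifications delivered to the queue.
    for message in queue.receive_messages(WaitTimeSeconds=20, MaxNumberOfMessages=10):
        body = json.loads(message.body)
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            print(f"new object uploaded: s3://{bucket}/{key}")  # kick off background job here
        message.delete()
```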
S3 load constraint

"Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index. That is, Amazon S3 stores key names in alphabetical order. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition."
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
S3 load constraint + Satellite
Tiles are stored lexicographically across S3 partitions. Tile coordinates (z/x/y) map to a quadkey representation:

Satellite example tile ID (z/x/y) | Quadkey
15/18106/11272                    | 302013232331232
15/18089/11275                    | 302013232321201
17/72409/45094                    | 30201323233033003

Each zoom level has 4^level_of_detail tiles, and a quadkey's length is equal to the level of detail of the corresponding tile (a conversion sketch follows below).
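A minimal sketch of the z/x/y-to-quadkey conversion behind the table above, using the standard interleaved-bits quadkey scheme; the row flip is an assumption inferred from the slide's own example IDs (it reproduces all three of them), not a statement about HERE's production code:

```python
def tile_to_quadkey(z: int, x: int, y: int) -> str:
    """Convert a z/x/y tile ID to its quadkey: one digit (0-3) per zoom
    level, built from the interleaved bits of the column and row."""
    row = (1 << z) - 1 - y  # flip the row to quadkey orientation (assumed TMS-style input)
    digits = []
    for level in range(z, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if row & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

# The example tile IDs from the slide:
assert tile_to_quadkey(15, 18106, 11272) == "302013232331232"
assert tile_to_quadkey(15, 18089, 11275) == "302013232321201"
assert tile_to_quadkey(17, 72409, 45094) == "30201323233033003"
```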
S3 load constraint + Satellite
Tiles are stored lexicographically across S3 partitions.

Alternative to quadkeys: use a random hash and increase the base number (a sketch follows below).

Remaining problem
At the scale of satellite imagery, the ratio of requests relative to the lexicographic overlap produced by a random hash was still significant and would not scale well. Performance was still unacceptable in light of our requirements, and billions of PUT requests would considerably increase the cost of recurring updates.
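A minimal sketch of the random-hash alternative mentioned above: prefixing each object key with a short hash so that lexicographically adjacent tiles spread across S3 index partitions. The prefix length and key layout are hypothetical, not HERE's actual key scheme:

```python
import hashlib

def hashed_tile_key(z: int, x: int, y: int, prefix_len: int = 4) -> str:
    """Prefix the tile key with a few hex characters of its MD5 hash so that
    lexicographically close tiles land on different S3 index partitions."""
    tile_id = f"{z}/{x}/{y}"
    prefix = hashlib.md5(tile_id.encode()).hexdigest()[:prefix_len]
    return f"{prefix}/{tile_id}"

print(hashed_tile_key(15, 18106, 11272))  # prints "<4-hex-char prefix>/15/18106/11272"
```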
S3 load constraint + Satellite
Tiles are stored lexicographically across S3 partitions.

Better solution
Reduce the number of files by packing tiles into binary blobs on S3, index the tiles inside the blobs, and use HTTP range requests for access (a sketch follows below).

New challenge
Managing updates became more complicated, more logic is required to distribute tiles inside the blobs, and, more importantly, the predicted index size was on the order of terabytes and growing: cost and complexity overhead.
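A minimal sketch of the blob-plus-range-request access path described above, reading one tile out of a packed blob object with an HTTP Range GET via boto3; the bucket name, blob key, and byte offsets are hypothetical and would normally come from the tile index:

```python
import boto3

s3 = boto3.client("s3")

def read_tile_from_blob(bucket: str, blob_key: str, offset: int, length: int) -> bytes:
    """Fetch a single tile from a packed blob object using an HTTP range request."""
    resp = s3.get_object(
        Bucket=bucket,
        Key=blob_key,
        Range=f"bytes={offset}-{offset + length - 1}",  # inclusive byte range
    )
    return resp["Body"].read()

# Hypothetical usage: a 24 KB tile stored at byte offset 1,048,576 inside one blob.
# tile_bytes = read_tile_from_blob("satellite-blobs", "blobs/302013232.bin", 1_048_576, 24_576)
```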
Back on the whiteboard…
New Pseudo-Quad Index
• New compact O(1) data structure to work around the performance constraints of S3
• Minimizes the index size constraint for keeping track of tiles and random hashes
  • 194.605% size reduction in comparison to generic optimized hash tables
• Reduces and sets boundaries for proximity regions to achieve better dispersion on the n-gram load-split algorithm used by S3
• Simplified imagery updates; geometrical consistency across all S3 buckets
• Performance:
  • S3: >150–300 ms
  • S3 + PQI: