11/12/2013
Re-engineering Your Application for AWS Section 04 – Weathering the Storm Joe Baron, AWS November 12, 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Overview
• Purpose: Use a multi-AZ deployment to make the application fully highly-available and production-ready.
• To accomplish this, we:
– Discuss failures, high-availability, and disaster recovery
– Cover which AWS services are inherently highly-available
– Introduce the use of multiple Availability Zones
– Modify WordPress app to use a multi-AZ deployment at the Web Tier and the RDS Database Tier
– Configure WordPress to store media files in an Amazon S3 bucket and to distribute the files via Amazon CloudFront
Lab 04 – Weathering the Storm
• Complete the Setup section of Lab 04
– Set up the S3 bucket and CloudFront distribution
– Create the WordPress CloudFormation stack
– Then STOP!
Re-engineering Your Application for AWS High Availability Joe Baron, AWS November 12, 2013
“Everything fails, all the time” Werner Vogels, CTO Amazon.com
High Availability and Disaster Recovery
• How is high availability different from disaster recovery?
High Availability Defined
Availability = Uptime / (Uptime + Downtime)

Availability    Downtime/Year    Downtime/Month
99.9%           8.76 hours       43.2 minutes
99.99%          52.56 minutes    4.32 minutes
99.999%         5.26 minutes     25.9 seconds
99.9999%        31.5 seconds     2.59 seconds
You can’t rely on a human to respond within a 5-minute window or less. Meeting these targets requires automation.
Source: “High-Availability Infrastructure in the Cloud,” Evan Cooke, Co-Founder & CTO, Twilio, Web 2.0 Expo NYC 2011
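The downtime figures for each availability level follow directly from the availability formula. Below is a minimal sketch of that arithmetic, assuming a 365-day year and a 30-day month (the month length is inferred from the table values, not stated on the slide). The function names are illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month, which matches the table


def availability(uptime: float, downtime: float) -> float:
    """Availability = Uptime / (Uptime + Downtime)."""
    return uptime / (uptime + downtime)


def downtime_seconds(availability_percent: float, period_seconds: int) -> float:
    """Downtime budget, in seconds, for an availability target over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * period_seconds


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999, 99.9999):
        per_year = downtime_seconds(pct, SECONDS_PER_YEAR)
        per_month = downtime_seconds(pct, SECONDS_PER_MONTH)
        print(f"{pct}%: {per_year / 60:.2f} min/year, {per_month:.2f} s/month")
```

At 99.999% the monthly budget is under half a minute, which is why the response has to be automated rather than paged to a person.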
High Availability – Discussion
• Q: How many of you have enabled high availability for your applications on AWS?
Fault-Tolerant & Highly Available Services
Auto Scaling across Availability Zones
Elastic Load Balancing
Amazon CloudFront
Amazon Simple Storage Service (S3)
Amazon Relational Database Service (RDS) Multi-Availability Zone
Amazon DynamoDB
Is a single Availability Zone deployment highly available?
What if…
• a power failure, or
• a networking event
causes the whole AZ to go off-line?
[Diagram: four WP (M1.Med) instances in an Auto Scaling group (EC2SecurityGroup, public subnet), DBSecurityGroup in a private subnet, and DynamoDB session state, all in Availability Zone A]
Multi-Availability Zone Deployment
[Diagram: an Elastic Load Balancer in front of four WP instances in an Auto Scaling group (EC2SecurityGroup) spread over public subnets in Availability Zone A and Availability Zone B, with DBSecurityGroup private subnets in both zones]
Multi-Availability Zone Deployment: Web Tier
• ELB load balances across multiple AZs (VPC subnets)
• EC2 Auto Scaling replaces failed instances
• Resilient to loss of instance or whole AZ
[Diagram: an Elastic Load Balancer in front of four WP instances in an Auto Scaling group (EC2SecurityGroup) spread over public subnets in Availability Zone A and Availability Zone B, with DBSecurityGroup private subnets in both zones]
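The web-tier behavior (the load balancer routes only to healthy instances, and Auto Scaling restores capacity) can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not the ELB or Auto Scaling API: the `WebTier` class, its methods, and the AZ names are hypothetical, and real health checks and instance launches take time.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    instance_id: int
    az: str
    healthy: bool = True


class WebTier:
    """Toy model of a load balancer in front of an Auto Scaling group."""

    def __init__(self, azs, desired_capacity):
        self.available_azs = set(azs)
        self.desired_capacity = desired_capacity
        self._ids = count(1)
        self.instances = []
        self.reconcile()

    def healthy_instances(self):
        return [i for i in self.instances
                if i.healthy and i.az in self.available_azs]

    def reconcile(self):
        """Drop failed instances, then launch replacements in the emptiest available AZ."""
        self.instances = self.healthy_instances()
        if not self.available_azs:
            return
        while len(self.instances) < self.desired_capacity:
            az = min(sorted(self.available_azs),
                     key=lambda z: sum(i.az == z for i in self.instances))
            self.instances.append(Instance(next(self._ids), az))

    def route(self, request_number):
        """Round-robin over healthy instances only."""
        targets = self.healthy_instances()
        if not targets:
            raise RuntimeError("no healthy instances to serve the request")
        return targets[request_number % len(targets)]

    def fail_az(self, az):
        """Simulate the loss of a whole Availability Zone."""
        self.available_azs.discard(az)
```

For example, after `tier = WebTier(["A", "B"], desired_capacity=4)` and `tier.fail_az("A")`, every `tier.route(n)` lands in zone B, and `tier.reconcile()` brings the group back to four instances there.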
Multi-Availability Zone Deployment: DB Tier
• RDS Multi-AZ Deployment
• Synchronous replication
• Resilient to loss of DB instance or whole AZ
[Diagram: an Elastic Load Balancer in front of four WP instances in an Auto Scaling group (EC2SecurityGroup) spread over public subnets in Availability Zone A and Availability Zone B, with DBSecurityGroup private subnets in both zones]
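Why synchronous replication matters for the DB tier can be shown with a toy primary/standby pair: a write is acknowledged only after both copies hold it, so promoting the standby loses no acknowledged data. This is an illustrative sketch only. The `MultiAZDatabase` class is hypothetical, and in RDS the replication and failover are performed by the service itself, not by application code.

```python
class MultiAZDatabase:
    """Toy primary/standby pair with synchronous replication."""

    def __init__(self, primary_az, standby_az):
        self.primary_az = primary_az
        self.standby_az = standby_az
        self._data = {primary_az: {}, standby_az: {}}

    def write(self, key, value):
        # Synchronous: apply to every live copy before acknowledging the write.
        self._data[self.primary_az][key] = value
        if self.standby_az is not None:
            self._data[self.standby_az][key] = value
        return True

    def read(self, key):
        # Reads are served by the current primary.
        return self._data[self.primary_az][key]

    def fail_primary(self):
        """Simulate loss of the primary (or its whole AZ) and promote the standby."""
        if self.standby_az is None:
            raise RuntimeError("no standby available to promote")
        failed = self.primary_az
        self.primary_az, self.standby_az = self.standby_az, None
        del self._data[failed]
        return self.primary_az
```

After `db.write("post:1", "Hello")` and `db.fail_primary()`, `db.read("post:1")` still returns the value from the promoted copy.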
Other High Availability Services: DynamoDB
[Diagram: Users and Admins reach an Elastic Load Balancer through an Internet Gateway; four WP instances in an Auto Scaling group (EC2SecurityGroup) span public subnets in Availability Zones A and B, with DBSecurityGroup private subnets in both zones; also shown are DynamoDB session state, an S3 bucket for static content with a CloudFront distribution, an LDAP or AD server, and bootstrap content, inside one VPC and Region. Launch path: AWS Management Console, CloudFormation template, IAM user created.]
High Availability in Amazon DynamoDB
[Diagram: Availability Zone 1, Availability Zone 2, and Availability Zone 3 within a Region]
Other High Availability Services: S3 + CloudFront
[Diagram: Users and Admins reach an Elastic Load Balancer through an Internet Gateway; four WP instances in an Auto Scaling group (EC2SecurityGroup) span public subnets in Availability Zones A and B, with DBSecurityGroup private subnets in both zones; also shown are DynamoDB session state, an S3 bucket for static content with a CloudFront distribution, an LDAP or AD server, and bootstrap content, inside one VPC and Region. Launch path: AWS Management Console, CloudFormation template, IAM user created.]
Simple Storage Service (S3)
[Diagram: a Region containing several data centers, with storage nodes labeled Node 1 through Node n]
Amazon CloudFront
S3 + CloudFront
• S3 is highly durable and highly scalable storage for the Internet
• CloudFront is an easy-to-use CDN with 40+ edge locations around the globe
• A CloudFront distribution maps S3 content to CloudFront edge locations
• The WordPress plugin uses S3 + CloudFront for storage and distribution of media files
– Content is automatically delivered to end users from the closest edge location
– Maximizes throughput, minimizes latency
– Reduces load on your web app infrastructure
– Gives you resiliency to DDoS attacks
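The way edge caching reduces load on the origin can be sketched with a small in-memory model: each edge location fetches an object from the origin once and then serves repeat requests from its own cache. This is a simplified illustration, not the CloudFront API. The class names, edge names, and latency figures are hypothetical, and a real CDN also handles cache expiry and request routing.

```python
class Origin:
    """Stands in for the S3 bucket holding media files; counts the requests it serves."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.requests = 0

    def get(self, key):
        self.requests += 1
        return self.objects[key]


class EdgeLocation:
    """Stands in for a CDN edge location with a simple cache (no expiry)."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin
        self.cache = {}

    def get(self, key):
        if key not in self.cache:          # cache miss: fetch once from the origin
            self.cache[key] = self.origin.get(key)
        return self.cache[key]             # cache hit: origin is not contacted


def closest_edge(edges, latencies_ms):
    """Pick the edge location with the lowest measured latency to the user."""
    return min(edges, key=lambda edge: latencies_ms[edge.name])
```

With one origin and two edges, a user 12 ms from `edge-b` and 80 ms from `edge-a` is served by `edge-b`, and a hundred requests for the same file there reach the origin only once.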
Inherent Redundancy
[Diagram: Users and Admins reach an Elastic Load Balancer through an Internet Gateway; four WP instances in an Auto Scaling group (EC2SecurityGroup) span public subnets in Availability Zones A and B, with DBSecurityGroup private subnets in both zones; also shown are DynamoDB session state, an S3 bucket for static content with a CloudFront distribution, an LDAP or AD server, and bootstrap content, inside one VPC and Region. Launch path: AWS Management Console, CloudFormation template, IAM user created.]
High Availability – Discussion
• Q: How much does it cost to implement high availability?
• Q: Why high availability instead of disaster recovery?
• Q: Where do you stop?
• Q: High availability across Regions?
High Availability – Best Practices
• Build loosely coupled systems
• Use inherently fault-tolerant AWS services
• Use multiple Availability Zones
• Implement elasticity through Auto Scaling
• Use automation to meet SLAs (Auto Scaling, CloudFormation templates)
• Test your solution
Simian Army from Netflix
• Chaos Monkey is a service that runs in AWS
– Seeks out Auto Scaling groups (ASGs) and terminates instances (virtual machines) per group
• Others: Latency Monkey, Conformity Monkey, Doctor Monkey, Janitor Monkey, Security Monkey, 10-18 Monkey, and Chaos Gorilla
• From the Netflix blog:
– “Do your traffic load balancers correctly detect and route requests around instances that go offline? Can you reliably rebuild your instances? Perhaps an engineer ‘quick patched’ an instance last week and forgot to commit the changes to your source repository?”
• https://github.com/Netflix/SimianArmy
Test Your High Availability Solution
• From the Netflix blog:
• Simple monkey:
– Kill any instance in the account
• Complex monkey:
– Kill instances with specific tags
– Introduce other faults (e.g., connectivity via security group)
• Human monkey:
– Kill instances from the AWS Management Console
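The simple and complex monkeys can be rehearsed against a toy inventory before pointing anything at a real account. The sketch below is an assumption-laden illustration, not the Simian Army code: the function names, group names, instance IDs, and tags are all hypothetical, and it only removes entries from in-memory sets. The simple monkey here follows the Chaos Monkey pattern of one instance per group.

```python
import random


def simple_monkey(groups, rng):
    """Terminate one randomly chosen instance in each non-empty group."""
    killed = []
    for name, instances in groups.items():
        if instances:
            victim = rng.choice(sorted(instances))
            instances.remove(victim)
            killed.append((name, victim))
    return killed


def tagged_monkey(groups, tags, wanted, rng):
    """Terminate one instance whose tags include every key/value pair in `wanted`."""
    candidates = sorted(
        (name, instance_id)
        for name, instances in groups.items()
        for instance_id in instances
        if all(tags.get(instance_id, {}).get(k) == v for k, v in wanted.items())
    )
    if not candidates:
        return None
    name, victim = rng.choice(candidates)
    groups[name].remove(victim)
    return name, victim


def still_serving(groups, minimum=1):
    """The availability check: every group keeps at least `minimum` instances."""
    return all(len(instances) >= minimum for instances in groups.values())
```

A test run would call one of the monkeys and then assert `still_serving(groups)`. In a real environment, the equivalent check is whether the application keeps answering requests while Auto Scaling replaces the terminated instance.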
Re-engineering Your Application for AWS Lab 04 – Weathering the Storm Joe Baron, AWS November 12, 2013
Lab 04 – Weathering the Storm
• Focus on making the application fully highly-available and production-ready, by adding:
– Elastic Load Balancing across multiple AZs
– EC2 Auto Scaling across multiple AZs
– Scalable media file storage and distribution using S3 and CloudFront
• In this lab you will:
– Create a multi-AZ deployment at the Web Tier and the DB Tier
– Configure WordPress to store objects in an Amazon S3 bucket and to distribute content via Amazon CloudFront
– Simulate an Availability Zone failure, including an Amazon RDS failure
– Test application availability
Lab 01 – Lift and Shift
[Diagram: Users and Admins reach a single WP + MySQL (M1.Large) instance through an Internet Gateway; the instance is in EC2SecurityGroup in a public subnet of Availability Zone A in the default VPC, with bootstrap content. Launch path: AWS Management Console, CloudFormation template, IAM user created.]
Lab 02 – Freedom and Security
[Diagram: Users and Admins reach a WP (M1.Med) instance through an Internet Gateway; the instance is in EC2SecurityGroup in a public subnet, with DBSecurityGroup in a private subnet, an LDAP or AD server, and bootstrap content, all in Availability Zone A of a VPC. Launch path: AWS Management Console, CloudFormation template, IAM user created.]
Lab 03 – Balance and Scale
[Diagram: Users and Admins reach four WP (M1.Med) instances in an Auto Scaling group through an Internet Gateway; the group is in EC2SecurityGroup in a public subnet, with DBSecurityGroup in a private subnet, DynamoDB session state, an LDAP or AD server, and bootstrap content, all in Availability Zone A of a VPC. Launch path: AWS Management Console, CloudFormation template, IAM user created.]
Lab 04 – Weathering the Storm
[Diagram: Users and Admins reach an Elastic Load Balancer through an Internet Gateway; four WP instances in an Auto Scaling group (EC2SecurityGroup) span public subnets in Availability Zones A and B, with DBSecurityGroup private subnets in both zones; also shown are DynamoDB session state, an S3 bucket for static content with a CloudFront distribution, an LDAP or AD server, and bootstrap content, inside one VPC and Region. Launch path: AWS Management Console, CloudFormation template, IAM user created.]