11/12/2013
Taking AWS Operations to the Next Level Chris Munns, Solutions Architect, AWS November 12, 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Welcome!
• Take a seat!
• Meet your neighbors!
• We’ll be starting a bit after 9.
Today: • 9am-5pm – Hour break for lunch around noon • Bio breaks during labs are cool too • If we get done early? To the bar/tables!
• Some presentation, some hands on labs • Ask questions, but let’s not get too sidetracked • Everyone is here to learn and level up – Mixed backgrounds, experience, titles, companies, industries, countries
Welcome to AWS re:Invent 2013
Me: Chris Munns ([email protected])
– AWS Solutions Architect based in NYC
– Previously Senior Sys-Ops Engineer at a number of places like Etsy and Meetup
Also: – (today’s helpers)
Today’s Goals: Mine: • Have everyone here learn something new and feel confident enough in that knowledge to act on it when you return to your normal day-to-day. That action should make your infrastructure (and hence your life and business) better.
Today’s Goals: Yours:
• Understand the power of AWS CloudFormation to create, provision, manage, and update your infrastructure on AWS
• Use host based application configuration management tools and methodology to manage the systems and applications living inside your infrastructure
• Combine the above with the Amazon Simple Workflow Service, service APIs and other tools, to automate routine tasks in your infrastructure
Sound good?
https://secure.flickr.com/photos/stevendepolo/5749192025/
Taking AWS Operations to the Next Level
Operations: • “Systems Operations refers to a team, or possibly even a department within the IT group, which is responsible for the running of the centralized systems and networks.” – http://www.yourwindow.to/information-security/gl_systemsoperations.htm
Operations:
• Creating, modifying, provisioning, updating systems, software and networks
• Perform day to day tasks to keep the infrastructure:
– Available?
– Growing?
– Scaling?
– Performing?
– Secure?
– Working the way the business needs, or will need, for the business to meet or exceed its goals
HELPING THE BUSINESS MAKE MONEY
Next Level:
• How many RPG (role playing game) fans do we have in the room?
http://diablo3crack.com/images/Demon-Hunter-Level-Up-Diablo-3.png
Next Level: In RPGs, reaching the next level (leveling up) can mean you are now better and more capable at what you were doing before – Be Smarter – Be Stronger – Be Faster – Kill monsters/bad guys better. Win.
Next Level: In operations on AWS, reaching the next level (leveling up) can mean you are now better and more capable at what you were just doing before – Better use of APIs & automation ( Be Smarter ) – Better availability/uptime ( Be Stronger ) – Better agility as an organization ( Be Faster ) – AWS gives you the tools. ( to kill those infrastructure bad guys )
http://findingagap.files.wordpress.com/2012/08/level-up.jpg
Today’s Agenda: • Cover three tools that will help you to take things to the next level: – AWS CloudFormation – Host-based configuration management with Chef + AWS OpsWorks – Amazon Simple Workflow Service + APIs
Today’s Agenda: • Going to be covering this from a Linux/open source perspective. ( sorry Windows d00ds!) • The big picture is around automation and ease of deployment • These are just some examples, but lots of flexibility and options
Today’s Agenda: • We’re going to build out a highly available, scalable, and self healing WordPress site. • It will live inside a VPC, be served via Elastic Load Balancing, make use of AWS OpsWorks for deployment and management of the web tier, and use ElastiCache and Amazon RDS for caching and database. • We’ll also use Amazon SWF later to help with some of the self-healing aspects of our infrastructure.
Our Infrastructure
[Architecture diagram: a Virtual Private Cloud in AWS US-West-2 spanning Availability Zones A, B, and C, each with a public and a private VPC subnet. Components shown: ELB, web instances, RDS DB instance primary and standby (Multi-AZ), ElastiCache, NAT instance, SWF worker instance, leap/bastion instance, and Internet gateway, alongside Amazon CloudWatch, Amazon SNS, Amazon SQS, Amazon S3, Amazon SWF, AWS CloudFormation, AWS OpsWorks, and the EC2 API.]
AWS CloudFormation
AWS CloudFormation • AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion First released in 2010
AWS CloudFormation
Templates to describe the AWS resources and any associated dependencies or runtime parameters required to run your application
AWS CloudFormation
You don’t need to figure out the order in which AWS services need to be provisioned or the subtleties of how to make those dependencies work
AWS CloudFormation Once deployed, you can modify and update the AWS resources in a controlled and predictable way, allowing you to version your AWS infrastructure in the same way as you version your software
AWS CloudFormation
AWS CloudFormation takes care of this for you
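To make the ordering point concrete, here is a minimal Python sketch of how a provisioning order can be derived from the references between resources. It is an illustration only, not how AWS CloudFormation is implemented; the three resources and their names (`Vpc`, `PublicSubnet`, `WebInstance`) are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def find_refs(node):
    """Collect logical names referenced via Ref or Fn::GetAtt in a template fragment."""
    refs = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Ref" and isinstance(value, str):
                refs.add(value)
            elif key == "Fn::GetAtt" and isinstance(value, list) and value:
                refs.add(value[0])
            else:
                refs |= find_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= find_refs(item)
    return refs


def creation_order(resources):
    """Return resource names so that every resource comes after the ones it references."""
    graph = {}
    for name, body in resources.items():
        # Ignore references to parameters or pseudo parameters such as AWS::Region.
        graph[name] = {r for r in find_refs(body) if r in resources and r != name}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical resources, deliberately listed in the "wrong" order.
resources = {
    "WebInstance": {"Type": "AWS::EC2::Instance",
                    "Properties": {"SubnetId": {"Ref": "PublicSubnet"}}},
    "PublicSubnet": {"Type": "AWS::EC2::Subnet",
                     "Properties": {"VpcId": {"Ref": "Vpc"}}},
    "Vpc": {"Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"}},
}

order = creation_order(resources)
print(order)
```

The template author only states what refers to what; the order falls out of the references.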
AWS CloudFormation
AWS CloudFormation is available at no additional charge, and you pay only for the AWS resources needed to run your applications
AWS CloudFormation • Templates to describe the AWS resources • Modify and update your AWS resources in a controlled and predictable way • Version control your AWS infrastructure
Anatomy of a template
JSON
• Plain text
• Perfect for version control
• Validatable
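Because a template is plain JSON, a basic sanity check fits in a few lines and can run before a template is committed. The sketch below is a local illustration in Python, not the validation the CloudFormation service performs; the only rules it checks are that the text parses, that a `Resources` section exists, and that each resource has a `Type`.

```python
import json


def check_template(text):
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        template = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} at line {err.lineno}"]
    if not isinstance(template, dict):
        return ["top level must be a JSON object"]

    problems = []
    resources = template.get("Resources")
    if resources is None:
        problems.append("missing section: Resources")
    elif not isinstance(resources, dict):
        problems.append("Resources must be a JSON object")
    else:
        for name, resource in resources.items():
            if not isinstance(resource, dict) or "Type" not in resource:
                problems.append(f"resource {name} has no Type")
    return problems


good = '{"Resources": {"Ec2Instance": {"Type": "AWS::EC2::Instance"}}}'
print(check_template(good))
print(check_template('{"Resources": '))
```

A check like this slots naturally into a pre-commit hook next to the version control workflow.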
Declarative language
{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Description" : "AWS CloudFormation Sample Template EC2InstanceSample: Create an Amazon EC2 instance running the Amazon Linux AMI. The AMI is chosen based on the region in which the stack is run. This example uses the default security group, so to SSH to the new instance using the KeyPair you enter, you will need to have port 22 open in your default security group. **WARNING** This template creates an Amazon EC2 instance. You will be billed for the AWS resources used if you create a stack from this template.",
  "Parameters" : {
    "KeyName" : {
      "Description" : "Name of an existing EC2 KeyPair to enable SSH access to the instance",
      "Type" : "String"
    }
  },
  "Mappings" : {
    "RegionMap" : {
      "us-east-1" : { "AMI" : "ami-7f418316" },
      "us-west-1" : { "AMI" : "ami-951945d0" },
      "us-west-2" : { "AMI" : "ami-16fd7026" },
      "eu-west-1" : { "AMI" : "ami-24506250" },
      "sa-east-1" : { "AMI" : "ami-3e3be423" },
      "ap-southeast-1" : { "AMI" : "ami-74dda626" },
      "ap-northeast-1" : { "AMI" : "ami-dcfa4edd" }
    }
  },
  "Resources" : {
    "Ec2Instance" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : {
        "KeyName" : { "Ref" : "KeyName" },
        "ImageId" : { "Fn::FindInMap" : [ "RegionMap", { "Ref" : "AWS::Region" }, "AMI" ]},
        "UserData" : { "Fn::Base64" : "80" }
      }
    }
  },
  "Outputs" : {
    "InstanceId" : {
      "Description" : "InstanceId of the newly created EC2 instance",
      "Value" : { "Ref" : "Ec2Instance" }
    },
    "AZ" : {
      "Description" : "Availability Zone of the newly created EC2 instance",
      "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "AvailabilityZone" ] }
    },
    "PublicDNS" : {
      "Description" : "Public DNSName of the newly created EC2 instance",
      "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "PublicDnsName" ] }
    }
  }
}
The same template, section by section:
HEADERS: "AWSTemplateFormatVersion" and "Description"
PARAMETERS: "Parameters" (here, "KeyName")
MAPPINGS: "Mappings" (here, "RegionMap", one AMI per region)
RESOURCES: "Resources" (here, "Ec2Instance" of type "AWS::EC2::Instance")
OUTPUTS: "Outputs" (here, "InstanceId", "AZ", and "PublicDNS")
Parameters
• Provision-time specification
• Command-line options
Mappings
• Conditionals
• Case statements
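To show why mappings behave like a case statement, here is a rough Python sketch that resolves `Ref` and `Fn::FindInMap` by hand, using the `RegionMap` values from the sample template. It only illustrates the lookup; the real service supports many more intrinsic functions, and the key pair name `my-key` is a placeholder.

```python
def resolve(node, mappings, values):
    """Resolve Ref and Fn::FindInMap inside a template fragment (illustration only)."""
    if isinstance(node, dict):
        if set(node) == {"Ref"}:
            # Parameters and pseudo parameters are looked up by name.
            return values[node["Ref"]]
        if set(node) == {"Fn::FindInMap"}:
            map_name, top_key, second_key = (
                resolve(arg, mappings, values) for arg in node["Fn::FindInMap"]
            )
            return mappings[map_name][top_key][second_key]
        return {key: resolve(value, mappings, values) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, mappings, values) for item in node]
    return node


mappings = {
    "RegionMap": {
        "us-east-1": {"AMI": "ami-7f418316"},
        "us-west-2": {"AMI": "ami-16fd7026"},
    }
}
properties = {
    "KeyName": {"Ref": "KeyName"},
    "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]},
}

# "my-key" is a placeholder; AWS::Region is supplied by the service at stack creation.
resolved = resolve(properties, mappings, {"KeyName": "my-key", "AWS::Region": "us-west-2"})
print(resolved)
```

Run the same template in another region and only the value of `AWS::Region` changes; the mapping picks the matching AMI.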
Resources
Outputs
AWS CloudFormation Resources: Almost any AWS service
– What’s missing (right now)?
• Amazon Elastic MapReduce (Amazon EMR)
• Amazon Simple Workflow Service (Amazon SWF)
• Amazon Simple Email Service (Amazon SES)
• Amazon Glacier
• Amazon CloudSearch
• Might be small new features from other services not yet incorporated
Let us know what you need and how badly!
AWS CloudFormation Resources – Amazon Elastic Compute Cloud (Amazon EC2):
{
  "Type" : "AWS::EC2::Instance",
  "Properties" : {
    "AvailabilityZone" : String,
    "DisableApiTermination" : Boolean,
    "EbsOptimized" : Boolean,
    "IamInstanceProfile" : String,
    "ImageId" : String,
    "InstanceType" : String,
    "KernelId" : String,
    "KeyName" : String,
    "Monitoring" : Boolean,
    "PlacementGroupName" : String,
    "PrivateIpAddress" : String,
    "RamdiskId" : String,
    "SecurityGroupIds" : [ String, ... ],
    "SecurityGroups" : [ String, ... ],
    "SourceDestCheck" : Boolean,
    "SubnetId" : String,
    "Tags" : [ EC2 Tag, ... ],
    "Tenancy" : String,
    "UserData" : String,
    "Volumes" : [ EC2 MountPoint, ... ]
  }
}
AWS CloudFormation /dev/null 2>&1" end
Being a Chef
Defining a role (frontends_role.json):
{
  "name": "Frontends",
  "chef_type": "role",
  "json_class": "Chef::Role",
  "default_attributes": { },
  "description": "Front End Web hosts",
  "run_list": [
    "role[Base]",
    "recipe[apache::frontend]",
    "recipe[syslog::frontend]",
    "recipe[ganglia]"
  ],
  "override_attributes": {
    "ganglia": {
      "config": "/etc/ganglia/gmond.conf",
      "gmetad_name": [ "ganglia.munnsdemo.prv" ],
      "cluster_name": [ "FrontEnds" ]
    }
  }
}
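The role's override attributes win over a cookbook's defaults for the same keys, while keys the role does not mention keep their default values. The Python sketch below mimics that with a simple deep merge. It is a simplification (Chef has more precedence levels than these two), and the cookbook default values shown are invented for the example.

```python
def deep_merge(base, override):
    """Merge two nested dicts; values from `override` win where both define a key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical cookbook defaults (not from the slides).
cookbook_defaults = {
    "ganglia": {
        "config": "/etc/ganglia/gmond.conf",
        "cluster_name": ["unspecified"],
        "port": 8649,
    }
}

# The override_attributes from frontends_role.json.
role_overrides = {
    "ganglia": {
        "config": "/etc/ganglia/gmond.conf",
        "gmetad_name": ["ganglia.munnsdemo.prv"],
        "cluster_name": ["FrontEnds"],
    }
}

node_attributes = deep_merge(cookbook_defaults, role_overrides)
print(node_attributes["ganglia"])
```

The recipe then reads the merged result, which is why the same cookbook can serve several roles with different settings.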
Being a Chef
In recipe:
variables(
  :gmetad_name => node[:ganglia][:gmetad_name],
  :cluster_name => node[:ganglia][:cluster_name]
)
In attributes:
{
  "ganglia": {
    "config": "/etc/ganglia/gmond.conf",
    "gmetad_name": [ "ganglia.munnsdemo.prv" ],
    "cluster_name": [ "FrontEnds" ]
  }
}
In template:
…
name = ""
…
host =
port = 8649
ttl = 1
}
Working with Chef
Chef main application components: – Chef-solo OR – Chef-client & Chef-server
Working with Chef Opscode Chef-server + Chef-client options: – Hosted Chef – “Get instant access to a highly available, dynamically scalable, fully managed and supported automation environment - powered by the experts at Opscode.”
– Private Chef – “All the benefits of Hosted Chef, delivered with 24/7/365 support and implementation consulting, installed as enterprise software behind the corporate firewall.”
– Open Source – You do it all. Not that difficult, community supported, free.
Working with Chef
OR…
AWS OpsWorks • Integrated application management solution for ops-minded developers and IT admins • Model, control, and automate applications of nearly any scale and complexity • AWS Management Console, SDKs, or CLI • No additional cost
AWS OpsWorks
• SIMPLE: Easy to use, quick to get started and productive
• PRODUCTIVE: Reduces errors with conventions and scripted configuration
• FLEXIBLE: Simplifies deployments of any scale and complexity
• POWERFUL: Reduces cost and time with automation
• SECURE: Enables control with fine-grained permissions
AWS OpsWorks
A stack represents the cloud infrastructure and applications that you want to manage together.
A layer defines how to setup and configure a set of instances and related resources.
Decide how to scale: manually, with 24/7 instances, or automatically, with load-based or time-based instances.
Then deploy your app to specific instances and customize the deployment with Chef recipes.
AWS OpsWorks Instance Lifecycle
The agent on each EC2 instance understands a set of commands that are triggered by AWS OpsWorks. The agent then runs Chef Solo.
• Setup
• Configure
• Deploy
• Undeploy
• Shutdown
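As a rough mental model of the first two events, the Python sketch below lists which instance gets which event when a new instance joins a layer: Setup for the new instance, then Configure for every online instance. The instance names are made up, and the exact ordering and the inclusion of the new instance in the Configure round are assumptions for illustration, not a statement of how AWS OpsWorks schedules events.

```python
def events_for_new_instance(new_instance, online_instances):
    """List (instance, event) pairs triggered when `new_instance` joins the layer."""
    events = [(new_instance, "setup")]
    # Assumption: once setup finishes, every online instance (new one included)
    # receives a configure event so it can pick up the changed topology.
    for instance in sorted(set(online_instances) | {new_instance}):
        events.append((instance, "configure"))
    return events


events = events_for_new_instance("web3", ["web1", "web2"])
for instance, event in events:
    print(f"{instance}: {event}")
```

Recipes attached to Configure are therefore the natural place for anything that depends on the list of peers, such as a load balancer or cache node list.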
AWS OpsWorks Instance Lifecycle
[State diagram: instance states requested, pending, booting, running setup, online, shutting down, terminating, and new / stopped. The instance gets a Setup event, all instances get a Configure event (shown at two transitions), and the instance gets a Shutdown event.]
AWS OpsWorks Agent Communication
Participants: AWS OpsWorks, the EC2 instance, and your repo (e.g. GitHub)
The instance connects with AWS OpsWorks to send a keep-alive heartbeat and receive lifecycle events.
1. “Deploy App”
2. AWS OpsWorks sends lifecycle event with pointer to configuration JSON (metadata, recipes) in S3 bucket
3. Download configuration JSON
4. Pull recipe and other build assets from your repo
5. Execute recipe with metadata
6. Upload Chef log
7. Report Chef run status
How AWS OpsWorks Bootstraps the EC2 Instance • Instance is started with IAM role – UserData passed with instance private key, AWS OpsWorks public key – Instance downloads and installs AWS OpsWorks agent
• Agent connects to instance service, gets run info – Authenticate instance using instance’s IAM role – Pick-up configuration JSON from the AWS OpsWorks instance queue – Decrypt & verify message, run Chef recipes – Upload Chef log, return Chef run status
• Agent polls instance service for more messages
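The agent's poll, run, report cycle can be pictured with a small Python sketch. Everything here is a stand-in: the in-memory queue replaces the AWS OpsWorks instance queue, the message fields (`event`, `recipes`, `metadata`) are hypothetical, and decryption, verification, and log upload are left out.

```python
import queue


def agent_poll(instance_queue, run_recipes, max_polls):
    """Poll for messages, run the recipes each one names, and collect status reports."""
    reports = []
    for _ in range(max_polls):
        try:
            message = instance_queue.get_nowait()
        except queue.Empty:
            continue  # nothing to do on this poll; the real agent would wait and retry
        succeeded = run_recipes(message["recipes"], message["metadata"])
        reports.append({
            "event": message["event"],
            "status": "success" if succeeded else "failed",
        })
    return reports


def fake_chef_run(recipes, metadata):
    """Stand-in for a Chef Solo run: succeeds whenever there is at least one recipe."""
    return len(recipes) > 0


pending = queue.Queue()
pending.put({"event": "setup", "recipes": ["apache::frontend"], "metadata": {}})
pending.put({"event": "deploy", "recipes": [], "metadata": {}})

reports = agent_poll(pending, fake_chef_run, max_polls=5)
print(reports)
```

The instance always initiates the conversation, so no inbound access from the service to the instance is needed in this model.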
AWS OpsWorks + Chef • AWS OpsWorks uses Chef Solo to configure the software on the instance • AWS OpsWorks provides many Chef Server functions to users. – Associate cookbooks with instances – Dynamic metadata that describes each registered node in the infrastructure
• Supports "Push" Command and Control Client Runs • With Chef 11, broader support for community cookbooks
AWS OpsWorks + Chef Predefined Layers:
• Elastic Load Balancing
• HAProxy
• MySQL
• Memcached
• Ganglia
• Ruby on Rails
• Node.js
• PHP
• Static Web Server
• Java
AWS OpsWorks + Chef Differences between AWS OpsWorks & Chef-Server Recipes:
Environments = Stacks Data bags = Custom JSON / Amazon S3 Search = Stack Configuration / Deployment JSON
AWS OpsWorks Chef Recipes github.com/aws/opsworks-cookbooks
AWS OpsWorks + Chef
include_recipe 'deploy'

node[:deploy].each do |application, deploy|
  template "#{deploy[:current_path]}/wp-config.php" do
    source "wp-config.php.erb"
    owner "root"
    group "root"
    mode "0644"
    variables(
      :database => node['wordpress']['db']['database'],
      :user => node['wordpress']['db']['user'],
      :password => node['wordpress']['db']['password'],
      :dbhost => node['wordpress']['dbhost'],
      :lang => node['wordpress']['languages']['lang'],
      :cachenode => node['wordpress']['cachenode']
    )
  end
end
AWS OpsWorks + Chef Resources to learn more:
• https://aws.amazon.com/opsworks/
• https://aws.amazon.com/documentation/opsworks/
• https://github.com/aws/opsworks-cookbooks
• http://www.opscode.com/
• http://www.opscode.com/chef
• http://wiki.opscode.com/display/chef/Documentation
Lab 2 Deploying our App Using AWS OpsWorks
Lab 2 • Three main parts: – Add Amazon RDS to our infrastructure with AWS CloudFormation – Deploy our WordPress site with AWS OpsWorks – Modify WordPress adding in ElastiCache Memcached, a caching module to the code, and deploying in AWS OpsWorks
• One hour. • Again, ask questions if stumped!
Lab 2
[Architecture diagram: a Virtual Private Cloud in AWS US-West-2 spanning Availability Zones A, B, and C, each with a public and a private VPC subnet. Components shown: ELB, web instances, RDS DB instance primary and standby (Multi-AZ), ElastiCache, NAT instance, leap/bastion instance, and Internet gateway, alongside Amazon CloudWatch, Amazon S3, AWS CloudFormation, and AWS OpsWorks.]
Taking AWS Operations to the Next Level Lab 2 Recap:
1. Added in host based configuration management to help with software/configuration management on the host
2. Using AWS CloudFormation with AWS OpsWorks together helps you get from nothing to fully-running infrastructure with lots of ongoing benefits, easier maintainability
3. Keep track of the who/what/when/why of your infrastructure/servers/applications
Taking AWS Operations to the Next Level Lab 2 Recap: – Continue to make use of software revision control/testing techniques for our Chef recipes – Using AWS OpsWorks makes deployment/configuration even easier – Can start off building this onto your current infrastructure today, no need for greenfield
Lab 2 Recap
[Architecture diagram: the Lab 2 infrastructure again, with the VPC across Availability Zones A, B, and C, ELB, web instances, RDS (Multi-AZ), ElastiCache, NAT instance, leap/bastion instance, Amazon CloudWatch, Amazon S3, AWS CloudFormation, and AWS OpsWorks.]
http://nintendobuddyboards.freeforums.org/sonic-3-and-knuckles-t402.html
So now we have provisioning and configuration down pretty well. But what could be missing?
Taking AWS Operations to the Next Level • What about resources that aren’t easy to use AWS CloudFormation or Chef on? • What about responding to failure/incidents? • What might wake us up at 4am when we’re on call? • What about SPOFs? • What other routine things can be automated?
Taking AWS Operations to the Next Level Some current deficiencies from what we’ve covered today: – AWS CloudFormation is passive and won’t react to instance issues/failures • How to re-assign resources like Amazon VPC routes, ENIs, EBS?
– AWS OpsWorks won’t handle out of band resources that it currently doesn’t support
– Pools of hosts, clusters, and things that shard are easy (one-offs, not so much)
What’s the missing piece? https://secure.flickr.com/photos/moxievision/470346677/
• Auto Scaling • Amazon Simple Workflow Service • AWS APIs • and a little elbow grease
Traffic to our site vs. provisioned capacity, provisioned manually
[Chart: site traffic plotted against a manually provisioned capacity line; the build slide annotates the chart with 76% and 24%.]
Traffic to our site vs. provisioned capacity with Auto Scaling
[Chart: site traffic plotted against provisioned capacity.]
Auto Scaling • Works with Amazon EC2 • Triggers via Amazon CloudWatch alarms, scheduled actions, or manually • Works with Spot Instance pricing! • Scale up number of instances of a certain class • Scale down number of instances of a certain class OR
Auto Scaling • You can use Auto Scaling for singular instances that don’t scale up or down – min = 1, max = 1
• Auto Scaling gives you the ability to specify multiple Availability Zones, even if you only need a single host – gives you multi-AZ failover
• Auto Scaling supports notifications on instance creation/termination – Useful for configuring other resources, bootstrapping, and provisioning
• Auto Scaling is free!
Auto Scaling
Min = 1, Max = 1, Desired = 1 Set this around a single host and get a self-healing pool of 1
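The "self-healing pool of 1" is just the normal Auto Scaling reconciliation with all three numbers pinned to 1. A toy Python version of that arithmetic, written for illustration and not taken from the service, looks like this:

```python
def reconcile(healthy_instances, minimum, maximum, desired):
    """Return how many instances to launch (positive) or terminate (negative)."""
    # Desired capacity is always clamped to the group's min/max bounds.
    target = max(minimum, min(maximum, desired))
    return target - healthy_instances


# A pool of 1: min = 1, max = 1, desired = 1.
print(reconcile(healthy_instances=1, minimum=1, maximum=1, desired=1))  # host is fine
print(reconcile(healthy_instances=0, minimum=1, maximum=1, desired=1))  # host died
```

When the single host fails, the healthy count drops to 0 and the group launches a replacement, which is all "self-healing" means here.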
Auto Scaling What other fun things can we do with Auto Scaling? Event triggers – Send messages via Amazon SNS to a “topic” whose subscribers can be Amazon SQS, email, SMS, or HTTP(S) POST • When? • Instance initialization • Instance termination • Errors of either
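Here is a minimal Python sketch of reading one of these notifications after it has travelled through Amazon SNS into an Amazon SQS queue. It assumes the default SNS envelope (a JSON body whose `Message` field is itself a JSON string) and the `Event`, `EC2InstanceId`, and `AutoScalingGroupName` fields of the Auto Scaling notification; the instance ID and group name in the sample are invented.

```python
import json


def parse_autoscaling_notification(sqs_body):
    """Extract the event type, instance ID, and group name from an SNS-wrapped message."""
    envelope = json.loads(sqs_body)
    message = json.loads(envelope["Message"])  # the inner payload is a JSON string
    return {
        "event": message.get("Event"),
        "instance_id": message.get("EC2InstanceId"),
        "group": message.get("AutoScalingGroupName"),
    }


# Sample body built by hand for the example; the values are placeholders.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "Event": "autoscaling:EC2_INSTANCE_LAUNCH",
        "EC2InstanceId": "i-0123abcd",
        "AutoScalingGroupName": "nat-pool-of-one",
    }),
})

notification = parse_autoscaling_notification(sample_body)
print(notification)
```

Once the event type and instance ID are in hand, a script can decide what follow-up work that instance needs.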
How does knowing when an instance is created/terminated help us? From before: – AWS CloudFormation is passive. Won’t react to instance issues/failures. – AWS OpsWorks won’t handle out-of-band resources that it currently doesn’t support.
So...
Use these Auto Scaling notifications as a trigger to do other things
Things such as? • Change Amazon VPC routes to point to a new NAT instance • Restore a backup Amazon EBS data volume snapshot • Remove hosts from monitoring/alerting • Interact with any other AWS resources • Configure something in another piece of software • Update DNS • Update third-party tools • Remove anything manual in getting this host running
New Automation Process 1. Auto Scaling does something – Start, stop, scale up, scale down 2. A notification is sent – Where?
Ideally somewhere scalable, redundant, multihomed, reliable, API-able
Like Amazon SQS?
YES!
Amazon SQS
• One of the first AWS services
• Extremely scalable: potentially millions of messages
• Extremely reliable: multi-AZ built in
• Simple: messages get sent in, messages get pulled out
• Secure: API credentials needed
• Inexpensive: $0.50 per 1,000,000 requests, or $0.0000005 per request
– Free tier: “You can get started with Amazon SQS for free. New and existing customers receive 1 million Amazon SQS queuing Requests for free each month.”
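To put those prices in perspective, here is a small Python calculation using the figures from the slide ($0.50 per million requests, first million free each month). The once-a-minute polling rate is an assumption chosen for the example.

```python
PRICE_PER_MILLION_REQUESTS = 0.50   # USD, from the slide
FREE_REQUESTS_PER_MONTH = 1_000_000  # free tier, from the slide


def monthly_sqs_cost(requests):
    """Monthly cost in USD for a given number of SQS requests."""
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable * PRICE_PER_MILLION_REQUESTS / 1_000_000


# Assumption: one host polling the queue once a minute for a 30-day month.
polls_per_month = 60 * 24 * 30
print(polls_per_month, monthly_sqs_cost(polls_per_month))
print(monthly_sqs_cost(3_000_000))
```

A single host polling once a minute stays far inside the free tier, so the queue itself adds nothing to the bill at this scale.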
New Automation Process 1. Auto Scaling does something – Start, stop, scale up, scale down 2. A notification is sent – To an Amazon SQS queue 3. Pull message out of Amazon SQS – With what?
Some instance we can use as a management host
New Automation Process Some instance we can use as a management host: – That can be set up via AWS OpsWorks – Run Chef to install scripts/tools to interact with Amazon SQS and other things – Can be a very lightweight host ( t1.micro perhaps ) – Run cron on it to regularly poll Amazon SQS for changes to your infrastructure – Assign an Amazon EC2 IAM role giving it permissions to just what it needs
New Automation Process 1. Auto Scaling does something – Start, stop, scale up, scale down 2. A notification is sent – To an Amazon SQS queue 3. Pull message out of Amazon SQS – With a management host 4. Do something – Shell scripts??
The problem with doing this as a huge shell script Large monolithic scripts lead to bad behaviors – Hard to handle chained actions with various logic – Hard to handle failure situations • What happens if a shell script dies mid way? • What happens if the host running this job dies? – What if various different systems need to do independent actions? – Separation of tasks and duties within an infrastructure? – Handling coordination of multiple flows and intertwining dependencies – SOA all the things!
We need a common way of reliably coordinating all the steps in our automation
Amazon Simple Workflow Service (Amazon SWF)
• Orchestration tool across your infrastructure
• Use it as a middle layer to pass messages and set up tasks to be completed
• Break down individual tasks into different workers
• You define logic between workers
• Anything that can be scripted can be made into a worker task
• Built-in retries, timeouts, logging
• Low cost, reliability, and scalability built in
[Diagram: YOUR CODE = Deciders & Workers, alongside Amazon SWF]
Amazon SWF Working with Amazon SWF – Terms to know: – Domain -> collection of workflows – Workflow -> collection of actions – Action -> task or workflow step
Actors: – Workflow starters -> start a workflow – Activity Workers -> implement actions – Deciders -> coordinate workflow actions
Amazon SWF Other features: – No more than one delivery (unlike Amazon SQS) – Uses long polling, which reduces number of polls without results – Visibility of task state via API – Timers, signals, markers, child workflows – Supports versioning – Keeps workflow history for a user-specified time
Amazon SWF Deciders & Workers You make these. Independent, stateless, decoupled. Poll for work. Run in parallel to scale. Run multiple of each for different purposes.
Amazon SWF EXAMPLE: Making a new NAT host
• Activity: Add an EIP
• Activity: Change Src/Dest Check
• Activity: Change VPC Routing
• Activity: Update monitoring
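The decider's job for this NAT example boils down to "look at what has happened so far, pick the next step". The Python sketch below shows that logic on its own. It does not call the Amazon SWF API; the activity names, the `(name, status)` history format, and the returned decision tuples are all invented for illustration.

```python
# The four activities from the slide, in the order they should run.
NAT_ACTIVITIES = ["AddEIP", "ChangeSrcDestCheck", "ChangeVPCRouting", "UpdateMonitoring"]


def decide(history):
    """Given (activity, status) pairs seen so far, return the next decision."""
    completed = set()
    for name, status in history:
        if status == "failed":
            return ("fail_workflow", name)
        if status == "completed":
            completed.add(name)
    for activity in NAT_ACTIVITIES:
        if activity not in completed:
            return ("schedule", activity)
    return ("complete_workflow", None)


print(decide([]))
print(decide([("AddEIP", "completed"), ("ChangeSrcDestCheck", "completed")]))
print(decide([(name, "completed") for name in NAT_ACTIVITIES]))
```

Because the decision depends only on the history, the decider itself holds no state and any copy of it can pick up where another left off.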
New Automation Process 1. Auto Scaling does something – Start, stop, scale up, scale down 2. A notification is sent – To an Amazon SQS queue 3. Pull message out of Amazon SQS – With a management host 4. Do something
New Automation Process 1. Auto Scaling does something – Start, stop, scale up, scale down 2. A notification is sent – To an Amazon SQS queue 3. Pull message out of Amazon SQS – With a management host 4. Kick off Amazon SWF workflow – Takes actions to complete desired scaling flow
Amazon SWF Other uses:
– Help infrastructure management
– Backup orchestration
– Dev/test environment refreshes
– Processes within your application
– Response to infrastructure incidents
What else can you think of??
Lab 3 Adding Auto Scaling, Amazon SQS, and Amazon SWF to our infrastructure for total host life cycle automation
Lab 3 • Three main parts: – Set up Amazon SWF – Set up Amazon SQS/Amazon SNS hooks and an infrastructure helper host – Re-launch our NAT instance in Auto Scaling • One hour • One extra part: Terminate the NAT instance. See it fix itself.
Lab 3
[Architecture diagram: the full infrastructure, with the VPC in AWS US-West-2 across Availability Zones A, B, and C, ELB, web instances, RDS (Multi-AZ), ElastiCache, NAT instance, SWF worker instance, leap/bastion instance, and Internet gateway, alongside Amazon CloudWatch, Amazon SNS, Amazon SQS, Amazon S3, Amazon SWF, AWS CloudFormation, AWS OpsWorks, and the EC2 API.]
Taking AWS Operations to the Next Level Lab 3 Recap:
1. Wrapped hosts in Auto Scaling to make self-healing pools of 1
2. Took notifications from Auto Scaling through Amazon SQS and into Amazon SWF
3. Used Amazon SWF to make out-of-band changes to our infrastructure to complete provisioning/configuration, and recover from incidents
4. All of this automation for low cost (if not almost free)
5. Still mostly set up via AWS CloudFormation + AWS OpsWorks
Lab 3 Recap
[Architecture diagram: the Lab 3 infrastructure again, including the SWF worker instance, Amazon SNS, Amazon SQS, Amazon SWF, and the EC2 API.]
Wrapping It All Up We’ve covered a whole lot today: – Treating your infrastructure as code with AWS CloudFormation – Using Chef via AWS OpsWorks to control the software lifecycle on our instances – Building a scalable automation framework with Amazon SWF, Amazon SQS, Auto Scaling, etc.
Wrapping It All Up
[Architecture diagram, shown over several build slides: the complete infrastructure, with the VPC in AWS US-West-2 across Availability Zones A, B, and C, ELB, web instances, RDS (Multi-AZ), ElastiCache, NAT instance, SWF worker instance, leap/bastion instance, and Internet gateway, alongside Amazon CloudWatch, Amazon SNS, Amazon SQS, Amazon S3, Amazon SWF, AWS CloudFormation, AWS OpsWorks, and the EC2 API.]
Wrapping it all up Where to go from here? – None of these things require greenfield, you can add them in pieces and small bits to what you do today on AWS – Start using software revision control for your infrastructure and server configurations ASAP – Start thinking about what manual things you do regularly that can be broken up and automated
Wrapping it all up Where to go from here? – Documentation! • https://aws.amazon.com/documentation/ (reads like a romance novel, I swear!) – Free tier to play! • https://aws.amazon.com/free/ – Ask us!!! • Talk with your account reps, SAs, TAMs, etc.
Please give us your feedback on this presentation
Taking AWS Operations To the Next Level As a thank you, we will select prize winners daily for completed surveys!