Taking AWS Operations to the Next Level
Welcome!
• Take a seat! Meet your neighbors! We’ll be starting a bit after 9.
• Today: 9am-5pm
• Hour break for lunch, 11:30-12:30
• Bio breaks during labs are cool too
• If we get done early? To the bar/tables!
• Some presentation, some hands-on labs
• Ask questions, but let’s not get too sidetracked
• Everyone is here to learn and level up
• Mixed backgrounds, experience, titles, companies, industries, countries
Welcome to the AWS Summit 2014
Me: Chris Munns, AWS Solutions Architect based in NYC
Previously Senior Sys-Ops Engineer at a number of places like Etsy and Meetup
Also: (today’s helpers)
Today’s Goals: Mine:
• Have everyone here learn something new and feel confident enough in that knowledge to act on it when you return to your normal day-to-day. That action should make your infrastructure (and hence your life and business) better.
Today’s Goals: Yours:
• Understand the power of AWS CloudFormation to create, provision, manage, and update your infrastructure on AWS
• Use host-based application configuration management tools and methodology to manage the systems and applications living inside your infrastructure
• Combine the above with the Amazon Simple Workflow Service, service APIs, and other tools to automate routine tasks in your infrastructure
Sound good?
https://secure.flickr.com/photos/stevendepolo/5749192025/
Taking AWS Operations to the Next Level
Operations: “Systems Operations refers to a team, or possibly even a department within the IT group, which is responsible for the running of the centralized systems and networks.” – http://www.yourwindow.to/information-security/gl_systemsoperations.htm
Operations:
• Creating, modifying, provisioning, updating systems, software and networks
• Performing day-to-day tasks to keep the infrastructure:
  Available? Growing? Scaling? Performing? Secure?
  Working the way the business needs, or will need, for the business to meet or exceed its goals
HELPING THE BUSINESS MAKE MONEY
Next Level: How many RPG (role playing game) fans do we have in the room?
http://diablo3crack.com/images/Demon-Hunter-Level-Up-Diablo-3.png
Next Level: In RPGs, reaching the next level (leveling up) can mean you are now better and more capable at what you were doing before
• Be Smarter
• Be Stronger
• Be Faster
• Kill monsters/bad guys better. Win.
Next Level: In operations on AWS, reaching the next level (leveling up) can mean you are now better and more capable at what you were just doing before
• Better use of APIs & automation (Be Smarter)
• Better availability/uptime (Be Stronger)
• Better agility as an organization (Be Faster)
• AWS gives you the tools (to kill those infrastructure bad guys)
http://findingagap.files.wordpress.com/2012/08/level-up.jpg
Today’s Agenda: Cover three tools that will help you to take things to the next level:
• AWS CloudFormation
• Host-based configuration management with Chef + AWS OpsWorks
• Amazon Simple Workflow Service + APIs
Today’s Agenda:
• Going to be covering this from a Linux/open source perspective (sorry Windows d00ds!)
• The big picture is around automation and ease of deployment
• These are just some examples; there is lots of flexibility and there are plenty of options
Today’s Agenda: We’re going to build out a highly available, scalable, and self-healing WordPress site. It will live inside a VPC, be served via ELB, make use of OpsWorks for deployment and management of the web tier, and use ElastiCache and RDS for caching and database. We’ll also use SWF later to help with some of the self-healing aspects of our infrastructure.
Our Infrastructure
[Architecture diagram: a Virtual Private Cloud in AWS US-West-2 spanning Availability Zones A, B, and C. Each zone has a public and a private VPC subnet and a web instance behind ELB. Also shown: RDS DB instance primary and standby (Multi-AZ), ElastiCache, a NAT instance, a leap/bastion instance, an SWF worker instance, and an Internet Gateway, alongside Amazon CloudWatch, Amazon SNS, Amazon SQS, Amazon S3, Amazon SWF, AWS CloudFormation, AWS OpsWorks, and the EC2 API.]
AWS CloudFormation
AWS CloudFormation
• Gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion
• First released in 2011
• Templates describe the AWS resources and any associated dependencies or runtime parameters required to run your application
• You don’t need to figure out the order in which AWS services need to be provisioned or the subtleties of how to make those dependencies work; CloudFormation takes care of this for you
• Once deployed, you can modify and update the AWS resources in a controlled and predictable way, allowing you to version your AWS infrastructure in the same way as you version your software
• AWS CloudFormation is available at no additional charge, and you pay only for the AWS resources needed to run your applications
AWS CloudFormation
• Templates to describe the AWS resources
• Modify and update your AWS resources in a controlled and predictable way
• Version control your AWS infrastructure
Anatomy of a template
• JSON
• Plain text: perfect for version control
• Validatable
• Declarative language
{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Description" : "AWS CloudFormation Sample Template EC2InstanceSample: Create an Amazon EC2 instance running the Amazon Linux AMI. The AMI is chosen based on the region in which the stack is run. This example uses the default security group, so to SSH to the new instance using the KeyPair you enter, you will need to have port 22 open in your default security group. **WARNING** This template creates an Amazon EC2 instance. You will be billed for the AWS resources used if you create a stack from this template.",
  "Parameters" : {
    "KeyName" : {
      "Description" : "Name of an existing EC2 KeyPair to enable SSH access to the instance",
      "Type" : "String"
    }
  },
  "Mappings" : {
    "RegionMap" : {
      "us-east-1" : { "AMI" : "ami-7f418316" },
      "us-west-1" : { "AMI" : "ami-951945d0" },
      "us-west-2" : { "AMI" : "ami-16fd7026" },
      "eu-west-1" : { "AMI" : "ami-24506250" },
      "sa-east-1" : { "AMI" : "ami-3e3be423" },
      "ap-southeast-1" : { "AMI" : "ami-74dda626" },
      "ap-northeast-1" : { "AMI" : "ami-dcfa4edd" }
    }
  },
  "Resources" : {
    "Ec2Instance" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : {
        "KeyName" : { "Ref" : "KeyName" },
        "ImageId" : { "Fn::FindInMap" : [ "RegionMap", { "Ref" : "AWS::Region" }, "AMI" ] },
        "UserData" : { "Fn::Base64" : "80" }
      }
    }
  },
  "Outputs" : {
    "InstanceId" : {
      "Description" : "InstanceId of the newly created EC2 instance",
      "Value" : { "Ref" : "Ec2Instance" }
    },
    "AZ" : {
      "Description" : "Availability Zone of the newly created EC2 instance",
      "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "AvailabilityZone" ] }
    },
    "PublicDNS" : {
      "Description" : "Public DNSName of the newly created EC2 instance",
      "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "PublicDnsName" ] }
    }
  }
}
[The same template, annotated with its sections: HEADERS (AWSTemplateFormatVersion, Description), PARAMETERS, MAPPINGS, RESOURCES, OUTPUTS]
Parameters: provision-time specification (think command-line options)
Mappings: conditionals (think case statements)
Resources
Outputs
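To make the Parameters / Mappings / Resources split concrete, here is a minimal Python sketch of how a template's `Ref` and `Fn::FindInMap` references could be resolved by hand. It is an illustration of the idea only, not how CloudFormation is implemented: the `resolve` helper and the `my-keypair` value are made up for this example, it only handles references to parameters and the `AWS::Region` pseudo parameter, and it ignores references to other resources.

```python
import json

# A trimmed-down copy of the sample template (two regions only).
TEMPLATE = json.loads("""
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {"KeyName": {"Type": "String"}},
  "Mappings": {"RegionMap": {"us-east-1": {"AMI": "ami-7f418316"},
                             "us-west-2": {"AMI": "ami-16fd7026"}}},
  "Resources": {"Ec2Instance": {"Type": "AWS::EC2::Instance",
    "Properties": {
      "KeyName": {"Ref": "KeyName"},
      "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]}}}}
}
""")


def resolve(node, template, params):
    """Recursively replace Ref and Fn::FindInMap with concrete values.

    Hypothetical helper: only parameter / pseudo-parameter Refs are supported.
    """
    if isinstance(node, dict):
        if set(node) == {"Ref"}:
            return params[node["Ref"]]
        if set(node) == {"Fn::FindInMap"}:
            map_name, top_key, second_key = (
                resolve(arg, template, params) for arg in node["Fn::FindInMap"]
            )
            return template["Mappings"][map_name][top_key][second_key]
        return {key: resolve(value, template, params) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, template, params) for value in node]
    return node


# Provision-time inputs: the user-supplied parameter plus the pseudo parameter.
params = {"KeyName": "my-keypair", "AWS::Region": "us-west-2"}
props = resolve(TEMPLATE["Resources"]["Ec2Instance"]["Properties"], TEMPLATE, params)
print(props)
```

The same template picks a different AMI in each region without any change to the Resources section, which is the point of keeping Mappings separate.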
AWS CloudFormation Resources: Almost any AWS service
What’s missing (right now)?
• Amazon Elastic MapReduce (EMR)
• Amazon Simple Workflow Service (SWF)
• Amazon Simple Email Service (SES)
• Amazon Glacier
• Amazon CloudSearch
• Might be small new features from other services not yet incorporated
Let us know what you need and how badly!
AWS CloudFormation Resources – Amazon Elastic Compute Cloud (EC2):
{
  "Type" : "AWS::EC2::Instance",
  "Properties" : {
    "AvailabilityZone" : String,
    "DisableApiTermination" : Boolean,
    "EbsOptimized" : Boolean,
    "IamInstanceProfile" : String,
    "ImageId" : String,
    "InstanceType" : String,
    "KernelId" : String,
    "KeyName" : String,
    "Monitoring" : Boolean,
    "PlacementGroupName" : String,
    "PrivateIpAddress" : String,
    "RamdiskId" : String,
    "SecurityGroupIds" : [ String, ... ],
    "SecurityGroups" : [ String, ... ],
    "SourceDestCheck" : Boolean,
    "SubnetId" : String,
    "Tags" : [ EC2 Tag, ... ],
    "Tenancy" : String,
    "UserData" : String,
    "Volumes" : [ EC2 MountPoint, ... ]
  }
}
AWS CloudFormation /dev/null 2>&1" end
Being a Chef
Defining a role (frontends_role.json):
{
  "name": "Frontends",
  "chef_type": "role",
  "json_class": "Chef::Role",
  "default_attributes": { },
  "description": "Front End Web hosts",
  "run_list": [
    "role[Base]",
    "recipe[apache::frontend]",
    "recipe[syslog::frontend]",
    "recipe[ganglia]"
  ],
  "override_attributes": {
    "ganglia": {
      "config": "/etc/ganglia/gmond.conf",
      "gmetad_name": [ "ganglia.munnsdemo.prv" ],
      "cluster_name": [ "FrontEnds" ]
    }
  }
}
Being a Chef
In recipe:
variables(
  :gmetad_name => node[:ganglia][:gmetad_name],
  :cluster_name => node[:ganglia][:cluster_name]
)
In attributes:
{
  "ganglia": {
    "config": "/etc/ganglia/gmond.conf",
    "gmetad_name": [ "ganglia.munnsdemo.prv" ],
    "cluster_name": [ "FrontEnds" ]
  }
}
In template:
… name = "" …. host = port = 8649 ttl = 1 }
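The attribute-to-template flow above can be sketched outside of Chef. The Python below is a stand-in only, not Chef code: `deep_merge` is a hypothetical helper that mimics role override attributes winning over cookbook defaults, `string.Template` plays the part of an ERB template, and both the cookbook default values and the `gmond.conf` layout are assumptions made up for the illustration.

```python
from string import Template


def deep_merge(base, override):
    """Return a new dict in which values from override win over base, recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical cookbook defaults (not from the slides).
cookbook_defaults = {
    "ganglia": {"config": "/etc/ganglia/gmond.conf", "cluster_name": ["unspecified"]}
}
# Override attributes as defined in the Frontends role.
role_overrides = {
    "ganglia": {"gmetad_name": ["ganglia.munnsdemo.prv"], "cluster_name": ["FrontEnds"]}
}

# The merged result plays the part of the node object a recipe reads from.
node = deep_merge(cookbook_defaults, role_overrides)

# Stand-in for the template; $placeholders act like the variables passed by the recipe.
gmond_template = Template(
    'cluster { name = "$cluster_name" }\n'
    "udp_send_channel { host = $gmetad_name port = 8649 ttl = 1 }"
)
rendered = gmond_template.substitute(
    cluster_name=node["ganglia"]["cluster_name"][0],
    gmetad_name=node["ganglia"]["gmetad_name"][0],
)
print(rendered)
```

The design point is that the recipe and template stay generic, while the role decides which values a given class of host ends up with.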
Working with Chef
Chef main application components: Chef-solo OR Chef-client & Chef-server
Working with Chef
Chef-server + Chef-client options:
• Hosted Chef – “Get instant access to a highly available, dynamically scalable, fully managed and supported automation environment - powered by the experts at Opscode.”
• Private Chef – “All the benefits of Hosted Chef, delivered with 24/7/365 support and implementation consulting, installed as enterprise software behind the corporate firewall.”
• Open Source – You do it all. Not that difficult, community supported, free.
Working with Chef
OR…
AWS OpsWorks
• Integrated application management solution for ops-minded developers and IT admins
• Model, control and automate applications of nearly any scale and complexity
• Management Console, SDKs, or CLI
• No additional cost
AWS OpsWorks
• SIMPLE: Easy to use, quick to get started and productive
• PRODUCTIVE: Reduces errors with conventions and scripted configuration
• FLEXIBLE: Simplifies deployments of any scale and complexity
• POWERFUL: Reduces cost and time with automation
• SECURE: Enables control with fine-grained permissions
AWS OpsWorks
A stack represents the cloud infrastructure and applications that you want to manage together.
A layer defines how to set up and configure a set of instances and related resources.
Decide how to scale: manually, with 24/7 instances, or automatically, with load-based or time-based instances.
Then deploy your app to specific instances and customize the deployment with Chef recipes.
AWS OpsWorks Instance Lifecycle
An agent on each EC2 instance understands a set of commands that are triggered by OpsWorks. The agent then performs a Chef Solo run.
Lifecycle events: Setup, Configure, Deploy, Undeploy, Shutdown
AWS OpsWorks Instance Lifecycle
[State diagram. Instance states: new / stopped, requested, pending, booting, running setup, online, shutting down, terminating. Events: the instance gets a Setup event; the instance gets a Shutdown event; all instances get a Configure event (shown at two transitions).]
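One way to picture the lifecycle: a Setup or Shutdown event goes to the one instance that is changing state, while Configure is broadcast to every instance so they can all pick up the new membership. The Python below is a simplified model of that fan-out as drawn on the slide, not OpsWorks code; the function names and instance names are invented for the example, and Deploy/Undeploy are left out.

```python
def events_when_instance_boots(new_instance, online_instances):
    """Events sent when a new instance comes online (simplified model)."""
    events = [(new_instance, "setup")]
    # Every instance, including the new one, is told to reconfigure.
    for instance in list(online_instances) + [new_instance]:
        events.append((instance, "configure"))
    return events


def events_when_instance_stops(stopping_instance, online_instances):
    """Events sent when an instance leaves the stack (simplified model)."""
    events = [(stopping_instance, "shutdown")]
    # The remaining instances reconfigure so they stop referring to the departed host.
    for instance in online_instances:
        if instance != stopping_instance:
            events.append((instance, "configure"))
    return events


boot_events = events_when_instance_boots("web3", ["web1", "web2"])
stop_events = events_when_instance_stops("web1", ["web1", "web2", "web3"])
print(boot_events)
print(stop_events)
```

This is why Configure recipes are a natural home for things like load balancer or cache node lists that depend on who else is in the stack.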
OpsWorks Agent Communication
Participants: EC2 instance, OpsWorks service, your repo (e.g. GitHub)
The instance connects with the OpsWorks service to send a keep-alive heartbeat and receive lifecycle events.
1. “Deploy App”
2. OpsWorks sends lifecycle event with pointer to configuration JSON (metadata, recipes) in S3 bucket
3. Download configuration JSON
4. Pull recipe and other build assets from your repo
5. Execute recipe with metadata
6. Upload Chef log
7. Report Chef run status
How OpsWorks Bootstraps the EC2 Instance
• Instance is started with IAM role
  UserData passed with instance private key, OpsWorks public key
• Instance downloads and installs OpsWorks agent
• Agent connects to instance service, gets run info
  Authenticate instance using instance’s IAM role
  Pick up configuration JSON from the OpsWorks instance queue
  Decrypt & verify message, run Chef recipes
  Upload Chef log, return Chef run status
• Agent polls instance service for more messages
AWS OpsWorks + Chef
• OpsWorks uses Chef Solo to configure the software on the instance
• OpsWorks provides many Chef Server functions to users:
  Associate cookbooks with instances
  Dynamic metadata that describes each registered node in the infrastructure
  Supports "Push" Command and Control Client Runs
• With Chef 11, broader support for community cookbooks
AWS OpsWorks + Chef
Predefined Layers: ELB, HAProxy, MySQL, Memcached, Ganglia, Ruby on Rails, Node.js, PHP, Static Web Server, Java
AWS OpsWorks + Chef
Differences between OpsWorks & Chef-Server Recipes:
• Environments = Stacks
• Data bags = Custom JSON / S3
• Search = Stack Configuration / Deployment JSON
AWS OpsWorks Chef Recipes github.com/aws/opsworks-cookbooks
AWS OpsWorks + Chef
include_recipe 'deploy'

node[:deploy].each do |application, deploy|
  template "#{deploy[:current_path]}/wp-config.php" do
    source "wp-config.php.erb"
    owner "root"
    group "root"
    mode "0644"
    variables(
      :database => node['wordpress']['db']['database'],
      :user => node['wordpress']['db']['user'],
      :password => node['wordpress']['db']['password'],
      :dbhost => node['wordpress']['dbhost'],
      :lang => node['wordpress']['languages']['lang'],
      :cachenode => node['wordpress']['cachenode']
    )
  end
end
AWS OpsWorks + Chef
Resources to learn more:
• https://aws.amazon.com/opsworks/
• https://aws.amazon.com/documentation/opsworks/
• https://github.com/aws/opsworks-cookbooks
• http://www.opscode.com/
• http://www.opscode.com/chef
• http://wiki.opscode.com/display/chef/Documentation
Lab 2 Deploying our App Using AWS OpsWorks
Lab 2
Three main parts:
• Add RDS to our infrastructure with CloudFormation
• Deploy our WordPress site with OpsWorks
• Modify WordPress, adding ElastiCache Memcached and a caching module to the code, and deploy it with OpsWorks
One hour. Again, ask questions if stumped!
Lab 2
[Architecture diagram: a Virtual Private Cloud in AWS US-West-2 spanning Availability Zones A, B, and C, each with a public and a private VPC subnet and a web instance behind ELB. Also shown: RDS DB instance primary and standby (Multi-AZ), ElastiCache, a NAT instance, a leap/bastion instance, and an Internet Gateway, alongside Amazon CloudWatch, Amazon S3, AWS CloudFormation, and AWS OpsWorks.]
Taking AWS Operations to the Next Level
Lab 2 Recap:
1. Added in host-based configuration management to help with software/configuration management on the host
2. Using AWS CloudFormation and OpsWorks together helps you get from nothing to fully running infrastructure, with lots of ongoing benefits and easier maintainability
3. Keep track of the who/what/when/why of your infrastructure/servers/applications
4. Continue to make use of software revision control/testing techniques for our Chef recipes
5. Using OpsWorks makes deployment/configuration even easier
6. Can start off building this onto your current infrastructure today, no need for greenfield
Lab 2 Recap
[Architecture diagram: the infrastructure as it stands after Lab 2, with web instances behind ELB in Availability Zones A, B, and C, Multi-AZ RDS, ElastiCache, NAT and leap/bastion instances, and an Internet Gateway inside the VPC, alongside Amazon CloudWatch, Amazon S3, AWS CloudFormation, and AWS OpsWorks.]
http://nintendobuddyboards.freeforums.org/sonic-3-and-knuckles-t402.html
So now we have provisioning and configuration down pretty well. But what could be missing?
Taking AWS Operations to the Next Level
• What about resources that aren’t easy to use AWS CloudFormation or Chef on?
• What about responding to failure/incidents?
• What might wake us up at 4am when we’re on call?
• What about SPOFs?
• What other routine things can be automated?
Taking AWS Operations to the Next Level
Some current deficiencies from what we’ve covered today:
• AWS CloudFormation is passive and won’t react to instance issues/failures
  How to re-assign resources like Amazon VPC routes, ENIs, EBS?
• OpsWorks won’t handle out-of-band resources that it currently doesn’t support
• Pools of hosts, clusters, and things that shard are easy (one-offs, not so much)
What’s the missing piece? https://secure.flickr.com/photos/moxievision/470346677/
• AWS Auto Scaling
• Amazon Simple Workflow
• AWS APIs and a little elbow grease
Traffic to our site vs. provisioned capacity, provisioned manually
[Chart: site traffic plotted against manually provisioned capacity; a second version is annotated 76% and 24%.]
Traffic to our site vs. provisioned capacity with Auto Scaling
[Chart: site traffic plotted against provisioned capacity.]
Auto Scaling
• Works with Amazon EC2
• Triggers via Amazon CloudWatch alarms, scheduled actions, or manually
• Works with spot pricing!
• Scale up number of instances of a certain class
• Scale down number of instances of a certain class
OR
Auto Scaling
• You can use Auto Scaling for singular instances that don’t scale up or down: min = 1, max = 1
• Auto Scaling gives you the ability to specify multiple Availability Zones; even if you only need a single host, this gives you multi-AZ failover
• Auto Scaling supports notifications on instance creation/termination. Useful for configuring other resources, bootstrapping, and provisioning
• Auto Scaling is free!
Auto Scaling
Min = 1, Max = 1, Desired = 1
Set this around a single host and get a self-healing pool of 1
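The "pool of 1" trick is easier to see as a reconcile loop: the group keeps comparing what is running against what is desired and launches a replacement when the two differ. The Python below is a toy simulation of that behavior under stated assumptions, not the Auto Scaling service or its API; the `Group` class, the instance ID format, and the zone-picking rule are all invented for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Group:
    """Toy model of a scaling group (hypothetical, for illustration only)."""
    min_size: int
    max_size: int
    desired: int
    zones: list
    instances: dict = field(default_factory=dict)  # instance id -> zone
    _ids: count = field(default_factory=lambda: count(1))

    def reconcile(self):
        """Launch or terminate until the running count matches the target."""
        target = max(self.min_size, min(self.desired, self.max_size))
        events = []
        while len(self.instances) < target:
            instance_id = f"i-{next(self._ids):04d}"
            # Assumed placement rule: pick the zone currently holding the fewest instances.
            zone = min(self.zones, key=lambda z: list(self.instances.values()).count(z))
            self.instances[instance_id] = zone
            events.append(("launch", instance_id, zone))
        while len(self.instances) > target:
            instance_id, zone = self.instances.popitem()
            events.append(("terminate", instance_id, zone))
        return events

    def fail(self, instance_id):
        """Simulate an instance dying outside of our control."""
        del self.instances[instance_id]


nat = Group(min_size=1, max_size=1, desired=1, zones=["us-west-2a", "us-west-2b"])
first = nat.reconcile()   # initial launch
nat.fail("i-0001")        # the single host dies
second = nat.reconcile()  # the group replaces it without anyone being paged
print(first, second)
```

Each launch or terminate event in this loop is exactly the kind of thing the real service can announce through a notification.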
Auto Scaling
What other fun things can we do with Auto Scaling? Event triggers
• Send messages via an Amazon SNS “topic” to a destination such as Amazon SQS, email, SMS, or HTTP(S) POST
• When?
  Instance initialization
  Instance termination
  Errors of either
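When an Auto Scaling notification travels through SNS into SQS, the queue message body is an SNS envelope whose `Message` field is itself a JSON string. The sketch below unpacks one with the standard library only. It assumes raw message delivery is turned off on the subscription; the `parse_autoscaling_notification` helper, the group name, and the instance ID are made up for the example, and only a few of the payload fields are shown.

```python
import json


def parse_autoscaling_notification(sqs_body):
    """Extract the fields we care about from an SNS-wrapped Auto Scaling message."""
    envelope = json.loads(sqs_body)            # outer SNS envelope
    message = json.loads(envelope["Message"])  # inner payload is a JSON string
    return {
        "event": message["Event"],
        "group": message["AutoScalingGroupName"],
        "instance_id": message.get("EC2InstanceId"),
    }


# Hand-built sample body; real notifications carry more fields than this.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "Event": "autoscaling:EC2_INSTANCE_LAUNCH",
        "AutoScalingGroupName": "nat-group",
        "EC2InstanceId": "i-0123456789abcdef0",
    }),
})

info = parse_autoscaling_notification(sample_body)
print(info)
```

The `event` value is what lets downstream automation tell a launch from a termination or an error.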
How does knowing when an instance is created/terminated help us?
From before:
• AWS CloudFormation is passive. Won’t react to instance issues/failures
• OpsWorks won’t handle out-of-band resources that it currently doesn’t support
So... use these Auto Scaling notifications as a trigger to do other things
Things such as?
• Change Amazon VPC routes to point to a new NAT instance
• Restore a backup Amazon EBS data volume snapshot
• Remove hosts from monitoring/alerting
• Interact with any other AWS resources
• Configure something in another piece of software
• Update DNS
• Update third-party tools
• Remove anything manual in getting this host running
New Automation Process
1. Auto Scaling does something: start, stop, scale up, scale down
2. A notification is sent. Where?
   Ideally somewhere scalable, redundant, multihomed, reliable, API-able
   Like Amazon SQS? YES!
Amazon SQS
• One of the first AWS services
• Extremely scalable – potentially millions of messages
• Extremely reliable – multi-AZ built in
• Simple – messages get sent in, messages get pulled out
• Secure – API credentials needed
• Inexpensive – $0.50 per 1,000,000 requests, or $0.0000005 per request
• Free tier: “You can get started with Amazon SQS for free. New and existing customers receive 1 million Amazon SQS queuing Requests for free each month.”
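A quick back-of-the-envelope check on what polling a queue would cost, using only the numbers quoted on the slide (the per-request price and the free tier). Treat those figures as the slide's, not as current pricing; the `monthly_cost` helper and the once-a-minute polling rate are assumptions for the example.

```python
PRICE_PER_MILLION_REQUESTS = 0.50   # USD, as quoted on the slide
FREE_REQUESTS_PER_MONTH = 1_000_000  # free tier, as quoted on the slide


def monthly_cost(requests):
    """Cost in USD for a month's requests after the free tier is applied."""
    billable = max(0, requests - FREE_REQUESTS_PER_MONTH)
    return billable * PRICE_PER_MILLION_REQUESTS / 1_000_000


# One poll per minute for a 30-day month.
polls_per_month = 60 * 24 * 30
print(polls_per_month, monthly_cost(polls_per_month))
```

At 43,200 polls a month, a single polling host stays far inside the free tier, which is why this pattern is described later as almost free.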
New Automation Process
1. Auto Scaling does something: start, stop, scale up, scale down
2. A notification is sent to an Amazon SQS queue
3. Pull message out of Amazon SQS
   With what? Some instance we can use as a management host
New Automation Process
Some instance we can use as a management host:
• That can be set up via AWS OpsWorks
• Run Chef to install scripts/tools to interact with Amazon SQS and other things
• Can be a very lightweight host (t1.micro perhaps)
• Run cron on it to regularly poll Amazon SQS for changes to your infrastructure
• Assign an Amazon EC2 IAM role giving it permissions to just what it needs
New Automation Process
1. Auto Scaling does something: start, stop, scale up, scale down
2. A notification is sent to an Amazon SQS queue
3. Pull message out of Amazon SQS with a management host
4. Do something. Shell scripts??
The problem with doing this as a huge shell script
• Large monolithic scripts lead to bad behaviors
• Hard to handle chained actions with various logic
• Hard to handle failure situations
  What happens if a shell script dies midway?
  What happens if the host running this job dies?
• What if various different systems need to do independent actions?
• Separation of tasks and duties within an infrastructure?
• Handling coordination of multiple flows and intertwining dependencies
• SOA all the things!
We need a common way of reliably coordinating all the steps in our automation
Amazon Simple Workflow (SWF)
• Orchestration tool across your infrastructure
• Use it as a middle layer to pass messages and set up tasks to be completed
• Break down individual tasks into different workers
• You define logic between workers
• Anything that can be scripted can be made into a worker task
• Built-in retries, timeouts, logging
• Low cost, reliability, and scalability built in
YOUR CODE = Deciders & Workers
Amazon SWF
Amazon Simple Workflow (SWF)
Working with SWF – Terms to know:
• Domain -> collection of workflows
• Workflow -> collection of actions
• Action -> task or workflow step
Actors:
• Workflow starters -> start a workflow
• Activity workers -> implement actions
• Deciders -> coordinate workflow actions
Amazon Simple Workflow (SWF)
Other features:
• No more than one delivery (unlike Amazon SQS)
• Uses long polling, which reduces the number of polls without results
• Visibility of task state via API
• Timers, signals, markers, child workflows
• Supports versioning
• Keeps workflow history for a user-specified time
Amazon Simple Workflow (SWF)
Deciders & Workers
• You make these.
• Independent, stateless, decoupled.
• Poll for work.
• Run in parallel to scale.
• Run multiple of each for different purposes.
Amazon Simple Workflow Service (SWF)
EXAMPLE: Making a new NAT host
• Activity: Add an EIP
• Activity: Change Src/Dest Check
• Activity: Change VPC Routing
• Activity: Update monitoring
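The decider/worker split for the NAT example can be sketched in a few lines. This is an in-memory simulation of the pattern only, with no SWF calls: the decider is a stateless function that looks at the history so far and says what to do next, and workers are plain callables. The activity names, the retry limit, and the `run_workflow` loop are assumptions made up for the illustration, and the simulated failure stands in for a transient API error.

```python
NAT_ACTIVITIES = ["add_eip", "disable_src_dest_check", "update_vpc_routes", "update_monitoring"]
MAX_ATTEMPTS = 3  # assumed retry budget per activity


def decide(history):
    """Stateless decider: look at the history so far and return the next decision."""
    completed = {e["activity"] for e in history if e["type"] == "completed"}
    for activity in NAT_ACTIVITIES:
        if activity in completed:
            continue
        attempts = sum(
            1 for e in history if e["type"] == "failed" and e["activity"] == activity
        )
        if attempts >= MAX_ATTEMPTS:
            return ("fail_workflow", activity)
        return ("schedule", activity)
    return ("complete_workflow", None)


def run_workflow(workers):
    """Drive the decider until the workflow completes or fails, recording history."""
    history = []
    while True:
        decision, activity = decide(history)
        if decision != "schedule":
            return decision, history
        try:
            workers[activity]()
            history.append({"type": "completed", "activity": activity})
        except Exception:
            history.append({"type": "failed", "activity": activity})


calls = []
flaky_state = {"failed_once": False}


def update_vpc_routes():
    """Worker that fails on its first attempt to show the retry path."""
    if not flaky_state["failed_once"]:
        flaky_state["failed_once"] = True
        raise RuntimeError("simulated API error")
    calls.append("update_vpc_routes")


workers = {
    "add_eip": lambda: calls.append("add_eip"),
    "disable_src_dest_check": lambda: calls.append("disable_src_dest_check"),
    "update_vpc_routes": update_vpc_routes,
    "update_monitoring": lambda: calls.append("update_monitoring"),
}

outcome, history = run_workflow(workers)
print(outcome, calls)
```

Because the decider rebuilds its view from history on every call, it can die and restart mid-workflow without losing its place, which is the main thing the monolithic shell script could not do.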
New Automation Process
1. Auto Scaling does something: start, stop, scale up, scale down
2. A notification is sent to an Amazon SQS queue
3. Pull message out of Amazon SQS with a management host
4. Kick off Amazon SWF workflow, which takes actions to complete the desired scaling flow
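Steps 3 and 4 on the management host boil down to a small dispatch loop: read each queued notification, map its event type to a workflow, and only treat the message as handled once the workflow has been kicked off. The sketch below uses an in-memory `deque` in place of SQS and a callback in place of the SWF call, so it shows the control flow only. The workflow names and the `drain` helper are hypothetical, and message bodies are assumed to be already unwrapped from the SNS envelope.

```python
import json
from collections import deque

# Hypothetical mapping from Auto Scaling event to the workflow that handles it.
WORKFLOW_FOR_EVENT = {
    "autoscaling:EC2_INSTANCE_LAUNCH": "ConfigureNewInstance",
    "autoscaling:EC2_INSTANCE_TERMINATE": "CleanUpInstance",
}


def drain(queue, start_workflow):
    """Process every queued notification; return the bodies that need a retry."""
    retry = deque()
    while queue:
        body = queue.popleft()
        message = json.loads(body)
        workflow = WORKFLOW_FOR_EVENT.get(message.get("Event"))
        if workflow is None:
            continue  # not an event we act on (for example a test notification)
        try:
            start_workflow(workflow, message["EC2InstanceId"])
        except Exception:
            # Keep the message for the next poll, much as an undeleted queue message would reappear.
            retry.append(body)
    return retry


queue = deque([
    json.dumps({"Event": "autoscaling:EC2_INSTANCE_LAUNCH", "EC2InstanceId": "i-aaa"}),
    json.dumps({"Event": "autoscaling:TEST_NOTIFICATION"}),
    json.dumps({"Event": "autoscaling:EC2_INSTANCE_TERMINATE", "EC2InstanceId": "i-bbb"}),
])
started = []
leftover = drain(queue, lambda workflow, instance_id: started.append((workflow, instance_id)))
print(started, list(leftover))
```

Keeping the dispatcher this thin means all the real logic lives in the workflows, where retries and history are handled for you.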
Amazon Simple Workflow Service (SWF)
Other uses:
• Help infrastructure management
• Backup orchestration
• Dev/test environment refreshes
• Processes within your application
• Response to infrastructure incidents
What else can you think of??
Lab 3 Adding Auto Scaling, Amazon SQS, and Amazon SWF to our infrastructure for total host life cycle automation
Lab 3
Three main parts:
• Set up Amazon SWF
• Set up Amazon SQS/SNS hooks and infrastructure helper host
• Re-launch our NAT instance in Auto Scaling
One hour.
One extra part: Terminate the NAT instance. See it fix itself.
Lab 3
[Architecture diagram: a Virtual Private Cloud in AWS US-West-2 spanning Availability Zones A, B, and C, each with a public and a private VPC subnet and a web instance behind ELB. Also shown: RDS DB instance primary and standby (Multi-AZ), ElastiCache, a NAT instance, a leap/bastion instance, an SWF worker instance, and an Internet Gateway, alongside Amazon CloudWatch, Amazon SNS, Amazon SQS, Amazon S3, Amazon SWF, AWS CloudFormation, AWS OpsWorks, and the EC2 API.]
Taking AWS Operations to the Next Level
Lab 3 Recap:
1. Wrapped hosts in Auto Scaling to make self-healing pools of 1
2. Took notifications from Auto Scaling through Amazon SQS and into Amazon SWF
3. Used Amazon SWF to make out-of-band changes to our infrastructure to complete provisioning/configuration, and recover from incidents
4. All of this automation for low cost (if not almost free)
5. Still mostly set up via AWS CloudFormation + OpsWorks
Lab 3 Recap
[Architecture diagram: the infrastructure as it stands after Lab 3, with web instances behind ELB in Availability Zones A, B, and C, Multi-AZ RDS, ElastiCache, NAT, leap/bastion, and SWF worker instances, and an Internet Gateway inside the VPC, alongside Amazon CloudWatch, Amazon SNS, Amazon SQS, Amazon S3, Amazon SWF, AWS CloudFormation, AWS OpsWorks, and the EC2 API.]
Wrapping It All Up
We’ve covered a whole lot today:
• Treating your infrastructure as code with AWS CloudFormation
• Using Chef via OpsWorks to control the software lifecycle on our instances
• Building a scalable automation framework with Amazon SWF, Amazon SQS, Auto Scaling, etc.
Wrapping It All Up
[Architecture diagram, shown in four build steps: the complete infrastructure in a Virtual Private Cloud in AWS US-West-2, with web instances behind ELB in Availability Zones A, B, and C, Multi-AZ RDS, ElastiCache, NAT, leap/bastion, and SWF worker instances, and an Internet Gateway, alongside Amazon CloudWatch, Amazon SNS, Amazon SQS, Amazon S3, Amazon SWF, AWS CloudFormation, AWS OpsWorks, and the EC2 API.]
Wrapping It All Up
Where to go from here?
• None of these things require greenfield; you can add them in pieces and small bits to what you do today on AWS
• Start using software revision control for your infrastructure and server configurations ASAP
• Start thinking about what manual things you do regularly that can be broken up and automated
Wrapping It All Up
Where to go from here?
• Documentation! https://aws.amazon.com/documentation/ (reads like a romance novel, I swear!)
• Free tier to play! https://aws.amazon.com/free/
• Ask us!!! Talk with your account reps, SAs, TAMs, etc.
Taking AWS Operations To the Next Level