AWS Loft Presentation-HERE CCI

Peter Caron, HERE | AWS Loft | November 16, 2016

Migrating and Running Continuous Integration Systems at Scale in AWS

Peter Caron AWS Loft, München | November 16, 2016

Agenda
1. Transition to Cloud: Lift and Shift; Crawl, Walk, Run, Sprint
2. CI / CD Overview
3. Challenges

HERE’s Business

HERE is one of the world’s leading map data companies and is now able to deliver the next generation of mobility and location-based services.

HERE Products

HERE software serves map, traffic and location data to a variety of target platforms:
• HERE Open Location Platform
• Embedded Automobile Navigation
• Enterprise Extensions
• Mobile Apps

HERE’s Challenge

HERE needed a CI system that could handle complex, heterogeneous deployments and releases, and that could scale.

Before the Cloud: Jenkins in the Data Centre

[Diagram labels: Region #1, Region #2, Region #x, Jenkins Under the Desk, Homemade Build Tools]

Build Systems
• ~5+ Builds per month
• 40T+ Tests cycle

CD Platform Pipelines
• 17 Services
• 2 Unique pipelines
• 1000+ VMs on VMware
• 1 ish Deployments/month
• 100s Acceptance tests/month
• ~10 Build runs/month

AWS Services in Production

[Diagram labels: Amazon VPC, Jenkins Master EC2 instances, security groups, Amazon EFS, Amazon S3, Amazon CloudWatch, Spot Instances, AWS Device Farm, Region #1, Region #2]

Common CI Systems – CCI (Jenkins / ElectricFlow)
• 110K+ Builds per day
• 25M+ Tests per day

CI for Micro-services – JaaS (Jenkins as a Service)
• 130 Products and services

CD Platform Pipelines (go as a Service)
• 36 Services
• 668 Unique pipelines
• 600+ VMs on AWS
• 40+ Deployments/month
• 100s Acceptance tests/day
• 1400+ Build runs/month

What kind(s) of integration and testing to use?
• Jenkins: Unit testing
• go: integration testing
• go: deployment
• Real Device testing
• Mesos / Marathon: deployment orchestration

[Diagram labels: Real-time Data Services, Static Data Services, Micro Services, Customer Integration, Embedded and Downloaded Applications]

Transition to Cloud: Moving CI workstreams to AWS

Moving our CI / CD infrastructure to AWS …
• Git
• Gerrit
• Jenkins
• Go
• Splunk

It was a simple lift and shift from our local infrastructure

… and everything worked well from Day 1

Uh, not exactly!

Plan to Grow



• Get your Workflow right, i.e. get your CI act together first
• Know your Capacity and Limits
• Focus on Testing
• Set Expectations Internally
• Know your Fallback options
• Monitor changes (costs)

Create a Culture Change
1. Start small, iterate
   • A single developer group before your flagship product
2. Understand your changes
   • There is infrastructure outside the control of your developer. Don’t let it become Expensive Hosting 2.0
3. Infrastructure as Code is not just a buzz word
   • Apply it if you have one or more people using CI
4. Measure Results and Adapt WoW (ways of working)
   • Only react to verifiable metrics

What did we learn?

• Don’t trust the plugins
• Capacity is always underestimated
• Costs will be high
• Plan fallback
• Trust the developers – just enough
• Moving to the Cloud will help nothing
• People will use it

… and what could we have done better?

Do Continuous Integration: Moving CI ways of working to a Cloud

Client Pipelines

[Pipeline diagram, rows from top to bottom, with the timing annotations as they appear next to each row]
• Full Verification, E2E (Manual Tests), Release candidate (Runs every 3 hrs, Duration: 3 hrs; Runs every day ?, Duration: ?)
• Baseline Sanity Tests (Runs on each successful SV, Duration < 20 min)
• Submit Verification (Every 5 min, Duration < 20 min)
• Mainline
• Pre-submit Verification

Service Pipelines

[Pipeline diagram, rows from top to bottom]
• Artifact
• Full Verification
• Baseline Sanity Tests
• Submit Verification
• Pre-submit Verification

Challenges and Lessons Learned: Maintaining CI ways of working in a Cloud

Our biggest challenge: Handling the Loads

[Chart: Common CI Runs 2015-2016; labelled values 81,283 and 2,499,109]

Transparency: Measurement and Monitoring

Dashboards provide visibility

What is the system doing?

Transparency: Measure it, Use it

What slowed down this build?

Know what your system is doing!
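One way to approach "What slowed down this build?" is to record per-stage durations and compare a slow run against a known-good baseline. The sketch below is only an illustration of that idea, not the dashboard tooling shown in the talk; the stage names, the timings and the `slowest_regressions` helper are all hypothetical.

```python
from typing import Dict, List, Tuple


def slowest_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    top: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to `top` stages whose duration grew the most, in seconds."""
    deltas = [
        (stage, current[stage] - baseline.get(stage, 0.0))
        for stage in current
    ]
    # Keep only stages that actually got slower, largest increase first.
    slower = [d for d in deltas if d[1] > 0]
    slower.sort(key=lambda d: d[1], reverse=True)
    return slower[:top]


# Hypothetical per-stage timings (seconds) for two runs of the same job.
baseline_run = {"checkout": 40.0, "compile": 300.0, "unit-tests": 200.0}
slow_run = {"checkout": 45.0, "compile": 310.0, "unit-tests": 520.0}

report = slowest_regressions(baseline_run, slow_run)
for stage, delta in report:
    print(f"{stage}: +{delta:.0f}s")
```

Sorting by absolute increase rather than by percentage keeps the focus on the stages that cost the most wall-clock time.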

Other Challenges: EC2 Instance types and plugins
• Build Rotator
• Fluentd
• BFA Plugin
• Hierarchy Killer Plugin
• Timestamper Plugin

Special Challenges (Peaks and Valleys)
• CO2 (Choose your AWS region first)
• Performance (Watch your Queues)
• Use Containers (Duh!)
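"Watch your Queues" can be made concrete by polling the build queue that Jenkins exposes through its remote API at `/queue/api/json`. The following is a rough sketch under stated assumptions: the `summarize_queue` helper and the sample data are hypothetical, authentication is omitted, and `fetch_queue` is shown only for context and is not called here.

```python
import json
import time
import urllib.request
from typing import Any, Dict, Optional


def fetch_queue(base_url: str) -> Dict[str, Any]:
    """Fetch the build queue from a Jenkins master (not called below)."""
    with urllib.request.urlopen(f"{base_url}/queue/api/json") as resp:
        return json.load(resp)


def summarize_queue(
    queue: Dict[str, Any], now_ms: Optional[int] = None
) -> Dict[str, Any]:
    """Summarize queue length, longest wait (seconds) and stuck items."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    items = queue.get("items", [])
    # inQueueSince is a millisecond timestamp on each queue item.
    waits = [(now_ms - item["inQueueSince"]) / 1000.0 for item in items]
    return {
        "length": len(items),
        "max_wait_s": max(waits, default=0.0),
        "stuck": sum(1 for item in items if item.get("stuck")),
    }


# Hand-written sample shaped like a queue response; the values are made up.
sample = {
    "items": [
        {"inQueueSince": 1_000_000, "stuck": False},
        {"inQueueSince": 1_240_000, "stuck": True},
    ]
}
summary = summarize_queue(sample, now_ms=1_300_000)
print(summary)
```

Feeding a summary like this into a dashboard at a fixed interval makes peaks and valleys visible before developers start to notice slow feedback.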

The Advantages of CI in the Cloud
• Security in our infrastructure
• Stability: automated tests run reliably in a consistent infrastructure
• Rapid scaling: slaves come online fast
• Cost control: slaves go off-line fast
• Common AWS tools are known to Engineers
• High availability: Master servers are always available
• Multi-regional presence reduces latency
• Parallel builds and testing reduce time and costs
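The "rapid scaling" and "cost control" points both come down to sizing the pool of build slaves to current demand. A minimal sketch of such a sizing rule follows; it is an assumption made for illustration, not the scaling policy used at HERE, and the function name, the executors-per-slave figure and the limits are hypothetical.

```python
import math


def desired_slaves(
    queued_builds: int,
    busy_executors: int,
    executors_per_slave: int = 2,
    min_slaves: int = 0,
    max_slaves: int = 50,
) -> int:
    """Return how many build slaves should be running for the current load."""
    # Demand is everything already running plus everything still waiting.
    demand = queued_builds + busy_executors
    wanted = math.ceil(demand / executors_per_slave)
    # Clamp so the pool never drops below a floor or exceeds a cost ceiling.
    return max(min_slaves, min(max_slaves, wanted))


print(desired_slaves(queued_builds=10, busy_executors=6))
print(desired_slaves(queued_builds=0, busy_executors=0))
print(desired_slaves(queued_builds=500, busy_executors=0))
```

The upper bound doubles as a cost guard: a runaway queue cannot start an unbounded number of instances.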

Load and Scalability: Number of Build Runs per day

Speed and Predictability: Mean Duration of pre-commit validation runs

Final Thought

Avoid creating a big pile of poo!

Questions?

Thank you

Contact
Peter Caron
Service Automation and Continuous Integration
HERE
Invalidenstrasse 116
10115 Berlin


Plugins

BuildRotator Plugin
Version affected: ---
Issue: The LogRotator that comes with Jenkins tries to be much smarter than needed, so it loads the entire job history at least twice to work out what can be removed and what cannot.
Action: Update plugin. Replace "LogRotator" with "BuildRotator" as the build discard mechanism, everywhere.
Download: BuildRotator.hpi

Fluentd
Version affected: ---
Issue: Send data to Fluentd
Action: Install and enjoy.
Download: fluentd.hpi

BFA Plugin
Version affected: 1.13.0 and earlier
Issue: When there is a huge number of aborted builds, BFA needs to process all of them, which creates a queue and slows down Jenkins/feedback itself.
Action: Build failure analyzer => Advanced => "Ignore aborted builds" option should be enabled in Jenkins configuration.
Download: Available in Jenkins
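The effect of the "Ignore aborted builds" option can be pictured as a filter in front of the analysis queue. The sketch below models that effect only; it is not the Build Failure Analyzer's code, and the `builds_to_analyze` helper and the sample build records are hypothetical. The result strings follow the Jenkins build results (`SUCCESS`, `UNSTABLE`, `FAILURE`, `ABORTED`).

```python
from typing import Dict, Iterable, List


def builds_to_analyze(
    builds: Iterable[Dict[str, str]], ignore_aborted: bool = True
) -> List[Dict[str, str]]:
    """Select the non-successful builds that should go through analysis."""
    selected = []
    for build in builds:
        result = build["result"]
        if result == "SUCCESS":
            continue  # nothing to analyze
        if ignore_aborted and result == "ABORTED":
            continue  # skipped when the option is enabled
        selected.append(build)
    return selected


# Hypothetical build records; a burst of aborted builds dominates the list.
recent = [
    {"id": "101", "result": "FAILURE"},
    {"id": "102", "result": "ABORTED"},
    {"id": "103", "result": "ABORTED"},
    {"id": "104", "result": "SUCCESS"},
    {"id": "105", "result": "UNSTABLE"},
]

print(len(builds_to_analyze(recent, ignore_aborted=False)))
print(len(builds_to_analyze(recent, ignore_aborted=True)))
```

With many aborted builds in the history, skipping them up front removes most of the analysis work rather than spreading it out.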

Plugins (continued)

HierarchyKillerPlugin
Version affected: 0.98 and earlier
Issue: When the plugin goes to kill an item from the queue, it kills the first job in the queue instead of the job that was connected to the upstream build. FIX: The correct API call is now used.
Action: Update plugin and have fun.
Download: build-hierarchy-killer.hpi

Timestamper Plugin
Version affected: 1.8.4 and earlier
Issue: Even though Jenkins needs only the last 150 KB, the plugin reads the entire log (because of the # of users we have up to 3 GB) to calculate timestamps for the last X lines. The main problem is that the plugin stores timestamps in an encoded format, VarInt. FIX: Read only the last 150 KB of logs for finished builds.
Action: Update plugin and enjoy.
Download: Available in Jenkins
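The Timestamper fix ("read only the last 150 KB") relies on seeking to the end of the log instead of scanning it from the start. The sketch below illustrates that technique on a plain file. It is not the plugin's implementation, which is written in Java and also has to deal with the VarInt-encoded timestamps; the `read_log_tail` helper and the demo file are hypothetical.

```python
import os
import tempfile

TAIL_BYTES = 150 * 1024  # the "last 150 KB" figure from the Timestamper entry


def read_log_tail(path: str, max_bytes: int = TAIL_BYTES) -> bytes:
    """Return at most the last `max_bytes` bytes of a file."""
    size = os.path.getsize(path)
    with open(path, "rb") as log:
        # Jump straight to the tail instead of reading from the start.
        log.seek(max(0, size - max_bytes))
        return log.read()


# Demonstrate on a throwaway 1 MB file standing in for a large build log.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * (1024 * 1024 - 4) + b"DONE")
    demo_path = tmp.name

tail = read_log_tail(demo_path)
os.remove(demo_path)
print(len(tail), tail[-4:])
```

The cost of the read now depends on the tail size, not on the size of the log, which is what matters when a single log can reach several gigabytes.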

Contributions

Plugins
• S3
• BFA
• DSL
• EC2
• Unit
• Gerrit
• ccache

Core
• Jenkins
• XML library
