Presentation: Lessons Learned from Reviewing 150 Infrastructures

Track: Kubernetes and Cloud Architectures

Location: Churchill, G flr.

Duration: 10:35am - 11:25am

Day of week: Wednesday

What You’ll Learn

  1. Hear about Amazon’s Well-Architected framework.
  2. Pick up some cloud infrastructure tips.


Since April 2018 we've had the opportunity to perform a structured review of the architectural and operational choices of 150 platform teams. In this talk I'll explore some themes, talk about common mistakes, and give some advice on how to avoid them yourselves. The review tool we use is part of the AWS Well-Architected program, but this session is relevant whether or not you're an AWS user.


Please introduce yourself and tell us about the work you are doing today.


I’m Jon Topper, founder and CTO at The Scale Factory. We’re a cloud infrastructure consultancy based in London, UK. We work with clients of all sizes, across a range of market sectors. We’re an Amazon Web Services Advanced Consulting Partner, and in the last year we’ve done a lot of work with an AWS program called Well-Architected. AWS have shared with us the review framework used by their own Solutions Architects when they engage with customers. This tool lets us go out and discover how our clients are using the cloud, how they’re thinking about security, cost, availability, performance, and operations. We joined the program in April 2018 and since then we’ve had the opportunity to review about 150 platforms. We’ve learned a lot about how people are using the cloud, and what things they get wrong most frequently.


Is the goal of the talk to share these lessons learned?


Yes, that’s right. Looking at this many different infrastructures gives us a fairly unusual perspective, and my theory is that the trends we’ve discovered probably speak to how the wider industry is thinking about building its cloud platforms.


Can you give us a sneak preview of the most common mistake that you encounter?


For the majority of teams we talk to, the weakest area is the pillar of the framework called “Operational Excellence”. This is about how teams make operational decisions, how they share information through runbooks and playbooks, and how they go about solving problems when things aren’t working properly. Most teams we’ve reviewed seem to do a bad job of this in some way, either by not thinking adequately about how to monitor their platforms, or by failing to think about or design for common failure modes.


Can you explain in a little more detail what Well-Architected provides?


Well-Architected has two main areas. It's a set of white papers and guidance on how to build infrastructure on AWS. It's also a review tool that's in the Amazon console. If you’re an Amazon user, you can go and use it today. It asks around 60 to 70 questions about how you’re using the platform and then uses your answers to score you and make recommendations about what you should be looking at next.


It's focused on AWS as a platform, right?


Yes. But the lessons we've learned are broadly applicable. I think it's probably the case that people on Google Cloud, Azure, and other platforms are making similar errors. But the Well-Architected framework is very much an Amazon tool.


What do you want people to leave the talk with?


When we run reviews with customers, often they’re thinking about some of these architectural considerations for the very first time. I’m hoping that the audience leaving my talk will also leave with that sort of new perspective. Hopefully they’ll have a few things that they’ll take away and look at in more detail, which will help them avoid some of the common operational or security mistakes we see regularly.


Cost efficiency is also very important. I remember setting up my first DynamoDB table, and it was very expensive. I could have benefited from the framework.


The review framework has a whole pillar on Cost Optimisation, and a lot of this is about planning and governance. This is most relevant for bigger businesses who have a lot of different workloads. Smaller businesses and startups are less worried about cost because they understand that the cloud is giving them an opportunity to move quicker. In the early days they’re not too worried about spending, because they know they can take care of that later, and that’s a reasonable business decision to make.

Speaker: Jon Topper

CTO / CEO @scalefactory

Jon Topper runs The Scale Factory, a team of cloud infrastructure and DevOps experts based in London, UK. He's worked on infrastructure problems for Fortune 500 companies, and startups, across a range of market sectors.
