Presentation: Lessons Learned from Reviewing 150 Infrastructures

Track: Kubernetes and Cloud Architectures

Location: Churchill, G flr.

Duration: 10:35am - 11:25am

Day of week: Wednesday

What You’ll Learn

  1. Hear about Amazon’s Well-Architected framework.
  2. Pick up cloud infrastructure tips drawn from reviews of 150 platforms.


Since April 2018 we've had the opportunity to perform a structured review of the architectural and operational choices of 150 platform teams. In this talk I'll explore some themes, talk about common mistakes, and give some advice on how to avoid these yourselves. The review tool we use is part of the AWS Well-Architected program, but this session is relevant whether or not you're an AWS user.


Please introduce yourself and tell us what you are working on today.


I’m Jon Topper, founder and CTO at The Scale Factory. We’re a cloud infrastructure consultancy based in London, UK. We work with clients of all sizes, across a range of market sectors. We’re an Amazon Web Services Advanced Consulting Partner, and in the last year we’ve done a lot of work with an AWS program called Well-Architected. AWS have shared with us the review framework used by their own Solutions Architects when they engage with customers. This tool lets us go out and discover how our clients are using the cloud, how they’re thinking about security, cost, availability, performance, and operations. We joined the program in April 2018 and since then we’ve had the opportunity to review about 150 platforms. We’ve learned a lot about how people are using the cloud, and what things they get wrong most frequently.


Is the goal of the talk to share these lessons learned?


Yes, that’s right. Being able to look at this many different infrastructures is a fairly unique perspective, and my theory is that the trends we’ve discovered probably speak to how the wider industry is thinking about building their cloud platforms.


Can you give us a sneak preview of the most common mistake you encounter?


For the majority of teams we talk to, their weakest area is the pillar of the framework called “Operational Excellence”. This is about how teams make operational decisions, how they share information through runbooks and playbooks, and how they go about solving problems when things aren’t working properly. Most teams we’ve reviewed seem to do a bad job of this in some way - either by not thinking adequately about how to monitor their platforms, or by failing to think about or design for common failure modes.


Can you explain in a little more detail what Well-Architected provides?


Well-Architected has two main areas. It's a set of white papers and guidance on how to build infrastructure on AWS. It's also a review tool that's in the Amazon console. If you’re an Amazon user, you can go and use it today. It asks around 60 to 70 questions about how you’re using the platform and then uses your answers to score you and make recommendations about what you should be looking at next.


It's focused on AWS as a platform, right?


Yes. But the lessons we've learned are broadly applicable. I think it's probably the case that people on Google Cloud, Azure, and other platforms are making similar errors. But the Well-Architected framework is very much an Amazon tool.


What do you want people to leave the talk with?


When we run reviews with customers, often they’re thinking about some of these architectural considerations for the very first time. I’m hoping that the audience leaving my talk will also leave with that sort of new perspective. Hopefully they’ll have a few things that they’ll take away and look at in more detail, which will help them avoid some of the common operational or security mistakes we see regularly.


Cost efficiency is also very important. I remember that setting up my first DynamoDB was very expensive. I could have benefited from the framework.


The review framework has a whole pillar on Cost Optimisation, and a lot of this is about planning and governance. This is most relevant for bigger businesses who have a lot of different workloads. Smaller businesses and startups are less worried about cost because they understand that the cloud is giving them an opportunity to move quicker. In the early days they’re not too worried about spending, because they know they can take care of that later, and that’s a reasonable business decision to make.

Speaker: Jon Topper

CTO / CEO @scalefactory

Jon Topper runs The Scale Factory, a team of cloud infrastructure and DevOps experts based in London, UK. He's worked on infrastructure problems for Fortune 500 companies, and startups, across a range of market sectors.
