You are viewing content from a past/completed QCon

Workshop: [SOLD OUT] Building Reliable Systems Workshop

Location: St James, 4th flr.

Duration: 9:00am - 4:00pm

Day of week: Friday

Level: Intermediate

Key Takeaways

  • How to successfully apply the best parts of Site Reliability Engineering to your organisation.

  • How to Design for Failure and Incorporate Observability into your systems.

  • How to Engineer for Resilience through enabling Learning Loops, Blameless Post-mortems and Chaos Engineering.


No prerequisites are required to get full value out of this course. The samples and practical examples use the Chaos Toolkit and Platform and run against a system built on Kubernetes with various service implementations, but no prior knowledge of these technologies is expected.

Users want reliability. Your business wants speed and agility. You need to invest in resilience, and this is the best workshop to get you rolling.

Drawing on patterns, practices and hard-won lessons from the trenches, this workshop shows you how to bring together Site Reliability Engineering, Designing for Failure, Observability, Engineering Resilience and Chaos Engineering.

It gives you the tools to enable your own organisation's Resilience Engineering capability, helping you build systems that are reliable and evolve at speed.

This course is for you if you are:
  • A software developer with a traditional background who needs to start taking responsibility for your code in production.
  • A site reliability engineer (SRE) with a little experience of managing production who needs to be proactive about finding system weaknesses before your customers do.
  • A system administrator who is responsible for the availability of production and needs a proactive technique for surfacing system weaknesses before your customers experience them.
  • A product owner who is responsible for delivering a business-critical product or service and needs to know how to gain trust and confidence in your system’s reliability.

Speaker: Russell Miles

CEO of @chaosiqio

Russ Miles is CEO of @chaosiqio, where he and his team build commercial and open source products and provide services to companies applying Chaos Engineering to build confidence in the resilience of their production systems.

Russ is an international speaker, trainer and author. Most recently he has been writing the handbook for Chaos Engineering for O'Reilly, and he has published "Antifragile Software: Building Adaptable Software with Microservices", in which he explores how to apply Chaos Engineering to construct and manage complex, distributed systems in production with confidence. He also delivers public and private courses on Chaos Engineering and Resilience Engineering around the world and online for O'Reilly Media.


  • Architectures You've Always Wondered About

    Hard-earned lessons from the names you know on scalability, reliability, security, and performance.

  • Machine Learning: The Latest Innovations

    AI and machine learning are more approachable than ever. Discover how ML, deep learning, and other modern approaches are being used in practice.

  • Kubernetes and Cloud Architectures

    Learn about cloud native architectural approaches from the leading industry experts who have operated Kubernetes and FaaS at scale, and explore the associated modern DevOps practices.

  • Evolving Java

    JVM futures, JIT directions and improvements to the runtime stack are the themes of this year’s JVM track.

  • Next Generation Microservices: Building Distributed Systems the Right Way

    Microservice-based applications are everywhere, but well-built distributed systems are not so common. Early adopters of microservices share their insights on how to design systems the right way.

  • Chaos and Resilience: Architecting for Success

    Making systems resilient involves people and tech. Learn about strategies being used, from cognitive systems engineering to chaos engineering.

  • The Future of the API: REST, gRPC, GraphQL and More

    The humble web-based API is evolving. This track provides the what, how, and why of future APIs.

  • Streaming Data Architectures

    Today's systems process huge volumes of continuously changing data. Hear how the innovators in this space are designing systems and leveraging modern data stream processing platforms.

  • Modern Compilation Targets

    Learn about the innovation happening in the compilation target space. WebAssembly is only the tip of the iceberg.

  • Modern CS in the Real World

    Head back to academia to solve today's problems in software engineering.

  • Bare Knuckle Performance

    Crushing latency and getting the most out of your hardware.

  • Leading Distributed Teams

    Remote and distributed working are increasing in popularity, but many organisations underestimate the leadership challenges. Learn from those who are doing this effectively.

  • Driving Full Cycle Engineering Teams at Every Level

    "Full cycle developers" is not just another catch phrase; it's about engineers taking ownership and delivering value, and doing so with the support of their entire organisation. Learn more from the pioneers.

  • JavaScript: Pushing the Client Beyond the Browser

    JavaScript is not just the language of the web. Join this track to learn how the innovators are pushing the boundaries of this classic language and ecosystem.

  • When Things Go Wrong: GDPR, Ethics, & Politics

    Privacy, confidentiality, safety and security: learning from the frontlines, from both good and bad experiences.

  • Growing Unicorns in the EU: Building, Leading and Scaling Financial Tech Start Ups

    Learn how EU FinTech innovators have designed, built, and led both their technologies and organisations.

  • Building High Performing Teams

    There are many discussions outlining the secret sauce of high-performing teams. Learn how to balance the essential ingredients of high performing teams such as trust and delegation, as well as recognising the pitfalls and problems that will ruin any recipe.

  • Scaling Security, from Device to Cloud

    Implementing effective security is vitally important, regardless of where you are deploying software applications.