You are viewing content from a past/completed QCon

Workshop: Building Reliable Systems Workshop

Location: St James, 4th flr.

Duration: 9:00am - 4:00pm

Day of week: Friday

Level: Intermediate

Key Takeaways

  • How to successfully apply the best parts of Site Reliability Engineering to your organisation.

  • How to Design for Failure and incorporate Observability into your systems.

  • How to Engineer for Resilience by enabling Learning Loops, Blameless Post-mortems and Chaos Engineering.


No prerequisites are required to get full value out of this course. The samples and practical examples use the Chaos Toolkit and Platform and run against a system built on Kubernetes with various service implementations, but no prior knowledge of these technologies is expected.

Users want reliability. Your business wants speed and agility. You need to invest in resilience, and this is the best workshop to get you rolling.
Teaching patterns, practices and hard-won lessons from the trenches, this workshop takes you through how to bring together Site Reliability Engineering, Designing for Failure, Observability, Engineering Resilience and Chaos Engineering.
This workshop gives you the patterns, practices and tools to enable your own organisation's Resilience Engineering capability, helping you build systems that are reliable and evolve at speed.
This course is for you if you are:
  • A software developer with a traditional background and you need to start taking responsibility for your code in production.
  • A site reliability engineer (SRE) with a little experience of managing production and you need to be proactive about finding system weaknesses before your customers do.
  • A system administrator who is responsible for the availability of production and you need a proactive technique for surfacing system weaknesses before your customers experience them.
  • A product owner who is responsible for delivering a business-critical product or service and you need to know how to gain trust and confidence in your system’s reliability.

Speaker: Russell Miles

CEO of @chaosiqio

Russ Miles is CEO of @chaosiqio, where he and his team build commercial and open source products and provide services to companies applying Chaos Engineering to build confidence in the resilience of their production systems.

Russ is an international speaker, trainer and author. Most recently he has been writing the handbook for Chaos Engineering for O'Reilly, and he has published "Antifragile Software: Building Adaptable Software with Microservices", where he explores how to apply Chaos Engineering to construct and manage complex, distributed systems in production with confidence. He also delivers public and private courses on Chaos Engineering and Resilience Engineering around the world and online for O'Reilly Media.
