Abstract
Small things going wrong can quickly snowball. A cascading failure is a nightmare scenario for any system. An initial problem that seems minor in isolation can kick off a chain reaction of ever-increasing failures, potentially leading to catastrophic results.
When the failure of a single component results in the failure of other connected elements, this is known as a progressive collapse. In this talk, Sam Newman looks at this phenomenon in more detail and examines how it has manifested in major disasters. Based on lessons learned from other industries, Sam will share three key techniques that can be used to mitigate progressive collapse in your own system.
This talk will help you understand how to architect your systems in such a way that small failures stay small.
Interview:
What is your session about, and why is it important for senior software developers?
My session explores what happens when a small initial problem causes a giant catastrophe. In the context of buildings, this is called progressive collapse. In my talk, I look at what happens when a building suffers a progressive collapse, how these collapses can be mitigated, and what parallels we can draw to deal with the cascading failures we see in distributed systems.
Why is it critical for software leaders to focus on this topic right now, as we head into 2026?
My session is about how disparate parts of a system interact, especially in the context of increasingly distributed systems. How we write code may have changed a lot over the last couple of years, but the fundamentals of system design and the challenges of distributed systems still remain.
What are the common challenges developers and architects face in this area?
- When something goes wrong, they tend to look for one obvious cause, blame that and move on, without looking at wider systemic issues
- Too much focus on stopping things breaking, and not enough time spent on understanding how the system can continue to work when something does break
What's one thing you hope attendees will implement immediately after your talk?
Stop looking for single causes of failure!
What makes QCon stand out as a conference for senior software professionals?
The curated tracks are what help QCon stand apart. It means you get far fewer clashes between tracks, and it also means that each individual track ends up having something for everyone.
Speaker
Sam Newman
Microservice, Cloud, CI/CD Expert, Author of "Building Microservices" and "Monolith to Microservices", 20+ Years Experience as a Developer
Sam Newman is an independent consultant who loves solving problems with technology. Focusing primarily on cloud, microservice architecture, and continuous delivery, Sam works with companies big and small all over the world. He is also an experienced conference speaker and the author of the O’Reilly books Monolith to Microservices, Building Microservices, and the forthcoming Building Resilient Distributed Systems.