As recent service disruptions and cloud outages have reminded us all, resilience is a critical property of software systems.
But how should we approach building it into our architectures?
From understanding progressive collapse and avoiding cascading failures to finding resilience bugs in systems that don't exist, this track will give attendees concrete steps for building more resilient systems in an increasingly distributed world.
From this track
How to Find Resilience Bugs in Systems that Don't Exist
Wednesday Mar 18 / 10:35AM GMT
Building correct distributed systems takes thinking outside the box, and the fastest way to do that is to think inside a different box. One different box is "formal methods", the discipline of mathematically verifying software and systems.
Hillel Wayne
Author of "Logic for Programmers" and "Learn TLA+"
Spritely: Infrastructure for the Future of the Internet
Wednesday Mar 18 / 11:45AM GMT
Let's take back the internet! Learn about Spritely's work to re-decentralize the net with new foundational technologies that put users in control.
Christine Lemmer-Webber
Executive Director @Spritely Institute, Co-Author of ActivityPub
David Thompson
CTO @Spritely Institute
Understanding Progressive Collapse: How To Avoid A Cascading Failure
Wednesday Mar 18 / 01:35PM GMT
Small things going wrong can quickly snowball. The cascading failure is a nightmare scenario for any system: an initial problem that seems minor in isolation can kick off a chain reaction of ever-increasing failures, potentially leading to catastrophic results.
Sam Newman
Microservice, Cloud, CI/CD Expert, Author of "Building Microservices" and "Monolith to Microservices", 20+ Years Experience as a Developer
Keeping the Nation On-Air: How We Think About Resilience at the BBC
Wednesday Mar 18 / 02:45PM GMT
At the heart of the BBC is delivering value to all, serving audiences across the UK and the world on TV, radio, and online with trusted and impartial news and high-quality British content.
Tom Everest
Head of Department for Architecture and Supply Chain @BBC
Shielding the Core: Architecting Resilience with Multi-Layer Defenses
Wednesday Mar 18 / 03:55PM GMT
High-demand events can cause sudden traffic spikes that overwhelm even well-designed systems. In ticketing platforms, millions of users — alongside increasingly sophisticated automated agents — may arrive simultaneously, placing extreme pressure on backend services.
Anderson Parra
Staff Software Engineer @SeatGeek