John Allspaw, Engineering Culture Hacker
Biography: John Allspaw
Building and operating highly available and resilient systems requires an awareness of fault tolerance, anomaly detection, and anticipation patterns. These patterns concern the handling of emergent behaviors, not just component reliability. This "systems thinking" approach isn't entirely intuitive, and can take some effort to build into your working assumptions. In this talk we will explore these patterns, look at real-world examples, and discuss ways to further inform architecture design with the fundamentals of resilience engineering.
Presentation: Resilient Response In Complex Systems
Complex systems fail, and they don't always fail in expected ways. Recovering from, learning from, and anticipating failure in complex systems requires the efficient cooperation of response teams in sometimes disorienting and escalating scenarios. There are a number of pitfalls that engineers can fall into while troubleshooting production systems under these conditions, but there are also ways to side-step them gracefully. This talk will cover both the pitfalls and the ways around them, and will compare and contrast web operations at scale with the practices and culture of High Reliability Organizations in fields such as aviation and nuclear power.