QCon is a practitioner-driven conference designed for technical team leads, architects, and project managers who influence software innovation in their teams.

John Allspaw, Engineering Culture Hacker

Biography: John Allspaw

John has worked in systems operations for over fourteen years in biotech, government, and online media. He started out tuning parallel clusters running vehicle crash simulations for the U.S. government, and then moved on to the Internet in 1997. He built the backing infrastructure at Salon, InfoWorld, Friendster, and Flickr. He is now SVP of Tech Operations at Etsy, and is the author of The Art of Capacity Planning and Web Operations, both published by O'Reilly.

Presentation: Fault tolerance, anomaly detection, and anticipation patterns

Track: Highly-available systems / Time: Wednesday 13:50 - 14:50 / Location: Fleming

Building and operating highly available and resilient systems requires being aware of fault tolerance, anomaly detection, and anticipation patterns. These patterns concern the handling of emergent behaviors, not just component reliability. This "systems thinking" approach isn't entirely intuitive, and can take some effort to build into your working assumptions. In this talk we will explore these patterns, look at real-world examples, and discuss ways to further inform architecture design with the fundamentals of resilience engineering.

Presentation: Resilient Response In Complex Systems

Track: Keynote / Time: Friday 09:00 - 10:00 / Location: To be announced

Complex systems fail, and they don't always fail in expected ways. Recovering from, learning from, and anticipating failure in complex systems requires the efficient cooperation of response teams in sometimes disorienting and escalating scenarios. There are a number of pitfalls that engineers can fall into while troubleshooting production systems under these conditions, but there are also ways to side-step them gracefully. This talk will cover both the pitfalls and the ways around them, and will compare and contrast web operations at scale with the practices and culture of High Reliability Organizations such as those in aviation and nuclear power.