You are viewing content from a past/completed QCon.

SESSION + Live Q&A

Amplifying Sources of Resilience: What Research Says

Building robust software systems means anticipating how failures may occur with components and subsystems and developing answers to the question:   

“What is needed for the design of systems that prevents or limits catastrophic failure?”

Investing in, developing, and sustaining the adaptive capacity to cope with unexpected situations is at the core of Resilience Engineering. In the software community, this means developing (continually!) ever-better answers to the question:

“When our preventative designs fail us, what are ways that teams of engineers successfully anticipate, resolve, and learn from those catastrophes?”

The Resilience Engineering community has been studying how people in high-consequence/high-tempo domains answer this latter question. Applying Resilience Engineering thinking and paradigms to the world of software engineering and operations is still in its infancy, but we have some promising routes for making progress. This talk will outline productive avenues to locate, amplify, support, and build this capacity that exists (sometimes invisibly) in the expertise of your organization. Spoiler: looking closely at the origins, handling, and perception of incidents is part of this story.


Speaker

John Allspaw

DevOps/Resilience Engineering Thought Leader, Previously CTO @Etsy & Co-founder of @AdaptiveCLabs

John Allspaw has worked in software systems engineering and operations for over twenty years in many different environments. John’s publications include the books The Art of Capacity Planning (2009) and Web Operations (2010) as well as the foreword to “The DevOps Handbook.” His 2009...


From the same track

SESSION + Live Q&A Serverless

Building Resilient Serverless Systems

In this brave new world of serverless, we entrust our vendors with keeping the infrastructure up and running. However, when even cloud behemoths like Amazon Web Services and Google Cloud have outages and failures, how can we build resilient systems? John Chapin explains how to use...

Johnathan Chapin

Cloud Technology Consultant with an expertise in Serverless Computing

SESSION + Live Q&A Site Reliability Engineering

An Engineer's Guide to a Good Night's Sleep

As organisations look to empower engineers more and embrace DevOps practices, we have seen the support role change quite a bit too. Developers are moving from being purely third-line support to working more collaboratively with engineers and operational staff. Also as we move to cloud native...

Nicky Wrightson

Ventures CTO @blenheimchalcot

SESSION + Live Q&A Interview Available

Learning From Chaos: Architecting for Resilience

In this talk Russ Miles, CEO of ChaosIQ, will share how leading organisations are successfully adopting chaos engineering to encourage a mindset of "architecting for resilience". Through chaos engineering, architects are able to establish a true "learning system" where everyone is involved in...

Russell Miles

CEO of @chaosiqio

SESSION + Live Q&A Chaos Engineering

How Condé Nast Succeeds by a Culture That Embraces Failure

Systems architectures are increasingly diverse to serve the growing demands for scalability, fault tolerance, isolation, and extensibility. But the trade-off is ever more complex software to operate and maintain, often with no single shared view of the entire design. This is especially true with the...

Crystal Hirschorn

VP Engineering, Global Strategy & Operations @CondeNast
