Track: Architecting for Failure: Chaos, Complexity, and Resilience

Location: Whittle, 3rd flr.

Day of week: Wednesday

Making systems resilient involves both people and technology. Learn about the strategies in use today, from chaos testing to distributed system clustering.

Track Host: Nicki Watt

Chief Technology Officer @OpenCredo

Nicki Watt is Chief Technology Officer at OpenCredo, a pragmatic, hands-on software consultancy with specialisms in data engineering, ML and cloud native solutions. Her technical career has seen her wear many hats, from Engineer and Systems & Technical Architect to Consultant and now CTO. She is a techie at heart and has been involved in developing, delivering and leading large-scale platform and application development projects. Nicki is also co-author of the graph database book Neo4j in Action.

Building Resilient Serverless Systems

In this brave new world of serverless, we entrust our vendors with keeping the infrastructure up and running. However, when even cloud behemoths like Amazon Web Services and Google Cloud have outages and failures, how can we build resilient systems?

John Chapin explains how to use serverless technologies and an infrastructure-as-code approach to architect, build, and operate large-scale systems that are resilient to vendor failures, even while taking advantage of fully managed vendor services and platforms. He then leads an end-to-end demo of the resilience of a well-architected serverless system in the face of massive simulated failure. He further demonstrates how the system not only provides resilience to failure, but also has the side effect of improving the end-user experience. Finally, John discusses some of the drawbacks and idiosyncrasies of the approach.

All source code, infrastructure templates, and slides will be available for the audience to download and explore. While the examples largely focus on AWS (including API Gateway, CloudFormation, DynamoDB, Lambda, and Route 53), the techniques discussed are broadly applicable across cloud vendors.

John Chapin, Cloud Technology Consultant with expertise in Serverless Computing

An Engineer's Guide to a Good Night's Sleep

As organisations look to empower engineers more and embrace DevOps practices, the support role has changed considerably too. Developers are moving from being purely third-line support to working more collaboratively with engineers and operational staff. Also, as we move to cloud native microservice solutions, the increased complexity and diversity of our production landscape means operational staff may well rely more heavily on the engineers, particularly out of hours.

I have spent the last 18 years working across a plethora of industries, using a myriad of technologies and approaches. Having worked on everything from trading applications to content enrichment APIs, I have seen many approaches and processes that try to minimise operational support for developers.

In this talk, I will explore some of my top approaches and techniques to help reduce the risk of that dreaded 3am call! You will gain practical insight into how to handle failure in today's more complex distributed microservice systems. This will include looking at approaches to resiliency, understanding your system, understanding the requirements for fault tolerance, and the developer mindset necessary for this. I will pepper the talk with real-world examples, and the occasional war story along the way too.

Nicky Wrightson, Principal Engineer @riverisland

Learning From Chaos: Architecting for Resilience

In this talk, Russ Miles, CEO of ChaosIQ, will share how leading organisations are successfully adopting chaos engineering to encourage a mindset of "architecting for resilience". Through chaos engineering, architects are able to establish a true "learning system" where everyone is involved in exploring how their systems can improve through embracing failure.

Drawing from a collection of real-world examples and experience reports, Russ will show how you can set up your systems to learn from controlled failure and make resilience an important competitive edge for your organisation.

Russ Miles, CEO of @chaosiqio
