Presentation: Preparing for the Unexpected

Track: Chaos and Resilience: Architecting for Success

Location: Fleming, 3rd flr.

Duration: 11:50am - 12:40pm

Day of week: Wednesday

What You’ll Learn

  1. Hear how the FT manages incidents and what they are doing to make it a sustainable process.
  2. Learn how to benefit from past incidents and encourage engineers to get involved.

Abstract

Convincing engineers to be on-call isn’t always straightforward. In 2019 the Customer Products group at the Financial Times set out to make its out-of-hours support process more sustainable after losing a number of people from its on-call team.

In this talk you’ll discover how to continuously learn from past incidents by applying your team’s most recent operational experience, increase the confidence of your team in handling live incidents away from the pressures of production, and convince them that, actually, joining the on-call team is a great idea!

Hear how the Financial Times is using incident workshops to prepare for the unexpected and make incident management a more consistent process by sharing the group’s wide range of operational knowledge and architectural insights.

Question: 

What is the work you are doing today?

Answer: 

I work at the Financial Times as a Principal Engineer. I support the development of FT.com, the website and mobile applications. There are two things going on in our department at the moment that I'm supporting. One is that we're relaunching a whole bunch of teams, getting all of those teams kicked off and started at the beginning of this year. Quite a lot of energy is going into that, but it's all quite exciting. As part of that, we're trying to address the issue of ownership in a microservices world. So the outcome should be that we have a whole bunch of teams with full ownership of everything.

The other side is recruitment. We're starting a big new year, opening 40 new positions. So we're hiring quite a lot! That's exciting. It's a scale I haven't worked with before. It's all about scaling the recruitment process, making it possible for us to interview a lot of people quickly and fairly.

Question: 

What are your goals for the talk?

Answer: 

One of them is to use this as a reason to take a deep dive into something I'm quite interested in: incident management and reliability engineering. I'm really interested in telling the story of how we've handled incidents at the FT over the last year. And I want to get across that it's possible for engineers on the ground to make space for incident management and training, and to get people interested in the operational side of running systems. I think it's quite interesting. The FT is similar to a lot of companies where engineers have many different responsibilities and sometimes you have to jump into incident management, taking it all the way to producing an incident report.

I want to get across that engineers can put on the incident management hat. I'm also using this internally as a launchpad for us to become a team that excels at incident management within the FT.

The final goal is to publish a framework for how to run incident management tabletop exercises, a lightweight workshop for sharing knowledge and making new connections between people so that we can better handle incidents.

Question: 

What are the core personas for the talk?

Answer: 

This talk is for engineers, and any other discipline that would get value out of learning from incidents.

Question: 

Could you share a few key takeaways?

Answer: 

I want to get across that your company's previous incidents are a treasure trove for preparing for what's to come. We keep a record of all of our incidents at the FT and we review them regularly. There are always new things to learn, even from incidents that happened in the past. And new people bring fresh eyes to those previous incidents and to things that we didn't know at the time.

As engineers, you can carve out time and make space for running these workshops without too much effort. You don't have to have a dedicated role to run something like this. And you can really lift up your team by running these workshops.

The other important bit I want people to take away is an awareness of the barrier to entry that can exist when getting into incident management for the first time. This talk will cover a really good way to lower that barrier and get, say, junior engineers or anyone who's interested in incident management involved without the pressures of production or of working on an active incident. It's also a way of getting more people motivated and engaged with looking after the systems running in production.

Speaker: Samuel Parkinson

Principal Engineer @FinancialTimes

Sam is a Principal Engineer at the Financial Times, supporting the development of FT.com and the mobile apps. Previously he worked at Graze, a start-up that sends snacks through the post. He has worked in the industry for six years as a software engineer and has also spent time on the operational side as an integration engineer.

At the FT he has recently supported the Operations & Reliability group with their rebuild of the company-wide monitoring platform and is doing his best to convince people that joining the on-call team is definitely a good idea.
