Presentation: Monitoring All the Things: Keeping Track of a Mixed Estate

Track: Next Generation Microservices: Building Distributed Systems the Right Way

Location: Fleming, 3rd flr.

Duration: 4:10pm - 5:00pm

Day of week: Monday

What You’ll Learn

  1. Hear about monitoring mixed environments where newer systems run alongside legacy ones.
  2. Learn how to discuss monitoring issues with the various teams across the organization.

Abstract

Monitoring all of a team’s systems can be tricky when you have a microservice architecture. But what happens when you have many teams, each building systems using totally different technology stacks? Add in decades of legacy systems and a sprinkling of third-party tools and you’ve got plenty of fun in store. Discover how to approach monitoring an estate of many technologies and find out what the Financial Times did to improve visibility across systems built by all its teams.

Question: 

What is the work you're doing today?

Answer: 

I'm a Principal Engineer on the FT's reliability engineering team. Our main goal is to help the other teams around the business build stuff that is secure and reliable. That involves building tools for them, and also a lot of talking to people and giving them advice on what approaches to take. I do a mixture of coding, tech leading, and having discussions with other teams about what we build.

Question: 

Do you work with monitoring, using specific tools there, or is it about coding integrations?

Answer: 

We have a range of tools across the FT, including some older tools like Nagios, which we still support for our older systems. Newer stuff tends to use things like CloudWatch, Graphite/Grafana, and Pingdom. We also have some internal tools.

Question: 

What can people expect from this talk?

Answer: 

I've been to talks about monitoring before, and they often focus very much on a single consistent estate, whether that's everything running on the same container platform or everything using the same programming language. There are lots of nice, neat tricks for monitoring things when they're all consistent. But the problem I've often faced, being in an organization that has more than one team, especially where each team has its own autonomy, is that you end up with vastly different estates. We're not a startup. We've been around for a hundred and fifty years or more. We have legacy tech systems that we still need to support, that are still critical to the business. I want to talk about how you bring those different things together so that you can support the old and the new and the variety that you get in a real working organization.

Question: 

When you say legacy technology, are you talking about mainframes or the older stuff?

Answer: 

We're talking about some stuff that's been deployed to physical racks that are sitting in a data center that we run ourselves. Even up until a few months back, we had stuff running in the office, but a recent office move means we finally migrated all that stuff off. There's older stuff that isn't the best understood throughout the company, but it's still very important to our operations. A variety of different systems in different languages, and I don't really know what the oldest one is, to be honest.

Question: 

What are some of the challenges that you encounter?

Answer: 

One of the biggest challenges is looking at old monitoring systems and trying to understand the nuances. People tend to understand "this is working" and "this is broken", but a lot of things have interesting failure states, and you need to understand which failure states you should alert on and which you shouldn't. What does it mean when something says it's a warning instead of an error? Different systems have a different understanding of those things, and we're trying to bring them all together. You want the same user experience regardless of which monitoring system an alert came from. And I think that's actually the tricky bit: talking to all the different teams to understand what they mean by good and what they mean by a failure.
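
As a rough illustration of the idea in this answer (it is not the FT's actual tooling), the sketch below maps the native statuses of a few monitoring tools onto one shared vocabulary. The `Health` enum, the `normalise` function, and the choice of which native status counts as a warning or an error are all hypothetical; in practice those choices are exactly what would need agreeing with each team.

```python
from enum import Enum


class Health(Enum):
    """Hypothetical shared vocabulary for every monitoring source."""
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"
    UNKNOWN = "unknown"


# Nagios plugins report status through exit codes 0 to 3.
NAGIOS = {
    0: Health.OK,
    1: Health.WARNING,
    2: Health.ERROR,    # Nagios calls this CRITICAL
    3: Health.UNKNOWN,
}

# CloudWatch alarms have three states. Treating INSUFFICIENT_DATA
# as UNKNOWN is an assumption; some teams may prefer ERROR.
CLOUDWATCH = {
    "OK": Health.OK,
    "ALARM": Health.ERROR,
    "INSUFFICIENT_DATA": Health.UNKNOWN,
}

# Pingdom check statuses. Mapping "unconfirmed_down" to WARNING
# is an assumption made for this sketch.
PINGDOM = {
    "up": Health.OK,
    "down": Health.ERROR,
    "unconfirmed_down": Health.WARNING,
    "unknown": Health.UNKNOWN,
    "paused": Health.UNKNOWN,
}

MAPPINGS = {"nagios": NAGIOS, "cloudwatch": CLOUDWATCH, "pingdom": PINGDOM}


def normalise(source, raw_status):
    """Translate a tool-specific status into the shared Health value.

    Anything unrecognised becomes UNKNOWN rather than OK, so a new or
    misspelled status never silently looks healthy.
    """
    return MAPPINGS.get(source, {}).get(raw_status, Health.UNKNOWN)
```

With a mapping like this, `normalise("nagios", 2)` and `normalise("cloudwatch", "ALARM")` both come out as `Health.ERROR`, so a dashboard or alerting layer built on top sees one consistent set of states regardless of where a check originated.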

Question: 

What do you want people to take away?

Answer: 

I want them to take away ideas about how they can approach these problems. There isn't one solution that's going to fit every organization. But I want them to have an idea of what they need to think about and things that might trip them up and useful techniques that you can apply to multiple systems so you can start to bring these things together and have a meaningful conversation with people around the organization about what they want to do.

Speaker: Luke Blaney

Principal Engineer Operations and Reliability Programme @FT
