Track:

Observability Done Right: Automating Insight & Software Telemetry

Location:

Whittle, 3rd flr.

Duration:

10:35am - 11:25am

Day of week:

Wednesday

Level:

Intermediate

Persona:

Architect

Key Takeaways

Focus on what matters when developing microservices
Understand how to build a system with support in mind
Understand practical approaches to optimizing alerts and the infrastructure to support them

Abstract

Microservices can be a great way to work: the services are simple, you can use the right technology for the job, and deployments become smaller and less risky. Unfortunately, other things become more complex. You probably took some time to design a deployment pipeline and set up self-service provisioning, for example. But did the rest of your thinking about what “done” means catch up? Are you still setting up alerts, run books, and monitoring for each microservice as though it was a monolith?

Two years ago, a team at the FT started out building a microservices-based system from scratch. Their initial naive approach to monitoring meant that an underlying network issue could mean 20 people each receiving 10,000 alert emails overnight. With that volume, you can’t pick out the important stuff. In fact, your inbox is unusable unless you have everything filtered away where you’ll never see it. Furthermore, you have information radiators all over the place, but there’s always something flashing or the wrong colour. You can spend the whole day moving from one attention-grabbing screen to another.

That team now has over 150 microservices in production. So how did they get themselves out of that mess and regain control of their inboxes and their time? First, you have to work out what’s important, and then you have to ruthlessly narrow down on that. You need to be able to see only the things you need to take action on, in a way that tells you exactly what you need to do. Sarah shares how her team regained control and offers some tips and tricks.
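To make "see only the things you need to take action on" concrete, here is a minimal Go sketch of one possible approach: drop non-actionable alerts and fold the rest into one summary per underlying cause, so a single network issue produces one notification rather than thousands. It is an illustration only, not the FT team's implementation; the `Alert` and `Summary` types, their fields, and the `Summarize` function are all hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical alert record; the fields are assumptions
// made for this sketch, not a real alerting schema.
type Alert struct {
	Service    string // microservice that raised the alert
	Cause      string // suspected underlying cause, e.g. "network"
	Actionable bool   // false for purely informational noise
}

// Summary is one notification covering every alert that shares a cause.
type Summary struct {
	Cause    string
	Services []string // distinct affected services, sorted
	Count    int      // number of raw alerts folded into this summary
}

// Summarize drops non-actionable alerts and groups the rest by cause.
func Summarize(alerts []Alert) []Summary {
	byCause := map[string]*Summary{}
	seen := map[string]map[string]bool{} // cause -> services already listed
	for _, a := range alerts {
		if !a.Actionable {
			continue
		}
		s, ok := byCause[a.Cause]
		if !ok {
			s = &Summary{Cause: a.Cause}
			byCause[a.Cause] = s
			seen[a.Cause] = map[string]bool{}
		}
		s.Count++
		if !seen[a.Cause][a.Service] {
			seen[a.Cause][a.Service] = true
			s.Services = append(s.Services, a.Service)
		}
	}
	out := make([]Summary, 0, len(byCause))
	for _, s := range byCause {
		sort.Strings(s.Services)
		out = append(out, *s)
	}
	// Sort by cause so the output order is stable between runs.
	sort.Slice(out, func(i, j int) bool { return out[i].Cause < out[j].Cause })
	return out
}

func main() {
	alerts := []Alert{
		{Service: "publisher", Cause: "network", Actionable: true},
		{Service: "annotator", Cause: "network", Actionable: true},
		{Service: "publisher", Cause: "network", Actionable: true},
		{Service: "reader-api", Cause: "disk-usage", Actionable: false},
	}
	for _, s := range Summarize(alerts) {
		fmt.Printf("%s: %d alerts across %v\n", s.Cause, s.Count, s.Services)
	}
}
```

In practice the hard part is deciding what counts as actionable and how to attribute a cause; the sketch assumes both are already known for each alert.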

Interview

Question:

What is the focus of your work today?

Answer:

I’m the tech lead for the Content platform at the Financial Times. The platform handles publication of content from multiple content management systems, annotating that content with metadata via concept extraction and editorial curation, and making all of that information available via a set of APIs, so any product within the FT or outside that delivers our content has a stable base to build on.

As part of this work, we are completely revamping our metadata, replacing taxonomy-based metadata (essentially, lists of terms in a variety of categories such as authors, people, companies) with ontology-based metadata, i.e. based on real things, with an ability to navigate the relationships between those things. As an example, in our new metadata, an author is a person who is a writer. This means we don’t have the same name appearing as a term within people, authors and brands (which is highly confusing to deal with), and it gives us a lot more flexibility to show content based on that metadata.
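The difference can be sketched in a few lines of Go. This is only an illustration of the general idea, assuming a simple in-memory model; the `Thing`, `Relation` and `Ontology` types and the "wrote" predicate are hypothetical and are not the FT's actual data model. Each real-world thing is stored once, and "author" is derived from a relationship rather than kept as a separate term.

```go
package main

import "fmt"

// Thing is a single real-world entity, stored once whatever roles it plays.
type Thing struct {
	ID   string
	Name string
	Type string // e.g. "Person", "Organisation", "Content"
}

// Relation is a directed, named link between two things.
type Relation struct {
	SubjectID string
	Predicate string // e.g. "wrote" (an assumed predicate name)
	ObjectID  string
}

// Ontology holds things and the relationships between them.
type Ontology struct {
	Things    map[string]Thing
	Relations []Relation
}

// IsAuthor reports whether id is a person who wrote at least one item.
func (o Ontology) IsAuthor(id string) bool {
	t, ok := o.Things[id]
	if !ok || t.Type != "Person" {
		return false
	}
	return len(o.ContentBy(id)) > 0
}

// ContentBy follows "wrote" relationships to list what id has written.
func (o Ontology) ContentBy(id string) []string {
	var ids []string
	for _, r := range o.Relations {
		if r.SubjectID == id && r.Predicate == "wrote" {
			ids = append(ids, r.ObjectID)
		}
	}
	return ids
}

func main() {
	o := Ontology{
		Things: map[string]Thing{
			"p1": {ID: "p1", Name: "Jane Example", Type: "Person"},
			"c1": {ID: "c1", Name: "An example article", Type: "Content"},
		},
		Relations: []Relation{{SubjectID: "p1", Predicate: "wrote", ObjectID: "c1"}},
	}
	fmt.Println(o.IsAuthor("p1"), o.ContentBy("p1"))
}
```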

We have five development teams working on our system, which is made up of nearly 300 microservices. We were early adopters of Docker, building a lot of the cluster management tools ourselves. My focus this year is making sure that all the work we do on our platform works towards our functional and architectural goals, with no local optimizations, and that our production stack is as stable and easy to use as possible so we spend time on the things that matter to our business. For example, we are migrating to Kubernetes and replacing some of our hand-written tools.

Question:

What’s the motivation for your talk?

Answer:

This was the first microservices architecture I’ve worked on, and one of the early things I learnt is that you just can’t operate microservices the way you did a monolith.

Microservices make the code easier to reason about and deploy frequently, but you have to do DevOps to make it work, and you have to keep a keen focus on building things for operability.

In the early days, my inbox was full of alerts every day, and it was very hard to work out what was something I needed to take action on. We’ve invested a lot of time in finding ways to solve that problem and I wanted to share that experience with others.

Question:

How would you describe the persona of the target audience of this talk?

Answer:

Tech Lead/Architect/Developer/Senior Management: anyone operating a microservices architecture or planning to.

Question:

How would you rate the level of this talk?

Answer:

It’s a technical talk. It assumes you know about microservices and that you have been responsible for supporting a system.

Question:

QCon targets advanced architects and senior development leads. What do you feel will be the actionable takeaway that type of persona will walk away from your talk with?

Answer:

There are lots of concrete suggestions of things to install or develop that will help with operating a microservices-based system, but I hope the main actionable will be that you have to care about this stuff and work on it constantly.

Question:

What do you feel is the most disruptive tech in IT right now?

Answer:

Serverless feels like it could completely change things. At the moment I’m seeing us use Lambda mostly for more ‘housekeeping’ tasks rather than for production-critical systems, but that’s going to change and I can see the attraction of not having a server to maintain.

I wonder how easy it will be to support a system made up of hundreds of functions running on hardware you can’t see. It’ll be a whole new set of observability challenges!

Speaker: Sarah Wells

Principal Engineer @FT (Financial Times)

Sarah Wells is currently leading work at the FT on building a semantic publishing platform, making it easy to discover and access all the FT’s published content via APIs in a common and flexible format. Sarah has been a developer for 15 years, working across consultancy, financial services, and media. She is more dev than ops, but definitely shifting. Her recent focus has been on Go, microservices, containerisation, devops, and how to influence teams to do the right things.

Presentation: Avoiding Alerts Overload From Microservices
