Track:

Observability: Logging, Alerting and Tracing

Location: Churchill, G flr.

Day of week: Tuesday

Observability in modern large distributed computer systems

Track Host:
Sarah Wells
Technical Director for Operations and Reliability @FT (Financial Times)

Sarah Wells has been a developer for 15 years, leading delivery teams across consultancy, financial services and media. Over the last few years she has developed a deep interest in operability, observability and devops, and this has recently led to her taking over responsibility for Operations and Reliability at the Financial Times.
Before that, she led work at the FT on building a semantic publishing platform, making it easy to discover and access all the FT’s published content via APIs in a common and flexible format. That project meant a focus on Go, microservices, containerisation, and how to influence teams to do the right things.

10:35am - 11:25am

by Pierre Vincent
SRE Manager @Poppulo

Being able to observe the state of a running application is key to understanding a system's behaviour and essential if you want to fix production problems quickly and efficiently. Like a lot of other things, this is harder to do in distributed systems than it is with a monolith. At Poppulo we've been running a distributed system of hundreds of microservices in production for more than four years, and we have come to understand how critical this visibility is. If you want to succeed with operating a...

11:50am - 12:40pm

by Yan Cui
Senior Developer at Space Ape Games

As engineers, we're empowered by advancements in cloud platforms to build ever more complex systems that can achieve amazing feats at a scale previously only possible for the elite few. The monitoring tools have evolved over the years to accommodate our growing needs with these increasingly complex systems, but the emergence of serverless technologies like AWS Lambda has shifted the landscape and broken some of the underlying assumptions that existing tools are...

1:40pm - 2:30pm

by Aaron Kirkbride
Software Engineer @Weaveworks

Monitoring containerised applications creates a new set of challenges that traditional monitoring systems struggle with. In this talk, Aaron from Weaveworks will explore how we can use Prometheus, along with its integrations with Kubernetes and other open-source components, to observe services effectively and help extinguish fires when they occur. Using anecdotes from real product incidents, this talk will cover bringing together metrics from various exporters,...

2:55pm - 3:45pm

by Charity Majors
Co-Founder @Honeycombio, formerly DevOps @ParseIT/@Facebook

Metrics, dashboards, logs ... the basics of monitoring haven’t changed much in the past 20 years, even as systems have gotten astronomically more fluid, ephemeral and complex. Modern systems require a more exploratory, iterative approach to problem solving than dashboards afford, and microservices are often the tipping point, the place past which people realize their old tools can simply no longer do the job. We’ll talk about events vs metrics, debugging vs monitoring, and lots of examples...

4:10pm - 5:00pm

by Charity Majors
Co-Founder @Honeycombio, formerly DevOps @ParseIT/@Facebook

by Pierre Vincent
SRE Manager @Poppulo

by Yan Cui
Senior Developer at Space Ape Games

by Sarah Wells
Technical Director for Operations and Reliability @FT (Financial Times)

by Randy Shoup
VP Engineering @WeWork

5:25pm - 6:15pm

by Amy Phillips
Engineering Manager @Moo

The days of trying to build systems that always work are gone. Fast, frequent releases and self-healing platforms can reduce, or even remove, the risk of production incidents. So what does this mean for software testing? In this talk, Amy will look back on a long test career and a recent Platform career to discuss the impact of observability on testing, from new techniques and greater Dev and Ops involvement right through to whether we even really need testing anymore.
