Wrangling Telemetry at Scale: A Guide to Self-Hosted Observability

Abstract

Details coming soon.


Speaker

Colin Douch

Site Reliability Engineer @DuckDuckGo

Colin currently works as an SRE at DuckDuckGo, orchestrating and inventing solutions to better serve DuckDuckGo's increasingly large portfolio of services, which handle search queries and AI chats from around the world. He previously headed up the Observability Team at Cloudflare and has been working, advising, and researching in the Monitoring and Observability space for close to 10 years, gaining a wide perspective on the difficulties that modern companies, big and small, face in properly introspecting their systems. Originally from New Zealand, he now lives in the UK and regularly speaks at conferences to share insights from the practical side of Observability engineering.

Date

Tuesday Mar 17 / 03:55PM GMT ( 50 minutes )

Location

Windsor (5th Fl.)

From the same track

Session Sociotechnical Leadership

Sociotechnical Practices and Tools for Debugging your Organisation

Tuesday Mar 17 / 10:35AM GMT

Debugging is both an art and a science. But more than that, it's an activity undertaken with deep intention: to understand and improve your systems. In the purely technical realm, we have an extraordinary range of tooling and techniques that can help us tackle this problem.

Hazel Weakly

Fellow @Nivenly Foundation; Director, Haskell Foundation; Experienced Leader Focusing on Organizational Change, Developer Experience, and Resilience Engineering

Session Distributed Tracing

How Eve Online Leverages Head-Based Sampling to Observe "Fun"

Tuesday Mar 17 / 11:45AM GMT

A unique pattern in video game software is real-time interactions that express the personality of users. Here we will talk about how we instrument the universe of New Eden to identify the traffic that matters, even the "fun" parts!

Nicholas Herring

Technical Director, Eve Online @CCP Games, Refiner of Internet Spaceships and Explorer of Feral Gordian Knots of Python

Session

Can Claude Fix Itself? Using LLMs for Incident Response

Tuesday Mar 17 / 05:05PM GMT

Details coming soon.

Alex Palcuie

Member of Technical Staff in AI Reliability Engineering @Anthropic, Previously Staff Site Reliability Engineer on Google Cloud Platform

Session

Real-Time Observability for Cross-Border Payment Rails

Tuesday Mar 17 / 02:45PM GMT

Details coming soon.

Session

Unconference: Debugging Distributed Systems

Tuesday Mar 17 / 01:35PM GMT