Abstract
Queues are an essential component of a scalable distributed system, but going beyond a simple implementation creates an explosion of complexity to manage.
Suddenly there are five different places where you could be creating bottlenecks, and you might not even notice until your customer tells you things are running slowly or you're drowning in dashboards.
In this presentation we’ll share our experience and expertise in operating, debugging and managing queueing systems with a focus on:
- Why distributed tracing is a requirement for successful operations with queues
- How OpenTelemetry standards make it easy to bring distributed tracing to your systems
- How distributed tracing helped Gearset take control of our queueing problems
- How we dealt with distributed sampling, total operation duration, and long-running traces
Speaker
Julian Wreford
Team Lead of Operability Team @Gearset, Software Engineer Turned Accidental SRE
Julian Wreford is an engineering team lead at Gearset where he leads the team responsible for all things site reliability. After starting as a developer, he quickly became interested in operability and has helped lead the growth of observability culture and incident response at Gearset as the company has scaled from small teams to large enterprises. He is passionate about developer ownership throughout the software lifecycle and enjoys empowering developers to better understand and debug the code they write when it is running at scale.
Speaker
Oli Lane
Engineering Team Lead @Gearset, Focusing on Engineering Culture, Observability, and Platform Reliability
Oli is an Engineering Team Lead and self-described "Jack of at least some trades." A fixture at Gearset for over ten years, he has ridden the wave from a scrappy 7-person startup to a 350+ employee scale-up.
Along the way, he has gained deep experience across both product and infrastructure teams, with a particular interest in the sociotechnical side of engineering. Currently, Oli focuses on platform engineering and observability, building the culture and tools needed for high-performing teams and reliable systems.