Uncorking Queueing Bottlenecks with OpenTelemetry

Abstract

Queues are an essential component in a scalable distributed system, but going beyond the simple implementation creates an explosion of complexity to manage.

Suddenly there are five different places where you could be creating bottlenecks, and you might not even notice until a customer tells you things are running slow or you're drowning in dashboards.

In this presentation, we’ll share our experience and expertise in operating, debugging, and managing queueing systems, with a focus on:

  • Why distributed tracing is a requirement for successful operations with queues
  • How OpenTelemetry standards make it easy to bring distributed tracing to your systems
  • How distributed tracing helped Gearset take control of our queueing problems
  • How we dealt with distributed sampling, total operation duration, and long-running traces

Speaker

Julian Wreford

Team Lead of Operability Team @Gearset, Software Engineer Turned Accidental SRE

Julian Wreford is an engineering team lead at Gearset where he leads the team responsible for all things site reliability. After starting as a developer, he quickly became interested in operability and has helped lead the growth of observability culture and incident response at Gearset as the company has scaled from small teams to large enterprises. He is passionate about developer ownership throughout the software lifecycle and enjoys empowering developers to better understand and debug the code they write when it is running at scale.

Speaker

Oli Lane

Engineering Team Lead @Gearset, Focusing on Engineering Culture, Observability, and Platform Reliability

Oli is an Engineering Team Lead and self-described "Jack of at least some trades." A fixture at Gearset for over ten years, he has ridden the wave from a scrappy 7-person startup to a 350+ employee scale-up.

Along the way, he has gained deep experience across both product and infrastructure teams, with a particular interest in the sociotechnical side of engineering. Currently, Oli focuses on platform engineering and observability, building the culture and tools needed for high-performing teams and reliable systems.

From the same track

Session architecture

From Fan-Out to Fast: Sub-100ms API Design in Distributed Systems

Monday Mar 16 / 10:35AM GMT

A “simple” API request rarely stays simple. In distributed systems, one call quickly turns into fan-out across gateways, services, caches, and databases — and your p99 becomes the sum of every hop and every flaky dependency.

Saranya Vedagiri

Senior Staff Engineer @eBay

Session Platform Engineering

APIs for Agents: Rethinking API Programs in the MCP Era

Monday Mar 16 / 11:45AM GMT

As API programs mature, a familiar gap emerges: some teams operate with strong standards, reusable platforms, and clear governance, while others rely on informal guidance and best-effort consistency.

Jim Gough

Distinguished Engineer, API Platform Lead Architect @Morgan Stanley, Co-Author of Optimizing Java

Andreea Niculcea

Vice President @Morgan Stanley

Session

Beyond the Dashboard: Why 'Query-ability' is the New Observability

Monday Mar 16 / 01:35PM GMT

Details coming soon.

Session

Async-First: Architecting for Event-Driven Connectivity

Monday Mar 16 / 05:05PM GMT

Details coming soon.

Session

Unconference: Connecting Systems

Monday Mar 16 / 02:45PM GMT