Abstract
A “simple” API request rarely stays simple. In distributed systems, one call quickly turns into fan-out across gateways, services, caches, and databases, and your p99 inherits the tail of every hop and every flaky dependency. Worse, it’s often not a clean outage: it’s grey failures and intermittent slowdowns that are hard to reproduce and easy for customers to feel.
In this session, I’ll share a practical playbook for designing sub-100ms APIs when fan-out is unavoidable. We’ll start with latency budgets, so that performance becomes a design constraint rather than a hope. Then we’ll cover the patterns that keep tail latency predictable: safe parallelism, timeouts and retries that don’t amplify failure, idempotency, bulkheads and circuit breakers with fallbacks, and caching strategies that treat invalidation as a correctness problem. We’ll close with trace-driven observability: the minimal signals that let you quickly answer where the milliseconds went, what changed, and whether it’s us or a dependency.
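To give a taste of the latency-budget idea, here is a minimal Go sketch (illustrative only, not code from the session): an overall 100ms request budget is carved into per-hop deadlines, the fan-out runs in parallel, and a retry is allowed only while that hop’s slice of the budget remains. The service names, budget splits, and simulated latencies are invented for the example.

```go
// Hypothetical sketch: one shared 100ms request budget, split into per-hop
// deadlines, with parallel fan-out and retries that cannot exceed the slice.
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// callDependency stands in for any downstream hop (cache, service, database).
func callDependency(ctx context.Context, name string, simulated time.Duration) error {
	select {
	case <-time.After(simulated): // simulated downstream latency
		return nil
	case <-ctx.Done(): // this attempt's deadline expired
		return fmt.Errorf("%s: %w", name, ctx.Err())
	}
}

// callWithBudget gives one hop its own slice of the request budget and caps
// it at two attempts, so a retry can never spend more than the slice allows.
func callWithBudget(ctx context.Context, name string, budget time.Duration) error {
	deadline := time.Now().Add(budget)
	var err error
	for attempt := 0; attempt < 2; attempt++ {
		remaining := time.Until(deadline)
		if remaining <= 0 || ctx.Err() != nil {
			break // hop budget or overall request budget is gone
		}
		attemptCtx, cancel := context.WithTimeout(ctx, remaining)
		err = callDependency(attemptCtx, name, time.Duration(rand.Intn(60))*time.Millisecond)
		cancel()
		if err == nil {
			return nil
		}
	}
	if err == nil {
		err = errors.New("no budget left to attempt the call")
	}
	return fmt.Errorf("%s exhausted its %v budget: %w", name, budget, err)
}

func main() {
	start := time.Now()

	// Overall request budget: 100ms, shared by every downstream hop.
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	// Fan out to two independent dependencies in parallel, each with an
	// explicit slice of the budget instead of an unbounded wait.
	budgets := map[string]time.Duration{
		"profile-svc": 40 * time.Millisecond,
		"pricing-svc": 60 * time.Millisecond,
	}
	results := make(chan error, len(budgets))
	for name, budget := range budgets {
		go func(name string, budget time.Duration) {
			results <- callWithBudget(ctx, name, budget)
		}(name, budget)
	}
	for range budgets {
		if err := <-results; err != nil {
			fmt.Println("degraded:", err) // a real service would serve a fallback here
		}
	}
	fmt.Println("answered in", time.Since(start))
}
```

The point of the sketch is that deadlines and retry caps are all derived from one shared budget, so a slow dependency can only burn its own slice and the request still answers on time, possibly degraded.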
Main takeaways:
- How to budget latency across service boundaries and enforce it with guardrails
- How to combine timeouts, retries, idempotency, and bulkheads without creating new p99 spikes
- How to use traces plus a few key metrics to pinpoint the slow hop fast