Timeouts, Retries and Idempotency In Distributed Systems

“The definition of insanity is doing the same thing over and over again” - this quote, attributed to Einstein, warns us of the danger of magical thinking: hoping that trying something just one more time will succeed where it failed before. But is this really insanity?

In this talk, I’ll argue that retrying things actually does make a lot of sense, and is in fact key to improving the resilience of a distributed system. Along the way, I’ll explain the importance of timeouts, retry limits and knowing when giving up does make sense. I’ll also show how retries can be made safe (and help avoid draining your bank account), and perhaps we’ll get to examine that Einstein quote in a bit more detail.
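As a rough illustration of those ideas (not taken from the talk itself), the sketch below combines a retry limit, exponential backoff and an idempotency key. Everything in it is a hypothetical stand-in: `PaymentServer`, `FlakyLink`, `TimeoutExpired` and `charge_with_retries` are invented names, and the "network" is simulated in-process so that a reply can be lost after the server has already done the work.

```python
import time
import uuid


class TimeoutExpired(Exception):
    """Raised when the caller gives up waiting for a reply."""


class PaymentServer:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self, balance):
        self.balance = balance
        self.seen = {}  # idempotency key -> stored result

    def charge(self, key, amount):
        if key in self.seen:
            # Replay of a request already handled: return the stored result
            # without debiting the account a second time.
            return self.seen[key]
        self.balance -= amount
        self.seen[key] = {"status": "charged", "amount": amount}
        return self.seen[key]


class FlakyLink:
    """Simulated network that loses the first `lost_replies` replies."""

    def __init__(self, server, lost_replies):
        self.server = server
        self.lost_replies = lost_replies

    def charge(self, key, amount):
        # The server does the work first; only the reply goes missing.
        result = self.server.charge(key, amount)
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutExpired("no reply before the deadline")
        return result


def charge_with_retries(link, amount, max_attempts=3, base_delay=0.1,
                        sleep=time.sleep):
    """Retry a charge up to `max_attempts` times, reusing one idempotency key."""
    key = str(uuid.uuid4())  # same key for every attempt of this request
    for attempt in range(1, max_attempts + 1):
        try:
            return link.charge(key, amount)
        except TimeoutExpired:
            if attempt == max_attempts:
                raise  # retry limit reached: give up and surface the failure
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


server = PaymentServer(balance=100)
link = FlakyLink(server, lost_replies=2)
result = charge_with_retries(link, 30, sleep=lambda seconds: None)
print(result["status"], server.balance)
```

In this simulation the first two replies are lost even though the charge went through, so the client retries twice. Because every attempt carries the same key, the account is debited once rather than three times. A real client would also have to decide which errors are worth retrying at all, which this sketch leaves out.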


Speaker

Sam Newman

Microservice, Cloud, CI/CD Expert, Author of "Building Microservices" and "Monolith to Microservices", 20+ Years Experience as a Developer

Sam is a technologist focusing on cloud, microservices, and continuous delivery - three topics which seem to overlap frequently. Providing consulting, training and advisory services to startups and large multi-national enterprises alike, he has over 20 years in IT as a developer, sys admin and architect. Sam is also the author of the best-selling Building Microservices and the forthcoming Building Resilient Distributed Systems, both from O’Reilly, and is an experienced conference speaker.

From the same track

Session architecture

Platforms for Secure API Connectivity With Architecture as Code

Wednesday Apr 9 / 03:55PM BST

As microservices and complex platforms become the standard, ensuring secure connectivity while maintaining a smooth developer experience is a significant challenge. Traditional security models often introduce friction, slowing down innovation and deployment.

James Gough

Distinguished Engineer, API Platform Lead Architect @Morgan Stanley, Co-Author of Optimizing Java

Session

From Dashboard Soup to Observability Lasagna: Building Better Layers

Wednesday Apr 9 / 02:45PM BST

Let's be honest - observability can suck. Ever feel like you're swimming in dashboard soup? You know the feeling: tons of single-use dashboards, building new ones during every incident only to lose them in the chaos, and spending ages creating visualizations that no one ever looks at again.

Martha Lambert

Product Engineer @incident.io, Building Reliable and Observable Systems

Session APIs

Scaling API Independence: Mocking, Contract Testing & Observability in Large Microservices Environments

Wednesday Apr 9 / 01:35PM BST

Microservices promise faster deployments and team autonomy. In reality, engineers are often blocked waiting for APIs, dealing with broken sandboxes, or wrangling test environments.

Tom Akehurst

CTO and Co-Founder @WireMock, 20+ Years Building Enterprise Systems

Session architecture

From Confusion to Clarity: Advanced Observability Strategies for Media Workflows at Netflix

Wednesday Apr 9 / 11:45AM BST

Managing media workflows at Netflix scale is both thrilling and daunting. With millions of workflow executions across hundreds of types and over 500 million CPU hours consumed quarterly, costs can skyrocket, and encoding issues can disrupt the streaming experience.

Sujana Sooreddy

Software Engineer @Netflix - Building High Scale Observability Solutions

Naveen Mareddy

Staff Engineer @Netflix, 20+ years in Software Engineering, Creator of MediaInfra Meetup, Speaker, Mentor