CodeMosa

Retry with Backoff

Reliability & Latency

Exponential backoff with jitter for resilient retries of transient failures

Core Idea

Retries should be bounded, exponential, and jittered to avoid coordinated retry storms. Only retry idempotent operations and respect server guidance.
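
To make "bounded, exponential, and jittered" concrete, here is a minimal sketch of how the wait before each retry could be computed. The names `backoff_ceiling` and `full_jitter_delay` and the default `base` and `cap` values are assumptions chosen for illustration, not part of any particular library.

```python
import random

def backoff_ceiling(attempt, base=0.1, cap=10.0):
    """Upper bound in seconds on the wait before retry `attempt` (0-indexed)."""
    # Exponential growth, clamped so a single wait never exceeds `cap`.
    return min(cap, base * 2 ** attempt)

def full_jitter_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Draw the actual wait uniformly from [0, ceiling] (full jitter)."""
    return rng.uniform(0, backoff_ceiling(attempt, base, cap))

# Ceilings double until they hit the cap.
print([backoff_ceiling(n, base=1.0, cap=5.0) for n in range(4)])  # [1.0, 2.0, 4.0, 5.0]
```

The ceiling keeps each wait bounded, and the uniform draw spreads clients apart so they do not all retry at the same instant.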

When to Use

When encountering transient network errors, 5xx responses, or timeouts on idempotent requests.
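
The rule above can be sketched as a small predicate. The exact status set and the helper name `should_retry` are assumptions for illustration; real policies often differ (for example, some services treat only 502, 503, and 504 as transient).

```python
# HTTP methods defined as idempotent by the HTTP semantics specification.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

# Assumed set of 5xx statuses that usually indicate a transient fault.
RETRYABLE_STATUSES = {500, 502, 503, 504}

def should_retry(method, status=None, timed_out=False):
    """Return True only for transient failures on idempotent requests."""
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # a blind retry could repeat a side effect
    if timed_out:
        return True
    return status in RETRYABLE_STATUSES

print(should_retry("GET", 503))   # True
print(should_retry("POST", 503))  # False
```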

Recognition Cues

Indicators that this pattern might be the right solution
  • Burst of 5xx/timeouts under load
  • Thundering herd after dependency failure
  • Client storms synchronized by fixed delays
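
The last cue can be illustrated with a toy simulation: many clients fail at the same moment and then retry either after a fixed delay or after a jittered one. Everything here (the client count, the 100 ms bucket, the function names) is an assumption made up for the illustration.

```python
import random
from collections import Counter

def peak_retries_per_bucket(delays, bucket=0.1):
    """Largest number of retries landing in the same `bucket`-second window."""
    buckets = Counter(int(d / bucket) for d in delays)
    return max(buckets.values())

def fixed_delays(clients, delay=1.0):
    # Every client waits exactly the same time, so they stay synchronized.
    return [delay] * clients

def jittered_delays(clients, ceiling=1.0, seed=0):
    # Each client waits a random time in [0, ceiling], spreading retries out.
    rng = random.Random(seed)
    return [rng.uniform(0, ceiling) for _ in range(clients)]

print(peak_retries_per_bucket(fixed_delays(1000)))     # 1000
print(peak_retries_per_bucket(jittered_delays(1000)))  # far fewer than 1000
```

With a fixed delay the dependency sees the whole herd again in one window; with jitter the same retries are spread across roughly ten windows.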

Pattern Variants & Approaches

Overview

Client performs bounded exponential backoff with jitter on transient failures, respecting idempotency and server hints.

Overview Architecture

[Diagram: Client sends Attempt / Retry (jittered) to Service; Service returns Response/Error]

When to Use This Variant

  • Transient 5xx/timeouts
  • Retry-After headers
  • Idempotent operations

Use Case

HTTP/gRPC clients, SDKs, and background jobs communicating over unreliable networks.

Advantages

  • Avoids retry storms
  • Improves success rate under intermittent failures
  • Simple client-side policy

Implementation Example

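# Exponential backoff with full jitter
import random
import time

def retry(call, max_attempts=5, base=0.1, cap=10.0):
  for attempt in range(max_attempts):
    try:
      return call()
    except TransientError:  # application-defined retryable error
      if attempt == max_attempts - 1:
        raise  # attempts exhausted: surface the last error
      time.sleep(random.uniform(0, min(cap, base * 2**attempt)))  # capped full jitter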

Tradeoffs

Pros

  • Improves resiliency to transient faults
  • Reduces load on a struggling dependency compared with immediate retries
  • Simple to implement with libraries

Cons

  • Increases latency on failure paths
  • Can amplify load if mis-tuned
  • Complexity with layered retry policies
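
The amplification risk is easiest to see as arithmetic: when several layers each retry the layer below, their attempt counts multiply. A back-of-the-envelope sketch, with a hypothetical three-layer stack and attempt counts chosen only for illustration:

```python
from math import prod

def worst_case_calls(attempts_per_layer):
    """Worst-case calls reaching the innermost service for one user request.

    Each layer retries the layer below it, so attempt counts multiply.
    """
    return prod(attempts_per_layer)

# Hypothetical stack: client, gateway, and mesh sidecar, 3 attempts each.
print(worst_case_calls([3, 3, 3]))  # 27
```

A policy that looks modest at each layer can therefore turn one failing request into dozens of calls against a dependency that is already unhealthy.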

Common Pitfalls

  • Retrying non-idempotent operations
  • Unbounded attempts or total retry time
  • Coordinated retries without jitter
  • Layered retries across gateway, mesh, and client
  • Ignoring Retry-After headers
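
To guard against unbounded attempts or total retry time, a retry loop can enforce both an attempt cap and a wall-clock budget. This is a minimal sketch: `TransientError`, `retry_with_budget`, and all default values are hypothetical, and the `sleep`, `clock`, and `rng` parameters exist only to make the loop easy to test.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""

def retry_with_budget(call, max_attempts=5, budget_s=2.0, base=0.1, cap=1.0,
                      sleep=time.sleep, clock=time.monotonic, rng=random):
    """Retry `call` with full jitter, bounded by attempts and total time."""
    deadline = clock() + budget_s
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            delay = rng.uniform(0, min(cap, base * 2 ** attempt))
            out_of_attempts = attempt == max_attempts - 1
            # Give up if the next wait would overrun the total budget.
            if out_of_attempts or clock() + delay > deadline:
                raise
            sleep(delay)
```

Usage could look like `retry_with_budget(lambda: fetch_profile(user_id))`, where `fetch_profile` stands in for any idempotent call that raises `TransientError` on a retryable failure.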

Design Considerations

  • Exponential backoff with full/decorrelated jitter
  • Cap attempts and total budget per request
  • Idempotency keys for writes where possible
  • Status/exception-based retry policies
  • Outlier detection and hedging for tail latency (sparingly)
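
For the first item, decorrelated jitter differs from full jitter in that each wait is drawn relative to the previous wait instead of the attempt number. The generator below follows the commonly cited formulation `sleep = min(cap, uniform(base, previous * 3))`; the function name and defaults are assumptions for this sketch.

```python
import random

def decorrelated_jitter(base=0.1, cap=10.0, rng=None):
    """Yield successive waits in seconds using decorrelated jitter."""
    rng = rng or random.Random()
    sleep = base
    while True:
        # Next wait is random between `base` and three times the previous wait.
        sleep = min(cap, rng.uniform(base, sleep * 3))
        yield sleep

delays = decorrelated_jitter(rng=random.Random(42))
first_five = [next(delays) for _ in range(5)]
print(all(0.1 <= d <= 10.0 for d in first_five))  # True
```

Because the generator is unbounded, the caller remains responsible for capping attempts and total time.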

Real-World Examples

  • AWS SDKs: built-in exponential backoff with jitter (scale: millions of clients)
  • Google: SRE guidance on jittered retries (scale: planet-scale services)
  • Stripe: idempotency keys plus retries for payments (scale: global API traffic)

Complexity Analysis

  • Scalability: client-side; applies per call
  • Implementation complexity: low to medium (policy tuning)
  • Cost: low (library-level feature)