Intermediate

Rate Limiting

Reliability & Latency

Token bucket, leaky bucket, and sliding window to control request rates

Core Idea

Protect services and enforce fairness by controlling the rate of requests using algorithms like token bucket, leaky bucket, or sliding window counters.
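
Of the three algorithms named here, the sliding window counter is the least obvious, so a minimal sketch may help. It approximates a true sliding window by weighting the previous fixed window's count by how much of it still overlaps the window ending now. The class name, the injectable clock parameter, and the per-key state layout are assumptions made for illustration only.

```python
import time
from collections import defaultdict


class SlidingWindowCounter:
    """Approximate sliding window: weight the previous fixed window by its overlap."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the example can be tested deterministically
        # key -> [window_index, count_in_current_window, count_in_previous_window]
        self.state = defaultdict(lambda: [0, 0, 0])

    def allow(self, key):
        now = self.clock()
        index = int(now // self.window)
        entry = self.state[key]
        if index != entry[0]:
            # Roll forward; the old count only matters if that window is adjacent.
            entry[2] = entry[1] if index == entry[0] + 1 else 0
            entry[0], entry[1] = index, 0
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = entry[2] * overlap + entry[1]
        if estimated < self.limit:
            entry[1] += 1
            return True
        return False
```

With a limit of 3 per 10 seconds, three requests at t=0 followed by requests at t=15 see the earlier three counted at half weight, so only two more are admitted. The estimate assumes requests were spread evenly across the previous window, which is the usual tradeoff of this approach against keeping a full timestamp log.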

When to Use

When you must protect upstream capacity, prevent abuse, or enforce per-user/API quotas at the edge or service level.

Recognition Cues

Indicators that this pattern might be the right solution

Expensive endpoints overwhelmed by bursts
Abuse or scraping impacts stability
Noisy neighbors starving other tenants

Pattern Variants & Approaches

Overview

An edge or per-service limiter enforces quotas using shared counters; requests within the quota pass through, and excess requests receive HTTP 429 (Too Many Requests).
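
The flow described above can be sketched as a fixed-window check against a shared counter. InMemoryCounterStore is a hypothetical stand-in for whatever shared store a deployment uses; it is not a real client API, and the key format and function names are likewise assumptions.

```python
import time


class InMemoryCounterStore:
    """Hypothetical stand-in for a shared counter store; not a real client API."""

    def __init__(self):
        self.counts = {}

    def increment(self, key):
        # Assumed to be atomic in a real shared store.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def handle_request(store, principal, limit, window_seconds, now=None):
    """Return (status, count) for one request under a fixed-window quota."""
    now = time.time() if now is None else now
    window_index = int(now // window_seconds)
    # One counter per principal per window; a new window starts a fresh counter.
    count = store.increment(f"rl:{principal}:{window_index}")
    if count <= limit:
        return 200, count
    return 429, count
```

This stand-in never deletes old window keys; a real shared store would need them to expire. Fixed windows also admit up to twice the limit across a window boundary, which is one reason to consider the sliding window or token bucket alternatives.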

When to Use This Variant

Protect APIs from abuse
Per-tenant fairness
Expensive endpoints

Use Case

Public APIs, multi-tenant platforms, and resource-intensive operations.

Advantages

SLO protection
Cost control
Fair sharing

Implementation Example

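# Token bucket sketch (Python); RATE and CAPACITY are illustrative values
import time
RATE, CAPACITY = 5.0, 10.0   # tokens refilled per second, maximum burst size
buckets = {}                 # user -> (tokens, time of last update)
def check(user):
    now = time.monotonic()
    tokens, last = buckets.get(user, (CAPACITY, now))
    tokens = min(CAPACITY, tokens + (now - last) * RATE)  # refill by elapsed time
    allowed = tokens >= 1
    buckets[user] = (tokens - 1 if allowed else tokens, now)  # consume on allow
    return 200 if allowed else 429  # 429 = Too Many Requests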
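
For contrast with the token bucket sketch above, here is a minimal leaky bucket in its "meter" form: each request raises a level that drains at a constant rate, and a request is rejected when it would overflow. The class name, parameters, and injectable clock are assumptions for illustration. A queue-based leaky bucket, which delays requests instead of rejecting them, is the form that actually smooths output and is not shown here.

```python
import time


class LeakyBucket:
    """Leaky bucket used as a meter: the level drains at a constant rate."""

    def __init__(self, leak_rate, capacity, clock=time.monotonic):
        self.leak_rate = leak_rate  # requests drained per second
        self.capacity = capacity    # maximum level before rejecting
        self.clock = clock          # injectable for deterministic testing
        self.level = 0.0
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain whatever leaked out since the last call, never below empty.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

With leak_rate=1.0 and capacity=2, two back-to-back requests are admitted and a third is rejected; one second later exactly one more fits.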

Tradeoffs

Pros

Protects availability and SLOs
Fairness across tenants and users
Cost control for expensive operations

Cons

Operational and consistency complexity
Stateful components and performance overhead
Risk of blocking legitimate traffic

Common Pitfalls

Global counters as hot keys
Inconsistent limits across instances
Overly strict windows causing false positives
429 responses without Retry-After or rate-limit headers, so clients cannot tell when to back off
Clock skew in distributed windows
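
One way to soften the hot-key pitfall listed above is to split a global counter into several shards, each holding an equal share of the limit, so that no single key absorbs every write. The sketch below keeps the shards in a local list purely for illustration; the class name and the random shard choice are assumptions, and it covers a single window with no reset.

```python
import random


class ShardedCounter:
    """Split one global limit across several counters to spread write load."""

    def __init__(self, limit, shards, rng=None):
        self.per_shard_limit = limit // shards  # each shard gets an equal share
        self.counts = [0] * shards
        self.rng = rng or random.Random()

    def allow(self):
        # Pick a shard at random; only that shard's counter is touched.
        shard = self.rng.randrange(len(self.counts))
        if self.counts[shard] < self.per_shard_limit:
            self.counts[shard] += 1
            return True
        return False
```

The price is accuracy: a request can be rejected because its shard is full while other shards still have room, so the effective limit under light load may be somewhat below the configured one.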

Design Considerations

Select algorithm per use case (bursty vs smooth)
Enforce at the edge/gateway and again close to the protected service
Distributed counters (Redis, Envoy, service mesh)
Separate quotas by principal (IP/user/token)
Expose rate-limit headers and Retry-After so clients can back off
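
As a rough illustration of the last consideration, the helper below derives response headers from a token bucket's state. Retry-After is a standard HTTP header; the X-RateLimit-* names follow a common convention but are not standardized, and the function name and parameters are assumptions.

```python
import math


def rate_limit_headers(capacity, tokens, refill_rate):
    """Build response headers from a token bucket's current state."""
    headers = {
        "X-RateLimit-Limit": str(capacity),
        "X-RateLimit-Remaining": str(max(0, math.floor(tokens))),
    }
    if tokens < 1:
        # Seconds until one whole token is available, rounded up, at least 1.
        wait = math.ceil((1 - tokens) / refill_rate)
        headers["Retry-After"] = str(max(1, wait))
    return headers
```

For a bucket holding 0.25 tokens and refilling at 0.5 tokens per second, the helper reports zero remaining and a Retry-After of 2 seconds, which a client can use directly as its backoff delay.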

Real-World Examples

GitHub

Per-token and per-IP API limits

Billions of API requests/day

Cloudflare

Edge WAF + rate limits at POPs

20%+ of internet traffic

Twitter

API tiering and dynamic quotas

Global real-time API

Complexity Analysis

Scalability

Distributed or per-instance

Implementation Complexity

Medium - Counters and coordination

Cost

Low to Medium - Cache or proxy costs

Related Patterns

API Gateway
Service Mesh
Retry with Backoff
Circuit Breaker