Rate Limiting

Reliability & Latency

Token bucket, leaky bucket, and sliding window to control request rates

Core Idea

Protect services and enforce fairness by controlling the rate of requests using algorithms like token bucket, leaky bucket, or sliding window counters.
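To make the sliding window idea concrete, here is a minimal in-memory sketch of a sliding window counter. It estimates the request count in a rolling window by weighting the previous fixed window by how much of it still overlaps. The class name, the `clock` parameter, and the single-process dictionary are assumptions for illustration, not a reference implementation.

```python
import time


class SlidingWindowCounter:
    """Approximate a rolling window by weighting the previous fixed window."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit      # max requests per window
        self.window = window    # window length in seconds
        self.clock = clock      # injectable so tests can control time
        self.state = {}         # key -> (window index, current count, previous count)

    def allow(self, key):
        now = self.clock()
        idx = int(now // self.window)
        last_idx, curr, prev = self.state.get(key, (idx, 0, 0))
        if idx == last_idx + 1:
            # Rolled into the next window: current becomes previous.
            prev, curr = curr, 0
        elif idx > last_idx + 1:
            # More than one full window has passed: nothing overlaps.
            prev, curr = 0, 0
        # Fraction of the current window that has elapsed so far.
        elapsed = (now % self.window) / self.window
        estimate = prev * (1.0 - elapsed) + curr
        allowed = estimate < self.limit
        if allowed:
            curr += 1
        self.state[key] = (idx, curr, prev)
        return allowed
```

Compared with a plain fixed window, this avoids the double-sized burst that can occur at a window boundary, at the cost of being an estimate rather than an exact count.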

When to Use

When you must protect upstream capacity, prevent abuse, or enforce per-user/API quotas at the edge or service level.

Recognition Cues

Indicators that this pattern might be the right solution
  • Expensive endpoints overwhelmed by bursts
  • Abuse or scraping impacts stability
  • Noisy neighbors starving other tenants

Pattern Variants & Approaches

Overview

An edge or per-service limiter enforces quotas using shared counters; allowed requests pass through to the service, and excess requests receive HTTP 429 (Too Many Requests).

Overview Architecture

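[Diagram: the Client sends Requests to the Gateway/Rate Limiter, which checks Counters in the Counter Store; Allowed requests reach the Service, and the rest get a 429 Throttle response.]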
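The flow in this overview can be sketched as a fixed-window counter in front of a service. `InMemoryCounterStore` is a hypothetical stand-in for a shared store such as Redis, and `handle`, the `rl:` key prefix, and the `(status, body)` return shape are assumptions made for the example.

```python
import time


class InMemoryCounterStore:
    """Hypothetical stand-in for a shared counter store (not a real client)."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        # Increment and return the new value, like an atomic counter would.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def handle(store, principal, limit, window, service, now=None):
    """Count the request in the current window; forward it or throttle with 429."""
    now = time.time() if now is None else now
    bucket = int(now // window)  # index of the current fixed window
    count = store.incr(f"rl:{principal}:{bucket}")
    if count > limit:
        return 429, "Too Many Requests"
    return 200, service()
```

In this sketch old window keys are never deleted; a real shared store would normally set an expiry on each key so stale windows clean themselves up.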

When to Use This Variant

  • Protect APIs from abuse
  • Per-tenant fairness
  • Expensive endpoints

Use Case

Public APIs, multi-tenant platforms, and resource-intensive operations.

Advantages

  • SLO protection
  • Cost control
  • Fair sharing

Implementation Example

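# Token bucket sketch: refill by elapsed time, then spend one token per request.
import time

def allow(bucket, rate=5.0, capacity=10.0):
    # bucket is a per-user dict {"tokens": float, "ts": float}; rate and capacity are example values
    now = time.monotonic()
    bucket["tokens"] = min(capacity, bucket["tokens"] + (now - bucket["ts"]) * rate)
    bucket["ts"] = now
    if bucket["tokens"] >= 1:
        bucket["tokens"] -= 1  # consume one token
        return True  # allow
    return False  # caller rejects with HTTP 429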
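The sketch above is a token bucket, which tolerates bursts. For contrast, a leaky bucket smooths traffic to a steady outflow. The following is a minimal sketch of the queue-style variant: rather than answering allow or reject immediately, it returns how long a request should wait before being forwarded, and refuses only when the queue is full. The class name, `schedule`, and the `clock` parameter are assumptions for illustration.

```python
import time


class LeakyBucket:
    """Queue-style leaky bucket: requests leave at a fixed rate."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.interval = 1.0 / rate   # seconds between forwarded requests
        self.capacity = capacity     # max requests allowed to wait in the queue
        self.clock = clock           # injectable so tests can control time
        self.next_free = clock()     # earliest time the next request may leave

    def schedule(self):
        """Return seconds to wait before forwarding, or None if the queue is full."""
        now = self.clock()
        start = max(now, self.next_free)
        delay = start - now
        # delay / interval is the number of requests already ahead in the queue.
        if delay > self.capacity * self.interval:
            return None
        self.next_free = start + self.interval
        return delay
```

A caller would sleep (or enqueue) for the returned delay and respond with 429 when the result is `None`.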

Tradeoffs

Pros

  • Protects availability and SLOs
  • Fairness across tenants and users
  • Cost control for expensive operations

Cons

  • Operational and consistency complexity
  • Stateful components and performance overhead
  • Risk of blocking legitimate traffic

Common Pitfalls

  • Global counters as hot keys
  • Inconsistent limits across instances
  • Overly strict windows causing false positives
  • 429 responses without hints (such as Retry-After) telling clients when to back off
  • Clock skew in distributed windows
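The back-off pitfall has a client-side counterpart: a client that receives a 429 should wait before retrying. A minimal sketch of how a client might choose that wait follows, assuming the server's hint arrives as a number of seconds. The function name and the default `base` and `cap` values are hypothetical, and the HTTP-date form of `Retry-After` is not handled here.

```python
import random


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    if retry_after is not None:
        # Prefer the server's explicit hint (delta-seconds form only).
        return max(0.0, float(retry_after))
    # Otherwise use exponential backoff with full jitter, bounded by `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Jitter spreads retries out so that many throttled clients do not all return at the same instant.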

Design Considerations

  • Select algorithm per use case (bursty vs smooth)
  • Enforce at edge/gateway and close to target
  • Distributed counters (Redis, Envoy, service mesh)
  • Separate quotas by principal (IP/user/token)
  • Expose headers and Retry-After for clients
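As an illustration of the last consideration, a limiter might attach headers like these to each response. `Retry-After` is a standard HTTP header; the `X-RateLimit-*` names follow a widely used convention but are an assumption here, as are the function name and its parameters.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch, now, allowed):
    """Build response headers describing the caller's quota state."""
    headers = {
        "X-RateLimit-Limit": str(limit),                  # quota per window
        "X-RateLimit-Remaining": str(max(0, remaining)),  # requests left
        "X-RateLimit-Reset": str(int(reset_epoch)),       # when the window resets
    }
    if not allowed:
        # On a 429, tell the client how many whole seconds to wait.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers
```

For example, a rejected request 59.8 seconds before the window resets would carry `Retry-After: 60`.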

Real-World Examples

  • GitHub: per-token and per-IP API limits (billions of API requests/day)
  • Cloudflare: edge WAF + rate limits at POPs (20%+ of internet traffic)
  • Twitter: API tiering and dynamic quotas (global real-time API)

Complexity Analysis

  • Scalability: distributed or per-instance
  • Implementation Complexity: Medium (counters and coordination)
  • Cost: Low to Medium (cache or proxy costs)