Message Queues

Messaging & Async

Asynchronous communication with queues and topics to decouple producers and consumers

Core Idea

Use queues to buffer work, smooth traffic spikes, and decouple producers from consumers. Choose delivery semantics deliberately, and plan for retries and dead-lettered messages.
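The buffering idea can be sketched in-process with Python's standard `queue.Queue`. This is a minimal illustration, not a broker: the function name `run_pipeline`, the `maxsize` of 2, and the doubling "work" are all assumptions made for the example.

```python
import queue
import threading


def run_pipeline(tasks, maxsize=2):
    """Push tasks through a bounded in-process queue to one worker thread."""
    buffer = queue.Queue(maxsize=maxsize)  # bounded: put() blocks when full
    results = []
    stop = object()  # sentinel telling the worker to exit

    def worker():
        while True:
            item = buffer.get()
            if item is stop:
                break
            results.append(item * 2)  # stand-in for real work

    consumer = threading.Thread(target=worker)
    consumer.start()
    for task in tasks:    # producer: a burst of work
        buffer.put(task)  # blocks while the buffer is full (backpressure)
    buffer.put(stop)
    consumer.join()
    return results


print(run_pipeline([1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```

The producer never calls the worker directly. It only sees the queue, and the bounded size keeps a burst from growing the backlog without limit.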

When to Use

Use this pattern when tasks can be processed asynchronously, workloads are bursty, or microservices need to be decoupled.

Recognition Cues
Indicators that this pattern might be the right solution
  • Synchronous calls frequently time out
  • Traffic spikes overwhelm the backend
  • Tight coupling between producer and worker

Pattern Variants & Approaches

Overview
Producers enqueue tasks; consumers process asynchronously with retries and DLQs for poison messages.

Overview Architecture

Producer --(enqueue)--> Queue --(dequeue)--> Consumer --(dead letters)--> DLQ

When to Use This Variant

  • Bursty workloads
  • Long-running tasks
  • Need decoupling of services

Use Case

Emailing, video processing, ETL, and background jobs.

Advantages

  • Smoothing spikes
  • Independent scaling
  • Resilient retries

Implementation Example

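# Queue flow (pseudocode: `queue` and `handle` are placeholders for a
# broker client and your processing function)

# Producer side: enqueue the task
queue.send(msg)

# Consumer side: dequeue, process, then acknowledge
msg = queue.receive()
try:
    handle(msg)
    queue.ack(msg)  # ack only after processing succeeds
except Exception:
    queue.nack_to_dlq(msg)  # route the failed message to the dead-letter queue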
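A runnable, in-memory version of this flow might look like the sketch below. It assumes a retry budget of three attempts before a message is dead-lettered. The names `drain`, `MAX_ATTEMPTS`, and the `deque`-backed queue are illustrative and not any broker's API.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before dead-lettering


def drain(messages, handle, max_attempts=MAX_ATTEMPTS):
    """Process messages; requeue failures, dead-letter after max_attempts."""
    main = deque({"body": m, "attempts": 0} for m in messages)
    done, dlq = [], []
    while main:
        msg = main.popleft()  # receive
        msg["attempts"] += 1
        try:
            handle(msg["body"])
            done.append(msg["body"])  # ack: message leaves the queue
        except Exception:
            if msg["attempts"] >= max_attempts:
                dlq.append(msg["body"])  # poison message goes to the DLQ
            else:
                main.append(msg)  # nack: requeue for another attempt
    return done, dlq


def handle(body):
    if body == "bad":
        raise ValueError("cannot process")


done, dlq = drain(["a", "bad", "b"], handle)
print(done, dlq)  # ['a', 'b'] ['bad']
```

Requeuing at the tail keeps one poison message from blocking healthy ones, at the cost of reordering. A real broker would typically also delay the retry.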
Tradeoffs

Pros

  • Smooths spikes and improves availability
  • Decouples systems so producers and consumers can change independently
  • Allows independent scaling of consumers

Cons

  • Operational overhead and monitoring
  • Eventual consistency
  • Complex failure handling
Common Pitfalls
  • Unbounded queues increasing latency
  • No poison message handling (DLQ)
  • Assuming global ordering across partitions
  • At-least-once duplicates not handled
  • Large messages causing slow brokers
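The ordering pitfall is easier to see with a small sketch of key-based partitioning. It assumes a CRC32 hash and three partitions. The helper names `partition_for` and `route` are hypothetical, and real brokers use their own partitioners.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition using a stable hash (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions=3):
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "shipped"),
]
partitions = route(events)
for index, partition in enumerate(partitions):
    print(index, partition)
```

Events that share a key always land in the same partition, so their relative order survives. Events with different keys may be consumed in any interleaving, which is why code that needs ordering should rely only on per-key order.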
Design Considerations
  • Pick delivery semantics (at-least-once, at-most-once, or exactly-once)
  • Visibility/ack timeouts and retry policies
  • Dead letter queues and quarantine
  • Idempotent consumers and dedupe keys
  • Batching, prefetch, and backpressure
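Idempotent consumption with dedupe keys can be sketched as follows. This assumes each message carries a producer-assigned `dedupe_key` field, a name chosen here for illustration. The in-memory `set` stands in for what would normally be a durable store.

```python
class IdempotentConsumer:
    """Apply each message at most once per dedupe key."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # stand-in for a durable dedupe store

    def consume(self, message):
        key = message["dedupe_key"]
        if key in self.seen:
            return False  # duplicate delivery: skip the side effect
        self.handler(message["body"])
        self.seen.add(key)  # record only after success so failures can retry
        return True


applied = []
consumer = IdempotentConsumer(applied.append)
deliveries = [
    {"dedupe_key": "pay-42", "body": "charge $10"},
    {"dedupe_key": "pay-42", "body": "charge $10"},  # at-least-once redelivery
    {"dedupe_key": "pay-43", "body": "charge $5"},
]
outcomes = [consumer.consume(m) for m in deliveries]
print(outcomes)  # [True, False, True]
print(applied)   # ['charge $10', 'charge $5']
```

In this sketch a crash between the handler call and the key being recorded could still apply a message twice. A production design would usually commit the side effect and the key in one transaction.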
Real-World Examples
AWS SQS
  • Durable queues for async jobs
  • Millions of TPS across accounts

RabbitMQ
  • AMQP-based work queues
  • Large on-prem and cloud fleets

Apache Kafka
  • As a durable log with consumer groups
  • PB-scale clusters

Complexity Analysis
Scalability

High - Partitioned consumption

Implementation Complexity

Medium - Semantics and tuning

Cost

Low to Medium - Broker costs