Intermediate

Load Balancing

Core Scale & Availability

Distribute incoming traffic across multiple servers to improve availability and performance

Core Idea

Load balancing distributes network traffic across multiple servers to ensure no single server bears too much load. It improves application responsiveness, increases availability, and enables horizontal scaling.

When to Use

Use load balancing when you need to handle high traffic volumes, ensure high availability, enable zero-downtime deployments, or scale horizontally by adding more servers.

Recognition Cues

Indicators that this pattern might be the right solution

Application needs to handle thousands of concurrent requests
Single server becomes a bottleneck or single point of failure
Need to perform rolling updates without downtime
Traffic patterns are unpredictable or have significant spikes
Geographic distribution of users requires regional routing

Pattern Variants & Approaches

Layer 4 (Transport Layer) Load Balancing

Operates at the TCP/UDP level, making routing decisions based on IP addresses and ports without inspecting the application payload
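
To make the "addresses and ports only" idea concrete, here is a minimal Python sketch of one possible Layer 4 decision: hashing the connection 4-tuple to pick a backend. The function name `pick_backend_l4` and the backend addresses are illustrative assumptions, not part of any real load balancer's API.

```python
import hashlib

# Illustrative backend addresses (assumed for this sketch).
BACKENDS = [("10.0.1.10", 3306), ("10.0.1.11", 3306), ("10.0.1.12", 3306)]


def pick_backend_l4(src_ip, src_port, dst_ip, dst_port, backends=BACKENDS):
    """Choose a backend from the connection 4-tuple alone.

    Nothing above the transport layer is read: no HTTP headers, no URLs,
    no payload bytes. The same 4-tuple always maps to the same backend.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    print(pick_backend_l4("203.0.113.7", 51000, "198.51.100.1", 3306))
```

Real Layer 4 balancers may instead track connection counts (as `leastconn` does) rather than hash; the point of the sketch is only that the decision needs no knowledge of the application protocol.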

[Figure: Layer 4 (Transport Layer) Load Balancing Architecture]

When to Use This Variant

Need maximum performance and lowest latency
Protocol-agnostic load balancing required
Handling non-HTTP protocols (databases, game servers)
Simple routing logic based on connection count

Use Case

High-performance scenarios such as spreading connections across database replicas, game servers, or IoT device fleets

Advantages

Faster than Layer 7 (no application-payload parsing)
Lower CPU and memory overhead
Protocol-agnostic (works with any TCP/UDP traffic)
Can handle millions of concurrent connections on suitably sized hardware

Implementation Example

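# Layer 4 Load Balancer Configuration (HAProxy example)
# Timeout values below are illustrative assumptions; tune them for your workload.
defaults
    mode tcp
    timeout connect 5s      # max time to establish a connection to a backend
    timeout client  1h      # idle timeout on the client side
    timeout server  1h      # idle timeout on the server side

frontend tcp_front
    bind *:3306
    mode tcp
    default_backend mysql_servers

# Assumes every server can serve the same traffic (for example, read replicas).
backend mysql_servers
    mode tcp
    balance leastconn       # new connections go to the server with the fewest active ones
    option tcp-check        # health check is a plain TCP connect, not a MySQL-level probe
    server mysql1 10.0.1.10:3306 check
    server mysql2 10.0.1.11:3306 check
    server mysql3 10.0.1.12:3306 check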

Layer 7 (Application Layer) Load Balancing

Operates at the HTTP/HTTPS level, making intelligent routing decisions based on request content, headers, cookies, and URLs

[Figure: Layer 7 (Application Layer) Load Balancing Architecture]

When to Use This Variant

Need content-based routing (URL path, headers, cookies)
Require SSL termination at load balancer
Want to implement A/B testing or canary deployments
Need to route different API versions to different backends
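
As a rough illustration of content-based routing combined with a canary split, the Python sketch below picks an upstream pool from the URL path and an opt-in header. The pool naming scheme, the `X-Canary` header, and the 5% default are assumptions made for this example.

```python
import random


def route(path, headers, canary_fraction=0.05, rng=random.random):
    """Return the name of the upstream pool for one HTTP request.

    - The URL path selects the API version (Layer 7 information).
    - A hypothetical "X-Canary: 1" header forces the canary track.
    - Otherwise roughly `canary_fraction` of requests go to the canary.
    """
    version = "v2" if path.startswith("/v2/") else "v1"
    if headers.get("X-Canary") == "1":
        track = "canary"
    else:
        track = "canary" if rng() < canary_fraction else "stable"
    return f"api_{version}_{track}"


if __name__ == "__main__":
    print(route("/v2/orders", {}))
    print(route("/v1/orders", {"X-Canary": "1"}))
```

A per-request random split is the simplest option; hashing a user or session identifier instead would keep each user on one track, which is often preferable for A/B tests.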

Use Case

Web applications requiring intelligent routing, microservices architectures, API gateways, and content-based traffic distribution

Advantages

Content-aware routing decisions
Can modify requests/responses (add headers, rewrite URLs)
SSL/TLS termination reduces backend load
Better for HTTP-specific optimizations

Implementation Example

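# Layer 7 Load Balancer Configuration (NGINX example)
# These blocks belong inside the http { } context of nginx.conf.
upstream api_v1 {
    least_conn;
    server api1.example.com:8080;
    server api2.example.com:8080;
}

upstream api_v2 {
    least_conn;
    server api3.example.com:8080;
    server api4.example.com:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # TLS is terminated here; "listen ... ssl" requires a certificate and key.
    # The paths below are placeholders (assumed), not real file locations.
    ssl_certificate     /etc/nginx/tls/api.example.com.crt;
    ssl_certificate_key /etc/nginx/tls/api.example.com.key;

    # Route based on URL path
    location /v1/ {
        proxy_pass http://api_v1;
    }

    location /v2/ {
        proxy_pass http://api_v2;
    }

    # Liveness endpoint for the load balancer itself (it does not probe the backends)
    location /health {
        access_log off;
        return 200 "healthy\n";
    }
}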

Global Server Load Balancing (GSLB)

Distributes traffic across multiple geographic regions based on user location, server health, and latency

[Figure: Global Server Load Balancing (GSLB) Architecture]

When to Use This Variant

Users distributed across multiple continents
Need disaster recovery across regions
Want to minimize latency for global users
Require compliance with data residency laws

Use Case

Global applications serving users worldwide, multi-region disaster recovery, and compliance with geographic data regulations

Advantages

Reduced latency by routing to nearest region
Geographic redundancy and disaster recovery
Compliance with data residency requirements
Can route based on real-time performance metrics
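
The selection logic behind latency- and health-aware global routing can be sketched in a few lines of Python. The `Region` type, its fields, and the `allowed` residency filter are hypothetical names introduced for this example; a real GSLB would obtain health and latency from its own probes.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool
    latency_ms: float  # latency as seen from the client's vantage point (assumed input)


def pick_region(regions, allowed=None):
    """Return the healthy region with the lowest latency.

    `allowed` optionally restricts the choice to a set of region names,
    e.g. to honor a data residency requirement.
    """
    candidates = [
        r for r in regions
        if r.healthy and (allowed is None or r.name in allowed)
    ]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: r.latency_ms)


if __name__ == "__main__":
    regions = [
        Region("us-east", healthy=False, latency_ms=20),
        Region("eu-west", healthy=True, latency_ms=90),
        Region("ap-south", healthy=True, latency_ms=150),
    ]
    print(pick_region(regions).name)
```

In the sample data the nearest region is unhealthy, so the request fails over to the next-closest healthy one.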

Implementation Example

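# GSLB Configuration (Conceptual)
# DNS-based global load balancing
# Assumes the hosted zone, load balancers, and health checks are defined elsewhere.

# Route 53 Geolocation Routing Policy
# Locations that match no record get no answer, so a default record
# (country = "*") is usually added alongside these.
resource "aws_route53_record" "api_us" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "api-na" # required to tell apart records that share name and type

  geolocation_routing_policy {
    continent = "NA"
  }

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true # required argument of the alias block
  }

  health_check_id = aws_route53_health_check.us.id
}

resource "aws_route53_record" "api_eu" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "api-eu"

  geolocation_routing_policy {
    continent = "EU"
  }

  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }

  health_check_id = aws_route53_health_check.eu.id
}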

Tradeoffs

Pros

Improved application availability and fault tolerance
Horizontal scalability by adding more servers
Better resource utilization across server fleet
Enables zero-downtime deployments
Can provide SSL termination and DDoS protection
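
The zero-downtime claim rests on taking one server out of rotation at a time. A minimal Python sketch of that loop follows; `rolling_update`, `deploy`, and `is_healthy` are hypothetical callables supplied by the caller, and real connection draining (waiting for in-flight requests to finish) is deliberately omitted.

```python
def rolling_update(servers, in_rotation, deploy, is_healthy):
    """Update servers one at a time while the rest keep serving traffic.

    servers     -- ordered list of server names to update
    in_rotation -- mutable set the load balancer reads to pick targets
    deploy      -- callable(server) that installs the new version
    is_healthy  -- callable(server) -> bool, checked before re-adding
    """
    for server in servers:
        in_rotation.discard(server)  # stop sending new traffic to this server
        deploy(server)
        if not is_healthy(server):
            # Leave the server out of rotation and stop the rollout.
            raise RuntimeError(f"{server} failed its health check after deploy")
        in_rotation.add(server)      # resume traffic before touching the next one


if __name__ == "__main__":
    rotation = {"a", "b", "c"}
    rolling_update(["a", "b", "c"], rotation,
                   deploy=lambda s: print("deploying", s),
                   is_healthy=lambda s: True)
    print(sorted(rotation))
```

At most one server is out of rotation at any moment, so capacity drops by one server's worth rather than to zero.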

Cons

Adds complexity to infrastructure
Load balancer itself can become a bottleneck
Sticky sessions can lead to uneven load distribution
Additional cost for load balancer infrastructure
Requires careful configuration and monitoring
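
Sticky sessions are usually implemented by pinning a client to a server through a cookie. The Python sketch below shows the basic decision under stated assumptions: the cookie name `lb_server` and the `pick_sticky` function are made up for illustration.

```python
def pick_sticky(cookies, healthy_servers, fallback_pick):
    """Return (server, cookie_to_set) for one request.

    If the client carries a pin to a server that is still healthy, reuse it.
    Otherwise choose a new server with `fallback_pick` and return a cookie
    so later requests stick to it. `cookie_to_set` is None when no new
    cookie is needed.
    """
    pinned = cookies.get("lb_server")
    if pinned in healthy_servers:
        return pinned, None
    server = fallback_pick(healthy_servers)
    return server, ("lb_server", server)


if __name__ == "__main__":
    first = lambda servers: servers[0]
    print(pick_sticky({"lb_server": "b"}, ["a", "b"], first))
    print(pick_sticky({"lb_server": "c"}, ["a", "b"], first))
```

Because pinned clients bypass the balancing algorithm, a few heavy clients can overload one server, and a client whose pinned server fails loses any session state that was stored only on that server.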

Common Pitfalls

Not implementing health checks, so traffic keeps going to failed servers
Using sticky sessions without replicating or externalizing session state, so sessions are lost when a server fails
Ignoring SSL/TLS termination placement (at load balancer vs. backend)
Not monitoring the load balancer itself as a potential bottleneck
Misconfiguring timeout values, which can cause cascading failures
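
The first pitfall is the easiest to illustrate. Below is a minimal Python sketch of active health checking with failure and recovery thresholds. The class name `HealthTracker` is an assumption; the `fall` and `rise` parameter names are borrowed from the similarly named HAProxy server options, but the logic here is only an approximation of how such thresholds behave.

```python
import socket


def tcp_probe(host, port, timeout=1.0):
    """Active check: report whether a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthTracker:
    """Mark a server down after `fall` consecutive failed checks and
    up again after `rise` consecutive successful ones."""

    def __init__(self, servers, fall=3, rise=2):
        self.fall = fall
        self.rise = rise
        self.state = {s: {"up": True, "fails": 0, "oks": 0} for s in servers}

    def record(self, server, ok):
        st = self.state[server]
        if ok:
            st["oks"] += 1
            st["fails"] = 0
            if not st["up"] and st["oks"] >= self.rise:
                st["up"] = True
        else:
            st["fails"] += 1
            st["oks"] = 0
            if st["up"] and st["fails"] >= self.fall:
                st["up"] = False

    def healthy(self):
        """Servers that are currently eligible to receive traffic."""
        return [s for s, st in self.state.items() if st["up"]]
```

A scheduler would call `tracker.record(server, tcp_probe(host, port))` on a fixed interval and route only to `tracker.healthy()`. The thresholds prevent a single dropped probe from removing a server, or a single lucky probe from restoring one.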

Design Considerations

Choose between Layer 4 (TCP/UDP) and Layer 7 (HTTP/HTTPS) load balancing
Implement proper health check mechanisms (active vs. passive)
Decide on load balancing algorithm (round-robin, least connections, IP hash)
Plan for SSL/TLS termination strategy
Consider session persistence requirements (sticky sessions)
Design for load balancer redundancy (avoid single point of failure)
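
The three algorithms named above differ mainly in what state they keep. The following Python sketch shows one simple way to express each; the class names and the `pick`/`release` interface are assumptions made for this example, not any product's API.

```python
import hashlib
import itertools


class RoundRobin:
    """Cycle through servers in order; keeps no per-server load state."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Send each new connection to the server with the fewest active ones."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self, client_ip=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection closes so the count stays accurate.
        self.active[server] -= 1


class IPHash:
    """Map each client IP to a fixed server (a simple form of affinity)."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]


if __name__ == "__main__":
    rr = RoundRobin(["a", "b", "c"])
    print([rr.pick() for _ in range(4)])
```

Round-robin suits uniform, short requests; least connections copes better with long-lived or uneven connections; IP hash gives affinity without cookies, but with this plain modulo scheme most clients are remapped whenever the server list changes.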

Real-World Examples

Netflix

Uses AWS ELB together with its own Zuul gateway to route 200+ million requests per day

Petabytes of traffic, millions of concurrent streams

Uber

Multi-region load balancing with automatic failover for ride matching service

Millions of rides per day across 70+ countries

Cloudflare

Global anycast network with load balancing across data centers in 300+ cities

Reportedly sits in front of roughly 20% of all websites

Complexity Analysis

Scalability

Horizontal - Add more backend servers

Implementation Complexity

Medium - Requires proper configuration and monitoring

Cost

Medium - Load balancer costs + increased infrastructure

Related Patterns

Service Discovery
Health Checks
Circuit Breaker
Rate Limiting
CDN/Edge Caching