Load Balancing

Core Scale & Availability

Distribute incoming traffic across multiple servers to improve availability and performance

Core Idea

Load balancing distributes network traffic across multiple servers to ensure no single server bears too much load. It improves application responsiveness, increases availability, and enables horizontal scaling.
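
The simplest way to spread requests is round robin: hand each new request to the next server in a fixed rotation. The sketch below is illustrative only; the class name and the server names are assumptions, not part of any particular load balancer.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(list(servers))

    def next_server(self):
        # Each call returns the next backend, wrapping around at the end.
        return next(self._rotation)


# Hypothetical backend names, for illustration only.
balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
picks = [balancer.next_server() for _ in range(6)]
print(picks)
```

Round robin ignores how busy each server is, which is why real load balancers also offer algorithms such as least connections.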

When to Use

Use load balancing when you need to handle high traffic volumes, ensure high availability, enable zero-downtime deployments, or scale horizontally by adding more servers.

Recognition Cues
Indicators that this pattern might be the right solution
  • Application needs to handle thousands of concurrent requests
  • Single server becomes a bottleneck or single point of failure
  • Need to perform rolling updates without downtime
  • Traffic patterns are unpredictable or have significant spikes
  • Geographic distribution of users requires regional routing

Pattern Variants & Approaches

Layer 4 (Transport Layer) Load Balancing
Operates at the TCP/UDP level, making routing decisions based on IP addresses and ports without inspecting packet content

Layer 4 (Transport Layer) Load Balancing Architecture

[Diagram: Client → Load Balancer (TCP/UDP, round robin) → Server 1, Server 2, Server 3 → Database]

When to Use This Variant

  • Need maximum performance and lowest latency
  • Protocol-agnostic load balancing required
  • Handling non-HTTP protocols (databases, game servers)
  • Simple routing logic based on connection count

Use Case

High-performance scenarios such as spreading database connections across replicas, game servers, or IoT device connections

Advantages

  • Faster than Layer 7 (no parsing or inspection of the application payload)
  • Lower CPU and memory overhead
  • Protocol-agnostic (works with any TCP/UDP traffic)
  • Can handle millions of connections

Implementation Example

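# Layer 4 Load Balancer Configuration (HAProxy example)
# Accept MySQL connections and forward them as raw TCP streams
frontend tcp_front
    bind *:3306
    mode tcp
    default_backend mysql_servers

backend mysql_servers
    mode tcp
    # Send each new connection to the server with the fewest active connections
    balance leastconn
    # Health-check servers at the TCP level
    option tcp-check
    # "check" enables health checking for each server
    server mysql1 10.0.1.10:3306 check
    server mysql2 10.0.1.11:3306 check
    server mysql3 10.0.1.12:3306 check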
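
The idea behind `balance leastconn` can be sketched in a few lines: track active connections per backend and send each new connection to the least busy one. This is a simplified illustration, not HAProxy's actual implementation (which also accounts for server weights); the class and method names are assumptions.

```python
class LeastConnBalancer:
    """Pick the backend with the fewest active connections."""

    def __init__(self, servers):
        # Active connection count per backend, all starting at zero.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # Ties go to the backend listed first (dicts keep insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when a connection to the backend closes.
        self.active[server] -= 1


lb = LeastConnBalancer(["mysql1", "mysql2", "mysql3"])
first, second, third = lb.acquire(), lb.acquire(), lb.acquire()
lb.release(second)     # the mysql2 connection closes
fourth = lb.acquire()  # mysql2 now has the fewest active connections
```

Least connections suits long-lived connections such as database sessions, where connection counts are a reasonable proxy for load.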

Layer 7 (Application Layer) Load Balancing
Operates at the HTTP/HTTPS level, making intelligent routing decisions based on request content, headers, cookies, and URLs

Layer 7 (Application Layer) Load Balancing Architecture

[Diagram: Client → (HTTPS) → L7 Load Balancer; /v1/* → API v1, /v2/* → API v2; also shown: Cache, Database]

When to Use This Variant

  • Need content-based routing (URL path, headers, cookies)
  • Require SSL termination at load balancer
  • Want to implement A/B testing or canary deployments
  • Need to route different API versions to different backends
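
A canary deployment sends a small, stable share of users to the new version. One common approach, sketched here under the assumption that each request carries a user ID, is to hash that ID into one of 100 buckets so the same user always sees the same variant. The function name and the 0-100 percentage parameter are illustrative.

```python
import hashlib


def choose_variant(user_id: str, canary_percent: int) -> str:
    """Return "canary" for roughly canary_percent% of users, else "stable"."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the user ID so the same user always lands in the same bucket.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket is 0..99
    return "canary" if bucket < canary_percent else "stable"


# Route about 10% of users to the canary backend.
print(choose_variant("user-42", 10))
```

Raising `canary_percent` step by step moves more users to the new version without reshuffling the users already on it.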

Use Case

Web applications requiring intelligent routing, microservices architectures, API gateways, and content-based traffic distribution

Advantages

  • Content-aware routing decisions
  • Can modify requests/responses (add headers, rewrite URLs)
  • SSL/TLS termination reduces backend load
  • Better for HTTP-specific optimizations

Implementation Example

# Layer 7 Load Balancer Configuration (NGINX example)
upstream api_v1 {
    least_conn;
    server api1.example.com:8080;
    server api2.example.com:8080;
}

upstream api_v2 {
    least_conn;
    server api3.example.com:8080;
    server api4.example.com:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    # TLS terminates here; the certificate paths are placeholders
    ssl_certificate     /etc/nginx/ssl/api.example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/api.example.com.key;

    # Route based on URL path
    location /v1/ {
        proxy_pass http://api_v1;
    }

    location /v2/ {
        proxy_pass http://api_v2;
    }

    # Liveness endpoint for the load balancer itself (does not probe the backends)
    location /health {
        access_log off;
        return 200 "healthy\n";
    }
}
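
The path-based routing in the configuration above amounts to a longest-prefix match from request path to backend pool. The sketch below shows that logic in isolation; `ROUTES` and `pool_for` are hypothetical names, and the pools simply mirror the two upstream groups.

```python
# Hypothetical pools mirroring the api_v1 and api_v2 upstream groups.
ROUTES = [
    ("/v1/", ["api1.example.com:8080", "api2.example.com:8080"]),
    ("/v2/", ["api3.example.com:8080", "api4.example.com:8080"]),
]


def pool_for(path: str):
    """Return the backend pool whose prefix is the longest match for path."""
    best = None
    for prefix, pool in ROUTES:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, pool)
    # None means no route matched; a real proxy would answer with an error.
    return best[1] if best else None


print(pool_for("/v1/users"))
print(pool_for("/v3/unknown"))
```

Once a pool is chosen, a balancing algorithm such as least connections picks the individual server within it.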

Global Server Load Balancing (GSLB)
Distributes traffic across multiple geographic regions based on user location, server health, and latency

Global Server Load Balancing (GSLB) Architecture

[Diagram: US Client and EU Client → (DNS query) → DNS/GSLB → (geo route) → US LB → US Servers, EU LB → EU Servers]

When to Use This Variant

  • Users distributed across multiple continents
  • Need disaster recovery across regions
  • Want to minimize latency for global users
  • Require compliance with data residency laws

Use Case

Global applications serving users worldwide, multi-region disaster recovery, and compliance with geographic data regulations

Advantages

  • Reduced latency by routing to nearest region
  • Geographic redundancy and disaster recovery
  • Compliance with data residency requirements
  • Can route based on real-time performance metrics

Implementation Example

# GSLB Configuration (Conceptual)
# DNS-based global load balancing

# Route 53 Geolocation Routing Policy
# Note: clients outside NA and EU match neither record; a default
# geolocation record (country = "*") would be needed to cover them.
resource "aws_route53_record" "api_us" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "us" # required to tell apart records that share a name and type

  geolocation_routing_policy {
    continent = "NA"
  }

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true # required argument in an alias block
  }

  health_check_id = aws_route53_health_check.us.id
}

resource "aws_route53_record" "api_eu" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "eu"

  geolocation_routing_policy {
    continent = "EU"
  }

  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }

  health_check_id = aws_route53_health_check.eu.id
}
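
The decision a GSLB makes for each DNS query can be sketched as "prefer the client's home region, fall back to another healthy one". This is a conceptual illustration, not how Route 53 resolves records internally; the region names, the default region, and `resolve_region` are assumptions.

```python
from typing import Optional, Set

# Hypothetical region table; continent codes follow the example above.
REGION_BY_CONTINENT = {"NA": "us-east", "EU": "eu-west"}
DEFAULT_REGION = "us-east"  # used for continents with no dedicated region


def resolve_region(continent: str, healthy: Set[str]) -> Optional[str]:
    """Pick the region that should answer a client from the given continent."""
    preferred = REGION_BY_CONTINENT.get(continent, DEFAULT_REGION)
    if preferred in healthy:
        return preferred
    # Preferred region is down: fall back to any healthy region
    # (alphabetical order keeps the choice deterministic).
    return min(healthy) if healthy else None


print(resolve_region("EU", {"us-east", "eu-west"}))
print(resolve_region("EU", {"us-east"}))
```

Failing over across regions can conflict with data residency requirements, so a production policy may prefer returning no answer over routing to a non-compliant region.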

Tradeoffs

Pros

  • Improved application availability and fault tolerance
  • Horizontal scalability by adding more servers
  • Better resource utilization across server fleet
  • Enables zero-downtime deployments
  • Can provide SSL termination and DDoS protection

Cons

  • Adds complexity to infrastructure
  • The load balancer itself can become a bottleneck or a single point of failure
  • Sticky sessions can lead to uneven load distribution
  • Additional cost for load balancer infrastructure
  • Requires careful configuration and monitoring

Common Pitfalls
  • Not implementing health checks, so traffic keeps going to failed servers
  • Using sticky sessions without session replication or a shared session store, so a server failure loses user sessions
  • Not deciding deliberately where SSL/TLS terminates (at the load balancer vs. at the backends)
  • Not monitoring the load balancer itself as a potential bottleneck
  • Misconfiguring timeout values, which can cause cascading failures
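
To make the health-check pitfall concrete, here is a minimal sketch of an active check: probe each backend with a TCP connection and change its state only after several consecutive results disagree with the current one, so a single blip does not flap the server. The `fall` and `rise` thresholds are loosely modeled on the parameters of the same name in HAProxy; the class and function names are assumptions.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Active check: a backend is healthy if a TCP connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class HealthTracker:
    """Mark a backend down after `fall` failures in a row, up after `rise` successes."""

    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall, self.rise = fall, rise
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, ok: bool) -> bool:
        if ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if ok else self.fall
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy


# Usage: feed probe results in on a timer, e.g.
# tracker.record(tcp_probe("10.0.1.10", 3306))
tracker = HealthTracker()
```

The balancer would then skip any backend whose tracker reports unhealthy when selecting a server.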

Design Considerations
  • Choose between Layer 4 (TCP/UDP) and Layer 7 (HTTP/HTTPS) load balancing
  • Implement proper health check mechanisms (active vs. passive)
  • Decide on load balancing algorithm (round-robin, least connections, IP hash)
  • Plan for SSL/TLS termination strategy
  • Consider session persistence requirements (sticky sessions)
  • Design for load balancer redundancy (avoid single point of failure)
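
Of the algorithms listed above, IP hash is the one that doubles as a simple form of session persistence: the same client address always maps to the same backend. A minimal sketch, with an assumed function name and example addresses:

```python
import hashlib


def pick_by_ip(client_ip: str, servers: list) -> str:
    """Map a client IP to a backend so repeat requests hit the same server."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical backends and a documentation-range client address.
backends = ["server-1", "server-2", "server-3"]
print(pick_by_ip("203.0.113.7", backends))
```

With plain modulo hashing, adding or removing a server remaps most clients; consistent hashing is the usual way to limit that churn.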

Real-World Examples

Netflix

Uses AWS ELB and custom Zuul gateway for routing 200+ million requests per day

Petabytes of traffic, millions of concurrent streams

Uber

Multi-region load balancing with automatic failover for ride matching service

Millions of rides per day across 70+ countries

Cloudflare

Global anycast network with load balancing across 300+ data centers

Proxies traffic for roughly 20% of all websites

Complexity Analysis

Scalability

Horizontal - Add more backend servers

Implementation Complexity

Medium - Requires proper configuration and monitoring

Cost

Medium - Load balancer costs + increased infrastructure