Database Sharding

Data & Consistency

Partition data across multiple databases to scale write throughput and storage

Core Idea

Split data into shards by hash, range, tenant, or geography. Route requests to the correct shard and rebalance online as data grows.
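
A minimal sketch of the shard-key strategies named above, assuming Python; NUM_SHARDS, RANGE_BOUNDS, and TENANT_DIRECTORY are illustrative placeholders, not a specific library, and geographic sharding typically follows the same directory-lookup shape as the tenant case:

# Shard-key strategy sketch (hypothetical helpers)
import hashlib

NUM_SHARDS = 8

def shard_by_hash(user_id: str) -> int:
    """Hash sharding: spreads keys evenly, but gives up range locality."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS

RANGE_BOUNDS = ["g", "n", "t"]   # shard 0 holds keys < "g", shard 1 < "n", shard 2 < "t"

def shard_by_range(username: str) -> int:
    """Range sharding: preserves ordering, but a hot range can overload one shard."""
    for i, bound in enumerate(RANGE_BOUNDS):
        if username < bound:
            return i
    return len(RANGE_BOUNDS)     # the last shard takes the tail of the keyspace

TENANT_DIRECTORY = {"acme": 2, "globex": 5}   # explicit tenant-to-shard mapping

def shard_by_tenant(tenant: str) -> int:
    """Tenant sharding: a whole tenant lives on one shard for isolation."""
    return TENANT_DIRECTORY[tenant]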

When to Use

When a single database cannot handle write volume or dataset size, or you need tenant/geo isolation.

Recognition Cues
Indicators that this pattern might be the right solution
  • Primary database bottlenecked by write IOPS
  • Large tables exceed manageable size
  • Hot partitions or tenants dominate load

Pattern Variants & Approaches

Overview
Route requests based on a shard key to the correct shard; rebalance as data/hotspots evolve.

Overview Architecture

Application → Shard Router (routes by shard key) → Shard 1 / Shard 2 / Shard 3

When to Use This Variant

  • Single node limits reached
  • Skewed hot keys
  • Need parallelism across many nodes

Use Case

User/content stores and large multi-tenant datasets.

Advantages

  • Linear scale with shards
  • Isolation of hotspots
  • Geo-partitioning options

Implementation Example

# Shard router sketch: pick one of N shard connection pools by hashing the shard key
shard = hash(user_id) % N    # illustrative; use a stable hash (e.g. CRC32) in production
conn = pools[shard]          # pools: one connection pool per shard
conn.query(...)              # run the query on the shard that owns this user

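Simple modulo hashing re-maps almost every key when the shard count changes, which is one reason production routers are often paired with consistent hashing or a shard directory (see Design Considerations below).
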
Tradeoffs

Pros

  • Scales writes and storage horizontally
  • Tenant and geo isolation options
  • Operational flexibility with shard moves

Cons

  • Application and query complexity
  • Cross-shard consistency challenges
  • Operationally heavy resharding

Common Pitfalls

  • Poor shard key causing hotspots
  • Cross-shard joins/transactions with high latency (see the fan-out sketch after this list)
  • Complex online resharding with downtime risk
  • Uneven shard sizes and skew
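
To see why cross-shard queries are costly, here is a hedged sketch of a scatter-gather aggregation, assuming each entry in pools exposes the same query() call as the router example above; latency is set by the slowest shard and cost grows with shard count:

# Hypothetical fan-out: every shard must be queried to answer one question
from concurrent.futures import ThreadPoolExecutor

def count_orders_by_status(pools, status):
    def count_on(pool):
        rows = pool.query("SELECT COUNT(*) FROM orders WHERE status = %s", (status,))
        return rows[0][0]                     # single COUNT(*) value per shard

    with ThreadPoolExecutor(max_workers=len(pools)) as ex:
        return sum(ex.map(count_on, pools))   # merge partial counts in the application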

Design Considerations

  • Shard key selection and over-sharding
  • Directory service or consistent hashing (see the hash-ring sketch after this list)
  • Dual-writes and backfills for resharding (see the dual-write sketch after this list)
  • Throttled migrations and rebalancing
  • Isolation per tenant/region when required
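
A minimal consistent-hash ring sketch, assuming Python; HashRing and its virtual-node count are illustrative, not a specific library. Unlike modulo hashing, adding or removing a shard only moves the keys adjacent to its positions on the ring:

# Consistent hashing with virtual nodes (hypothetical helper)
import bisect
import hashlib

class HashRing:
    def __init__(self, shards, vnodes=64):
        # Place several virtual nodes per shard on the ring to smooth out skew.
        self._ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for("user:42"))   # e.g. "shard-2"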
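
For resharding, a hedged sketch of the dual-write-plus-backfill step, again assuming the query() pool interface from the router example; old_pool, new_pool, and the upsert statement are illustrative:

# Hypothetical dual-write helper used while a key range is being migrated
UPSERT = ("INSERT INTO users (id, name) VALUES (%s, %s) "
          "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name")

def write_during_migration(old_pool, new_pool, user_id, name):
    old_pool.query(UPSERT, (user_id, name))       # old shard stays the source of truth
    try:
        new_pool.query(UPSERT, (user_id, name))   # shadow write to the new shard
    except Exception:
        pass                                      # a throttled backfill job repairs misses

Reads cut over to the new shard only after the backfill has caught up and the two copies have been verified.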

Real-World Examples

  • Twitter: timeline and user shards (hundreds of millions of users)
  • Instagram: range/hash sharding for media and users (billions of media objects)
  • YouTube: content and metadata sharded globally (exabyte-scale datasets)

Complexity Analysis

  • Scalability: High (many shards in parallel)
  • Implementation Complexity: High (routing and rebalancing)
  • Cost: Medium to High (more nodes to run)