Database Sharding

Data & Consistency

Partition data across multiple databases to scale write throughput and storage

Core Idea

Split data into shards by hash, range, tenant, or geography. Route requests to the correct shard and rebalance online as data grows.
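
A minimal sketch of the shard-key strategies named above, assuming Python; NUM_SHARDS, RANGE_BOUNDS, and TENANT_DIRECTORY are illustrative placeholders, not a specific library, and geographic sharding typically follows the same directory-lookup shape as the tenant case:

# Shard-key strategy sketch (hypothetical helpers)
import hashlib

NUM_SHARDS = 8

def shard_by_hash(user_id: str) -> int:
    """Hash sharding: spreads keys evenly, but gives up range locality."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_SHARDS

RANGE_BOUNDS = ["g", "n", "t"]   # shard 0 holds keys < "g", shard 1 < "n", shard 2 < "t"

def shard_by_range(username: str) -> int:
    """Range sharding: preserves ordering, but a hot range can overload one shard."""
    for i, bound in enumerate(RANGE_BOUNDS):
        if username < bound:
            return i
    return len(RANGE_BOUNDS)     # the last shard takes the tail of the keyspace

TENANT_DIRECTORY = {"acme": 2, "globex": 5}   # explicit tenant-to-shard mapping

def shard_by_tenant(tenant: str) -> int:
    """Tenant sharding: a whole tenant lives on one shard for isolation."""
    return TENANT_DIRECTORY[tenant]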

When to Use

When a single database cannot handle write volume or dataset size, or you need tenant/geo isolation.

Recognition Cues
Indicators that this pattern might be the right solution
  • Primary database bottlenecked by write IOPS
  • Large tables exceed manageable size
  • Hot partitions or tenants dominate load

Pattern Variants & Approaches

Overview
Route requests based on a shard key to the correct shard; rebalance as data/hotspots evolve.

Overview Architecture

Application → Shard Router (routes by shard key) → Shard 1 / Shard 2 / Shard 3

When to Use This Variant

  • Single node limits reached
  • Skewed hot keys
  • Need parallelism across many nodes

Use Case

User/content stores and large multi-tenant datasets.

Advantages

  • Linear scale with shards
  • Isolation of hotspots
  • Geo-partitioning options

Implementation Example

# Shard router sketch: pick one of N shard connection pools by hashing the shard key
shard = hash(user_id) % N    # illustrative; use a stable hash (e.g. CRC32) in production
conn = pools[shard]          # pools: one connection pool per shard
conn.query(...)              # run the query on the shard that owns this user

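Simple modulo hashing re-maps almost every key when the shard count changes, which is one reason production routers are often paired with consistent hashing or a shard directory (see Design Considerations below).
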
Tradeoffs

Pros

  • Scales writes and storage horizontally
  • Tenant and geo isolation options
  • Operational flexibility with shard moves

Cons

  • Application and query complexity
  • Cross-shard consistency challenges
  • Operationally heavy resharding

Common Pitfalls

  • Poor shard key causing hotspots
  • Cross-shard joins/transactions with high latency (see the fan-out sketch after this list)
  • Complex online resharding with downtime risk
  • Uneven shard sizes and skew
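
To see why cross-shard queries are costly, here is a hedged sketch of a scatter-gather aggregation, assuming each entry in pools exposes the same query() call as the router example above; latency is set by the slowest shard and cost grows with shard count:

# Hypothetical fan-out: every shard must be queried to answer one question
from concurrent.futures import ThreadPoolExecutor

def count_orders_by_status(pools, status):
    def count_on(pool):
        rows = pool.query("SELECT COUNT(*) FROM orders WHERE status = %s", (status,))
        return rows[0][0]                     # single COUNT(*) value per shard

    with ThreadPoolExecutor(max_workers=len(pools)) as ex:
        return sum(ex.map(count_on, pools))   # merge partial counts in the application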

Design Considerations

  • Shard key selection and over-sharding
  • Directory service or consistent hashing (see the hash-ring sketch after this list)
  • Dual-writes and backfills for resharding (see the dual-write sketch after this list)
  • Throttled migrations and rebalancing
  • Isolation per tenant/region when required
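
A minimal consistent-hash ring sketch, assuming Python; HashRing and its virtual-node count are illustrative, not a specific library. Unlike modulo hashing, adding or removing a shard only moves the keys adjacent to its positions on the ring:

# Consistent hashing with virtual nodes (hypothetical helper)
import bisect
import hashlib

class HashRing:
    def __init__(self, shards, vnodes=64):
        # Place several virtual nodes per shard on the ring to smooth out skew.
        self._ring = sorted(
            (self._hash(f"{shard}#{v}"), shard)
            for shard in shards for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for("user:42"))   # e.g. "shard-2"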
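
For resharding, a hedged sketch of the dual-write-plus-backfill step, again assuming the query() pool interface from the router example; old_pool, new_pool, and the upsert statement are illustrative:

# Hypothetical dual-write helper used while a key range is being migrated
UPSERT = ("INSERT INTO users (id, name) VALUES (%s, %s) "
          "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name")

def write_during_migration(old_pool, new_pool, user_id, name):
    old_pool.query(UPSERT, (user_id, name))       # old shard stays the source of truth
    try:
        new_pool.query(UPSERT, (user_id, name))   # shadow write to the new shard
    except Exception:
        pass                                      # a throttled backfill job repairs misses

Reads cut over to the new shard only after the backfill has caught up and the two copies have been verified.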

Real-World Examples

  • Twitter: timeline and user shards (hundreds of millions of users)
  • Instagram: range/hash sharding for media and users (billions of media objects)
  • YouTube: content and metadata sharded globally (exabyte-scale datasets)

Complexity Analysis

  • Scalability: High (many shards in parallel)
  • Implementation Complexity: High (routing and rebalancing)
  • Cost: Medium to High (more nodes to run)