Database Sharding

Data & Consistency

Partition data across multiple databases to scale write throughput and storage

Core Idea

Split data into shards by hash, range, tenant, or geography. Route requests to the correct shard and rebalance online as data grows.
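
For illustration, here is a minimal sketch of the two most common key strategies; the shard count, range bounds, and function names are hypothetical, not taken from any particular system.

# Shard-key routing sketch: hash vs. range (illustrative values)
import bisect
import hashlib

N_SHARDS = 4
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]  # upper bounds of range shards

def hash_shard(key: str) -> int:
    # A stable hash spreads keys evenly, but resharding remaps most keys.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS

def range_shard(numeric_key: int) -> int:
    # Range routing keeps scans local, but new hot keys cluster on the tail shard.
    return bisect.bisect_right(RANGE_BOUNDS, numeric_key)

Tenant and geography sharding follow the same shape: the router maps a tenant ID or region code to a shard through a lookup table instead of a function.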

When to Use

When a single database cannot handle write volume or dataset size, or you need tenant/geo isolation.

Recognition Cues

Indicators that this pattern might be the right solution:
  • Primary database bottlenecked by write IOPS
  • Large tables exceed manageable size
  • Hot partitions or tenants dominate load

Pattern Variants & Approaches

Overview

Route requests based on a shard key to the correct shard; rebalance as data/hotspots evolve.

Overview Architecture

[Diagram] Application → Shard Router (chooses shard by shard key) → Shard 1 / Shard 2 / Shard 3

When to Use This Variant

  • Single node limits reached
  • Skewed hot keys
  • Need parallelism across many nodes

Use Case

User/content stores and large multi-tenant datasets.

Advantages

  • Near-linear scaling with shard count
  • Isolation of hotspots
  • Geo-partitioning options

Implementation Example

# Shard router sketch: use a hash that is stable across processes
# (Python's built-in hash() is salted per process for strings)
import hashlib

shard = int.from_bytes(hashlib.sha256(user_id.encode()).digest()[:8], "big") % N
conn = pools[shard]  # one connection pool per shard
conn.query(...)
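
A note on this design choice: modulo routing remaps almost every key whenever N changes, so resharding is disruptive; the consistent-hashing ring sketched under Design Considerations below limits movement to roughly 1/N of keys per added shard.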

Tradeoffs

Pros

  • Scales writes and storage horizontally
  • Tenant and geo isolation options
  • Operational flexibility with shard moves

Cons

  • Application and query complexity (scatter-gather sketch below)
  • Cross-shard consistency challenges
  • Operationally heavy resharding
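
To make the first two cons concrete: any query that lacks the shard key must fan out to every shard and merge results in the application. A minimal scatter-gather sketch, assuming each pool exposes the same query(...) call as the router example above and returns rows as dicts with a "score" field (both are assumptions):

# Scatter-gather: fan a shard-key-less query out to all shards, merge in app
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(pools, sql, limit=10):
    # Query every shard concurrently; latency is set by the slowest shard.
    with ThreadPoolExecutor(max_workers=len(pools)) as executor:
        partials = list(executor.map(lambda pool: pool.query(sql), pools))
    # Merge and re-rank in the application layer.
    rows = [row for shard_rows in partials for row in shard_rows]
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows[:limit]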

Common Pitfalls

  • Poor shard key causing hotspots
  • Cross-shard joins/transactions with high latency
  • Complex online resharding with downtime risk
  • Uneven shard sizes and skew (see the skew check below)
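
A cheap guard against the skew pitfalls above is to replay a sample of real keys through the router and compare per-shard counts before committing to a key. A minimal sketch, assuming a non-empty sample of string keys:

# Skew check: route sample keys and report the per-shard distribution
from collections import Counter
import hashlib

def shard_of(key: str, n_shards: int) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def skew_report(sample_keys, n_shards):
    counts = Counter(shard_of(k, n_shards) for k in sample_keys)
    mean = len(sample_keys) / n_shards  # expected count per shard
    for shard in range(n_shards):
        print(f"shard {shard}: {counts[shard]} keys ({counts[shard] / mean:.2f}x mean)")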

Design Considerations

  • Shard key selection and over-sharding
  • Directory service or consistent hashing (ring sketch after this list)
  • Dual-writes and backfills for resharding
  • Throttled migrations and rebalancing
  • Isolation per tenant/region when required
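
The consistent-hashing consideration is worth a sketch: placing each shard at many points on a hash ring ("virtual nodes") keeps key movement near 1/N when shards are added or removed. A minimal ring, with the virtual-node count picked arbitrarily:

# Consistent hash ring with virtual nodes (illustrative parameters)
import bisect
import hashlib

class HashRing:
    def __init__(self, shards, vnodes=100):
        # Each shard owns many ring points to smooth the key distribution.
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[idx][1]

ring = HashRing(["shard-1", "shard-2", "shard-3"])
print(ring.shard_for("user:42"))

Adding a shard only claims the ring segments its virtual nodes land on, so only the keys in those segments migrate; everything else stays put.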

Real-World Examples

  • Twitter: timeline and user shards (hundreds of millions of users)
  • Instagram: range/hash sharding for media and users (billions of media objects)
  • YouTube: content and metadata sharded globally (exabyte-scale datasets)

Complexity Analysis

  • Scalability: High (many shards serve traffic in parallel)
  • Implementation Complexity: High (routing and rebalancing logic)
  • Cost: Medium to High (more nodes to provision and operate)