Core Scale & Availability
Scale by adding more machines rather than upgrading a single node
Increase capacity by adding replicas and distributing load across them. Prefer scale-out over vertical scale-up for elasticity, resilience, and cost efficiency.
Use when the workload outgrows a single node, traffic is bursty, or you need zero-downtime elasticity and fault tolerance.
Typical uses: burst handling, low-latency APIs, and zero-downtime deployments across zones.
# Horizontal scaling primitives
# Cloud: ASG/Instance Group + LB
# K8s: Deployment + HPA + Service (LoadBalancer)
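apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa              # placeholder name (assumed)
spec:
  scaleTargetRef:            # workload to scale; "web" is a placeholder Deployment (assumed)
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  # Target 70% average CPU utilization, measured against each pod's CPU request
  metrics: [{ type: Resource, resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 }}}]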
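To make the HPA settings above concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 and then clamped to the min/max bounds. The function name `desired_replicas` and its parameters are illustrative, not part of any Kubernetes API; the defaults mirror the spec above (70% target, 2 to 20 replicas), and the 0.1 tolerance is the controller's documented default. The real controller also accounts for unready pods, missing metrics, and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_utilization,
                     target_utilization=70, min_replicas=2,
                     max_replicas=20, tolerance=0.1):
    """Approximate the HPA replica calculation (illustrative helper only)."""
    # Ratio of observed average utilization to the target utilization.
    ratio = current_utilization / target_utilization

    # Within the tolerance band, keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally, rounding up, then clamp to the configured bounds.
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 140% of requested CPU against a 70% target.
    print(desired_replicas(4, 140))
    # 10 pods averaging 210%: the proportional answer exceeds maxReplicas.
    print(desired_replicas(10, 210))
```

In this sketch, 4 pods at 140% average utilization scale to 8, while 10 pods at 210% would need 30 and are capped at the `maxReplicas` value of 20.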
ASGs scale EC2 fleets based on traffic and SLOs
Hundreds of thousands of instances
HPA scales pods by CPU/custom metrics
Clusters with thousands of pods
Edge runs many small stateless workers per POP
Hundreds of cities globally
High - Add replicas elastically
Medium - Coordination and autoscaling
Variable - Pay for headroom