Autoscaling cheatsheet.
HPA (Horizontal Pod Autoscaler)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 80 }
```
Requires metrics-server to be installed:
```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
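The following minimal Python sketch shows how the two metrics above interact. The HPA's core formula is desired = ceil(current × currentUtilization / targetUtilization). The controller evaluates it once per metric, and the highest proposal wins. The sketch assumes every pod is ready and reporting metrics, and it ignores the behavior rules covered below. The function names are hypothetical. The 0.1 tolerance mirrors the controller's default.

```python
import math

# Default of kube-controller-manager's --horizontal-pod-autoscaler-tolerance.
TOLERANCE = 0.1


def proposal(current_replicas: int, current_util: float, target_util: float) -> int:
    """Replicas requested by one Utilization metric (hypothetical helper)."""
    ratio = current_util / target_util
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """metrics: list of (current_util_pct, target_util_pct), e.g. CPU and memory.

    Each metric proposes a count; the largest wins, then it is clamped
    to [minReplicas, maxReplicas].
    """
    best = max(proposal(current_replicas, cur, tgt) for cur, tgt in metrics)
    return max(min_replicas, min(max_replicas, best))


# 4 pods: CPU at 105% (target 70) wants 6, memory at 60% (target 80) wants 3.
print(desired_replicas(4, [(105, 70), (60, 80)], min_replicas=2, max_replicas=20))  # 6
```

Because the largest proposal wins, a memory metric that never drops (common for caches and JVMs) can keep the HPA from scaling down even when CPU is idle.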
HPA commands
```bash
kubectl get hpa
kubectl describe hpa web
kubectl top pod
kubectl top node
```
Behavior (smooth scaling)
```yaml
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
      selectPolicy: Max
```
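Slow scale-down, fast scale-up. Scale-down waits out a 300 s stabilization window and removes at most 50% of pods per minute. Scale-up reacts immediately and, with selectPolicy: Max, may add whichever is larger each minute: 100% of the current pods or 4 pods.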
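Here is a minimal Python sketch of how these rules cap a single reconciliation step. It makes several assumptions:

- No scaling happened earlier in the current 60 s period (the real controller tracks changes per period).
- `window` is a hypothetical list of the recommendations the HPA computed in the last 300 s.
- Rounding is up for scale-up and down for scale-down.
- minReplicas/maxReplicas clamping is left out.

```python
import math

# Values from the behavior block above.
SCALE_UP_PERCENT, SCALE_UP_PODS = 100, 4
SCALE_DOWN_PERCENT = 50


def scale_up_limit(current: int) -> int:
    by_percent = math.ceil(current * (1 + SCALE_UP_PERCENT / 100))
    by_pods = current + SCALE_UP_PODS
    return max(by_percent, by_pods)  # selectPolicy: Max -> the larger step wins


def scale_down_floor(current: int) -> int:
    # Percent policy: at most 50% of pods removed per 60 s period.
    return int(current * (1 - SCALE_DOWN_PERCENT / 100))


def next_replicas(current: int, desired: int, window: list) -> int:
    """window: recommendations from the last 300 s (scaleDown stabilization)."""
    if desired > current:
        # scaleUp window is 0 s, so the fresh recommendation is used as-is.
        return min(desired, scale_up_limit(current))
    # Scale-down: act on the highest recommendation seen in the window.
    stabilized = max(window + [desired])
    if stabilized >= current:
        return current
    return max(stabilized, scale_down_floor(current))


print(next_replicas(2, 20, []))      # 6: max(2 * 2, 2 + 4)
print(next_replicas(10, 2, []))      # 5: can drop at most 50% per minute
print(next_replicas(10, 2, [9, 4]))  # 9: a recent higher recommendation holds
```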
Custom metrics
```yaml
metrics:
  - type: Pods
    pods:
      metric: { name: http_requests_per_second }
      target: { type: AverageValue, averageValue: "100" }
  - type: External
    external:
      metric:
        name: queue_length
        selector: { matchLabels: { queue: tasks } }
      target: { type: Value, value: "50" }
```
Requires a metrics adapter that serves the custom.metrics.k8s.io and external.metrics.k8s.io APIs, such as prometheus-adapter.
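The two target types divide differently:

- AverageValue: the total across pods is divided by a per-pod target.
- Value: the current replica count is scaled by metric/target.

Here is a minimal Python sketch with hypothetical function names. It ignores the 10% tolerance and min/max clamping.

```python
import math


def pods_average_value(per_pod_values, target_average):
    """type: AverageValue -> replicas = ceil(sum over pods / per-pod target)."""
    return math.ceil(sum(per_pod_values) / target_average)


def external_value(metric_value, target_value, current_replicas):
    """type: Value -> scale the current replicas by metric / target."""
    return math.ceil(current_replicas * metric_value / target_value)


# 3 pods serving 150 + 180 + 120 = 450 req/s against averageValue "100".
print(pods_average_value([150, 180, 120], 100))  # 5
# queue_length = 125 against value "50", currently 4 replicas.
print(external_value(125, 50, 4))  # 10
```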
KEDA (event-driven)
```bash
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda -n keda --create-namespace
```
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: worker }
spec:
  scaleTargetRef:
    name: worker # Deployment
  minReplicaCount: 0
  maxReplicaCount: 50
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: tasks
        listLength: "10"
```
Scales from 0 to N based on queue depth. Supports many sources: SQS, RabbitMQ, Kafka, Postgres, cron, Prometheus, etc.
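KEDA splits the job in two. It handles activation (0 ↔ 1) itself and hands 1 ↔ N to an HPA it creates, which targets listLength items per replica. Here is a simplified Python model of the ScaledObject above. It assumes activationListLength stays at its default of 0 and ignores HPA tolerance and stabilization. The function and its `seconds_since_active` parameter are hypothetical.

```python
import math

# Values from the ScaledObject above.
LIST_LENGTH = 10            # target items per replica
MIN_REPLICAS, MAX_REPLICAS = 0, 50
COOLDOWN_SECONDS = 300
ACTIVATION_LIST_LENGTH = 0  # assumed default; not set in the YAML


def keda_desired(list_length: int, seconds_since_active: float) -> int:
    if list_length > ACTIVATION_LIST_LENGTH:
        # Active: the HPA that KEDA creates targets LIST_LENGTH items per pod.
        wanted = math.ceil(list_length / LIST_LENGTH)
        return max(1, min(MAX_REPLICAS, wanted))
    if seconds_since_active < COOLDOWN_SECONDS:
        # Idle but still inside cooldownPeriod: the managed HPA floors at 1.
        return 1
    return MIN_REPLICAS  # idle past cooldownPeriod: scale to zero


print(keda_desired(95, 0))   # 10
print(keda_desired(0, 60))   # 1
print(keda_desired(0, 600))  # 0
```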
ScaledJob (per-event Job)
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata: { name: process-job }
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: w
            image: worker:v1
        restartPolicy: Never
  triggers:
    - type: aws-sqs-queue
      metadata: { queueURL: ..., queueLength: "1" }
```
One job per queue message.
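Here is a simplified Python model of how many new Jobs a polling cycle might create. It assumes KEDA's default scalingStrategy and a default maxReplicaCount of 100, neither of which is set above. The function name is hypothetical.

```python
import math


def jobs_to_create(queue_length: int, running_jobs: int,
                   queue_length_target: int = 1, max_replica_count: int = 100) -> int:
    """Simplified model of KEDA's default ScaledJob scaling strategy."""
    # Jobs the queue calls for, capped at maxReplicaCount.
    max_scale = min(max_replica_count, math.ceil(queue_length / queue_length_target))
    # Jobs that are already running count against that number.
    return max(0, max_scale - running_jobs)


print(jobs_to_create(7, 3))  # 4
```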
VPA (Vertical Pod Autoscaler)
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata: { name: web }
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: Auto # Auto, Recreate, Initial, "Off" (quote Off: bare Off parses as false)
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed: { cpu: 100m, memory: 128Mi }
        maxAllowed: { cpu: 2, memory: 2Gi }
```
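VPA recommends CPU and memory requests based on observed usage. In Auto, Recreate, and Initial modes it also applies them, scaling limits proportionally. VPA is not built in: install its recommender, updater, and admission controller from the kubernetes/autoscaler repo.
⚠️ Don’t combine VPA in Auto mode with an HPA that scales on the same resource (CPU or memory); the two controllers fight each other.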
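The containerPolicies bounds clamp whatever VPA recommends. When a request changes, the limit keeps its original ratio to the request. Here is a minimal Python sketch. It uses plain integers for units (millicores and MiB, an assumption for easy math), and the function names are hypothetical.

```python
# Units: CPU in millicores, memory in MiB.
MIN_ALLOWED = {"cpu": 100, "memory": 128}    # { cpu: 100m, memory: 128Mi }
MAX_ALLOWED = {"cpu": 2000, "memory": 2048}  # { cpu: 2, memory: 2Gi }


def clamp(recommendation: dict) -> dict:
    """Bound each recommended request by minAllowed / maxAllowed."""
    return {
        res: max(MIN_ALLOWED[res], min(MAX_ALLOWED[res], value))
        for res, value in recommendation.items()
    }


def scaled_limit(new_request: float, old_request: float, old_limit: float) -> float:
    """Keep the original limit/request ratio when the request changes."""
    return new_request * old_limit / old_request


print(clamp({"cpu": 50, "memory": 3000}))  # {'cpu': 100, 'memory': 2048}
print(scaled_limit(250, 100, 1000))        # 2500.0
```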
Cluster Autoscaler
Adds nodes when pods are unschedulable (Pending) and removes underutilized nodes. Configured per cloud provider:
```bash
# EKS managed node group
eksctl create cluster ... --asg-access
# or use Karpenter (recommended)
```
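To decide how many nodes to add, Cluster Autoscaler simulates placing the pending pods onto new nodes from a node group. Below is a first-fit-decreasing bin-packing estimate written in the spirit of its binpacking estimator; it is not the autoscaler's actual code. It makes several assumptions:

- Only CPU and memory requests matter (no affinity, taints, or DaemonSet overhead).
- There is a single node shape.
- The per-node allocatable capacity is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Res:
    cpu_m: int   # millicores
    mem_mi: int  # MiB


def fits(req: Res, free: Res) -> bool:
    return req.cpu_m <= free.cpu_m and req.mem_mi <= free.mem_mi


def nodes_to_add(pending: list, node: Res) -> int:
    """First-fit-decreasing estimate of new nodes for unschedulable pods."""
    free_per_new_node = []
    for pod in sorted(pending, key=lambda r: (r.cpu_m, r.mem_mi), reverse=True):
        if not fits(pod, node):
            raise ValueError("pod requests exceed a whole node; no scale-up helps")
        for free in free_per_new_node:
            if fits(pod, free):
                free.cpu_m -= pod.cpu_m
                free.mem_mi -= pod.mem_mi
                break
        else:  # no new node has room yet: open another one
            free_per_new_node.append(Res(node.cpu_m - pod.cpu_m, node.mem_mi - pod.mem_mi))
    return len(free_per_new_node)


# Hypothetical allocatable capacity per node: 2000m CPU, 8192Mi memory.
print(nodes_to_add([Res(700, 1024)] * 3, Res(2000, 8192)))  # 2
```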
Karpenter (modern node autoscaler)
```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: default }
spec:
  template:
    spec:
      requirements:
        - { key: kubernetes.io/arch, operator: In, values: [amd64, arm64] }
        - { key: karpenter.sh/capacity-type, operator: In, values: [on-demand, spot] }
        - { key: node.kubernetes.io/instance-type, operator: In, values: [m6i.large, m6g.large] }
      nodeClassRef:
        group: karpenter.k8s.aws # group + kind are required in v1 (AWS provider shown)
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```
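Karpenter picks instance types that satisfy the pool’s requirements, bin-packs pending pods, and supports spot capacity. It launches instances directly instead of resizing node groups, so it usually provisions nodes faster than Cluster Autoscaler. Its consolidation can also lower cost.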
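The requirements are ANDed, and each `In` entry restricts one label to a set of values. Here is a minimal Python sketch of that filtering step over hypothetical candidate offerings. It does not model how Karpenter then chooses among the eligible offerings (for example, by price and fit). The function names are hypothetical.

```python
# The NodePool requirements above; every one uses operator In.
REQUIREMENTS = {
    "kubernetes.io/arch": {"amd64", "arm64"},
    "karpenter.sh/capacity-type": {"on-demand", "spot"},
    "node.kubernetes.io/instance-type": {"m6i.large", "m6g.large"},
}


def matches(labels: dict) -> bool:
    """True if every requirement's key is present with an allowed value."""
    return all(labels.get(key) in allowed for key, allowed in REQUIREMENTS.items())


def eligible(offerings: list) -> list:
    return [o for o in offerings if matches(o)]


def offering(instance_type: str, arch: str, capacity_type: str) -> dict:
    return {
        "node.kubernetes.io/instance-type": instance_type,
        "kubernetes.io/arch": arch,
        "karpenter.sh/capacity-type": capacity_type,
    }


# Hypothetical candidate offerings.
candidates = [
    offering("m6g.large", "arm64", "spot"),
    offering("m6i.large", "amd64", "on-demand"),
    offering("c6i.large", "amd64", "spot"),
]
print([o["node.kubernetes.io/instance-type"] for o in eligible(candidates)])
# ['m6g.large', 'm6i.large']
```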
Pod Disruption Budget
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web }
spec:
  minAvailable: 2
  # or
  # maxUnavailable: 1
  selector:
    matchLabels: { app: web }
```
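Limits how many matching pods voluntary disruptions (node drains, autoscaler scale-down or consolidation) can evict at once. It does not protect against involuntary failures such as node crashes.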
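A PDB's status reports how many more evictions are allowed right now: currentHealthy minus desiredHealthy, floored at 0. Here is a minimal Python sketch. It assumes integer (not percentage) values, and the function name is hypothetical.

```python
def disruptions_allowed(expected_pods: int, healthy_pods: int,
                        min_available=None, max_unavailable=None) -> int:
    """How many more pods an eviction may take down right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of minAvailable / maxUnavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, healthy_pods - desired_healthy)


print(disruptions_allowed(3, 3, min_available=2))  # 1
print(disruptions_allowed(3, 2, min_available=2))  # 0: drains must wait
```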
Resource requests are critical
Without resources.requests, the HPA can’t compute utilization, because utilization is usage expressed as a percentage of the request. Always set:
```yaml
containers:
  - resources:
      requests: { cpu: 100m, memory: 128Mi }
      limits: { cpu: 1, memory: 512Mi }
```
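The HPA computes Resource utilization as summed usage over summed requests across the target's pods. A pod with no request leaves nothing to divide by. Here is a minimal Python sketch for CPU with a hypothetical function name. It ignores unready pods and pods with missing metrics, which the real controller handles separately.

```python
def cpu_utilization_percent(pods: list) -> int:
    """pods: list of (usage_millicores, request_millicores or None)."""
    if not pods:
        raise ValueError("no pods reporting metrics")
    total_usage = 0
    total_request = 0
    for usage, request in pods:
        if request is None:
            # No request means no denominator: utilization is undefined.
            raise ValueError("pod has no cpu request")
        total_usage += usage
        total_request += request
    return total_usage * 100 // total_request


print(cpu_utilization_percent([(150, 100), (50, 100)]))  # 100
```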
Vertical sizing tip
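Start with explicit requests of about observed average usage plus 20% headroom (observed avg × 1.2). Then tune them from VPA recommendations; updateMode "Off" gives recommendations without applying them.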
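The arithmetic of the tip, as a tiny Python helper. It rounds up and uses integer math so it never under-requests by a rounding step. The name and the 20% default are illustrative.

```python
def initial_request(observed_avg: int, headroom_pct: int = 20) -> int:
    """ceil(observed_avg * (1 + headroom_pct / 100)) using integer math."""
    return -(-observed_avg * (100 + headroom_pct) // 100)


print(initial_request(250))  # CPU: 250m observed -> 300m request
print(initial_request(410))  # memory: 410Mi observed -> 492Mi request
```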
When to use each
- HPA: variable HTTP load.
- VPA: workload sizing (right-size pods).
- KEDA: event-driven (queues, cron, external metrics).
- Cluster Autoscaler / Karpenter: node-level.
Common mistakes
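- HPA without `requests` → can’t scale on utilization.
- HPA on a JVM app without `-XX:+UseContainerSupport` → the JVM misreads container memory. The flag is on by default since JDK 10 and 8u191; older JVMs need it set.
- Combining VPA Auto + HPA on the same dimension → conflict.
- No PDB → the autoscaler can disrupt all replicas at once.
- Cluster Autoscaler with `:latest` images → cold-start churn. `:latest` defaults to `imagePullPolicy: Always`, so every new pod re-pulls the image.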
If you want my HPA + KEDA + Karpenter setup, it’s at rajpoot.dev.