Kubernetes Cheatsheet 08 — Autoscaling (HPA, VPA, Cluster Autoscaler)

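Quick reference for scaling pods horizontally (HPA, KEDA), sizing them vertically (VPA), and scaling nodes (Cluster Autoscaler, Karpenter).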

HPA (Horizontal Pod Autoscaler)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: web }
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
    - type: Resource
      resource:
        name: memory
        target: { type: Utilization, averageUtilization: 80 }

Resource metrics (cpu, memory) require metrics-server to be installed:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
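How the HPA turns those targets into a replica count can be sketched in a few lines. This is a simplified model of the documented formula, desired = ceil(current * currentValue / targetValue), using the min/max bounds from the manifest above. The function names are made up for illustration, the 0.1 tolerance is assumed to be the controller default, and pod readiness and missing metrics are ignored.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=2, max_replicas=20, tolerance=0.1):
    """Replica proposal for one metric (simplified HPA formula)."""
    ratio = current_value / target_value
    # Inside the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is clamped to minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))

def desired_for_all(current_replicas, metrics, **bounds):
    """With several metrics, the largest per-metric proposal wins."""
    return max(desired_replicas(current_replicas, cur, tgt, **bounds)
               for cur, tgt in metrics)

# 4 pods at 90% CPU (target 70) and 60% memory (target 80)
print(desired_for_all(4, [(90, 70), (60, 80)]))  # prints 6
```

CPU alone asks for 6 replicas and memory for 3, so the HPA goes to 6: one hot metric is enough to scale up, but every metric must be low before it scales down.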

HPA commands

kubectl get hpa
kubectl describe hpa web
kubectl top pod
kubectl top node

Behavior (smooth scaling)

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
        - type: Pods
          value: 4
          periodSeconds: 60
      selectPolicy: Max

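Slow scale-down, fast scale-up: scale-down acts on the highest recommendation from the last 300 s and removes at most 50% of pods per minute; scale-up acts immediately and may add, per minute, whichever is larger: 100% of current pods or 4 pods.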
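The per-period limits implied by the policies above can be worked out by hand. The sketch below does that arithmetic for the values shown (100% or 4 pods up, 50% down). The function names are illustrative, the rounding direction is an assumption, and stabilization windows and minReplicas/maxReplicas are left out.

```python
import math

def scale_up_limit(replicas, percent=100, pods=4):
    """Highest replica count reachable in one period with selectPolicy: Max."""
    by_percent = math.ceil(replicas * (1 + percent / 100))
    by_pods = replicas + pods
    return max(by_percent, by_pods)

def scale_down_limit(replicas, percent=50):
    """Lowest replica count reachable in one period (rounding assumed)."""
    return math.floor(replicas * (1 - percent / 100))

for n in (2, 10):
    print(n, "->", scale_up_limit(n), "up,", scale_down_limit(n), "down")
```

At 2 replicas the Pods policy dominates (2 to 6); at 10 replicas the Percent policy dominates (10 to 20). That is the reason for listing both: small deployments still get a useful step.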

Custom metrics

metrics:
  - type: Pods
    pods:
      metric: { name: http_requests_per_second }
      target: { type: AverageValue, averageValue: "100" }
  
  - type: External
    external:
      metric:
        name: queue_length
        selector: { matchLabels: { queue: tasks } }
      target: { type: Value, value: "50" }

Requires an adapter that serves the custom/external metrics APIs, such as prometheus-adapter.
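The two target types above scale differently, which a small calculation makes concrete. This is a rough sketch with hypothetical function names; tolerance, pod readiness, and the min/max clamp are omitted.

```python
import math

def replicas_for_average_value(per_pod_values, target_average):
    """Pods metric, AverageValue target: total load over the per-pod target."""
    return math.ceil(sum(per_pod_values) / target_average)

def replicas_for_value(current_replicas, metric_value, target_value):
    """External metric, Value target: scale the current count by the ratio."""
    return math.ceil(current_replicas * metric_value / target_value)

# Three pods serving 150, 120 and 180 requests/s against a target of 100 each
print(replicas_for_average_value([150, 120, 180], 100))  # prints 5

# Four replicas, queue_length reported as 120 against a target of 50
print(replicas_for_value(4, 120, 50))  # prints 10
```

AverageValue divides the metric across pods, so it suits per-pod rates; Value compares the raw number to the target, so the result depends on how many replicas are already running.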

KEDA (event-driven)

helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda -n keda --create-namespace

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata: { name: worker }
spec:
  scaleTargetRef:
    name: worker          # Deployment
  minReplicaCount: 0
  maxReplicaCount: 50
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: redis
      metadata:
        address: redis:6379
        listName: tasks
        listLength: "10"

Scales from 0 to N based on queue depth. Supports many sources: SQS, RabbitMQ, Kafka, Postgres, cron, Prometheus, etc.
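For the redis trigger above, the replica count roughly follows list length divided by listLength. The sketch below models that with the ScaledObject's bounds; the function name is hypothetical, and pollingInterval and cooldownPeriod timing are not simulated.

```python
import math

def keda_replicas(list_length, target=10, min_count=0, max_count=50):
    """Approximate replicas for a list-length trigger (simplified model)."""
    if list_length == 0:
        # Reached only after cooldownPeriod seconds without activity.
        return min_count
    desired = math.ceil(list_length / target)
    # Any pending work needs at least one replica.
    return max(max(min_count, 1), min(max_count, desired))

for depth in (0, 1, 95, 10000):
    print(depth, "->", keda_replicas(depth))
```

A queue of 95 items gives 10 workers and a queue of 10000 is capped at maxReplicaCount (50), so listLength is best read as "items each worker should be responsible for".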

ScaledJob (per-event Job)

apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata: { name: process-job }
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: w
            image: worker:v1
        restartPolicy: Never
  triggers:
    - type: aws-sqs-queue
      metadata: { queueURL: ..., queueLength: "1" }

One job per queue message.
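How many Jobs get created on each poll can be approximated as follows. This sketch assumes KEDA's default scaling strategy and a maxReplicaCount of 100, neither of which appears in the manifest above, and the function name is made up for illustration.

```python
import math

def jobs_to_start(queue_length, running_jobs, target=1, max_replica_count=100):
    """New Jobs to create on this poll (assumed default strategy)."""
    max_scale = min(max_replica_count, math.ceil(queue_length / target))
    # Jobs already running count against the total.
    return max(0, max_scale - running_jobs)

print(jobs_to_start(5, 2))    # prints 3
print(jobs_to_start(500, 0))  # prints 100
```

With queueLength "1" the target is one message per Job, so five messages and two running Jobs lead to three new Jobs.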

VPA (Vertical Pod Autoscaler)

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata: { name: web }
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"       # "Auto", "Recreate", "Initial", "Off" (quote it: bare Off parses as a boolean)
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed: { cpu: 100m, memory: 128Mi }
        maxAllowed: { cpu: 2, memory: 2Gi }

Recommends CPU and memory requests from observed usage and, depending on updateMode, applies them. Limits are scaled to keep the original request-to-limit ratio.

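⚠️ Don't combine VPA in Auto mode with an HPA that scales on the same resource (CPU or memory): VPA changes the requests that the HPA's utilization percentage is measured against, so the two fight.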
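The effect of minAllowed and maxAllowed is a plain clamp on whatever the recommender proposes. The sketch below shows it with the bounds from the manifest above; the quantity parsers are minimal helpers written for this example (they handle only the m, Ki, Mi and Gi forms) and the raw recommendations are invented numbers.

```python
def parse_cpu(quantity):
    """CPU quantity to millicores: '100m' -> 100, '2' -> 2000."""
    q = str(quantity)
    return int(q[:-1]) if q.endswith("m") else int(float(q) * 1000)

def parse_memory(quantity):
    """Memory quantity to bytes, for Ki/Mi/Gi suffixes only."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}
    q = str(quantity)
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[:-2]) * factor
    return int(q)

def clamp(value, low, high):
    """Keep a recommendation inside minAllowed..maxAllowed."""
    return max(low, min(high, value))

# Hypothetical raw recommendations: 3500m CPU and 64Mi memory
cpu = clamp(parse_cpu("3500m"), parse_cpu("100m"), parse_cpu(2))
mem = clamp(parse_memory("64Mi"), parse_memory("128Mi"), parse_memory("2Gi"))
print(cpu, mem // 2**20)  # prints 2000 128
```

A CPU recommendation above the cap is cut to 2 cores and a memory recommendation below the floor is raised to 128Mi, so the bounds protect both the node budget and the pod's minimum footprint.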

Cluster Autoscaler

Adds nodes when pods cannot be scheduled and removes nodes that stay underutilized. Configured per cloud:

# EKS managed node group
eksctl create cluster ... --asg-access
# or use Karpenter (recommended)

Karpenter (modern node autoscaler)

apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: default }
spec:
  template:
    spec:
      requirements:
        - { key: kubernetes.io/arch, operator: In, values: [amd64, arm64] }
        - { key: karpenter.sh/capacity-type, operator: In, values: [on-demand, spot] }
        - { key: node.kubernetes.io/instance-type, operator: In, values: [m6i.large, m6g.large] }
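      nodeClassRef:
        group: karpenter.k8s.aws   # AWS provider; group and kind are required in karpenter.sh/v1
        kind: EC2NodeClass
        name: default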
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s

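Karpenter chooses instance types from pending pods' requirements, bin-packs pods onto nodes, and handles spot interruptions. It launches nodes directly rather than resizing node groups, which is typically faster than Cluster Autoscaler, and consolidation can reduce cost.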

Pod Disruption Budget

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata: { name: web }
spec:
  minAvailable: 2
  # or
  # maxUnavailable: 1
  selector:
    matchLabels: { app: web }

Limits how many pods a voluntary disruption (node drain, autoscaler scale-down) may evict at once. Involuntary failures such as node crashes are not covered.
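How many evictions a budget currently permits can be sketched as below. This is a simplified model with a hypothetical function name and absolute numbers only; percentage values are not handled.

```python
def disruptions_allowed(healthy, expected, min_available=None, max_unavailable=None):
    """Evictions a PDB would permit right now (simplified model)."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected - max_unavailable
    return max(0, healthy - desired_healthy)

print(disruptions_allowed(5, 5, min_available=2))    # prints 3
print(disruptions_allowed(2, 2, min_available=2))    # prints 0
print(disruptions_allowed(5, 5, max_unavailable=1))  # prints 1
```

The second case matters for autoscaled workloads: with minAvailable: 2 and a deployment that has scaled down to 2 replicas, no eviction is allowed and node drains will wait. maxUnavailable avoids that because it moves with the replica count.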

Resource requests are critical

Without resources.requests, HPA can’t compute % utilization. Always set:

containers:
  - resources:
      requests: { cpu: 100m, memory: 128Mi }
      limits: { cpu: 1, memory: 512Mi }
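The dependency on requests is visible in the arithmetic itself: utilization is usage divided by request. A minimal sketch, with an illustrative function name and invented sample values in millicores:

```python
def utilization_percent(usage_millicores, request_millicores):
    """Average utilization: total usage over total requests, as a percentage."""
    if not request_millicores or any(r <= 0 for r in request_millicores):
        # Without a request there is nothing to divide by.
        raise ValueError("every pod needs a CPU request")
    return 100 * sum(usage_millicores) / sum(request_millicores)

# Three pods requesting 100m each and using 85m, 70m and 100m
print(utilization_percent([85, 70, 100], [100, 100, 100]))  # prints 85.0
```

The same usage against a 200m request would read as 42.5%, so changing requests shifts the point at which a 70% target triggers scaling.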

Vertical sizing tip

Start with explicit requests = (observed avg + 20%). Tune via VPA recommendations.
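The starting point above is simple enough to compute from a handful of usage samples. In this sketch the function name, the rounding up to a 10-unit step, and the sample values are all assumptions added for illustration.

```python
import math

def initial_request(samples, headroom_percent=20, step=10):
    """Average of observed usage plus headroom, rounded up to a multiple of step."""
    avg = sum(samples) / len(samples)
    with_headroom = avg * (100 + headroom_percent) / 100
    return math.ceil(with_headroom / step) * step

# Observed CPU usage in millicores
print(initial_request([80, 120, 100]))  # prints 120
print(initial_request([90, 110, 130]))  # prints 140
```

The result is only a first guess; the average hides spikes, which is why the tip pairs it with VPA recommendations.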

When to use each

HPA: variable HTTP load.
VPA: workload sizing (right-size pods).
KEDA: event-driven (queues, cron, external metrics).
Cluster Autoscaler / Karpenter: node-level.

Common mistakes

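HPA with a Utilization target but no requests → utilization can't be computed, so it never scales.
HPA on memory for a JVM app without -XX:+UseContainerSupport (on by default since JDK 10 and 8u191) → the JVM sizes its heap from host memory instead of the container limit.
Combining VPA Auto + HPA on the same dimension → conflict.
No PDB → a node scale-down can evict all replicas at once.
Cluster Autoscaler with :latest images → each new node pulls whatever the tag points to at that moment, which can slow cold starts and leave replicas on different versions.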
