A typical Kubernetes cluster wastes 50%+ of capacity. The cost is real, the fixes are boring. This post is the practical playbook.
Right-size requests
Every pod has CPU/memory requests. The scheduler reserves that capacity whether or not the pod uses it. Inflated requests mean you pay for more nodes than the workload needs.
resources:
  requests:
    cpu: "100m"      # actual usage in your monitoring
    memory: "256Mi"
  limits:
    cpu: "500m"      # higher than request to allow bursts
    memory: "512Mi"
Use VPA recommendations:
kubectl describe vpa my-app | grep -A 12 "Recommendation:"
Apply recommendations; recheck monthly.
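The describe command above only returns something if a VPA object exists for the workload. A minimal sketch of one in recommendation-only mode, assuming the VPA components are installed and that the target is a Deployment named my-app (the name is borrowed from the command above; the Deployment itself is an assumption):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app          # assumed workload name
  updatePolicy:
    updateMode: "Off"     # compute recommendations only; never evict or mutate pods
```

With updateMode "Off" nothing changes in the cluster; you read the recommendation and edit requests yourself.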
Karpenter for node provisioning
Cluster Autoscaler has node-group constraints; Karpenter provisions nodes per pod’s requirements:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: default }
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: [amd64, arm64]
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot, on-demand]
      nodeClassRef:
        group: karpenter.k8s.aws   # v1 requires group and kind; these assume the AWS provider
        kind: EC2NodeClass
        name: default
  limits: { cpu: 1000 }
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # v1 name; v1beta1 called it WhenUnderutilized
Bin-packs better. Spins up exact-size nodes. Consolidates underutilized nodes. Typical savings: 20–40% on node spend.
Spot capacity
For interruptible workloads (CI, background jobs, batch ML), use spot:
- Up to 90% off on-demand prices on EC2.
- Workloads must handle SIGTERM (graceful shutdown); EC2 gives a two-minute interruption notice.
- Karpenter handles spot pool fallback automatically.
For stateful or user-facing services, stick with on-demand. A mixed pool gets you most of the savings.
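One way to pin an interruptible workload to spot is a nodeSelector on the capacity-type label that Karpenter puts on its nodes. This is a sketch, not a full manifest for any real service: the batch-worker name, the image, and the numbers are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker                  # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels: { app: batch-worker }
  template:
    metadata:
      labels: { app: batch-worker }
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: spot   # schedule only onto spot nodes
      # Time allowed between SIGTERM and SIGKILL; kept under the
      # two-minute EC2 spot interruption notice.
      terminationGracePeriodSeconds: 90
      containers:
        - name: worker
          image: registry.example.com/batch-worker:1.0   # placeholder image
          resources:
            requests: { cpu: "500m", memory: "512Mi" }
```

The process inside the container still has to catch SIGTERM and finish or checkpoint its work within that window.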
HPA + VPA
- HPA: scales replicas based on CPU / memory / custom metric.
- VPA: adjusts requests for existing pods.
Run both:
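- VPA recommends realistic requests.
- HPA scales replicas based on load.
Conflict: if HPA and VPA both act on CPU, they fight over the same metric. VPA changes the CPU request, and HPA's utilization target is a percentage of that request. Solve it by using different metrics: VPA on CPU usage, HPA on RPS or queue depth via KEDA.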
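Scaling on RPS through KEDA could look roughly like this. It assumes KEDA is installed, that Prometheus is reachable at the address shown, and that the app exports an http_requests_total counter with an app label; all three are assumptions for illustration, as are the replica bounds and threshold.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app
spec:
  scaleTargetRef:
    name: my-app                      # Deployment to scale (assumed name)
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed Prometheus URL
        query: sum(rate(http_requests_total{app="my-app"}[2m]))  # assumed metric and label
        threshold: "100"              # target requests per second per replica
```

KEDA creates and manages the underlying HPA, so you don't define a separate CPU-based HPA for the same Deployment.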
Topology spread
Don’t over-spread:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:                 # which pods to count; label is illustrative
      matchLabels:
        app: my-app
ScheduleAnyway makes the spread a soft preference, so the scheduler can still bin-pack. DoNotSchedule forces strict spread, which can leave pods pending or force extra nodes; expensive.
Cluster sizing
Single huge cluster vs. many small:
- Huge: better bin-packing, lower per-cluster overhead, cheaper.
- Small: bounded blast radius, easier RBAC, easier upgrades.
For most teams under 500 nodes: one cluster per environment.
Observability of cost
Per-namespace, per-team cost visibility:
- OpenCost: open-source, works with Prometheus.
- Kubecost: commercial, polished.
Show teams their cost. It works wonders culturally.
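If you want a rough per-namespace signal before setting up either tool, requested versus used CPU is a decent proxy. This is not how OpenCost or Kubecost compute cost; it is a sketch of Prometheus recording rules that assumes kube-state-metrics and cAdvisor metrics are already scraped, and the rule names are made up for this example.

```yaml
groups:
  - name: request-waste               # hypothetical rule group
    rules:
      # CPU cores requested per namespace (from kube-state-metrics).
      - record: namespace:cpu_requests:sum
        expr: sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
      # CPU cores actually used per namespace (from cAdvisor).
      - record: namespace:cpu_usage:rate5m
        expr: sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
      # Fraction of requested CPU that is in use; low values mean over-requesting.
      - record: namespace:cpu_request_utilization:ratio
        expr: namespace:cpu_usage:rate5m / namespace:cpu_requests:sum
```

A namespace that sits well below 1.0 for weeks is a candidate for the right-sizing pass described earlier.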
For broader observability see Observability 2.0.
Common mistakes
1. Setting requests = limits
Wastes capacity. Limits should be higher than requests for bursts.
2. Aggressive CPU limits
CPU limits cause throttling, sometimes unnecessarily. For most workloads, set high limits or no CPU limit at all (let memory be the constraint).
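A sketch of the "no CPU limit" shape, reusing the illustrative numbers from the right-sizing example earlier in the post:

```yaml
resources:
  requests:
    cpu: "100m"        # still set a request so the scheduler reserves a share
    memory: "256Mi"
  limits:
    memory: "512Mi"    # no cpu key: the container can use spare CPU on the node
```

One caveat: if the namespace has a LimitRange with a default CPU limit, that default gets applied to containers that omit one, so check for it before assuming the limit is gone.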
3. Idle dev clusters running 24/7
Suspend at night, weekends. Easy 30%+ savings on non-prod.
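One way to approximate "suspend at night" is to scale workloads to zero on a schedule and let node consolidation remove the empty nodes. The sketch below assumes a namespace called dev, an image that ships kubectl (the image reference is a placeholder), and that the schedule is read in the controller's time zone, typically UTC. Every name in it is hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata: { name: dev-scaler, namespace: dev }
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: { name: dev-scaler, namespace: dev }
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "deployments/scale"]
    verbs: ["get", "list", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: { name: dev-scaler, namespace: dev }
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dev-scaler
subjects:
  - kind: ServiceAccount
    name: dev-scaler
    namespace: dev
---
apiVersion: batch/v1
kind: CronJob
metadata: { name: scale-down-dev, namespace: dev }
spec:
  schedule: "0 20 * * 1-5"            # 20:00, Monday to Friday
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dev-scaler
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: registry.example.com/kubectl:1.31   # placeholder image with kubectl
              command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "dev"]
```

This only covers the way down. A morning counterpart has to restore replica counts, which means recording them somewhere first (an annotation is a common choice), and StatefulSets would need the same treatment.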
4. Orphaned PVCs
Pods get deleted; PVCs linger; you keep paying. Sweep monthly.
5. NAT Gateway egress
In AWS, NAT Gateway egress is $$$: you pay a per-GB processing charge on top of the hourly rate. Use VPC endpoints for AWS service traffic.
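For S3, the fix is a gateway endpoint, which routes S3 traffic from private subnets around the NAT Gateway. A minimal CloudFormation sketch, assuming you pass in your own VPC and private route table IDs (the parameter and resource names are invented for this example):

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  PrivateRouteTableId:
    Type: String                      # route table used by the private subnets
Resources:
  S3GatewayEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Gateway
      ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
      VpcId: !Ref VpcId
      RouteTableIds:
        - !Ref PrivateRouteTableId
```

Gateway endpoints exist for S3 and DynamoDB; other AWS services use interface endpoints, which carry their own hourly and per-GB charges, so check the traffic volume before adding them everywhere.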
What I’d ship today
For a 2026 Kubernetes cluster:
- Karpenter for node provisioning.
- VPA recommendations applied quarterly.
- HPA on real load metrics (KEDA-driven).
- Spot pool for interruptibles.
- OpenCost dashboards visible to teams.
- Quarterly cost review as a calendar event.
Boring habits. 30–50% lower bills than the typical cluster.
Read this next
If you want my Karpenter + VPA + OpenCost setup, it’s at rajpoot.dev.