Cloud bills compound. A team that started at $5k/month is at $50k/month in 18 months without doing anything wrong. The cost-optimization playbook isn’t exotic — it’s a few boring tactics applied consistently. This post is the working set.
Right-sizing (the biggest lever)
Most teams over-provision. Collect utilization data for 2 weeks, then look at both average and peak usage. If CPU averages 15% and peaks stay well under capacity, you likely have 2–4× more than you need.
- EC2 / Cloud Run / GKE: look at actual CPU + memory.
- RDS: connections + CPU + memory.
- ElastiCache: memory + CPU.
Tools: AWS Compute Optimizer, GCP Recommender, K8s VPA recommendations.
A right-sizing pass typically cuts 30%+ of compute spend.
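Here is a minimal sketch of the sizing math. It assumes you've exported CPU utilization samples (0–100%) for the current instance. The 95th-percentile choice and the 70% peak target are illustrative assumptions. They are not what AWS Compute Optimizer or GCP Recommender actually use:

```python
import math
import statistics


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_vcpus(cpu_util_pct: list[float], current_vcpus: int,
                    target_peak_pct: float = 70.0) -> int:
    """Smallest vCPU count that keeps p95 utilization at or below target_peak_pct.

    cpu_util_pct holds utilization samples (0-100) measured on the current size.
    The p95 percentile and 70% target are illustrative assumptions.
    """
    p95 = percentile(cpu_util_pct, 95)
    # p95 * current_vcpus / 100 = vCPUs busy at p95; divide by the target fraction.
    return max(1, math.ceil(p95 * current_vcpus / target_peak_pct))


if __name__ == "__main__":
    # Hypothetical 2 weeks of hourly samples: mostly 15%, with 36 hours at 35%.
    samples = [15.0] * 300 + [35.0] * 36
    print("mean %:", round(statistics.mean(samples), 1))
    print("p95 %:", percentile(samples, 95))
    print("recommended vCPUs (currently 16):", recommend_vcpus(samples, 16))
```

With these hypothetical samples (average around 17%, peaks at 35%), a 16-vCPU box comes back as 8 vCPUs, which is the 2× case. Sizing to a high percentile rather than the average keeps headroom for the daily peak.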
Savings plans / reserved instances
For predictable steady-state workloads:
- AWS Savings Plans (Compute): 1-year commit → roughly 27% off; 3-year → roughly 54% off. Exact rates vary by instance family, region, and payment option.
- GCP Committed Use Discounts: similar savings.
- Azure Reserved Instances: similar.
Commit only to your true baseline (the load you'll have regardless) and run the variable portion on-demand.
For most production workloads: 60–80% of compute on commitments, 20–40% on-demand for flexibility.
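A simplified model shows why the baseline is the right commitment level. It assumes commitments are expressed in $/hour of on-demand-equivalent usage and billed every hour at the discounted rate, whether used or not. Anything above the commitment is billed on-demand. Real Savings Plans apply across instance families and have more rules. The 27% discount is the 1-year figure above, and the usage pattern is made up:

```python
def hourly_cost(usage: float, commit: float, discount: float) -> float:
    """Cost of one hour given `usage` and `commit`, both in $/hr at on-demand rates."""
    committed = commit * (1 - discount)    # paid every hour, used or not
    overflow = max(0.0, usage - commit)    # usage above the commitment, at on-demand
    return committed + overflow


def total_cost(usage_by_hour: list[float], commit: float, discount: float) -> float:
    return sum(hourly_cost(u, commit, discount) for u in usage_by_hour)


def best_commit(usage_by_hour: list[float], discount: float, step: float = 1.0) -> float:
    """Brute-force the cheapest commitment level, searched in `step` increments."""
    candidates = [i * step for i in range(int(max(usage_by_hour) / step) + 1)]
    return min(candidates, key=lambda c: total_cost(usage_by_hour, c, discount))


if __name__ == "__main__":
    usage = [60.0] * 12 + [100.0] * 12      # hypothetical: quiet nights, busy days
    c = best_commit(usage, discount=0.27)
    coverage = c * len(usage) / sum(usage)
    print(f"commit ${c:.0f}/hr, covering {coverage:.0%} of usage")
    print("on-demand only:", total_cost(usage, 0.0, 0.27))
    print("with commitment:", round(total_cost(usage, c, 0.27), 2))
```

With usage at $60/hr for half the day and $100/hr for the other half, the cheapest commitment is exactly the $60/hr floor, which covers 75% of spend. Each extra dollar of commitment above the floor is paid all 24 hours but only saves money during the busy 12. With this pattern it only pays off if the discount exceeds 50%.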
Spot instances
For interruptible work (batch jobs, CI, async workers, training):
- EC2 Spot: up to 90% off.
- GCP Spot VMs / Preemptibles: similar.
- K8s with Karpenter: provisions Spot nodes for pending pods automatically.
Apps must handle SIGTERM (graceful shutdown). Otherwise interruptions cause failures.
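Here is a minimal graceful-shutdown pattern in Python. The SIGTERM handler only sets a flag. The worker finishes the job in flight, then stops and returns the jobs it didn't start so they can be requeued. On EC2 Spot you get a two-minute interruption notice. On Kubernetes, an evicted pod gets SIGTERM and then its termination grace period (30 seconds by default) before SIGKILL. Each job should therefore fit inside that window or checkpoint. The GracefulWorker class is hypothetical:

```python
import signal
import time
from collections import deque
from typing import Callable, Iterable


class GracefulWorker:
    """Runs jobs one at a time and stops cleanly after SIGTERM."""

    def __init__(self) -> None:
        self.stopping = False
        self.done: list[str] = []

    def request_shutdown(self, signum, frame) -> None:
        # Keep signal handlers tiny: just record that we should stop.
        self.stopping = True

    def run(self, jobs: Iterable[tuple[str, Callable[[], None]]]) -> list[str]:
        """Run jobs until exhausted or shutdown is requested.

        Returns the names of jobs that were never started, so they can be requeued.
        """
        pending = deque(jobs)
        while pending and not self.stopping:
            name, job = pending.popleft()
            job()                      # an in-flight job always runs to completion
            self.done.append(name)
        return [name for name, _ in pending]


if __name__ == "__main__":
    worker = GracefulWorker()
    signal.signal(signal.SIGTERM, worker.request_shutdown)
    # Hypothetical jobs; run `kill -TERM <pid>` meanwhile to see it stop early.
    jobs = [(f"job-{i}", lambda: time.sleep(0.5)) for i in range(10)]
    leftover = worker.run(jobs)
    print("finished:", worker.done)
    print("requeue:", leftover)
```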
Egress reduction
Egress is the silent killer:
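- AWS: $0.09/GB out to the internet at the first-tier list price (cheaper tiers apply above 10 TB/month). At a flat $0.09/GB, 100 TB/month ≈ $9k.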
- Cloudflare: $0/GB. Bunny: $0.005/GB.
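The $9k figure uses a flat rate, but AWS internet egress is tiered, so a small calculator helps. The tier boundaries and rates below are assumptions based on us-east-1 list prices, so verify them against the current pricing page. The sketch also treats 1 TB as 1,000 GB and ignores the monthly free allowance:

```python
# Assumed tiers: (upper bound of the tier in GB, $/GB); None means "no upper bound".
# Based on AWS us-east-1 internet egress list prices; verify before relying on them.
ASSUMED_AWS_TIERS = [
    (10_000, 0.09),
    (50_000, 0.085),
    (150_000, 0.07),
    (None, 0.05),
]


def tiered_egress_cost(gb: float, tiers=ASSUMED_AWS_TIERS) -> float:
    """Monthly egress cost where each tier's rate applies only to GB inside that tier."""
    cost, lower = 0.0, 0.0
    for upper, rate in tiers:
        top = gb if upper is None else min(gb, upper)
        if top > lower:
            cost += (top - lower) * rate
        if upper is None or gb <= upper:
            break
        lower = upper
    return cost


if __name__ == "__main__":
    gb = 100_000  # 100 TB/month, treating 1 TB as 1,000 GB
    print("flat $0.09/GB:", round(gb * 0.09))
    print("tiered:", round(tiered_egress_cost(gb)))
    print("Bunny at $0.005/GB:", round(gb * 0.005))
```

Under these assumptions, 100 TB/month comes to about $7.8k tiered versus $9k flat. The same traffic at $0.005/GB would be about $500. Either way, the gap to a $0/GB CDN is what matters.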
Tactics:
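- CDN in front of everything user-facing (see Cloudflare Workers + D1).
- Keep cross-region replication within free tiers where possible, and minimize other cross-region traffic, which is billed per GB.
- VPC endpoints so traffic to AWS services such as S3 doesn't pass through a NAT Gateway, which charges per GB processed.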
- Compress responses (gzip/br).
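You can estimate what compression saves on your own payloads with Python's standard gzip module. Brotli usually compresses a bit better but needs a third-party package. The JSON body below is a made-up example. Repetitive text like this compresses very well, while already-compressed assets such as JPEGs or video barely shrink:

```python
import gzip
import json


def gzip_ratio(payload: bytes, level: int = 6) -> float:
    """Compressed size divided by original size, using gzip at `level` (1-9)."""
    return len(gzip.compress(payload, compresslevel=level)) / len(payload)


if __name__ == "__main__":
    # Hypothetical API response: 1,000 similar JSON records.
    rows = [{"id": i, "status": "active", "region": "us-east-1"} for i in range(1000)]
    body = json.dumps(rows).encode()
    ratio = gzip_ratio(body)
    print(f"{len(body):,} bytes; gzip sends {ratio:.0%} of that")
```

Egress is billed on bytes actually sent, so for compressible responses the ratio carries over more or less directly to the transfer line item.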
Idle resource cleanup
Tag everything; sweep:
- Unattached EBS volumes.
- Snapshots from 2021.
- Test EC2s left running.
- NAT Gateways for VPCs nobody uses anymore.
- Idle ElastiCache / RDS dev clusters at night.
Tools: AWS Trusted Advisor, AWS Cost Explorer, Cloud Custodian, infracost in CI.
A monthly cleanup typically finds 5–15% waste.
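Here is a sketch of the sweep logic for two of the items above: unattached EBS volumes and old snapshots. It works on dicts shaped like the Volumes and Snapshots entries returned by EC2's DescribeVolumes and DescribeSnapshots (VolumeId, State, Size; SnapshotId, StartTime, VolumeSize). The per-GB-month prices and the one-year cutoff are assumptions. The snapshot figure is an upper bound because snapshots are stored incrementally:

```python
from datetime import datetime, timedelta, timezone

# Assumed $/GB-month; check current pricing for your region and volume type.
ASSUMED_VOLUME_GB_MONTH = 0.08
ASSUMED_SNAPSHOT_GB_MONTH = 0.05


def sweep(volumes, snapshots, now=None, snapshot_max_age_days=365):
    """Flag unattached volumes and old snapshots.

    volumes:   dicts shaped like DescribeVolumes "Volumes" entries
               (VolumeId, State, Size in GiB).
    snapshots: dicts shaped like DescribeSnapshots "Snapshots" entries
               (SnapshotId, StartTime as a timezone-aware datetime, VolumeSize in GiB).
    Returns (volume_ids, snapshot_ids, estimated_monthly_usd).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=snapshot_max_age_days)
    idle = [v for v in volumes if v["State"] == "available"]  # "available" = not attached
    old = [s for s in snapshots if s["StartTime"] < cutoff]
    # Snapshots are incremental, so VolumeSize overstates their real cost (upper bound).
    estimate = (sum(v["Size"] for v in idle) * ASSUMED_VOLUME_GB_MONTH
                + sum(s["VolumeSize"] for s in old) * ASSUMED_SNAPSHOT_GB_MONTH)
    return [v["VolumeId"] for v in idle], [s["SnapshotId"] for s in old], estimate


if __name__ == "__main__":
    # Hypothetical inventory.
    vols = [{"VolumeId": "vol-a", "State": "available", "Size": 100},
            {"VolumeId": "vol-b", "State": "in-use", "Size": 500}]
    snaps = [{"SnapshotId": "snap-old", "VolumeSize": 200,
              "StartTime": datetime(2021, 6, 1, tzinfo=timezone.utc)}]
    print(sweep(vols, snaps))
```

With boto3, the inputs would come from calls like ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"] and ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"], paginated in larger accounts. Report first and delete after review.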
Database tier-down
A db.r6g.4xlarge for dev / staging is excessive. Right-size by environment:
- Prod: real size.
- Staging: 1/2 of prod.
- Dev: smallest viable.
Use auto-shutdown for dev/staging during off-hours.
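The off-hours math is worth spelling out. This minimal sketch assumes a hypothetical weekday 08:00–20:00 schedule in local time. Under that schedule the instance runs 60 of 168 hours a week, so compute cost drops by roughly 64%. Storage is still billed while an instance is stopped. A stopped RDS instance also starts again automatically after seven days, so the scheduler has to keep stopping it:

```python
from datetime import datetime, timedelta


def should_run(ts: datetime, start_hour: int = 8, end_hour: int = 20) -> bool:
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive)."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def weekly_running_hours(start_hour: int = 8, end_hour: int = 20) -> int:
    """Count scheduled hours in one week (any 168 consecutive hours will do)."""
    start = datetime(2024, 1, 1)  # a Monday, though the count doesn't depend on it
    return sum(should_run(start + timedelta(hours=h), start_hour, end_hour)
               for h in range(7 * 24))


if __name__ == "__main__":
    hours = weekly_running_hours()
    print(f"runs {hours} of 168 hours/week; compute saved ~{1 - hours / 168:.0%}")
```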
CI/CD costs
Build minutes add up. Caching (see CI/CD Best Practices) is the biggest lever. With per-minute billing, a 10-minute build dropping to 2 minutes cuts CI cost by 80%.
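Caching only pays off if cache hits are frequent, and that mostly comes down to the key. A common pattern derives the key from the runner OS plus a hash of the dependency lockfiles, so the cache is reused until dependencies actually change. This is the same idea as keying on hashFiles(...) of a lockfile in GitHub Actions. Here is a minimal sketch, where cache_key and the file names are hypothetical:

```python
import hashlib


def cache_key(prefix: str, runner_os: str, lockfiles: dict[str, bytes]) -> str:
    """Deterministic dependency-cache key.

    lockfiles maps file name -> file contents. The key changes only when a
    lockfile's contents (or the set of lockfiles) change.
    """
    digest = hashlib.sha256()
    for name in sorted(lockfiles):           # sort so dict order doesn't matter
        digest.update(name.encode() + b"\0")
        digest.update(lockfiles[name] + b"\0")
    return f"{prefix}-{runner_os}-{digest.hexdigest()[:16]}"


if __name__ == "__main__":
    # Hypothetical lockfile contents; in CI you would read the real files.
    locks = {"poetry.lock": b"requests==2.32.3\n", "package-lock.json": b"{}"}
    print(cache_key("deps", "linux", locks))
```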
Container right-sizing
Resource requests in K8s drive scheduling. Most teams set them once and forget.
- Use VPA (Vertical Pod Autoscaler) recommendations.
- Set requests to actual usage and limits higher to allow bursts.
- Monitor and adjust.
For Kubernetes specifics, see Kubernetes in 2026.
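Here is a minimal sketch of turning observed usage into requests and limits. It is not VPA's algorithm. The p90 percentile, 15% margin, and limit multipliers are hypothetical defaults. The samples would come from your metrics stack, for example per-container CPU in millicores and memory in MiB:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct in 0-100) of a non-empty list of samples."""
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(pct * len(ordered) / 100)) - 1]


def suggest_resources(cpu_millicores: list[float], memory_mib: list[float],
                      pct: float = 90, margin_pct: int = 15,
                      cpu_limit_pct: int = 200, mem_limit_pct: int = 150) -> dict:
    """Suggest a container resources block from observed usage.

    requests = p90 usage + 15% margin; limits = a percentage of requests.
    These defaults are hypothetical, not the Vertical Pod Autoscaler's algorithm.
    """
    cpu_req = math.ceil(percentile(cpu_millicores, pct) * (100 + margin_pct) / 100)
    mem_req = math.ceil(percentile(memory_mib, pct) * (100 + margin_pct) / 100)
    return {
        "requests": {"cpu": f"{cpu_req}m", "memory": f"{mem_req}Mi"},
        "limits": {
            "cpu": f"{math.ceil(cpu_req * cpu_limit_pct / 100)}m",
            "memory": f"{math.ceil(mem_req * mem_limit_pct / 100)}Mi",
        },
    }


if __name__ == "__main__":
    # Hypothetical samples: CPU mostly 100m with bursts to 200m; memory flat at 400Mi.
    print(suggest_resources([100] * 80 + [200] * 20, [400] * 100))
```

The returned dict has the same shape as a container's resources block, so you can paste it into a manifest after review.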
Observability cost
Datadog / New Relic / Splunk bills can match cloud bills.
- Sample traces (OTel). 10% sampling is plenty for most.
- Log retention: hot 14 days, cold 90 days, then S3 archive.
- High-cardinality metrics are expensive — drop unused dimensions.
For self-hosted alternatives, see Observability 2.0.
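Ratio sampling works across services because the keep/drop decision is a pure function of the trace ID, so every service keeps the same traces. OpenTelemetry specifies this as the TraceIdRatioBased sampler (opentelemetry.sdk.trace.sampling.TraceIdRatioBased in the Python SDK). The class below is a hypothetical stand-alone sketch of the same idea, not the SDK's code:

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1


class RatioSampler:
    """Deterministic trace-ID ratio sampler (a sketch of the TraceIdRatioBased idea).

    Compares the low 64 bits of the 128-bit trace ID with rate * 2**64, so any
    service seeing the same trace ID makes the same keep/drop decision.
    """

    def __init__(self, rate: float) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be in [0, 1]")
        self.bound = round(rate * (TRACE_ID_LIMIT + 1))

    def should_sample(self, trace_id: int) -> bool:
        return (trace_id & TRACE_ID_LIMIT) < self.bound


if __name__ == "__main__":
    sampler = RatioSampler(0.1)
    rng = random.Random(42)                      # hypothetical random trace IDs
    ids = [rng.getrandbits(128) for _ in range(100_000)]
    kept = sum(sampler.should_sample(t) for t in ids) / len(ids)
    print(f"kept {kept:.1%} of traces")
```

In practice you wouldn't hand-roll this. Setting OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 configures the SDK's built-in sampler. Head sampling drops errors along with everything else. If you need every failing trace, tail sampling in the OpenTelemetry Collector is the usual complement.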
LLM costs
For AI-heavy apps:
- Prompt caching (up to 90% off cached input tokens, depending on provider).
- Model routing (see LLM Routing in 2026).
- Batching (50% off via batch APIs).
- Consider self-hosting once API spend passes ~$30k/month.
See LLM Cost Optimization in 2026.
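This back-of-the-envelope model shows how the discounts above combine. It assumes hypothetical prices of $3 per million input tokens and $15 per million output tokens, cache reads at 10% of the input price, and a 50% batch discount. It ignores cache-write surcharges. Whether caching and batch discounts stack depends on the provider:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Hypothetical prices; substitute your provider's current rates."""
    input_per_m: float = 3.00            # $ per million input tokens
    output_per_m: float = 15.00          # $ per million output tokens
    cache_read_multiplier: float = 0.10  # cached input billed at 10% ("90% off")
    batch_multiplier: float = 0.50       # batch APIs at 50% off


def llm_cost(requests: int, input_tokens: int, cached_tokens: int,
             output_tokens: int, prices: Prices = Prices(), batch: bool = False) -> float:
    """Total $ for `requests` calls with the given per-request token counts.

    cached_tokens is the part of input_tokens read from the prompt cache.
    Cache-write surcharges are ignored.
    """
    uncached = (input_tokens - cached_tokens) * prices.input_per_m
    cached = cached_tokens * prices.input_per_m * prices.cache_read_multiplier
    output = output_tokens * prices.output_per_m
    cost = requests * (uncached + cached + output) / 1_000_000
    return cost * prices.batch_multiplier if batch else cost


if __name__ == "__main__":
    # Hypothetical month: 100k requests, 10k-token prompts (8k cacheable), 500-token answers.
    print("no caching:", round(llm_cost(100_000, 10_000, 0, 500), 2))
    print("caching:", round(llm_cost(100_000, 10_000, 8_000, 500), 2))
    print("caching + batch:", round(llm_cost(100_000, 10_000, 8_000, 500, batch=True), 2))
```

In this hypothetical month, caching cuts the bill from $3,750 to about $1,590. Routing through a batch API as well roughly halves it again, to about $795.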
FinOps culture
Tools alone don’t save money. Culture does:
- Per-team cost dashboards. Visible. Discussed.
- Architectural reviews include cost.
- PR template includes “estimated cost impact.”
- Quarterly waste-audit week.
Without culture, costs creep back. With it, savings stick.
Read this next
- Kubernetes in 2026
- LLM Cost Optimization in 2026
- LLM Routing in 2026 — Use Haiku to Save 80%
- Cloudflare Workers + D1 + Durable Objects
If you want my cost-audit checklist + monthly review template, it's at rajpoot.dev.
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.