Cloud bills compound. A team that started at $5k/month can be at $50k/month 18 months later without doing anything wrong. The cost-optimization playbook isn’t exotic: it’s a few boring tactics applied consistently. This post is the working set.

Right-sizing (the biggest lever)

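Most teams over-provision. Collect two weeks of utilization metrics and look at both the average and the peak. If CPU averages 15% and peaks stay well below capacity, you are likely running 2–4× more than you need.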
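To make the 2–4× rule of thumb concrete, here is a minimal Python sketch. The sample readings, the nearest-rank percentile, and the 60% target utilization are my assumptions for illustration, not values from any provider’s tooling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def overprovision_factor(cpu_percent_samples, target_utilization=60.0):
    """How many times larger the instance is than the p95 load needs.

    target_utilization is an assumed headroom target, not a standard.
    """
    return target_utilization / percentile(cpu_percent_samples, 95)


# Hypothetical CPU % readings collected over the observation window.
samples = [12, 15, 14, 18, 16, 13, 20, 15, 17, 14]
average = sum(samples) / len(samples)
factor = overprovision_factor(samples)
print(f"avg={average:.1f}% p95={percentile(samples, 95)}% factor={factor:.1f}x")
```

With these made-up samples the average is about 15% and the factor comes out at 3×, which sits inside the 2–4× range. Sizing against p95 rather than the average is a deliberate choice here: it keeps headroom for the peaks.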

EC2 / Cloud Run / GKE: look at actual CPU + memory.
RDS: connections + CPU + memory.
ElastiCache: memory + CPU.

Tools: AWS Compute Optimizer, GCP Recommender, K8s VPA recommendations.

A right-sizing pass typically cuts 30%+ of compute spend.

Savings plans / reserved instances

For predictable steady-state workloads:

  • AWS Savings Plans (Compute): roughly 27% off for a 1-year commit and roughly 54% off for a 3-year commit; exact rates vary by instance family, region, and payment option.
  • GCP Committed Use Discounts: similar savings.
  • Azure Reserved Instances: similar.

Commit only to your true baseline (the load you’ll have regardless). Leave the variable portion on on-demand.

For most production: 60–80% of compute on commitments, 20–40% on-demand for flexibility.
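The effect of a partial commitment is easy to underestimate, so here is a rough sketch of the arithmetic. The function name and the $50k bill are hypothetical; the 70% share and 27% discount are just one point inside the ranges quoted above.

```python
def blended_monthly_cost(on_demand_cost, committed_share, discount):
    """Monthly bill when part of the compute is covered by a commitment.

    on_demand_cost: what the whole workload would cost at on-demand rates.
    committed_share: fraction of compute covered by the commitment (0..1).
    discount: commitment discount as a fraction (0.27 means 27% off).
    """
    committed = on_demand_cost * committed_share * (1 - discount)
    flexible = on_demand_cost * (1 - committed_share)
    return committed + flexible


# Hypothetical $50k/month of compute, 70% committed at 27% off.
before = 50_000
after = blended_monthly_cost(before, committed_share=0.70, discount=0.27)
print(f"after=${after:,.0f} overall_savings={1 - after / before:.1%}")
```

Note that the overall saving is the discount multiplied by the committed share, so a 27% discount on 70% of the fleet is about 19% off the total bill, not 27%.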

Spot instances

For interruptible work (batch jobs, CI, async workers, training):

  • EC2 Spot: up to 90% off.
  • GCP Spot VMs / Preemptibles: similar.
  • K8s with Karpenter: schedules pods on spot pools automatically.

Apps must handle SIGTERM (graceful shutdown). Otherwise interruptions cause failures.
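A minimal sketch of what graceful shutdown can look like for a worker loop, using only Python’s standard library. The `run_worker`, `fetch_job`, and `process_job` names are hypothetical; the idea is simply to finish the current job and stop picking up new ones once SIGTERM arrives.

```python
import signal
import threading

# Set once SIGTERM arrives; the worker loop checks it between jobs.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record the request, do no heavy work here."""
    shutdown_requested.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(fetch_job, process_job, poll_seconds=1.0):
    """Process jobs until shutdown is requested; return how many finished.

    fetch_job() returns a job or None when the queue is empty.
    The job in flight always completes before the loop exits.
    """
    done = 0
    while not shutdown_requested.is_set():
        job = fetch_job()
        if job is None:
            # Sleep, but wake up immediately if shutdown is requested.
            shutdown_requested.wait(poll_seconds)
            continue
        process_job(job)
        done += 1
    return done
```

This only works if a single job reliably finishes inside the interruption notice window. Longer jobs need checkpointing or a queue that redelivers unacknowledged work.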

Egress reduction

Egress is the silent killer:

  • AWS: about $0.09/GB out to the internet at the first pricing tier. At that flat rate, 100 TB/month ≈ $9k.
  • Cloudflare: $0/GB. Bunny: $0.005/GB.

Tactics:

  • CDN in front of everything user-facing (see Cloudflare Workers + D1).
  • Keep cross-region replication within any free transfer allowances, and minimize other cross-region traffic.
  • VPC endpoints so internal traffic doesn’t egress through NAT.
  • Compress responses (gzip/br).
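To see how much the CDN and compression tactics above are worth, a back-of-the-envelope sketch helps. It assumes a flat per-GB rate (real pricing is tiered), a hypothetical 90% cache hit ratio, and a made-up JSON payload for measuring gzip; none of these numbers come from a real workload.

```python
import gzip
import json


def monthly_egress_cost(total_gb, rate_per_gb, cdn_hit_ratio=0.0, compression_ratio=1.0):
    """Origin egress cost after CDN offload and compression.

    Assumes a flat per-GB rate and that bytes served from the CDN
    cache never leave the origin. CDN fees are not included.
    """
    origin_gb = total_gb * (1 - cdn_hit_ratio) / compression_ratio
    return origin_gb * rate_per_gb


# Measure a compression ratio on a made-up, repetitive JSON payload.
payload = json.dumps([{"id": i, "status": "active"} for i in range(1000)]).encode()
ratio = len(payload) / len(gzip.compress(payload))

baseline = monthly_egress_cost(100_000, 0.09)  # 100 TB at $0.09/GB
with_cdn = monthly_egress_cost(100_000, 0.09, cdn_hit_ratio=0.9)
print(f"baseline=${baseline:,.0f} with_cdn=${with_cdn:,.0f} gzip_ratio={ratio:.1f}x")
```

The measured gzip ratio is specific to this toy payload. Measure your own responses before plugging a compression ratio into the cost function.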

Idle resource cleanup

Tag everything; sweep:

  • Unattached EBS volumes.
  • Snapshots from 2021.
  • Test EC2s left running.
  • NAT Gateways for VPCs nobody uses anymore.
  • Idle ElastiCache / RDS dev clusters at night.

Tools: AWS Trusted Advisor, AWS Cost Explorer, Cloud Custodian, infracost in CI.

A monthly cleanup typically finds 5–15% waste.
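As one example of a sweep, here is a small sketch that flags unattached EBS volumes older than a cutoff. It works on plain dicts shaped like the `Volumes` entries returned by EC2’s DescribeVolumes (with boto3, `boto3.client("ec2").describe_volumes()["Volumes"]`, pagination ignored). The function name and the 30-day threshold are my assumptions.

```python
from datetime import datetime, timedelta, timezone


def find_stale_volumes(volumes, now, min_age_days=30):
    """Return IDs of unattached volumes older than min_age_days.

    volumes: dicts with "VolumeId", "State", and a timezone-aware
    "CreateTime", as in EC2 DescribeVolumes output.
    A State of "available" means the volume is not attached.
    """
    cutoff = now - timedelta(days=min_age_days)
    return [
        v["VolumeId"]
        for v in volumes
        if v["State"] == "available" and v["CreateTime"] < cutoff
    ]


# Hypothetical inventory for illustration.
utc = timezone.utc
inventory = [
    {"VolumeId": "vol-old", "State": "available", "CreateTime": datetime(2024, 10, 1, tzinfo=utc)},
    {"VolumeId": "vol-attached", "State": "in-use", "CreateTime": datetime(2024, 1, 1, tzinfo=utc)},
    {"VolumeId": "vol-new", "State": "available", "CreateTime": datetime(2024, 12, 20, tzinfo=utc)},
]
print(find_stale_volumes(inventory, now=datetime(2025, 1, 1, tzinfo=utc)))
```

Reporting candidates rather than deleting them is intentional. A human (or a tag check) should confirm before anything is removed.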

Database tier-down

A db.r6g.4xlarge for dev / staging is excessive. Right-size by environment:

  • Prod: real size.
  • Staging: 1/2 of prod.
  • Dev: smallest viable.

Use auto-shutdown for dev/staging during off-hours.
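A sketch of the scheduling rule behind off-hours shutdown, plus the fraction of the week an environment stays up. The 08:00–20:00 weekday window is an assumption; whatever actually stops and starts the instances (a cron job, a scheduled function) would call something like `should_be_running`.

```python
from datetime import datetime


def should_be_running(now, start_hour=8, end_hour=20):
    """True on weekdays between start_hour and end_hour (local time)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= now.hour < end_hour


def weekly_uptime_fraction(start_hour=8, end_hour=20, weekdays=5):
    """Share of the 168-hour week the environment is up."""
    return (end_hour - start_hour) * weekdays / (24 * 7)


uptime = weekly_uptime_fraction()
print(f"uptime={uptime:.0%} of the week, so {1 - uptime:.0%} of instance hours avoided")
```

With the assumed window the environment runs 60 of 168 hours, so roughly 64% of instance hours go away. Storage is still billed while instances are stopped, so the saving on the total bill is smaller.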

CI/CD costs

Build minutes add up. Caching (see CI/CD Best Practices) is the biggest lever. A 10-minute build that drops to 2 minutes cuts billed build minutes by 80%.

Container right-sizing

Resource requests in K8s drive scheduling. Most teams set them once and forget.

  • Use VPA (Vertical Pod Autoscaler) recommendations.
  • Set requests to actual usage; limits higher to allow bursts.
  • Monitor and adjust.
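One way to turn observed usage into requests and limits, sketched in Python. Using p95 as “actual usage” and a 2× burst factor for limits are my assumptions; VPA computes its own recommendations differently. The output uses Kubernetes quantity strings (`m` for millicores, `Mi` for mebibytes).

```python
import math


def recommend_resources(cpu_millicores, memory_mib, burst_factor=2.0):
    """Suggest requests at p95 of observed usage and limits above that.

    cpu_millicores / memory_mib: non-empty lists of usage samples.
    burst_factor: assumed multiplier from request to limit.
    """
    def p95(samples):
        ordered = sorted(samples)
        rank = max(1, math.ceil(95 * len(ordered) / 100))
        return ordered[rank - 1]

    cpu_req = math.ceil(p95(cpu_millicores))
    mem_req = math.ceil(p95(memory_mib))
    return {
        "requests": {"cpu": f"{cpu_req}m", "memory": f"{mem_req}Mi"},
        "limits": {
            "cpu": f"{math.ceil(cpu_req * burst_factor)}m",
            "memory": f"{math.ceil(mem_req * burst_factor)}Mi",
        },
    }


# Hypothetical usage samples for one container.
print(recommend_resources([100, 120, 150, 200, 180], [256, 300, 280, 310, 290]))
```

The returned dict matches the shape of a container’s `resources` block, so it can be dumped into a manifest or compared against what is currently deployed.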

For Kubernetes specifics.

Observability cost

Datadog / New Relic / Splunk bills can match cloud bills.

  • Sample traces (OTel). 10% sampling is plenty for most.
  • Log retention: hot 14 days, cold 90 days, then S3 archive.
  • High-cardinality metrics are expensive — drop unused dimensions.
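For intuition on how 10% trace sampling can stay consistent across services, here is a standard-library sketch of a deterministic, trace-ID-based decision. It is not the OpenTelemetry implementation (the OTel SDKs ship their own ratio-based sampler); `keep_trace` is a hypothetical helper that shows the idea.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float = 0.10) -> bool:
    """Decide whether to keep a trace, deterministically per trace ID.

    The trace ID is hashed into a number in [0, 1); the trace is kept
    when that number falls below sample_rate. Every service hashing
    the same ID reaches the same decision, so traces stay complete.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Hypothetical trace IDs; about 10% should be kept.
kept = sum(keep_trace(f"trace-{i}") for i in range(10_000))
print(f"kept {kept} of 10000 traces")
```

Deciding from the trace ID rather than a random number per span is the important part: a random choice in each service would leave most traces with missing spans.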

For self-hosted alternatives, see Observability 2.0.

LLM costs

For AI-heavy apps:

  • Prompt caching (up to 90% off cached input tokens, depending on the provider).
  • Model routing (see LLM Routing in 2026).
  • Batching (50% off via batch APIs).
  • Consider self-hosting above ~$30k/month in API spend.
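A rough cost model for the caching and batching discounts above. The token volumes and per-million prices are hypothetical, and the way the two discounts stack (cache discount on input tokens first, then the batch discount on the batched share of the total) is a simplifying assumption; check your provider’s pricing rules.

```python
def llm_monthly_cost(
    input_tokens,
    output_tokens,
    input_price_per_m,
    output_price_per_m,
    cached_fraction=0.0,
    cache_discount=0.90,
    batch_fraction=0.0,
    batch_discount=0.50,
):
    """Estimated monthly spend with prompt caching and batch discounts.

    cached_fraction: share of input tokens served from the prompt cache.
    batch_fraction: share of traffic sent through a batch API.
    """
    input_cost = input_tokens / 1e6 * input_price_per_m
    input_cost *= 1 - cached_fraction * cache_discount
    output_cost = output_tokens / 1e6 * output_price_per_m
    return (input_cost + output_cost) * (1 - batch_fraction * batch_discount)


# Hypothetical month: 1B input tokens at $3/M, 200M output tokens at $15/M.
baseline = llm_monthly_cost(1_000_000_000, 200_000_000, 3.0, 15.0)
optimized = llm_monthly_cost(
    1_000_000_000, 200_000_000, 3.0, 15.0,
    cached_fraction=0.5, batch_fraction=0.2,
)
print(f"baseline=${baseline:,.0f} optimized=${optimized:,.0f}")
```

In this made-up scenario the bill drops from $6,000 to about $4,185. Caching only touches input tokens, so output-heavy workloads benefit more from routing and batching.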

See LLM Cost Optimization in 2026.

FinOps culture

Tools alone don’t save money. Culture does:

  • Per-team cost dashboards. Visible. Discussed.
  • Architectural reviews include cost.
  • PR template includes “estimated cost impact.”
  • Quarterly waste-audit week.

Without culture, costs creep back. With it, savings stick.

Read this next

If you want my cost-audit checklist + monthly review template, it’s at rajpoot.dev.


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.