Infrastructure as code is mandatory in 2026; doing it badly is worse than not doing it. This post is the working set of practices.

Repo structure

infra/
├── modules/                 # reusable building blocks
│   ├── postgres/
│   ├── kubernetes-cluster/
│   └── service/
├── environments/
│   ├── dev/
│   │   └── main.tf          # uses modules
│   ├── staging/
│   │   └── main.tf
│   └── prod/
│       └── main.tf
└── README.md

Each environment has its own state. Modules are versioned; environments pin module versions.
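As a rough sketch of what pinning looks like from an environment, assuming the modules are consumed through a Git source with a tag: the organization URL, the tag name, and the argument values below are placeholders, not taken from a real repository.

```hcl
# infra/environments/prod/main.tf (illustrative)
module "orders_db" {
  # "//modules/postgres" selects the subdirectory; "?ref=" pins the revision.
  # example-org/infra and the tag postgres-v1.2.0 are hypothetical.
  source = "git::https://github.com/example-org/infra.git//modules/postgres?ref=postgres-v1.2.0"

  name              = "orders-prod"
  instance_class    = "db.t4g.medium"
  allocated_storage = 100
  tags              = { Service = "orders" }
}
```

Dev can move to a newer tag first; prod follows only after the change has been exercised in the lower environments.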

Remote state with locking

# Terraform / OpenTofu
terraform {
  backend "s3" {
    bucket         = "my-tfstate"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "tfstate-locks"
    encrypt        = true
  }
}

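S3 stores the state; the DynamoDB table holds the lock that prevents concurrent applies. Never let two engineers run apply on prod at the same time. Recent Terraform and OpenTofu releases also support S3-native locking via use_lockfile, so check what your version offers before creating the table.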
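The bucket and lock table have to exist before the backend can use them. A minimal bootstrap sketch, assuming it is applied once from a separate configuration; the resource names reuse the values from the backend block, and enabling bucket versioning is my own suggestion:

```hcl
# Bootstrap for the backend above (apply once, separately from the
# environments that use it).
resource "aws_s3_bucket" "tfstate" {
  bucket = "my-tfstate"
}

# Versioning makes it possible to recover an earlier state file.
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "tfstate_locks" {
  name         = "tfstate-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID" # the s3 backend expects a string key with this name

  attribute {
    name = "LockID"
    type = "S"
  }
}
```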

For Pulumi: Pulumi Cloud or self-hosted backend with similar guarantees.

Module design

# modules/postgres/main.tf
variable "name" { type = string }
variable "instance_class" {
  type    = string
  default = "db.t4g.micro"
}

variable "allocated_storage" {
  type    = number
  default = 20
}

variable "tags" {
  type    = map(string)
  default = {}
}

resource "aws_db_instance" "this" {
  identifier        = var.name
  instance_class    = var.instance_class
  allocated_storage = var.allocated_storage
  # ...
  tags = merge(var.tags, { Module = "postgres" })
}

output "endpoint" { value = aws_db_instance.this.endpoint }

Sensible defaults, overridable. Outputs explicit. Versioned.
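Overridable defaults are safer when the module rejects values it cannot honor. A small sketch of a variant of the allocated_storage variable with a validation block; the 20 GiB floor is an assumption chosen for illustration, not a limit stated in this post:

```hcl
variable "allocated_storage" {
  type        = number
  default     = 20
  description = "Allocated storage in GiB."

  validation {
    # Assumed lower bound; adjust to the storage type you actually use.
    condition     = var.allocated_storage >= 20
    error_message = "allocated_storage must be at least 20 GiB."
  }
}
```

A bad override then fails at plan time with a readable message instead of at apply time with a provider error.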

Tagging discipline

Every resource tagged:

provider "aws" {
  default_tags {
    tags = {
      Owner       = "team-data"
      Environment = "prod"
      ManagedBy   = "terraform"
      CostCenter  = "engineering-data"
      Repo        = "infra"
    }
  }
}

Tags drive cost reports, ownership lookups, automated cleanup. Without tags, you can’t answer “whose is this?”

Code review

Treat IaC like code:

  • PRs reviewed.
  • plan output included.
  • No direct apply to prod outside CI.

GitOps-style: the PR triggers plan; the merge triggers apply.

CI pipeline

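on:
  pull_request:
    paths: ["infra/**"]
  push:
    branches: [main]
    paths: ["infra/**"]

# Cloud credentials (for example an OIDC role) are omitted from both jobs.
jobs:
  plan:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        env: [dev, staging, prod]
    steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
      - run: tofu init
        working-directory: infra/environments/${{ matrix.env }}
      - run: tofu plan -out=plan.bin
        working-directory: infra/environments/${{ matrix.env }}
      - uses: actions/upload-artifact@v4
        with:
          name: plan-${{ matrix.env }}   # artifact names must be unique per matrix job
          path: infra/environments/${{ matrix.env }}/plan.bin

  apply:
    # Runs only for pushes to main; dev and staging would follow the same pattern.
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    needs: [plan]
    environment: production       # required-reviewer gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
      - uses: actions/download-artifact@v4
        with:
          name: plan-prod
          path: infra/environments/prod
      - run: tofu init
        working-directory: infra/environments/prod
      - run: tofu apply plan.bin
        working-directory: infra/environments/prod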

Plan on PR; apply on main only with manual approval for prod.

Drift detection

Schedule weekly plan; alert on drift:

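on:
  schedule:
    - cron: "0 8 * * 1"      # Mondays at 08:00 UTC

jobs:
  drift:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        env: [dev, staging, prod]
    steps:
      - uses: actions/checkout@v4
      - uses: opentofu/setup-opentofu@v1
      - run: tofu init
        working-directory: infra/environments/${{ matrix.env }}
      - run: tofu plan -detailed-exitcode
        working-directory: infra/environments/${{ matrix.env }}
        # exit 0 = no diff; 2 = diff (drift); 1 = error
        # any non-zero exit fails the job, and the failed run is the alert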

Drift is usually someone clicking in the console. Investigate; either fix the IaC or revert the manual change.
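When the drift is a resource someone created by hand and you decide to keep it, "fix the IaC" can mean adopting it into state. A sketch using an import block, assuming a Terraform or OpenTofu version that supports them; the bucket name and resource address are hypothetical:

```hcl
# Adopt a console-created bucket instead of deleting and recreating it.
import {
  to = aws_s3_bucket.uploads
  id = "example-uploads-bucket" # hypothetical: the real bucket's name
}

resource "aws_s3_bucket" "uploads" {
  bucket = "example-uploads-bucket"
}
```

The next plan shows the import alongside any remaining differences, so the adoption goes through the same PR review as every other change.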

Secrets, never in IaC

Don’t put secret values in *.tf. Pull them at apply time from a secret manager:

data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password"
}

resource "aws_db_instance" "this" {
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

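The resolved value is still recorded in the state file, which is one more reason to encrypt the state bucket and restrict who can read it.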
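A different approach from the data-source lookup above is to let RDS own the master password, so Terraform never handles the value at all. This is a sketch, assuming an AWS provider version that supports manage_master_user_password; the identifier and username are placeholders:

```hcl
resource "aws_db_instance" "this" {
  identifier        = "orders-prod" # placeholder
  engine            = "postgres"
  instance_class    = "db.t4g.micro"
  allocated_storage = 20
  username          = "app_admin" # placeholder

  # RDS generates the password and stores it in Secrets Manager.
  # Cannot be combined with the password argument.
  manage_master_user_password = true
}

# Applications read the secret by ARN at runtime.
output "master_secret_arn" {
  value = aws_db_instance.this.master_user_secret[0].secret_arn
}
```

Only the ARN of the secret reaches state and outputs, not the password itself.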

Cost guardrails

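Use Infracost in CI to flag cost-changing PRs. The --compare-to file is a baseline generated from the base branch with infracost breakdown --format=json --out-file=infracost-base.json: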

- uses: infracost/actions/setup@v3
- run: infracost diff --path=infra/environments/prod --compare-to=infracost-base.json

Flags “this PR adds $5k/month” before merging. Catches accidents.

Common mistakes

1. One big state file

Everything in one state. Apply takes hours. One bad change blocks all others. Split by environment, by domain.
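Splitting by domain means one stack sometimes needs values from another. One way to wire that up is the terraform_remote_state data source. A sketch, assuming a separate network stack that declares a private_subnet_ids output; the state key and all names are hypothetical:

```hcl
# Read outputs from the network stack's state (read-only).
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-tfstate"
    key    = "prod/network/terraform.tfstate" # hypothetical key for the network stack
    region = "us-east-1"
  }
}

resource "aws_db_subnet_group" "this" {
  name       = "orders-prod"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

Only values the network stack exposes as outputs are visible here, which keeps the contract between stacks explicit.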

2. No code review on IaC

PRs ship to prod with no second pair of eyes. Disasters.

3. Hand-rolled cloud changes

“Just this once” — six months later, IaC and reality drift, deploys fail, mystery bugs. Always through IaC.

4. No tagging

Cost reports useless. Resource ownership unclear.

5. Silently mutable defaults

A module’s default changes; existing environments silently get the new behavior. Pin module versions.

What I’d ship today

For a new IaC setup:

  • OpenTofu (see Pulumi vs Terraform vs OpenTofu).
  • Modular structure with versioned modules.
  • Per-environment state, S3 + DynamoDB locking.
  • Atlantis or GitHub Actions for plan/apply automation.
  • Tags everywhere.
  • Infracost in CI.
  • Weekly drift detection.
  • Secrets via AWS Secrets Manager / Vault.

Read this next

If you want my OpenTofu module library + CI templates, it’s at rajpoot.dev.

