Cheatsheet for running Alembic in production.

Deploy strategy

Pre-deploy Job applies migrations; app pods follow.

apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-{{ .Values.image.tag }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:{{ .Values.image.tag }}
          command: ["alembic", "upgrade", "head"]
      restartPolicy: OnFailure

Idempotent; Alembic skips already-applied.

App startup

@asynccontextmanager
async def lifespan(app):
    cfg = Config("alembic.ini")
    command.upgrade(cfg, "head")
    yield

Works for single replica. For multi-replica: race conditions.

App startup with advisory lock

async def run_migrations():
    async with engine.connect() as conn:
        await conn.execute(text("SELECT pg_advisory_xact_lock(99999)"))
        await asyncio.to_thread(_apply_migrations)

@asynccontextmanager
async def lifespan(app):
    await run_migrations()
    yield

Only one replica acquires the lock; others wait. Cheap; works.

Migrations that take hours

For really long migrations:

1. Low-traffic window.
2. Standalone Job (not in deploy critical path).
3. Monitor replica lag.
4. Have abort plan.

For non-blocking ops (CONCURRENTLY index): can run during normal hours.

Forward-only in prod

Don’t alembic downgrade -1 in prod. Roll forward with a corrective migration.

For rollback: write a forward migration that undoes the change. Apply like any other.

Stamp (mark as applied without running)

alembic stamp head             # set current = head
alembic stamp <rev>            # specific

Use cases:

  • Bootstrapping from Base.metadata.create_all().
  • Recovering after manual schema fixes.
  • Skipping a migration applied out-of-band.

Recovering from inconsistent state

Migration partially applied; alembic_version not updated:

# Check state
psql -c "SELECT version_num FROM alembic_version"
psql -c "\d users"          # check actual schema

# If partial: clean up manually
psql -c "DROP INDEX IF EXISTS partial_index_left_over"

# Stamp the right version
alembic stamp <rev>

Backup before risky migrations

pg_dump -h prod -d mydb -Fc > pre-migration-$(date +%Y%m%d-%H%M%S).dump

Or use snapshot (managed DB).

Replica lag during migration

SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), pg_last_wal_replay_lsn()) FROM pg_stat_replication;

Big spike → consider pausing or splitting the migration.

Migration testing in staging

Stage with prod-shaped data:

pg_dump prod > prod-snap.sql
psql -h staging -c "DROP DATABASE app; CREATE DATABASE app"
psql -h staging -d app < prod-snap.sql

# Run new migrations
alembic upgrade head
# Time + verify

Catches “this migration is fast on dev but takes hours on real data.”

Multi-app coordinate

For monorepos / multi-app:

  • Each app has its own alembic.ini and migration tree.
  • Shared schema modules require shared migration discipline.

Production checklist

  • Tested in staging with prod-shaped data.
  • Estimated runtime documented.
  • Backup available.
  • Replica lag plan.
  • Rollback strategy (forward).
  • Comm to ops if long.
  • Migration runs as K8s Job (not multi-replica app startup).
  • Compatible with running app version (expand-contract).
  • Monitoring active.

Common mistakes

  • App startup migration with N replicas → race conditions.
  • VACUUM FULL or CREATE INDEX without CONCURRENTLY on hot tables → outage.
  • Stamping without understanding consequences → DB diverges from history.
  • Editing already-applied migration files → deployed instances diverge.

Read this next

If you want my K8s migration Job + advisory-lock pattern, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .