Cheatsheet for running Alembic in production.
Deploy strategy
K8s Job (recommended)
Pre-deploy Job applies migrations; app pods follow.
apiVersion: batch/v1
kind: Job
metadata:
name: migrate-{{ .Values.image.tag }}
annotations:
"helm.sh/hook": pre-install,pre-upgrade
spec:
template:
spec:
containers:
- name: migrate
image: myapp:{{ .Values.image.tag }}
command: ["alembic", "upgrade", "head"]
restartPolicy: OnFailure
Idempotent; Alembic skips already-applied.
App startup
@asynccontextmanager
async def lifespan(app):
cfg = Config("alembic.ini")
command.upgrade(cfg, "head")
yield
Works for single replica. For multi-replica: race conditions.
App startup with advisory lock
async def run_migrations():
async with engine.connect() as conn:
await conn.execute(text("SELECT pg_advisory_xact_lock(99999)"))
await asyncio.to_thread(_apply_migrations)
@asynccontextmanager
async def lifespan(app):
await run_migrations()
yield
Only one replica acquires the lock; others wait. Cheap; works.
Migrations that take hours
For really long migrations:
1. Low-traffic window.
2. Standalone Job (not in deploy critical path).
3. Monitor replica lag.
4. Have abort plan.
For non-blocking ops (CONCURRENTLY index): can run during normal hours.
Forward-only in prod
Don’t alembic downgrade -1 in prod. Roll forward with a corrective migration.
For rollback: write a forward migration that undoes the change. Apply like any other.
Stamp (mark as applied without running)
alembic stamp head # set current = head
alembic stamp <rev> # specific
Use cases:
- Bootstrapping from
Base.metadata.create_all(). - Recovering after manual schema fixes.
- Skipping a migration applied out-of-band.
Recovering from inconsistent state
Migration partially applied; alembic_version not updated:
# Check state
psql -c "SELECT version_num FROM alembic_version"
psql -c "\d users" # check actual schema
# If partial: clean up manually
psql -c "DROP INDEX IF EXISTS partial_index_left_over"
# Stamp the right version
alembic stamp <rev>
Backup before risky migrations
pg_dump -h prod -d mydb -Fc > pre-migration-$(date +%Y%m%d-%H%M%S).dump
Or use snapshot (managed DB).
Replica lag during migration
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), pg_last_wal_replay_lsn()) FROM pg_stat_replication;
Big spike → consider pausing or splitting the migration.
Migration testing in staging
Stage with prod-shaped data:
pg_dump prod > prod-snap.sql
psql -h staging -c "DROP DATABASE app; CREATE DATABASE app"
psql -h staging -d app < prod-snap.sql
# Run new migrations
alembic upgrade head
# Time + verify
Catches “this migration is fast on dev but takes hours on real data.”
Multi-app coordinate
For monorepos / multi-app:
- Each app has its own alembic.ini and migration tree.
- Shared schema modules require shared migration discipline.
Production checklist
- Tested in staging with prod-shaped data.
- Estimated runtime documented.
- Backup available.
- Replica lag plan.
- Rollback strategy (forward).
- Comm to ops if long.
- Migration runs as K8s Job (not multi-replica app startup).
- Compatible with running app version (expand-contract).
- Monitoring active.
Common mistakes
- App startup migration with N replicas → race conditions.
VACUUM FULLorCREATE INDEXwithout CONCURRENTLY on hot tables → outage.- Stamping without understanding consequences → DB diverges from history.
- Editing already-applied migration files → deployed instances diverge.
Read this next
If you want my K8s migration Job + advisory-lock pattern, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .