Database migrations are where many teams trip. The framework runs the SQL; the discipline of writing safe SQL is on you. This post is the working playbook.
The expand-contract pattern
For breaking schema changes, never one-shot it. Two phases:
Phase 1 (expand):
- Add new column / table / index.
- Backfill from old.
- Code writes to BOTH old and new.
- Code reads from new (with fallback to old).
Phase 2 (contract):
- Code reads/writes new only.
- Drop old column / table / index.
Each phase is its own deploy. Between phases, the system is fully functional. If something goes wrong: stop and roll back.
Example: rename a column
Bad: ALTER TABLE users RENAME COLUMN email_addr TO email; — breaks all in-flight code.
Good:
-- Migration 1
ALTER TABLE users ADD COLUMN email text;
UPDATE users SET email = email_addr; -- (in batches; see below)
-- Code now writes both, reads `email`
-- Wait, observe.
-- Migration 2 (after deploy stabilized)
ALTER TABLE users DROP COLUMN email_addr;
Two deploys. Each safe. No downtime.
Adding a NOT NULL column
Bad: ALTER TABLE users ADD COLUMN status text NOT NULL DEFAULT 'active'; — locks the table on big tables (writes all rows in one tx).
Postgres ≥ 11: DEFAULT doesn’t rewrite the table; only future inserts get the default. So ADD COLUMN ... DEFAULT 'active' NOT NULL is fast.
Otherwise:
-- 1. Add nullable
ALTER TABLE users ADD COLUMN status text;
-- 2. Backfill in batches (see below)
-- 3. Add NOT NULL after backfill
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
Backfill in batches
-- BAD: locks for hours
UPDATE users SET status = 'active';
-- GOOD: batch
DO $$
DECLARE batch_size int := 5000;
DECLARE last_id bigint := 0;
DECLARE updated int;
BEGIN
LOOP
UPDATE users
SET status = 'active'
WHERE id IN (
SELECT id FROM users WHERE id > last_id AND status IS NULL
ORDER BY id LIMIT batch_size
)
RETURNING id INTO last_id;
GET DIAGNOSTICS updated = ROW_COUNT;
EXIT WHEN updated = 0;
PERFORM pg_sleep(0.1); -- breathing room
END LOOP;
END $$;
Or do it from the application:
async def backfill():
while True:
rows = await db.fetch(
"SELECT id FROM users WHERE status IS NULL ORDER BY id LIMIT 5000"
)
if not rows: break
await db.execute(
"UPDATE users SET status = 'active' WHERE id = ANY($1)",
[r["id"] for r in rows]
)
await asyncio.sleep(0.5)
Indexes — concurrently
-- Locks the table:
CREATE INDEX idx ON huge_table (col);
-- Doesn't lock:
CREATE INDEX CONCURRENTLY idx ON huge_table (col);
Always CONCURRENTLY on production. Slower; doesn’t block writes.
For removing: DROP INDEX CONCURRENTLY.
Foreign keys
-- Locks for validation:
ALTER TABLE orders ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id);
-- Two-step (safer):
ALTER TABLE orders ADD CONSTRAINT fk_user FOREIGN KEY (user_id)
REFERENCES users(id) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT fk_user;
NOT VALID adds the constraint without checking existing rows; VALIDATE checks them with a less-blocking lock.
Big table rebuilds (pg_repack)
For reclaiming bloat, changing primary key, partitioning a non-partitioned table:
pg_repack -h db -U user -d mydb -t huge_table
pg_repack rebuilds the table online with minimal locking. Use for:
- Bloat from heavy UPDATE / DELETE.
- Compaction after column drops.
- Adding partitioning to existing tables (pre-pg_repack 1.5).
Requires the extension installed; takes time + temporary 2x disk.
Migration tools
| Lang | Notes | |
|---|---|---|
| Alembic | Python | SQLAlchemy ecosystem |
| Flyway | JVM | Mature, widely used |
| Liquibase | JVM | XML/YAML; structured |
| golang-migrate | Go | Simple |
| sqlx migrate | Rust | If using SQLx |
| Diesel | Rust | Diesel ecosystem |
| Drizzle Kit | TS | Modern; auto-gen from schema |
| Prisma Migrate | TS/JS | Prisma ecosystem |
All track migrations + versions; let you up/down. Pick the one that matches your stack.
Forward-only
In production, roll forward only. Down migrations are useful in dev; in prod, they often can’t faithfully reverse a migration that touched data.
For “rollback”: write a forward migration that reverses what you did, applied like any other migration. Treat it as new work, not “undo.”
Migrations in CI
- name: Apply migrations to test DB
run: alembic upgrade head
- name: Run tests
run: pytest
Catch broken migrations before merge.
Migrations on deploy
# In container start command:
alembic upgrade head && exec uvicorn app:app
Or run as a Job before the rolling deploy:
# Kubernetes Job that runs migrations
apiVersion: batch/v1
kind: Job
metadata: { name: migrate-{{ .Values.image.tag }} }
spec:
template:
spec:
containers:
- name: migrate
image: myapp:{{ .Values.image.tag }}
command: ["alembic", "upgrade", "head"]
restartPolicy: OnFailure
Pre-deploy: run migrations. Then roll out new pods.
Online schema change tools (MySQL)
- gh-ost (GitHub) — popular for MySQL.
- pt-online-schema-change (Percona).
For Postgres: pg_repack + concurrent indexes cover most cases.
Dependent code changes
Coordinate code and schema:
Deploy 1: schema migration (expand) + code that writes to old AND new.
Deploy 2: code that reads new only.
Deploy 3: schema migration (contract).
Don’t combine schema migration + code change that depends on it in one deploy.
Common mistakes
1. Migration runs in deploy with code that depends on it
Migration ran on instance A; instance B is on old code; code uses old schema; B errors. Always expand-contract.
2. Big UPDATEs in one statement
Hours of lock; replicas lag; everything fails. Batch.
3. CREATE INDEX without CONCURRENTLY
Production downtime. Always concurrently.
4. Adding NOT NULL to existing column
ALTER TABLE users ALTER COLUMN x SET NOT NULL; -- can take forever
Backfill first; then SET NOT NULL.
5. No migration testing
Migration has a bug; finds out in prod. Run migrations against a copy of prod data in CI.
What I’d ship today
For DB-heavy apps:
- Alembic / Flyway as the migration tool.
- Expand-contract discipline.
- CONCURRENTLY for indexes.
- Batched backfills.
- Migrations as Jobs in K8s.
- Forward-only in production.
- Production-data-shaped staging for testing.
Read this next
- Postgres Performance Tuning 2026
- Postgres Locking 2026
- Postgres Replication Topologies 2026
- Zero-Downtime Deployments
If you want my migration playbook + Alembic templates, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .