Cheatsheet for non-blocking schema changes.

Expand-contract pattern

For renames, type changes, anything risky:

Phase 1 (expand): add new column; backfill; code writes both.
Phase 2 (contract): code reads/writes new only; drop old.

Each phase is its own migration + deploy.

Example: rename column

# Migration 1 (expand)
def upgrade():
    op.add_column("users", sa.Column("email_address", sa.String(255), nullable=True))
    op.execute("UPDATE users SET email_address = email")
    # Then deploy code that writes both columns

# Migration 2 (contract; after code stabilized)
def upgrade():
    op.alter_column("users", "email_address", nullable=False)
    op.drop_column("users", "email")
    # Then deploy code that uses email_address only

CONCURRENTLY (Postgres)

def upgrade():
    op.execute("CREATE INDEX CONCURRENTLY ix_users_email ON users (email)")

def downgrade():
    op.execute("DROP INDEX CONCURRENTLY ix_users_email")

Doesn’t lock the table. Slow but non-blocking.

Config (can’t run in transaction):

# alembic.ini
[alembic]
transaction_per_migration = false

Or per-migration:

def upgrade():
    op.execute("COMMIT")
    op.execute("CREATE INDEX CONCURRENTLY ...")
    op.execute("BEGIN")     # restart for rest

Failed CONCURRENTLY index = INVALID; drop manually before retry.

NOT VALID FK (Postgres)

def upgrade():
    op.create_foreign_key(
        "fk_posts_user", "posts", "users", ["user_id"], ["id"],
        postgresql_not_valid=True,
    )
    # Later (separate migration or one-off):
    op.execute("ALTER TABLE posts VALIDATE CONSTRAINT fk_posts_user")

NOT VALID adds without checking existing rows (cheap). VALIDATE CONSTRAINT checks later with less locking.

NOT NULL on existing column

Bad:

op.alter_column("users", "status", nullable=False)    # full table scan; locks

Better (PG 11+):

# Add nullable with default
op.add_column("users", sa.Column("status", sa.String, nullable=False, server_default="active"))

Or 3-phase:

# Phase 1: add nullable
op.add_column("users", sa.Column("status", sa.String(20), nullable=True))

# Phase 2: backfill (in batches; outside Alembic for big tables)
op.execute("UPDATE users SET status = 'active' WHERE status IS NULL")

# Phase 3: set NOT NULL
op.alter_column("users", "status", nullable=False)

Batched backfill (outside Alembic for huge tables)

async def backfill():
    while True:
        async with AsyncSessionLocal() as s:
            result = await s.execute(
                text("""
                    UPDATE users SET status = 'active'
                    WHERE id IN (
                        SELECT id FROM users WHERE status IS NULL ORDER BY id LIMIT 5000
                    )
                """)
            )
            await s.commit()
            if result.rowcount == 0: break
        await asyncio.sleep(0.1)

Small per-batch locks; replicas don’t lag.

Column type change (without rewriting)

# Bad: rewrites table
op.alter_column("users", "id", type_=sa.BigInteger)

# Better: multi-step
# 1. Add new column
op.add_column("users", sa.Column("id_new", sa.BigInteger))
# 2. Backfill
# 3. Add new UNIQUE / index
# 4. Switch FKs to point to id_new
# 5. Drop old; rename

Multi-hour orchestration. Worth it for big tables.

DROP COLUMN

op.drop_column("users", "old_field")

Drops instantly (marks as deleted; vacuum reclaims). But: deploy code that doesn’t reference old_field FIRST.

Order of operations

For app + DB changes:

  1. Deploy app that reads/writes BOTH old and new (or both states).
  2. Run migration that adds new structure.
  3. Backfill in batches.
  4. Deploy app that uses new only.
  5. Run migration to drop old structure.

Replica lag

Big migrations cause replica lag. Monitor:

SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), pg_last_wal_replay_lsn()) FROM pg_stat_replication;

Pause if too far behind.

pg_repack for table rewrite

For deep changes (PK type, partitioning):

pg_repack -h prod -U app -d mydb -t huge_table

Online; rebuilds table. Not Alembic; run separately.

Reversibility

Production: forward-only. Don’t downgrade in prod.

If something goes wrong: write a corrective forward migration.

Common mistakes

  • One-shot rename / type change on big table — locks for hours.
  • CONCURRENTLY without transaction_per_migration = false.
  • NOT NULL without backfill — migration fails on existing rows.
  • Mid-flight code referencing the old shape — failed requests during deploy.

Read this next

If you want my expand-contract template + backfill helper, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .