Chapter 10, the final chapter of the Postgres-focused textbook: replication, failover, pgvector. The pieces that make Postgres production-grade.

Read replicas

Two engines:

write_engine = create_async_engine(WRITER_URL, pool_size=20, pool_pre_ping=True)
read_engine = create_async_engine(
    READER_URL,
    pool_size=40,
    pool_pre_ping=True,
    connect_args={"server_settings": {"default_transaction_read_only": "on"}},
)

WriteSession = async_sessionmaker(write_engine, expire_on_commit=False)
ReadSession = async_sessionmaker(read_engine, expire_on_commit=False)

Read engine forces read-only at session start. Writes raise a clear error.

Routing

async def get_read_db() -> AsyncSession:
    async with ReadSession() as session:
        yield session

async def get_write_db() -> AsyncSession:
    async with WriteSession() as session:
        yield session

@app.get("/users")
async def list_users(db: AsyncSession = Depends(get_read_db)):
    return (await db.execute(select(User))).scalars().all()

@app.post("/users")
async def create_user(data: UserIn, db: AsyncSession = Depends(get_write_db)):
    user = User(**data.model_dump())
    db.add(user)
    await db.commit()

Per-handler choice. Reads → replica; writes → primary.

Read-after-write

User creates a user; immediately lists; might miss it (replica lag). Workarounds:

  • Route logged-in user’s reads-after-writes to primary (session sticky for a few seconds).
  • Causal token: client sends pg_current_wal_lsn(); readers wait until replica replays past it.
  • Optimistic UI: show locally; reconcile.

For most apps: route the writer’s own immediate reads to primary; everyone else is fine on replica.

Failover

When primary fails, replica is promoted. Connections to old primary fail.

engine = create_async_engine(URL, pool_pre_ping=True, pool_recycle=300)

pool_pre_ping ensures dead connections are replaced.

For DNS-based failover: pool eventually picks up the new IP. For IP-based: connections retry against new primary.

Multi-host strings

asyncpg supports:

postgresql://app@primary:5432,replica:5432/db?target_session_attrs=read-write

Connects to whichever is read-write. Handles failover automatically.

Patroni / Stolon

For HA orchestration: Patroni manages failover with consensus. App connects to a stable endpoint (HAProxy / pgcat). Behind the scenes, Patroni promotes replicas; updates routing.

pgvector — vector embeddings

from pgvector.sqlalchemy import Vector

class Document(Base):
    __tablename__ = "documents"
    id: Mapped[int] = mapped_column(primary_key=True)
    body: Mapped[str]
    embedding: Mapped[list[float]] = mapped_column(Vector(1024))
    
    __table_args__ = (
        Index("ix_docs_embedding", "embedding", postgresql_using="hnsw", postgresql_ops={"embedding": "vector_cosine_ops"}),
    )

Requires CREATE EXTENSION vector.

Vector queries

from sqlalchemy import func

q_embedding = await embed("query text")

stmt = (
    select(Document, Document.embedding.cosine_distance(q_embedding).label("dist"))
    .order_by("dist")
    .limit(10)
)

Distance operators: <-> (L2), <=> (cosine), <#> (inner product).

For cosine similarity (1 - cosine_distance):

.order_by(Document.embedding.cosine_distance(q_embedding))

HNSW vs IVFFlat

# HNSW: higher recall, more memory
postgresql_using="hnsw"
postgresql_ops={"embedding": "vector_cosine_ops"}

# IVFFlat: smaller, faster build, slightly less recall
postgresql_using="ivfflat"
postgresql_with={"lists": 100}

HNSW: production default in 2026.

Vector + filter (hybrid)

stmt = (
    select(Document)
    .where(Document.user_id == user_id)
    .order_by(Document.embedding.cosine_distance(q_embedding))
    .limit(10)
)

Filter then ANN. For high cardinality / strict filter: WHERE first; ANN is post.

For best perf: HNSW indexes don’t natively filter; Postgres does it via WHERE plan, which can be slow. Tune via ef_search / restructure as needed.

Embedding storage

Embeddings are large (1024 floats × 4 bytes = 4KB per row). For millions: significant storage.

Half-precision (vector(1024) with halfvec / float16): coming in pgvector. For now: float32.

Real-time embedding updates

When source text changes, re-embed and update:

@event.listens_for(Document, "before_update")
def update_embedding(mapper, connection, target):
    if target.body_changed:
        target.embedding = compute_embedding_sync(target.body)

Or async via outbox + worker (preferred; computation slow).

See also

Backup and PITR

pg_basebackup -h primary -U replicator -D /backup -Ft -X stream -z -P

Plus continuous WAL archiving for point-in-time recovery.

For managed: RDS, Cloud SQL, Aurora handle this.

What you’ve learned

Across this textbook:

  1. Driver setup (asyncpg, psycopg).
  2. Postgres types (JSONB, arrays, ranges, citext).
  3. Indexes (B-tree, GIN, GiST, BRIN).
  4. Full-text search.
  5. Locking, advisory locks, SKIP LOCKED.
  6. LISTEN / NOTIFY.
  7. Partitioning.
  8. RLS for multi-tenancy.
  9. Upsert and COPY.
  10. Replication, failover, pgvector.

For DB-agnostic SQLAlchemy fundamentals: the companion textbook .

For migrations: the Alembic textbook .

For Pydantic: the Pydantic v2 textbook .

For FastAPI: the FastAPI textbook .

If you want my full Postgres + SQLAlchemy + FastAPI production starter, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .