Chapter 10, the final chapter of the Postgres-focused textbook: replication, failover, pgvector. The pieces that make Postgres production-grade.
Read replicas
Two engines:
write_engine = create_async_engine(WRITER_URL, pool_size=20, pool_pre_ping=True)
read_engine = create_async_engine(
READER_URL,
pool_size=40,
pool_pre_ping=True,
connect_args={"server_settings": {"default_transaction_read_only": "on"}},
)
WriteSession = async_sessionmaker(write_engine, expire_on_commit=False)
ReadSession = async_sessionmaker(read_engine, expire_on_commit=False)
Read engine forces read-only at session start. Writes raise a clear error.
Routing
async def get_read_db() -> AsyncSession:
async with ReadSession() as session:
yield session
async def get_write_db() -> AsyncSession:
async with WriteSession() as session:
yield session
@app.get("/users")
async def list_users(db: AsyncSession = Depends(get_read_db)):
return (await db.execute(select(User))).scalars().all()
@app.post("/users")
async def create_user(data: UserIn, db: AsyncSession = Depends(get_write_db)):
user = User(**data.model_dump())
db.add(user)
await db.commit()
Per-handler choice. Reads → replica; writes → primary.
Read-after-write
User creates a user; immediately lists; might miss it (replica lag). Workarounds:
- Route logged-in user’s reads-after-writes to primary (session sticky for a few seconds).
- Causal token: client sends
pg_current_wal_lsn(); readers wait until replica replays past it. - Optimistic UI: show locally; reconcile.
For most apps: route the writer’s own immediate reads to primary; everyone else is fine on replica.
Failover
When primary fails, replica is promoted. Connections to old primary fail.
engine = create_async_engine(URL, pool_pre_ping=True, pool_recycle=300)
pool_pre_ping ensures dead connections are replaced.
For DNS-based failover: pool eventually picks up the new IP. For IP-based: connections retry against new primary.
Multi-host strings
asyncpg supports:
postgresql://app@primary:5432,replica:5432/db?target_session_attrs=read-write
Connects to whichever is read-write. Handles failover automatically.
Patroni / Stolon
For HA orchestration: Patroni manages failover with consensus. App connects to a stable endpoint (HAProxy / pgcat). Behind the scenes, Patroni promotes replicas; updates routing.
pgvector — vector embeddings
from pgvector.sqlalchemy import Vector
class Document(Base):
__tablename__ = "documents"
id: Mapped[int] = mapped_column(primary_key=True)
body: Mapped[str]
embedding: Mapped[list[float]] = mapped_column(Vector(1024))
__table_args__ = (
Index("ix_docs_embedding", "embedding", postgresql_using="hnsw", postgresql_ops={"embedding": "vector_cosine_ops"}),
)
Requires CREATE EXTENSION vector.
Vector queries
from sqlalchemy import func
q_embedding = await embed("query text")
stmt = (
select(Document, Document.embedding.cosine_distance(q_embedding).label("dist"))
.order_by("dist")
.limit(10)
)
Distance operators: <-> (L2), <=> (cosine), <#> (inner product).
For cosine similarity (1 - cosine_distance):
.order_by(Document.embedding.cosine_distance(q_embedding))
HNSW vs IVFFlat
# HNSW: higher recall, more memory
postgresql_using="hnsw"
postgresql_ops={"embedding": "vector_cosine_ops"}
# IVFFlat: smaller, faster build, slightly less recall
postgresql_using="ivfflat"
postgresql_with={"lists": 100}
HNSW: production default in 2026.
Vector + filter (hybrid)
stmt = (
select(Document)
.where(Document.user_id == user_id)
.order_by(Document.embedding.cosine_distance(q_embedding))
.limit(10)
)
Filter then ANN. For high cardinality / strict filter: WHERE first; ANN is post.
For best perf: HNSW indexes don’t natively filter; Postgres does it via WHERE plan, which can be slow. Tune via ef_search / restructure as needed.
Embedding storage
Embeddings are large (1024 floats × 4 bytes = 4KB per row). For millions: significant storage.
Half-precision (vector(1024) with halfvec / float16): coming in pgvector. For now: float32.
Real-time embedding updates
When source text changes, re-embed and update:
@event.listens_for(Document, "before_update")
def update_embedding(mapper, connection, target):
if target.body_changed:
target.embedding = compute_embedding_sync(target.body)
Or async via outbox + worker (preferred; computation slow).
See also
Backup and PITR
pg_basebackup -h primary -U replicator -D /backup -Ft -X stream -z -P
Plus continuous WAL archiving for point-in-time recovery.
For managed: RDS, Cloud SQL, Aurora handle this.
What you’ve learned
Across this textbook:
- Driver setup (asyncpg, psycopg).
- Postgres types (JSONB, arrays, ranges, citext).
- Indexes (B-tree, GIN, GiST, BRIN).
- Full-text search.
- Locking, advisory locks, SKIP LOCKED.
- LISTEN / NOTIFY.
- Partitioning.
- RLS for multi-tenancy.
- Upsert and COPY.
- Replication, failover, pgvector.
For DB-agnostic SQLAlchemy fundamentals: the companion textbook .
For migrations: the Alembic textbook .
For Pydantic: the Pydantic v2 textbook .
For FastAPI: the FastAPI textbook .
If you want my full Postgres + SQLAlchemy + FastAPI production starter, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .