Chapter 3: index types in Postgres and how to declare them in SQLAlchemy. B-tree (default), GIN, GiST, BRIN, HASH, partial, expression, INCLUDE.
B-tree (default)
class User(Base):
email: Mapped[str] = mapped_column(unique=True, index=True)
Default: B-tree. Fast for =, <, >, <=, >=, ranges, sort orders.
For composite:
__table_args__ = (
Index("ix_users_active_email", "active", "email"),
)
Order matters: queries on active benefit; active + email benefits; email alone doesn’t.
GIN (Generalized Inverted)
For JSONB, arrays, full-text search, trigram:
__table_args__ = (
Index("ix_events_payload_gin", "payload", postgresql_using="gin"),
)
For JSONB containment specifically:
__table_args__ = (
Index("ix_events_payload_gin", "payload", postgresql_using="gin", postgresql_ops={"payload": "jsonb_path_ops"}),
)
jsonb_path_ops is faster / smaller than default for @> queries.
GIN with pg_trgm (substring search)
# Need extension: CREATE EXTENSION pg_trgm
__table_args__ = (
Index("ix_users_name_trgm", "name", postgresql_using="gin", postgresql_ops={"name": "gin_trgm_ops"}),
)
Now LIKE '%alice%' uses the index. Trigram-based fuzzy match.
GiST (Generalized Search Tree)
For ranges, geometry (PostGIS), exclusion constraints:
__table_args__ = (
Index("ix_bookings_period_gist", "period", postgresql_using="gist"),
)
Used by exclusion constraints to enforce non-overlapping ranges.
BRIN (Block Range Index)
For huge time-series tables with append-only writes:
__table_args__ = (
Index("ix_events_occurred_at_brin", "occurred_at", postgresql_using="brin"),
)
Tiny (megabytes for billions of rows); slower than B-tree but trivial maintenance.
Best when correlation between physical row order and indexed value is high (auto for append-only with timestamp).
HASH
Equality-only index:
__table_args__ = (
Index("ix_events_event_type_hash", "event_type", postgresql_using="hash"),
)
Pre-PG 10: not crash-safe. PG 10+: usable but rarely a clear win over B-tree.
Partial indexes
__table_args__ = (
Index("ix_users_active", "email", postgresql_where=text("active = true")),
)
Only indexes rows matching the predicate. Tiny; queries with the same predicate use it.
For “find active users by email”:
SELECT * FROM users WHERE active = true AND email = $1;
-- uses partial index
Excellent when most rows are not active (or are filtered).
Expression indexes
__table_args__ = (
Index("ix_users_lower_email", text("lower(email)")),
)
Indexes the expression. WHERE lower(email) = ... uses it.
For citext columns: not needed. For VARCHAR with case-insensitive search: very useful.
INCLUDE columns (covering indexes)
__table_args__ = (
Index("ix_orders_user_id", "user_id", postgresql_include=["status", "total"]),
)
status and total are stored in the index but not used for ordering. Index-only scans skip the heap.
For SELECT user_id, status, total WHERE user_id = $1: index-only scan. Faster.
CREATE INDEX CONCURRENTLY
def upgrade():
op.execute("CREATE INDEX CONCURRENTLY ix_users_email ON users (email)")
Doesn’t lock the table. Slower; doesn’t block writes. Use for hot tables.
Caveats:
- Can’t run inside a transaction. Configure Alembic accordingly.
- Failure leaves an invalid index; clean up before retry.
DROP INDEX CONCURRENTLY
def downgrade():
op.execute("DROP INDEX CONCURRENTLY ix_users_email")
Same idea for removal.
Reindexing
For bloat:
REINDEX INDEX CONCURRENTLY ix_users_email;
REINDEX TABLE CONCURRENTLY users;
CONCURRENTLY for online. Useful periodically for hot indexes.
Multi-column choice
# Query: WHERE a = ? AND b = ?
Index("ix_t_a_b", "a", "b")
vs
# Query: WHERE a = ? AND b = ? OR a = ?
# Two queries; two indexes
Index("ix_t_a", "a")
Index("ix_t_b", "b")
Composite for combined filters; separate for independent filters. Postgres can combine bitmaps from multiple indexes if needed.
Index size considerations
SELECT
schemaname, tablename, indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
Big tables = big indexes. Watch the ratio of index/table.
Unused indexes
SELECT
schemaname, tablename, indexname,
idx_scan AS scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
Indexes never scanned. Drop them; save write overhead + storage.
Order_by with NULLS FIRST/LAST
__table_args__ = (
Index("ix_users_id_nulls_last", text("id NULLS LAST")),
)
Match the ORDER BY direction in your query for best use.
Functional + partial together
__table_args__ = (
Index(
"ix_users_active_lower_email",
text("lower(email)"),
postgresql_where=text("active = true"),
),
)
Composable. Index only active users; on lowercased email.
Naming convention reminder
Naming convention chapter from the DB-agnostic textbook applies. Predictable names; Alembic-friendly.
Common mistakes
1. No index on FK
Foreign key without index → slow joins, slow cascade. Always add.
2. Index per column, no composite
Query filters by (a, b); two single-column indexes; planner combines but suboptimal. Composite often wins.
3. Over-indexing
Every column gets an index. Writes slow; storage bloats. Index by query patterns.
4. Big GIN on rarely-queried JSONB
GIN is expensive to build/maintain. Use targeted expression indexes.
5. CREATE INDEX without CONCURRENTLY in production
Locks the table. Always concurrently in production migrations.
What’s next
Chapter 4: Full-text search.
Read this next
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .