Chapter 3: index types in Postgres and how to declare them in SQLAlchemy. B-tree (default), GIN, GiST, BRIN, HASH, partial, expression, INCLUDE.

B-tree (default)

class User(Base):
    email: Mapped[str] = mapped_column(unique=True, index=True)

Default: B-tree. Fast for =, <, >, <=, >=, ranges, sort orders.

For composite:

__table_args__ = (
    Index("ix_users_active_email", "active", "email"),
)

Order matters: queries on active benefit; active + email benefits; email alone doesn’t.

GIN (Generalized Inverted)

For JSONB, arrays, full-text search, trigram:

__table_args__ = (
    Index("ix_events_payload_gin", "payload", postgresql_using="gin"),
)

For JSONB containment specifically:

__table_args__ = (
    Index("ix_events_payload_gin", "payload", postgresql_using="gin", postgresql_ops={"payload": "jsonb_path_ops"}),
)

jsonb_path_ops is faster / smaller than default for @> queries.

# Need extension: CREATE EXTENSION pg_trgm
__table_args__ = (
    Index("ix_users_name_trgm", "name", postgresql_using="gin", postgresql_ops={"name": "gin_trgm_ops"}),
)

Now LIKE '%alice%' uses the index. Trigram-based fuzzy match.

GiST (Generalized Search Tree)

For ranges, geometry (PostGIS), exclusion constraints:

__table_args__ = (
    Index("ix_bookings_period_gist", "period", postgresql_using="gist"),
)

Used by exclusion constraints to enforce non-overlapping ranges.

BRIN (Block Range Index)

For huge time-series tables with append-only writes:

__table_args__ = (
    Index("ix_events_occurred_at_brin", "occurred_at", postgresql_using="brin"),
)

Tiny (megabytes for billions of rows); slower than B-tree but trivial maintenance.

Best when correlation between physical row order and indexed value is high (auto for append-only with timestamp).

HASH

Equality-only index:

__table_args__ = (
    Index("ix_events_event_type_hash", "event_type", postgresql_using="hash"),
)

Pre-PG 10: not crash-safe. PG 10+: usable but rarely a clear win over B-tree.

Partial indexes

__table_args__ = (
    Index("ix_users_active", "email", postgresql_where=text("active = true")),
)

Only indexes rows matching the predicate. Tiny; queries with the same predicate use it.

For “find active users by email”:

SELECT * FROM users WHERE active = true AND email = $1;
-- uses partial index

Excellent when most rows are not active (or are filtered).

Expression indexes

__table_args__ = (
    Index("ix_users_lower_email", text("lower(email)")),
)

Indexes the expression. WHERE lower(email) = ... uses it.

For citext columns: not needed. For VARCHAR with case-insensitive search: very useful.

INCLUDE columns (covering indexes)

__table_args__ = (
    Index("ix_orders_user_id", "user_id", postgresql_include=["status", "total"]),
)

status and total are stored in the index but not used for ordering. Index-only scans skip the heap.

For SELECT user_id, status, total WHERE user_id = $1: index-only scan. Faster.

CREATE INDEX CONCURRENTLY

def upgrade():
    op.execute("CREATE INDEX CONCURRENTLY ix_users_email ON users (email)")

Doesn’t lock the table. Slower; doesn’t block writes. Use for hot tables.

Caveats:

  • Can’t run inside a transaction. Configure Alembic accordingly.
  • Failure leaves an invalid index; clean up before retry.

DROP INDEX CONCURRENTLY

def downgrade():
    op.execute("DROP INDEX CONCURRENTLY ix_users_email")

Same idea for removal.

Reindexing

For bloat:

REINDEX INDEX CONCURRENTLY ix_users_email;
REINDEX TABLE CONCURRENTLY users;

CONCURRENTLY for online. Useful periodically for hot indexes.

Multi-column choice

# Query: WHERE a = ? AND b = ?
Index("ix_t_a_b", "a", "b")

vs

# Query: WHERE a = ? AND b = ? OR a = ?
# Two queries; two indexes
Index("ix_t_a", "a")
Index("ix_t_b", "b")

Composite for combined filters; separate for independent filters. Postgres can combine bitmaps from multiple indexes if needed.

Index size considerations

SELECT
    schemaname, tablename, indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;

Big tables = big indexes. Watch the ratio of index/table.

Unused indexes

SELECT
    schemaname, tablename, indexname,
    idx_scan AS scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

Indexes never scanned. Drop them; save write overhead + storage.

Order_by with NULLS FIRST/LAST

__table_args__ = (
    Index("ix_users_id_nulls_last", text("id NULLS LAST")),
)

Match the ORDER BY direction in your query for best use.

Functional + partial together

__table_args__ = (
    Index(
        "ix_users_active_lower_email",
        text("lower(email)"),
        postgresql_where=text("active = true"),
    ),
)

Composable. Index only active users; on lowercased email.

Naming convention reminder

Naming convention chapter from the DB-agnostic textbook applies. Predictable names; Alembic-friendly.

Common mistakes

1. No index on FK

Foreign key without index → slow joins, slow cascade. Always add.

2. Index per column, no composite

Query filters by (a, b); two single-column indexes; planner combines but suboptimal. Composite often wins.

3. Over-indexing

Every column gets an index. Writes slow; storage bloats. Index by query patterns.

4. Big GIN on rarely-queried JSONB

GIN is expensive to build/maintain. Use targeted expression indexes.

5. CREATE INDEX without CONCURRENTLY in production

Locks the table. Always concurrently in production migrations.

What’s next

Chapter 4: Full-text search.

Read this next


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .