SQLAlchemy + Postgres Cheatsheet 17 — pg_stat_statements and Slow Queries

Cheatsheet for finding slow queries.

pg_stat_statements (top-N by time)

Enable:

-- postgresql.conf
shared_preload_libraries = 'pg_stat_statements'

CREATE EXTENSION pg_stat_statements;

Query:

SELECT
    substring(query, 1, 100) AS query,
    calls,
    total_exec_time / 1000 AS total_sec,
    mean_exec_time AS mean_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

The 80/20 of total DB time. Targets for optimization.

Reset

SELECT pg_stat_statements_reset();

Reset after deploying a fix; re-measure.

auto_explain

-- postgresql.conf
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '500ms'
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_format = 'json'

Queries > 500ms get full EXPLAIN ANALYZE in the log. Investigate.

log_min_duration_statement

ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();

Logs all queries > 500ms with their SQL.

EXPLAIN

explain = await session.execute(
    text("EXPLAIN (ANALYZE, BUFFERS) " + str(stmt.compile(compile_kwargs={"literal_binds": True})))
)
for row in explain:
    print(row[0])

ANALYZE runs the query; BUFFERS shows IO.

Slow query log via SQLAlchemy events

import time
from sqlalchemy import event

@event.listens_for(engine.sync_engine, "before_cursor_execute")
def before(conn, cur, stmt, params, ctx, executemany):
    ctx._t = time.time()

@event.listens_for(engine.sync_engine, "after_cursor_execute")
def after(conn, cur, stmt, params, ctx, executemany):
    dur = time.time() - ctx._t
    if dur > 0.5:
        log.warning("slow_query", sql=stmt[:500], ms=dur*1000)

App-side; complements DB-side logging.

Find long-running queries (live)

SELECT
    pid, usename, application_name,
    state, age(now(), query_start) AS duration,
    substring(query, 1, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < now() - interval '5 minutes'
ORDER BY query_start;

Find queries waiting on locks

SELECT
    blocked.pid AS blocked_pid,
    substring(blocked.query, 1, 60) AS blocked_query,
    blocking.pid AS blocking_pid,
    substring(blocking.query, 1, 60) AS blocking_query,
    age(now(), blocked.query_start) AS blocked_duration
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks bg ON bg.relation = bl.relation AND bg.granted AND bg.pid != bl.pid
JOIN pg_stat_activity blocking ON blocking.pid = bg.pid
WHERE blocked.state = 'active';

Who’s holding the lock that’s blocking who.

Index usage on a query

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = '[email protected]';

-- Look for: "Index Scan using ix_users_email" (good)
-- Vs: "Seq Scan" (probably missing index)

Why isn’t my index used?

Common causes:

Function on column: WHERE lower(email) = ... (need expression index).
Type cast mismatch: WHERE id::text = '1'.
LIKE without anchor: '%foo%' (need trigram).
Small table: planner chooses seq scan.
Stats stale: run ANALYZE.
Statistics distribution: query value is too common; index unhelpful.

pg_buffercache (what’s in cache)

CREATE EXTENSION pg_buffercache;

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY buffers DESC LIMIT 10;

Hot tables in shared_buffers.

Connection-level explain

async with session.begin():
    await session.execute(text("SET LOCAL log_min_duration_statement = '0'"))
    # All queries in this transaction logged
    await session.execute(stmt)

EXPLAIN settings

EXPLAIN (
    ANALYZE,           -- actually run
    BUFFERS,           -- show IO
    FORMAT JSON,       -- machine readable
    VERBOSE,           -- column-level info
    COSTS, TIMING, SUMMARY
) SELECT ...

When to optimize

Top 10 in pg_stat_statements by total_exec_time.
Anything > 200ms in p99.
Queries with > 100k rows scanned but few rows returned.
Anything showing Seq Scan on a large table.

Common mistakes

pg_stat_statements not enabled — blind.
Running ANALYZE rarely — stale stats.
Optimizing slow queries one-off without measuring relative impact.
Adding indexes without removing unused — write amplification.

pg_stat_statements (top-N by time)#

Reset#

auto_explain#

log_min_duration_statement#

EXPLAIN#

Slow query log via SQLAlchemy events#

Find long-running queries (live)#

Find queries waiting on locks#

Index usage on a query#

Why isn’t my index used?#

pg_buffercache (what’s in cache)#

Connection-level explain#

EXPLAIN settings#

When to optimize#

Common mistakes#

Read this next#