PostgreSQL JSONB Advanced Patterns — Indexing, Performance, and Pitfalls

JSONB is one of the best things about PostgreSQL — and one of the easiest to misuse. Done right it gives you schema flexibility and near-relational performance. Done wrong it’s a write-amplification nightmare that hides until production.

This post is the working set of patterns I reach for. The basics (Postgres Fundamentals ) won’t get repeated; this is the next layer.

When JSONB earns its keep

Truly variable schema — events, audit logs, integration payloads, user-defined fields.
Many optional fields that would make 80 NULL columns.
Provider-specific data that may evolve faster than your migration cadence.

When NOT to use JSONB:

The fields are stable. Use real columns.
You’ll want to filter, aggregate, or join on these fields constantly. Real columns again.
The document is large (>10 KB) and updated frequently. You’ll hit jsonb_set write amplification.

JSONB is a tool, not a philosophy.

Indexing — the part that decides everything

A WHERE data->>'status' = 'paid' against an unindexed JSONB is a sequential scan. At 1M rows, that’s seconds. With the right index, milliseconds.

`GIN` with `jsonb_ops` (default, broadest)

CREATE INDEX events_data_gin ON events USING GIN (data);

Supports:

?, ?|, ?& (key existence)
@> (containment)
@?, @@ (jsonpath)

This is the lazy default — works for many query shapes. It’s bigger than alternatives and slower to build.

`GIN` with `jsonb_path_ops` (smaller, faster, fewer ops)

CREATE INDEX events_data_gin_path ON events USING GIN (data jsonb_path_ops);

Supports @> containment only — but it’s roughly 30% smaller and noticeably faster for that operator. If your queries are data @> '{"status": "paid"}' shaped, prefer this.

Expression index on a hot key

CREATE INDEX events_status ON events ((data->>'status'));

Supports point lookups on data->>'status' = 'paid' and equality joins. This is what most tutorials should be teaching. B-tree on a single key extracted from JSONB is the highest-ROI JSONB index pattern for common access shapes.

Partial index for selective values

CREATE INDEX events_paid
  ON events ((data->>'amount'))
  WHERE data->>'status' = 'paid';

If 95% of your events aren’t ‘paid’ but most of your queries care about paid ones, this index is much smaller and can outperform a broader one.

Combining JSONB with a typed column index

CREATE INDEX events_user_paid
  ON events (user_id)
  WHERE data->>'status' = 'paid';

Mix and match. The planner is good at picking these.

Generated columns — the win you’ll wish you’d known

Postgres 12+ supports stored generated columns. Promote hot keys out of JSONB:

ALTER TABLE events
  ADD COLUMN status TEXT GENERATED ALWAYS AS (data->>'status') STORED,
  ADD COLUMN amount NUMERIC(12,2) GENERATED ALWAYS AS ((data->>'amount')::NUMERIC) STORED;

CREATE INDEX events_status_idx ON events (status);
CREATE INDEX events_amount_idx ON events (amount);

Now WHERE status = 'paid' AND amount > 100 is a clean B-tree query. The data lives in JSONB and projects to a typed column. Best of both.

Caveats:

STORED generated columns are physical — they cost row width.
Updates to data re-evaluate every generated expression. With many generated columns, write cost rises.
Cannot reference other generated columns. Cannot be VOLATILE.

For 1–3 hot keys, this is the cleanest pattern. Beyond that, just promote them to real columns.

Query patterns that perform

-- Containment: super common, GIN-indexable
SELECT id FROM events WHERE data @> '{"status": "paid"}';

-- Existence
SELECT id FROM events WHERE data ? 'refunded_at';

-- Path extraction (fast with expression index)
SELECT id FROM events WHERE data->>'status' = 'paid';

-- Numeric extraction (cast! and watch for NULL/non-numeric)
SELECT id FROM events WHERE (data->>'amount')::NUMERIC > 100;

-- jsonpath (Postgres 12+)
SELECT id FROM events WHERE data @@ '$.items[*].price > 100';
SELECT id FROM events WHERE data @? '$.items ? (@.qty > 5)';

Two things to internalize:

-> returns JSONB; ->> returns text.
For numeric comparison, cast the text to NUMERIC and beware nulls.

`jsonb_set` and write amplification

UPDATE events SET data = jsonb_set(data, '{status}', '"paid"') WHERE id = $1;

jsonb_set returns a new JSONB. Postgres rewrites the entire row to the heap and the WAL. A 10KB document updated to flip a single boolean rewrites 10KB.

At scale, this is the biggest JSONB performance trap. Three mitigations:

Don’t store frequently-mutated state in a large JSONB. Promote it.
Split the JSONB column. Store mostly-immutable data in attributes JSONB, mutable counters in a smaller state JSONB or typed columns.
Use HOT updates — if no indexed column changes, Postgres can do a heap-only update. Don’t put indexed JSONB fields in the same column as frequently-updated ones.

Partial JSONB updates with `||` (concatenate)

-- Merges into the top level
UPDATE events SET data = data || '{"viewed_at": "2026-04-28T...", "viewed_by": 42}' WHERE id = $1;

-- Update nested:
UPDATE events SET data = jsonb_set(data, '{user, last_seen}', to_jsonb(now()), true) WHERE id = $1;

|| is slightly cheaper than jsonb_set for top-level merges and reads cleaner.

Nesting depth — keep it shallow

data->'a'->'b'->'c'->'d'->>'value'   -- 4 levels deep

Each -> step is a JSONB scan. Deeply nested structures aren’t free. Two rules:

Aim for depth ≤ 2. Beyond that, performance and ergonomics drop.
Store the same key consistently. data->>'amount' everywhere; not data->'pricing'->>'amount' here and data->>'amount' there.

If you find yourself reaching deep, it’s a signal to flatten.

Aggregations on JSONB

-- Sum a value across rows
SELECT SUM((data->>'amount')::NUMERIC) FROM events WHERE data->>'status' = 'paid';

-- Count by extracted dimension
SELECT data->>'country', COUNT(*) FROM events
GROUP BY data->>'country';

-- Use jsonb_each_text for key/value pivots
SELECT k, COUNT(*)
FROM events, jsonb_each_text(data) AS j(k, v)
GROUP BY k;

These work but are slow at scale unless backed by appropriate indexes (or generated columns). For analytic workloads, you’ll be happier moving the JSONB to ClickHouse or BigQuery.

Partitioning

For huge tables (100M+ rows), partition by a typed column (usually created_at):

CREATE TABLE events (
  id BIGSERIAL,
  created_at TIMESTAMPTZ NOT NULL,
  data JSONB NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_q2 PARTITION OF events
  FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');

Then per-partition GIN indexes. Queries with a created_at filter prune partitions. Old data archives easily.

This is the pattern for clean 50M-row → 12ms timings you’ll see in case studies.

Common mistakes I’ve seen

1. Indexing the whole JSONB when you query one key

A GIN on data works for 50 different query shapes. An expression index on (data->>'status') is 100× smaller and faster for the one query you actually run. Index the access pattern, not the column.

2. Treating JSONB like a document store

A 50KB JSON-everywhere column is a misuse. JSONB is for the flexible parts. The structured parts go in real columns.

3. Forgetting NULL semantics

-- These are different:
WHERE data->>'status' = 'paid'      -- string compare
WHERE data->'status' = '"paid"'::jsonb  -- JSONB compare (JSON encoding!)
WHERE NOT data ? 'status'           -- key absent
WHERE data->'status' IS NULL        -- key present with JSON null OR absent (subtle)

Test edge cases. JSONB null ≠ SQL NULL ≠ key absent.

4. Storing booleans as strings

{"active": "true"}     -- ⛔
{"active": true}       -- ✅

Type tractor accidents come from JS clients sending strings. Validate at the API boundary.

5. Sorting on a key without an index

ORDER BY data->>'priority' DESC LIMIT 20;

Without an expression index, full table sort. With:

CREATE INDEX events_priority_desc ON events ((data->>'priority') DESC);

…it’s a fast index scan.

A real-world checklist

For a JSONB column that matters:

What 5 queries access it? Index those access shapes.
Are there 1–3 hot keys? Add generated columns + B-tree indexes.
Is it >10 KB and frequently updated? Split the column.
Is depth ≤ 2? Flatten if not.
Is the table partitioned by date? It should be at >50M rows.
Do EXPLAIN ANALYZE on every shape that’s not a primary key lookup. The planner sometimes surprises.

When JSONB earns its keep#

Indexing — the part that decides everything#

GIN with jsonb_ops (default, broadest)#

GIN with jsonb_path_ops (smaller, faster, fewer ops)#

Expression index on a hot key#

Partial index for selective values#

Combining JSONB with a typed column index#

Generated columns — the win you’ll wish you’d known#

Query patterns that perform#

jsonb_set and write amplification#

Partial JSONB updates with || (concatenate)#

Nesting depth — keep it shallow#

Aggregations on JSONB#

Partitioning#

Common mistakes I’ve seen#

1. Indexing the whole JSONB when you query one key#

2. Treating JSONB like a document store#

3. Forgetting NULL semantics#

4. Storing booleans as strings#

5. Sorting on a key without an index#

A real-world checklist#

Read this next#