JSONB is one of Postgres’s quietly excellent features. Schema-flexible, indexed, queryable — but also misused. This post is the working playbook.
When to JSONB
Use JSONB when:
- The schema is genuinely variable (per-tenant fields, third-party payloads, audit blobs).
- You need to store and occasionally query nested data.
- Adding a column for every possible key would be silly.
Don’t use JSONB when:
- The fields are stable and queried often — make them columns.
- You need strict types and constraints — JSON doesn’t enforce schema.
- The data has clear relational structure — normalize.
Storage
CREATE TABLE events (
id bigserial PRIMARY KEY,
occurred_at timestamptz NOT NULL,
event_type text NOT NULL,
payload jsonb NOT NULL
);
JSONB is binary, more compact than JSON text, and supports indexing. Always JSONB, never json.
Querying
-- Containment
SELECT * FROM events WHERE payload @> '{"status": "ok"}';
-- Path access
SELECT * FROM events WHERE payload->>'user_id' = '123';
SELECT payload->'user'->'email' FROM events;
-- Existence
SELECT * FROM events WHERE payload ? 'error';
-- Multiple keys
SELECT * FROM events WHERE payload ?& array['a', 'b']; -- all
SELECT * FROM events WHERE payload ?| array['a', 'b']; -- any
-- jsonb_path_query (more powerful)
SELECT jsonb_path_query(payload, '$.items[*] ? (@.qty > 5)') FROM orders;
@> and -> and ->> are the workhorses. jsonb_path_query for SQL/JSON path syntax.
GIN index for @>
CREATE INDEX events_payload_gin ON events USING GIN (payload jsonb_path_ops);
jsonb_path_ops is faster and smaller than the default for @> queries. Use it unless you need other operators (existence checks).
-- Now:
EXPLAIN SELECT * FROM events WHERE payload @> '{"user_id": 123}';
-- → Bitmap Index Scan on events_payload_gin
Expression index for specific paths
If you always query a specific field:
CREATE INDEX events_user_id ON events ((payload->>'user_id'));
-- Or with cast:
CREATE INDEX events_user_id_int ON events (((payload->>'user_id')::int));
Faster and smaller than a GIN index over the whole document.
Partial indexes
CREATE INDEX events_errors ON events (occurred_at)
WHERE payload @> '{"level": "error"}';
Only error events indexed. Index is tiny; queries that match the partial predicate use it.
Updates
-- Set a key
UPDATE users SET prefs = prefs || '{"theme": "dark"}' WHERE id = 1;
-- Set a nested path
UPDATE users SET prefs = jsonb_set(prefs, '{notifications, email}', 'true') WHERE id = 1;
-- Remove a key
UPDATE users SET prefs = prefs - 'theme' WHERE id = 1;
-- Remove a nested path
UPDATE users SET prefs = prefs #- '{notifications, sms}' WHERE id = 1;
JSONB updates rewrite the whole document under the hood. For tiny prefs: fine. For large blobs with frequent partial updates: consider normalization.
Performance pitfalls
1. No index, full table scan
SELECT * FROM events WHERE payload->>'user_id' = '123';
-- Without index: seq scan over millions of rows
Always EXPLAIN. Add indexes that match query patterns.
2. Index doesn’t match query
CREATE INDEX ... ON events ((payload->>'user_id'));
-- BUT query is:
SELECT * FROM events WHERE payload @> '{"user_id": "123"}';
These don’t match. Either change the query or add the index that matches.
3. Cast mismatch
CREATE INDEX ... ON events ((payload->>'user_id'));
-- Query: WHERE (payload->>'user_id')::int = 123
-- The cast bypasses the index.
Match the index expression exactly, including casts.
4. Huge documents
10MB JSONB blob; updates rewrite all of it. Heavy WAL. Slow. Either:
- Split into a child table.
- Move large arrays into a separate normalized table.
5. JSONB with frequent UPDATE-many
Append-heavy logs in JSONB columns get bloated fast. VACUUM matters; consider partitioning.
Schema validation
JSONB doesn’t enforce schema, but you can:
ALTER TABLE events ADD CONSTRAINT payload_has_user
CHECK (payload ? 'user_id');
Or stronger with pg_jsonschema extension:
ALTER TABLE events ADD CONSTRAINT valid_payload
CHECK (jsonschema_is_valid('{"type":"object","required":["user_id"]}'::json, payload));
Hybrid: keep flexibility but enforce essentials.
Real-world patterns
Per-tenant custom fields
CREATE TABLE customers (
id bigserial PRIMARY KEY,
tenant_id bigint NOT NULL,
email text NOT NULL,
custom jsonb NOT NULL DEFAULT '{}'
);
CREATE INDEX customers_custom_gin ON customers USING GIN (custom jsonb_path_ops);
Each tenant defines their own custom fields. Queryable; flexible.
Webhook payloads
CREATE TABLE webhook_events (
id bigserial PRIMARY KEY,
source text NOT NULL,
event_type text NOT NULL,
payload jsonb NOT NULL,
received_at timestamptz DEFAULT now()
);
CREATE INDEX ON webhook_events (source, event_type, received_at DESC);
CREATE INDEX ON webhook_events USING GIN (payload jsonb_path_ops);
Inbound webhooks have varied schemas. Store as JSONB; index for searching.
Audit log
CREATE TABLE audit_log (
id bigserial PRIMARY KEY,
actor_id bigint NOT NULL,
action text NOT NULL,
target jsonb NOT NULL, -- {"type": "user", "id": 42}
diff jsonb NOT NULL, -- {"before": {...}, "after": {...}}
occurred_at timestamptz DEFAULT now()
);
Diff varies per action. JSONB fits.
Common mistakes
1. JSONB everything
{ "id": 1, "name": "alice", "email": "..." } as JSONB instead of columns. No type safety, no constraints, slow queries. Just use columns.
2. Deep nesting
payload.a.b.c.d.e.f. Hard to query, hard to index. Flatten.
3. Storing arrays of growing size
Arrays of 10000+ elements in JSONB: read entire array even for one element. Use a child table.
4. No GIN index
GIN indexes are larger / slower to build than B-tree but enable JSONB queries. Without one, expect seq scans.
5. JSONB for relational data
User → Posts → Comments stored as one big JSONB doc. Sounds nice; querying is painful. Normalize.
Read this next
- Postgres Performance Tuning 2026
- Postgres Indexing 2026
- Postgres Partitioning 2026
- Postgres Replication Topologies 2026
If you want my JSONB indexing reference + benchmark notebook, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .