PostgreSQL Fundamentals Every Backend Developer Should Know

Why PostgreSQL is worth learning deeply

PostgreSQL is the database I reach for by default. It’s free, open-source, ACID-compliant, ridiculously feature-rich, and battle-tested at every scale from hobby projects to companies serving billions of requests. Knowing Postgres well isn’t a “database team” skill — it’s a backend developer skill, period. The difference between a query that takes 2 seconds and one that takes 2 milliseconds is almost always Postgres knowledge.

This post is a primer on the parts of Postgres I find myself using and recommending most often. It assumes you’ve written some SQL before but haven’t necessarily explored Postgres-specific features.

Data types worth knowing

Postgres has a wonderfully rich type system. The temptation to default everything to VARCHAR(255) and INTEGER is real, but you’re leaving a lot of correctness and performance on the table.

Use `TEXT` instead of `VARCHAR(n)`

Unless you have a specific reason to limit length, just use TEXT. There’s no performance difference in Postgres, and you avoid the dreaded “I need to widen this column from VARCHAR(50) to VARCHAR(100)” migration.

Use `TIMESTAMPTZ` (always)

TIMESTAMPTZ (timestamp with time zone) stores everything in UTC and converts at the boundary. TIMESTAMP (without time zone) stores whatever you give it and trusts you to keep track. Almost every timezone bug I’ve seen came from someone using TIMESTAMP and then arguing with themselves about which timezone the value was “really” in. Use TIMESTAMPTZ.

Use `UUID` for distributed IDs

If you have multiple writers (microservices, mobile clients generating IDs offline), UUID prevents collisions without a central sequence:

CREATE EXTENSION IF NOT EXISTS "pgcrypto";

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email TEXT NOT NULL UNIQUE,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

If you’re a single service with sequential writes, an auto-incrementing BIGINT is still cheaper and indexes more compactly.

`NUMERIC` for money, never `FLOAT`

Floating-point math is fast but lossy. For money, prices, anything where rounding matters, use NUMERIC(precision, scale):

price NUMERIC(12, 2)  -- up to 9,999,999,999.99

Arrays and Enums

Postgres has native array types and enums. Use them sparingly — they’re great when the data is genuinely a fixed set of values, but if it’s likely to grow you’ll regret modeling it as an enum (changing enum values requires a migration).

CREATE TYPE order_status AS ENUM ('pending', 'paid', 'shipped', 'cancelled');

CREATE TABLE orders (
    id BIGSERIAL PRIMARY KEY,
    status order_status NOT NULL DEFAULT 'pending',
    tags TEXT[] NOT NULL DEFAULT '{}'
);

Indexes: the difference between fast and slow

If your table has more than a few thousand rows and a query feels slow, the answer is almost always an index. Postgres has several index types worth knowing.

B-tree (the default)

Used for equality and range queries. If you don’t specify a type, you get a B-tree. Add one whenever you regularly filter, sort, or join on a column:

CREATE INDEX idx_orders_user_id ON orders(user_id);

Composite indexes

For queries that filter on multiple columns, a composite index in the right order matters:

CREATE INDEX idx_orders_user_status ON orders(user_id, status);

This index speeds up WHERE user_id = ? AND status = ? and also WHERE user_id = ?. It does not speed up WHERE status = ? alone — leftmost-prefix rule.

Partial indexes

If you only ever query WHERE status = 'active', index only those rows:

CREATE INDEX idx_orders_active ON orders(user_id) WHERE status = 'active';

Smaller index, faster lookups, less write amplification.

GIN indexes for JSONB and arrays

For JSONB containment queries (@>) or full-text search:

CREATE INDEX idx_users_metadata ON users USING GIN (metadata);

Use `EXPLAIN ANALYZE`

Don’t guess. Run EXPLAIN ANALYZE <query> to see what Postgres actually does. Look for Seq Scan on big tables (usually bad), or queries returning lots of rows when they should be filtering down.

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 AND status = 'paid';

Transactions and isolation

Every statement in Postgres runs in a transaction, even if you don’t write BEGIN. Wrapping multiple statements in one transaction gives you atomicity:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

If anything fails before COMMIT, the whole thing rolls back.

The default isolation level is READ COMMITTED. For most apps that’s fine. If you need stronger guarantees:

REPEATABLE READ — guarantees that the same query inside a transaction returns the same rows.
SERIALIZABLE — pretends transactions run one at a time. Most expensive, strongest guarantees.

Use SELECT ... FOR UPDATE to lock specific rows when you read them, preventing concurrent updates.

JSONB: the killer feature

Postgres’s JSONB type lets you store arbitrary JSON, query it, and index it efficiently. It’s the reason a lot of teams stick with Postgres instead of reaching for MongoDB.

CREATE TABLE events (
    id BIGSERIAL PRIMARY KEY,
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    payload JSONB NOT NULL
);

INSERT INTO events (payload) VALUES
    ('{"type": "login", "user_id": 42, "ip": "1.2.3.4"}'),
    ('{"type": "purchase", "user_id": 42, "amount": 19.99}');

Query it:

-- field access
SELECT payload->>'type' AS type FROM events;

-- containment
SELECT * FROM events WHERE payload @> '{"type": "purchase"}';

-- nested path
SELECT payload #>> '{user, email}' FROM events;

Best practice: use JSONB for genuinely variable data (event payloads, third-party API responses, user preferences). Don’t use it as an excuse to skip schema design — your “user has email and name” should still be regular columns.

Common Table Expressions (CTEs) for readability

CTEs (WITH ... AS) let you name and reuse subqueries. Great for breaking complex queries into named steps:

WITH active_users AS (
    SELECT id, email FROM users WHERE is_active = true
),
recent_orders AS (
    SELECT user_id, count(*) AS order_count
    FROM orders
    WHERE created_at > now() - interval '30 days'
    GROUP BY user_id
)
SELECT
    u.email,
    coalesce(o.order_count, 0) AS orders_last_30d
FROM active_users u
LEFT JOIN recent_orders o ON o.user_id = u.id;

Postgres 12+ inlines CTEs by default for performance, so they’re not slower than equivalent subqueries.

Window functions

Window functions compute values across a set of related rows without collapsing them. Useful for rankings, running totals, deltas:

SELECT
    user_id,
    created_at,
    amount,
    sum(amount) OVER (PARTITION BY user_id ORDER BY created_at) AS running_total
FROM orders;

If you’ve ever exported data to a spreadsheet just to calculate a running total, you can almost certainly do it in SQL with a window function.

Connection pooling

A new Postgres connection is expensive (forks a backend process). For web apps, never let your application create a new connection per request. Either:

Use your framework’s connection pool (Django’s CONN_MAX_AGE, SQLAlchemy’s pool).
Run PgBouncer in front of Postgres for transaction-level pooling. This is essential at scale.

Backup and recovery

A Postgres database without backups is a disaster waiting to happen.

pg_dump — logical backup of a single database. Good for small to medium DBs.
pg_basebackup — physical backup of the cluster. Restores the entire data directory.
Point-in-time recovery (PITR) with WAL archiving — the gold standard. You can restore to any moment, not just the last backup.

Whatever you choose, test the restore. A backup you’ve never restored is just a hopeful file.

Wrapping up

Postgres rewards depth. The features above — proper data types, indexes, JSONB, CTEs, window functions, transactions — are the day-to-day tools you’ll use over and over again as a backend developer. Master these and you’ll find yourself writing simpler application code, because the database is doing the work it was always meant to do.

In future posts I’ll dig into specific topics — full-text search, performance tuning with EXPLAIN, replication setups, and more. If there’s something you’d like me to cover, drop a comment.

Happy querying!

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .

Why PostgreSQL is worth learning deeply#

Data types worth knowing#

Use TEXT instead of VARCHAR(n)#

Use TIMESTAMPTZ (always)#

Use UUID for distributed IDs#

NUMERIC for money, never FLOAT#

Arrays and Enums#

Indexes: the difference between fast and slow#

B-tree (the default)#

Composite indexes#

Partial indexes#

GIN indexes for JSONB and arrays#

Use EXPLAIN ANALYZE#

Transactions and isolation#

JSONB: the killer feature#

Common Table Expressions (CTEs) for readability#

Window functions#

Connection pooling#

Backup and recovery#

Wrapping up#

Why PostgreSQL is worth learning deeply

Data types worth knowing

Use `TEXT` instead of `VARCHAR(n)`

Use `TIMESTAMPTZ` (always)

Use `UUID` for distributed IDs

`NUMERIC` for money, never `FLOAT`

Arrays and Enums

Indexes: the difference between fast and slow

B-tree (the default)

Composite indexes

Partial indexes

GIN indexes for JSONB and arrays

Use `EXPLAIN ANALYZE`

Transactions and isolation

JSONB: the killer feature

Common Table Expressions (CTEs) for readability

Window functions

Connection pooling

Backup and recovery

Wrapping up