py-spy for production / running processes (sampling, no code changes). cProfile for offline / reproducible benchmarks. py-spy is the modern default.

How do I find a memory leak in Python?

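memray. It tracks every allocation in the process, including those made inside C extensions (add --native to see native stack frames). tracemalloc is built in but only sees allocations made through Python's own allocators. If the process forks workers, run memray with --follow-fork so the children are tracked too.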

Python Profiling in 2026 — py-spy, scalene, memray, and Finding Real Bottlenecks

You can’t optimize what you don’t measure. Python’s profiling story improved dramatically in the last few years: production-safe sampling with py-spy, GPU+CPU+memory in one tool with scalene, leak hunting with memray. This post is the working playbook.

py-spy: production sampler

pip install py-spy

# Live top
py-spy top --pid 12345

# Flamegraph
py-spy record -o profile.svg --pid 12345 --duration 30

# Dump (current stack of all threads)
py-spy dump --pid 12345

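Zero code changes. Works on running production processes. It samples from outside the process, so overhead is low (roughly 5%, depending on the sampling rate). Attaching by PID usually needs root or ptrace permission.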

For a deployed service (the container needs the SYS_PTRACE capability for py-spy to attach):

docker exec -it api-pod py-spy record -o /tmp/profile.svg --pid 1 --duration 30
docker cp api-pod:/tmp/profile.svg ./

Open the SVG in a browser. Hot paths jump out.

scalene: line-level

pip install scalene

scalene myscript.py
# or
python -m scalene myscript.py

Per-line CPU, memory, GPU usage. Distinguishes Python time from native (C extension) time. Output: HTML report.

For an API:

scalene --html --outfile profile.html myapp.py
# load test for a few minutes; ctrl-c
# open profile.html

Beats cProfile for finding which lines actually matter.

memray: memory profiling

pip install memray

memray run -o output.bin myscript.py
memray flamegraph output.bin

Records every allocation. Find:

Peak memory users.
Memory leaks (allocations that are never freed).
Allocation hot paths.

For long-running processes:

memray attach 12345

Attaches to a running process by PID (needs gdb or lldb on the host). For a live view from startup, use memray run --live myscript.py instead.

tracemalloc (built-in)

import tracemalloc

tracemalloc.start()
# ... run ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)

Built-in; less powerful than memray but no install. Good for “where did all my memory go?” smoke tests.
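A single snapshot shows where memory sits; comparing two shows where it grew, which is usually what a leak hunt needs. A minimal sketch, assuming a hypothetical `leaky_cache` list and `handle_request` function standing in for whatever your code retains by accident:

```python
import tracemalloc

leaky_cache = []  # hypothetical stand-in for state that grows by accident


def handle_request(i: int) -> None:
    # Each call retains roughly 10 KB that is never released.
    leaky_cache.append(bytearray(10_000))


def top_growth(limit: int = 3):
    """Return the source lines whose allocated size grew the most."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for i in range(200):
        handle_request(i)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # compare_to sorts by the absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


for stat in top_growth():
    print(stat)
```

The first entry should point at the `append` line. In a real service you would take the two snapshots a few minutes apart rather than around a loop.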

cProfile (built-in, offline)

python -m cProfile -o out.prof myscript.py
python -m pstats out.prof
> sort cumulative
> stats 30

For deterministic benchmarks where you can re-run. Higher overhead than sampling.
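The same profile can be driven from Python, which is handy inside a benchmark script or a test. A rough sketch, assuming hypothetical `slow_sum` and `workload` functions as the code under test:

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Deliberately wasteful: builds a list just to sum it.
    return sum([i * i for i in range(n)])


def workload() -> int:
    return sum(slow_sum(2_000) for _ in range(200))


def profile_report(func, limit: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return buffer.getvalue()


print(profile_report(workload))
```

Sorting by cumulative time surfaces the call chain that owns the cost; switch to `pstats.SortKey.TIME` to see which function burns time in its own body.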

SnakeViz for cProfile

pip install snakeviz
python -m cProfile -o out.prof myscript.py
snakeviz out.prof

Interactive icicle and sunburst charts in a browser. Friendlier than pstats.

Async profiling

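py-spy and scalene both work with asyncio. For task-level visibility, turn on asyncio debug mode:

import asyncio

async def main():
    ...  # your application entry point

asyncio.run(main(), debug=True)  # or set PYTHONASYNCIODEBUG=1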

Logs slow callbacks (>100ms). Cheap signal for “this coroutine blocked the loop.”
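To see what that signal looks like, here is a small sketch that blocks the loop on purpose and captures what the `asyncio` logger emits. `ListHandler` and `blocking_handler` are hypothetical names for this illustration; the 0.1 s threshold is the default `loop.slow_callback_duration`.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


async def blocking_handler() -> None:
    # time.sleep blocks the whole event loop for 0.2 s, above the
    # default slow_callback_duration of 0.1 s.
    time.sleep(0.2)


def run_with_debug() -> list:
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        asyncio.run(blocking_handler(), debug=True)
    finally:
        logger.removeHandler(handler)
    return handler.messages


for message in run_with_debug():
    print(message)
```

The captured warning names the task and how long it held the loop. Debug mode adds its own overhead, so it is probably better suited to staging or a canary than to every production instance.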

Workflow

1. Measure (don't guess).
2. Identify the top-3 hot spots.
3. Pick one; optimize.
4. Re-measure. Did it help?
5. Repeat until cost / benefit ratio favors stopping.

Most “optimizations” without measurement are wasted work.

Common bottleneck patterns

1. Sync code in async paths

time.sleep, requests.get, and sync DB drivers all block the event loop, and py-spy shows the loop thread stuck inside them. Use asyncio.to_thread or async libraries.
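A small sketch of the difference, assuming a hypothetical `fetch_sync` that stands in for `requests.get` or a sync DB call:

```python
import asyncio
import time


def fetch_sync(delay: float) -> float:
    # Hypothetical stand-in for requests.get or a sync DB call.
    time.sleep(delay)
    return delay


async def blocking_version(n: int, delay: float) -> float:
    start = time.perf_counter()
    for _ in range(n):
        fetch_sync(delay)  # blocks the loop; calls run one after another
    return time.perf_counter() - start


async def threaded_version(n: int, delay: float) -> float:
    start = time.perf_counter()
    # to_thread runs each call in the default executor, so the loop stays free.
    await asyncio.gather(*(asyncio.to_thread(fetch_sync, delay) for _ in range(n)))
    return time.perf_counter() - start


blocked = asyncio.run(blocking_version(4, 0.05))
threaded = asyncio.run(threaded_version(4, 0.05))
print(f"blocking: {blocked:.2f}s  to_thread: {threaded:.2f}s")
```

`asyncio.to_thread` is a stopgap for code you cannot rewrite; a native async client avoids the thread pool entirely.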

2. N+1 queries

The profile shows the same DB query issued inside a loop. Refactor to a single query (a JOIN or an IN clause).
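The pattern and its fix, sketched with sqlite3 so it runs anywhere. The `authors` and `posts` tables are a hypothetical schema for illustration only:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema, used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        """
    )
    conn.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "ann"), (2, "bob")])
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(1, 1, "a1"), (2, 1, "a2"), (3, 2, "b1")],
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> dict:
    # One query for the authors, then one more query per author.
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result


def titles_single_query(conn: sqlite3.Connection) -> dict:
    # One JOIN returns everything; grouping happens in Python.
    # Note: an inner JOIN skips authors that have no posts.
    result = {}
    query = """
        SELECT a.name, p.title
        FROM authors AS a JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """
    for name, title in conn.execute(query):
        result.setdefault(name, []).append(title)
    return result


conn = make_db()
print(titles_n_plus_one(conn))
print(titles_single_query(conn))
```

With an in-memory database the two are equally fast; the gap appears once each query pays a network round trip. ORMs usually expose the same fix as eager loading.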

3. JSON serialization in hot paths

Stdlib json on a 100k req/sec hot path. Switching to orjson often gives a several-fold speedup (5–10x on some payloads); measure with your own data.
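A quick way to check the gain on your own payloads. This is a sketch: `payload`, `dumps_stdlib`, and `dumps_fast` are hypothetical names, and the fallback only exists so the snippet runs without orjson installed.

```python
import json
import timeit

try:
    import orjson  # third-party: pip install orjson
except ImportError:  # keep the sketch runnable without it
    orjson = None

payload = {
    "items": [{"id": i, "name": f"item-{i}", "tags": ["a", "b"]} for i in range(1000)]
}


def dumps_stdlib(obj) -> bytes:
    return json.dumps(obj).encode("utf-8")


def dumps_fast(obj) -> bytes:
    # orjson.dumps returns bytes directly; fall back to stdlib if it is missing.
    if orjson is not None:
        return orjson.dumps(obj)
    return dumps_stdlib(obj)


stdlib_time = timeit.timeit(lambda: dumps_stdlib(payload), number=200)
fast_time = timeit.timeit(lambda: dumps_fast(payload), number=200)
print(f"stdlib: {stdlib_time:.3f}s  fast path: {fast_time:.3f}s")
```

Note that `orjson.dumps` returns `bytes`, not `str`, so callers that expect a string need a small adjustment.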

4. Pickle / deepcopy

Often unnecessary. The profile reveals them; replace deepcopy with an explicit copy of just the parts you mutate.
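What an explicit copy can look like, sketched with a hypothetical `config` dict where only the `hosts` list changes:

```python
import copy
import timeit

config = {"name": "job", "retries": 3, "hosts": ["a", "b", "c"], "limits": {"cpu": 2}}


def with_deepcopy(cfg: dict) -> dict:
    new = copy.deepcopy(cfg)  # walks and copies every nested object
    new["hosts"].append("d")
    return new


def with_explicit_copy(cfg: dict) -> dict:
    # Copy only what is about to change; untouched values stay shared.
    return {**cfg, "hosts": [*cfg["hosts"], "d"]}


t_deep = timeit.timeit(lambda: with_deepcopy(config), number=20_000)
t_explicit = timeit.timeit(lambda: with_explicit_copy(config), number=20_000)
print(f"deepcopy: {t_deep:.3f}s  explicit: {t_explicit:.3f}s")
```

The trade-off is that `limits` is now shared between the original and the copy, so this only holds if nothing mutates it later.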

5. Logging in tight loops

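log.info with string formatting on every iteration. Pass arguments lazily (log.info("x=%s", x)) so disabled levels cost almost nothing, and sample or aggregate the rest.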
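The cost is easy to demonstrate. In this sketch, `Expensive` is a hypothetical object that counts how often it gets rendered to a string, and the logger has INFO disabled, as it often is in production:

```python
import logging

log = logging.getLogger("hot_loop")  # hypothetical logger name
log.setLevel(logging.WARNING)  # INFO is disabled


class Expensive:
    """Counts how many times it is rendered to a string."""

    renders = 0

    def __str__(self) -> str:
        Expensive.renders += 1
        return "expensive"


def eager(items) -> None:
    for item in items:
        log.info(f"processing {item}")  # f-string is built even though INFO is off


def lazy(items) -> None:
    for item in items:
        log.info("processing %s", item)  # formatted only if the record is emitted


def aggregated(items) -> None:
    count = 0
    for _ in items:
        count += 1
    log.info("processed %d items", count)  # one record instead of one per item


items = [Expensive() for _ in range(1000)]
eager(items)
print("renders after eager:", Expensive.renders)
Expensive.renders = 0
lazy(items)
print("renders after lazy:", Expensive.renders)
```

The eager version pays for formatting on every iteration even when nothing is logged; the lazy version skips it, and the aggregated version removes the per-item call altogether.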

Continuous profiling in production

For long-term visibility (not one-off):

Pyroscope: continuous profiling with Grafana integration.
Datadog Continuous Profiler: SaaS.
Granulate (free tier): open-source agent (gProfiler).
Parca: OSS continuous profiling.

Hot paths over time; correlate with deploys; spot regressions before users do.

Microbenchmarks

import timeit

setup = "data = list(range(1000))"
t1 = timeit.timeit("[x*2 for x in data]", setup=setup, number=10000)
t2 = timeit.timeit("list(map(lambda x: x*2, data))", setup=setup, number=10000)
print(t1, t2)

For small “is X faster than Y” questions. Avoid microbenchmark obsession — macro perf usually dominates.
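A single `timeit.timeit` call can be skewed by whatever else the machine is doing. One common tweak, sketched here with a hypothetical `best_of` helper, is to repeat the measurement and keep the minimum:

```python
import timeit

setup = "data = list(range(1000))"


def best_of(stmt: str, repeat: int = 5, number: int = 2000) -> float:
    """Return the fastest per-loop time in seconds across several repeats."""
    runs = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # The minimum is the least noisy estimate; slower runs mostly reflect
    # interference from other processes.
    return min(runs) / number


comprehension = best_of("[x*2 for x in data]")
mapped = best_of("list(map(lambda x: x*2, data))")
print(f"comprehension: {comprehension * 1e6:.1f} us/loop")
print(f"map + lambda:  {mapped * 1e6:.1f} us/loop")
```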

pyperf for stable benchmarks

pip install pyperf
pyperf timeit -s "data = list(range(1000))" "[x*2 for x in data]"

Statistical rigor: warmup, multiple runs, variance reporting. Better than timeit for serious benchmarking.

Common mistakes

1. Optimizing without measuring

You “knew” the SQL was slow. It wasn’t; the JSON serialization was. Wasted day.

2. cProfile in production

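Deterministic tracing hooks every function call, so overhead is typically 10-30% and can be much higher on call-heavy code. Use py-spy.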

3. Micro before macro

Cython-rewriting a function that runs once a minute. Find the hot path first.

4. Ignoring memory

CPU-only profiling misses leaks; pod restarts mysteriously. Profile both.

5. One-off profiling

Deployed; never profiled again. Continuous profiling catches drift.

What I’d ship today

For Python services:

py-spy in production debugging tooling.
scalene for one-off deep dives.
memray when memory misbehaves.
Pyroscope or similar for continuous profiling.
orjson for fast JSON in hot paths.
Async-aware libraries everywhere; never sync IO in async.
Observability + profiling correlated by trace_id.

Read this next

If you want my profiling cheat sheet + py-spy production setup, it’s at rajpoot.dev.

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.
