Profiling cheatsheet.

cProfile (stdlib)

python -m cProfile -o profile.prof script.py
python -m cProfile -s cumtime script.py | head -30

In code:

import cProfile, pstats

prof = cProfile.Profile()
prof.enable()
do_work()
prof.disable()

stats = pstats.Stats(prof).sort_stats("cumtime")
stats.print_stats(20)
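
A self-contained variant of the snippet above, as a minimal sketch. `do_work` is a hypothetical stand-in workload and `profile_to_text` is an illustrative helper, not part of the stdlib. The report goes to a string buffer so it can be logged instead of printed. Using `Profile` as a context manager assumes Python 3.8 or later.

```python
import cProfile
import io
import pstats


def do_work():
    # Hypothetical stand-in for the code you actually want to profile.
    return sum(i * i for i in range(100_000))


def profile_to_text(func, limit=20):
    """Run func under cProfile and return (result, report text)."""
    with cProfile.Profile() as prof:  # enable()/disable() handled by the context manager
        result = func()
    buf = io.StringIO()
    stats = pstats.Stats(prof, stream=buf).sort_stats("cumtime")
    stats.print_stats(limit)
    return result, buf.getvalue()


result, report = profile_to_text(do_work)
print(report)
```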

Visualize cProfile

uv tool install snakeviz
snakeviz profile.prof

Browser-based icicle and sunburst views of the profile.

py-spy (sampling, production-safe)

uv tool install py-spy

Live:

py-spy top --pid 1234
py-spy top -- python script.py

Sample to flamegraph:

py-spy record -o profile.svg --pid 1234 --duration 30

Samples a running production process without code changes or a restart.

py-spy dump (current stack)

py-spy dump --pid 1234

Shows what every thread is doing right now. Useful for “what’s stuck?”.
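
When you can run code inside the process, a rough stdlib approximation of the same idea looks like the sketch below. This is not how py-spy works (py-spy inspects the process from outside); `dump_all_threads` and `stuck_worker` are hypothetical names, and `sys._current_frames()` is a CPython-specific call.

```python
import sys
import threading
import time
import traceback


def dump_all_threads():
    """Return {thread name: formatted stack} for every live thread in this process."""
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, "thread-%d" % ident)
        stacks[name] = "".join(traceback.format_stack(frame))
    return stacks


def stuck_worker(stop):
    # Hypothetical worker that blocks until `stop` is set.
    stop.wait()


stop = threading.Event()
worker = threading.Thread(target=stuck_worker, args=(stop,), name="worker")
worker.start()
time.sleep(0.1)  # give the worker time to reach stop.wait()

stacks = dump_all_threads()
for name, stack in stacks.items():
    print("---", name)
    print(stack)

stop.set()
worker.join()
```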

scalene (CPU + memory + GPU)

uv tool install scalene
scalene script.py

Browser report with line-by-line CPU + memory + GPU usage. Distinguishes Python vs C time.

memray (memory)

uv tool install memray
memray run -o output.bin script.py
memray flamegraph output.bin
memray summary output.bin

Heap profiler. Find leaks.

tracemalloc (stdlib memory)

import tracemalloc

tracemalloc.start()
# ... run code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)

Quick memory analysis without external deps.
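
To find growth rather than a single point-in-time picture, two snapshots can be compared. A minimal sketch, assuming a hypothetical `leaky` workload that keeps its allocations alive:

```python
import tracemalloc


def leaky(store, n=10_000):
    # Hypothetical workload: every allocated string stays referenced by `store`.
    for i in range(n):
        store.append("item-%d" % i)


tracemalloc.start()
before = tracemalloc.take_snapshot()

store = []
leaky(store)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# A positive size_diff means memory allocated at that line is still alive.
top = after.compare_to(before, "lineno")[:5]
for stat in top:
    print(stat)
```

The first entries point at the lines whose live allocations grew the most between the two snapshots.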

line_profiler (line-by-line CPU)

uv tool install line-profiler

In code:

@profile               # injected into builtins by kernprof; no import needed
def hot_function():
    ...

Run:

kernprof -l -v script.py

Output per-line timing.

yappi (multi-threaded / asyncio aware)

import yappi

yappi.start()
do_work()
yappi.stop()

yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()

Handles threads / coroutines better than cProfile.

asyncio debug

import asyncio

asyncio.run(main(), debug=True)  # main() is your top-level coroutine

Or:

PYTHONASYNCIODEBUG=1 python script.py

Logs callbacks and tasks that block the event loop for more than 100 ms (the default loop.slow_callback_duration).
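
A small sketch of what debug mode catches. The `ListHandler` class and the blocking `main` coroutine are illustrative assumptions; the warning itself comes from the `asyncio` logger.

```python
import asyncio
import logging
import time

# Collect what asyncio's debug mode logs so it can be inspected afterwards.
captured = []


class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())


logging.getLogger("asyncio").addHandler(ListHandler())


async def main():
    # Blocking call inside a coroutine: stalls the event loop for 0.2 s,
    # which is above the default slow-callback threshold of 0.1 s.
    time.sleep(0.2)


asyncio.run(main(), debug=True)

slow = [m for m in captured if "took" in m]
print(slow)
```

Each captured message names the task or callback and how long it held the loop.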

Continuous profiling

uv add pyroscope-io

In code:

import pyroscope

pyroscope.configure(
    application_name="myapp",
    server_address="http://pyroscope:4040",
)

Sends samples continuously to a Pyroscope server. Long-term visibility.

Workflow

1. Identify hot path: py-spy top on running process.
2. Capture flamegraph: py-spy record for a representative period.
3. Drill in: scalene or line_profiler on the function.
4. Check memory: memray.
5. Iterate; verify.

Common hot paths

  • N+1 DB queries — solved via eager loading.
  • JSON serialization — use orjson.
  • Loop with attribute lookups — local-bind.
  • String concatenation in loop — use list + join.
  • Sync IO in async — to_thread or asyncify.
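
Three of the fixes above, sketched with only the stdlib. All function names are hypothetical, `blocking_read` stands in for any synchronous I/O call, and `asyncio.to_thread` assumes Python 3.9 or later.

```python
import asyncio
import math
import time


def build_with_join(parts):
    # Collect pieces in a list and join once, instead of `s += piece` in a loop.
    out = []
    for p in parts:
        out.append(str(p))
    return ",".join(out)


def sum_sqrt_local(values):
    # Bind the attribute lookup to a local name once, outside the loop.
    sqrt = math.sqrt
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total


def blocking_read():
    # Hypothetical stand-in for a synchronous I/O call (file, HTTP, DB driver).
    time.sleep(0.05)
    return "payload"


async def handler():
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_read)


joined = build_with_join(range(5))
total = sum_sqrt_local([1, 4, 9])
payload = asyncio.run(handler())
print(joined, total, payload)
```

Measure before and after each change; the gain from local binding in particular varies by Python version.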

Sampling vs deterministic

            Sampling   Deterministic
py-spy      yes        no
cProfile    no         yes
scalene     yes        partial

Sampling: low overhead; good for production. Deterministic: counts every call; overhead per-function.
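
"Counts every call" can be checked directly. The sketch below profiles a hypothetical `caller` that invokes `tiny` 1000 times and reads the exact call counts back. It relies on the `stats` attribute of `pstats.Stats`, which is widely used but thinly documented, so treat that part as an assumption.

```python
import cProfile
import pstats


def tiny(x):
    return x + 1


def caller(n):
    for i in range(n):
        tiny(i)


prof = cProfile.Profile()
prof.runcall(caller, 1000)

stats = pstats.Stats(prof)
# stats.stats maps (filename, lineno, funcname) ->
# (primitive calls, total calls, tottime, cumtime, callers)
calls = {key[2]: value[1] for key, value in stats.stats.items()}
print(calls["tiny"], calls["caller"])
```

A sampling profiler would only report how often `tiny` happened to be on the stack when a sample was taken, not an exact count.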

Reading flamegraphs

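Bar width = share of samples spent in that frame and everything it calls. Look for wide frames deep in the stack (the leaves); that is where the time actually goes. Click to zoom.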

[main]              ← entry
  [handler]
    [db.execute]    ← wide; time spent here
      [psycopg2._wait]
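
The same reading can be done on text. py-spy record can write collapsed stacks with --format raw: one line per unique stack, frames joined by ";", followed by a sample count. The sketch below uses hypothetical sample data with shortened frame names (real py-spy frames also carry file and line information), and `summarize` is an illustrative helper.

```python
from collections import Counter

# Hypothetical collapsed-stack data: "frame;frame;frame count" per line.
COLLAPSED = """\
main;handler;db.execute;psycopg2._wait 70
main;handler;db.execute 10
main;handler;render 15
main 5
"""


def summarize(collapsed):
    """Return (self_samples, total_samples) counters keyed by frame name."""
    self_samples = Counter()
    total_samples = Counter()
    for line in collapsed.splitlines():
        if not line.strip():
            continue
        # Split on the last space: frame names may themselves contain spaces.
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        n = int(count)
        self_samples[frames[-1]] += n   # leaf frame: where the sample actually landed
        for frame in set(frames):       # set() avoids double-counting recursion
            total_samples[frame] += n   # corresponds to bar width in the flamegraph
    return self_samples, total_samples


self_samples, total_samples = summarize(COLLAPSED)
print(self_samples.most_common(3))
print(total_samples["db.execute"])
```

High total but low self samples means the frame is wide only because of what it calls; high self samples mark the code to optimize.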

Common mistakes

  • cProfile in production — too much overhead.
  • Optimizing cold paths.
  • Trusting first profile — warm up the cache first.
  • Not separating Python time from C extension time — scalene helps.


