Python Cheatsheet 09 — Profiling (py-spy, cProfile, scalene)

Quick reference for finding CPU and memory hot spots in Python: which profiler to reach for and the commands to run it.

cProfile (stdlib)

python -m cProfile -o profile.prof script.py
python -m cProfile -s cumtime script.py | head -30

In code:

import cProfile, pstats

prof = cProfile.Profile()
prof.enable()
do_work()
prof.disable()

stats = pstats.Stats(prof).sort_stats("cumtime")
stats.print_stats(20)
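On Python 3.8+, cProfile.Profile can also be used as a context manager, which keeps enable/disable paired even if the profiled code raises. A minimal sketch: do_work here is a hypothetical stand-in workload, and profile_to_text is an illustrative helper name, not part of the stdlib. The report goes to a string so it can be logged instead of printed.

```python
import cProfile
import io
import pstats


def do_work():
    # Hypothetical workload standing in for real application code.
    return sum(i * i for i in range(50_000))


def profile_to_text(func, limit=10):
    """Profile func() and return (result, report sorted by cumulative time)."""
    with cProfile.Profile() as prof:   # enable() on entry, disable() on exit
        result = func()
    buf = io.StringIO()
    stats = pstats.Stats(prof, stream=buf).strip_dirs().sort_stats("cumtime")
    stats.print_stats(limit)           # only the top `limit` rows
    return result, buf.getvalue()


result, report = profile_to_text(do_work)
print(report)
```

Sorting by "cumtime" surfaces the call trees that cost the most overall; switch to "tottime" to rank functions by their own time only.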

Visualize cProfile

uv tool install snakeviz
snakeviz profile.prof

Opens an interactive icicle or sunburst chart of the profile in the browser.
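The same profile.prof file can be produced and read back from Python, which helps when no browser is available. A sketch under the assumption that a temporary directory is an acceptable place for the file; do_work is again a hypothetical workload.

```python
import cProfile
import os
import pstats
import tempfile


def do_work():
    # Hypothetical workload; replace with the code under test.
    return sorted(str(i) for i in range(20_000))


# Write stats in the same binary format that `python -m cProfile -o` produces.
path = os.path.join(tempfile.mkdtemp(), "profile.prof")
prof = cProfile.Profile()
prof.runcall(do_work)
prof.dump_stats(path)

# Reload later (or on another machine) and rank by self time instead.
stats = pstats.Stats(path).strip_dirs().sort_stats("tottime")
stats.print_stats(5)
total_calls = stats.total_calls
```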

py-spy (sampling, production-safe)

uv tool install py-spy

Live:

py-spy top --pid 1234
py-spy top -- python script.py

Sample to flamegraph:

py-spy record -o profile.svg --pid 1234 --duration 30

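Samples a running production process without code changes or a restart. Attaching to another process's PID may require elevated privileges (for example sudo).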

py-spy dump (current stack)

py-spy dump --pid 1234

Shows what every thread is doing right now. Useful for “what’s stuck?”.
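When attaching from outside is not possible, a rough in-process equivalent can be built from the stdlib. This is not py-spy: it is a sketch using sys._current_frames(), and dump_all_stacks and stuck_worker are hypothetical names for illustration.

```python
import sys
import threading
import time
import traceback


def dump_all_stacks():
    """Return a text dump of every live thread's current Python stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        chunks.append(f"Thread {names.get(thread_id, '?')} ({thread_id}):")
        chunks.append("".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


def stuck_worker(stop):
    # Hypothetical worker that blocks until told to stop.
    stop.wait()


stop = threading.Event()
worker = threading.Thread(target=stuck_worker, args=(stop,), name="worker-1")
worker.start()
time.sleep(0.1)          # give the worker time to reach stop.wait()
dump = dump_all_stacks()
stop.set()
worker.join()
print(dump)
```

Unlike py-spy dump, this only works if the process is healthy enough to run the dumping code, so it complements the external tool rather than replacing it.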

scalene (CPU + memory + GPU)

uv tool install scalene
scalene script.py

Browser report with line-by-line CPU + memory + GPU usage. Distinguishes Python vs C time.

memray (memory)

uv tool install memray
memray run -o output.bin script.py
memray flamegraph output.bin
memray summary output.bin

Heap profiler that tracks allocations made by Python code and native extensions. Use it to find leaks and peak-memory spikes.

tracemalloc (stdlib memory)

import tracemalloc

tracemalloc.start()
# ... run code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)

Quick memory analysis without external deps.
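To hunt for growth rather than a single top-N listing, two snapshots can be diffed with compare_to. A minimal sketch: the ever-growing cache list is a hypothetical leak used only for illustration.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical "leak": a cache that only ever grows.
cache = [bytes(1024) for _ in range(2_000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Positive size_diff means memory allocated since `before` and still alive.
top = after.compare_to(before, "lineno")
for stat in top[:5]:
    print(stat)
grown_bytes = sum(s.size_diff for s in top if s.size_diff > 0)
```

In a long-running service, taking the two snapshots a few minutes apart and looking at the largest positive diffs usually points at the allocation site that keeps growing.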

line_profiler (line-by-line CPU)

uv tool install line-profiler

@profile               # injected into builtins by kernprof -l; no import needed
def hot_function():
    ...

kernprof -l -v script.py

Prints per-line hit counts and timings for each decorated function.
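Because @profile only exists when the script runs under kernprof -l, a plain python script.py fails with NameError. One common workaround, sketched here, is a no-op fallback; the loop body in hot_function is a hypothetical example.

```python
import builtins

# kernprof -l injects `profile` into builtins; fall back to a no-op otherwise.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def hot_function(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(hot_function(100_000))
```

With the fallback in place the decorator can stay in the code between profiling sessions without breaking normal runs.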

yappi (multi-threaded / asyncio aware)

import yappi

yappi.start()
do_work()
yappi.stop()

yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()

Handles threads / coroutines better than cProfile.

asyncio debug

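import asyncio

# main() is your entry coroutine; debug=True enables asyncio debug mode.
asyncio.run(main(), debug=True)

# From inside a running loop: asyncio.get_running_loop().set_debug(True)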

Or:

PYTHONASYNCIODEBUG=1 python script.py

Logs callbacks and task steps that block the event loop for longer than loop.slow_callback_duration (0.1 s by default).
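A small way to see the slow-callback warning in action is to block the loop on purpose and capture what the "asyncio" logger emits. This is a sketch: ListHandler and blocking_step are hypothetical helpers, and the time.sleep call is a deliberate bug.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collect log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def blocking_step():
    time.sleep(0.2)   # deliberate bug: blocks the event loop


async def main():
    await blocking_step()


handler = ListHandler()
asyncio_logger = logging.getLogger("asyncio")
asyncio_logger.addHandler(handler)
try:
    asyncio.run(main(), debug=True)
finally:
    asyncio_logger.removeHandler(handler)

slow = [m for m in handler.messages if "took" in m]
print(slow)
```

The warning names the task and coroutine that held the loop, which is usually enough to locate the blocking call.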

Continuous profiling

uv add pyroscope-io

import pyroscope
pyroscope.configure(
    application_name="myapp",
    server_address="http://pyroscope:4040",
)

Sends samples continuously to a Pyroscope server. Long-term visibility.

Workflow

1. Identify hot path: py-spy top on running process.
2. Capture flamegraph: py-spy record for a representative period.
3. Drill in: scalene or line_profiler on the function.
4. Check memory: memray.
5. Iterate: re-profile after each change to verify the hot spot actually moved.

Common hot paths

N+1 DB queries — solved via eager loading.
JSON serialization — use orjson.
Loop with attribute lookups — local-bind.
String concatenation in loop — use list + join.
Sync IO in async — to_thread or asyncify.
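Three of the fixes above can be sketched in a few lines: list + join, local-binding a method, and moving sync IO off the event loop with asyncio.to_thread (Python 3.9+). All function names here are hypothetical, and blocking_read stands in for a real driver call or file read.

```python
import asyncio
import time


def render_slow(rows):
    out = ""
    for row in rows:
        out += f"{row}\n"      # may copy the growing string on every pass
    return out


def render_fast(rows):
    parts = []
    append = parts.append      # local-bind: skip the attribute lookup per pass
    for row in rows:
        append(f"{row}\n")
    return "".join(parts)      # one final allocation


def blocking_read():
    # Hypothetical stand-in for a sync driver call or file read.
    time.sleep(0.1)
    return "payload"


async def main():
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    data = await asyncio.to_thread(blocking_read)   # loop stays responsive
    beat.cancel()
    return data, ticks


rows = list(range(1_000))
same = render_slow(rows) == render_fast(rows)
data, ticks = asyncio.run(main())
print(same, data, ticks)
```

The heartbeat counter keeps ticking while blocking_read sleeps, which shows the loop was not blocked. Measure before and after with a profiler, since the gain from each micro-fix depends on the workload.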

Sampling vs deterministic

Tool      Sampling  Deterministic
py-spy    yes       no
cProfile  no        yes
scalene   yes       partial

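Sampling: inspects the stack at fixed intervals; low, roughly constant overhead; good for production. Deterministic: hooks every function call and return; exact call counts, but the overhead grows with the number of calls.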
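The difference between the two approaches can be illustrated with toy versions of each. This is a sketch of the ideas only, not how py-spy or cProfile are implemented: count_calls uses sys.setprofile, sample_stacks polls sys._current_frames() from a background thread, and all names are hypothetical.

```python
import collections
import sys
import threading
import time


def leaf():
    return 1


def work(calls=1_000, seconds=0.2):
    for _ in range(calls):
        leaf()
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:   # busy loop so samples land here
        pass


def count_calls(func):
    """Deterministic: a hook fires on every Python function call."""
    counts = collections.Counter()

    def hook(frame, event, arg):
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        func()
    finally:
        sys.setprofile(None)
    return counts


def sample_stacks(func, interval=0.001):
    """Sampling: a background thread peeks at the caller's stack periodically."""
    samples = collections.Counter()
    target = threading.get_ident()
    done = threading.Event()

    def sampler():
        while not done.wait(interval):
            frame = sys._current_frames().get(target)
            if frame is not None:
                samples[frame.f_code.co_name] += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        func()
    finally:
        done.set()
        thread.join()
    return samples


calls = count_calls(work)
samples = sample_stacks(work)
print(calls["leaf"], samples.most_common(3))
```

The deterministic hook reports the exact number of leaf calls, while the sampler mostly sees work, because that is where the time went. Short, frequent functions are easy for a sampler to miss and expensive for a deterministic profiler to track.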

Reading flamegraphs

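Bar width is proportional to the number of samples, so wide bars mean most time; the left-to-right order is not chronological. Follow a wide stack to its leaf frames (the end farthest from the entry point): that is where the time is actually spent. Click a bar to zoom.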

[main]              ← entry
  [handler]
    [db.execute]    ← wide; time spent here
      [psycopg2._wait]
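The width of each bar can be computed by hand from folded stacks, the "frame;frame;frame count" text format used by flamegraph tooling (py-spy record is documented as emitting it with --format raw; verify against your version). The sample data below is hypothetical and mirrors the stack above.

```python
from collections import Counter

# Hypothetical folded-stack samples: "root;child;leaf <sample count>".
FOLDED = """\
main;handler;db.execute;psycopg2._wait 70
main;handler;db.execute 10
main;handler;render 15
main;startup 5
"""


def frame_widths(folded):
    """Return (total, self) sample counts per frame name."""
    total, own = Counter(), Counter()
    for line in folded.splitlines():
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        n = int(count)
        for name in set(frames):      # set() avoids double-counting recursion
            total[name] += n          # bar width: frame is anywhere on the stack
        own[frames[-1]] += n          # self time: frame is the leaf
    return total, own


total, own = frame_widths(FOLDED)
for name, n in total.most_common():
    print(f"{name:<16} total={n:>3}  self={own[name]:>3}")
```

Here db.execute is wide (80 of 100 samples) but has little self time (10); most of its width comes from the wait underneath it, which is the frame to investigate.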

Common mistakes

cProfile in production — too much overhead.
Optimizing cold paths.
Trusting first profile — warm up the cache first.
Not separating Python time from C extension time — scalene helps.
