Profiling cheatsheet.
cProfile (stdlib)
python -m cProfile -o profile.prof script.py
python -m cProfile -s cumtime script.py | head -30
In code:
import cProfile, pstats
prof = cProfile.Profile()
prof.enable()
do_work()
prof.disable()
stats = pstats.Stats(prof).sort_stats("cumtime")
stats.print_stats(20)
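On Python 3.8 and later, `cProfile.Profile` also works as a context manager, which makes it harder to forget the `disable()` call. A minimal sketch, assuming a stand-in workload (`do_work` here is hypothetical) and a helper name, `profile_to_text`, invented for illustration. It routes the report into a string instead of stdout so it can be logged or saved:

```python
import cProfile
import io
import pstats


def do_work():
    # Hypothetical workload standing in for the real code under test.
    return sum(i * i for i in range(50_000))


def profile_to_text(func, limit=20):
    """Run func under cProfile and return (result, report text)."""
    with cProfile.Profile() as prof:  # enable() on entry, disable() on exit
        result = func()
    buf = io.StringIO()
    stats = pstats.Stats(prof, stream=buf)
    stats.sort_stats("cumtime").print_stats(limit)
    return result, buf.getvalue()


result, report = profile_to_text(do_work)
print(report)
```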
Visualize cProfile
uv tool install snakeviz
snakeviz profile.prof
Interactive browser view of the profile (icicle or sunburst chart).
py-spy (sampling, production-safe)
uv tool install py-spy
Live:
py-spy top --pid 1234
py-spy top -- python script.py
Sample to flamegraph:
py-spy record -o profile.svg --pid 1234 --duration 30
Samples a running production process without any code changes or restart.
py-spy dump (current stack)
py-spy dump --pid 1234
Shows what every thread is doing right now. Useful for “what’s stuck?”.
scalene (CPU + memory + GPU)
uv tool install scalene
scalene script.py
Browser report with line-by-line CPU + memory + GPU usage. Distinguishes Python vs C time.
memray (memory)
uv tool install memray
memray run -o output.bin script.py
memray flamegraph output.bin
memray summary output.bin
Heap profiler. Find leaks.
tracemalloc (stdlib memory)
import tracemalloc
tracemalloc.start()
# ... run code ...
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)
Quick memory analysis without external deps.
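For leak hunting, a single snapshot is often less useful than the difference between two. A minimal sketch using `Snapshot.compare_to`; the `leaky` function and `store` list are hypothetical stand-ins for code that keeps growing:

```python
import tracemalloc


def leaky(store, n=2_000):
    # Hypothetical leak: every call appends objects that are never released.
    store.extend(bytearray(1024) for _ in range(n))


tracemalloc.start()
store = []
before = tracemalloc.take_snapshot()
leaky(store)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# compare_to returns StatisticDiff entries, largest change first.
top_diffs = after.compare_to(before, "lineno")
for diff in top_diffs[:5]:
    print(diff)
```

The first entry should point at the line inside `leaky` that allocates the `bytearray` objects.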
line_profiler (line-by-line CPU)
uv tool install line-profiler
@profile  # injected into builtins by kernprof -l; no import needed
def hot_function():
    ...
kernprof -l -v script.py
Output per-line timing.
yappi (multi-threaded / asyncio aware)
import yappi
yappi.start()
do_work()
yappi.stop()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()
Handles threads / coroutines better than cProfile.
asyncio debug
import asyncio
asyncio.run(main(), debug=True)  # main() is your top-level coroutine
Or:
PYTHONASYNCIODEBUG=1 python script.py
Logs slow callbacks (> 100ms by default).
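A minimal sketch of what that looks like in practice. It assumes a coroutine that blocks the loop with `time.sleep` (the `blocking_step` and `ListHandler` names are made up for this example) and lowers the threshold through `loop.slow_callback_duration`. The warnings arrive on the `asyncio` logger:

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects log messages in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def blocking_step():
    # Deliberately blocks the event loop; stands in for accidental sync IO.
    time.sleep(0.2)


async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # threshold in seconds (default 0.1)
    await asyncio.create_task(blocking_step())


handler = ListHandler()
asyncio_logger = logging.getLogger("asyncio")
asyncio_logger.addHandler(handler)
asyncio_logger.setLevel(logging.WARNING)

asyncio.run(main(), debug=True)  # turns on asyncio debug mode for this run
slow = [m for m in handler.messages if "took" in m]
print(slow)
```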
Continuous profiling
uv add pyroscope-io
import pyroscope
pyroscope.configure(
    application_name="myapp",  # recent pyroscope-io releases use application_name
    server_address="http://pyroscope:4040",
)
Sends samples continuously to a Pyroscope server. Long-term visibility.
Workflow
1. Identify hot path: py-spy top on running process.
2. Capture flamegraph: py-spy record for a representative period.
3. Drill in: scalene or line_profiler on the function.
4. Check memory: memray.
5. Iterate; verify.
Common hot paths
- N+1 DB queries — solved via eager loading.
- JSON serialization — use orjson.
- Loop with attribute lookups — local-bind.
- String concatenation in loop — use list + join.
- Sync IO in async — to_thread or asyncify.
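The last item is the easiest to get wrong, so here is a minimal sketch of the `asyncio.to_thread` fix (Python 3.9+). The `blocking_io` and `heartbeat` functions are hypothetical; the heartbeat only keeps ticking if the event loop stays free while the blocking call runs:

```python
import asyncio
import time


def blocking_io():
    # Stand-in for a synchronous call such as a legacy HTTP or DB client.
    time.sleep(0.2)
    return "done"


async def heartbeat(ticks):
    # Keeps counting only while the event loop is free to schedule it.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    # Runs blocking_io in a worker thread; the loop stays responsive.
    result = await asyncio.to_thread(blocking_io)
    beat.cancel()
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)
```

Calling `blocking_io()` directly inside `main` instead would leave the heartbeat with a single tick, because nothing else can run until the sleep returns.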
perf vs profile
| Tool | Sampling | Deterministic |
|---|---|---|
| py-spy | yes | no |
| cProfile | no | yes |
| scalene | yes | partial |
Sampling: low overhead; good for production. Deterministic: counts every call exactly, so overhead grows with the number of function calls.
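To make "counts every call" concrete, here is a small sketch with cProfile. The `helper` and `caller` functions are hypothetical; `get_stats_profile()` needs Python 3.9+ and reports `ncalls` as a string:

```python
import cProfile
import pstats


def helper(x):
    return x + 1


def caller(n):
    total = 0
    for _ in range(n):
        total = helper(total)
    return total


with cProfile.Profile() as prof:
    caller(1000)

# get_stats_profile() exposes per-function data keyed by function name.
profile = pstats.Stats(prof).get_stats_profile()
helper_calls = profile.func_profiles["helper"].ncalls
print(helper_calls)
```

A sampling profiler would only see `helper` in however many samples happened to land in it, which is why it cannot give exact call counts.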
Reading flamegraphs
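Width = share of samples, so wide bars are where the time goes. Follow a wide bar away from the entry point to its deepest wide frame; that frame is doing the actual work (or waiting). Click to zoom.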
[main]                    ← entry
  [handler]
    [db.execute]          ← wide; time spent here
      [psycopg2._wait]
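The same reading can be done on folded-stack text, the `frame;frame;frame count` format that flamegraph tools consume (py-spy can write it with `--format raw`). A minimal sketch: the frame names come from the diagram above, but the sample counts and the `render` and `startup` frames are invented for illustration.

```python
from collections import Counter

# Hypothetical folded-stack samples: "frame;frame;frame count" per line.
FOLDED = """\
main;handler;db.execute;psycopg2._wait 70
main;handler;db.execute 10
main;handler;render 15
main;startup 5
"""


def self_time(folded):
    """Return samples attributed to each leaf frame (its self time)."""
    counts = Counter()
    for line in folded.splitlines():
        stack, _, count = line.rpartition(" ")
        if not stack:
            continue  # skip blank or malformed lines
        leaf = stack.split(";")[-1]
        counts[leaf] += int(count)
    return counts


def total_time(folded):
    """Return samples in which each frame appears anywhere on the stack."""
    counts = Counter()
    for line in folded.splitlines():
        stack, _, count = line.rpartition(" ")
        if not stack:
            continue
        for frame in set(stack.split(";")):
            counts[frame] += int(count)
    return counts


print(self_time(FOLDED).most_common(3))
print(total_time(FOLDED)["db.execute"])
```

Total time corresponds to bar width; self time is the part of a bar with nothing stacked beyond it.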
Common mistakes
- cProfile in production — too much overhead.
- Optimizing cold paths.
- Trusting first profile — warm up the cache first.
- Not separating Python time from C extension time — scalene helps.