async/await in Python is one of those features that looks simple on the surface (sprinkle async and await and you have concurrency!) and turns into a swamp the moment you actually use it. The number of “I added async and now my app is slower” posts on Stack Overflow is a testament to that.
This post is the explanation I wish someone had given me before I started writing async code. We’ll build the mental model from first principles, then look at the patterns that work and the foot-guns that don’t.
The one-sentence summary
Async lets a single thread do useful work while waiting for I/O. That’s it. Everything else is a consequence of that statement.
If you understand that sentence, the rest of this post is just unpacking it.
The problem async solves
Most backend code spends most of its time waiting. Waiting for the database to respond. Waiting for an HTTP call to complete. Waiting for a file to be read. While the program waits, the CPU does nothing.
Synchronous code looks like this:
import requests

def get_users():
    response_a = requests.get("https://api.example.com/users")   # wait ~200ms
    response_b = requests.get("https://api.example.com/orders")  # wait ~200ms
    return response_a.json(), response_b.json()

# Total: ~400ms
Two HTTP calls, each taking 200ms, run sequentially. The CPU is idle for almost all 400ms — just waiting on the network.
Async code looks like this:
import asyncio

# `client` is assumed to be an already-open async HTTP client, e.g. httpx.AsyncClient
async def get_users():
    response_a, response_b = await asyncio.gather(
        client.get("https://api.example.com/users"),
        client.get("https://api.example.com/orders"),
    )
    return response_a.json(), response_b.json()

# Total: ~200ms
Same two HTTP calls, but they run concurrently. While one is waiting on the network, the other can also be waiting. The total time is now bound by the slowest call, not the sum of all calls.
That is the entire point of async. It does not make CPU work faster; for that you need multiprocessing (or threads, within the limits of the GIL). Async is about not wasting time waiting.
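To see the difference without touching the network, here is a minimal sketch that swaps the HTTP calls for asyncio.sleep. The fake_request helper and the 0.2-second delays are stand-ins invented for illustration, not part of any real API.

```python
import asyncio
import time


async def fake_request(name: str, delay: float) -> str:
    # Hypothetical stand-in for an HTTP call: "waits on the network" for `delay` seconds.
    await asyncio.sleep(delay)
    return f"{name} done"


async def sequential() -> float:
    start = time.perf_counter()
    await fake_request("users", 0.2)   # finishes before the next call starts
    await fake_request("orders", 0.2)
    return time.perf_counter() - start


async def concurrent() -> float:
    start = time.perf_counter()
    # Both waits overlap, so the total is bounded by the slower of the two.
    await asyncio.gather(
        fake_request("users", 0.2),
        fake_request("orders", 0.2),
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {asyncio.run(sequential()):.2f}s")  # roughly the sum of the delays
    print(f"concurrent: {asyncio.run(concurrent()):.2f}s")  # roughly the longest delay
```

The exact numbers will vary by machine, but the sequential version should land near the sum of the delays and the concurrent one near the longest single delay.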
Coroutines, the event loop, and what async actually does
When you write async def foo():, you’re not defining a regular function. You’re defining a coroutine function — a function that returns a coroutine object when called.
async def foo():
    return 42

result = foo()  # creates a coroutine object; nothing has run yet
print(result)   # <coroutine object foo at 0x...>
A coroutine is a chunk of work that knows how to pause itself. To actually run it, you hand it to the event loop:
import asyncio

async def foo():
    return 42

result = asyncio.run(foo())
print(result)  # 42
asyncio.run() starts an event loop, runs your coroutine, and shuts the loop down. Inside that loop, when your coroutine hits await something_io(), it tells the event loop “I’m pausing here — wake me up when this I/O is done.” The event loop notes that, then runs another coroutine that’s ready to make progress.
That’s it. There’s no magic. async defines work that can pause; await is the pause point; the event loop manages the schedule.
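The hand-off at each await can be made visible with a small sketch. The worker coroutine and the shared log list are made up for this illustration; asyncio.sleep(0) is used only as the simplest possible pause point.

```python
import asyncio


async def worker(name: str, log: list[str]) -> None:
    for step in range(2):
        log.append(f"{name} step {step}")
        # Pause point: hand control back to the event loop,
        # which then resumes whichever coroutine is ready next.
        await asyncio.sleep(0)


async def main() -> list[str]:
    log: list[str] = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```

Neither worker finishes before the other starts: each runs until its next await, then the loop switches to the other one.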
await only works inside async def
This is the rule that confuses everyone at first:
def main():
    await some_coroutine()  # SyntaxError
await only works inside an async def function. So how do you get from synchronous code into async land? Through asyncio.run():
async def main():
    await some_coroutine()

asyncio.run(main())
This is your boundary. In a typical program, asyncio.run() is the one place you cross from sync to async, and you call it once, at the top level. From inside main(), everything is async; outside it, everything is sync. Trying to bridge them carelessly is the source of most async bugs.
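One common way to bridge carelessly is calling asyncio.run() from code that is already running inside the event loop. The sketch below, with hypothetical inner, outer and outer_fixed coroutines, shows the error that results and the fix, which is simply to await.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> str:
    coro = inner()
    try:
        # Wrong: we are already inside a running event loop.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # tidy up the coroutine that never got to run
        return str(exc)
    return "no error"


async def outer_fixed() -> int:
    # Right: inside async code, just await the coroutine.
    return await inner()


if __name__ == "__main__":
    print(asyncio.run(outer()))        # message about a running event loop
    print(asyncio.run(outer_fixed()))  # 42
```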
When async helps (and when it doesn’t)
Async helps when you have many concurrent I/O operations:
- HTTP calls to upstream APIs
- Database queries
- File reads/writes (with an async file lib)
- WebSocket connections
- Anything where the CPU is mostly waiting
Async does not help with CPU-bound work:
- Image processing
- Heavy data crunching
- ML model inference
- Tight numeric loops
Why? Because there’s nothing to await. The CPU is busy, not waiting. For CPU-bound work, you need real parallelism: threads (limited in Python by the GIL), processes (multiprocessing, ProcessPoolExecutor), or external workers (Celery, RQ).
# This will NOT speed up:
async def crunch_numbers():
    return sum(i * i for i in range(10_000_000))
There’s no await in that function — it just blocks the event loop while it computes. Dressing CPU work in async syntax doesn’t make it concurrent.
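A quick way to convince yourself is to record when each "concurrent" CPU job starts and ends. This is a sketch with a made-up crunch coroutine and a smaller loop so it runs quickly; the events list exists only to expose the ordering.

```python
import asyncio


async def crunch(name: str, events: list[str]) -> int:
    events.append(f"{name} start")
    total = sum(i * i for i in range(200_000))  # pure CPU work, no await anywhere
    events.append(f"{name} end")
    return total


async def main() -> list[str]:
    events: list[str] = []
    # gather schedules both, but neither ever yields to the event loop...
    await asyncio.gather(crunch("a", events), crunch("b", events))
    return events


if __name__ == "__main__":
    print(asyncio.run(main()))
    # ['a start', 'a end', 'b start', 'b end']
```

The second job cannot even start until the first has finished, so the total time is the plain sum of the two.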
The cardinal sin: blocking the event loop
This is the single most common async mistake:
import time
import asyncio

async def bad():
    print("Starting...")
    time.sleep(2)  # ← blocking!
    print("Done")

asyncio.run(bad())
time.sleep(2) blocks the entire event loop for 2 seconds. Nothing else can run. Every other coroutine is frozen. You took async code and made it worse than synchronous code.
The async version is asyncio.sleep:
async def good():
    print("Starting...")
    await asyncio.sleep(2)
    print("Done")
This pauses just this coroutine. The event loop is free to run other coroutines while we wait.
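When a blocking call has no async equivalent, one option is to push it onto a worker thread with asyncio.to_thread (Python 3.9+). The sketch below is an illustration, not a benchmark: heartbeat and ticks_during are hypothetical helpers that count how often another coroutine gets to run while the work is in progress.

```python
import asyncio
import time


async def heartbeat(ticks: list[float]) -> None:
    # Stands in for "every other coroutine" in the program.
    while True:
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.02)


async def ticks_during(work) -> int:
    """Count heartbeat ticks recorded while `work()` is running."""
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    await work()
    count = len(ticks) - before
    beat.cancel()
    await asyncio.gather(beat, return_exceptions=True)  # absorb the cancellation
    return count


async def blocking() -> None:
    time.sleep(0.2)  # freezes the whole event loop


async def offloaded() -> None:
    # The blocking call runs in a worker thread; the loop stays free.
    await asyncio.to_thread(time.sleep, 0.2)


if __name__ == "__main__":
    print("ticks while blocking: ", asyncio.run(ticks_during(blocking)))   # 0
    print("ticks while offloaded:", asyncio.run(ticks_during(offloaded)))  # several
```

Offloading to a thread helps with blocking I/O; because of the GIL it is not a fix for CPU-bound work.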
Running coroutines concurrently: gather and create_task
A bare await runs one coroutine and waits for it to finish before moving on. To run several concurrently, you have three main tools.
asyncio.gather — run many, wait for all
async def main():
    results = await asyncio.gather(
        fetch("a"),
        fetch("b"),
        fetch("c"),
    )
    print(results)
gather returns a list in the same order as the inputs. If any of them raises, the first exception propagates to whoever awaited gather, but the other awaitables are not cancelled and keep running in the background. Pass return_exceptions=True to collect exceptions in the result list instead of raising.
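Here is a small sketch of both error-handling modes. The ok and boom coroutines are invented for the example, and the finished list is only there to show which tasks ran to completion after one input failed.

```python
import asyncio


async def ok(name: str, finished: list[str]) -> str:
    await asyncio.sleep(0.05)
    finished.append(name)
    return name


async def boom() -> None:
    raise ValueError("boom")


async def default_behaviour() -> tuple[str, list[str]]:
    finished: list[str] = []
    error = ""
    try:
        await asyncio.gather(ok("a", finished), boom(), ok("c", finished))
    except ValueError as exc:
        error = str(exc)  # the first exception reaches the caller right away
    await asyncio.sleep(0.1)  # give the surviving tasks time to complete
    return error, finished


async def collect_everything() -> list:
    # Exceptions come back as ordinary items, in input order.
    return await asyncio.gather(
        ok("a", []), boom(), ok("c", []), return_exceptions=True
    )


if __name__ == "__main__":
    print(asyncio.run(default_behaviour()))
    print(asyncio.run(collect_everything()))
```

In the default mode the caller sees the ValueError, yet "a" and "c" still end up in finished, so the other tasks kept running rather than being cancelled.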
asyncio.create_task — fire-and-track
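async def main():
    task = asyncio.create_task(fetch("a"))  # scheduled; starts at the next await
    await do_other_work()                   # fetch("a") makes progress while this awaits
    result = await task
create_task schedules a coroutine to run in the background as soon as the event loop gets control (that is, at your next await) and gives you a handle. You can await it later, or let it run on its own, as long as you keep a reference to the task: the event loop only holds a weak reference, so a task nobody references can be garbage-collected before it finishes.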
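For tasks you never intend to await individually, a common approach (it mirrors the advice in the asyncio documentation) is to park them in a set until they finish. The spawn helper, the background_tasks set and the log_event coroutine below are names made up for this sketch.

```python
import asyncio

# Strong references to in-flight tasks, so they cannot be garbage-collected early.
background_tasks: set[asyncio.Task] = set()


def spawn(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference automatically once the task is done.
    task.add_done_callback(background_tasks.discard)
    return task


async def log_event(event: str, sink: list[str]) -> None:
    await asyncio.sleep(0.01)  # stand-in for a slow write
    sink.append(event)


async def main() -> list[str]:
    sink: list[str] = []
    spawn(log_event("signup", sink))
    spawn(log_event("login", sink))
    # Before shutting down, wait for whatever is still in flight.
    await asyncio.gather(*background_tasks)
    return sink


if __name__ == "__main__":
    print(asyncio.run(main()))
```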
asyncio.TaskGroup (3.11+) — the modern, safer pattern
async def main():
    async with asyncio.TaskGroup() as tg:
        t1 = tg.create_task(fetch("a"))
        t2 = tg.create_task(fetch("b"))
    print(t1.result(), t2.result())
Task groups give you structured concurrency: the async with block does not exit until every task has finished. If any task fails, the rest are cancelled and the failures are raised together as an ExceptionGroup, which you can handle with except*. This is the recommended pattern for new code on Python 3.11+.
Async libraries you’ll actually use
| Sync library | Async equivalent |
|---|---|
| requests | httpx (sync + async in one) |
| psycopg2 (PostgreSQL) | asyncpg or psycopg 3 |
| redis-py | redis-py (modern versions ship async support) |
| open() | aiofiles |
| subprocess | asyncio.create_subprocess_exec |
| sqlite3 | aiosqlite |
| boto3 (AWS) | aioboto3 |
Roughly: if you need it for I/O, there’s an async version. Use it.
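The one standard-library entry in that table, asyncio.create_subprocess_exec, can be tried without installing anything. The sketch below launches two child Python interpreters concurrently; run_python is a hypothetical helper and the snippets it runs are placeholders.

```python
import asyncio
import sys


async def run_python(snippet: str) -> tuple[int, str]:
    # Start a child interpreter without blocking the event loop.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", snippet,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()  # wait for exit and read captured output
    return proc.returncode, stdout.decode().strip()


async def main() -> list[tuple[int, str]]:
    # Both child processes run at the same time.
    return await asyncio.gather(
        run_python("print('first')"),
        run_python("print('second')"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```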
Frameworks built around async
- FastAPI: async-first web framework with async-native ergonomics.
- Starlette: what FastAPI is built on.
- Django: supports async views/middleware/ORM (the async ORM is solid in 4.2+).
- aiohttp: older but still solid, both client and server.
For new API projects in 2026 I default to FastAPI; for full-stack apps I pick Django and use sync where it’s easier.
A practical example: rate-limited fetcher
A real pattern: fetch many URLs concurrently but limit how many run at once.
import asyncio
import httpx

async def fetch(client: httpx.AsyncClient, url: str, sem: asyncio.Semaphore) -> dict:
    async with sem:
        response = await client.get(url, timeout=10.0)
        response.raise_for_status()
        return response.json()

async def fetch_all(urls: list[str], concurrency: int = 10) -> list[dict]:
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient() as client:
        tasks = [fetch(client, url, sem) for url in urls]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    urls = [f"https://httpbin.org/anything?id={i}" for i in range(50)]
    results = asyncio.run(fetch_all(urls, concurrency=10))
    print(f"Fetched {len(results)} responses")
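50 URLs, but only 10 in flight at any moment. Strictly speaking, the semaphore caps concurrency, not requests per second. This is the bread-and-butter pattern for any “fan out to a third-party API” job. Doing the same thing sequentially with requests would take the sum of all 50 response times, since each call has to finish before the next one starts.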
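If you want to check the "only 10 in flight" claim without hitting a real server, a stdlib-only variant can track the peak itself. Everything here is a stand-in for illustration: fake_fetch replaces the HTTP call with asyncio.sleep, and InFlightCounter is a throwaway helper.

```python
import asyncio


class InFlightCounter:
    """Tracks how many fake requests are running right now, and the peak."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0


async def fake_fetch(i: int, sem: asyncio.Semaphore, counter: InFlightCounter) -> int:
    async with sem:  # at most `concurrency` coroutines get past this line at once
        counter.current += 1
        counter.peak = max(counter.peak, counter.current)
        await asyncio.sleep(0.01)  # stand-in for the network round trip
        counter.current -= 1
        return i


async def run_all(n: int = 50, concurrency: int = 10) -> tuple[list[int], int]:
    sem = asyncio.Semaphore(concurrency)
    counter = InFlightCounter()
    results = await asyncio.gather(*(fake_fetch(i, sem, counter) for i in range(n)))
    return results, counter.peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_all())
    print(f"{len(results)} results, peak in flight: {peak}")
    # 50 results, peak in flight: 10
```

Results still come back in input order, because gather preserves ordering regardless of which request finishes first.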
Conclusion
Async/await is a tool, not a magic speed-up. It pays off enormously when your code is I/O-bound, and it’s worse than useless for CPU-bound work. The mental model that makes it click: one thread, multiple coroutines, an event loop juggling them while they wait.
Get those right and the rest is just learning the standard-library API and avoiding blocking calls. Get them wrong and you’ll spend your week wondering why your “concurrent” code is slower than your old synchronous code.
Happy awaiting!