A user clicks “Stop generating” and you want the LLM to stop. Implemented naively, the model keeps generating; you keep paying. This post is the working pattern.

Cancellation flow

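Client clicks stop
AbortController.abort() → fetch aborts the request and closes the connection (a stream reset on HTTP/2)
Server's request handler detects the client disconnect
Server cancels the LLM streaming call
Anthropic/OpenAI stops generation
Provider stops billing for further output tokens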

Each link must propagate the cancel.

Server-side (FastAPI)

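import json

import anthropic
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()
client = anthropic.AsyncAnthropic()  # one shared client; reads ANTHROPIC_API_KEY from the environment

class ChatIn(BaseModel):
    message: str  # request body shape assumed from payload.message below

@app.post("/chat")
async def chat(req: Request, payload: ChatIn):
    async def generate():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            messages=[{"role": "user", "content": payload.message}],
        ) as stream:
            async for text in stream.text_stream:
                if await req.is_disconnected():
                    return              # client cancelled; leaving async with closes the stream
                yield f"data: {json.dumps({'text': text})}\n\n"
            yield "data: [DONE]\n\n"    # only reached when the response completes

    return StreamingResponse(generate(), media_type="text/event-stream")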

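req.is_disconnected() reports whether the client has closed the connection. It is checked once per text chunk, so a cancel takes effect at the next chunk. Leaving the async with block closes the SDK's stream to Anthropic, which is the signal to stop. No more tokens generated.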
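To see why leaving the async with block matters, here is a minimal simulation with no network calls. fake_provider_stream is a hypothetical stand-in for the SDK's streaming context manager, and the disconnect_after counter stands in for req.is_disconnected(); neither is part of any real library.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []  # records what the fake "provider" did


@asynccontextmanager
async def fake_provider_stream(n_tokens: int):
    """Hypothetical stand-in for an SDK streaming context manager."""

    async def tokens():
        for i in range(n_tokens):
            await asyncio.sleep(0)           # pretend to wait on the network
            events.append(f"generated {i}")
            yield f"tok{i}"

    gen = tokens()
    try:
        yield gen
    finally:
        await gen.aclose()                   # runs on normal exit and on break
        events.append("upstream closed")


async def relay(disconnect_after: int) -> list[str]:
    sent: list[str] = []
    async with fake_provider_stream(1000) as stream:
        async for tok in stream:
            if len(sent) >= disconnect_after:  # stands in for req.is_disconnected()
                break                          # leaves the loop, then the async with
            sent.append(tok)
    return sent


sent = asyncio.run(relay(3))
```

Only a handful of the 1000 possible tokens are ever produced, and the cleanup in the context manager runs as soon as the loop is left. A real SDK does its own cleanup on exit; this sketch only illustrates the control flow.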

Client-side

const ac = new AbortController();
document.getElementById("stop")?.addEventListener("click", () => ac.abort());

try {
  const resp = await fetch("/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },  // so FastAPI parses the body as JSON
    body: JSON.stringify({ message }),
    signal: ac.signal,         // propagates abort to fetch
  });

  const reader = resp.body!.getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    process(value);            // value is a Uint8Array chunk of the SSE stream
  }
} catch (e) {
  if (ac.signal.aborted) {
    // user cancelled; fetch() or reader.read() rejected with an AbortError
  } else { throw e; }
}

Calling ac.abort() rejects the pending fetch() or reader.read() and closes the connection, which triggers the cleanup chain on the server.

With WebSockets

For interactive agents where users send mid-stream signals:

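Client:

ws.send(JSON.stringify({ type: "cancel", message_id: currentId }));

Server:

import asyncio
import anthropic
from fastapi import WebSocket

client = anthropic.AsyncAnthropic()

async def handle_ws(ws: WebSocket):
    # Assumes ws has already been accepted by the route that calls this.
    cancel = asyncio.Event()
    task = None
    async for msg in ws.iter_json():
        if msg["type"] == "cancel":
            cancel.set()
        elif msg["type"] == "prompt":
            if task is not None:        # stop any response still streaming
                cancel.set()
                await task
            cancel.clear()
            # Run as a task so this loop stays free to receive "cancel".
            task = asyncio.create_task(stream_response(msg["content"], cancel, ws))


async def stream_response(content, cancel, ws):
    async with client.messages.stream(...) as stream:
        async for text in stream.text_stream:
            if cancel.is_set():
                break                   # leaving async with closes the provider stream
            await ws.send_json({"type": "token", "text": text})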

For help choosing between SSE and WebSockets, see "SSE vs WebSockets in 2026".
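One detail matters in the WebSocket pattern: the loop that receives messages has to stay free while a response streams, otherwise the cancel message is not read until the response has finished. A minimal simulation of that idea follows, with an asyncio.Queue standing in for the socket and fake_stream standing in for the LLM call. Both names, and the "close" message type, are hypothetical.

```python
import asyncio


async def fake_stream(cancel: asyncio.Event, out: list[str]) -> None:
    """Hypothetical stand-in for the LLM streaming call."""
    for i in range(100):
        if cancel.is_set():
            break
        out.append(f"tok{i}")
        await asyncio.sleep(0.01)


async def handle(messages: asyncio.Queue, out: list[str]) -> None:
    cancel = asyncio.Event()
    task = None
    while True:
        msg = await messages.get()
        if msg["type"] == "cancel":
            cancel.set()
        elif msg["type"] == "prompt":
            cancel.clear()
            # A task, not an await: the loop goes straight back to reading.
            task = asyncio.create_task(fake_stream(cancel, out))
        elif msg["type"] == "close":
            break
    if task is not None:
        await task  # let the stream observe the cancel and finish


async def demo() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    out: list[str] = []
    handler = asyncio.create_task(handle(queue, out))
    await queue.put({"type": "prompt"})
    await asyncio.sleep(0.05)            # let a few tokens stream
    await queue.put({"type": "cancel"})
    await queue.put({"type": "close"})
    await handler
    return out


tokens = asyncio.run(demo())
```

If handle awaited fake_stream directly instead of creating a task, all 100 tokens would be produced before the cancel message was ever read.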

Partial response handling

Cancelling mid-response leaves you with partial output. You have three options:

  • Show what was generated: useful for chat, where the user sees what was produced before the stop.
  • Discard: for structured outputs (half a JSON document is useless).
  • Retry-friendly idempotency: store the partial output and resume from it on retry. (Mostly impractical with current APIs.)

For chat, store-and-show is the default.
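The show-or-discard decision can be sketched as a small helper. finalize_partial and its structured flag are hypothetical names for illustration; the check for structured output assumes the expected format is JSON.

```python
import json
from typing import Optional


def finalize_partial(chunks: list[str], structured: bool) -> Optional[str]:
    """Return the text to keep after a cancel, or None to discard it."""
    text = "".join(chunks)
    if not structured:
        return text          # chat: show whatever was produced
    try:
        json.loads(text)     # structured: keep only if it parses as a whole
    except json.JSONDecodeError:
        return None
    return text
```

For example, finalize_partial(["Hel", "lo"], structured=False) keeps the text, while a truncated JSON object passed with structured=True is dropped.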

Cost savings

A user who frequently stops mid-response saves you real money:

  • Average response: 500 output tokens at $15/MTok = $0.0075.
  • Cancelled at 100 tokens: $0.0015.
  • Savings: $0.006 per cancel.

At 1M cancels/month: $6k saved. Not nothing.
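The same arithmetic as a function, so you can plug in your own numbers. cancel_savings is a hypothetical helper; it counts output tokens only and assumes the provider bills nothing past the cancel point.

```python
def cancel_savings(full_tokens: int, cancelled_at: int, usd_per_mtok: float) -> float:
    """Dollars saved by stopping at cancelled_at instead of full_tokens."""
    return (full_tokens - cancelled_at) * usd_per_mtok / 1_000_000


per_cancel = cancel_savings(500, 100, 15.0)   # the example above
per_month = per_cancel * 1_000_000            # at 1M cancels/month
```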

Common mistakes

1. Not propagating cancel to provider

The backend stops sending bytes to the client but keeps getting billed because it never told Anthropic to stop. Use the SDK’s streaming context manager (async with ... as stream:) and leave it as soon as you detect the cancel.

2. Browser fetch without AbortController

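Without a signal, the stop button has no way to cancel the in-flight request, so it keeps running on the server until the response finishes or the tab closes. Always pass signal: ac.signal.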

3. No client UI for cancel

With no stop button on a long generation, the user's only way out is to close the tab, and a server that ignores disconnects keeps going. UX bug AND cost bug.

4. Forgetting to break after cancel detection

If you detect the cancel but never leave the loop, it keeps pulling tokens and yielding them; bytes accumulate and the cancel doesn’t actually stop anything.

5. SSE without keepalive

Some proxies time out idle connections. Send an SSE comment line (: keepalive\n\n) every 15s during slow generation; clients ignore lines that start with a colon.
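One way to add the keepalive is to wrap the SSE generator. The sketch below is an assumption-laden example, not part of FastAPI or any SDK: with_keepalive is a hypothetical helper. It uses asyncio.wait with a timeout rather than asyncio.wait_for, so a slow read from the source is left pending instead of being cancelled when the timer fires.

```python
import asyncio
from typing import AsyncIterator


async def with_keepalive(
    source: AsyncIterator[str], interval: float = 15.0
) -> AsyncIterator[str]:
    """Yield chunks from source, plus an SSE comment whenever it is idle."""
    it = source.__aiter__()
    pending = asyncio.ensure_future(it.__anext__())
    try:
        while True:
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield ": keepalive\n\n"      # source is idle; keep the proxy happy
                continue
            try:
                chunk = pending.result()
            except StopAsyncIteration:
                return                        # source finished
            yield chunk
            pending = asyncio.ensure_future(it.__anext__())
    finally:
        pending.cancel()                      # no-op if the read already completed


async def _slow_source() -> AsyncIterator[str]:
    await asyncio.sleep(0.05)                 # simulate slow generation
    yield "data: a\n\n"


async def _collect() -> list[str]:
    return [c async for c in with_keepalive(_slow_source(), interval=0.02)]


chunks = asyncio.run(_collect())
```

In the FastAPI handler this would wrap the generator passed to the response, along the lines of StreamingResponse(with_keepalive(generate()), media_type="text/event-stream").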

Read this next

If you want a working FastAPI + browser stream-with-cancel template, it’s at rajpoot.dev.


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.