For most “real-time” features in 2026, the right answer isn’t WebSockets. It’s Server-Sent Events. The pattern that replaced AJAX polling has matured into the default for AI streaming, dashboards, notifications, and one-way push. WebSockets earn their cost when you genuinely need bidirectional, low-latency communication.
This post is the working comparison. The protocols, the patterns, the code, and a clear decision rule.
The fundamental difference
| | SSE | WebSocket |
|---|---|---|
| Direction | Server → client | Both ways |
| Protocol | HTTP/1.1, HTTP/2 | Custom, after Upgrade |
| Reconnection | Automatic | Manual |
| Backpressure | Plain TCP | Plain TCP |
| Encoding | UTF-8 text | Text or binary |
| CDN / proxy compat | Excellent | Mixed |
| Auth | Standard HTTP cookies / headers | First message, custom |
| Browser API | EventSource | WebSocket |
SSE is HTTP that keeps streaming. WebSocket is a separate full-duplex protocol that hijacks an HTTP connection.
When SSE wins
The SSE sweet spot:
- AI token streaming. Submit prompt, stream tokens. OpenAI, Anthropic, Google all do it.
- Live dashboards. Server pushes updates; client just renders.
- Notifications. “You got mail.” One-way.
- Long polling replacements. Anything where you want push without ping-pong.
- Activity feeds, social streams, stock tickers — any source of events flowing one direction.
You’ll spend zero engineering time on reconnection (browsers do it), auth (cookies just work with EventSource, and an Authorization header works once you stream over fetch), or proxy compat (it’s HTTP).
When WebSockets win
WebSockets earn their complexity when you need:
- Bidirectional during the same session. Cancellation, “stop generating,” collaborative editing.
- Sub-100ms request/response loops. With SSE, every client-to-server message is a separate HTTP request, and that per-request overhead adds up.
- Binary frames. Audio, video, custom protocols, file transfer.
- Many short messages. SSE’s text-with-headers-and-newlines is heavy for high-frequency tiny payloads.
Chat apps, multiplayer games, collaborative editors (Figma, Google Docs), live trading clients — these are WebSocket territory.
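To put rough numbers on the framing cost, here is a back-of-the-envelope sketch. The function names are made up for illustration. It counts only SSE's `data: ` prefix and blank-line terminator against the WebSocket frame header sizes from RFC 6455, and it ignores TCP, TLS, and HTTP/2 framing entirely.

```python
def sse_frame_bytes(payload_len: int) -> int:
    """Bytes for one single-line SSE event: 'data: ' + payload + blank line."""
    return len(b"data: ") + payload_len + len(b"\n\n")


def ws_frame_bytes(payload_len: int, client_to_server: bool = False) -> int:
    """Bytes for one unfragmented WebSocket frame (RFC 6455 header sizes)."""
    if payload_len <= 125:
        header = 2
    elif payload_len <= 0xFFFF:
        header = 4
    else:
        header = 10
    if client_to_server:
        header += 4  # frames sent by the client carry a 4-byte masking key
    return header + payload_len


def base64_len(raw_len: int) -> int:
    """Size of binary data once base64-encoded, which a text-only SSE stream requires."""
    return 4 * ((raw_len + 2) // 3)
```

For a 10-byte token this gives 18 bytes over SSE against 12 for a server-to-client WebSocket frame. The gap is larger for binary payloads, where base64 inflates the data by about a third before SSE framing is added.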
SSE in detail
The wire format is plain text with simple framing:
data: {"type":"token","content":"Hello"}

data: {"type":"token","content":" world"}

data: {"type":"end"}

Each event is one or more field: value lines, terminated by a blank line. Fields:
- data: the payload (repeat the field for multiline payloads).
- event: the event name (default: message).
- id: sets the last event ID, which the browser sends back as Last-Event-ID when it reconnects.
- retry: the reconnect delay in ms.
That’s it. No framing protocol, no handshake.
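The field rules above fit in one small helper. This is a minimal sketch, not part of any library: `format_sse` and its parameter names are hypothetical, and it assumes the payload uses `\n` line endings.

```python
from typing import Optional


def format_sse(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
    retry_ms: Optional[int] = None,
) -> str:
    """Serialize one event into SSE wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    # A payload with embedded newlines becomes one data: line per line of text.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # The blank line (double newline) is what terminates the event.
    return "\n".join(lines) + "\n\n"
```

Splitting the payload across several `data:` lines matters because a raw newline inside a single `data:` line would end the field early. The receiving side joins the lines back together with `\n`.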
Server in FastAPI
import json

from anthropic import AsyncAnthropic
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
client = AsyncAnthropic()


@app.post("/chat")
async def chat(payload: dict):
    async def event_stream():
        # Forward each text delta from the model as one SSE event.
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": payload["message"]}],
        ) as stream:
            async for text in stream.text_stream:
                yield f"data: {json.dumps({'type': 'token', 'content': text})}\n\n"
        # Tell the client the stream finished normally.
        yield f"data: {json.dumps({'type': 'end'})}\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache, no-transform",
            "X-Accel-Buffering": "no",  # disable nginx buffering
            "Connection": "keep-alive",
        },
    )
Three production details every SSE endpoint needs:
- Cache-Control: no-cache, no-transform. Keeps intermediaries from caching or transforming the stream, a common cause of buffering.
- X-Accel-Buffering: no. nginx-specific; without it, nginx buffers the response and the client sees nothing until the stream ends.
- Heartbeat. Emit : keepalive\n\n every 15–30s so idle connections aren’t dropped by load balancers.
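The heartbeat is the one item that needs code rather than a header. The sketch below wraps any async generator of SSE frames and inserts a comment frame whenever the source has been silent for `interval` seconds. `with_heartbeat` is a hypothetical helper, not a FastAPI or Starlette API.

```python
import asyncio
from typing import AsyncIterator


async def with_heartbeat(
    source: AsyncIterator[str], interval: float = 15.0
) -> AsyncIterator[str]:
    """Yield frames from source, adding an SSE comment when it stays idle."""
    iterator = source.__aiter__()
    pending = asyncio.ensure_future(iterator.__anext__())
    try:
        while True:
            # asyncio.wait leaves the pending read running on timeout,
            # so no frame is lost while a heartbeat is sent.
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield ": keepalive\n\n"
                continue
            try:
                frame = pending.result()
            except StopAsyncIteration:
                return  # the source finished
            yield frame
            pending = asyncio.ensure_future(iterator.__anext__())
    finally:
        pending.cancel()  # no-op if the read already completed
```

In the FastAPI example this would be used as `StreamingResponse(with_heartbeat(event_stream()), ...)`. It uses `asyncio.wait` rather than `asyncio.wait_for` because `wait_for` cancels the in-flight read on timeout, which would tear down the underlying generator.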
Server in Hono on Bun
import { Hono } from "hono";
import { streamSSE } from "hono/streaming";
import Anthropic from "@anthropic-ai/sdk";

const app = new Hono();
const anthropic = new Anthropic();

app.post("/chat", async (c) => {
  // c.req.json() returns a promise, so await it before destructuring.
  const { message } = await c.req.json();
  return streamSSE(c, async (stream) => {
    const resp = anthropic.messages.stream({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: message }],
    });
    for await (const ev of resp) {
      // Only text deltas carry a .text field; skip every other delta type.
      if (ev.type === "content_block_delta" && ev.delta.type === "text_delta") {
        await stream.writeSSE({
          data: JSON.stringify({ type: "token", content: ev.delta.text }),
        });
      }
    }
    await stream.writeSSE({ data: JSON.stringify({ type: "end" }) });
  });
});

// Bun serves the default export.
export default app;
streamSSE handles the headers and the framing. As far as I can tell it does not send a keepalive heartbeat for you, so emit your own on streams that can sit idle. Three lines of business logic.
Server in Axum
use axum::{response::sse::{Event, KeepAlive, Sse}, routing::post, Router};
// StreamExt provides .map() on streams.
use futures::stream::{self, Stream, StreamExt};
use std::convert::Infallible;
use std::time::Duration;

async fn chat() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    // Stand-in token source; a real handler would stream model output here.
    let stream = stream::iter(vec!["Hello", " world"])
        .map(|t| Ok(Event::default().data(t)));
    Sse::new(stream).keep_alive(
        KeepAlive::new()
            .interval(Duration::from_secs(15))
            .text("keepalive"),
    )
}
Axum’s first-class Sse type handles framing and keep-alive automatically.
Client side — EventSource
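// EventSource only issues GET requests, so this assumes a GET route.
const es = new EventSource("/chat?prompt=hello");
es.onmessage = (e) => {
  const { type, content } = JSON.parse(e.data);
  if (type === "token") appendToken(content);
  // Close explicitly, otherwise the browser reconnects when the server ends the stream.
  if (type === "end") es.close();
};
es.onerror = (err) => {
  // EventSource auto-reconnects; only handle final-close cases here.
};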
That’s it. No reconnection logic — the browser does it. No protocol negotiation. Auth via cookies, automatic.
For POST-style requests (which EventSource doesn’t support), use fetch with a streaming response:
const resp = await fetch("/chat", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ message: "hi" }),
});
const reader = resp.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop()!; // keep incomplete line
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const event = JSON.parse(line.slice(6));
      handle(event); // your own callback
    }
  }
}
A bit more code than EventSource, but POST + body works.
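The loop above only understands single-line `data:` events. If the client also needs `event:`, `id:`, multiline data, and comment frames, the same buffering idea extends to a small incremental parser. This is sketched in Python for consistency with the server examples; the `SSEParser` class is hypothetical, and it skips the `retry:` field for brevity.

```python
from typing import Dict, List


class SSEParser:
    """Feed decoded text chunks in, get completed events out."""

    def __init__(self) -> None:
        self._buffer = ""
        self._data: List[str] = []
        self._event = "message"
        self.last_event_id = ""

    def feed(self, chunk: str) -> List[Dict[str, str]]:
        events: List[Dict[str, str]] = []
        self._buffer += chunk
        # Everything after the last newline is an incomplete line; keep it.
        *lines, self._buffer = self._buffer.split("\n")
        for line in lines:
            line = line.rstrip("\r")
            if line == "":
                # A blank line dispatches the event collected so far.
                if self._data:
                    events.append({
                        "event": self._event,
                        "data": "\n".join(self._data),
                        "id": self.last_event_id,
                    })
                self._data, self._event = [], "message"
            elif line.startswith(":"):
                continue  # comment frame, e.g. a keepalive
            else:
                name, _, value = line.partition(":")
                if value.startswith(" "):
                    value = value[1:]  # one leading space is part of the framing
                if name == "data":
                    self._data.append(value)
                elif name == "event":
                    self._event = value
                elif name == "id":
                    self.last_event_id = value
        return events
```

Keeping `last_event_id` on the parser means a hand-rolled client can send it back as a `Last-Event-ID` header when it reconnects, which is the part of `EventSource` you give up by switching to `fetch`.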
WebSockets when you need them
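import asyncio
from typing import Optional

from anthropic import AsyncAnthropic
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
client = AsyncAnthropic()


async def generate(ws: WebSocket, prompt: str, cancel: asyncio.Event) -> None:
    # Stream tokens until the model finishes or the client cancels.
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        async for text in stream.text_stream:
            if cancel.is_set():
                break
            await ws.send_json({"type": "token", "content": text})
    await ws.send_json({"type": "end"})


@app.websocket("/chat")
async def chat(ws: WebSocket):
    await ws.accept()
    cancel = asyncio.Event()
    task: Optional[asyncio.Task] = None
    try:
        while True:
            # Generation runs in a task so this loop keeps reading
            # and a "cancel" message can arrive mid-stream.
            msg = await ws.receive_json()
            if msg["type"] == "cancel":
                cancel.set()
            elif msg["type"] == "prompt":
                if task is not None:
                    cancel.set()
                    await task  # let the previous generation wind down
                cancel.clear()
                task = asyncio.create_task(generate(ws, msg["content"], cancel))
    except WebSocketDisconnect:
        pass
    finally:
        if task is not None:
            task.cancel()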
The same shape but bidirectional: the client sends {"type": "cancel"} mid-stream and the server stops generating on the same connection. SSE can’t do this on the same connection (you’d need a sidecar POST /cancel).
Scaling considerations
SSE
- One TCP connection per client. With HTTP/2 multiplexed, this is cheaper than it sounds.
- Connection limits: 10k–50k concurrent SSE connections per process is reasonable on Linux (raise ulimit -n).
- Idempotent reconnect. Use Last-Event-ID headers + cursor in your stream to resume cleanly.
- CDN compat. Most CDNs handle SSE; Cloudflare needs you to disable buffering on the route.
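The resume cursor can be as simple as a monotonically increasing integer plus a bounded replay buffer. The sketch below assumes a single process, single-line payloads, and integer IDs; `ReplayBuffer` is a hypothetical name, and a multi-process deployment would need a shared log instead.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can catch up."""

    def __init__(self, capacity: int = 1000) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self._next_id = 1

    def append(self, data: str) -> str:
        """Store an event and return its SSE frame, including the id: field."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return f"id: {event_id}\ndata: {data}\n\n"

    def frames_after(self, last_event_id: Optional[str]) -> List[str]:
        """Frames newer than the Last-Event-ID header value.

        A missing or malformed header replays everything still buffered.
        """
        try:
            cursor = int(last_event_id) if last_event_id else 0
        except ValueError:
            cursor = 0
        return [f"id: {i}\ndata: {d}\n\n" for i, d in self._events if i > cursor]
```

On reconnect, the handler would read the `Last-Event-ID` request header, send `frames_after(...)` first, and then continue with live events. A client that has been away longer than the buffer covers silently misses events, so the capacity is a product decision as much as a memory one.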
WebSockets
- Sticky load balancing required. A WebSocket connection is bound to one process for its lifetime.
- Pub/sub backplane for fan-out. Redis pub/sub, NATS, or a managed service.
- More moving parts. Reconnection logic, heartbeats, message ordering, queue draining on disconnect — all yours to implement.
For a fleet of millions of concurrent users on WebSockets, you’re effectively building a small messaging platform. SSE skips most of this.
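Whatever sits between processes (Redis pub/sub, NATS), each process still needs a local fan-out step that hands a message to every connection it owns. A minimal sketch of that piece follows, with a bounded queue per subscriber so one slow client cannot grow memory without limit. `Hub` and its drop-when-full policy are assumptions for illustration, not a recommendation for every workload.

```python
import asyncio
from typing import Set


class Hub:
    """In-process fan-out: one bounded queue per connected client."""

    def __init__(self, max_queue: int = 100) -> None:
        self._max_queue = max_queue
        self._subscribers: Set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._max_queue)
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, message: str) -> int:
        """Deliver to every subscriber and return how many received it.

        A subscriber whose queue is full is treated as too slow and dropped.
        """
        delivered = 0
        for queue in list(self._subscribers):
            try:
                queue.put_nowait(message)
                delivered += 1
            except asyncio.QueueFull:
                self._subscribers.discard(queue)
        return delivered
```

Each connection handler would call `subscribe()`, loop on `await queue.get()`, and call `unsubscribe()` in a `finally` block. The backplane consumer calls `publish()` for every message it receives.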
CDN and proxy reality
This is the silent reason SSE wins for many teams.
- Cloudflare, Fastly, AWS CloudFront — all support SSE out of the box. WebSockets have specific configuration knobs and tier requirements.
- Corporate proxies and firewalls sometimes strip the Upgrade: websocket header. SSE flows through anything that handles HTTP.
- API gateways (Kong, Tyk, AWS API Gateway) handle SSE first-class; WebSocket support is uneven.
If your customers run behind random corporate networks, SSE’s reliability advantage is real and quantifiable.
Authentication
SSE
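- Standard cookies. EventSource sends them automatically on same-origin requests; cross-origin requests need withCredentials: true plus a CORS policy that allows credentials.
- Authorization header. EventSource can’t set custom headers, so stream over fetch when you need a bearer token.
- Single-use tokens in query string. Works, but the token leaks into logs.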
WebSockets
- Browser WebSocket API doesn’t let you set headers in the constructor. Workarounds:
  - Auth on the first message after connect (custom protocol).
  - Cookie auth (works with same-origin).
  - Query-string token (works but logs leak).
  - Subprotocol header (Sec-WebSocket-Protocol: bearer.<token>).
WebSocket auth is fiddlier. Plan it before you start.
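As one example of the planning involved, here is a sketch of the server side of the subprotocol workaround. It pulls the token out of the `Sec-WebSocket-Protocol` request header, which is a comma-separated list of the subprotocols the client offered. The `bearer.` prefix follows the convention above; the function name is hypothetical.

```python
from typing import Optional

PREFIX = "bearer."


def token_from_subprotocols(header_value: Optional[str]) -> Optional[str]:
    """Return the token from a Sec-WebSocket-Protocol header value, or None."""
    if not header_value:
        return None
    for candidate in header_value.split(","):
        candidate = candidate.strip()
        # Require something after the prefix so "bearer." alone is rejected.
        if candidate.startswith(PREFIX) and len(candidate) > len(PREFIX):
            return candidate[len(PREFIX):]
    return None
```

Two caveats to check against your own stack. Subprotocol values are limited to HTTP token characters, so tokens with `=` padding need re-encoding. Some browsers also reject the handshake unless the server echoes back one of the offered subprotocols.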
Decision rule
Use SSE when:
- Direction is server → client.
- The user submits a request, gets a stream of responses.
- You don’t need cancellation on the same connection (a sibling endpoint is fine).
- You want minimum operational cost.
Use WebSockets when:
- Both sides talk during the session.
- You need binary frames.
- Latency budgets are <100ms per send.
- You’re building chat, collaborative editors, multiplayer.
For an LLM chatbot specifically:
- Single-turn streaming? SSE.
- Multi-turn with tool-call approvals, mid-stream steering, agent control? WebSockets.
- Most apps land on SSE + a small POST /cancel companion endpoint that posts a cancel signal, which the streaming endpoint observes via Redis pub/sub. Best of both.
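The SSE-plus-cancel shape can be sketched without any framework. Here the stream registers itself under an ID, and the cancel endpoint flips a flag that the stream checks between tokens. This version is in-process only: `CancelRegistry`, `cancellable_stream`, and the `stream_id` scheme are hypothetical, and with several server processes the `cancel()` call would be driven by a Redis pub/sub message instead.

```python
import asyncio
import json
from typing import AsyncIterator, Dict


class CancelRegistry:
    """Maps a stream id to an asyncio.Event that a cancel endpoint can set."""

    def __init__(self) -> None:
        self._events: Dict[str, asyncio.Event] = {}

    def register(self, stream_id: str) -> asyncio.Event:
        return self._events.setdefault(stream_id, asyncio.Event())

    def cancel(self, stream_id: str) -> bool:
        """Called by POST /cancel. Returns False if no such stream is active."""
        event = self._events.get(stream_id)
        if event is None:
            return False
        event.set()
        return True

    def release(self, stream_id: str) -> None:
        self._events.pop(stream_id, None)


async def cancellable_stream(
    registry: CancelRegistry, stream_id: str, tokens: AsyncIterator[str]
) -> AsyncIterator[str]:
    """Wrap a token source as SSE frames, stopping early once cancelled."""
    cancelled = registry.register(stream_id)
    try:
        async for token in tokens:
            if cancelled.is_set():
                break
            yield f"data: {json.dumps({'type': 'token', 'content': token})}\n\n"
        yield f"data: {json.dumps({'type': 'end'})}\n\n"
    finally:
        registry.release(stream_id)
```

The client needs the stream ID before it can cancel. One option is to generate it client-side and send it with the prompt; another is to emit it as the first event of the stream.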
When to consider neither
- fetch streaming (no EventSource) when you need POST+body and the client doesn’t need auto-reconnect.
- gRPC server streaming between services. Don’t use it browser-side; use it for backend-to-backend.
- Polling when updates are infrequent (every 30s+). Sometimes simpler is better.
What’s underrated
- SSE keep-alive frames are the difference between “works locally, hangs on AWS NLB” and “works.” Always emit them.
- EventSource’s lastEventId lets you implement resumable streams trivially. Almost nobody uses it; everybody should.
- HTTP/2 SSE removes the old “browsers cap connections per origin at 6” complaint. Use it.
If you want a working FastAPI/Hono/Axum SSE+cancel companion server with auth and reconnect, it’s at rajpoot.dev .