For most “real-time” features in 2026, the right answer isn’t WebSockets. It’s Server-Sent Events. The pattern that replaced AJAX polling has matured into the default for AI streaming, dashboards, notifications, and one-way push. WebSockets earn their cost when you genuinely need bidirectional, low-latency communication.

This post is the working comparison: the protocols, the patterns, the code, and a clear decision rule.

The fundamental difference

| | SSE | WebSocket |
|---|---|---|
| Direction | Server → client | Both ways |
| Protocol | HTTP/1.1, HTTP/2 | Custom, after Upgrade |
| Reconnection | Automatic | Manual |
| Backpressure | Plain TCP | Plain TCP |
| Encoding | UTF-8 text | Text or binary |
| CDN / proxy compat | Excellent | Mixed |
| Auth | Standard HTTP cookies / headers | First message, custom |
| Browser API | EventSource | WebSocket |

SSE is HTTP that keeps streaming. WebSocket is a separate full-duplex protocol that hijacks an HTTP connection.

When SSE wins

The SSE sweet spot:

  • AI token streaming. Submit prompt, stream tokens. OpenAI, Anthropic, Google all do it.
  • Live dashboards. Server pushes updates; client just renders.
  • Notifications. “You got mail.” One-way.
  • Long polling replacements. Anything where you want push without ping-pong.
  • Activity feeds, social streams, stock tickers — any source of events flowing one direction.

With EventSource you’ll spend almost no engineering time on reconnection (browsers do it), auth (cookies are sent automatically; use fetch streaming if you need an Authorization header), or proxy compat (it’s HTTP).

When WebSockets win

WebSockets earn their complexity when you need:

  • Bidirectional during the same session. Cancellation, “stop generating,” collaborative editing.
  • Sub-100ms request/response loops. Without a socket, every client-to-server message is a separate HTTP request, and that per-request overhead adds up.
  • Binary frames. Audio, video, custom protocols, file transfer.
  • Many short messages. SSE’s text framing (field prefixes and blank-line terminators, on top of text-encoding every payload) is heavy for high-frequency tiny payloads.

Chat apps, multiplayer games, collaborative editors (Figma, Google Docs), live trading clients — these are WebSocket territory.

SSE in detail

The wire format is plain text with simple framing:

data: {"type":"token","content":"Hello"}

data: {"type":"token","content":" world"}

data: {"type":"end"}

Each event is one or more field: value lines, terminated by a blank line. Fields:

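  • data: carries the payload (repeat the field for multiline data).
  • event: names the event (default: message).
  • id: sets the last event ID; the browser sends it back in a Last-Event-ID header when it reconnects, so the server can resume.
  • retry: sets the reconnect delay in ms.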

That’s it. No binary framing, no handshake beyond the HTTP request itself.
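To make the framing concrete, here is a minimal sketch of a formatter for those four fields. The function name format_sse and its parameters are my own for illustration, not part of any library, and it assumes the payload uses \n line endings only.

```python
from typing import Optional


def format_sse(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
    retry_ms: Optional[int] = None,
) -> str:
    """Serialize one SSE event (hypothetical helper, not a library API)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    # A multiline payload becomes one data: line per line of text.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # The blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

Calling format_sse("a\nb", event="log", event_id="7") produces an event line, an id line, and two data: lines followed by the blank-line terminator; the client joins the two data values back together with a newline.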

Server in FastAPI

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from anthropic import AsyncAnthropic
import json

app = FastAPI()
client = AsyncAnthropic()


@app.post("/chat")
async def chat(payload: dict):
    async def event_stream():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": payload["message"]}],
        ) as stream:
            async for text in stream.text_stream:
                yield f"data: {json.dumps({'type': 'token', 'content': text})}\n\n"
        yield f"data: {json.dumps({'type': 'end'})}\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache, no-transform",
            "X-Accel-Buffering": "no",     # disable nginx buffering
            "Connection": "keep-alive",
        },
    )

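Three production details every SSE endpoint needs:

  • Cache-Control: no-cache, no-transform stops intermediaries from caching the stream or transforming it (compression, for example, usually means buffering).
  • X-Accel-Buffering: no is nginx-specific; without it, nginx’s proxy buffering can hold events back until its buffer fills or the response ends.
  • Heartbeat: emit a comment line such as : keepalive\n\n every 15–30s so idle connections aren’t dropped by load balancers. The FastAPI example above doesn’t do this yet.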
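The heartbeat is the one item that takes code rather than a header. A minimal sketch, assuming your endpoint already has an async generator of SSE frames: wrap it so that a comment frame goes out whenever the source has been quiet for the interval. The name with_heartbeat is hypothetical, and this is one way to do it, not the only one.

```python
import asyncio
from typing import AsyncIterator

KEEPALIVE = ": keepalive\n\n"  # SSE comment line; clients ignore it


async def with_heartbeat(
    source: AsyncIterator[str], interval: float = 15.0
) -> AsyncIterator[str]:
    """Yield frames from source, plus a comment frame whenever it is idle."""
    it = source.__aiter__()
    pending = asyncio.ensure_future(it.__anext__())
    try:
        while True:
            # asyncio.wait does not cancel `pending` on timeout, so the
            # source keeps running while we emit the keepalive.
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield KEEPALIVE
                continue
            try:
                frame = pending.result()
            except StopAsyncIteration:
                return
            yield frame
            pending = asyncio.ensure_future(it.__anext__())
    finally:
        pending.cancel()  # no-op if the future already finished
```

In the FastAPI handler this would be used as StreamingResponse(with_heartbeat(event_stream()), ...), leaving the rest of the endpoint unchanged.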

Server in Hono on Bun

import { Hono } from "hono";
import { streamSSE } from "hono/streaming";
import Anthropic from "@anthropic-ai/sdk";

const app = new Hono();
const anthropic = new Anthropic();

app.post("/chat", async (c) => {
  // c.req.json() returns a Promise, so await it before destructuring.
  const { message } = await c.req.json();
  return streamSSE(c, async (stream) => {
    const resp = anthropic.messages.stream({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: message }],
    });
    for await (const ev of resp) {
      // Only text deltas carry a .text field; skip other delta types.
      if (ev.type === "content_block_delta" && ev.delta.type === "text_delta") {
        await stream.writeSSE({
          data: JSON.stringify({ type: "token", content: ev.delta.text }),
        });
      }
    }
    await stream.writeSSE({ data: JSON.stringify({ type: "end" }) });
  });
});

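streamSSE handles the headers and the framing. As far as I can tell it does not send heartbeats on its own, so emit a periodic keepalive yourself if connections may sit idle. Three lines of business logic.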

Server in Axum

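use axum::{response::sse::{Event, KeepAlive, Sse}, routing::post, Router};
use futures::stream::{self, Stream, StreamExt}; // StreamExt provides .map()
use std::convert::Infallible;
use std::time::Duration;

async fn chat() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    // Stand-in for a real token source.
    let stream = stream::iter(vec!["Hello", " world"])
        .map(|t| Ok(Event::default().data(t)));

    // Axum emits a comment line every 15s while the stream is idle.
    Sse::new(stream).keep_alive(
        KeepAlive::new()
            .interval(Duration::from_secs(15))
            .text("keepalive"),
    )
}

// Mount the handler on a router.
fn app() -> Router {
    Router::new().route("/chat", post(chat))
}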

Axum’s first-class Sse type handles framing and keep-alive automatically.

Client side — EventSource

const es = new EventSource("/chat?prompt=hello");
es.onmessage = (e) => {
  const { type, content } = JSON.parse(e.data);
  if (type === "token") appendToken(content);
  if (type === "end") es.close();
};
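es.onerror = () => {
  // EventSource auto-reconnects on its own; readyState is CLOSED only
  // when the browser has given up for good.
  if (es.readyState === EventSource.CLOSED) console.warn("SSE stream closed");
};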

That’s it. No reconnection logic — the browser does it. No protocol negotiation. Auth via cookies, automatic.

For POST-style requests (which EventSource doesn’t support), use fetch with a streaming response:

const resp = await fetch("/chat", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ message: "hi" }),
});
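// Fail fast on HTTP errors; resp.body can also be null.
if (!resp.ok || !resp.body) throw new Error(`stream failed: ${resp.status}`);
const reader = resp.body.getReader();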
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop()!;          // keep incomplete line
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const event = JSON.parse(line.slice(6));
      handle(event);
    }
  }
}

A bit more code than EventSource, but POST + body works.
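The loop above only looks at single data: lines, which is enough for the JSON events in this post. For a non-browser client or a test harness, here is a fuller sketch of the same buffering idea in Python that also handles comments, event names, ids, and multiline data. SSEParser is a hypothetical name, and the sketch assumes \n or \r\n line endings (not lone \r).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SSEParser:
    """Incremental SSE parser: feed it text chunks, get completed events."""

    buffer: str = ""
    event: str = "message"
    data: List[str] = field(default_factory=list)
    last_event_id: str = ""

    def feed(self, chunk: str) -> List[Tuple[str, str, str]]:
        """Return (event, data, last_event_id) for each event completed."""
        self.buffer += chunk
        # Everything before the final "\n" is complete; keep the remainder.
        *lines, self.buffer = self.buffer.split("\n")
        out = []
        for line in lines:
            line = line.rstrip("\r")
            if line == "":
                # Blank line: dispatch if any data was collected, then reset.
                if self.data:
                    out.append(
                        (self.event, "\n".join(self.data), self.last_event_id)
                    )
                self.event, self.data = "message", []
            elif line.startswith(":"):
                continue  # comment, e.g. a keepalive
            else:
                name, _, value = line.partition(":")
                if value.startswith(" "):
                    value = value[1:]  # drop the single optional space
                if name == "data":
                    self.data.append(value)
                elif name == "event":
                    self.event = value
                elif name == "id":
                    self.last_event_id = value
        return out
```

Feeding it a chunk that ends mid-line returns nothing until the rest of the event arrives, which is the same job the buffer variable does in the fetch loop.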

WebSockets when you need them

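import asyncio

from anthropic import AsyncAnthropic
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
client = AsyncAnthropic()


async def generate(ws: WebSocket, prompt: str, cancel: asyncio.Event) -> None:
    """Stream tokens to the client until the model finishes or cancel is set."""
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        async for text in stream.text_stream:
            if cancel.is_set():
                break
            await ws.send_json({"type": "token", "content": text})
    await ws.send_json({"type": "end"})


@app.websocket("/chat")
async def chat(ws: WebSocket):
    await ws.accept()
    cancel = asyncio.Event()
    task = None  # the generation currently in flight, if any

    try:
        while True:
            # Keep reading while a generation runs, so "cancel" is seen mid-stream.
            msg = await ws.receive_json()

            if msg["type"] == "cancel":
                cancel.set()
            elif msg["type"] == "prompt":
                if task and not task.done():
                    # Stop the previous generation before starting a new one.
                    cancel.set()
                    await task
                cancel.clear()
                task = asyncio.create_task(generate(ws, msg["content"], cancel))
    except WebSocketDisconnect:
        pass
    finally:
        if task and not task.done():
            task.cancel()

The same shape but bidirectional: the client sends {"type": "cancel"} mid-stream and the server stops generating on the same connection. Generation runs in its own task so the receive loop stays free to read that cancel message; awaiting the stream inline would leave it unread until generation finished. SSE can’t do this on the same connection (you’d need a sidecar POST /cancel).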

Scaling considerations

SSE

  • One TCP connection per client. With HTTP/2 multiplexed, this is cheaper than it sounds.
  • Connection limits: 10k–50k concurrent SSE connections per process is reasonable on Linux (raise ulimit -n).
  • Idempotent reconnect. Use Last-Event-ID headers + cursor in your stream to resume cleanly.
  • CDN compat. Most CDNs handle SSE; Cloudflare needs you to disable buffering on the route.
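The Last-Event-ID resume can be sketched with a bounded in-memory buffer. This is an illustration under assumptions: ReplayBuffer is a hypothetical name, ids are a simple integer cursor, payloads are single-line, and everything lives in one process. A multi-process deployment would need a shared log instead.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Keep the last N events so a reconnecting client can catch up."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=maxlen)
        self._next_id = 1

    def append(self, data: str) -> str:
        """Record an event and return its SSE frame, including the id: field."""
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return f"id: {event_id}\ndata: {data}\n\n"

    def replay_after(self, last_event_id: Optional[str]) -> List[str]:
        """Frames newer than the client's Last-Event-ID header value."""
        try:
            cursor = int(last_event_id) if last_event_id else 0
        except ValueError:
            cursor = 0  # unknown cursor: replay whatever is still buffered
        return [f"id: {i}\ndata: {d}\n\n" for i, d in self._events if i > cursor]
```

On reconnect, the handler would read the Last-Event-ID request header, send replay_after(...) first, and then continue with live events.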

WebSockets

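  • Long-lived, stateful connections. A WebSocket connection is bound to one process for its lifetime, so the load balancer has to support Upgrade and long idle timeouts, and any per-connection state lives on that one node. Sticky sessions proper are only required when a client library falls back to HTTP polling transports.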
  • Pub/sub backplane for fan-out. Redis pub/sub, NATS, or a managed service.
  • More moving parts. Reconnection logic, heartbeats, message ordering, queue draining on disconnect — all yours to implement.
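As one example of the reconnection logic you end up writing, here is a sketch of exponential backoff with full jitter for choosing the delay before each reconnect attempt. The function name and default values are assumptions for illustration; tune them to your traffic.

```python
import random
from typing import Iterator, Optional


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    attempts: int = 8,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield one delay (in seconds) per reconnect attempt."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        # The ceiling doubles each attempt, up to the cap.
        ceiling = min(cap, base * 2**attempt)
        # Full jitter: pick uniformly below the ceiling so a fleet of
        # clients does not reconnect in lockstep after an outage.
        yield rng.uniform(0, ceiling)
```

A client would sleep for each yielded delay between connection attempts and start over from the first delay once a connection succeeds.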

For a fleet of millions of concurrent users on WebSockets, you’re effectively building a small messaging platform. SSE skips most of this.

CDN and proxy reality

This is the silent reason SSE wins for many teams.

  • Cloudflare, Fastly, AWS CloudFront — all support SSE out of the box. WebSockets have specific configuration knobs and tier requirements.
  • Corporate proxies and firewalls sometimes strip the Upgrade: websocket header. SSE flows through anything that handles HTTP.
  • API gateways (Kong, Tyk, AWS API Gateway) handle SSE first-class; WebSocket support is uneven.

If your customers run behind random corporate networks, SSE’s reliability advantage is real and quantifiable.

Authentication

SSE

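  • Standard cookies. EventSource sends them automatically for same-origin requests; cross-origin needs withCredentials: true plus CORS that allows credentials.
  • Authorization header. EventSource can’t set custom headers, so send it with fetch streaming instead.
  • Single-use tokens in query string. Works, but URLs end up in logs, so keep the tokens short-lived.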
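If you do go the query-string route, the token should be worthless by the time it shows up in a log. A minimal sketch, assuming a single process and an in-memory store (TicketStore is a hypothetical name; a real deployment would likely keep tickets in a shared store with expiry):

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class TicketStore:
    """Short-lived, single-use tickets for opening a stream."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self._ttl = ttl_seconds
        self._tickets: Dict[str, Tuple[str, float]] = {}

    def issue(self, user_id: str) -> str:
        """Called from an authenticated request; returns an opaque ticket."""
        ticket = secrets.token_urlsafe(32)
        self._tickets[ticket] = (user_id, time.monotonic() + self._ttl)
        return ticket

    def redeem(self, ticket: str) -> Optional[str]:
        """Return the user id once; later or expired redemptions get None."""
        entry = self._tickets.pop(ticket, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if time.monotonic() <= expires_at else None
```

The client first calls an authenticated endpoint to get a ticket, then opens the stream with the ticket in the query string, and the stream handler calls redeem before sending anything.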

WebSockets

  • Browser WebSocket API doesn’t let you set headers in the constructor. Workarounds:
    • Auth on the first message after connect (custom protocol).
    • Cookie auth (works with same-origin).
    • Query-string token (works but logs leak).
    • Subprotocol header (Sec-WebSocket-Protocol: bearer.<token>).

WebSocket auth is fiddlier. Plan it before you start.
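For the subprotocol workaround, the server has to pull the token out of the Sec-WebSocket-Protocol request header, which is a comma-separated list. A sketch, assuming the bearer.<token> convention from the list above (the function name is hypothetical):

```python
from typing import Optional


def token_from_subprotocols(
    header_value: str, prefix: str = "bearer."
) -> Optional[str]:
    """Extract the token from a Sec-WebSocket-Protocol header value."""
    for item in header_value.split(","):
        item = item.strip()
        # Ignore real subprotocols and an empty "bearer." entry.
        if item.startswith(prefix) and len(item) > len(prefix):
            return item[len(prefix):]
    return None
```

Two things to keep in mind: the server should accept one of the subprotocols the client offered when it completes the handshake, and subprotocol values are restricted to HTTP token characters, so base64url without padding is a safe encoding for the token.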

Decision rule

Use SSE when:

  • Direction is server → client.
  • The user submits a request, gets a stream of responses.
  • You don’t need cancellation on the same connection (a sibling endpoint is fine).
  • You want minimum operational cost.

Use WebSockets when:

  • Both sides talk during the session.
  • You need binary frames.
  • Latency budgets are <100ms per send.
  • You’re building chat, collaborative editors, multiplayer.

For an LLM chatbot specifically:

  • Single-turn streaming? SSE.
  • Multi-turn with tool-call approvals, mid-stream steering, agent control? WebSockets.
  • Most apps land on SSE + a small POST /cancel companion endpoint that posts a cancel signal, which the streaming endpoint observes via Redis pub/sub. Best of both.
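The SSE + POST /cancel pattern can be sketched in-process with asyncio.Event standing in for the pub/sub channel. CancelRegistry and cancellable are hypothetical names, and this version only works when the stream and the cancel request land on the same process; across processes you would replace the registry with Redis pub/sub as described above.

```python
import asyncio
from typing import AsyncIterator, Dict


class CancelRegistry:
    """Maps a stream id to a cancel flag (single-process stand-in)."""

    def __init__(self) -> None:
        self._events: Dict[str, asyncio.Event] = {}

    def register(self, stream_id: str) -> asyncio.Event:
        event = asyncio.Event()
        self._events[stream_id] = event
        return event

    def cancel(self, stream_id: str) -> bool:
        """Called by the POST /cancel handler; False if the stream is gone."""
        event = self._events.get(stream_id)
        if event is None:
            return False
        event.set()
        return True

    def unregister(self, stream_id: str) -> None:
        self._events.pop(stream_id, None)


async def cancellable(
    tokens: AsyncIterator[str], stream_id: str, registry: CancelRegistry
) -> AsyncIterator[str]:
    """Pass tokens through until the stream's cancel flag is set."""
    cancelled = registry.register(stream_id)
    try:
        async for token in tokens:
            if cancelled.is_set():
                break
            yield token
    finally:
        registry.unregister(stream_id)
```

The streaming endpoint wraps its token source in cancellable(...) and returns the stream id to the client in the first event; the cancel endpoint just calls registry.cancel(stream_id).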

When to consider neither

  • fetch streaming (no EventSource) when you need POST+body and the client doesn’t need auto-reconnect.
  • gRPC server streaming between services. Don’t use it browser-side; use it for backend-to-backend.
  • Polling when updates are infrequent (every 30s+). Sometimes simpler is better.

What’s underrated

  • SSE keep-alive frames are the difference between “works locally, hangs on AWS NLB” and “works.” Always emit them.
  • EventSource’s lastEventId lets you implement resumable streams trivially. Almost nobody uses it; everybody should.
  • HTTP/2 SSE removes the old “browsers cap connections per origin at 6” complaint. Use it.

Read this next

If you want a working FastAPI/Hono/Axum SSE+cancel companion server with auth and reconnect, it’s at rajpoot.dev.

