Should I use SSE or WebSockets for ChatGPT-style streaming?

SSE is the right default for token streaming where the user submits a prompt and the server streams the response. OpenAI, Anthropic, and most LLM APIs use SSE for this exact pattern. Only switch to WebSockets if you need bidirectional control during the stream — cancellation, tool-call approval, or interactive agent steering.

Does SSE work with HTTP/2 and HTTP/3?

Yes. SSE multiplexes cleanly over HTTP/2 and HTTP/3, removing the historical six-connection-per-domain limit. Modern infrastructure makes SSE more attractive than it was in 2018.

Can SSE go through CDNs and load balancers?

Better than WebSockets. SSE is plain HTTP/1.1 (or HTTP/2), so any CDN, proxy, or load balancer that handles HTTP handles SSE. WebSockets require Upgrade-header support, which some legacy proxies and corporate firewalls strip.

SSE vs WebSockets in 2026 — Streaming AI Apps, Chat, and Real-Time UIs

For most “real-time” features in 2026, the right answer isn’t WebSockets. It’s Server-Sent Events. The pattern that replaced AJAX polling has matured into the default for AI streaming, dashboards, notifications, and one-way push. WebSockets earn their cost when you genuinely need bidirectional, low-latency communication.

This post is the working comparison. The protocols, the patterns, the code, and a clear decision rule.

The fundamental difference

	SSE	WebSocket
Direction	Server → client	Both ways
Protocol	HTTP/1.1, HTTP/2	Custom, after `Upgrade`
Reconnection	Automatic	Manual
Backpressure	Plain TCP	Plain TCP
Encoding	UTF-8 text	Text or binary
CDN / proxy compat	Excellent	Mixed
Auth	Standard HTTP cookies / headers	First message, custom
Browser API	`EventSource`	`WebSocket`

SSE is HTTP that keeps streaming. WebSocket is a separate full-duplex protocol that hijacks an HTTP connection.

When SSE wins

The SSE sweet spot:

AI token streaming. Submit prompt, stream tokens. OpenAI, Anthropic, Google all do it.
Live dashboards. Server pushes updates; client just renders.
Notifications. “You got mail.” One-way.
Long polling replacements. Anything where you want push without ping-pong.
Activity feeds, social streams, stock tickers — any source of events flowing one direction.

You’ll spend almost no engineering time on reconnection (EventSource does it), auth (cookies just work with EventSource; an Authorization header works with fetch streaming), or proxy compat (it’s HTTP).

When WebSockets win

WebSockets earn their complexity when you need:

Bidirectional during the same session. Cancellation, “stop generating,” collaborative editing.
Sub-100ms request/response loops. Sending each client message as its own HTTP request adds per-request overhead (headers, routing, and on HTTP/1.1 possibly a new connection) that adds up.
Binary frames. Audio, video, custom protocols, file transfer.
Many short messages. SSE’s text framing (field names and newlines on every event) is heavy for high-frequency tiny payloads.

Chat apps, multiplayer games, collaborative editors (Figma, Google Docs), live trading clients — these are WebSocket territory.

SSE in detail

The wire format is plain text with simple framing:

data: {"type":"token","content":"Hello"}

data: {"type":"token","content":" world"}

data: {"type":"end"}

Each event is one or more field: value lines, terminated by a blank line. Fields:

data: — the payload (can repeat for multiline).
event: — event name (default: message).
id: — set last-event-id for resume after disconnect.
retry: — reconnect delay in ms.

That’s it. No framing protocol, no handshake.
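To make the framing rules concrete, here is a minimal sketch of a serializer for one event. The helper name `format_sse` and its parameters are hypothetical, not part of any library; it only illustrates the field layout described above, including how a multiline payload is split across repeated data: lines.

```python
import json
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None,
               retry_ms: Optional[int] = None) -> str:
    """Serialize one event into SSE wire format (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")
    # A payload containing newlines is split across repeated data: lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # The trailing blank line terminates the event.
    return "\n".join(lines) + "\n\n"


print(format_sse(json.dumps({"type": "token", "content": "Hello"}), event_id="1"), end="")
print(format_sse("line one\nline two", event="note"), end="")
```

The receiving side joins repeated data: lines with a newline, so the second call round-trips to the original two-line string.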

Server in FastAPI

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from anthropic import AsyncAnthropic
import json

app = FastAPI()
client = AsyncAnthropic()


@app.post("/chat")
async def chat(payload: dict):
    async def event_stream():
        async with client.messages.stream(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            messages=[{"role": "user", "content": payload["message"]}],
        ) as stream:
            async for text in stream.text_stream:
                yield f"data: {json.dumps({'type': 'token', 'content': text})}\n\n"
        yield f"data: {json.dumps({'type': 'end'})}\n\n"

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache, no-transform",
            "X-Accel-Buffering": "no",     # disable nginx buffering
            "Connection": "keep-alive",
        },
    )

Three production details every SSE endpoint needs:

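Cache-Control: no-cache, no-transform tells intermediaries not to cache or rewrite (for example, compress) the stream, either of which can delay events.
X-Accel-Buffering: no is nginx-specific. Without it, nginx buffers the proxied response and the client may see nothing until the response ends.
Heartbeat. Emit a comment line such as : keepalive\n\n every 15–30s so idle connections aren’t dropped by load balancers. The FastAPI example above omits this for brevity.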
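The heartbeat item is the one the FastAPI example above does not implement, so here is one possible sketch. The wrapper name `with_heartbeat` and the 15-second default are assumptions for illustration. It uses `asyncio.wait` with a timeout rather than `asyncio.wait_for`, because `wait_for` would cancel the in-flight read from the upstream generator when the timeout fires.

```python
import asyncio
from typing import AsyncIterator


async def with_heartbeat(events: AsyncIterator[str],
                         interval: float = 15.0) -> AsyncIterator[str]:
    """Yield frames from `events`, inserting an SSE comment when idle."""
    iterator = events.__aiter__()
    # Keep exactly one read in flight; a timeout must not cancel it.
    pending = asyncio.ensure_future(iterator.__anext__())
    try:
        while True:
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                # Nothing arrived within `interval`: send a comment frame.
                yield ": keepalive\n\n"
                continue
            try:
                frame = pending.result()
            except StopAsyncIteration:
                return  # upstream finished
            yield frame
            pending = asyncio.ensure_future(iterator.__anext__())
    finally:
        # No-op if the read already completed; otherwise stop it.
        pending.cancel()
```

In the FastAPI handler this would wrap the existing generator, for example `StreamingResponse(with_heartbeat(event_stream()), ...)`. Lines starting with a colon are comments in the SSE format, so clients ignore them.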

Server in Hono on Bun

import { Hono } from "hono";
import { streamSSE } from "hono/streaming";
import Anthropic from "@anthropic-ai/sdk";

const app = new Hono();
const anthropic = new Anthropic();

app.post("/chat", async (c) => {
  // c.req.json() returns a Promise, so await it before destructuring.
  const { message } = await c.req.json();
  return streamSSE(c, async (stream) => {
    const resp = anthropic.messages.stream({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      messages: [{ role: "user", content: message }],
    });
    for await (const ev of resp) {
      // Forward only text deltas; other delta types carry no text field.
      if (ev.type === "content_block_delta" && ev.delta.type === "text_delta") {
        await stream.writeSSE({
          data: JSON.stringify({ type: "token", content: ev.delta.text }),
        });
      }
    }
    await stream.writeSSE({ data: JSON.stringify({ type: "end" }) });
  });
});

export default app; // Bun serves the default export

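streamSSE handles the headers and the framing. It does not appear to send heartbeats on its own, so write a periodic comment or ping event yourself if connections can sit idle. The business logic is still only a few lines.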

Server in Axum

use axum::{response::sse::{Event, KeepAlive, Sse}, routing::post, Router};
use futures::stream::{self, Stream, StreamExt}; // StreamExt provides .map()
use std::convert::Infallible;
use std::time::Duration;

async fn chat() -> Sse<impl Stream<Item = Result<Event, Infallible>>> {
    let stream = stream::iter(vec!["Hello", " world"])
        .map(|t| Ok(Event::default().data(t)));

    Sse::new(stream).keep_alive(
        KeepAlive::new()
            .interval(Duration::from_secs(15))
            .text("keepalive"),
    )
}

Axum’s first-class Sse type handles framing and keep-alive automatically.

Client side — `EventSource`

const es = new EventSource("/chat?prompt=hello");
es.onmessage = (e) => {
  const { type, content } = JSON.parse(e.data);
  if (type === "token") appendToken(content);
  if (type === "end") es.close();
};
es.onerror = (err) => {
  // EventSource auto-reconnects; only handle final-close cases here.
};

That’s it. No reconnection logic — the browser does it. No protocol negotiation. Auth via cookies, automatic.

For POST-style requests (which EventSource doesn’t support), use fetch with a streaming response:

const resp = await fetch("/chat", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ message: "hi" }),
});
const reader = resp.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop()!;          // keep incomplete line
  for (const line of lines) {
    if (line.startsWith("data: ")) {
      const event = JSON.parse(line.slice(6));
      handle(event);
    }
  }
}

A bit more code than EventSource, but POST + body works.
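The loop above only looks at single data: lines. If the server also sends event: or id: fields, comments, or multiline payloads, the client needs a slightly fuller parser. Below is a simplified sketch of that logic in Python (for example for a backend consumer or a test harness); the names `SSEEvent` and `parse_sse` are hypothetical, and it deliberately skips the retry: field and other details of the full specification.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class SSEEvent(NamedTuple):
    event: str
    data: str
    id: Optional[str]


def parse_sse(chunks: Iterable[str]) -> Iterator[SSEEvent]:
    """Incrementally parse decoded text chunks into events (simplified)."""
    buffer = ""
    event = "message"
    data_lines: List[str] = []
    last_id: Optional[str] = None
    for chunk in chunks:
        buffer += chunk
        # Only complete lines are processed; the remainder stays buffered.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")
            if line == "":
                # Blank line: dispatch the accumulated event, if any.
                if data_lines:
                    yield SSEEvent(event, "\n".join(data_lines), last_id)
                event, data_lines = "message", []
                continue
            if line.startswith(":"):
                continue  # comment line, e.g. a keepalive
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip the single optional space
            if field == "data":
                data_lines.append(value)
            elif field == "event":
                event = value
            elif field == "id":
                last_id = value  # persists across later events
```

Chunk boundaries can fall anywhere, including mid-field, which is why the parser buffers until it sees a newline rather than assuming one event per read.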

WebSockets when you need them

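import asyncio

from anthropic import AsyncAnthropic
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
client = AsyncAnthropic()


async def generate(ws: WebSocket, prompt: str, cancel: asyncio.Event):
    # Stream tokens until the model finishes or the client cancels.
    async with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        async for text in stream.text_stream:
            if cancel.is_set():
                break
            await ws.send_json({"type": "token", "content": text})
    await ws.send_json({"type": "end"})


@app.websocket("/chat")
async def chat(ws: WebSocket):
    await ws.accept()
    cancel = asyncio.Event()
    task = None

    try:
        while True:
            # Generation runs in a background task, so this loop stays free
            # to receive a cancel message while tokens are still streaming.
            msg = await ws.receive_json()

            if msg["type"] == "cancel":
                cancel.set()
                continue

            if msg["type"] == "prompt":
                if task and not task.done():
                    # Stop the previous generation before starting a new one.
                    cancel.set()
                    await task
                cancel.clear()
                task = asyncio.create_task(generate(ws, msg["content"], cancel))
    except WebSocketDisconnect:
        pass
    finally:
        if task and not task.done():
            task.cancel()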

The same shape but bidirectional: the client sends {"type": "cancel"} mid-stream and the server stops generating on the same connection. SSE can’t do this on the same connection (you’d need a sidecar POST /cancel).

Scaling considerations

SSE

One long-lived stream per client. Over HTTP/2, several streams from the same client share one TCP connection, so this is cheaper than it sounds.
Connection limits: 10k–50k concurrent SSE connections per process is reasonable on Linux (raise ulimit -n).
Idempotent reconnect. Use Last-Event-ID headers + cursor in your stream to resume cleanly.
CDN compat. Most CDNs, Cloudflare included, handle SSE; if events arrive in bursts, check whether response buffering or compression is enabled on the route.
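The resume item above can be sketched as follows. When EventSource reconnects, the browser sends the last id: it saw in a Last-Event-ID request header, and the server replays everything after it. This sketch assumes events are kept in an in-memory list and that ids are simply positions in that list; the function name `resume_from` is hypothetical, and a real service would read from a durable log instead.

```python
from typing import Iterator, List, Optional


def resume_from(log: List[str], last_event_id: Optional[str]) -> Iterator[str]:
    """Yield SSE frames for every logged event after `last_event_id`."""
    # Ids are positions in the log; a missing or malformed id replays all.
    try:
        start = int(last_event_id) + 1 if last_event_id is not None else 0
    except ValueError:
        start = 0
    for position in range(max(start, 0), len(log)):
        # Each frame carries its own id so the next reconnect can resume.
        yield f"id: {position}\ndata: {log[position]}\n\n"
```

In a FastAPI handler the header value would come from `request.headers.get("last-event-id")`. Replaying from a cursor is what makes the reconnect idempotent: the client never has to guess what it missed.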

WebSockets

Sticky load balancing. A WebSocket connection is bound to one process for its lifetime, so any per-connection state lives there and reconnects need sticky routing or shared state.
Pub/sub backplane for fan-out. Redis pub/sub, NATS, or a managed service.
More moving parts. Reconnection logic, heartbeats, message ordering, queue draining on disconnect — all yours to implement.
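As one small example of the reconnection logic you own with WebSockets, a client typically spaces out its retries. The following is a sketch of exponential backoff with full jitter; the function name and the default values are assumptions, not taken from any particular client library.

```python
import random


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before reconnect number `attempt` (0-based)."""
    # Double the ceiling on each attempt, but never exceed `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick uniformly below the ceiling so that clients
    # disconnected at the same moment do not all retry together.
    return random.uniform(0, ceiling)
```

EventSource gives you a fixed retry delay for free (tunable with the retry: field); with WebSockets this policy, and resetting `attempt` after a successful connect, is application code.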

For a fleet of millions of concurrent users on WebSockets, you’re effectively building a small messaging platform. SSE skips most of this.

CDN and proxy reality

This is the silent reason SSE wins for many teams.

Cloudflare, Fastly, AWS CloudFront — all support SSE out of the box. WebSockets have specific configuration knobs and tier requirements.
Corporate proxies and firewalls sometimes strip the Upgrade: websocket header. SSE flows through anything that handles HTTP.
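API gateways (Kong, Tyk, AWS API Gateway) generally pass SSE through as ordinary HTTP, though you should confirm that response buffering and request timeouts are configured for long-lived streams; WebSocket support is uneven.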

If your customers run behind random corporate networks, SSE’s reliability advantage is real.

Authentication

SSE

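Standard cookies. EventSource sends them automatically for same-origin requests; cross-origin requests need withCredentials: true plus a CORS policy that allows credentials.
Authorization header. EventSource cannot set custom request headers, so a bearer token in a header requires fetch streaming (or an EventSource polyfill built on fetch).
Single-use tokens in query string. Works, but the token leaks into logs.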

WebSockets

Browser WebSocket API doesn’t let you set headers in the constructor. Workarounds:
- Auth on the first message after connect (custom protocol).
- Cookie auth (works with same-origin).
- Query-string token (works but logs leak).
- Subprotocol header (Sec-WebSocket-Protocol: bearer.<token>).

WebSocket auth is fiddlier. Plan it before you start.
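For the subprotocol workaround, the server has to pull the token back out of the Sec-WebSocket-Protocol header, which is a comma-separated list of offered subprotocols. A minimal sketch, assuming the `bearer.<token>` convention from the list above (the function name is hypothetical):

```python
from typing import Optional

PREFIX = "bearer."


def bearer_from_subprotocols(header_value: str) -> Optional[str]:
    """Return the token from a `bearer.<token>` subprotocol entry, if any."""
    for proto in header_value.split(","):
        proto = proto.strip()
        # Require a non-empty token after the prefix.
        if proto.startswith(PREFIX) and len(proto) > len(PREFIX):
            return proto[len(PREFIX):]
    return None
```

The token still has to be validated after extraction, and the server generally needs to echo one of the offered subprotocols back in its handshake response, since some browsers fail the connection otherwise.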

Decision rule

Use SSE when:

Direction is server → client.
The user submits a request, gets a stream of responses.
You don’t need cancellation on the same connection (a sibling endpoint is fine).
You want minimum operational cost.

Use WebSockets when:

Both sides talk during the session.
You need binary frames.
Latency budgets are <100ms per send.
You’re building chat, collaborative editors, multiplayer.

For an LLM chatbot specifically:

Single-turn streaming? SSE.
Multi-turn with tool-call approvals, mid-stream steering, agent control? WebSockets.
Most apps land on SSE + a small POST /cancel companion endpoint that posts a cancel signal, which the streaming endpoint observes via Redis pub/sub. Best of both.
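The SSE plus POST /cancel pattern can be sketched as below. To keep the example self-contained it uses an in-process dictionary of `asyncio.Event` flags as a stand-in for the Redis pub/sub channel, which means it only works when both endpoints run in the same process. The names `stream_tokens`, `cancel`, and the `stream_id` parameter are hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator, Dict

# In-process stand-in for Redis pub/sub; single-process only.
_cancel_flags: Dict[str, asyncio.Event] = {}


async def stream_tokens(stream_id: str,
                        tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield SSE frames until tokens run out or the stream is cancelled."""
    flag = _cancel_flags.setdefault(stream_id, asyncio.Event())
    try:
        async for token in tokens:
            if flag.is_set():
                break  # the cancel endpoint fired
            yield f"data: {json.dumps({'type': 'token', 'content': token})}\n\n"
        yield f"data: {json.dumps({'type': 'end'})}\n\n"
    finally:
        # Forget the flag once the stream is over.
        _cancel_flags.pop(stream_id, None)


def cancel(stream_id: str) -> bool:
    """What a POST /cancel handler would call; False for unknown ids."""
    flag = _cancel_flags.get(stream_id)
    if flag is None:
        return False
    flag.set()
    return True
```

The client needs the stream id before it can cancel, so the server would typically send it in the first event or accept one generated by the client. With multiple processes, `cancel` would publish to a channel and `stream_tokens` would subscribe to it instead of sharing a dictionary.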

When to consider neither

fetch streaming (no EventSource) when you need POST+body and the client doesn’t need auto-reconnect.
gRPC server streaming between services. Don’t use it browser-side; use it for backend-to-backend.
Polling when updates are infrequent (every 30s+). Sometimes simpler is better.

What’s underrated

SSE keep-alive frames are the difference between “works locally, hangs on AWS NLB” and “works.” Always emit them.
EventSource’s lastEventId lets you implement resumable streams trivially. Almost nobody uses it; everybody should.
HTTP/2 SSE removes the old “browsers cap connections per origin at 6” complaint. Use it.
