Load testing tells you what breaks under traffic. Without it, you find out from production. The patterns are well-known but inconsistently applied. This post is the working playbook.

When to load test

  • Before launches: validate capacity for expected traffic.
  • After major changes: regression check.
  • Before scaling decisions: justify the spend.
  • Debugging slow paths: isolate under controlled load.
  • Capacity planning: find the breaking point.

Always: before you trust new infra.

k6 basics

import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
    stages: [
        { duration: "30s", target: 100 },   // ramp to 100 VUs
        { duration: "5m", target: 100 },    // stay
        { duration: "30s", target: 0 },     // ramp down
    ],
    thresholds: {
        http_req_duration: ["p(95)<500", "p(99)<2000"],
        http_req_failed: ["rate<0.01"],
    },
};

export default function() {
    const r = http.get("https://api.example.com/users");
    check(r, { "200": (r) => r.status === 200 });
    sleep(1);
}

Run it:

k6 run script.js

Stages = traffic shape. Thresholds = pass/fail criteria. Built-in reporting.
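To make the pass/fail idea concrete, here is a minimal Python sketch of what thresholds like p(95)<500 and rate<0.01 boil down to: compute a percentile and a failure rate over the recorded samples, then compare each with a limit. The function names and the nearest-rank percentile are assumptions for illustration; k6 computes its own percentiles internally and may interpolate differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def passes_thresholds(durations_ms, failed_flags, p95_limit_ms=500, max_fail_rate=0.01):
    """Mirror the two example thresholds: p(95)<500 and rate<0.01."""
    p95 = percentile(durations_ms, 95)
    fail_rate = sum(failed_flags) / len(failed_flags)
    return p95 < p95_limit_ms and fail_rate < max_fail_rate


if __name__ == "__main__":
    durations = [120] * 95 + [900] * 5   # 5% of requests are slow
    failures = [False] * 100             # no failed requests
    print("p95:", percentile(durations, 95))
    print("pass:", passes_thresholds(durations, failures))
```

Note how a tail of slow requests can hide behind p95 while still showing up in p99, which is why the example thresholds check both.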

Realistic load shapes

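export const options = {
    scenarios: {
        steady: {
            executor: "constant-arrival-rate",
            rate: 1000,               // 1000 iterations per timeUnit
            timeUnit: "1s",
            duration: "10m",
            preAllocatedVUs: 50,
            maxVUs: 500,              // lets k6 add VUs if responses slow down
        },
        spike: {
            executor: "ramping-arrival-rate",
            startTime: "5m",          // begins halfway through the steady scenario
            startRate: 0,             // runs on top of steady, so the rates add up
            timeUnit: "1s",
            stages: [
                { duration: "30s", target: 4000 },  // combined peak: 5000/s
                { duration: "30s", target: 0 },     // back to the 1000/s baseline
            ],
            preAllocatedVUs: 200,
            maxVUs: 2000,
        },
    },
};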

Steady-state + spike = realistic. Don’t ramp linearly to crash; model the actual shape.
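Arrival-rate executors only hold their rate if enough VUs are available. A rough sizing rule follows from Little's law: concurrent iterations ≈ arrival rate × iteration duration. The sketch below is a hypothetical helper (the name and the headroom factor are my assumptions, not part of k6) for sanity-checking preAllocatedVUs before a run.

```python
import math


def required_vus(rate_per_s, iteration_s, headroom=1.5):
    """Estimate VUs needed: arrival rate x iteration time (Little's law), times headroom."""
    if rate_per_s < 0 or iteration_s < 0 or headroom < 1:
        raise ValueError("rate and iteration time must be >= 0, headroom >= 1")
    return math.ceil(rate_per_s * iteration_s * headroom)


if __name__ == "__main__":
    # Same rate, three assumed iteration durations (seconds)
    for iteration_s in (0.05, 0.25, 1.0):
        vus = required_vus(1000, iteration_s)
        print(f"1000 req/s at {iteration_s}s per iteration -> about {vus} VUs")
```

The takeaway: a VU budget that is fine while the target is fast can become the bottleneck once latency climbs, and then the test stops delivering the rate you asked for.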

Locust

from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)
    
    def on_start(self):
        self.client.post("/login", json={"user": "test", "pass": "..."})
    
    @task(3)
    def view_posts(self):
        self.client.get("/posts")
    
    @task(1)
    def create_post(self):
        self.client.post("/posts", json={"title": "..."})

Run it:

locust -f locustfile.py --users 1000 --spawn-rate 50

Python-native; web UI. Distributed mode for huge load.
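Locust is configured in users, not requests per second, so it helps to estimate what a user count translates to. A back-of-the-envelope sketch, assuming each task issues one request and using the closed-model relation throughput ≈ users / (think time + response time); the 0.2s response time in the example is a made-up figure.

```python
def expected_rps(users, wait_min_s, wait_max_s, avg_response_s):
    """Closed-model estimate: each user completes one request per (think time + response time)."""
    avg_wait = (wait_min_s + wait_max_s) / 2   # mean of a uniform wait between min and max
    cycle = avg_wait + avg_response_s
    if cycle <= 0:
        raise ValueError("cycle time must be positive")
    return users / cycle


if __name__ == "__main__":
    # 1000 users, wait_time = between(1, 5), assumed 0.2s average response time
    print(round(expected_rps(1000, 1, 5, 0.2), 1), "requests/s (estimate)")
```

If the target slows down, the same 1000 users produce fewer requests per second, which is the main behavioral difference from the arrival-rate executors in k6.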

Vegeta

echo "GET https://api.example.com/users" | vegeta attack -rate=500/s -duration=1m | vegeta report

Simple. CLI-only. Constant rate. For quick checks.

Modeling real traffic

Sample your access logs (last 1k requests, anonymized):

import http from "k6/http";
import { SharedArray } from "k6/data";

const requests = new SharedArray("reqs", () => JSON.parse(open("./real-traffic.json")));

export default function() {
    const req = requests[Math.floor(Math.random() * requests.length)];
    http.request(req.method, req.url, req.body, { headers: req.headers });
}

Better than synthetic. Hits real paths in real proportions.
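One way to produce real-traffic.json is a small script over your access logs. This is a sketch under assumptions: it expects common/combined log format lines, and it anonymizes only by dropping client IPs and query strings, which may not be enough for your data. Access logs don't carry request bodies, so body is left empty.

```python
import json
import re

# Matches the request part of a common/combined log line: "METHOD /path HTTP/x.y"
REQUEST_RE = re.compile(r'"(GET|POST|PUT|PATCH|DELETE|HEAD) (\S+) HTTP/[\d.]+"')


def log_lines_to_requests(lines, base_url, limit=1000):
    """Keep the last `limit` parseable requests; drop query strings and client IPs."""
    requests = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip malformed lines
        method, path = match.groups()
        path = path.split("?", 1)[0]  # query strings often carry tokens or emails
        requests.append({"method": method, "url": base_url + path, "body": None, "headers": {}})
    return requests[-limit:]


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2025:13:55:36 +0000] "GET /users?token=abc HTTP/1.1" 200 2326',
        '203.0.113.9 - - [10/Oct/2025:13:55:37 +0000] "POST /posts HTTP/1.1" 201 51',
    ]
    print(json.dumps(log_lines_to_requests(sample, "https://api.example.com"), indent=2))
```

Write requests need bodies added by hand or from another source before replay, and they should point at a system where creating data is safe.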

Capacity ceiling

Run with increasing load until SLO breaches:

RPS: 1k  → p99: 100ms (good)
RPS: 5k  → p99: 200ms (good)
RPS: 10k → p99: 500ms (still good)
RPS: 15k → p99: 1500ms (breach!)
RPS: 20k → errors climb

Capacity ceiling: somewhere between 10k and 15k RPS, so call it ~12k RPS per replica until a finer sweep narrows it down. Scale before hitting that.
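Reading the ceiling off a sweep can be automated. A minimal sketch, assuming you have (RPS, p99) pairs from each step and a p99 SLO; the 1000ms SLO below is my assumption, since the numbers above only imply it sits between 500ms and 1500ms.

```python
def find_ceiling(measurements, slo_p99_ms):
    """Return (last RPS within SLO, first RPS that breaches it) from (rps, p99_ms) pairs."""
    last_good, first_breach = None, None
    for rps, p99_ms in sorted(measurements):
        if p99_ms <= slo_p99_ms:
            if first_breach is None:
                last_good = rps  # still below the first breach
        elif first_breach is None:
            first_breach = rps
    return last_good, first_breach


if __name__ == "__main__":
    sweep = [(1000, 100), (5000, 200), (10000, 500), (15000, 1500)]
    good, breach = find_ceiling(sweep, slo_p99_ms=1000)
    print(f"within SLO up to {good} RPS, breached at {breach} RPS")
```

The true ceiling lies between the two returned values; rerun with smaller steps inside that range to tighten the estimate.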

CI integration

- run: |
    # k6 exits non-zero when a threshold fails, so check it in the same step
    k6 run --env STAGE=staging tests/load.js || {
      echo "Load test failed; perf regression"
      exit 1
    }

Run on each PR (against staging). Catches perf regressions before merge.
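Fixed thresholds catch SLO breaches but not slow drift that stays under the limit. One option is to also compare each run against a stored baseline. This is a hypothetical helper, not something k6 provides; how you store the baseline and extract the current p95 is up to your pipeline, and the 10% tolerance is an arbitrary starting point.

```python
import sys


def is_regression(baseline_ms, current_ms, tolerance=0.10):
    """True when current latency exceeds the baseline by more than `tolerance` (a fraction)."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return current_ms > baseline_ms * (1 + tolerance)


def main(baseline_ms, current_ms):
    """Return a process exit code: 1 on regression, 0 otherwise."""
    if is_regression(baseline_ms, current_ms):
        print(f"p95 regressed: {baseline_ms}ms -> {current_ms}ms")
        return 1
    print(f"p95 ok: {baseline_ms}ms -> {current_ms}ms")
    return 0


if __name__ == "__main__":
    # Example values only; a real job would read both numbers from stored results
    sys.exit(main(baseline_ms=200, current_ms=210))
```

Staging is noisy, so a tolerance that is too tight will produce flaky failures; tune it against a few weeks of run-to-run variance.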

Distributed load

For huge load (>50k RPS), a single client machine isn’t enough:

  • Grafana Cloud k6 (formerly k6 Cloud): hosted and managed; expensive.
  • k6 distributed: roll-your-own, e.g. with k6-operator on Kubernetes.
  • Locust master/workers: built-in.

For sustained huge load testing, invest in distributed; otherwise a single beefy machine is fine.

Don’t load-test prod (usually)

Hammer staging that mirrors prod. Or:

  • Shadow traffic: copy prod traffic to a parallel system.
  • Canary at full load: deploy new version, route % of real traffic.

Direct prod load tests risk customers. Reserve for “capacity proofs” with explicit comms.

What to measure

  • Latency: p50, p90, p99, max.
  • Error rate.
  • Throughput: RPS achieved.
  • Resource usage: CPU, memory, DB connections, GC pauses.
  • Downstream impact: did your test break partner services?

Visualize over time. Spike vs sustained vs ramp differ.
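A single end-of-run p99 hides when the latency went bad. A minimal sketch of the over-time view, assuming you can export raw (timestamp, latency) samples from your tool; the function name, window size, and nearest-rank percentile are illustrative choices.

```python
import math
from collections import defaultdict


def p99_per_window(samples, window_s=10):
    """Group (timestamp_s, latency_ms) samples into fixed windows; return {window_start: p99}."""
    buckets = defaultdict(list)
    for ts, latency_ms in samples:
        buckets[int(ts // window_s) * window_s].append(latency_ms)
    result = {}
    for start in sorted(buckets):
        ordered = sorted(buckets[start])
        rank = max(1, math.ceil(99 * len(ordered) / 100))  # nearest-rank p99
        result[start] = ordered[rank - 1]
    return result


if __name__ == "__main__":
    samples = [(0, 10), (5, 20), (9.9, 30), (10, 100), (19, 200)]
    for start, p99 in p99_per_window(samples).items():
        print(f"t={start}s p99={p99}ms")
```

Plot the result next to CPU, memory, and connection-pool graphs on the same time axis to see which resource moved first.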

Common load test mistakes

1. No think time

sleep(0) between requests is unrealistic: that’s bots, not users. Add randomized think time, e.g. sleep(1 + Math.random() * 2).

2. Same input every request

Cache hits 100%; metrics lie. Vary inputs.

3. Cold targets

First minute: warming caches. Skip first 60s in metrics.
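If your tool can't exclude the warm-up itself, trim it when analyzing. A small sketch, assuming raw (timestamp, value) samples with timestamps in seconds; the function name is hypothetical.

```python
def drop_warmup(samples, warmup_s=60):
    """Discard samples recorded during the first `warmup_s` seconds of the run."""
    if not samples:
        return []
    start = min(ts for ts, _ in samples)  # treat the earliest sample as the run start
    return [(ts, value) for ts, value in samples if ts - start >= warmup_s]


if __name__ == "__main__":
    run = [(100, 850), (159, 400), (160, 120), (200, 110)]
    print(drop_warmup(run))  # keeps only samples at least 60s after the first one
```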

4. Ignoring the test machine

Load generator at 100% CPU; reported numbers wrong. Verify tester has headroom.

5. No error analysis

Throughput is great; 5% errors. Pass? No — find what’s failing.
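A quick way to find what's failing is to group errors by endpoint and status code. A minimal sketch, assuming you have (path, status) pairs from the run and that any status of 400 or above counts as a failure; both are assumptions about your data.

```python
from collections import Counter


def error_breakdown(results):
    """Return (error rate, [((path, status), count), ...]) with the most common failures first."""
    failures = Counter((path, status) for path, status in results if status >= 400)
    total = len(results)
    rate = sum(failures.values()) / total if total else 0.0
    return rate, failures.most_common()


if __name__ == "__main__":
    results = [("/posts", 200)] * 95 + [("/posts", 503)] * 4 + [("/login", 401)]
    rate, top = error_breakdown(results)
    print(f"error rate: {rate:.1%}")
    for (path, status), count in top:
        print(f"{count:>4}  {status}  {path}")
```

A cluster of 503s on one path points at capacity; scattered 401s usually mean the test's auth setup is broken rather than the service.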

Production traffic shapes to test

  • Steady baseline: typical load.
  • Diurnal: peak hours.
  • Spike: marketing email blast.
  • Cold start: zero → peak in 10s (after deploy).
  • Failure: simulate dependency down (chaos engineering).

Each reveals different bottlenecks.

Other load generators

  • Vegeta: simple, curl-style.
  • Apache Bench (ab): ancient; quick.
  • wrk / wrk2: low-overhead C tool.
  • Bombardier: Go-based.
  • Artillery: JS / Node.
  • Tsung: Erlang; distributed.

For most teams in 2026: k6 by default; Locust if you prefer Python-native.

What I’d ship today

For new services:

  • k6 as the standard load tool.
  • CI integration with thresholds.
  • Real traffic samples for realistic shapes.
  • Capacity ceiling tests quarterly.
  • Pre-launch tests for all new features.
  • Dashboard correlating load test results with infra metrics.

Read this next

If you want my k6 templates (steady, spike, baseline), they’re at rajpoot.dev.

