Load testing tells you what breaks under traffic. Without it, you find out in production. The patterns are well-known but inconsistently applied. This post is the working playbook.
When to load test
- Before launches: validate capacity for expected traffic.
- After major changes: regression check.
- Before scaling decisions: justify the spend.
- Debugging slow paths: isolate under controlled load.
- Capacity planning: find the breaking point.
Always: before you trust new infra.
k6 basics
```javascript
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "30s", target: 100 }, // ramp to 100 VUs
    { duration: "5m", target: 100 },  // stay
    { duration: "30s", target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500", "p(99)<2000"], // milliseconds
    http_req_failed: ["rate<0.01"],                 // under 1% failed requests
  },
};

export default function () {
  const r = http.get("https://api.example.com/users");
  check(r, { "status is 200": (res) => res.status === 200 });
  sleep(1); // think time between iterations
}
```
```shell
k6 run script.js
```
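Stages shape the traffic (here, the number of virtual users over time). Thresholds are the pass/fail criteria: if any threshold is crossed, k6 marks the run as failed and exits with a non-zero code. The end-of-test summary report is built in.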
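To make the pass/fail logic concrete, here is a minimal Python sketch of how thresholds like p(95)<500 and rate<0.01 evaluate against a run’s raw samples. It uses nearest-rank percentiles, which may differ slightly from k6’s own percentile calculation; the function names and sample data are hypothetical.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def evaluate(durations_ms, failed):
    """Apply the thresholds from the k6 script: p(95)<500, p(99)<2000, rate<0.01."""
    fail_rate = sum(failed) / len(failed)
    results = {
        "p(95)<500": percentile(durations_ms, 95) < 500,
        "p(99)<2000": percentile(durations_ms, 99) < 2000,
        "rate<0.01": fail_rate < 0.01,
    }
    return results, all(results.values())

# Hypothetical run: 100 requests, mostly fast, one slow outlier, one failure.
durations = [120] * 95 + [450] * 4 + [2500]
failed = [False] * 99 + [True]
results, passed = evaluate(durations, failed)
print(results)                       # latency thresholds pass, error rate fails
print("PASS" if passed else "FAIL")  # FAIL
```

With only 100 samples, the single 2500 ms outlier doesn’t move p99, while one failed request is enough to breach rate<0.01.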
Realistic load shapes
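```javascript
// Both scenarios run the script's default function unless `exec` is set.
export const options = {
  scenarios: {
    // Open model: k6 starts 1000 iterations per second regardless of response time.
    steady: {
      executor: "constant-arrival-rate",
      rate: 1000,
      timeUnit: "1s",
      duration: "10m",
      preAllocatedVUs: 50,
      maxVUs: 200, // illustrative; defaults to preAllocatedVUs, so k6 can't add VUs if latency rises
    },
    // Spike layered on top of the steady load, starting 5 minutes in.
    spike: {
      executor: "ramping-arrival-rate",
      startTime: "5m",
      startRate: 1000,
      timeUnit: "1s",
      stages: [
        { duration: "30s", target: 5000 }, // ramp to 5000 iterations/s
        { duration: "30s", target: 1000 }, // fall back
      ],
      preAllocatedVUs: 200,
      maxVUs: 1000, // illustrative headroom
    },
  },
};
```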
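Steady state plus a spike (the spike scenario starts at minute 5 and runs on top of the steady 1000/s) is closer to real traffic than a linear ramp until something crashes. Model the shape you actually see. Arrival-rate executors start iterations on a schedule no matter how slowly the server responds, so a slowdown shows up as rising latency instead of quietly lower load.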
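Arrival-rate executors need enough VUs to hold every in-flight iteration. By Little’s law, concurrent iterations ≈ arrival rate × iteration duration. A minimal Python sketch for sizing preAllocatedVUs; the headroom factor and the iteration times are assumptions. Iteration time includes any sleep() in the function, so a script with sleep(1) needs at least 1000 VUs to sustain 1000 iterations/s.

```python
import math

def vus_needed(rate_per_s, iteration_ms, headroom=1.5):
    """Little's law: concurrent iterations = arrival rate x time per iteration.

    headroom is a hypothetical safety factor for latency creeping up under load.
    """
    return math.ceil(rate_per_s * iteration_ms * headroom / 1000)

print(vus_needed(1000, 50, headroom=1.0))   # 50: steady scenario at 50 ms/iteration, zero margin
print(vus_needed(1000, 200, headroom=1.0))  # 200: same rate, responses slowed to 200 ms
print(vus_needed(5000, 50))                 # 375: spike peak at 50 ms with 1.5x headroom
```

Read the other way, preAllocatedVUs: 200 sustains the 5000/s spike only if each iteration finishes within 40 ms (200 / 5000 s). Beyond that, k6 records dropped_iterations unless maxVUs lets it allocate more.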
Locust
```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)  # think time: 1-5 s between tasks

    def on_start(self):
        # Runs once per simulated user before its tasks start.
        self.client.post("/login", json={"user": "test", "pass": "..."})

    @task(3)  # weight 3: picked about 3x as often as create_post
    def view_posts(self):
        self.client.get("/posts")

    @task(1)
    def create_post(self):
        self.client.post("/posts", json={"title": "..."})
```
```shell
locust -f locustfile.py --host https://api.example.com --users 1000 --spawn-rate 50
```
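Python-native, with a web UI (add --headless to run without it, e.g. in CI). Distributed mode (one --master process, many --worker processes) scales out for huge load.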
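Locust runs a closed model: each user waits between tasks, so achieved throughput depends on think time and response time, not on the user count alone. A back-of-envelope Python sketch, assuming a hypothetical 250 ms mean response time and one request per task:

```python
def closed_model_rps(users, wait_min_s, wait_max_s, response_s):
    """Approximate steady-state RPS for a closed model with uniform think time."""
    mean_wait = (wait_min_s + wait_max_s) / 2  # between(a, b) draws uniformly from [a, b]
    return users / (mean_wait + response_s)

def task_mix(weights):
    """Expected share of task executions given @task weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

print(round(closed_model_rps(users=1000, wait_min_s=1, wait_max_s=5, response_s=0.25)))  # 308
print(task_mix({"view_posts": 3, "create_post": 1}))  # {'view_posts': 0.75, 'create_post': 0.25}
```

So 1000 users with between(1, 5) generate roughly 300 RPS, not 1000. Size --users from the RPS you need, not the other way around.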
Vegeta
```shell
echo "GET https://api.example.com/users" | vegeta attack -rate=500/s -duration=1m | vegeta report
```
Simple CLI (also usable as a Go library). Constant request rate. Good for quick checks.
Modeling real traffic
Sample your access logs (last 1k requests, anonymized):
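```javascript
import http from "k6/http";
import { SharedArray } from "k6/data";

// Loaded once in the init context and shared read-only across VUs.
// Expected shape: [{ "method": "GET", "url": "...", "body": null, "headers": {} }, ...]
const requests = new SharedArray("reqs", () => JSON.parse(open("./real-traffic.json")));

export default function () {
  // Pick a random recorded request so endpoints appear in their logged proportions.
  const req = requests[Math.floor(Math.random() * requests.length)];
  http.request(req.method, req.url, req.body, { headers: req.headers });
}
```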
Better than synthetic. Hits real paths in real proportions.
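Producing real-traffic.json is the part the k6 script leaves open. A minimal Python sketch, assuming nginx/Apache combined log format and a hypothetical staging base URL. It drops query strings as a crude first pass at anonymization (IDs embedded in paths need their own handling), and since access logs don’t record request bodies, body is left null.

```python
import json
import re

# Request line inside a combined-log-format entry, e.g.
# 203.0.113.7 - - [10/Oct/2026:13:55:36 +0000] "GET /users?id=42 HTTP/1.1" 200 512 "-" "curl/8.0"
LOG_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')
BASE_URL = "https://staging.example.com"  # hypothetical target environment

def to_request(line):
    """Turn one log line into the {method, url, body, headers} shape the k6 script reads."""
    m = LOG_RE.search(line)
    if m is None:
        return None  # skip malformed lines
    path = m.group("path").split("?", 1)[0]  # drop query string
    return {"method": m.group("method"), "url": BASE_URL + path, "body": None, "headers": {}}

def build_sample(lines, limit=1000):
    reqs = [r for r in map(to_request, lines) if r is not None]
    return reqs[-limit:]  # keep the most recent `limit` requests

sample_log = [
    '203.0.113.7 - - [10/Oct/2026:13:55:36 +0000] "GET /users?id=42 HTTP/1.1" 200 512 "-" "curl/8.0"',
    '198.51.100.2 - - [10/Oct/2026:13:55:37 +0000] "POST /posts HTTP/1.1" 201 88 "-" "app/1.2"',
    "malformed line",
]
print(json.dumps(build_sample(sample_log), indent=2))
```

Point build_sample at your real log file’s lines and write the JSON to real-traffic.json next to the k6 script.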
Capacity ceiling
Run with increasing load until SLO breaches:
RPS: 1k → p99: 100ms (good)
RPS: 5k → p99: 200ms (good)
RPS: 10k → p99: 500ms (still good)
RPS: 15k → p99: 1500ms (breach!)
RPS: 20k → errors climb
Capacity ceiling: somewhere between 10k and 15k RPS per replica, roughly 12k. Scale out well before traffic reaches it.
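A rough first estimate of where the breach happens is to interpolate linearly between the last passing and first failing runs, then confirm with a run at that rate. This Python sketch assumes an SLO of p99 at most 1000 ms, which the numbers above imply but don’t state; latency usually climbs non-linearly near saturation, so treat the result as a starting point, not a measurement.

```python
def estimate_ceiling(results, slo_p99_ms):
    """Linearly interpolate the RPS at which p99 crosses the SLO.

    results: (rps, p99_ms) pairs from successive runs, in increasing RPS order.
    """
    for (rps_lo, p99_lo), (rps_hi, p99_hi) in zip(results, results[1:]):
        if p99_lo <= slo_p99_ms < p99_hi:
            frac = (slo_p99_ms - p99_lo) / (p99_hi - p99_lo)
            return rps_lo + frac * (rps_hi - rps_lo)
    return None  # SLO never crossed in the tested range

# The runs listed above (the 20k run had no usable p99, so it's omitted).
runs = [(1_000, 100), (5_000, 200), (10_000, 500), (15_000, 1_500)]
print(estimate_ceiling(runs, slo_p99_ms=1_000))  # 12500.0
```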
CI integration
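```yaml
- name: Load test (staging)
  run: |
    # Each `run` step is a separate shell, so check k6's exit code in the same step.
    # k6 exits non-zero when any threshold fails.
    if ! k6 run --env STAGE=staging tests/load.js; then
      echo "Load test failed; perf regression"
      exit 1
    fi
```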
Run on each PR (against staging). Catches perf regressions before merge.
Distributed load
For huge load (>50k RPS), a single client machine usually isn’t enough:
- Grafana Cloud k6 (formerly k6 Cloud): managed and hosted; expensive at scale.
- Self-managed k6: roll-your-own, e.g. k6-operator on Kubernetes.
- Locust master/workers: built-in.
If you regularly need sustained huge load, invest in distributed tooling; otherwise a single beefy machine is fine.
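For the roll-your-own route, k6 can split one test across machines with execution segments: every instance runs the same script, each with --execution-segment set to its own slice and the same --execution-segment-sequence. A small Python sketch that generates the per-machine commands (the script path is reused from the CI example as an assumption). Start the instances together and aggregate their outputs yourself, since separate k6 processes each report their own summary.

```python
from fractions import Fraction

def segment_commands(n, script="tests/load.js"):
    """One k6 command per load generator; instance i runs slice i of n equal slices."""
    labels = [str(Fraction(i, n)) for i in range(n + 1)]  # e.g. "0", "1/3", "2/3", "1"
    sequence = ",".join(labels)
    return [
        f'k6 run --execution-segment "{lo}:{hi}" --execution-segment-sequence "{sequence}" {script}'
        for lo, hi in zip(labels, labels[1:])
    ]

for cmd in segment_commands(3):
    print(cmd)
```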
Don’t load-test prod (usually)
Hammer staging that mirrors prod. Or:
- Shadow traffic: copy prod traffic to a parallel system.
- Canary at full load: deploy the new version and route a percentage of real traffic to it.
Direct prod load tests risk customers. Reserve for “capacity proofs” with explicit comms.
What to measure
- Latency: p50, p90, p99, max.
- Error rate.
- Throughput: RPS achieved.
- Resource usage: CPU, memory, DB connections, GC pauses.
- Downstream impact: did your test break partner services?
Plot these over time rather than as single run-wide numbers: spike, sustained, and ramp phases behave differently, and one aggregate blurs them together.
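Plotting over time means computing each metric per time window instead of once per run. A minimal Python sketch for p99, assuming hypothetical (timestamp_s, latency_ms) samples, 10-second windows, and nearest-rank percentiles:

```python
import math
import statistics
from collections import defaultdict

def p99(values):
    ordered = sorted(values)
    return ordered[math.ceil(99 * len(ordered) / 100) - 1]  # nearest rank

def p99_per_window(samples, window_s=10):
    """samples: (timestamp_s, latency_ms) pairs. Returns {window_start_s: p99_ms}."""
    buckets = defaultdict(list)
    for ts, latency in samples:
        buckets[int(ts // window_s) * window_s].append(latency)
    return {start: p99(vals) for start, vals in sorted(buckets.items())}

# Hypothetical run: 20 s steady at 100 ms, then a 10 s spike at 900 ms.
samples = [(t / 10, 100) for t in range(200)] + [(20 + t / 10, 900) for t in range(100)]
print(p99_per_window(samples))                          # {0: 100, 10: 100, 20: 900}
print(round(statistics.mean(ms for _, ms in samples)))  # 367: describes neither phase
```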
Common load test mistakes
1. No think time
Firing requests back to back with no pause models bots, not users. Add randomized think time, e.g. sleep(1 + Math.random() * 4) in k6.
2. Same input every request
Every request hits the cache, so latency looks far better than it is. Vary inputs.
3. Cold targets
The first minute or so is spent warming caches, JIT compilers, and connection pools. Exclude the first 60s from the metrics you report.
4. Ignoring the test machine
If the load generator is pegged at 100% CPU, it can’t send requests on schedule and its own delays end up in the reported latency. Verify the tester has headroom.
5. No error analysis
Throughput looks great, but 5% of requests fail. Pass? No. Find out what’s failing.
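Mistakes 3 and 5 both come down to post-processing the raw results. A minimal Python sketch, assuming each sample is a hypothetical (timestamp_s, status, latency_ms) tuple: drop the warm-up window, then count failures by status code so “5% errors” turns into something you can chase.

```python
from collections import Counter

def steady_state(samples, warmup_s=60):
    """Drop samples from the first warmup_s seconds of the run."""
    start = min(ts for ts, _status, _ms in samples)
    return [s for s in samples if s[0] - start >= warmup_s]

def failure_breakdown(samples):
    """Count non-2xx responses by status code."""
    return Counter(status for _ts, status, _ms in samples if not 200 <= status < 300)

# Hypothetical samples: (timestamp_s, status, latency_ms)
samples = [
    (0, 200, 900), (30, 200, 700),  # warm-up: slow, excluded
    (61, 200, 120), (62, 503, 15), (63, 200, 110),
    (64, 429, 5), (65, 503, 12),
]
kept = steady_state(samples)
print(len(kept))                # 5
print(failure_breakdown(kept))  # Counter({503: 2, 429: 1})
```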
Production traffic shapes to test
- Steady baseline: typical load.
- Diurnal: peak hours.
- Spike: marketing email blast.
- Cold start: zero → peak in 10s (after deploy).
- Failure: simulate dependency down (chaos engineering).
Each reveals different bottlenecks.
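Each shape can be written down as stages for k6’s ramping-arrival-rate executor, where each stage ramps the iteration rate linearly to its target over its duration. A Python sketch that generates them; the cosine model of the daily cycle, compressing one hour into one minute, and all default numbers are assumptions, not measurements.

```python
import math

def diurnal_stages(base, peak, hours=24, step_h=1):
    """Daily cycle as a cosine: trough (base) at hour 0 and 24, peak at hour 12.

    Returns k6-style stages; each simulated hour is compressed into one minute.
    """
    stages = []
    for h in range(step_h, hours + step_h, step_h):
        level = (1 - math.cos(2 * math.pi * h / hours)) / 2  # 0 at the trough, 1 at the peak
        stages.append({"duration": f"{step_h}m", "target": round(base + (peak - base) * level)})
    return stages

def spike_stages(base, peak, ramp_s=30, hold_s=60):
    """Marketing blast: jump to peak, hold, return to baseline."""
    return [
        {"duration": f"{ramp_s}s", "target": peak},
        {"duration": f"{hold_s}s", "target": peak},
        {"duration": f"{ramp_s}s", "target": base},
    ]

def cold_start_stages(peak, ramp_s=10, hold_s=120):
    """Zero to peak in ramp_s seconds, as right after a deploy (use startRate: 0)."""
    return [{"duration": f"{ramp_s}s", "target": peak}, {"duration": f"{hold_s}s", "target": peak}]

print(diurnal_stages(200, 1000)[:3])
print(spike_stages(1000, 5000))
```

Paste the output into a scenario’s stages (with startRate set to the baseline), or serialize it with json.dumps and load it from the k6 script.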
Other load generators
- Vegeta: simple, curl-style.
- Apache Bench (ab): ancient; quick.
- wrk / wrk2: low-overhead C tools; wrk2 holds a constant request rate and corrects for coordinated omission.
- Bombardier: Go-based.
- Artillery: JS / Node.
- Tsung: Erlang; distributed.
For most teams in 2026: k6 by default; Locust if your team prefers Python.
What I’d ship today
For new services:
- k6 as the standard load tool.
- CI integration with thresholds.
- Real traffic samples for realistic shapes.
- Capacity ceiling tests quarterly.
- Pre-launch tests for all new features.
- Dashboard correlating load test results with infra metrics.
Read this next
- Kubernetes Resource Limits 2026
- Observability Stack 2026
- SLOs and Error Budgets 2026
- Cloud Cost Optimization 2026
If you want my k6 templates (steady, spike, baseline), they’re at rajpoot.dev.
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.