For a public API, rate limits are part of the product. Done well, they’re invisible. Done badly, they’re support tickets. This post is the design playbook.
Tier structure
Free: 100 req/min, 1k req/day
Pro: 1000 req/min, 100k req/day
Team: 5000 req/min, 1M req/day
Enterprise: custom
Communicate the tiers clearly in the docs, and make the upgrade path obvious.
Quota types
Most APIs need three:
- Per-second burst limit (token bucket). Anti-abuse.
- Per-minute / per-hour limit. Smooth rate.
- Per-day / per-month quota. Billing.
For the Team tier above, the docs might read: "Up to 100 req/sec burst, 5000 req/min sustained, 1M req/day."
Each limit is enforced separately. Whichever is tightest at a given moment is the one that binds.
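To make the burst and sustained numbers concrete, here is a minimal in-memory token bucket sketch. The `TokenBucket` class and the injected `now` argument are illustrative assumptions, not a library API; a real limiter would keep this state in a shared store. Capacity bounds the burst and the refill rate bounds the sustained rate.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical in-memory bucket: capacity caps the burst, refill caps the sustained rate."""

    capacity: float        # maximum tokens, i.e. the largest burst allowed
    refill_per_sec: float  # tokens added per second, i.e. the sustained rate
    tokens: float = 0.0
    last_refill: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # a new bucket starts full

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 100-request burst, refilled at 5000 requests per minute.
bucket = TokenBucket(capacity=100, refill_per_sec=5000 / 60)
burst = sum(bucket.try_consume(now=0.0) for _ in range(150))
print(burst)  # 100: only the first 100 of 150 simultaneous requests pass
```

Passing `now` in explicitly keeps the sketch deterministic; in a service you would pass something like `time.monotonic()`.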
Headers
Many public APIs (GitHub, and Twitter with a slightly different spelling, x-rate-limit-*) converge on the same three headers:
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4823
X-RateLimit-Reset: 1717920660 # unix ts when window resets
On 429:
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717920672
Content-Type: application/json

{
  "error": "rate_limited",
  "message": "Too many requests. Try again in 12s.",
  "documentation_url": "https://docs.example.com/rate-limits"
}
Retry-After is the killer header. Well-behaved SDKs back off automatically when it is present.
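As a rough sketch of the server side, the helpers below build the headers and the 429 body shown above from a limit, a reset timestamp, and the current time. The function names and the `(status, headers, body)` return shape are assumptions for illustration; adapt them to whatever framework you use.

```python
import json
import math


def rate_limit_headers(limit: int, remaining: int, reset_at: int) -> dict[str, str]:
    """Headers to attach to every response, allowed or not."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_at),  # Unix timestamp of the window reset
    }


def too_many_requests(limit: int, reset_at: int, now: float) -> tuple[int, dict[str, str], str]:
    """Build (status, headers, body) for a rejected request."""
    retry_after = max(1, math.ceil(reset_at - now))  # whole seconds, never 0
    headers = rate_limit_headers(limit, remaining=0, reset_at=reset_at)
    headers["Retry-After"] = str(retry_after)
    headers["Content-Type"] = "application/json"
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Too many requests. Try again in {retry_after}s.",
        "documentation_url": "https://docs.example.com/rate-limits",
    })
    return 429, headers, body


status, headers, body = too_many_requests(limit=5000, reset_at=1717920672, now=1717920660)
print(status, headers["Retry-After"])  # 429 12
```

Deriving Retry-After and the message from the same `retry_after` value keeps the header and the body from drifting apart.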
Distinct vs combined limits
Per-API-key limit: 5000 req/min
Per-IP limit: 10000 req/min (anti-abuse, even unauthenticated)
Per-endpoint limit: 100 req/min on /search (expensive)
Layer them. The tightest wins.
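One way to read "the tightest wins" in code: evaluate every layer, then report the one that binds. This is a sketch under assumptions. `LimitResult` and `tightest` are hypothetical names, and the tie-breaking rule (longest wait among denials, least headroom among allows) is one reasonable choice, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LimitResult:
    name: str         # which layer produced this result, e.g. "api_key"
    allowed: bool
    limit: int
    remaining: int
    retry_after: int  # seconds to wait; 0 when allowed


def tightest(results: list[LimitResult]) -> LimitResult:
    """Pick the layer that binds.

    Any denial beats any allow; among denials the longest wait wins,
    among allows the smallest remaining headroom wins.
    """
    denied = [r for r in results if not r.allowed]
    if denied:
        return max(denied, key=lambda r: r.retry_after)
    return min(results, key=lambda r: r.remaining)


layers = [
    LimitResult("api_key", allowed=True, limit=5000, remaining=4823, retry_after=0),
    LimitResult("ip", allowed=True, limit=10000, remaining=9100, retry_after=0),
    LimitResult("endpoint:/search", allowed=True, limit=100, remaining=3, retry_after=0),
]
print(tightest(layers).name)  # endpoint:/search
```

The binding layer is also the natural source for the X-RateLimit-* headers, since it is the number the customer will hit first.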
What to count
Naive: every request. Better: weight by cost.
Read endpoint: 1 unit per request
Write endpoint: 5 units
Heavy search: 10 units
Bulk export: 100 units
The customer sees one quota; under the hood, expensive operations consume more of it. GitHub's GraphQL API does this with point costs per query; AWS does it with DynamoDB capacity units.
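A small sketch of weighted accounting, using the unit costs listed above. The `ENDPOINT_COSTS` table, the kind names, and the choice not to charge rejected requests are assumptions for illustration.

```python
# Hypothetical cost table mirroring the weights above.
ENDPOINT_COSTS = {
    "read": 1,
    "write": 5,
    "search": 10,
    "bulk_export": 100,
}


def request_cost(kind: str) -> int:
    """Units charged against the quota; unknown kinds default to 1 unit."""
    return ENDPOINT_COSTS.get(kind, 1)


def spend(remaining: int, kind: str) -> tuple[bool, int]:
    """Return (allowed, new_remaining) after charging one request of `kind`."""
    cost = request_cost(kind)
    if cost > remaining:
        return False, remaining  # rejected requests are not charged
    return True, remaining - cost


remaining = 120
for kind in ["read", "write", "search", "bulk_export", "bulk_export"]:
    allowed, remaining = spend(remaining, kind)
    print(kind, allowed, remaining)
```

With 120 units, the first bulk export succeeds and leaves 4 units, so the second one is rejected even though plain reads would still go through.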
The implementation
Token bucket per key in Redis. See Design a Rate Limiter and Design a Distributed Rate Limiter at Scale.
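async def check_rate_limit(api_key: str, weight: int = 1) -> RateLimit:
    # Look the tier up once instead of once per argument.
    tier = tier_for(api_key)
    # redis_token_bucket is the atomic check-and-spend helper from the linked posts.
    allowed, remaining, retry_after = await redis_token_bucket(
        f"rl:{api_key}",
        capacity=tier.burst,       # largest burst allowed
        refill_per_sec=tier.rate,  # sustained rate
        cost=weight,               # units this request consumes
    )
    return RateLimit(allowed=allowed, remaining=remaining, retry_after=retry_after)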
Plus per-day / per-month counters in Postgres for billing.
Daily quota patterns
CREATE TABLE api_usage (
    api_key TEXT,
    day     DATE,
    count   BIGINT,
    PRIMARY KEY (api_key, day)
);

INSERT INTO api_usage (api_key, day, count)
VALUES ($1, current_date, 1)
ON CONFLICT (api_key, day) DO UPDATE SET count = api_usage.count + 1
RETURNING count;
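The upsert increments and returns the new count in one atomic statement, so two concurrent requests can't both read the same stale value. Compare the returned count to the tier's daily limit. If you weight requests by cost, add the weight instead of 1.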
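Here is a runnable sketch of the increment-then-compare step. To keep it self-contained it uses SQLite in memory as a stand-in for Postgres, so the upsert is followed by a SELECT inside one transaction rather than using RETURNING. The `record_and_check` helper and its parameters are hypothetical.

```python
import sqlite3
from datetime import date

# SQLite stands in for Postgres here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE api_usage ("
    " api_key TEXT, day TEXT, count INTEGER,"
    " PRIMARY KEY (api_key, day))"
)


def record_and_check(api_key: str, daily_limit: int, today: date, weight: int = 1) -> bool:
    """Add `weight` to today's counter and report whether the key is still within quota."""
    day = today.isoformat()
    with conn:  # one transaction: upsert, then read back the new total
        conn.execute(
            "INSERT INTO api_usage (api_key, day, count) VALUES (?, ?, ?) "
            "ON CONFLICT (api_key, day) DO UPDATE SET count = count + excluded.count",
            (api_key, day, weight),
        )
        (count,) = conn.execute(
            "SELECT count FROM api_usage WHERE api_key = ? AND day = ?",
            (api_key, day),
        ).fetchone()
    return count <= daily_limit


today = date(2024, 6, 9)
results = [record_and_check("key_123", daily_limit=3, today=today) for _ in range(4)]
print(results)  # [True, True, True, False]
```

Note that the rejected fourth request is still counted. Whether over-quota attempts should count toward billing is a product decision; this sketch just picks one behavior.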
Customer experience
Document clearly:
- Quota numbers.
- How to upgrade.
- How to handle 429 (with code samples in popular languages).
- Idempotency-Key support, so retried requests aren't applied or charged twice (see the Idempotency post).
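The kind of 429 sample worth putting in the docs might look like the sketch below. It is transport-agnostic on purpose: `send` is a hypothetical callable that performs the request with the given headers and returns `(status, headers)`, so no specific HTTP library is assumed. It only handles Retry-After as a number of seconds, not the HTTP-date form, and expects the header under that exact capitalization.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def call_with_retries(
    send: Callable[[Mapping[str, str]], Response],
    idempotency_key: str,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry on 429, waiting for Retry-After when the server provides it."""
    request_headers = {"Idempotency-Key": idempotency_key}  # same key on every retry
    for attempt in range(max_attempts):
        status, headers = send(request_headers)
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        retry_after = headers.get("Retry-After")
        # Fall back to exponential backoff when the header is missing or not a number.
        if retry_after and retry_after.isdigit():
            delay = float(retry_after)
        else:
            delay = 2.0 ** attempt
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Reusing one Idempotency-Key across attempts is what lets the server recognize a retry of the same operation instead of a new one.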
Per-endpoint specials
/auth/* : 5 req/min (anti-brute-force)
/search : 30 req/min (expensive)
/upload : 10 req/min (resource-heavy)
These are stricter than the global limit. Document them separately.
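A minimal sketch of resolving the limit for a path, assuming the three overrides above and a 5000 req/min global per-key limit. The table layout and the `limit_for` helper are hypothetical; matching is done on whole path segments so that `/searching` does not inherit the `/search` limit.

```python
GLOBAL_LIMIT_PER_MIN = 5000  # assumption: the per-key limit used earlier in the post

# (path prefix, requests per minute); first match wins.
ENDPOINT_LIMITS = [
    ("/auth", 5),     # anti-brute-force
    ("/search", 30),  # expensive
    ("/upload", 10),  # resource-heavy
]


def limit_for(path: str) -> int:
    """Return the per-minute limit for a path, falling back to the global limit."""
    for prefix, per_min in ENDPOINT_LIMITS:
        # Match the prefix itself or anything below it, but not "/searching".
        if path == prefix or path.startswith(prefix + "/"):
            return min(per_min, GLOBAL_LIMIT_PER_MIN)
    return GLOBAL_LIMIT_PER_MIN


print(limit_for("/auth/login"))  # 5
print(limit_for("/v1/users"))    # 5000
```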
Common mistakes
1. No Retry-After
Without it, customer SDKs don’t know how long to wait. They guess, and usually guess wrong.
2. Inconsistent headers
Some endpoints return X-RateLimit-*; some don’t. Customers can’t reason about quota.
3. Hard cliff with no warning
The customer hits the limit and suddenly every request fails. Better: degrade gradually, for example with slower responses as the key nears its limit.
4. No upgrade path
The customer hits the limit and doesn’t know how to get more. Show an upgrade call to action (CTA).
5. Per-IP limit on legitimate corporate users
A corporation with 10k users behind one NAT shares a single IP, so all of them hit your per-IP limit together. Use the API key as the primary limit and the IP as a secondary one.
Read this next
- Design a Rate Limiter
- Design a Distributed Rate Limiter at Scale
- Designing REST APIs That Don’t Suck
- Idempotency, Retries, and Exactly-Once Illusions
If you want a Hono / FastAPI rate-limit middleware with Redis token bucket + headers, it’s at rajpoot.dev.