For a public API, rate limits are part of the product. Done well, they’re invisible. Done badly, they’re support tickets. This post is the design playbook.

Tier structure

Free:        100 req/min,    1k req/day
Pro:         1000 req/min,   100k req/day
Team:        5000 req/min,   1M req/day
Enterprise:  custom

Publish these numbers clearly in the docs, and make the upgrade path obvious.

Quota types

Most APIs need three:

  1. Per-second burst limit (token bucket). Anti-abuse.
  2. Per-minute / per-hour limit. Smooth rate.
  3. Per-day / per-month quota. Billing.

For example: "Up to 100 req/sec burst, 5000 req/min sustained, 1M req/day."

Each is enforced separately; whichever is tightest at the moment is the one that binds.
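
To make "the tightest binds" concrete, here is a minimal sketch that checks a request against all three windows at once. The `Window` class and `check_all` function are hypothetical names for illustration, and the usage counts are assumed to come from whatever counter store you use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Window:
    name: str   # label such as "burst", "minute", "day"
    limit: int  # max requests allowed in the window
    used: int   # requests already counted in the current window


def check_all(windows: List[Window], cost: int = 1) -> Tuple[bool, List[str]]:
    """Allow the request only if every window has room for `cost` more."""
    blocked = [w.name for w in windows if w.used + cost > w.limit]
    return (not blocked, blocked)


windows = [
    Window("burst", limit=100, used=40),
    Window("minute", limit=5000, used=5000),      # exhausted
    Window("day", limit=1_000_000, used=250_000),
]
print(check_all(windows))  # (False, ['minute'])
```

Here the burst and daily windows still have room, but the per-minute window is full, so that is the one the customer actually feels.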

Headers

GitHub and Twitter, among others, have converged on near-identical headers (Twitter spells the prefix x-rate-limit-):

X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4823
X-RateLimit-Reset: 1717920660       # unix ts when window resets

On 429:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717920672
Content-Type: application/json

{
  "error": "rate_limited",
  "message": "Too many requests. Try again in 12s.",
  "documentation_url": "https://docs.example.com/rate-limits"
}

Retry-After is the killer header. Well-behaved SDKs and HTTP clients back off automatically when it is present.
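
As a rough sketch of the server side, the 429 response above could be assembled like this. The function name `too_many_requests` and its parameters are assumptions for illustration; the header names and JSON fields are the ones shown in this post.

```python
import json
import math
from typing import Dict, Tuple


def too_many_requests(
    limit: int, reset_at: float, now: float, docs_url: str
) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a rate-limited response."""
    # Seconds until the window resets, rounded up so clients never retry early.
    retry_after = max(0, math.ceil(reset_at - now))
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_at)),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",
        "message": f"Too many requests. Try again in {retry_after}s.",
        "documentation_url": docs_url,
    })
    return 429, headers, body


status, headers, body = too_many_requests(
    5000, reset_at=1717920672, now=1717920660,
    docs_url="https://docs.example.com/rate-limits",
)
print(status, headers["Retry-After"])  # 429 12
```

Deriving Retry-After and the message from the same number keeps the header and the human-readable text from drifting apart.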

Distinct vs combined limits

Per-API-key limit: 5000 req/min
Per-IP limit:      10000 req/min   (anti-abuse, even unauthenticated)
Per-endpoint limit: 30 req/min on /search (expensive)

Layer them. The tightest wins.

What to count

Naive: every request. Better: weight by cost.

Read endpoint:      1 unit per request
Write endpoint:     5 units
Heavy search:       10 units
Bulk export:        100 units

The customer sees one quota; under the hood, expensive operations consume more of it. GitHub’s GraphQL API does this with per-query point costs; AWS does something similar with DynamoDB capacity units.
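
A minimal sketch of cost weighting, using the unit costs from the table above. The `COSTS` mapping and the `charge` function are hypothetical names, and the operation labels are assumptions about how you classify endpoints.

```python
from typing import Tuple

# Unit cost per operation class, taken from the table above.
COSTS = {"read": 1, "write": 5, "search": 10, "export": 100}


def charge(remaining: int, operation: str) -> Tuple[bool, int]:
    """Deduct the operation's cost from the quota if there is enough left."""
    cost = COSTS[operation]
    if cost > remaining:
        return False, remaining  # rejected: quota is left untouched
    return True, remaining - cost


remaining = 120
for op in ["export", "search", "write", "write", "read"]:
    allowed, remaining = charge(remaining, op)
    print(op, allowed, remaining)
# export True 20
# search True 10
# write True 5
# write True 0
# read False 0
```

One bulk export eats most of the budget here, which is exactly the point: the quota tracks load on your system, not raw request count.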

The implementation

Token bucket per key in Redis. See Design a Rate Limiter and Distributed Rate Limiter.

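from dataclasses import dataclass

@dataclass
class RateLimit:
    allowed: bool
    remaining: int
    retry_after: float  # seconds to wait before retrying

async def check_rate_limit(api_key: str, weight: int = 1) -> RateLimit:
    # tier_for() and redis_token_bucket() are helpers defined elsewhere.
    tier = tier_for(api_key)  # look the tier up once
    allowed, remaining, retry_after = await redis_token_bucket(
        f"rl:{api_key}",
        capacity=tier.burst,
        refill_per_sec=tier.rate,
        cost=weight,
    )
    return RateLimit(allowed=allowed, remaining=remaining, retry_after=retry_after)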
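
The `redis_token_bucket` helper is not shown in this post. As an in-process stand-in, here is a sketch of the arithmetic such a helper would plausibly perform; the `TokenBucket` class is hypothetical, and a real Redis version would typically run the same logic atomically (for example in a Lua script) rather than in application memory.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TokenBucket:
    capacity: float        # max tokens the bucket can hold (burst size)
    refill_per_sec: float  # sustained rate
    tokens: float          # tokens currently available
    updated_at: float      # timestamp of the last refill, in seconds

    def take(self, cost: float, now: float) -> Tuple[bool, int, float]:
        """Return (allowed, remaining, retry_after_seconds)."""
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, int(self.tokens), 0.0
        # Time until enough tokens have refilled to cover this request.
        # Note: a cost larger than capacity can never succeed.
        retry_after = (cost - self.tokens) / self.refill_per_sec
        return False, int(self.tokens), retry_after


bucket = TokenBucket(capacity=10, refill_per_sec=2, tokens=10, updated_at=0.0)
print(bucket.take(10, now=0.0))  # (True, 0, 0.0)
print(bucket.take(1, now=0.0))   # (False, 0, 0.5)
print(bucket.take(1, now=0.5))   # (True, 0, 0.0)
```

The `retry_after` value falls out of the bucket state for free, which is why a token bucket pairs so naturally with the Retry-After header.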

Plus per-day / per-month counters in Postgres for billing.

Daily quota patterns

CREATE TABLE api_usage (
    api_key TEXT,
    day DATE,
    count BIGINT,
    PRIMARY KEY (api_key, day)
);

INSERT INTO api_usage (api_key, day, count)
VALUES ($1, current_date, 1)
ON CONFLICT (api_key, day) DO UPDATE SET count = api_usage.count + 1
RETURNING count;

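The upsert increments and returns the new count in a single atomic statement, so there is no read-then-write race. Compare the returned count to the tier’s daily limit.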
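
The comparison step might look like the sketch below. It assumes the count is the post-increment value returned by the query above, and that the quota day rolls over at UTC midnight (the SQL uses `current_date`, which follows the database session's time zone, so align the two in practice). The function name `daily_quota_decision` is hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone
from typing import Tuple


def daily_quota_decision(
    count: int, daily_limit: int, now: datetime
) -> Tuple[bool, int, int]:
    """Return (allowed, remaining, retry_after_seconds) for a daily quota."""
    # count already includes this request, so count == limit is still allowed.
    allowed = count <= daily_limit
    remaining = max(0, daily_limit - count)
    # Assumption: the quota resets at the next UTC midnight.
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    retry_after = 0 if allowed else math.ceil((next_midnight - now).total_seconds())
    return allowed, remaining, retry_after


now = datetime(2024, 6, 9, 23, 59, 48, tzinfo=timezone.utc)
print(daily_quota_decision(1000, 1000, now))  # (True, 0, 0)
print(daily_quota_decision(1001, 1000, now))  # (False, 0, 12)
```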

Customer experience

Document clearly:

  • Quota numbers.
  • How to upgrade.
  • How to handle 429 (with code samples in popular languages).
  • Idempotency-Key support so retries don’t double-charge (see the Idempotency post).
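
A 429-handling sample for the docs could be as small as the sketch below. The `call_with_retry` function and the `send` callable are hypothetical: `send(payload, headers)` stands in for whatever HTTP client you use and is assumed to return `(status, headers, body)`. The sketch only handles the delta-seconds form of Retry-After, not the HTTP-date form.

```python
import time
import uuid


def call_with_retry(send, payload, max_attempts=4, sleep=time.sleep):
    """Retry on 429, honoring Retry-After and reusing one Idempotency-Key."""
    # The same key on every attempt lets the server deduplicate retries.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        status, headers, body = send(payload, {"Idempotency-Key": idempotency_key})
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts: surface the 429 to the caller
        # Prefer the server's hint; fall back to exponential back-off.
        delay = float(headers.get("Retry-After", 2 ** attempt))
        sleep(delay)
    return status, body


# Usage with a fake transport that rate-limits the first call.
calls, slept = [], []

def fake_send(payload, headers):
    calls.append(headers["Idempotency-Key"])
    if len(calls) == 1:
        return 429, {"Retry-After": "12"}, "rate_limited"
    return 200, {}, "ok"

print(call_with_retry(fake_send, {"amount": 1}, sleep=slept.append))  # (200, 'ok')
print(slept)  # [12.0]
```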

Per-endpoint specials

/auth/*    : 5 req/min (anti-brute-force)
/search    : 30 req/min (expensive)
/upload    : 10 req/min (resource-heavy)

Stricter than the global. Document separately.
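
One simple way to wire in these overrides is a prefix lookup that falls back to the global limit. The `ENDPOINT_LIMITS` table and `limit_for` function are hypothetical names; the numbers are the ones listed above. Plain prefix matching is an assumption here, and a real router would match on route patterns instead.

```python
# (path prefix, requests per minute), from the list above.
ENDPOINT_LIMITS = [
    ("/auth/", 5),
    ("/search", 30),
    ("/upload", 10),
]


def limit_for(path: str, global_limit: int) -> int:
    """Return the per-minute limit that applies to `path`."""
    for prefix, limit in ENDPOINT_LIMITS:
        if path.startswith(prefix):
            # An override can only tighten the global limit, never loosen it.
            return min(limit, global_limit)
    return global_limit


print(limit_for("/auth/login", 5000))  # 5
print(limit_for("/users/42", 5000))    # 5000
```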

Common mistakes

1. No Retry-After

Customer SDKs don’t know how long to wait. They guess; usually wrong.

2. Inconsistent headers

Some endpoints return X-RateLimit-* headers; some don’t. Customers can’t reason about quota.

3. Hard cliff with no warning

Customer hits limit; suddenly all requests fail. Better: degrade gradually (slower responses near limit).
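
Gradual degradation can be as simple as adding latency once usage crosses a threshold. In this sketch the 80% starting point and the 2-second ceiling are arbitrary assumptions, and `soft_delay` is a hypothetical name.

```python
def soft_delay(used: int, limit: int, start: float = 0.8, max_delay: float = 2.0) -> float:
    """Seconds of artificial delay to add, ramping up as usage nears the limit."""
    usage = used / limit
    if usage <= start:
        return 0.0  # plenty of headroom: no slowdown
    # Linear ramp from 0 at `start` to `max_delay` at 100% usage.
    fraction = min(1.0, (usage - start) / (1.0 - start))
    return fraction * max_delay


print(soft_delay(4000, 5000))  # 0.0
print(soft_delay(5000, 5000))  # 2.0
```

Customers notice the slowdown before they hit the wall, which gives them time to back off or upgrade.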

4. No upgrade path

Customer hits the limit and doesn’t know how to get more. Show an upgrade CTA.

5. Per-IP limit on legitimate corporate users

Ten thousand users at one corporation, all behind a single NAT, share one IP and hit your per-IP limit together. Use the API key as the primary limit and the IP as a secondary one.

Read this next

If you want a Hono / FastAPI rate-limit middleware with Redis token bucket + headers, it’s at rajpoot.dev.

