Real-time bidding has unforgiving latency. Miss the 100ms window and you don’t get the impression. The architecture that wins is brutally tight on every dimension. This post is the working design.
Latency budget
Total: 100ms hard limit. Allocations:
- Network in (publisher → ad server): 30ms.
- Decisioning (which ad to bid + price): 30ms.
- Network out: 30ms.
- Slack for spikes: 10ms.
The 30ms decisioning budget is what you architect.
Architecture
    Publisher → SSP → Bid Request → Bidder
                                      ↓
                              Candidate retrieval (~5ms)
                                      ↓
                              Filtering (~5ms)
                                      ↓
                              Ranking (~10ms)
                                      ↓
                              Bid response
Each stage has its own SLO. Miss any → no bid.
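A minimal sketch of per-stage SLO checking, in Python to match the filtering snippet later in this post. The budgets come from the diagram; `run_pipeline`, `STAGE_BUDGET_MS`, and the injectable `clock` are hypothetical names for illustration, not part of any real bidder.

```python
import time

# Hypothetical per-stage budgets in milliseconds, taken from the diagram.
STAGE_BUDGET_MS = {"retrieve": 5.0, "filter": 5.0, "rank": 10.0}


def run_pipeline(request, stages, clock=time.perf_counter):
    """Run stages in order; return None (no bid) if any stage overruns its SLO.

    `stages` is a list of (name, fn) pairs; each fn takes the previous
    stage's output and returns the next one.
    """
    value = request
    for name, fn in stages:
        start = clock()
        value = fn(value)
        elapsed_ms = (clock() - start) * 1000.0
        if elapsed_ms > STAGE_BUDGET_MS[name]:
            return None  # SLO missed: drop the bid rather than answer late
    return value
```

This detects an overrun after the stage returns; it does not interrupt a slow stage. Pre-empting work in flight needs deadline propagation into each stage.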
Candidate retrieval
Retrieve the active campaigns whose targeting matches the request (geo, language, vertical). Cache:
- Per-publisher index of eligible campaigns.
- Per-vertical filters.
- Per-geo segments.
In Redis or in-process. Postgres is too slow for the hot path.
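One way to hold those indexes in-process is an inverted index keyed by targeting attribute, so retrieval is a few set intersections. This is a sketch under assumptions: `CampaignIndex` and the shape of `targeting` and `request` are made up for illustration.

```python
from collections import defaultdict


class CampaignIndex:
    """In-process inverted index: (attribute, value) -> set of campaign ids."""

    ATTRIBUTES = ("geo", "language", "vertical")

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, campaign_id, targeting):
        # `targeting` maps each attribute to the list of values the campaign accepts.
        for attr in self.ATTRIBUTES:
            for value in targeting[attr]:
                self._index[(attr, value)].add(campaign_id)

    def match(self, request):
        # A campaign is eligible only if it matches on every attribute.
        postings = [
            self._index.get((attr, request[attr]), set())
            for attr in self.ATTRIBUTES
        ]
        return set.intersection(*postings)
```

Rebuilding the index off the hot path and swapping it in whole keeps the read side free of locks.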
Filtering
Frequency caps, brand safety, blocklists. All in-memory:
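    def passes_filters(campaign_id, user_id, request):
        # Frequency cap: this user already saw this campaign today.
        if user_id in seen_user_today[campaign_id]:
            return None
        # Blocklist: never bid on a blocked URL.
        if request.url in blocklist:
            return None
        return campaign_id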
Bloom filters for blocklists keep memory bounded.
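A rough Bloom filter sketch, to show why memory stays bounded: the bit array is sized once and never grows. The class name, sizing, and the SHA-256 double-hashing scheme are choices made for this example, not something the design above prescribes.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so the stride is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

The trade-off runs in the safe direction for a blocklist: a false positive means skipping a bid on a clean URL, while a blocked URL is never let through.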
Ranking
The model decides bid price. For RTB:
- Click-through-rate model.
- Conversion-rate model.
- Combined to expected value per impression.
- Bid = expected value × (1 − target margin), so the bid stays below what the impression is expected to earn.
Models trained offline; served online. Inference must be <5ms (lightweight gradient-boosted trees, not deep nets).
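The pricing arithmetic can be sketched in a few lines. This assumes the multiplier on expected value is the share you are willing to pay out, called `value_share` here; that name, `conversion_value`, and both function names are hypothetical. The model outputs are passed in as plain probabilities.

```python
def expected_value(p_click, p_conversion, conversion_value):
    """Expected value of one impression: P(click) * P(conversion | click) * value."""
    return p_click * p_conversion * conversion_value


def bid_price(p_click, p_conversion, conversion_value, value_share):
    """Bid a fixed share of expected value; the remainder is the margin.

    `value_share` is a hypothetical knob in (0, 1]: 0.8 bids 80% of
    expected value and keeps 20% as margin.
    """
    if not 0.0 < value_share <= 1.0:
        raise ValueError("value_share must be in (0, 1]")
    return expected_value(p_click, p_conversion, conversion_value) * value_share
```

With a 2% click rate, a 5% conversion rate, and a conversion worth 40, the expected value is about 0.04 per impression, and an 0.8 share bids about 0.032.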
Storage
For state that’s queried in the hot path:
| Layer | Latency budget | Purpose |
|---|---|---|
| In-process cache | <100μs | Active campaigns, blocklist, user features |
| Redis | <2ms | Cross-instance shared state |
| Cassandra / Aerospike | <5ms | User profiles |
| Postgres | not in hot path | Source of truth |
Aerospike is a common choice here: it is built for sub-millisecond reads at high concurrency.
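The first two rows of the table compose into a read-through lookup: try the in-process tier, fall back to the shared store, and promote what you find. A sketch, where `shared_get` is a hypothetical callable standing in for a real Redis or Aerospike client call:

```python
class TieredStore:
    """Read-through lookup: in-process dict first, then a shared store."""

    def __init__(self, shared_get):
        # `shared_get` is a hypothetical stand-in for a network store client;
        # it returns None when the key is absent.
        self._local = {}
        self._shared_get = shared_get

    def get(self, key):
        if key in self._local:
            return self._local[key]  # fastest tier: no network hop
        value = self._shared_get(key)
        if value is not None:
            self._local[key] = value  # promote so the next read stays in-process
        return value
```

The local dict here is unbounded and never invalidated, which is fine for a sketch and wrong for production; a real tier needs a size cap and a TTL.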
Logging and billing
Don’t log synchronously. Push to Kafka:
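    import asyncio
    _pending = set()  # hold task references so they aren't garbage-collected

    async def bid(request):
        decision = decide(request)
        task = asyncio.create_task(log_event(request, decision))  # fire-and-forget
        _pending.add(task)
        task.add_done_callback(_pending.discard)
        return decision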
Loss of a few logs is acceptable; missing the bid window isn’t.
Aggregate billing in a separate pipeline (Kafka → Flink → Postgres → invoicing). Billing does not need real-time accuracy; eventual consistency is enough.
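What the aggregation step computes can be shown without any streaming framework. This is plain Python, not Flink code, and the event fields (`event_id`, `campaign_id`, `ts`, `cost_micros`) are assumptions for the example. It also assumes the log can deliver an event more than once, so it deduplicates by id.

```python
from collections import defaultdict


def aggregate_spend(events, window_seconds=3600):
    """Sum impression cost per (campaign, window start).

    Each event is a dict with hypothetical fields `event_id`, `campaign_id`,
    `ts` (epoch seconds) and `cost_micros` (an integer, to avoid float
    drift when adding money).
    """
    seen = set()
    totals = defaultdict(int)
    for event in events:
        if event["event_id"] in seen:
            continue  # duplicate delivery: count each event once
        seen.add(event["event_id"])
        window_start = event["ts"] - event["ts"] % window_seconds
        totals[(event["campaign_id"], window_start)] += event["cost_micros"]
    return dict(totals)
```

A real job would keep the dedup set bounded (per window, with expiry) instead of holding every id in memory.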
Fraud
A separate stream processor consumes the bid logs and flags suspicious patterns (velocity, geo mismatches, known bad IPs). Its decisions feed back into the in-memory blocklist.
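The velocity check is the simplest of those patterns to sketch: count events per key in a sliding window and flag anything over a limit. `VelocityDetector` and its parameters are hypothetical; real thresholds would come from your own traffic.

```python
from collections import defaultdict, deque


class VelocityDetector:
    """Flag a key (for example an IP) that exceeds `limit` events per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)

    def observe(self, key, ts):
        """Record one event at time `ts`; return True if the key is now suspicious."""
        q = self._events[key]
        q.append(ts)
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= ts - self.window_seconds:
            q.popleft()
        return len(q) > self.limit
```

Keys that trip the limit are the ones to push into the blocklist the filtering stage reads.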
Capacity
For 1M QPS:
- 10–20 bidder instances at 50–100k QPS each.
- Each instance multi-threaded; tuned for low GC pressure.
- Go, Rust, or C++ for the hot path. The JVM works if you keep the heap small and use ZGC.
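The instance count is a ceiling division, sketched here with an optional `headroom` factor. The factor is an addition for illustration; the figures above do not include one.

```python
import math


def instances_needed(total_qps, per_instance_qps, headroom=0.0):
    """Instances required to serve `total_qps`, with optional spare capacity.

    `headroom` is a hypothetical safety factor: 0.5 provisions 50% extra.
    """
    return math.ceil(total_qps * (1.0 + headroom) / per_instance_qps)
```

At 1M QPS this gives 10 instances at 100k each and 20 at 50k each, which is the range quoted above. With zero headroom, losing a single instance leaves the fleet short, so in practice you would likely provision above the floor.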
Latency-sensitive code in Go:
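    import (
        "context"
        "time"
    )

    type Bidder struct {
        campaigns *CampaignCache
        profiles  *RedisCluster
        model     *GBTModel
    }

    // Bid returns (nil, nil) for "no bid": nothing eligible, or the deadline passed.
    func (b *Bidder) Bid(ctx context.Context, req *BidRequest) (*BidResponse, error) {
        ctx, cancel := context.WithTimeout(ctx, 30*time.Millisecond)
        defer cancel()

        candidates := b.campaigns.Match(req)
        candidates = b.filter(candidates, req)
        if len(candidates) == 0 || ctx.Err() != nil {
            return nil, nil // nothing eligible, or budget already spent
        }
        ranked := b.rank(candidates, req)
        if ctx.Err() != nil {
            return nil, nil // ranking overran the budget: no bid
        }
        return b.respond(ranked[0]), nil
    }
The deadline is checked between stages, so an overrun becomes a no-bid instead of a late response. Any stage that does I/O should also take `ctx` so it can stop early.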
What’s hard
- Tail latency. Budget says 30ms; the 99.9th percentile must also fit.
- GC pauses. A single 50ms pause exceeds the whole 30ms decisioning budget, so every request in flight during it misses the window.
- Cache misses on cold paths. Every query that hits a slow store is a miss.
- Network. A cross-region hop adds 50–100ms, more than the entire decisioning budget, so serve every request from the region where it arrives.
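The tail-latency point is easy to check against measured samples. A sketch using the nearest-rank percentile; `percentile` and `fits_budget` are hypothetical helpers, and a production system would more likely read this from a histogram than sort raw samples.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with pct% of data at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # round() guards against float noise such as 999.0000000000001.
    rank = max(1, math.ceil(round(pct * len(ordered) / 100.0, 9)))
    return ordered[rank - 1]


def fits_budget(samples_ms, budget_ms=30.0, pct=99.9):
    """True if the chosen percentile of observed latency is within budget."""
    return percentile(samples_ms, pct) <= budget_ms
```

A service whose median is 10ms can still fail this check: two 45ms outliers in a thousand requests are enough to push the 99.9th percentile over 30ms.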
What I’d build today
- Go service with tight allocation discipline.
- Aerospike or DragonflyDB for hot user state.
- In-process LRU for the hottest data.
- Kafka for async log/billing.
- Per-region clusters; no cross-region traffic in the hot path.
- GBT model served via a thin client; no GPU.
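For the in-process LRU in that list, the data structure is small enough to sketch in full. This is Python for brevity and uses `OrderedDict` from the standard library; a Go service would use its own equivalent, and the `LRUCache` name and interface here are illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded in-process cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the oldest entry
```

The fixed capacity is the point: memory use stays predictable, which matters as much as hit rate when you are also trying to keep GC pressure low.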
If you want a sketch RTB bidder in Go with the structure above, it’s at rajpoot.dev.