End-to-end budget is 100ms — and that includes network round trips. Ad servers receive bid requests, decide, and respond before the ad slot times out. Every microsecond counts.

Where does the time go?

Network: 30–50ms (round trips). Decisioning: 20–40ms. Database / cache lookups: 10–20ms. Logging/billing: 5ms. Plan a tight budget for each stage.

Design a Real-Time Bidding System — Sub-100ms Latency at Scale

Real-time bidding has unforgiving latency. Miss the 100ms window and you don’t get the impression. The architecture that wins is brutally tight on every dimension. This post is the working design.

Latency budget

Total: 100ms hard limit. Allocations:

Network in (publisher → ad server): 30ms.
Decisioning (which ad to bid + price): 30ms.
Network out: 30ms.
Slack for spikes: 10ms.

The 30ms decisioning budget is what you architect.

Architecture

Publisher → SSP → Bid Request → Bidder
                                   ↓
                          Candidate retrieval (~5ms)
                                   ↓
                            Filtering (~5ms)
                                   ↓
                              Ranking (~10ms)
                                   ↓
                              Bid response

Each stage has its own SLO. Miss any → no bid.
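One way to make the per-stage SLO concrete is to time each stage and abandon the request when a stage overruns. This is a minimal sketch, not a production guard: the budget values come from the diagram, while `run_stage`, `NoBid`, and the injectable `clock` are hypothetical names. It also only detects an overrun after the stage returns; it does not preempt a slow stage.

```python
import time

# Per-stage budgets in milliseconds, taken from the diagram above.
STAGE_BUDGET_MS = {"retrieval": 5.0, "filtering": 5.0, "ranking": 10.0}


class NoBid(Exception):
    """Raised when a stage overruns its budget; the caller sends no bid."""


def run_stage(name, fn, *args, clock=time.perf_counter):
    """Run one stage and raise NoBid if it took longer than its budget."""
    start = clock()
    result = fn(*args)
    elapsed_ms = (clock() - start) * 1000.0
    if elapsed_ms > STAGE_BUDGET_MS[name]:
        raise NoBid(f"{name} took {elapsed_ms:.2f}ms")
    return result
```

The `clock` parameter exists so the check can be tested with a fake clock instead of real sleeps.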

Candidate retrieval

Find the active campaigns whose targeting matches the request (geo, language, vertical). Cache:

Per-publisher index of eligible campaigns.
Per-vertical filters.
Per-geo segments.

In Redis or in-process. Postgres is too slow for the hot path.
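For the in-process option, an inverted index keyed by targeting attribute is one plausible shape. The sketch below assumes each campaign targets exactly one geo, one language, and one vertical, which is a simplification; `CampaignIndex` and its methods are hypothetical names.

```python
from collections import defaultdict


class CampaignIndex:
    """In-process inverted index: (dimension, value) -> set of campaign ids."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, campaign_id, geo, language, vertical):
        self._index[("geo", geo)].add(campaign_id)
        self._index[("language", language)].add(campaign_id)
        self._index[("vertical", vertical)].add(campaign_id)

    def match(self, geo, language, vertical):
        """Return campaigns that target all three request attributes."""
        keys = [("geo", geo), ("language", language), ("vertical", vertical)]
        # .get avoids creating empty entries for values we have never seen.
        sets = [self._index.get(k, set()) for k in keys]
        # Intersect smallest first so the working set shrinks quickly.
        sets.sort(key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result
```

Retrieval then costs a few dictionary lookups and set intersections, with no network hop.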

Filtering

Frequency caps, brand safety, blocklists. All in-memory:

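def passes_filters(request, campaign_id):
    # seen_user_today: campaign_id -> set of user ids already served today
    if request.user_id in seen_user_today[campaign_id]:
        return False    # frequency cap
    if request.url in blocklist:
        return False    # blocked page
    return True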

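Bloom filters for blocklists keep memory bounded. They can return false positives but never false negatives, so the worst case is a skipped bid on a URL that was actually fine.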
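A small Bloom filter fits in a few dozen lines. This sketch assumes string keys and derives its bit positions from one SHA-256 digest via double hashing; the default size and hash count are illustrative, not tuned values.

```python
import hashlib


class BloomFilter:
    """Fixed-size membership filter: no false negatives, rare false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from two 64-bit halves of one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

With the defaults the filter occupies 128 KiB no matter how many URLs are added; what grows with the blocklist is the false-positive rate.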

Ranking

The model decides bid price. For RTB:

Click-through-rate model.
Conversion-rate model.
Combined to expected value per impression.
Bid = expected value × (1 − target margin), so the margin is the share you keep.

Models trained offline; served online. Inference must be <5ms (lightweight gradient-boosted trees, not deep nets).
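The pricing step can be written out as arithmetic. This sketch assumes expected value is the product of predicted click rate, predicted conversion rate given a click, and the value of one conversion, and that the margin is the fraction of that value kept as profit. `bid_cpm` and its parameter names are hypothetical.

```python
def bid_cpm(p_click, p_conv_given_click, conversion_value, margin):
    """Return a CPM bid from predicted rates and a target margin in [0, 1)."""
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    # Expected value of a single impression.
    expected_value = p_click * p_conv_given_click * conversion_value
    # Bid the share of the value that is left after keeping the margin.
    bid_per_impression = expected_value * (1.0 - margin)
    return bid_per_impression * 1000.0   # CPM = price per thousand impressions
```

For a 1% click rate, a 5% conversion rate, a 40.00 conversion value, and a 20% margin, one impression is worth 0.02 and the bid works out to a CPM of about 16.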

Storage

For state that’s queried in the hot path:

Layer	Latency budget	Purpose
In-process cache	<100μs	Active campaigns, blocklist, user features
Redis	<2ms	Cross-instance shared state
Cassandra / Aerospike	<5ms	User profiles
Postgres	not in hot path	Source of truth

Aerospike is a common choice in ad tech: it targets sub-millisecond reads at high concurrency.
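The layers compose as a read-through lookup: check process memory first, fall back to the shared store, and remember the answer for a short time. A minimal sketch, assuming `remote_get` is any callable that returns a value or `None` (for example a thin wrapper around a Redis or Aerospike read); `TieredStore` is a hypothetical name, and the local dict is unbounded here for brevity.

```python
import time


class TieredStore:
    """Read-through lookup: in-process dict first, then a slower shared store."""

    def __init__(self, remote_get, ttl_seconds=60.0, clock=time.monotonic):
        self._remote_get = remote_get      # hypothetical callable: key -> value or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._local = {}                   # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._local.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                  # served from process memory
        value = self._remote_get(key)      # slower tier, counted against the budget
        if value is not None:
            self._local[key] = (now + self._ttl, value)
        return value
```

The TTL trades freshness for latency: a shorter TTL picks up profile changes sooner but sends more reads to the slower tier.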

Logging and billing

Don’t log synchronously. Push to Kafka:

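import asyncio

_pending_logs = set()   # keep references so pending tasks aren't garbage-collected

async def bid(request):
    decision = decide(request)
    task = asyncio.create_task(log_event(request, decision))   # fire-and-forget
    _pending_logs.add(task)
    task.add_done_callback(_pending_logs.discard)
    return decision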

Loss of a few logs is acceptable; missing the bid window isn’t.

Aggregate billing in a separate pipeline (Kafka → Flink → Postgres → invoicing). Billing doesn't need real-time accuracy; eventual consistency is fine.
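If losing a few logs is acceptable, a bounded buffer that drops on overflow keeps a slow log pipeline from backing up into the bid path. A sketch under that assumption: `LogBuffer` is a hypothetical name, and `send` stands in for whatever async producer call ships events to Kafka.

```python
import asyncio


class LogBuffer:
    """Bounded buffer between the bid path and a background log shipper."""

    def __init__(self, maxsize=10_000):
        self._queue = asyncio.Queue(maxsize=maxsize)
        self.dropped = 0

    def offer(self, event):
        """Enqueue without blocking; drop the event if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False

    async def drain(self, send):
        """Forward events to `send` (a hypothetical async producer call) forever."""
        while True:
            event = await self._queue.get()
            await send(event)
```

The bid handler calls `offer`, which never awaits, and a single background task runs `drain`. The `dropped` counter is worth exporting as a metric so log loss stays visible.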

Fraud

A separate stream consumes bid logs; flags suspicious patterns (velocity, geo mismatches, known bad IPs). Decisions feed back to the in-memory blocklist.
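The velocity check is the simplest of these patterns to sketch: count events per key in a sliding window and flag keys that exceed a limit. `VelocityDetector`, `limit`, and `window` are hypothetical names, and timestamps are assumed to arrive in non-decreasing order per key.

```python
from collections import defaultdict, deque


class VelocityDetector:
    """Flag keys (e.g. IPs) that exceed `limit` events within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._events = defaultdict(deque)   # key -> timestamps, oldest first

    def observe(self, key, ts):
        """Record one event at time `ts`; return True if the key is now suspicious."""
        q = self._events[key]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q) > self.limit
```

Keys that return `True` would be the ones pushed back to the in-memory blocklist.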

Capacity

For 1M QPS:

10–20 bidder instances at 50–100k QPS each.
Each instance multi-threaded; tuned for low GC pressure.
Go, Rust, or C++ for the hot path. The JVM works if you keep the heap small and use ZGC.
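The instance count is just peak load divided by per-instance throughput, rounded up. The `headroom` parameter below is an added assumption for spare capacity during deploys or instance loss; the figures above do not include it.

```python
import math


def instances_needed(peak_qps, per_instance_qps, headroom=0.0):
    """Instances required to serve peak_qps, with optional fractional headroom."""
    return math.ceil(peak_qps * (1.0 + headroom) / per_instance_qps)
```

At 1M QPS this gives 10 instances at 100k QPS each and 20 at 50k QPS each, matching the range above.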

Latency-sensitive code in Go:

type Bidder struct {
    campaigns *CampaignCache
    profiles  *RedisCluster
    model     *GBTModel
}

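func (b *Bidder) Bid(ctx context.Context, req *BidRequest) (*BidResponse, error) {
    ctx, cancel := context.WithTimeout(ctx, 30*time.Millisecond)
    defer cancel()

    // Each stage takes ctx (assumed signatures) so lookups and inference
    // can stop at the deadline instead of running past it.
    candidates := b.campaigns.Match(ctx, req)
    candidates = b.filter(ctx, candidates, req)
    if len(candidates) == 0 || ctx.Err() != nil {
        return nil, nil // nothing eligible, or out of time: no bid
    }

    ranked := b.rank(ctx, candidates, req)
    if len(ranked) == 0 || ctx.Err() != nil {
        return nil, nil // deadline hit during ranking: no bid
    }
    return b.respond(ranked[0]), nil
}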

Everything bounded by the deadline.

What’s hard

Tail latency. The budget is 30ms, and the 99.9th percentile has to fit inside it, not just the median.
GC pauses. A single 50ms pause in the long tail blows the whole decisioning budget, and you miss those bids.
Cache misses on cold paths. Every query that falls through to a slow store is a missed bid.
Network. Cross-region adds 50–100ms, so bidders serve only traffic from their own region.
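To check the tail against the budget you need a percentile over recorded latencies, not a mean. A minimal nearest-rank sketch; `percentile` is a hypothetical helper, and a real service would more likely use a streaming histogram than sort raw samples.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of the data at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Multiply before dividing to keep the rank exact for whole-number inputs.
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]
```

Comparing `percentile(latencies, 99.9)` with the 30ms decisioning budget is the check that matters; a healthy median says nothing about the requests that time out.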

What I’d build today

Go service with tight allocation discipline.
Aerospike or DragonflyDB for hot user state.
In-process LRU for the hottest data.
Kafka for async log/billing.
Per-region clusters; no cross-region traffic in the hot path.
GBT model served via a thin client; no GPU.
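The in-process LRU from the list above is small enough to sketch in full. This version is illustrative only: `LRUCache` is a hypothetical name, and it is not thread-safe, so a multi-threaded bidder would need a lock or per-thread shards.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict least recently used
```

The capacity bounds memory per instance, which matters when the same process is also trying to keep GC pressure low.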

Read this next

If you want a sketch RTB bidder in Go with the structure above, it’s at rajpoot.dev.

