Designing Uber is a system design classic because every component is interesting: real-time geolocation at million-driver scale, sub-second dispatch matching, ETAs that update live, surge pricing, and distributed coordination across cities. Here’s how I’d design it.
Requirements
Functional
- Drivers broadcast location continuously while online.
- Rider requests a ride; service finds and assigns the nearest available driver.
- ETAs update live during the ride.
- Surge pricing in high-demand areas.
- Trip history, billing, ratings.
Non-functional
- Sub-second match at p95.
- Sub-100ms location update end-to-end during a trip.
- High availability — outage = riders stranded.
- Eventual consistency acceptable for non-critical state.
Capacity
| Metric | Value |
|---|---|
| Active drivers (concurrent, peak) | 5M |
| Active riders (concurrent, peak) | 20M |
| Driver location updates / sec | ~1M (5M × 1 update / 5s) |
| Trips / day | 30M |
| Cities | ~10,000 |
The interesting numbers:
- 1M location updates / sec — the highest write rate in the system.
- Match latency budget: ~500ms end-to-end.
- Geographic locality — most queries are within one city.
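The figures in the table can be sanity-checked with a few lines of arithmetic. This is a rough sketch: the 100-byte payload size is an assumption for illustration and does not come from the table.

```python
# Back-of-envelope check of the capacity table.
ACTIVE_DRIVERS = 5_000_000     # concurrent drivers at peak (from the table)
UPDATE_INTERVAL_S = 5          # one location update every 5 seconds
TRIPS_PER_DAY = 30_000_000
PAYLOAD_BYTES = 100            # assumed size of one update, not from the post

updates_per_sec = ACTIVE_DRIVERS / UPDATE_INTERVAL_S
trips_per_sec = TRIPS_PER_DAY / 86_400          # average, not peak
ingest_mb_per_sec = updates_per_sec * PAYLOAD_BYTES / 1_000_000

print(f"location updates/sec: {updates_per_sec:,.0f}")
print(f"average trips/sec:    {trips_per_sec:,.0f}")
print(f"ingest bandwidth:     {ingest_mb_per_sec:,.0f} MB/s")
```

Location writes outnumber trip creations by more than three orders of magnitude, which is why the location path gets its own service and storage.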
Architecture
```
           Mobile Apps (driver, rider)
                        │ WebSocket
                        ▼
     ┌──────────────────────────────────────┐
     │    Edge gateway (sticky by city)     │
     └──────────────────┬───────────────────┘
                        │
           ┌────────────┼─────────────────┐
           ▼            ▼                 ▼
       Location     Dispatch            Trip
        service      service           service
           │            │                 │
           ▼            ▼                 ▼
       Geo-index   Match queue  Postgres / Cassandra
       (Redis /     (NATS /        (durable state)
        custom)      Kafka)
```
The hot path: location updates flow into a geo-index. Dispatch reads the geo-index to find candidates. Trip service handles persistent state.
Geo-indexing
Three popular choices in 2026:
1. Geohash
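Encode (lat, lon) as a base-32 string. A shared prefix means two points are close, but the reverse does not hold: points on either side of a cell boundary can have completely different prefixes, so always query the neighboring cells too. Easy to reason about; cell dimensions vary with latitude. Good enough for most apps.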
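To make the prefix idea concrete, here is a minimal sketch of the standard Geohash encoding (interleave longitude and latitude bits, then emit one base-32 character per 5 bits). The function name `geohash_encode` is my own; in practice you would use a maintained library.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a coordinate as a Geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# A shorter hash is always a prefix of the longer hash for the same point.
print(geohash_encode(12.97, 77.59, 5), geohash_encode(12.97, 77.59, 8))
```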
2. Google S2
Hierarchical sphere-based cells. Better uniformity than Geohash. Used by Google, including Maps and Earth.
3. Uber H3
Hexagonal grid. Why hexagons: equal-distance neighbors (Geohash and S2 squares have neighbor-distance asymmetry — diagonal vs adjacent). Open source from Uber.
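```python
import h3      # h3-py v4 API
import redis

r = redis.Redis(decode_responses=True)  # return str instead of bytes

# Driver at (12.97, 77.59), Bangalore
cell = h3.latlng_to_cell(12.97, 77.59, 9)  # resolution 9; passed positionally
neighbors = h3.grid_disk(cell, k=2)        # this cell plus all cells within 2 rings

# Drivers in those cells:
candidate_drivers = []
for c in neighbors:
    candidate_drivers.extend(r.smembers(f"drivers:{c}"))
```
H3 resolution 9 cells average about 0.1 km² (edge roughly 200m). Resolution 8 cells average about 0.74 km² (edge roughly 530m). Pick by your app’s matching radius.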
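Choosing a resolution is a trade-off between ring count and cells read per query. The sketch below estimates both for a given matching radius. The edge lengths are rounded approximations I am assuming for illustration, and treating each ring as one centre-to-centre step is a simplification.

```python
import math

# Approximate average hexagon edge lengths in metres (assumed, rounded values).
EDGE_M = {8: 530, 9: 200}


def cells_in_disk(k: int) -> int:
    """Number of hexagons in a disk of k rings around a centre cell."""
    return 1 + 3 * k * (k + 1)


def rings_for_radius(radius_m: float, resolution: int) -> int:
    """Rough ring count needed to cover radius_m at the given resolution."""
    # Adjacent hexagon centres are edge * sqrt(3) apart.
    step = EDGE_M[resolution] * math.sqrt(3)
    return max(1, math.ceil(radius_m / step))


for res in (8, 9):
    k = rings_for_radius(2_000, res)
    print(f"res {res}: k={k}, cells read per query={cells_in_disk(k)}")
```

For a 2 km radius this gives 37 cell reads at resolution 8 versus 127 at resolution 9, so finer cells cost more set lookups per match but return fewer out-of-range drivers.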
Location ingestion
```
Driver app
    │  {"lat":..., "lon":..., "ts":...}
    │  every 4–10s, WebSocket
    ▼
Location service (per-city sharded)
    │
    ├──▶ Update in-memory geo-index (H3 cell → set of drivers)
    │
    ├──▶ Append to time-series store (TimescaleDB, hot for last 24h)
    │
    └──▶ Publish to Kafka 'location.driver' topic (downstream consumers: ETA, ML)
```
Three writes per location update:
- Geo-index (Redis SET keyed by H3 cell, or in-memory with replication).
- Time-series for history.
- Event topic for downstream.
The geo-index is the hot path. Time-series and Kafka are async.
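The in-memory variant of the geo-index boils down to two maps: cell to drivers, and driver to current cell, so an update can remove the driver from the old cell. This is a single-process sketch with no replication or locking. The class name `GeoIndex` is hypothetical, and cell IDs are plain strings so it runs without the `h3` package.

```python
from collections import defaultdict


class GeoIndex:
    """In-memory map of cell id -> set of driver ids (single-process sketch)."""

    def __init__(self) -> None:
        self._cell_to_drivers: dict[str, set[str]] = defaultdict(set)
        self._driver_to_cell: dict[str, str] = {}

    def update(self, driver_id: str, cell: str) -> None:
        """Record the driver's current cell, moving them if the cell changed."""
        old = self._driver_to_cell.get(driver_id)
        if old == cell:
            return  # most updates stay inside the same cell: no set churn
        if old is not None:
            self._drop(driver_id, old)
        self._cell_to_drivers[cell].add(driver_id)
        self._driver_to_cell[driver_id] = cell

    def remove(self, driver_id: str) -> None:
        """Forget a driver entirely (went offline or started a trip)."""
        old = self._driver_to_cell.pop(driver_id, None)
        if old is not None:
            self._drop(driver_id, old)

    def drivers_in(self, cells) -> set[str]:
        """Union of drivers across the given cells."""
        found: set[str] = set()
        for c in cells:
            found |= self._cell_to_drivers.get(c, set())
        return found

    def _drop(self, driver_id: str, cell: str) -> None:
        members = self._cell_to_drivers[cell]
        members.discard(driver_id)
        if not members:
            del self._cell_to_drivers[cell]  # keep empty cells out of memory


index = GeoIndex()
index.update("d1", "cell-a")
index.update("d2", "cell-b")
index.update("d1", "cell-b")  # d1 crossed a cell boundary
print(index.drivers_in(["cell-a", "cell-b"]))
```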
Dispatch matching
When a rider requests a ride:
- Compute pickup H3 cell.
- Read driver list from cell + adjacent cells.
- Score each candidate (distance, ETA, rating, idle time).
- Send offer to top driver(s).
- On acceptance, lock the driver and assign the ride.
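The scoring step in the list above can be sketched as a weighted cost where lower is better. The `Candidate` fields and both weights are hypothetical values chosen for illustration. Straight-line (haversine) distance stands in for a real ETA.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Candidate:
    driver_id: str
    lat: float
    lon: float
    rating: float        # 1.0 to 5.0
    idle_seconds: float  # time since the driver's last trip ended


def score(c: Candidate, pickup_lat: float, pickup_lon: float) -> float:
    """Cost of offering the ride to this driver; lower is better."""
    distance = haversine_m(c.lat, c.lon, pickup_lat, pickup_lon)
    rating_penalty = (5.0 - c.rating) * 200       # assumed: 200 m per missing star
    idle_bonus = min(c.idle_seconds, 600) * 0.5   # assumed: up to 300 m of credit
    return distance + rating_penalty - idle_bonus


drivers = [
    Candidate("near", 12.971, 77.591, 4.8, 0),
    Candidate("far", 13.000, 77.700, 4.8, 0),
]
drivers.sort(key=lambda c: score(c, 12.97, 77.59))
print([c.driver_id for c in drivers])
```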
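```python
# r is a redis.Redis(decode_responses=True) client.
# score, try_lock_driver, offer and NoDriversAvailable are defined elsewhere.
def dispatch(pickup_lat: float, pickup_lon: float) -> str:
    """Offer the ride to the best lockable driver near the pickup point."""
    cell = h3.latlng_to_cell(pickup_lat, pickup_lon, 8)  # resolution passed positionally
    nearby = h3.grid_disk(cell, k=2)

    candidates = []
    for c in nearby:
        candidates.extend(r.smembers(f"drivers:available:{c}"))

    # score() combines distance, ETA, rating and idle time. Lower is better,
    # so an ascending sort puts the best candidate first.
    candidates.sort(key=lambda d: score(d, pickup_lat, pickup_lon))

    for driver_id in candidates[:5]:
        if try_lock_driver(driver_id):
            offer(driver_id, pickup_lat, pickup_lon)
            return driver_id

    raise NoDriversAvailable()
```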
try_lock_driver is a Redis SET NX EX 30 — reserve the driver for 30s while they decide. If they reject, release; offer next.
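Here is a hedged sketch of that reservation. The store is passed in explicitly so the example runs without a Redis server. `InMemoryLockStore` is a hypothetical stand-in that mimics only the `set(..., nx=True, ex=...)`, `get` and `delete` calls of a redis-py client created with `decode_responses=True`. The key format and token scheme are my assumptions.

```python
import time
import uuid


class InMemoryLockStore:
    """Local stand-in for the few Redis calls used below (not for production)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}       # key -> (value, expiry time or None)
        self._clock = clock

    def _live(self, name):
        entry = self._data.get(name)
        if entry and entry[1] is not None and entry[1] <= self._clock():
            del self._data[name]  # expire lazily, as a TTL would
            return None
        return entry

    def set(self, name, value, nx=False, ex=None):
        if nx and self._live(name) is not None:
            return None           # redis-py also returns None when NX fails
        expiry = self._clock() + ex if ex is not None else None
        self._data[name] = (value, expiry)
        return True

    def get(self, name):
        entry = self._live(name)
        return entry[0] if entry else None

    def delete(self, name):
        return 1 if self._data.pop(name, None) is not None else 0


def try_lock_driver(store, driver_id: str, ttl_s: int = 30):
    """Reserve a driver; returns a lock token, or None if already reserved."""
    token = uuid.uuid4().hex
    ok = store.set(f"lock:driver:{driver_id}", token, nx=True, ex=ttl_s)
    return token if ok else None


def release_driver(store, driver_id: str, token: str) -> bool:
    """Release only if we still own the lock (the token matches)."""
    key = f"lock:driver:{driver_id}"
    # On real Redis this check-then-delete should be one atomic Lua script.
    if store.get(key) == token:
        store.delete(key)
        return True
    return False


store = InMemoryLockStore()
first = try_lock_driver(store, "driver-1")
second = try_lock_driver(store, "driver-1")
print(first is not None, second is None)
```

The token matters: without it, a slow dispatcher whose lock already expired could delete a lock that now belongs to a different ride.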
For the locking primitives see PostgreSQL MVCC, Isolation, Locking — same ideas, Redis flavor.
ETA prediction
Two flavors of ETA:
- Pickup ETA (driver to rider).
- Trip ETA (pickup to the rider’s destination).
Both are routing problems. Approaches:
- Routing API (OSRM, Mapbox Directions, Google Directions). Returns route + duration.
- Cached routing tiles for hot routes.
- ML correction layer. Learned model predicts how much real-world traffic deviates from the routing engine’s estimate. Trained on historic trip data.
ETAs are recomputed every 10–30s during the trip and pushed to the rider’s app.
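The correction layer can be illustrated with something far simpler than a trained model: a per-hour table of how much actual trip times exceeded the routing engine's estimate. This lookup table is a stand-in I am using for the learned model, not a description of any production system. The class and method names are hypothetical.

```python
from collections import defaultdict


class EtaCorrector:
    """Scales routing-engine ETAs by the mean observed actual/routed ratio."""

    def __init__(self) -> None:
        self._ratio_sum = defaultdict(float)  # hour of day -> sum of ratios
        self._count = defaultdict(int)        # hour of day -> observations

    def observe(self, hour: int, routed_s: float, actual_s: float) -> None:
        """Record one completed trip's routed vs actual duration."""
        if routed_s <= 0:
            return  # ignore degenerate routes
        self._ratio_sum[hour] += actual_s / routed_s
        self._count[hour] += 1

    def correct(self, hour: int, routed_s: float) -> float:
        """Return the routed ETA adjusted by the learned ratio for this hour."""
        n = self._count.get(hour, 0)
        if n == 0:
            return routed_s  # no history: trust the routing engine
        return routed_s * self._ratio_sum[hour] / n


corrector = EtaCorrector()
corrector.observe(hour=18, routed_s=600, actual_s=900)  # rush hour ran 1.5x slow
corrector.observe(hour=18, routed_s=600, actual_s=900)
print(corrector.correct(hour=18, routed_s=400))
print(corrector.correct(hour=3, routed_s=400))
```

A real model would add features such as cell, weather and day of week, but the shape stays the same: the routing engine gives the baseline and the model predicts the deviation.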
Surge pricing
A pricing service computes per-cell multipliers:
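```python
MAX_SURGE = 3.0  # example cap, used when demand exists but no driver is available

# r is a redis.Redis(decode_responses=True) client.
def compute_surge(cell: str) -> float:
    # GET returns a string or None, so convert before doing arithmetic.
    demand = int(r.get(f"demand:{cell}:5m") or 0)  # ride requests in the last 5 min
    supply = r.scard(f"drivers:available:{cell}")
    if demand == 0:
        return 1.0        # a quiet cell with no drivers should not surge
    if supply == 0:
        return MAX_SURGE
    ratio = demand / supply
    if ratio > 5:
        return 2.5
    if ratio > 3:
        return 1.8
    if ratio > 2:
        return 1.4
    return 1.0
```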
Pushed to rider apps via existing WebSocket. Updated every 30s–1m.
In practice surge is more sophisticated — predictive (forecast next-5-min demand), fairness-aware (smooth jumps), and capped (regulatory).
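The "smooth jumps" and "capped" ideas can be sketched as a small post-processing step on the raw multiplier. The step size and cap below are assumed example values, and `smooth_surge` is a hypothetical helper name.

```python
def smooth_surge(previous: float, target: float,
                 max_step: float = 0.2, cap: float = 3.0) -> float:
    """Move from the previous multiplier toward the target, at most max_step per update."""
    target = min(max(target, 1.0), cap)              # clamp to [1.0, cap]
    delta = target - previous
    delta = max(-max_step, min(max_step, delta))     # limit the jump size
    return round(previous + delta, 2)


# A cell whose raw multiplier jumps from 1.0 to 2.5 ramps up over several updates.
multiplier = 1.0
for _ in range(4):
    multiplier = smooth_surge(multiplier, 2.5)
    print(multiplier)
```

With 30s updates and a 0.2 step, riders see the price move gradually instead of jumping between quotes a few seconds apart.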
Trip lifecycle
Each trip is a small state machine:
requested → matched → driver_arriving → in_progress → completed
↘ cancelled ↘ cancelled
State transitions are persistent (Postgres or Cassandra). Each transition emits an event for downstream services (billing, notifications, ML).
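A minimal sketch of that state machine is shown below, with a transition table and an event emitted per transition. Which states allow cancellation is my assumption: here, any state before `in_progress`. The `emit` callback stands in for publishing to a real event bus.

```python
from enum import Enum


class TripState(str, Enum):
    REQUESTED = "requested"
    MATCHED = "matched"
    DRIVER_ARRIVING = "driver_arriving"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Allowed moves. The cancellable states are an assumption of this sketch.
TRANSITIONS = {
    TripState.REQUESTED: {TripState.MATCHED, TripState.CANCELLED},
    TripState.MATCHED: {TripState.DRIVER_ARRIVING, TripState.CANCELLED},
    TripState.DRIVER_ARRIVING: {TripState.IN_PROGRESS, TripState.CANCELLED},
    TripState.IN_PROGRESS: {TripState.COMPLETED},
    TripState.COMPLETED: set(),
    TripState.CANCELLED: set(),
}


class InvalidTransition(Exception):
    pass


class Trip:
    def __init__(self, trip_id: str, emit) -> None:
        self.trip_id = trip_id
        self.state = TripState.REQUESTED
        self._emit = emit  # callable taking one event dict

    def transition(self, new_state: TripState) -> None:
        """Apply a transition if the table allows it, then emit an event."""
        if new_state not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {new_state.value}")
        old = self.state
        self.state = new_state  # in a real service: persist before emitting
        self._emit({"trip_id": self.trip_id, "from": old.value, "to": new_state.value})


events = []
trip = Trip("trip-1", events.append)
trip.transition(TripState.MATCHED)
trip.transition(TripState.DRIVER_ARRIVING)
print(events)
```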
For multi-step workflows with retries / failures, see Temporal Durable Execution. Many ride-hailing companies use Temporal-style durable workflows for trip orchestration.
Storage
| Data | Store |
|---|---|
| Driver location (latest) | Redis / in-memory |
| Driver location history (last 24h) | TimescaleDB / Cassandra |
| Driver location history (cold) | Parquet on S3 |
| Trip state | Postgres (sharded by city) |
| Trip history | Cassandra (time-series style) |
| Pricing | Redis + Postgres |
| Driver / rider profiles | Postgres |
The mix reflects access pattern: hot-and-fresh in memory; recent and durable in time-series; cold archival for cheap storage.
Connection management
5M concurrent driver WebSocket connections + 20M rider connections = a serious WebSocket fleet. The pattern is the same as in Design WhatsApp / Chat:
- Sticky routing by user_id hash.
- ~50k connections per node.
- ~100 nodes for drivers (5M / 50k) and ~400 for riders (20M / 50k), before headroom for failover and deploys.
- Pub/sub backplane for cross-node messaging.
Heartbeats: every 30s. Drop a driver from the pool if no heartbeat for 90s.
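The list above only says "hash" for sticky routing. One way to do it is rendezvous (highest-random-weight) hashing, which I am swapping in here as an example because removing a node only moves the users that were on that node. The node names are hypothetical.

```python
import hashlib


def pick_node(user_id: str, nodes: list[str]) -> str:
    """Choose the node with the highest hash weight for this user."""
    def weight(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{user_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=weight)


nodes = ["ws-1", "ws-2", "ws-3", "ws-4"]
home = pick_node("driver-42", nodes)

# Losing an unrelated node leaves this user's assignment untouched.
unrelated = next(n for n in nodes if n != home)
survivors = [n for n in nodes if n != unrelated]
print(home, pick_node("driver-42", survivors) == home)
```

Plain `hash(user_id) % len(nodes)` would remap most users whenever the fleet size changes, which forces a reconnect storm on every deploy.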
Multi-region
Each city has its own primary cluster. Cross-city operations (rare) go through a global dispatcher. Riders moving between cities are re-bound to the new city’s cluster.
This is simpler than truly global distributed systems. Most ride-hailing data is intrinsically city-local — physics constrains it.
Common operational issues
Hot cells
City center, airport, train station = hot cells. Sharding strategy must keep them spread:
- Sub-shard a hot cell by driver-ID hash.
- Cap concurrent reads with semaphores.
- Cache popular queries for short windows.
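Sub-sharding a hot cell by driver-ID hash can be sketched as a key scheme: writes go to exactly one shard, and reads fan out across all shards of the cell. The key format and helper names are assumptions for illustration.

```python
import zlib


def shard_key(cell: str, driver_id: str, shards: int) -> str:
    """Key that one driver's membership is written to."""
    shard = zlib.crc32(driver_id.encode()) % shards  # stable across processes
    return f"drivers:{cell}:{shard}"


def all_shard_keys(cell: str, shards: int) -> list[str]:
    """Keys a reader must union to see every driver in the cell."""
    return [f"drivers:{cell}:{i}" for i in range(shards)]


HOT_CELL_SHARDS = 8  # assumed value; ordinary cells would use 1

print(shard_key("airport-cell", "driver-42", HOT_CELL_SHARDS))
print(all_shard_keys("airport-cell", HOT_CELL_SHARDS))
```

`zlib.crc32` is used instead of the built-in `hash()` because Python randomizes string hashes per process, which would scatter one driver across shards.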
Driver offline / online flapping
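A driver’s app loses signal for 5 seconds. The naive approach removes the driver from the pool immediately and re-adds them on reconnect, which churns the geo-index. A better approach tolerates brief gaps and only marks the driver offline after 30s without updates.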
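A grace period like this is a last-seen timestamp plus a periodic sweep. The sketch below takes `now` as a parameter so it is easy to test. The class name and the 30s default mirror the text, and everything else is an assumption.

```python
class PresenceTracker:
    """Tracks when each driver was last heard from."""

    def __init__(self, offline_after_s: float = 30.0) -> None:
        self._last_seen: dict[str, float] = {}
        self._offline_after_s = offline_after_s

    def seen(self, driver_id: str, now: float) -> None:
        """Call on every location update or heartbeat."""
        self._last_seen[driver_id] = now

    def is_available(self, driver_id: str, now: float) -> bool:
        """True while the driver is inside the grace period."""
        last = self._last_seen.get(driver_id)
        return last is not None and now - last < self._offline_after_s

    def sweep(self, now: float) -> list[str]:
        """Remove and return drivers that have been silent too long."""
        stale = [d for d, t in self._last_seen.items()
                 if now - t >= self._offline_after_s]
        for d in stale:
            del self._last_seen[d]
        return stale


tracker = PresenceTracker()
tracker.seen("d1", now=0.0)
print(tracker.is_available("d1", now=5.0))   # brief gap: still in the pool
print(tracker.sweep(now=30.0))               # silent for 30s: evicted
```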
Cancellation storms
A surge ends and many riders cancel at once. The trip service must handle a burst of state transitions, often including retried requests from flaky mobile connections. Idempotency keys (Idempotency post) save you.
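The idea is that a retried cancel returns the stored result instead of re-running the transition. In the sketch below, dictionaries stand in for durable tables, and the class name, result shape and set of cancellable states are all assumptions.

```python
CANCELLABLE = {"requested", "matched", "driver_arriving"}  # assumed set


class CancellationHandler:
    """Applies each cancel request at most once per idempotency key."""

    def __init__(self, trip_states: dict[str, str]) -> None:
        self._trip_states = trip_states
        self._results: dict[str, dict] = {}  # idempotency key -> stored result
        self.transitions = 0                  # how many real state changes ran

    def cancel(self, idempotency_key: str, trip_id: str) -> dict:
        # A retry with the same key gets the original answer, with no side effects.
        if idempotency_key in self._results:
            return self._results[idempotency_key]

        state = self._trip_states.get(trip_id)
        if state in CANCELLABLE:
            self._trip_states[trip_id] = "cancelled"
            self.transitions += 1
            result = {"trip_id": trip_id, "status": "cancelled"}
        else:
            result = {"trip_id": trip_id, "status": "rejected", "state": state}

        self._results[idempotency_key] = result
        return result


handler = CancellationHandler({"trip-1": "matched"})
print(handler.cancel("key-1", "trip-1"))
print(handler.cancel("key-1", "trip-1"))  # client retry: same result
print(handler.transitions)
```

In a real service the key lookup and the state change would share one database transaction, so two concurrent retries cannot both pass the check.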
What interviewers love to dig into
- “How do you stop two riders from being assigned the same driver?” → Distributed lock on driver_id with TTL; first writer wins.
- “How do you handle driver crash during a trip?” → Trip state in DB; another driver can be reassigned with a fault-tolerant reassign flow (or refund + cancel).
- “How do you predict supply / demand?” → Time-series forecasting per cell; use weather, events, time-of-day features.
- “How do you scale dispatch to a million drivers?” → Per-city sharding; geo-index in memory; bound the candidate pool with H3 cell radius; horizontal scale of dispatch service.
What I’d build today
For a small ride-hailing product (one city, 100k riders):
- Postgres + PostGIS for everything (location, trips, profiles).
- Redis for live driver positions.
- Hono / FastAPI backend.
- Mapbox / OSRM for routing.
- WebSockets via NATS or in-process for live updates.
Scale up to per-city sharding when the city outgrows one Postgres.
Read this next
- Distributed Systems Fundamentals
- Design WhatsApp / Chat — same WebSocket fleet shape.
- Design a Rate Limiter
- Caching Strategies in 2026
If you want a small ride-hailing reference build (FastAPI + PostGIS + Redis + WebSocket), it’s at rajpoot.dev.