A notification system that scales is its own subsystem. Email, push, SMS, in-app: each channel has its own templates, preferences, fan-out, and operational quirks. Here’s how to design it.
Architecture
Source events → Notification queue → Templating → Channel router
                                                        │
        ┌───────────────────────┬───────────────────────┤
        ▼                       ▼                       ▼
  Email worker             Push worker             SMS worker
        ↓                       ↓                       ↓
    SendGrid                FCM/APNs                 Twilio
Decoupled. Channels are independent. Failures in one don’t impact others.
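To make the isolation concrete, here is a minimal sketch, assuming in-process asyncio queues stand in for the real per-channel queues. The names `worker` and `run_demo` and the fake senders are hypothetical; a production system would use a durable broker and real provider clients.

```python
import asyncio

CHANNELS = ("email", "push", "sms")

async def worker(channel, queue, send, delivered, failed):
    """Drain one channel's queue; errors stay inside this worker."""
    while True:
        message = await queue.get()
        try:
            await send(message)
            delivered.append((channel, message))
        except Exception as exc:
            # A provider outage is recorded here and never reaches other channels.
            failed.append((channel, message, str(exc)))
        finally:
            queue.task_done()

async def run_demo():
    queues = {channel: asyncio.Queue() for channel in CHANNELS}
    delivered, failed = [], []

    async def ok_send(message):  # hypothetical healthy provider
        await asyncio.sleep(0)

    async def broken_send(message):  # hypothetical provider outage
        raise RuntimeError("provider down")

    senders = {"email": ok_send, "push": ok_send, "sms": broken_send}
    tasks = [
        asyncio.create_task(worker(c, queues[c], senders[c], delivered, failed))
        for c in CHANNELS
    ]

    # The router's job: put the rendered message on each channel's own queue.
    for channel in CHANNELS:
        await queues[channel].put("order 42 shipped")

    for queue in queues.values():
        await queue.join()
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return delivered, failed
```

Running `asyncio.run(run_demo())` delivers on email and push while the simulated SMS outage lands only in `failed`.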
Templates
# templates/order_shipped.yaml
id: order_shipped
channels:
  email:
    subject: "Your order #{{order.id}} has shipped!"
    body: "{{user.name}}, your order is on its way. Tracking: {{order.tracking_url}}"
  push:
    title: "Order shipped"
    body: "Your order #{{order.id}} is on its way"
  sms:
    body: "Order {{order.id}} shipped. Track: {{order.tracking_url}}"
One template, multiple channels. Render with handlebars / Jinja / similar.
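As a rough illustration of per-channel rendering, the sketch below uses a tiny regex-based substitute for `{{dotted.path}}` placeholders. It is a stand-in, not handlebars or Jinja, and the names `render`, `render_template`, and `ORDER_SHIPPED` are hypothetical. The template is assumed to be already parsed from YAML into a dict.

```python
import re

# Matches {{ order.id }} style placeholders with a dotted path inside.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")

def lookup(context, dotted_path):
    """Walk nested dicts; raises KeyError if a variable is missing."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]
    return value

def render(text, context):
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), text)

def render_template(template, channel, context):
    """Render every field (subject, title, body...) of one channel."""
    fields = template["channels"][channel]
    return {name: render(text, context) for name, text in fields.items()}

# The YAML template above, assumed already loaded into a dict.
ORDER_SHIPPED = {
    "id": "order_shipped",
    "channels": {
        "email": {
            "subject": "Your order #{{order.id}} has shipped!",
            "body": "{{user.name}}, your order is on its way. "
                    "Tracking: {{order.tracking_url}}",
        },
        "push": {
            "title": "Order shipped",
            "body": "Your order #{{order.id}} is on its way",
        },
        "sms": {
            "body": "Order {{order.id}} shipped. Track: {{order.tracking_url}}",
        },
    },
}
```

Failing loudly on a missing variable is a deliberate choice here: a notification with a blank tracking link is usually worse than one that never leaves the queue.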
Preferences
CREATE TABLE notification_preferences (
    user_id  BIGINT,
    category TEXT,    -- 'order_updates' | 'marketing' | 'security'
    channel  TEXT,    -- 'email' | 'push' | 'sms' | 'in_app'
    enabled  BOOLEAN DEFAULT TRUE,
    PRIMARY KEY (user_id, category, channel)
);
Always check before sending:
allowed = await fetch_pref(user_id, category, channel)
if not allowed:
    return  # respect user's choice
Some categories are mandatory (security alerts, transactional). Mark them and skip the check — but document the policy.
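One way to encode the mandatory-category policy is to keep it in a single named constant, so the exception is visible in code review. This is a sketch under assumptions: `MANDATORY_CATEGORIES` and `should_send` are hypothetical names, preferences are modelled as an in-memory dict keyed like the table's primary key, and a missing row is treated as enabled to mirror `DEFAULT TRUE` (an opt-in regime for marketing would flip that default).

```python
# Assumption: these categories bypass user preferences by policy.
MANDATORY_CATEGORIES = frozenset({"security", "transactional"})

def should_send(prefs, user_id, category, channel):
    """Return True if this notification may go out on this channel.

    prefs maps (user_id, category, channel) -> enabled, mirroring the
    notification_preferences primary key.
    """
    if category in MANDATORY_CATEGORIES:
        return True  # documented policy: always delivered
    # Assumption: no stored row means the user never opted out.
    return prefs.get((user_id, category, channel), True)
```

For example, with `prefs = {(1, "marketing", "email"): False}`, marketing email to user 1 is blocked while a security alert on the same channel still goes out.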
Channels
Email
- Templates versioned.
- Sender reputation — don’t blast cold lists; warm up your domain.
- Bounce / complaint handling — auto-suppress emails to addresses that bounce or complain.
- Provider: SendGrid, Postmark, AWS SES, Resend.
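Auto-suppression can be sketched as a small lookup that is fed by provider events and consulted before every send. The class and the event names `hard_bounce` and `complaint` are assumptions for illustration; each provider has its own event vocabulary, and this sketch deliberately ignores soft bounces.

```python
from dataclasses import dataclass, field

# Assumption: only these event types suppress an address permanently.
SUPPRESSING_EVENTS = frozenset({"hard_bounce", "complaint"})

@dataclass
class SuppressionList:
    # normalized address -> first reason it was suppressed
    reasons: dict = field(default_factory=dict)

    @staticmethod
    def _normalize(address):
        return address.strip().lower()

    def record_event(self, address, event_type):
        """Feed in a provider event; suppress on hard bounce or complaint."""
        if event_type in SUPPRESSING_EVENTS:
            self.reasons.setdefault(self._normalize(address), event_type)

    def is_suppressed(self, address):
        return self._normalize(address) in self.reasons

def filter_recipients(suppression, addresses):
    """Drop suppressed addresses before handing the batch to the provider."""
    return [a for a in addresses if not suppression.is_suppressed(a)]
```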
Push
- APNs (iOS), FCM (Android), Web Push.
- Token registration — store per device; expire on failure.
- Silent (background) updates vs alerts: different payload types and delivery rules.
- Deep links for in-app navigation.
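The "store per device; expire on failure" rule might look like the sketch below. `DeviceTokenRegistry` and `send_push` are hypothetical, and `send` is an injected callable assumed to return False when the provider reports the token as invalid; real APNs and FCM clients signal this through their own error responses.

```python
class DeviceTokenRegistry:
    """In-memory stand-in for a per-device token table."""

    def __init__(self):
        self._tokens = {}  # user_id -> set of device tokens

    def register(self, user_id, token):
        self._tokens.setdefault(user_id, set()).add(token)

    def tokens_for(self, user_id):
        # Return a sorted copy so callers can expire tokens while iterating.
        return sorted(self._tokens.get(user_id, ()))

    def expire(self, user_id, token):
        self._tokens.get(user_id, set()).discard(token)

def send_push(registry, user_id, payload, send):
    """Send to every device of a user; drop tokens the provider rejects.

    send(token, payload) is assumed to return True on success and
    False when the token is no longer valid.
    """
    delivered = 0
    for token in registry.tokens_for(user_id):
        if send(token, payload):
            delivered += 1
        else:
            registry.expire(user_id, token)
    return delivered
```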
SMS
- Provider: Twilio, Vonage, MessageBird.
- Cost discipline — SMS is expensive; rate-limit.
- A2P 10DLC compliance in the US.
- Short codes for high volume.
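For cost discipline, a per-user sliding-window limiter is one simple option. This is a sketch, not a recommendation of specific limits: `SlidingWindowLimiter` is a hypothetical name, state is in-memory, and the caller passes the current time explicitly so the behavior is easy to test. A multi-worker deployment would need shared state (for example in Redis).

```python
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_sends per user within any window_seconds span."""

    def __init__(self, max_sends, window_seconds):
        self.max_sends = max_sends
        self.window_seconds = window_seconds
        self._sent = defaultdict(deque)  # user_id -> timestamps of recent sends

    def allow(self, user_id, now):
        timestamps = self._sent[user_id]
        # Forget sends that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_sends:
            return False  # over budget: drop, delay, or fall back to another channel
        timestamps.append(now)
        return True
```

With `SlidingWindowLimiter(max_sends=2, window_seconds=60)`, a third SMS to the same user within a minute is refused, while other users are unaffected.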
In-app
- Stored in DB, displayed in app.
- Real-time delivery via WebSocket.
- Read receipts.
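A minimal storage sketch for in-app notifications with read receipts, assuming SQLite in place of the production database. The table and function names are hypothetical, and real-time WebSocket delivery is left out; this only covers the persisted state the app reads.

```python
import sqlite3

def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE in_app_notifications (
               id      INTEGER PRIMARY KEY AUTOINCREMENT,
               user_id INTEGER NOT NULL,
               body    TEXT NOT NULL,
               read_at TEXT            -- NULL until the user reads it
           )"""
    )
    return db

def add_notification(db, user_id, body):
    cur = db.execute(
        "INSERT INTO in_app_notifications (user_id, body) VALUES (?, ?)",
        (user_id, body),
    )
    return cur.lastrowid

def mark_read(db, user_id, notification_id):
    """Record the read receipt once; returns False if already read or not owned."""
    cur = db.execute(
        "UPDATE in_app_notifications SET read_at = datetime('now') "
        "WHERE id = ? AND user_id = ? AND read_at IS NULL",
        (notification_id, user_id),
    )
    return cur.rowcount == 1

def unread(db, user_id):
    rows = db.execute(
        "SELECT id, body FROM in_app_notifications "
        "WHERE user_id = ? AND read_at IS NULL ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()
```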
Batching and digests
For non-urgent notifications, batch:
- “10 new comments on your post” instead of 10 emails.
- Daily digest instead of immediate per-event.
- User preference: immediate / hourly / daily.
Batching saves money, reduces noise, and often improves engagement.
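Collapsing events into a digest can be sketched as a group-and-count step. The event shape (`user_id`, `kind`, `target`) and the function name `build_digest` are assumptions, and the pluralization is deliberately naive.

```python
from collections import defaultdict

def build_digest(events):
    """Collapse pending events into one summary line per (user, kind, target).

    events: iterable of dicts with 'user_id', 'kind', and 'target' keys
    (a hypothetical shape for queued, non-urgent notifications).
    """
    counts = defaultdict(int)
    for event in events:
        counts[(event["user_id"], event["kind"], event["target"])] += 1

    digest = defaultdict(list)
    for (user_id, kind, target), n in counts.items():
        noun = kind if n == 1 else kind + "s"  # naive plural, fine for a sketch
        digest[user_id].append(f"{n} new {noun} on {target}")
    return dict(digest)
```

A scheduler would call this once per user window (hourly or daily, according to the user's preference) and send a single message per user built from the returned lines.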
Deduplication
The same logical event must not produce duplicate notifications. The fan-out worker dedupes:
async def fan_out(event_id, user_id, channels):
    for channel in channels:
        # mark_sent returns False if this (event, user, channel) was already claimed
        if not await mark_sent(event_id, user_id, channel):
            continue  # already sent
        await dispatch(channel, ...)
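mark_sent is a conditional insert guarded by a unique constraint on (event_id, user_id, channel), for example INSERT ... ON CONFLICT DO NOTHING in Postgres. It returns true only for the worker whose insert created the row, so concurrent workers can’t both send. Because the row is written before dispatch, a crash between the two steps drops the notification instead of duplicating it, so pair this with delivery tracking.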
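One possible shape for `mark_sent`, sketched synchronously against SQLite so it runs anywhere. The table name `sent_notifications` is an assumption, and the async version in the snippet above would wrap the same statement in whatever database driver the workers use. SQLite's `INSERT OR IGNORE` plays the role that `ON CONFLICT DO NOTHING` plays in Postgres.

```python
import sqlite3

def open_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE sent_notifications (
               event_id TEXT    NOT NULL,
               user_id  INTEGER NOT NULL,
               channel  TEXT    NOT NULL,
               PRIMARY KEY (event_id, user_id, channel)
           )"""
    )
    return db

def mark_sent(db, event_id, user_id, channel):
    """Claim (event, user, channel); True only for the first caller."""
    cur = db.execute(
        "INSERT OR IGNORE INTO sent_notifications (event_id, user_id, channel) "
        "VALUES (?, ?, ?)",
        (event_id, user_id, channel),
    )
    db.commit()
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cur.rowcount == 1
```

The database enforces the uniqueness, so correctness does not depend on workers coordinating with each other.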
Observability
For every notification:
- Sent / delivered / opened / clicked / bounced.
- Per-channel, per-template, per-user.
Dashboards:
- Delivery rate per channel.
- Open rate per template.
- Bounce rate (high → reputation issue).
- Complaint rate.
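The dashboard numbers reduce to ratios over status events. The sketch below assumes one record per status change with `channel`, `template`, and `status` fields; that shape and the function name `rate_by` are hypothetical.

```python
from collections import Counter

def rate_by(events, key, numerator, denominator):
    """Ratio of `numerator` events to `denominator` events, grouped by `key`.

    events: iterable of dicts such as
        {"channel": "email", "template": "order_shipped", "status": "sent"}
    Groups with no `denominator` events are omitted to avoid dividing by zero.
    """
    num, den = Counter(), Counter()
    for event in events:
        if event["status"] == numerator:
            num[event[key]] += 1
        if event["status"] == denominator:
            den[event[key]] += 1
    return {group: num[group] / den[group] for group in den}
```

Delivery rate per channel is then `rate_by(events, "channel", "delivered", "sent")`, and open rate per template is `rate_by(events, "template", "opened", "delivered")`.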
For production patterns, see Webhook Design 2026: webhooks from your email service provider (ESP) are how you receive delivery and bounce events.
Common mistakes
1. Sending without preferences check
Compliance violation. Spam complaints. User churn.
2. No suppression list
Sending to addresses that bounced 10× damages domain reputation.
3. No rate limit
Bug → user gets 1000 emails. Apology tour.
4. No templates / hardcoded strings
Template change = code deploy. Keep templates editable.
5. Treating notifications as fire-and-forget
Track delivery. Some notifications are critical (security alerts); know if they didn’t arrive.
Read this next
- Design a Distributed Task Queue
- Webhook Design 2026
- Background Jobs in Python
- Idempotency, Retries, and Exactly-Once Illusions
If you want a notification system reference (Postgres + Redis + multi-channel), it’s at rajpoot.dev.