Notification System Design 2026 — Fanout, Priority Queue, Idempotency, and Template Engine for Millions of Push/Email/SMS per Day

Posted on: 4/17/2026 2:10:41 AM

1. Why Notifications Are Harder Than You Think

At first glance, sending a notification is just calling a Firebase or Twilio API. But when your system has millions of users, dozens of event types, three or four concurrent channels (push, email, SMS, in-app, webhook), and must respect quiet-hours in each user's timezone, the picture turns into a complex distributed system with a distinctive set of constraints: per-channel latency varies wildly, per-channel cost varies wildly, failure modes vary wildly, and the hardest constraint of all is that you must never send duplicates, yet you must never forget a single one.

This article dives deep into Notification Service architecture at millions-of-users scale: from data model, ingest pipeline, multi-channel fanout, template engine, personalised rate-limiting, idempotency, retry and DLQ, through to campaign effectiveness observability and unsubscribe/quiet-hours enforcement. The illustrated stack is .NET 10 plus Vue/Nuxt for the admin dashboard, but the principles apply to any stack — Java Spring, Node.js, Go, or Python.

  • 4+ channels (push, email, SMS, in-app, webhook)
  • ~50ms ingest-to-queue p99
  • 99.9% delivery SLA achievable with retry
  • 10M/day per-worker-node throughput

2. Functional and Non-Functional Requirements

Before sketching the architecture, you have to nail down what the system must do — and what it must not do. This is where many "ship fast" teams trip up: they only think about being able to send, not about not sending the wrong thing or not spamming the user.

| Requirement group | Content | Concrete example |
| --- | --- | --- |
| Functional | Multi-channel delivery driven by domain events (order, payment, promo, system) | When an order transitions to Shipped, send push + email but not SMS |
| Template | Multi-locale templating with variables, A/B testing, localisation | "Your order {{orderId}} has shipped, {{recipient.firstName}}" |
| Priority | Priority tiers, transactional strictly separated from promotional | OTP P0, order updates P1, marketing P3 |
| Idempotency | Each event delivered exactly once, regardless of upstream retries | Producer replays the same idempotencyKey ⇒ one delivery only |
| Throughput | ≥ 50k notifications/s peak for push, ≥ 10k/s for email | Black Friday flash-sale push to 2 million users |
| Latency | OTP e2e ≤ 3s p99, order updates ≤ 30s, marketing unconstrained | International OTP SMS still under 5s |
| Reliability | No message loss on process crash, intelligent retry, replayable DLQ | Kafka replication factor 3, manual offset checkpointing |
| Privacy | Per-channel and per-category user opt-out, respect quiet hours and timezone | Never send promo push after 22:00 local time |
| Observability | Per-channel/template/campaign metrics, end-to-end trace, open/click rate | Grafana dashboard plus a data warehouse for marketing |

3. Channel Map and Channel Characteristics

Every channel has its own behaviour. A "one-size-fits-all" design collapses because SMS costs money per message but arrives with near certainty, while push may be free but can be silently dropped the moment a user disables notifications or turns off background app refresh. These differences must be encoded into your send and retry strategy.

| Channel | Typical provider | Latency | Cost/msg | Success rate | Quirks |
| --- | --- | --- | --- | --- | --- |
| Mobile push | APNs (iOS), FCM (Android, Web) | ~1s | Free | 80–95% | Token expiration, notifications disabled, silent failures common |
| Email | Amazon SES, SendGrid, Postmark, Mailgun | 2–30s | $0.0001–0.001 | 95–99% | Async bounce & complaint callbacks, IP/domain reputation matters |
| SMS | Twilio, Vonage, local carrier | 2–15s | $0.005–0.05 | 97–99% | Expensive, 160-char limit, per-country whitelisting required |
| In-app | In-house service + WebSocket/SSE | ~100ms | Internal infra | ~100% if online | Must persist inbox for offline users |
| Webhook | Customer-hosted endpoint | 100ms–10s | Internal infra | 90–99% | Endpoint out of your control, needs retry plus signed payload |
| Chat | Slack, Teams, Zalo, Viber | 1–5s | Free/Bot | 95–99% | Strict rate limits, OAuth tokens need refresh |

Don't mix marketing and transactional in one pipeline

OTPs and "today's discount" have wildly different SLAs. If you share a queue, one marketing campaign of 2 million SMS can push the OTP p99 from 30 seconds to 5 minutes — long enough for a user to abandon their cart. Always split at least two lanes: transactional (high priority, never throttled) and marketing (low priority, throttled when the system is loaded).

4. Overall Architecture

The heart of a Notification Service is an event-driven pipeline with a clear path from domain event to each send channel. Every component must be idempotent, independently restartable, and measurable at every hop.

flowchart LR
    subgraph Producers
      ORD["Order Service"]
      PAY["Payment Service"]
      AUTH["Auth Service (OTP)"]
      MKT["Marketing Campaign"]
    end

    Producers --> API["Notification API (.NET 10 Minimal API)"]
    API --> VAL["Validator + Dedup (Redis idempotency cache)"]
    VAL --> TOPIC{"Kafka topics"}
    TOPIC -->|transactional| WKT["Transactional Worker Pool"]
    TOPIC -->|marketing| WKM["Marketing Worker Pool"]
    WKT --> FAN["Fanout & Preferences"]
    WKM --> FAN
    FAN --> TMPL["Template Engine"]
    TMPL --> ROUTE{"Per-channel router"}
    ROUTE --> APNs
    ROUTE --> FCM
    ROUTE --> SES["SES / SendGrid"]
    ROUTE --> SMS["Twilio / Local SMS"]
    ROUTE --> WS["WebSocket / SSE"]
    ROUTE --> HOOK["Webhook dispatcher"]
    APNs --> CB1["Delivery callback"]
    FCM --> CB1
    SES --> CB1
    SMS --> CB1
    CB1 --> LEDGER[("Delivery ledger: PostgreSQL / OLTP")]
    CB1 --> ANALYTICS[("Analytics warehouse")]
Notification System — event in, multi-channel out, every hop measured

A few important things about this diagram:

  • Topics split by priority: at minimum two topics, notif.transactional and notif.marketing. Separate worker pools, separate resources.
  • Fanout lives after ingest: the producer only needs to know that user X should be notified about event A. Fanning out to N channels × M devices happens inside the service — the producer shouldn't have to know how many devices the user owns.
  • Dedicated delivery ledger: an authoritative history table — who was sent what, when, with what result. This table feeds debugging, the user-facing notification history UI, and compliance.

5. Core Data Model

A Notification Service data schema doesn't have many tables, but the relationships between them determine how well it scales later. Here's the minimal model — it can grow when you need A/B testing, campaign scheduling, journey orchestration…

erDiagram
    USER ||--o{ DEVICE : owns
    USER ||--o{ PREFERENCE : has
    USER ||--o{ DELIVERY : receives
    TEMPLATE ||--o{ CAMPAIGN : used_in
    CAMPAIGN ||--o{ NOTIFICATION_EVENT : produces
    NOTIFICATION_EVENT ||--o{ DELIVERY : fans_out_to
    DEVICE ||--o{ DELIVERY : targeted_by
    CHANNEL ||--o{ DELIVERY : uses

    USER {
      uuid id
      string locale
      string timezone
    }
    DEVICE {
      uuid id
      uuid user_id
      string platform
      string push_token
      datetime last_seen
      bool active
    }
    PREFERENCE {
      uuid user_id
      string category
      string channel
      bool enabled
      time quiet_start
      time quiet_end
    }
    TEMPLATE {
      uuid id
      string code
      string channel
      string locale
      text body
      json schema
    }
    NOTIFICATION_EVENT {
      uuid id
      string idempotency_key
      string type
      int priority
      json payload
      datetime created_at
    }
    DELIVERY {
      uuid id
      uuid event_id
      uuid user_id
      string channel
      string provider_msg_id
      string status
      datetime sent_at
      datetime delivered_at
    }
Notification Service data schema — minimal to scale yet enough to audit

Design notes:

  • NOTIFICATION_EVENT.idempotency_key is the input-side dedup key, produced by the caller (e.g. order:1234:shipped). Insert it under a UNIQUE constraint; on a duplicate, return the existing event instead of an error (the ingest endpoint in section 6 responds 202 Accepted).
  • DELIVERY is split from EVENT so that one event can produce many deliveries (push to device A, push to device B, email). Each delivery has its own lifecycle: queued → sent → delivered → opened → clicked.
  • PREFERENCE must be split down to category × channel. A user may want promo emails but not promo pushes. A single is_subscribed column is not enough.
  • Sharding: the DELIVERY table grows fast. From day one, shard by user_id or created_at (partition by day). Don't wait until 500 million rows to deal with it.

6. Ingest Pipeline — From API to Queue

Producers call the Notification Service API instead of calling FCM/SES directly. This centralises control: rate limiting, templates, preferences, idempotency, and audit all live in one place. The .NET 10 Minimal API snippet below illustrates a minimal but complete ingest endpoint:

app.MapPost("/v1/notifications", async (
    NotifyRequest req,
    IValidator<NotifyRequest> validator,
    IIdempotencyStore idem,
    IPublisher publisher,
    CancellationToken ct) =>
{
    // 1. Validate payload
    var result = await validator.ValidateAsync(req, ct);
    if (!result.IsValid) return Results.ValidationProblem(result.ToDictionary());

    // 2. Idempotency: if key already seen, return the existing event
    var existing = await idem.GetAsync(req.IdempotencyKey, ct);
    if (existing is not null) return Results.Accepted($"/v1/notifications/{existing.Id}", existing);

    // 3. Build event, pick topic by priority
    var evt = NotificationEvent.Create(req);
    var topic = req.Priority <= 1 ? "notif.transactional" : "notif.marketing";

    // 4. Publish to Kafka (transactional producer, exactly-once)
    await publisher.PublishAsync(topic, evt, ct);

    // 5. Cache idempotency for 24h
    await idem.SetAsync(req.IdempotencyKey, evt, TimeSpan.FromHours(24), ct);

    return Results.Accepted($"/v1/notifications/{evt.Id}", evt);
})
.RequireAuthorization("NotificationProducer")
.WithName("SubmitNotification");

Idempotency done right

Redis SET key value NX EX 86400 is enough on the hot path. If you need absolute certainty, pair it with a UNIQUE constraint in the DB — but avoid DB round-trips on the hot path; let the DB only catch duplicates that Redis missed during a cache flush. For critical events (OTP, payment), add a per-user sequence number to catch out-of-order caused by upstream multi-retry.
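To make the hot path concrete, here is a minimal sketch of a Redis-backed IIdempotencyStore for the ingest endpoint above, assuming StackExchange.Redis; the interface shape, the idem: key prefix, and the JSON serialisation are illustrative choices, not a prescribed contract.

using System.Text.Json;
using StackExchange.Redis;

// Minimal sketch, assuming StackExchange.Redis. StringSetAsync(..., When.NotExists)
// gives the SET key value NX EX semantics described above.
public sealed class RedisIdempotencyStore : IIdempotencyStore
{
    private readonly IDatabase _db;

    public RedisIdempotencyStore(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public async Task<NotificationEvent?> GetAsync(string key, CancellationToken ct)
    {
        var value = await _db.StringGetAsync($"idem:{key}");
        return value.IsNullOrEmpty ? null : JsonSerializer.Deserialize<NotificationEvent>(value.ToString());
    }

    public async Task<bool> SetAsync(string key, NotificationEvent evt, TimeSpan ttl, CancellationToken ct)
    {
        // NX: only the first writer wins; EX: the key expires after the TTL (24h on the hot path)
        return await _db.StringSetAsync($"idem:{key}", JsonSerializer.Serialize(evt),
            expiry: ttl, when: When.NotExists);
    }
}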

7. Fanout — One Event, Many Deliveries

A worker consuming the topic breaks one event into a list of deliveries by this formula:

delivery_set =
  (user.devices ∪ user.emails ∪ user.phone)
  ∩ template.channels
  ∩ user.preferences
  \ user.quiet_hours_violating_channels

In other words, you take the intersection of three sets and subtract channels that are currently in quiet hours. The result is a concrete list of (channel, target) pairs to send. A user with 2 phones + 1 email + a web-push subscription can blow up into 4 distinct deliveries from a single event.

public async Task<IReadOnlyList<Delivery>> FanoutAsync(NotificationEvent evt, CancellationToken ct)
{
    var user = await users.GetAsync(evt.UserId, ct);
    var userPrefs = await prefs.GetAsync(evt.UserId, evt.Category, ct);
    var template = await templates.GetAsync(evt.TemplateCode, user.Locale, ct);

    var deliveries = new List<Delivery>();
    // Quiet hours are evaluated against the user's local clock, not the server's
    var nowLocal = TimeZoneInfo.ConvertTime(
        DateTimeOffset.UtcNow, TimeZoneInfo.FindSystemTimeZoneById(user.TimeZone));

    foreach (var channel in template.Channels)
    {
        if (!userPrefs.IsEnabled(channel)) continue;
        if (userPrefs.InQuietHours(channel, nowLocal) && evt.Priority > 1) continue;

        foreach (var target in user.TargetsFor(channel))
        {
            deliveries.Add(Delivery.New(evt.Id, user.Id, channel, target, template));
        }
    }

    return deliveries;
}

Subtle but important: P0 and P1 override quiet hours. Nobody wants to miss an OTP just because they happen to be asleep.

8. Template Engine — Separate Content from Code

Templates live in the DB and are preloaded into a cache (Redis or in-memory). Each template carries a schema validating its inputs: if an event is missing a variable, reject at ingest instead of letting a worker discover it and die midway.

code: order.shipped
locale: en-US
channel: push
body: |
  Hi {{recipient.firstName}}, your order {{orderId}} is on the way
  to {{shippingAddress.short}}. ETA {{eta | date:"HH:mm, MM/dd"}}.
schema:
  required: [recipient.firstName, orderId, shippingAddress.short, eta]
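The ingest-side check that schema block implies can stay very small. A sketch, assuming the event payload is carried as a System.Text.Json JsonElement and the required entries are dotted paths as in the YAML above; the helper name is illustrative.

using System.Text.Json;

// Sketch: resolve every required dotted path (e.g. "shippingAddress.short") inside
// the payload; anything missing becomes a 400 at ingest instead of a dead worker.
public static class TemplateSchemaValidator
{
    public static IReadOnlyList<string> MissingVariables(JsonElement payload, IEnumerable<string> requiredPaths)
    {
        var missing = new List<string>();
        foreach (var path in requiredPaths)
        {
            var current = payload;
            var found = true;
            foreach (var segment in path.Split('.'))
            {
                if (current.ValueKind != JsonValueKind.Object || !current.TryGetProperty(segment, out var next))
                {
                    found = false;
                    break;
                }
                current = next;
            }
            if (!found) missing.Add(path);
        }
        return missing;
    }
}

At ingest, a non-empty result becomes a validation problem on the response, mirroring step 1 of the endpoint in section 6.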

Template versioning and A/B testing

Each template carries a version. When updating, create a new version and keep the old one running. Route 10% of traffic to the new version for 24 hours, watch CTR and open rate in the warehouse. If it wins, cut over completely. This is the same principle as a feature flag, applied to content.

9. Priority Queue and Back-Pressure

The system will be overloaded occasionally. An upstream service fails and retries everything, a marketing campaign hits Send on 2 million users in one shot, an SMS carrier slows to 500ms/msg. Without priority and back-pressure, every layer suffers — from CPU to provider limits.

flowchart TB
    IN["Kafka topics: notif.transactional, notif.marketing"] --> WK["Worker dispatcher"]
    WK --> P0["P0 pool (OTP, auth): high concurrency, no throttle"]
    WK --> P1["P1 pool (order, payment): moderate concurrency"]
    WK --> P3["P3 pool (marketing): low concurrency, token-bucket throttled"]
    P0 --> PROV["Provider pool"]
    P1 --> PROV
    P3 --> PROV
    PROV --> RL["Global per-provider rate limit"]
    RL -->|block| RET["Retry queue (delay)"]
    RET -. after 2^n seconds .-> PROV
Separate pools per priority, plus retry with exponential backoff

A few rules distilled from real incidents:

  • Distinct worker pools per priority: use separate thread pools or separate processes, NOT shared. If shared, one marketing burst will push OTPs behind tens of thousands of messages.
  • Back-pressure from provider to queue: when SES returns 429, the worker must pause consumption for a window — don't blindly push to DLQ.
  • Per-provider token bucket: FCM 600 req/s, SES 100/s by default. Apply the limit at the worker layer so you don't get cut off by the provider (see the sketch after this list).
  • Graceful degradation: if the primary SMS provider dies, fail over to secondary for P0/P1 but accept dropping P3. A marketing notification delayed an hour hurts nobody; an OTP delayed a minute loses you the order.
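A minimal sketch of the per-provider token bucket mentioned above, using System.Threading.RateLimiting from the BCL; the 600 permits/s mirrors the FCM figure in the list, and the wrapper shape is an illustrative assumption.

using System.Threading.RateLimiting;

// Sketch: one token bucket per provider, refilled once per second. Workers await
// a permit before calling the provider, so bursts queue inside the process
// instead of turning into provider-side 429s.
public sealed class ProviderRateLimiter
{
    private readonly TokenBucketRateLimiter _limiter;

    public ProviderRateLimiter(int permitsPerSecond) =>
        _limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
        {
            TokenLimit = permitsPerSecond,             // bucket capacity
            TokensPerPeriod = permitsPerSecond,        // refill amount per period
            ReplenishmentPeriod = TimeSpan.FromSeconds(1),
            QueueLimit = 10_000,                       // pending sends wait here (back-pressure)
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
            AutoReplenishment = true
        });

    public async Task SendWithLimitAsync(Func<Task> send, CancellationToken ct)
    {
        using RateLimitLease lease = await _limiter.AcquireAsync(1, ct);
        if (lease.IsAcquired) await send();
        // If the lease is not acquired (queue full), park the delivery in the
        // retry queue instead of dropping it.
    }
}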

10. Retry, DLQ, and Self-Healing

Wrong channel, expired token, external endpoint timeout — all of these are either transient or permanent failures. Correctly distinguishing the two is the key to not spamming retries for nothing.

| Failure type | Example | Handling |
| --- | --- | --- |
| Transient | 5xx from provider, timeout, rate limit | Retry with exponential backoff + jitter, capped at 5 attempts |
| Permanent | Invalid token, email hard bounce, malformed phone number | Do not retry. Log it, disable the target, trigger cleanup |
| Ambiguous | Provider returned 202 without a delivery status | Retry only if the delivery callback doesn't arrive within a TTL |
| Critical bug | Malformed template, worker crash loop | Park in DLQ, page on-call, keep the main queue flowing |

A DLQ isn't "where messages go to die". It must come with a replay tool. A simple CLI that lists, edits metadata, and replays into the main queue is enough for on-call to handle most incidents.

// Exponential backoff with jitter
public static TimeSpan NextRetryDelay(int attempt)
{
    var baseMs = Math.Min(30_000, 500 * Math.Pow(2, attempt)); // 500ms, 1s, 2s, ... capped at 30s
    var jitter = (Random.Shared.NextDouble() - 0.5) * 0.6;     // ±30%, spreads retries so they don't stampede
    return TimeSpan.FromMilliseconds(baseMs * (1 + jitter));
}
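The replay tool for the DLQ mentioned above doesn't have to be elaborate. A hedged sketch with Confluent.Kafka: read the dead-letter topic, optionally filter by event id, and republish into the main topic; the topic names and the key-based filter are assumptions for illustration.

using Confluent.Kafka;

// Sketch of a DLQ replay loop: consume the dead-letter topic and republish
// matching messages into the main transactional topic.
public static async Task ReplayDlqAsync(string bootstrap, string? onlyEventId, CancellationToken ct)
{
    var consumerConfig = new ConsumerConfig
    {
        BootstrapServers = bootstrap,
        GroupId = "dlq-replay-tool",
        AutoOffsetReset = AutoOffsetReset.Earliest,
        EnableAutoCommit = false
    };
    using var consumer = new ConsumerBuilder<string, string>(consumerConfig).Build();
    using var producer = new ProducerBuilder<string, string>(
        new ProducerConfig { BootstrapServers = bootstrap }).Build();

    consumer.Subscribe("notif.transactional.dlq");      // illustrative topic name
    while (!ct.IsCancellationRequested)
    {
        var result = consumer.Consume(TimeSpan.FromSeconds(5));
        if (result is null) break;                       // DLQ drained

        if (onlyEventId is null || result.Message.Key == onlyEventId)
            await producer.ProduceAsync("notif.transactional", result.Message);

        consumer.Commit(result);                         // only advance past handled messages
    }
}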

11. Dedup, Suppression, and Per-User Rate Limiting

No user should get 47 notifications in 10 minutes. But you don't want to hard-block either, because sometimes they genuinely need them (e.g. a sequence of order, payment, shipped events within seconds). The answer: per-user, per-category rate limits.

public async Task<bool> ShouldSuppressAsync(Guid userId, string category, CancellationToken ct)
{
    // Fixed-window counter in Redis: at most 5 marketing pushes per hour,
    // counted from the first push in the window
    var key = $"ratelimit:push:{userId}:{category}";
    var count = await redis.StringIncrementAsync(key);
    if (count == 1) await redis.KeyExpireAsync(key, TimeSpan.FromHours(1));
    return category == "marketing" && count > 5;
}

Pair it with the digest pattern: when you detect that you're about to breach the rate limit, instead of dropping, bundle 10 small notifications into one "You have 10 new updates" message. This pattern works wonders for social and collaboration apps.
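One hedged way to wire that in: when ShouldSuppressAsync trips, park the item in a per-user Redis list instead of dropping it, and let a periodic job flush the list as a single summary. The key naming, the TTL, and the SendInAppAsync helper are illustrative assumptions.

// Sketch: park suppressed notifications for a later digest instead of dropping them
public async Task ParkForDigestAsync(Guid userId, string title, CancellationToken ct)
{
    var key = $"digest:{userId}";
    await redis.ListRightPushAsync(key, title);
    await redis.KeyExpireAsync(key, TimeSpan.FromHours(6)); // safety TTL if the flush job misses it
}

// Sketch: flush N parked items as one "You have N new updates" message
public async Task FlushDigestAsync(Guid userId, CancellationToken ct)
{
    var key = $"digest:{userId}";
    var items = await redis.ListRangeAsync(key);
    if (items.Length == 0) return;

    await SendInAppAsync(userId, $"You have {items.Length} new updates", ct); // illustrative helper
    await redis.KeyDeleteAsync(key);
}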

12. Quiet Hours, Timezones, and Localisation

A project I once worked on pushed a promotional notification at 3am because the servers ran on UTC and the campaign was scheduled in UTC. Outcome: thousands of 1-star reviews. Lesson: every user-facing time must be interpreted in the user's timezone, never the server's.

Three minimum rules:

  • Store user.timezone as IANA (Asia/Ho_Chi_Minh), not a raw offset.
  • Default quiet hours 22:00–07:00 local for marketing (studies show very low open rates outside 8:00–21:00).
  • For "send to each user at 9:00 local" batch campaigns, you need a dedicated scheduler: split the campaign into buckets by timezone, enqueue each bucket at the right moment.

13. Delivery Callback — The Truth Is at the Provider

You call SES and get a 202 — don't assume you're done. 202 just means SES accepted. The email can bounce, trigger a complaint, arrive immediately, or end up in the promotions tab. The truth lives in the delivery callback the provider sends back to your webhook.

app.MapPost("/webhooks/ses", async (SesEvent evt, IDeliveryService svc, CancellationToken ct) =>
{
    // Verify SNS signature first
    if (!SesSignature.Verify(evt.RawPayload, evt.Signature)) return Results.Unauthorized();

    var deliveryId = evt.Tags["delivery_id"];
    var status = evt.Type switch
    {
        "Delivery" => DeliveryStatus.Delivered,
        "Bounce" => DeliveryStatus.Bounced,
        "Complaint" => DeliveryStatus.Complaint,
        "Open" => DeliveryStatus.Opened,
        "Click" => DeliveryStatus.Clicked,
        _ => DeliveryStatus.Unknown
    };
    await svc.UpdateAsync(Guid.Parse(deliveryId), status, evt.Timestamp, ct);
    return Results.Ok();
});

When the status is a hard Bounce, the cleanup worker must disable that email. Continuing to send will destroy your sender reputation — SES and SendGrid score this very quickly and you can be blocked from sending until you contact support.

14. In-App Inbox and Realtime via WebSocket

Push is used to alert in the moment. But when users open the app, they want to see the history — that's the job of the in-app inbox. It has two requirements: (1) fast lookup by user, (2) realtime update when a new message arrives.

Common architecture:

sequenceDiagram
    participant W as Worker
    participant DB as Postgres (inbox)
    participant R as Redis Pub/Sub
    participant GW as Realtime Gateway
    participant APP as Mobile/Web App
    W->>DB: INSERT inbox row
    W->>R: PUBLISH user:{id} new_msg
    R->>GW: subscription event
    GW->>APP: WebSocket/SSE push
    APP->>APP: Update badge count
    Note over APP: On user tap, call GET /inbox?limit=50
    APP->>DB: SELECT with cursor pagination
In-app inbox — persisted in DB, realtime via pub/sub

A few often-overlooked details:

  • Badge count must be computed on the server. Don't rely on the client counting, multi-device will drift.
  • Mark-as-read needs an event upstream so other devices stay in sync — opened on mobile, badge on web drops too.
  • Pagination must use cursors (WHERE created_at < :lastSeen), not OFFSET — with a large inbox, OFFSET is painfully slow (see the sketch after this list).
  • TTL: inboxes older than 90 days can be archived to cold storage or deleted per data policy.
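A sketch of the cursor-based inbox query behind the pagination bullet, assuming the inbox lives in PostgreSQL and the service talks to it through Npgsql; the table and column names are illustrative.

using Npgsql;

// Sketch: "the 50 rows older than the last one I saw", instead of OFFSET, which
// rescans every row it skips. The first page passes DateTime.UtcNow as the cursor;
// each next page passes the created_at of the last row it received.
public static async Task<List<(Guid Id, string Title, DateTime CreatedAt)>> GetInboxPageAsync(
    NpgsqlDataSource db, Guid userId, DateTime cursorUtc, CancellationToken ct)
{
    const string sql = """
        SELECT id, title, created_at
        FROM inbox
        WHERE user_id = @userId AND created_at < @cursor
        ORDER BY created_at DESC
        LIMIT 50
        """;

    await using var cmd = db.CreateCommand(sql);
    cmd.Parameters.AddWithValue("userId", userId);
    cmd.Parameters.AddWithValue("cursor", cursorUtc);

    var page = new List<(Guid, string, DateTime)>();
    await using var reader = await cmd.ExecuteReaderAsync(ct);
    while (await reader.ReadAsync(ct))
        page.Add((reader.GetGuid(0), reader.GetString(1), reader.GetDateTime(2)));
    return page;
}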

15. Observability — Metrics, Traces, and Analytics

An unobserved Notification Service will almost certainly fail silently: you still deliver 99%, but that 1% is your most important users. Measure three layers:

| Layer | Metric | Used for |
| --- | --- | --- |
| Pipeline | events_in/s, fanout_ratio, queue_lag, worker_throughput | System health monitoring, SRE alerts |
| Channel | send_rate, success_rate, bounce_rate, latency p50/p95/p99 | Provider comparison, alerts on degradation |
| Business | delivery_rate, open_rate, CTR per template/campaign | Marketing and product content optimisation |

End-to-end traces should carry attributes event.id, user.id, template.code, channel so a specific message can be followed from ingest to callback. OpenTelemetry auto-instrumentation for Kafka and HTTP clients gets you most of the way with little config; the hard part is setting attributes correctly at the fanout point — where one event becomes N spans.
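A sketch of what that looks like at the fanout point with System.Diagnostics.ActivitySource (the API the OpenTelemetry .NET SDK records from); the source name and attribute keys follow this section's conventions but are otherwise illustrative.

using System.Diagnostics;

// Sketch: one child span per delivery at fanout, carrying the attributes needed
// to follow a single message from ingest to callback.
private static readonly ActivitySource Source = new("NotificationService.Fanout");

public void TraceFanout(NotificationEvent evt, IReadOnlyList<Delivery> deliveries)
{
    foreach (var delivery in deliveries)
    {
        using var span = Source.StartActivity("notification.delivery", ActivityKind.Producer);
        span?.SetTag("event.id", evt.Id);
        span?.SetTag("user.id", delivery.UserId);
        span?.SetTag("template.code", evt.TemplateCode);
        span?.SetTag("channel", delivery.Channel);
    }
}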

SLI/SLO for a Notification Service

Example: 99.5% of OTP SMS are acknowledged by the provider within 3 seconds of ingest. Record the delta delivered_at - event_created_at, compute hourly percentiles, alert when >30% of the weekly error budget burns. That's how you turn "it seems fine" into a number you can defend to stakeholders.
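A sketch of recording that delta with System.Diagnostics.Metrics so the hourly percentiles and burn-rate alerts can be computed downstream (Prometheus, Grafana, or whatever backs your SLOs); the meter and instrument names are illustrative.

using System.Diagnostics.Metrics;

// Sketch: record delivered_at - event_created_at as a histogram, tagged by
// channel and priority so OTP SMS gets its own percentiles and SLO.
private static readonly Meter Meter = new("NotificationService");
private static readonly Histogram<double> E2eLatencySeconds =
    Meter.CreateHistogram<double>("notification.e2e.latency", unit: "s");

public void RecordDelivered(NotificationEvent evt, Delivery delivery)
{
    var seconds = (delivery.DeliveredAt - evt.CreatedAt).TotalSeconds;
    E2eLatencySeconds.Record(seconds,
        new KeyValuePair<string, object?>("channel", delivery.Channel),
        new KeyValuePair<string, object?>("priority", evt.Priority));
}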

16. Campaign Scheduler — Millions of Messages, Each User's Local Time

Marketing wants to send "Monday 9am local time" to 2 million users. Simple-sounding but users span dozens of timezones. The naïve approach — enqueue all 2 million at 00:00 UTC and have workers hold each message — isn't just memory-hungry, it can't survive a restart.

The tidy solution: time-bucketed scheduler.

flowchart LR
    CAMP["Campaign 'Weekly promo', send at 9:00 local"] --> BUCKET["Bucket by timezone"]
    BUCKET --> B1["bucket +7 (Asia/Ho_Chi_Minh)"]
    BUCKET --> B2["bucket 0 (UTC, London)"]
    BUCKET --> B3["bucket -5 (America/New_York)"]
    B1 --> C["Cron at 02:00 UTC = 09:00 VN"]
    B2 --> D["Cron at 09:00 UTC"]
    B3 --> E["Cron at 14:00 UTC"]
    C --> ENQ["Enqueue into Kafka"]
    D --> ENQ
    E --> ENQ
Time-bucketed scheduler for multi-timezone campaigns

Inside each bucket, enqueue in ~10k-user batches at a fixed rate so you don't spike the provider. If a user changes timezone, re-check at read-time and shift buckets for that user — not the whole user base, just those few.
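A sketch of the bucketing step itself: group recipients by timezone, convert "9:00 local on the campaign date" to one UTC instant per bucket, and hand each pair to the cron/enqueue stage. Converting per send date (rather than storing a fixed offset) is what keeps DST from shifting the campaign by an hour; the method shape is illustrative.

// Sketch: compute one UTC enqueue time per timezone bucket for a "9:00 local" campaign
public static Dictionary<string, DateTime> BucketEnqueueTimesUtc(
    IEnumerable<string> userTimeZones, DateOnly sendDate, TimeOnly localSendTime)
{
    var result = new Dictionary<string, DateTime>();
    foreach (var zoneId in userTimeZones.Distinct())
    {
        var tz = TimeZoneInfo.FindSystemTimeZoneById(zoneId);     // IANA id, e.g. "Asia/Ho_Chi_Minh"
        var localWallClock = sendDate.ToDateTime(localSendTime);  // 09:00 on the campaign date
        result[zoneId] = TimeZoneInfo.ConvertTimeToUtc(localWallClock, tz);
    }
    return result;
}

// Each (zoneId, utcInstant) pair becomes one scheduled enqueue job; users are
// assigned to the bucket matching their stored timezone.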

17. Security and Abuse Prevention

Notifications are an attack surface people overlook, until someone uses the internal API to flood a victim's phone number with SMS or spoof emails from your company's domain. A few mandatory measures:

  • Authenticated producers: only internal backend services (mTLS or OAuth2 service-to-service) may call the API. Never expose it directly to clients.
  • Template whitelisting: the body must be an identified template code; no free-text sends. Locking down free text is how you prevent internal phishing.
  • Per-tenant rate limits: each producer has its own quota. Stops a buggy service from collapsing the whole pipeline.
  • PII minimisation: payloads contain only keys (userId, orderId). The worker resolves personal data from the user service itself. Logs must never print email/phone in plaintext.
  • DKIM, SPF, DMARC for email; reputable sending IPs for SES/SendGrid; signed payloads (HMAC-SHA256) for outbound webhooks (see the signing sketch after this list).
  • Hard opt-out honour: when a user clicks unsubscribe, the worker must block at fanout, not merely hide the UI. Regulations such as CAN-SPAM, GDPR, and Vietnam's Decree 91/2020 all require this.
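A sketch of the outbound webhook signing mentioned in the list above: an HMAC-SHA256 over the raw body with a per-customer secret, sent alongside the request so the receiver can verify both authenticity and integrity. The header name and hex encoding are conventions, not a standard.

using System.Security.Cryptography;
using System.Text;

// Sketch: sign the exact bytes you send; the receiver recomputes the HMAC over the
// raw request body and compares, ideally with a constant-time comparison.
public static string SignWebhookPayload(string rawBody, string sharedSecret)
{
    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(sharedSecret));
    var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(rawBody));
    return Convert.ToHexString(hash).ToLowerInvariant();  // e.g. sent as an X-Notification-Signature header
}

// Receiver side: CryptographicOperations.FixedTimeEquals over the two hash byte
// arrays avoids leaking the signature through timing differences.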

18. Realistic Capacity Planning

Reference numbers you can anchor your estimate to:

| Component | Per-node capacity | Notes |
| --- | --- | --- |
| API ingest (.NET 10 Minimal) | ~20–30k req/s | Bounded by CPU, validation, Redis I/O |
| Kafka broker | ~100 MB/s write, 3 replicas | Tune batch size, acks=all for transactional |
| Fanout worker | ~2k events/s | With fanout ratio ~3, yields ~6k deliveries/s |
| FCM push | ~600 req/s per HTTP/2 connection | Scale with many connections + 500-token batches |
| SES email | 100/s default, rampable to 10k/s | Quota is per-account, request increases early |
| Twilio SMS | 10/s per phone number | Add numbers for throughput, or use a Messaging Service |
| PostgreSQL delivery ledger | ~20k writes/s with batching | Partition by day, proactive vacuuming |

For 50 million notifications/day (~580/s average, 5k/s peak), a cluster of 3 Kafka brokers, 6–8 worker nodes, and 2 API nodes is comfortable. Don't ignore cost: 50M SMS × $0.02 = $1M/month. Shifting 80% of non-critical content off to free channels (push + in-app inbox) is a much bigger lever than tuning worker code.

19. Case Studies — How the Big Players Solve It

A few publicly documented architectures worth studying:

  • Slack: the in-app inbox is the source of truth. Push is only a teaser. They use a "fanout-on-read" pattern for large channels: don't push to 10k members simultaneously; push based on presence and active subscribers.
  • Uber: transactional (trip events) is completely separated from promotional, on a dedicated Kafka pipeline. Marketing runs in another service with a hard quota.
  • LinkedIn: their "Air Traffic Controller" balances many notification types, preventing a user from receiving multiple messages on the same topic within 24 hours. This is the canonical lesson on digest and frequency capping.
  • Pinterest: uses ML to predict when a user opens the app, sending exactly at that moment instead of spamming. Beautiful idea, but you need a large behavioural dataset before it's worth building.

20. Rollout Checklist for Your Team

Pre-launch

  • Idempotency key convention agreed across producers.
  • Template versioning and schema validation on by default.
  • DLQ with replay tooling, on-call runbook with remediation steps.
  • SLO defined per priority tier (OTP, transactional, marketing).
  • Token refresh for chat/webhook, signing-key rotation.

First 90 days in production

  • Track daily email bounce rate and SES complaint rate. Kill bad targets immediately.
  • Audit PII logs, ensure no email/phone leaks in plaintext.
  • Run game-days: simulate FCM/SES outages, verify graceful degradation.
  • Review marketing-messages-per-user-per-week — if the median exceeds 5, opt-out risk soars.

21. Conclusion

The Notification Service is one of the most underestimated backends out there. On the surface it's just "call FCM and SES", but as you go deeper, it's the sum of nearly every distributed-systems pattern: event-driven, idempotency, priority queue, retry with backoff, fanout, rate limiting, scheduling, observability, security. Building it right up front saves the team months of firefighting; building it in a rush means paying the price every Black Friday, every time a user reports "I didn't get my OTP".

Hopefully this article gives you a detailed enough map so you don't have to code and learn at the same time. The most important takeaways: separate transactional from marketing, enforce idempotency at ingest, make templates schema-backed, honour quiet hours in the user's timezone, and observe every single message. Those five principles alone separate a Notification Service that "works" from one that "can be trusted".
