Notification System Design 2026 — Fanout, Priority Queue, Idempotency, and Template Engine for Millions of Push/Email/SMS per Day

Posted on: 4/17/2026 2:10:41 AM

1. Why Notifications Are Harder Than You Think

At first glance, sending a notification is just calling a Firebase or Twilio API. But when your system has millions of users, dozens of event types, three or four concurrent channels (push, email, SMS, in-app, webhook), and must respect quiet-hours in each user's timezone, the picture turns into a complex distributed system with a distinctive set of constraints: per-channel latency varies wildly, per-channel cost varies wildly, failure modes vary wildly, and the hardest constraint of all is that you must never send duplicates, yet you must never forget a single one.

This article dives deep into Notification Service architecture at millions-of-users scale: from data model, ingest pipeline, multi-channel fanout, template engine, personalised rate-limiting, idempotency, retry and DLQ, through to campaign effectiveness observability and unsubscribe/quiet-hours enforcement. The illustrated stack is .NET 10 plus Vue/Nuxt for the admin dashboard, but the principles apply to any stack — Java Spring, Node.js, Go, or Python.

  • 4+ channels (push, email, SMS, in-app, webhook)
  • ~50ms ingest-to-queue p99
  • 99.9% delivery SLA achievable with retry
  • 10M/day per-worker-node throughput

2. Functional and Non-Functional Requirements

Before sketching the architecture, you have to nail down what the system must do — and what it must not do. This is where many "ship fast" teams trip up: they only think about being able to send, not about not sending the wrong thing or not spamming the user.

| Requirement group | Content | Concrete example |
| --- | --- | --- |
| Functional | Multi-channel delivery driven by domain events (order, payment, promo, system) | When an order transitions to Shipped, send push + email but not SMS |
| Template | Multi-locale templating with variables, A/B testing, localisation | "Your order {{orderId}} has shipped, {{recipient.firstName}}" |
| Priority | Priority tiers, transactional strictly separated from promotional | OTP P0, order updates P1, marketing P3 |
| Idempotency | Each event delivered exactly once, regardless of upstream retries | Producer replays the same idempotencyKey ⇒ one delivery only |
| Throughput | ≥ 50k notifications/s peak for push, ≥ 10k/s for email | Black Friday flash-sale push to 2 million users |
| Latency | OTP e2e ≤ 3s p99, order updates ≤ 30s, marketing unconstrained | International OTP SMS still under 5s |
| Reliability | No message loss on process crash, intelligent retry, replayable DLQ | Kafka replication factor 3, manual offset checkpointing |
| Privacy | Per-channel and per-category user opt-out, respect quiet hours and timezone | Never send promo push after 22:00 local time |
| Observability | Per-channel/template/campaign metrics, end-to-end trace, open/click rate | Grafana dashboard plus a data warehouse for marketing |

3. Channel Map and Channel Characteristics

Every channel has its own behaviour. A "one-size-fits-all" design collapses because SMS costs money per message but arrives with near certainty, while push may be free but can be silently dropped the moment a user disables notifications or turns off background app refresh. These differences must be encoded into your send and retry strategy.

| Channel | Typical provider | Latency | Cost/msg | Success rate | Quirks |
| --- | --- | --- | --- | --- | --- |
| Mobile push | APNs (iOS), FCM (Android, Web) | ~1s | Free | 80–95% | Token expiration, notifications disabled, silent failures common |
| Email | Amazon SES, SendGrid, Postmark, Mailgun | 2–30s | $0.0001–0.001 | 95–99% | Async bounce & complaint callbacks, IP/domain reputation matters |
| SMS | Twilio, Vonage, local carrier | 2–15s | $0.005–0.05 | 97–99% | Expensive, 160-char limit, per-country whitelisting required |
| In-app | In-house service + WebSocket/SSE | ~100ms | Internal infra | ~100% if online | Must persist inbox for offline users |
| Webhook | Customer-hosted endpoint | 100ms–10s | Internal infra | 90–99% | Endpoint out of your control, needs retry plus signed payload |
| Chat | Slack, Teams, Zalo, Viber | 1–5s | Free/Bot | 95–99% | Strict rate limits, OAuth tokens need refresh |

Don't mix marketing and transactional in one pipeline

OTPs and "today's discount" have wildly different SLAs. If you share a queue, one marketing campaign of 2 million SMS can push the OTP p99 from 30 seconds to 5 minutes — long enough for a user to abandon their cart. Always split at least two lanes: transactional (high priority, never throttled) and marketing (low priority, throttled when the system is loaded).

4. Overall Architecture

The heart of a Notification Service is an event-driven pipeline with a clear path from domain event to each send channel. Every component must be idempotent, independently restartable, and measurable at every hop.

flowchart LR
    subgraph Producers
      ORD["Order Service"]
      PAY["Payment Service"]
      AUTH["Auth Service (OTP)"]
      MKT["Marketing Campaign"]
    end

    Producers --> API["Notification API (.NET 10 Minimal API)"]
    API --> VAL["Validator + Dedup (Redis idempotency cache)"]
    VAL --> TOPIC{"Kafka topics"}
    TOPIC -->|transactional| WKT["Transactional Worker Pool"]
    TOPIC -->|marketing| WKM["Marketing Worker Pool"]
    WKT --> FAN["Fanout & Preferences"]
    WKM --> FAN
    FAN --> TMPL["Template Engine"]
    TMPL --> ROUTE{"Per-channel router"}
    ROUTE --> APNs
    ROUTE --> FCM
    ROUTE --> SES["SES / SendGrid"]
    ROUTE --> SMS["Twilio / Local SMS"]
    ROUTE --> WS["WebSocket / SSE"]
    ROUTE --> HOOK["Webhook dispatcher"]
    APNs --> CB1["Delivery callback"]
    FCM --> CB1
    SES --> CB1
    SMS --> CB1
    CB1 --> LEDGER[("Delivery ledger: PostgreSQL / OLTP")]
    CB1 --> ANALYTICS[("Analytics warehouse")]
Notification System — event in, multi-channel out, every hop measured

A few important things about this diagram:

  • Topics split by priority: at minimum two topics, notif.transactional and notif.marketing. Separate worker pools, separate resources.
  • Fanout lives after ingest: the producer only needs to know that user X should be notified about event A. Fanning out to N channels × M devices happens inside the service — the producer shouldn't have to know how many devices the user owns.
  • Dedicated delivery ledger: an authoritative history table — who was sent what, when, with what result. This table feeds debugging, the user-facing notification history UI, and compliance.

5. Core Data Model

A Notification Service data schema doesn't have many tables, but the relationships between them determine how well it scales later. Here's the minimal model — it can grow when you need A/B testing, campaign scheduling, journey orchestration…

erDiagram
    USER ||--o{ DEVICE : owns
    USER ||--o{ PREFERENCE : has
    USER ||--o{ DELIVERY : receives
    TEMPLATE ||--o{ CAMPAIGN : used_in
    CAMPAIGN ||--o{ NOTIFICATION_EVENT : produces
    NOTIFICATION_EVENT ||--o{ DELIVERY : fans_out_to
    DEVICE ||--o{ DELIVERY : targeted_by
    CHANNEL ||--o{ DELIVERY : uses

    USER {
      uuid id
      string locale
      string timezone
    }
    DEVICE {
      uuid id
      uuid user_id
      string platform
      string push_token
      datetime last_seen
      bool active
    }
    PREFERENCE {
      uuid user_id
      string category
      string channel
      bool enabled
      time quiet_start
      time quiet_end
    }
    TEMPLATE {
      uuid id
      string code
      string channel
      string locale
      text body
      json schema
    }
    NOTIFICATION_EVENT {
      uuid id
      string idempotency_key
      string type
      int priority
      json payload
      datetime created_at
    }
    DELIVERY {
      uuid id
      uuid event_id
      uuid user_id
      string channel
      string provider_msg_id
      string status
      datetime sent_at
      datetime delivered_at
    }
Notification Service data schema — minimal to scale yet enough to audit

Design notes:

  • NOTIFICATION_EVENT.idempotency_key is the input-side dedup key, produced by the caller (e.g. order:1234:shipped). Insert it under a UNIQUE constraint; on a duplicate, return the existing event instead of an error (the ingest endpoint in section 6 responds 202 Accepted).
  • DELIVERY is split from EVENT so that one event can produce many deliveries (push to device A, push to device B, email). Each delivery has its own lifecycle: queued → sent → delivered → opened → clicked.
  • PREFERENCE must be split down to category × channel. A user may want promo emails but not promo pushes. A single is_subscribed column is not enough.
  • Sharding: the DELIVERY table grows fast. From day one, shard by user_id or created_at (partition by day). Don't wait until 500 million rows to deal with it.

6. Ingest Pipeline — From API to Queue

Producers call the Notification Service API instead of calling FCM/SES directly. This centralises control: rate limiting, templates, preferences, idempotency, and audit all live in one place. The .NET 10 Minimal API snippet below illustrates a minimal but complete ingest endpoint:

app.MapPost("/v1/notifications", async (
    NotifyRequest req,
    IValidator<NotifyRequest> validator,
    IIdempotencyStore idem,
    IPublisher publisher,
    CancellationToken ct) =>
{
    // 1. Validate payload
    var result = await validator.ValidateAsync(req, ct);
    if (!result.IsValid) return Results.ValidationProblem(result.ToDictionary());

    // 2. Idempotency: if key already seen, return the existing event
    var existing = await idem.GetAsync(req.IdempotencyKey, ct);
    if (existing is not null) return Results.Accepted($"/v1/notifications/{existing.Id}", existing);

    // 3. Build event, pick topic by priority
    var evt = NotificationEvent.Create(req);
    var topic = req.Priority <= 1 ? "notif.transactional" : "notif.marketing";

    // 4. Publish to Kafka (transactional producer, exactly-once)
    await publisher.PublishAsync(topic, evt, ct);

    // 5. Cache idempotency for 24h
    await idem.SetAsync(req.IdempotencyKey, evt, TimeSpan.FromHours(24), ct);

    return Results.Accepted($"/v1/notifications/{evt.Id}", evt);
})
.RequireAuthorization("NotificationProducer")
.WithName("SubmitNotification");

Idempotency done right

Redis SET key value NX EX 86400 is enough on the hot path. If you need absolute certainty, pair it with a UNIQUE constraint in the DB — but avoid DB round-trips on the hot path; let the DB only catch duplicates that Redis missed during a cache flush. For critical events (OTP, payment), add a per-user sequence number to catch out-of-order caused by upstream multi-retry.
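To make the hot path concrete, here is a minimal sketch of a Redis-backed IIdempotencyStore for the ingest endpoint above, assuming StackExchange.Redis; the interface shape, the idem: key prefix, and the JSON serialisation are illustrative choices, not a prescribed contract.

using System.Text.Json;
using StackExchange.Redis;

// Minimal sketch, assuming StackExchange.Redis. StringSetAsync(..., When.NotExists)
// gives the SET key value NX EX semantics described above.
public sealed class RedisIdempotencyStore : IIdempotencyStore
{
    private readonly IDatabase _db;

    public RedisIdempotencyStore(IConnectionMultiplexer mux) => _db = mux.GetDatabase();

    public async Task<NotificationEvent?> GetAsync(string key, CancellationToken ct)
    {
        var value = await _db.StringGetAsync($"idem:{key}");
        return value.IsNullOrEmpty ? null : JsonSerializer.Deserialize<NotificationEvent>(value.ToString());
    }

    public async Task<bool> SetAsync(string key, NotificationEvent evt, TimeSpan ttl, CancellationToken ct)
    {
        // NX: only the first writer wins; EX: the key expires after the TTL (24h on the hot path)
        return await _db.StringSetAsync($"idem:{key}", JsonSerializer.Serialize(evt),
            expiry: ttl, when: When.NotExists);
    }
}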

7. Fanout — One Event, Many Deliveries

A worker consuming the topic breaks one event into a list of deliveries by this formula:

delivery_set =
  (user.devices ∪ user.emails ∪ user.phone)
  ∩ template.channels
  ∩ user.preferences
  \ user.quiet_hours_violating_channels

In other words, you take the intersection of three sets and subtract channels that are currently in quiet hours. The result is a concrete list of (channel, target) pairs to send. A user with 2 phones + 1 email + a web-push subscription can blow up into 4 distinct deliveries from a single event.

public async Task<IReadOnlyList<Delivery>> FanoutAsync(NotificationEvent evt, CancellationToken ct)
{
    var user = await users.GetAsync(evt.UserId, ct);
    var userPrefs = await prefs.GetAsync(evt.UserId, evt.Category, ct);
    var template = await templates.GetAsync(evt.TemplateCode, user.Locale, ct);

    var deliveries = new List<Delivery>();
    // Quiet hours are evaluated against the user's local clock, not the server's
    var nowLocal = TimeZoneInfo.ConvertTime(
        DateTimeOffset.UtcNow, TimeZoneInfo.FindSystemTimeZoneById(user.TimeZone));

    foreach (var channel in template.Channels)
    {
        if (!userPrefs.IsEnabled(channel)) continue;
        if (userPrefs.InQuietHours(channel, nowLocal) && evt.Priority > 1) continue;

        foreach (var target in user.TargetsFor(channel))
        {
            deliveries.Add(Delivery.New(evt.Id, user.Id, channel, target, template));
        }
    }

    return deliveries;
}

Subtle but important: P0 and P1 override quiet hours. Nobody wants to miss an OTP just because they happen to be asleep.

8. Template Engine — Separate Content from Code

Templates live in the DB and are preloaded into a cache (Redis or in-memory). Each template carries a schema validating its inputs: if an event is missing a variable, reject at ingest instead of letting a worker discover it and die midway.

code: order.shipped
locale: en-US
channel: push
body: |
  Hi {{recipient.firstName}}, your order {{orderId}} is on the way
  to {{shippingAddress.short}}. ETA {{eta | date:"HH:mm, MM/dd"}}.
schema:
  required: [recipient.firstName, orderId, shippingAddress.short, eta]
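The ingest-side check that schema block implies can stay very small. A sketch, assuming the event payload is carried as a System.Text.Json JsonElement and the required entries are dotted paths as in the YAML above; the helper name is illustrative.

using System.Text.Json;

// Sketch: resolve every required dotted path (e.g. "shippingAddress.short") inside
// the payload; anything missing becomes a 400 at ingest instead of a dead worker.
public static class TemplateSchemaValidator
{
    public static IReadOnlyList<string> MissingVariables(JsonElement payload, IEnumerable<string> requiredPaths)
    {
        var missing = new List<string>();
        foreach (var path in requiredPaths)
        {
            var current = payload;
            var found = true;
            foreach (var segment in path.Split('.'))
            {
                if (current.ValueKind != JsonValueKind.Object || !current.TryGetProperty(segment, out var next))
                {
                    found = false;
                    break;
                }
                current = next;
            }
            if (!found) missing.Add(path);
        }
        return missing;
    }
}

At ingest, a non-empty result becomes a validation problem on the response, mirroring step 1 of the endpoint in section 6.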

Template versioning and A/B testing

Each template carries a version. When updating, create a new version and keep the old one running. Route 10% of traffic to the new version for 24 hours, watch CTR and open rate in the warehouse. If it wins, cut over completely. This is the same principle as a feature flag, applied to content.

9. Priority Queue and Back-Pressure

The system will be overloaded occasionally. An upstream service fails and retries everything, a marketing campaign hits Send on 2 million users in one shot, an SMS carrier slows to 500ms/msg. Without priority and back-pressure, every layer suffers — from CPU to provider limits.

flowchart TB
    IN["Kafka topics: notif.transactional, notif.marketing"] --> WK["Worker dispatcher"]
    WK --> P0["P0 pool (OTP, auth): high concurrency, no throttle"]
    WK --> P1["P1 pool (order, payment): moderate concurrency"]
    WK --> P3["P3 pool (marketing): low concurrency, token-bucket throttled"]
    P0 --> PROV["Provider pool"]
    P1 --> PROV
    P3 --> PROV
    PROV --> RL["Global per-provider rate limit"]
    RL -->|block| RET["Retry queue (delay)"]
    RET -. after 2^n seconds .-> PROV
Separate pools per priority, plus retry with exponential backoff

A few rules distilled from real incidents:

  • Distinct worker pools per priority: use separate thread pools or separate processes, NOT shared. If shared, one marketing burst will push OTPs behind tens of thousands of messages.
  • Back-pressure from provider to queue: when SES returns 429, the worker must pause consumption for a window — don't blindly push to DLQ.
  • Per-provider token bucket: FCM 600 req/s, SES 100/s by default. Apply the limit at the worker layer so you don't get cut off by the provider (see the sketch after this list).
  • Graceful degradation: if the primary SMS provider dies, fail over to secondary for P0/P1 but accept dropping P3. A marketing notification delayed an hour hurts nobody; an OTP delayed a minute loses you the order.
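A minimal sketch of the per-provider token bucket mentioned above, using System.Threading.RateLimiting from the BCL; the 600 permits/s mirrors the FCM figure in the list, and the wrapper shape is an illustrative assumption.

using System.Threading.RateLimiting;

// Sketch: one token bucket per provider, refilled once per second. Workers await
// a permit before calling the provider, so bursts queue inside the process
// instead of turning into provider-side 429s.
public sealed class ProviderRateLimiter
{
    private readonly TokenBucketRateLimiter _limiter;

    public ProviderRateLimiter(int permitsPerSecond) =>
        _limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
        {
            TokenLimit = permitsPerSecond,             // bucket capacity
            TokensPerPeriod = permitsPerSecond,        // refill amount per period
            ReplenishmentPeriod = TimeSpan.FromSeconds(1),
            QueueLimit = 10_000,                       // pending sends wait here (back-pressure)
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
            AutoReplenishment = true
        });

    public async Task SendWithLimitAsync(Func<Task> send, CancellationToken ct)
    {
        using RateLimitLease lease = await _limiter.AcquireAsync(1, ct);
        if (lease.IsAcquired) await send();
        // If the lease is not acquired (queue full), park the delivery in the
        // retry queue instead of dropping it.
    }
}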

10. Retry, DLQ, and Self-Healing

Wrong channel, expired token, external endpoint timeout — all of these are either transient or permanent failures. Correctly distinguishing the two is the key to not spamming retries for nothing.

| Failure type | Example | Handling |
| --- | --- | --- |
| Transient | 5xx from provider, timeout, rate limit | Retry with exponential backoff + jitter, capped at 5 attempts |
| Permanent | Invalid token, email hard bounce, malformed phone number | Do not retry. Log it, disable the target, trigger cleanup |
| Ambiguous | Provider returned 202 without a delivery status | Retry only if the delivery callback doesn't arrive within a TTL |
| Critical bug | Malformed template, worker crash loop | Park in DLQ, page on-call, keep the main queue flowing |

A DLQ isn't "where messages go to die". It must come with a replay tool. A simple CLI that lists, edits metadata, and replays into the main queue is enough for on-call to handle most incidents.

// Exponential backoff with jitter
public static TimeSpan NextRetryDelay(int attempt)
{
    var baseMs = Math.Min(30_000, 500 * Math.Pow(2, attempt)); // 500ms, 1s, 2s, ... capped at 30s
    var jitter = (Random.Shared.NextDouble() - 0.5) * 0.6;     // ±30%, spreads retries so they don't stampede
    return TimeSpan.FromMilliseconds(baseMs * (1 + jitter));
}
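The replay tool for the DLQ mentioned above doesn't have to be elaborate. A hedged sketch with Confluent.Kafka: read the dead-letter topic, optionally filter by event id, and republish into the main topic; the topic names and the key-based filter are assumptions for illustration.

using Confluent.Kafka;

// Sketch of a DLQ replay loop: consume the dead-letter topic and republish
// matching messages into the main transactional topic.
public static async Task ReplayDlqAsync(string bootstrap, string? onlyEventId, CancellationToken ct)
{
    var consumerConfig = new ConsumerConfig
    {
        BootstrapServers = bootstrap,
        GroupId = "dlq-replay-tool",
        AutoOffsetReset = AutoOffsetReset.Earliest,
        EnableAutoCommit = false
    };
    using var consumer = new ConsumerBuilder<string, string>(consumerConfig).Build();
    using var producer = new ProducerBuilder<string, string>(
        new ProducerConfig { BootstrapServers = bootstrap }).Build();

    consumer.Subscribe("notif.transactional.dlq");      // illustrative topic name
    while (!ct.IsCancellationRequested)
    {
        var result = consumer.Consume(TimeSpan.FromSeconds(5));
        if (result is null) break;                       // DLQ drained

        if (onlyEventId is null || result.Message.Key == onlyEventId)
            await producer.ProduceAsync("notif.transactional", result.Message);

        consumer.Commit(result);                         // only advance past handled messages
    }
}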

11. Dedup, Suppression, and Per-User Rate Limiting

No user should get 47 notifications in 10 minutes. But you don't want to hard-block either, because sometimes they genuinely need them (e.g. a sequence of order, payment, shipped events within seconds). The answer: per-user, per-category rate limits.

public async Task<bool> ShouldSuppressAsync(Guid userId, string category, CancellationToken ct)
{
    // Fixed-window counter in Redis: at most 5 marketing pushes per hour,
    // counted from the first push in the window
    var key = $"ratelimit:push:{userId}:{category}";
    var count = await redis.StringIncrementAsync(key);
    if (count == 1) await redis.KeyExpireAsync(key, TimeSpan.FromHours(1));
    return category == "marketing" && count > 5;
}

Pair it with the digest pattern: when you detect that you're about to breach the rate limit, instead of dropping, bundle 10 small notifications into one "You have 10 new updates" message. This pattern works wonders for social and collaboration apps.
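One hedged way to wire that in: when ShouldSuppressAsync trips, park the item in a per-user Redis list instead of dropping it, and let a periodic job flush the list as a single summary. The key naming, the TTL, and the SendInAppAsync helper are illustrative assumptions.

// Sketch: park suppressed notifications for a later digest instead of dropping them
public async Task ParkForDigestAsync(Guid userId, string title, CancellationToken ct)
{
    var key = $"digest:{userId}";
    await redis.ListRightPushAsync(key, title);
    await redis.KeyExpireAsync(key, TimeSpan.FromHours(6)); // safety TTL if the flush job misses it
}

// Sketch: flush N parked items as one "You have N new updates" message
public async Task FlushDigestAsync(Guid userId, CancellationToken ct)
{
    var key = $"digest:{userId}";
    var items = await redis.ListRangeAsync(key);
    if (items.Length == 0) return;

    await SendInAppAsync(userId, $"You have {items.Length} new updates", ct); // illustrative helper
    await redis.KeyDeleteAsync(key);
}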

12. Quiet Hours, Timezones, and Localisation

A project I once worked on pushed a promotional notification at 3am because the servers ran on UTC and the campaign was scheduled in UTC. Outcome: thousands of 1-star reviews. Lesson: every user-facing time must be interpreted in the user's timezone, never the server's.

Three minimum rules:

  • Store user.timezone as IANA (Asia/Ho_Chi_Minh), not a raw offset.
  • Default quiet hours 22:00–07:00 local for marketing (studies show very low open rates outside 8:00–21:00).
  • For "send to each user at 9:00 local" batch campaigns, you need a dedicated scheduler: split the campaign into buckets by timezone, enqueue each bucket at the right moment.

13. Delivery Callback — The Truth Is at the Provider

You call SES and get a 202 — don't assume you're done. 202 just means SES accepted. The email can bounce, trigger a complaint, arrive immediately, or end up in the promotions tab. The truth lives in the delivery callback the provider sends back to your webhook.

app.MapPost("/webhooks/ses", async (SesEvent evt, IDeliveryService svc, CancellationToken ct) =>
{
    // Verify SNS signature first
    if (!SesSignature.Verify(evt.RawPayload, evt.Signature)) return Results.Unauthorized();

    var deliveryId = evt.Tags["delivery_id"];
    var status = evt.Type switch
    {
        "Delivery" => DeliveryStatus.Delivered,
        "Bounce" => DeliveryStatus.Bounced,
        "Complaint" => DeliveryStatus.Complaint,
        "Open" => DeliveryStatus.Opened,
        "Click" => DeliveryStatus.Clicked,
        _ => DeliveryStatus.Unknown
    };
    await svc.UpdateAsync(Guid.Parse(deliveryId), status, evt.Timestamp, ct);
    return Results.Ok();
});

When the status is a hard Bounce, the cleanup worker must disable that email. Continuing to send will destroy your sender reputation — SES and SendGrid score this very quickly and you can be blocked from sending until you contact support.

14. In-App Inbox and Realtime via WebSocket

Push is used to alert in the moment. But when users open the app, they want to see the history — that's the job of the in-app inbox. It has two requirements: (1) fast lookup by user, (2) realtime update when a new message arrives.

Common architecture:

sequenceDiagram
    participant W as Worker
    participant DB as Postgres (inbox)
    participant R as Redis Pub/Sub
    participant GW as Realtime Gateway
    participant APP as Mobile/Web App
    W->>DB: INSERT inbox row
    W->>R: PUBLISH user:{id} new_msg
    R->>GW: subscription event
    GW->>APP: WebSocket/SSE push
    APP->>APP: Update badge count
    Note over APP: On user tap, call GET /inbox?limit=50
    APP->>DB: SELECT with cursor pagination
In-app inbox — persisted in DB, realtime via pub/sub

A few often-overlooked details:

  • Badge count must be computed on the server. Don't rely on the client counting, multi-device will drift.
  • Mark-as-read needs an event upstream so other devices stay in sync — opened on mobile, badge on web drops too.
  • Pagination must use cursors (WHERE created_at < :lastSeen), not OFFSET — with a large inbox, OFFSET is painfully slow (see the sketch after this list).
  • TTL: inboxes older than 90 days can be archived to cold storage or deleted per data policy.
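A sketch of the cursor-based inbox query behind the pagination bullet, assuming the inbox lives in PostgreSQL and the service talks to it through Npgsql; the table and column names are illustrative.

using Npgsql;

// Sketch: "the 50 rows older than the last one I saw", instead of OFFSET, which
// rescans every row it skips. The first page passes DateTime.UtcNow as the cursor;
// each next page passes the created_at of the last row it received.
public static async Task<List<(Guid Id, string Title, DateTime CreatedAt)>> GetInboxPageAsync(
    NpgsqlDataSource db, Guid userId, DateTime cursorUtc, CancellationToken ct)
{
    const string sql = """
        SELECT id, title, created_at
        FROM inbox
        WHERE user_id = @userId AND created_at < @cursor
        ORDER BY created_at DESC
        LIMIT 50
        """;

    await using var cmd = db.CreateCommand(sql);
    cmd.Parameters.AddWithValue("userId", userId);
    cmd.Parameters.AddWithValue("cursor", cursorUtc);

    var page = new List<(Guid, string, DateTime)>();
    await using var reader = await cmd.ExecuteReaderAsync(ct);
    while (await reader.ReadAsync(ct))
        page.Add((reader.GetGuid(0), reader.GetString(1), reader.GetDateTime(2)));
    return page;
}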

15. Observability — Metrics, Traces, and Analytics

An unobserved Notification Service will almost certainly fail silently: you still deliver 99%, but that 1% is your most important users. Measure three layers:

| Layer | Metric | Used for |
| --- | --- | --- |
| Pipeline | events_in/s, fanout_ratio, queue_lag, worker_throughput | System health monitoring, SRE alerts |
| Channel | send_rate, success_rate, bounce_rate, latency p50/p95/p99 | Provider comparison, alerts on degradation |
| Business | delivery_rate, open_rate, CTR per template/campaign | Marketing and product content optimisation |

End-to-end traces should carry attributes event.id, user.id, template.code, channel so a specific message can be followed from ingest to callback. OpenTelemetry auto-instrumentation for Kafka and HTTP clients gets you most of the way with little config; the hard part is setting attributes correctly at the fanout point — where one event becomes N spans.
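A sketch of what that looks like at the fanout point with System.Diagnostics.ActivitySource (the API the OpenTelemetry .NET SDK records from); the source name and attribute keys follow this section's conventions but are otherwise illustrative.

using System.Diagnostics;

// Sketch: one child span per delivery at fanout, carrying the attributes needed
// to follow a single message from ingest to callback.
private static readonly ActivitySource Source = new("NotificationService.Fanout");

public void TraceFanout(NotificationEvent evt, IReadOnlyList<Delivery> deliveries)
{
    foreach (var delivery in deliveries)
    {
        using var span = Source.StartActivity("notification.delivery", ActivityKind.Producer);
        span?.SetTag("event.id", evt.Id);
        span?.SetTag("user.id", delivery.UserId);
        span?.SetTag("template.code", evt.TemplateCode);
        span?.SetTag("channel", delivery.Channel);
    }
}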

SLI/SLO for a Notification Service

Example: 99.5% of OTP SMS are acknowledged by the provider within 3 seconds of ingest. Record the delta delivered_at - event_created_at, compute hourly percentiles, alert when >30% of the weekly error budget burns. That's how you turn "it seems fine" into a number you can defend to stakeholders.
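A sketch of recording that delta with System.Diagnostics.Metrics so the hourly percentiles and burn-rate alerts can be computed downstream (Prometheus, Grafana, or whatever backs your SLOs); the meter and instrument names are illustrative.

using System.Diagnostics.Metrics;

// Sketch: record delivered_at - event_created_at as a histogram, tagged by
// channel and priority so OTP SMS gets its own percentiles and SLO.
private static readonly Meter Meter = new("NotificationService");
private static readonly Histogram<double> E2eLatencySeconds =
    Meter.CreateHistogram<double>("notification.e2e.latency", unit: "s");

public void RecordDelivered(NotificationEvent evt, Delivery delivery)
{
    var seconds = (delivery.DeliveredAt - evt.CreatedAt).TotalSeconds;
    E2eLatencySeconds.Record(seconds,
        new KeyValuePair<string, object?>("channel", delivery.Channel),
        new KeyValuePair<string, object?>("priority", evt.Priority));
}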

16. Campaign Scheduler — Millions of Messages, Each User's Local Time

Marketing wants to send "Monday 9am local time" to 2 million users. Simple-sounding but users span dozens of timezones. The naïve approach — enqueue all 2 million at 00:00 UTC and have workers hold each message — isn't just memory-hungry, it can't survive a restart.

The tidy solution: time-bucketed scheduler.

flowchart LR
    CAMP["Campaign 'Weekly promo', send at 9:00 local"] --> BUCKET["Bucket by timezone"]
    BUCKET --> B1["bucket +7 (Asia/Ho_Chi_Minh)"]
    BUCKET --> B2["bucket 0 (UTC, London)"]
    BUCKET --> B3["bucket -5 (America/New_York)"]
    B1 --> C["Cron at 02:00 UTC = 09:00 VN"]
    B2 --> D["Cron at 09:00 UTC"]
    B3 --> E["Cron at 14:00 UTC"]
    C --> ENQ["Enqueue into Kafka"]
    D --> ENQ
    E --> ENQ
Time-bucketed scheduler for multi-timezone campaigns

Inside each bucket, enqueue in ~10k-user batches at a fixed rate so you don't spike the provider. If a user changes timezone, re-check at read-time and shift buckets for that user — not the whole user base, just those few.
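A sketch of the bucketing step itself: group recipients by timezone, convert "9:00 local on the campaign date" to one UTC instant per bucket, and hand each pair to the cron/enqueue stage. Converting per send date (rather than storing a fixed offset) is what keeps DST from shifting the campaign by an hour; the method shape is illustrative.

// Sketch: compute one UTC enqueue time per timezone bucket for a "9:00 local" campaign
public static Dictionary<string, DateTime> BucketEnqueueTimesUtc(
    IEnumerable<string> userTimeZones, DateOnly sendDate, TimeOnly localSendTime)
{
    var result = new Dictionary<string, DateTime>();
    foreach (var zoneId in userTimeZones.Distinct())
    {
        var tz = TimeZoneInfo.FindSystemTimeZoneById(zoneId);     // IANA id, e.g. "Asia/Ho_Chi_Minh"
        var localWallClock = sendDate.ToDateTime(localSendTime);  // 09:00 on the campaign date
        result[zoneId] = TimeZoneInfo.ConvertTimeToUtc(localWallClock, tz);
    }
    return result;
}

// Each (zoneId, utcInstant) pair becomes one scheduled enqueue job; users are
// assigned to the bucket matching their stored timezone.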

17. Security and Abuse Prevention

Notifications are an attack surface people overlook, until someone uses the internal API to flood a victim's phone number with SMS or spoof emails from your company's domain. A few mandatory measures:

  • Authenticated producers: only internal backend services (mTLS or OAuth2 service-to-service) may call the API. Never expose it directly to clients.
  • Template whitelisting: the body must be an identified template code; no free-text sends. Locking down free text is how you prevent internal phishing.
  • Per-tenant rate limits: each producer has its own quota. Stops a buggy service from collapsing the whole pipeline.
  • PII minimisation: payloads contain only keys (userId, orderId). The worker resolves personal data from the user service itself. Logs must never print email/phone in plaintext.
  • DKIM, SPF, DMARC for email; reputable sending IPs for SES/SendGrid; signed payloads (HMAC-SHA256) for outbound webhooks (see the signing sketch after this list).
  • Hard opt-out honour: when a user clicks unsubscribe, the worker must block at fanout, not merely hide the UI. Regulations such as CAN-SPAM, GDPR, and Vietnam's Decree 91/2020 all require this.
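A sketch of the outbound webhook signing mentioned in the list above: an HMAC-SHA256 over the raw body with a per-customer secret, sent alongside the request so the receiver can verify both authenticity and integrity. The header name and hex encoding are conventions, not a standard.

using System.Security.Cryptography;
using System.Text;

// Sketch: sign the exact bytes you send; the receiver recomputes the HMAC over the
// raw request body and compares, ideally with a constant-time comparison.
public static string SignWebhookPayload(string rawBody, string sharedSecret)
{
    using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(sharedSecret));
    var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(rawBody));
    return Convert.ToHexString(hash).ToLowerInvariant();  // e.g. sent as an X-Notification-Signature header
}

// Receiver side: CryptographicOperations.FixedTimeEquals over the two hash byte
// arrays avoids leaking the signature through timing differences.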

18. Realistic Capacity Planning

Reference numbers you can anchor your estimate to:

| Component | Per-node capacity | Notes |
| --- | --- | --- |
| API ingest (.NET 10 Minimal) | ~20–30k req/s | Bounded by CPU, validation, Redis I/O |
| Kafka broker | ~100 MB/s write, 3 replicas | Tune batch size, acks=all for transactional |
| Fanout worker | ~2k events/s | With fanout ratio ~3, yields ~6k deliveries/s |
| FCM push | ~600 req/s per HTTP/2 connection | Scale with many connections + 500-token batches |
| SES email | 100/s default, rampable to 10k/s | Quota is per-account, request increases early |
| Twilio SMS | 10/s per phone number | Add numbers for throughput, or use a Messaging Service |
| PostgreSQL delivery ledger | ~20k writes/s with batching | Partition by day, proactive vacuuming |

For 50 million notifications/day (~580/s average, 5k/s peak), a cluster of 3 Kafka brokers, 6–8 worker nodes, and 2 API nodes is comfortable. Don't ignore cost: 50M SMS × $0.02 = $1M/month. Shifting 80% of non-critical content off to free channels (push + in-app inbox) is a much bigger lever than tuning worker code.

19. Case Studies — How the Big Players Solve It

A few publicly documented architectures worth studying:

  • Slack: the in-app inbox is the source of truth. Push is only a teaser. They use a "fanout-on-read" pattern for large channels: don't push to 10k members simultaneously; push based on presence and active subscribers.
  • Uber: transactional (trip events) is completely separated from promotional, on a dedicated Kafka pipeline. Marketing runs in another service with a hard quota.
  • LinkedIn: their "Air Traffic Controller" balances many notification types, preventing a user from receiving multiple messages on the same topic within 24 hours. This is the canonical lesson on digest and frequency capping.
  • Pinterest: uses ML to predict when a user opens the app, sending exactly at that moment instead of spamming. Beautiful idea, but you need a large behavioural dataset before it's worth building.

20. Rollout Checklist for Your Team

Pre-launch

  • Idempotency key convention agreed across producers.
  • Template versioning and schema validation on by default.
  • DLQ with replay tooling, on-call runbook with remediation steps.
  • SLO defined per priority tier (OTP, transactional, marketing).
  • Token refresh for chat/webhook, signing-key rotation.

First 90 days in production

  • Track daily email bounce rate and SES complaint rate. Kill bad targets immediately.
  • Audit PII logs, ensure no email/phone leaks in plaintext.
  • Run game-days: simulate FCM/SES outages, verify graceful degradation.
  • Review marketing-messages-per-user-per-week — if the median exceeds 5, opt-out risk soars.

21. Conclusion

The Notification Service is one of the most underestimated backends out there. On the surface it's just "call FCM and SES", but as you go deeper, it's the sum of nearly every distributed-systems pattern: event-driven, idempotency, priority queue, retry with backoff, fanout, rate limiting, scheduling, observability, security. Building it right up front saves the team months of firefighting; building it in a rush means paying the price every Black Friday, every time a user reports "I didn't get my OTP".

Hopefully this article gives you a detailed enough map so you don't have to code and learn at the same time. The most important takeaways: separate transactional from marketing, enforce idempotency at ingest, make templates schema-backed, honour quiet hours in the user's timezone, and observe every single message. Those five principles alone separate a Notification Service that "works" from one that "can be trusted".
