Webhook Design Patterns — Building Reliable Event Notification Systems

Posted on: 4/21/2026 2:13:47 PM

1. What Are Webhooks and Why Should You Care?

A webhook is an HTTP callback mechanism — when an event occurs in System A, it sends an HTTP POST request to a URL that System B has registered in advance. Unlike polling (where B repeatedly asks A "anything new?"), webhooks are push-based: A proactively notifies B the moment an event happens.

  • ~85% of SaaS platforms support webhooks (2026)
  • <500 ms average notification latency
  • 10x fewer API calls than polling
  • 99.9% target delivery rate with retries

Major platforms like Stripe, GitHub, Shopify, and Twilio all use webhooks as the backbone of their integration ecosystem. When a payment succeeds on Stripe, a payment_intent.succeeded webhook fires to your server. When code is pushed to GitHub, a push webhook triggers your CI/CD pipeline.

Polling vs Webhook — The Classic Tradeoff

Suppose you need to know when an order is paid. With polling, your server calls the API every 5 seconds — that's 17,280 requests/day, 99% of which return nothing useful. With a webhook, you receive exactly 1 request when payment succeeds. Less load on both sides, near real-time response.

2. Webhook System Architecture Overview

A production-ready webhook system is far more than just "send an HTTP POST." It comprises multiple components working together to ensure reliability, security, and observability.

graph LR
    A[Event Source] -->|Publish| B[Event Queue]
    B --> C[Webhook Dispatcher]
    C -->|HTTP POST| D[Consumer Endpoint]
    D -->|2xx OK| E[Mark Delivered]
    D -->|4xx/5xx/Timeout| F[Retry Queue]
    F -->|Exponential Backoff| C
    F -->|Max retries exceeded| G[Dead Letter Queue]
    C --> H[Delivery Log]

    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#ff9800,stroke:#fff,color:#fff

Webhook system architecture with retry and dead letter queue

Key components:

  • Event Source: Where events originate (payment succeeded, user registered, file uploaded...)
  • Event Queue: Buffer between event source and dispatcher, ensuring no events are lost during traffic spikes
  • Webhook Dispatcher: Dequeues events, builds payloads, sends HTTP POST to consumers
  • Retry Queue: Holds failed deliveries, schedules retries with exponential backoff
  • Dead Letter Queue (DLQ): Stores events that exhausted all retry attempts — requires human intervention
  • Delivery Log: Complete send/receive history for debugging and auditing
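
The routing between "Mark Delivered", the retry queue, and the DLQ boils down to one small decision per attempt. The following is a minimal sketch of that decision, not a full dispatcher; `DeliveryPolicy`, `DeliveryOutcome`, and the `maxAttempts` parameter are hypothetical names introduced for illustration.

```csharp
using System;

// Example: a success, a retryable failure, and an exhausted timeout.
Console.WriteLine(DeliveryPolicy.Decide(200, attempt: 1, maxAttempts: 7));
Console.WriteLine(DeliveryPolicy.Decide(503, attempt: 3, maxAttempts: 7));
Console.WriteLine(DeliveryPolicy.Decide(null, attempt: 7, maxAttempts: 7));

public enum DeliveryOutcome { Delivered, Retry, DeadLetter }

public static class DeliveryPolicy
{
    // statusCode is null when the request timed out or the connection failed.
    // attempt is 1-based: the number of the attempt that just finished.
    public static DeliveryOutcome Decide(int? statusCode, int attempt, int maxAttempts)
    {
        // Any 2xx response counts as a successful delivery.
        if (statusCode is >= 200 and < 300)
            return DeliveryOutcome.Delivered;

        // Everything else (4xx, 5xx, timeout) is retried until attempts run out.
        return attempt >= maxAttempts
            ? DeliveryOutcome.DeadLetter
            : DeliveryOutcome.Retry;
    }
}
```

Keeping this as a pure function makes the policy easy to unit test separately from the HTTP client and the queues.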

3. Designing Webhook Payloads — Not Too Much, Not Too Little

There are two main schools of thought when designing payloads:

| Approach | Description | Pros | Cons |
|---|---|---|---|
| Fat payload | Send all data in the webhook | Consumer doesn't need additional API calls | Large payload, risk of stale data |
| Thin payload | Send only event type + resource ID | Small payload, data always fresh | Consumer must call API for details |
| Hybrid | Event type + ID + snapshot of most commonly used fields | Balance between convenience and performance | Must clearly document which fields are included |

Stripe uses the hybrid approach — sending an object snapshot in the webhook while recommending consumers call the API to verify:

{
  "id": "evt_1R2x3Y4Z5",
  "type": "payment_intent.succeeded",
  "created": 1713700000,
  "data": {
    "object": {
      "id": "pi_abc123",
      "amount": 5000,
      "currency": "usd",
      "status": "succeeded",
      "metadata": { "order_id": "ORD-2026-001" }
    }
  }
}

Best Practice: Always Include These Fields

id (unique event ID for idempotency), type (event type), created (timestamp), data (resource snapshot or ID). Add api_version if your API supports versioning.
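
On the consumer side, these envelope fields map naturally onto a small type. Here is a minimal sketch using System.Text.Json; the `WebhookEnvelope` record is an assumption made for illustration, and `data` is kept as a raw `JsonElement` so each event type can parse its own snapshot later.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Parse a trimmed-down version of the sample event shown earlier.
string json = "{\"id\":\"evt_1R2x3Y4Z5\",\"type\":\"payment_intent.succeeded\"," +
              "\"created\":1713700000,\"data\":{\"object\":{\"id\":\"pi_abc123\",\"amount\":5000}}}";

var evt = JsonSerializer.Deserialize<WebhookEnvelope>(json)!;
int amount = evt.Data.GetProperty("object").GetProperty("amount").GetInt32();
Console.WriteLine($"{evt.Id} {evt.Type} amount={amount} created={evt.CreatedAt:O}");

// Hypothetical envelope type: id, type, created, data, plus optional api_version.
public sealed record WebhookEnvelope(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("created")] long Created,
    [property: JsonPropertyName("data")] JsonElement Data,
    [property: JsonPropertyName("api_version")] string? ApiVersion = null)
{
    // "created" is a Unix timestamp in seconds; expose it as a DateTimeOffset.
    [JsonIgnore]
    public DateTimeOffset CreatedAt => DateTimeOffset.FromUnixTimeSeconds(Created);
}
```

Leaving `api_version` optional lets the same type accept payloads from senders that do not version their events.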

4. Idempotency — The Most Critical Problem When Receiving Webhooks

Webhooks can be delivered more than once (at-least-once delivery). Network timeouts, retries, or the consumer returning 200 but the sender not receiving the response — all lead to duplicate deliveries. Consumers must handle this idempotently.

sequenceDiagram
    participant S as Webhook Sender
    participant C as Consumer
    participant DB as Database

    S->>C: POST /webhook (event_id: evt_001)
    C->>DB: Has evt_001 been processed?
    DB-->>C: Not found
    C->>DB: INSERT processed_events(evt_001)
    C->>DB: Execute business logic
    C-->>S: 200 OK

    Note over S: Timeout — never received 200

    S->>C: POST /webhook (event_id: evt_001) [RETRY]
    C->>DB: Has evt_001 been processed?
    DB-->>C: Already processed!
    C-->>S: 200 OK (skip processing)

Idempotency flow — event_id is the key to avoiding duplicate processing

Implementing idempotency in .NET:

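using Microsoft.AspNetCore.Mvc;
using Microsoft.EntityFrameworkCore;

public class WebhookController : ControllerBase
{
    // AppDbContext is an assumed name for the application's EF Core context
    private readonly AppDbContext _db;

    public WebhookController(AppDbContext db) => _db = db;

    [HttpPost("webhook")]
    public async Task<IActionResult> HandleWebhook(
        [FromBody] WebhookPayload payload)
    {
        // Step 1: Check idempotency (fast path for retried deliveries)
        var alreadyProcessed = await _db.ProcessedEvents
            .AnyAsync(e => e.EventId == payload.Id);

        if (alreadyProcessed)
            return Ok(); // Return 200 so sender doesn't retry

        // Step 2: Record the event and run the business logic
        // in a single transaction, so both commit or neither does
        await using var transaction = await _db.Database
            .BeginTransactionAsync();

        try
        {
            _db.ProcessedEvents.Add(new ProcessedEvent
            {
                EventId = payload.Id,
                EventType = payload.Type,
                ProcessedAt = DateTime.UtcNow
            });

            // ProcessEvent holds the business logic (not shown here)
            await ProcessEvent(payload);
            await _db.SaveChangesAsync();
            await transaction.CommitAsync();

            return Ok();
        }
        catch
        {
            // Nothing is committed, so the 500 lets the sender retry safely
            await transaction.RollbackAsync();
            return StatusCode(500);
        }
    }
}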

Watch Out: Race Conditions with Concurrent Webhooks

If two requests with the same event_id arrive simultaneously, both check "not processed" → both INSERT. Solution: use a UNIQUE constraint on EventId and handle the duplicate key exception, or use a distributed lock (Redis SETNX) for more complex scenarios.
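
The essence of the fix is to replace "check, then insert" with a single atomic "claim" that only one caller can win. The sketch below shows that idea with an in-memory stand-in; `InMemoryProcessedEventStore` and `TryClaim` are hypothetical names, and in production the same role would be played by the UNIQUE constraint or the Redis lock mentioned above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var store = new InMemoryProcessedEventStore();
int handled = 0;

// Simulate the same event arriving on 8 concurrent requests.
Parallel.For(0, 8, _ =>
{
    // Only the request that wins the claim runs the business logic.
    if (store.TryClaim("evt_001"))
        Interlocked.Increment(ref handled);
});

Console.WriteLine($"handlers that ran: {handled}"); // prints "handlers that ran: 1"

// In-memory stand-in for a table with a UNIQUE constraint on the event ID.
public sealed class InMemoryProcessedEventStore
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _seen = new();

    // TryAdd is atomic: exactly one caller gets true for a given event ID.
    public bool TryClaim(string eventId) =>
        _seen.TryAdd(eventId, DateTimeOffset.UtcNow);
}
```

With a real database, the equivalent of `TryClaim` returning false is the duplicate key exception on INSERT, which the handler should treat as "already processed" and answer with 200.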

5. Retry Strategy — Exponential Backoff with Jitter

When delivery fails, you shouldn't retry immediately (thundering herd problem) or at fixed intervals (still causes spikes). The standard pattern is exponential backoff with jitter:

public class RetryPolicy
{
    private static readonly int[] BaseDelaysSeconds = { 10, 30, 60, 300, 900, 3600, 7200 };

    // attemptNumber is the 1-based retry number, matching the table below
    public static TimeSpan GetDelay(int attemptNumber)
    {
        // Retry #1 maps to index 0; anything past the last entry reuses it
        var index = Math.Clamp(attemptNumber - 1, 0, BaseDelaysSeconds.Length - 1);
        var baseDelay = BaseDelaysSeconds[index];
        // Jitter: ±25% to prevent thundering herd
        var jitter = Random.Shared.NextDouble() * 0.5 + 0.75;
        return TimeSpan.FromSeconds(baseDelay * jitter);
    }
}

| Retry # | Base delay | With jitter (range) | Purpose |
|---|---|---|---|
| 1 | 10s | 7.5s – 12.5s | Transient error (network blip) |
| 2 | 30s | 22.5s – 37.5s | Service restarting |
| 3 | 1 min | 45s – 75s | Minor outage |
| 4 | 5 min | 3.75m – 6.25m | Deployment in progress |
| 5 | 15 min | 11.25m – 18.75m | Moderate outage |
| 6 | 1 hour | 45m – 75m | Extended outage |
| 7 | 2 hours | 1.5h – 2.5h | Last attempt before DLQ |

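Total retry window: the base delays add up to about 3 hours 20 minutes (roughly 2.5 to 4.2 hours once jitter is applied). For comparison, Stripe retries live-mode events for up to 72 hours, while GitHub does not automatically retry failed deliveries and leaves redelivery to you. Adjust the schedule based on your SLA.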
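
As a quick sanity check, the cumulative wait for a schedule can be computed directly from its base delays. This is a small sketch; `RetryWindow` is a hypothetical helper, and the minimum and maximum assume every single delay lands on the extreme edge of the ±25% jitter band.

```csharp
using System;
using System.Linq;

int[] baseDelaysSeconds = { 10, 30, 60, 300, 900, 3600, 7200 };
var window = RetryWindow.Compute(baseDelaysSeconds, jitterFraction: 0.25);

// prints "nominal: 03:21:40, min: 02:31:15, max: 04:12:05"
Console.WriteLine($"nominal: {window.Nominal}, min: {window.Min}, max: {window.Max}");

public static class RetryWindow
{
    // Sums the base delays and scales the total by (1 - jitter) and (1 + jitter).
    public static (TimeSpan Min, TimeSpan Nominal, TimeSpan Max) Compute(
        int[] delaysSeconds, double jitterFraction)
    {
        double total = delaysSeconds.Sum();
        return (
            TimeSpan.FromSeconds(total * (1 - jitterFraction)),
            TimeSpan.FromSeconds(total),
            TimeSpan.FromSeconds(total * (1 + jitterFraction)));
    }
}
```

Running the same calculation against a proposed schedule is an easy way to confirm it actually covers the outage duration your SLA is meant to survive.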

6. Webhook Security — Signature Verification

Your webhook endpoint is a public URL — anyone who knows it can send fake requests. You must verify that requests come from a legitimate sender.

The most common pattern: HMAC-SHA256 signature.

graph LR
    A[Sender] -->|1. HMAC-SHA256 payload + secret| B[Signature]
    A -->|2. Send payload + signature header| C[Consumer]
    C -->|3. HMAC-SHA256 payload + shared secret| D[Expected Signature]
    C -->|4. Compare B == D?| E{Match?}
    E -->|Yes| F[Process]
    E -->|No| G[Reject 401]

    style A fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#ff9800,stroke:#fff,color:#fff

HMAC-SHA256 signature verification flow

using System.Security.Cryptography;
using System.Text;

public class WebhookSignatureValidator
{
    public static bool Validate(string payload, string? signature,
        string secret)
    {
        // A missing or empty header can never match
        if (string.IsNullOrEmpty(signature))
            return false;

        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var computedHash = hmac.ComputeHash(
            Encoding.UTF8.GetBytes(payload));
        var computedSignature = "sha256=" +
            Convert.ToHexString(computedHash).ToLowerInvariant();

        // Timing-safe comparison to prevent timing attacks
        return CryptographicOperations
            .FixedTimeEquals(
                Encoding.UTF8.GetBytes(computedSignature),
                Encoding.UTF8.GetBytes(signature));
    }
}

// In the controller:
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
    // Read the raw body: the signature covers the exact bytes that were sent
    using var reader = new StreamReader(Request.Body);
    var payload = await reader.ReadToEndAsync();
    var signature = Request.Headers["X-Webhook-Signature"]
        .FirstOrDefault();

    if (!WebhookSignatureValidator.Validate(
        payload, signature, _config["WebhookSecret"]))
        return Unauthorized();

    // Parse only after the signature has been verified
    var data = JsonSerializer.Deserialize<WebhookPayload>(payload);
    // Continue processing...
    return Ok();
}

Common Mistake: Using == to Compare Signatures

String comparison with == short-circuits at the first differing character — an attacker can measure response times to brute-force each byte (timing attack). Always use CryptographicOperations.FixedTimeEquals (.NET) or crypto.timingSafeEqual (Node.js).

7. Timestamp Validation — Preventing Replay Attacks

Signature verification alone isn't enough. An attacker can capture a valid request and replay it later. The solution: include a timestamp in the signed payload and reject old requests.

public bool IsTimestampValid(long webhookTimestamp,
    int toleranceSeconds = 300)
{
    var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
    return Math.Abs(now - webhookTimestamp) <= toleranceSeconds;
}

// Combined: sign = HMAC(timestamp + "." + payload)
// Header: X-Webhook-Signature: t=1713700000,v1=abc123...

Stripe uses exactly this pattern — with a default tolerance of 5 minutes (300 seconds). If a request is older than 5 minutes, it's rejected immediately even if the signature is valid.
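
Putting the two checks together, here is a minimal sketch of signing and verifying a `t=...,v1=...` header over `timestamp + "." + payload`. The `TimestampedSignature` class, its method names, and the example secret are assumptions made for illustration; this is not Stripe's library code.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text;

string secret = "whsec_example"; // hypothetical shared secret
string body = "{\"id\":\"evt_001\"}";
long now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();

string header = TimestampedSignature.Sign(body, secret, now);
Console.WriteLine(TimestampedSignature.Verify(body, header, secret, now));       // True: fresh request
Console.WriteLine(TimestampedSignature.Verify(body, header, secret, now + 600)); // False: replayed 10 minutes later

public static class TimestampedSignature
{
    // Builds the header value: t=<unix seconds>,v1=<hex HMAC-SHA256>.
    public static string Sign(string payload, string secret, long timestamp) =>
        $"t={timestamp},v1={ComputeHex(payload, secret, timestamp)}";

    public static bool Verify(string payload, string header, string secret,
        long nowUnixSeconds, int toleranceSeconds = 300)
    {
        long? timestamp = null;
        string? v1 = null;

        // Parse the comma-separated key=value pairs in the header.
        foreach (var part in header.Split(','))
        {
            var kv = part.Split('=', 2);
            if (kv.Length != 2) continue;
            var key = kv[0].Trim();
            if (key == "t" && long.TryParse(kv[1], out var t)) timestamp = t;
            else if (key == "v1") v1 = kv[1].Trim();
        }

        if (timestamp is null || v1 is null) return false;

        // Reject anything outside the tolerance window (replay protection).
        if (Math.Abs(nowUnixSeconds - timestamp.Value) > toleranceSeconds) return false;

        // The timestamp is part of the signed data, so it cannot be altered.
        var expected = ComputeHex(payload, secret, timestamp.Value);
        return CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(expected),
            Encoding.UTF8.GetBytes(v1));
    }

    private static string ComputeHex(string payload, string secret, long timestamp)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes($"{timestamp}.{payload}"));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because the timestamp is covered by the HMAC, an attacker cannot simply rewrite `t=` to make an old capture look fresh: the signature would no longer match.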

8. Consumer-Side Processing — Respond Fast, Process Later

The golden rule: return 200 OK within 5 seconds. If business logic is complex, don't process it in the request handler — enqueue and process asynchronously.

graph LR
    A[Webhook Request] --> B[Controller]
    B -->|Verify signature| C{Valid?}
    C -->|No| D[Return 401]
    C -->|Yes| E[Save to local queue]
    E --> F[Return 200 OK]
    E --> G[Background Worker]
    G --> H[Process business logic]
    H --> I[Update database]
    H --> J[Send notifications]

    style A fill:#e94560,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff

"Accept then Process" pattern — return 200 first, process later

// Minimal controller — verify + enqueue only
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
    var payload = await ReadAndVerifySignature();
    if (payload == null) return Unauthorized();

    // Save raw event to database/queue
    await _db.WebhookEvents.AddAsync(new WebhookEvent
    {
        EventId = payload.Id,
        EventType = payload.Type,
        RawPayload = payload.RawJson,
        Status = "pending",
        ReceivedAt = DateTime.UtcNow
    });
    await _db.SaveChangesAsync();

    return Ok(); // Respond ASAP
}

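// Background service for async processing
public class WebhookProcessorService : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;

    public WebhookProcessorService(IServiceScopeFactory scopeFactory)
        => _scopeFactory = scopeFactory;

    protected override async Task ExecuteAsync(
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // Hosted services are singletons, so resolve the scoped
            // DbContext per batch (AppDbContext is an assumed name)
            using var scope = _scopeFactory.CreateScope();
            var db = scope.ServiceProvider
                .GetRequiredService<AppDbContext>();

            var pending = await db.WebhookEvents
                .Where(e => e.Status == "pending")
                .OrderBy(e => e.ReceivedAt)
                .Take(50)
                .ToListAsync(ct);

            foreach (var evt in pending)
            {
                try
                {
                    await ProcessEvent(evt);
                    evt.Status = "processed";
                }
                catch (Exception ex)
                {
                    // Failed events are not picked up again by this loop;
                    // retry or inspect them in a separate step
                    evt.Status = "failed";
                    evt.ErrorMessage = ex.Message;
                    evt.RetryCount++;
                }
            }

            await db.SaveChangesAsync(ct);
            await Task.Delay(1000, ct);
        }
    }
}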

9. Building a Webhook Sender

If you're building a platform that needs to provide webhooks to customers, here are the essential components:

9.1 Subscription Management

CREATE TABLE webhook_subscriptions (
    id BIGINT IDENTITY PRIMARY KEY,
    tenant_id BIGINT NOT NULL,
    url NVARCHAR(2048) NOT NULL,
    secret NVARCHAR(256) NOT NULL,
    events NVARCHAR(MAX) NOT NULL,  -- ["order.created","payment.succeeded"]
    is_active BIT DEFAULT 1,
    created_at DATETIME2 DEFAULT GETUTCDATE(),
    INDEX ix_tenant_active (tenant_id, is_active)
);

CREATE TABLE webhook_deliveries (
    id BIGINT IDENTITY PRIMARY KEY,
    subscription_id BIGINT FOREIGN KEY REFERENCES webhook_subscriptions(id),
    event_id NVARCHAR(128) NOT NULL,
    event_type NVARCHAR(128) NOT NULL,
    payload NVARCHAR(MAX),
    status NVARCHAR(20) DEFAULT 'pending',
    attempt_count INT DEFAULT 0,
    next_retry_at DATETIME2,
    last_response_code INT,
    last_response_body NVARCHAR(MAX),
    created_at DATETIME2 DEFAULT GETUTCDATE(),
    INDEX ix_status_retry (status, next_retry_at)
);

9.2 Circuit Breaker per Subscription

When a consumer's endpoint continuously fails, you shouldn't retry forever — apply the circuit breaker pattern to temporarily pause delivery and notify the consumer.

| State | Condition | Behavior |
|---|---|---|
| Closed (normal) | Failure rate < 50% in last 10 minutes | Send webhooks normally |
| Open (paused) | 5 consecutive delivery failures | Skip delivery, queue events, send alert email to consumer |
| Half-Open (probing) | After 30 minutes in Open state | Try sending 1 event: if OK → Closed, if fail → Open again |
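
The state transitions in the table can be sketched as a small state machine. This is a simplified illustration under stated assumptions: `SubscriptionCircuitBreaker` is a hypothetical class, it trips only on consecutive failures (it does not track the 10-minute failure rate), it is not thread-safe, and it does not limit Half-Open to exactly one in-flight probe.

```csharp
using System;

var breaker = new SubscriptionCircuitBreaker();
var t0 = DateTimeOffset.UtcNow;

for (int i = 0; i < 5; i++) breaker.RecordFailure(t0);  // 5 consecutive failures
Console.WriteLine(breaker.CanSend(t0));                 // False: circuit is Open
Console.WriteLine(breaker.CanSend(t0.AddMinutes(31)));  // True: Half-Open probe allowed
breaker.RecordSuccess();
Console.WriteLine(breaker.State);                       // Closed

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SubscriptionCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SubscriptionCircuitBreaker(int failureThreshold = 5, TimeSpan? openDuration = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration ?? TimeSpan.FromMinutes(30);
    }

    // Returns true when the dispatcher may attempt a delivery right now.
    public bool CanSend(DateTimeOffset now)
    {
        // After the pause has elapsed, allow a probe delivery.
        if (State == CircuitState.Open && now - _openedAt >= _openDuration)
            State = CircuitState.HalfOpen;
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        // A failed probe, or too many failures in a row, (re)opens the circuit.
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = now;
        }
    }
}
```

Passing the clock in as a parameter keeps the breaker deterministic in tests; a real dispatcher would also persist this state per subscription so it survives restarts.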

10. Monitoring & Observability

A webhook system without monitoring is like driving at night without headlights. Critical metrics to track:

  • P99 Latency: time from event to delivery (target: <2s)
  • Success Rate: % first-attempt delivery success (target: >95%)
  • DLQ Size: events in dead letter queue (target: near 0)
  • Active Circuits: subscriptions currently circuit-broken

// Using .NET Metrics API
using System.Diagnostics.Metrics;

public class WebhookMetrics
{
    private static readonly Meter Meter = new("Webhook.Delivery");

    // Total delivery attempts, including retries
    public static readonly Counter<long> DeliveryAttempts =
        Meter.CreateCounter<long>("webhook.delivery.attempts");

    // Deliveries acknowledged with a 2xx response
    public static readonly Counter<long> DeliverySuccesses =
        Meter.CreateCounter<long>("webhook.delivery.successes");

    // Per-delivery duration in milliseconds (source for P99 latency)
    public static readonly Histogram<double> DeliveryDuration =
        Meter.CreateHistogram<double>("webhook.delivery.duration_ms");

    // Current number of events in the dead letter queue
    public static readonly UpDownCounter<long> DlqSize =
        Meter.CreateUpDownCounter<long>("webhook.dlq.size");
}

11. Build vs Buy — Self-Built vs Managed Services

You don't always need to build webhook infrastructure from scratch. Here's a comparison to help you decide:

| Criteria | Self-built | Managed (Svix, Hookdeck) | Cloud-native (Azure Event Grid, AWS SNS) |
|---|---|---|---|
| Initial cost | High (2-4 weeks dev) | Low (integrate in <1 day) | Low (pay-per-use) |
| Customization | Full control | Limited to vendor API | Moderate |
| Retry & DLQ | Must implement yourself | Built-in | Built-in |
| Monitoring | Build your own dashboard | Dashboard + alerts included | Integrated with CloudWatch/Monitor |
| Scale | Depends on your infra | Auto-scale | Auto-scale, global |
| Best for | Large teams, specific requirements | Startups, fast MVPs | Already on AWS/Azure |

12. Production-Ready Webhook Checklist

Webhook Sender Checklist

✓ HMAC-SHA256 signature for every delivery
✓ Timestamp in signed payload (anti-replay)
✓ Exponential backoff with jitter for retries
✓ Circuit breaker per subscription
✓ Dead letter queue + alerting
✓ Queryable delivery log (at least 30 days)
✓ Rate limiting per subscription (avoid overwhelming consumer)
✓ API for consumers to view delivery history and manually retry
✓ Webhook testing endpoint (echo server)

Webhook Consumer Checklist

✓ Verify signature BEFORE parsing payload
✓ Validate timestamp (reject requests older than 5 minutes)
✓ Idempotent processing based on event_id
✓ Respond 200 within 5 seconds, process async
✓ HTTPS endpoint mandatory
✓ Handle out-of-order delivery (event B arrives before A)
✓ Log every received webhook for debugging
✓ Alert when processing failure rate increases

Webhooks seem simple on the surface, but getting them right is far from easy. From idempotency and signature verification to retry strategies and circuit breakers — each layer has its own pitfalls. This guide gives you a complete blueprint for implementing production-grade webhooks in your systems.

References