Webhook Design Patterns — Building Reliable Event Notification Systems

Posted on: 4/21/2026 2:13:47 PM

Table of contents

1. What Are Webhooks and Why Should You Care?
   1. Polling vs Webhook — The Classic Tradeoff
2. Webhook System Architecture Overview
3. Designing Webhook Payloads — Not Too Much, Not Too Little
   1. Best Practice: Always Include These Fields
4. Idempotency — The Most Critical Problem When Receiving Webhooks
   1. Watch Out: Race Conditions with Concurrent Webhooks
5. Retry Strategy — Exponential Backoff with Jitter
6. Webhook Security — Signature Verification
   1. Common Mistake: Using == to Compare Signatures
7. Timestamp Validation — Preventing Replay Attacks
8. Consumer-Side Processing — Respond Fast, Process Later
9. Building a Webhook Sender
   9.1 Subscription Management
   9.2 Circuit Breaker per Subscription
10. Monitoring & Observability
11. Build vs Buy — Self-Built vs Managed Services
12. Production-Ready Webhook Checklist
   1. Webhook Sender Checklist
   2. Webhook Consumer Checklist
References

1. What Are Webhooks and Why Should You Care?

A webhook is an HTTP callback mechanism — when an event occurs in System A, it sends an HTTP POST request to a URL that System B has registered in advance. Unlike polling (where B repeatedly asks A "anything new?"), webhooks are push-based: A proactively notifies B the moment an event happens.

~85%: SaaS platforms that support webhooks (2026)
<500ms: Average notification latency
10x: Fewer API calls vs polling
99.9%: Target delivery rate with retries

Major platforms like Stripe, GitHub, Shopify, and Twilio all use webhooks as the backbone of their integration ecosystem. When a payment succeeds on Stripe, a payment_intent.succeeded webhook fires to your server. When code is pushed to GitHub, a push webhook triggers your CI/CD pipeline.

Polling vs Webhook — The Classic Tradeoff

Suppose you need to know when an order is paid. With polling, your server calls the API every 5 seconds — that's 17,280 requests/day, 99% of which return nothing useful. With a webhook, you receive exactly 1 request when payment succeeds. Less load on both sides, near real-time response.
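
The polling figure is simple arithmetic, and a few lines of C# make it easy to recompute for other intervals. This is only a sketch; the class and method names are illustrative and not part of any library.

```csharp
using System;

public static class PollingCost
{
    // Number of polling requests issued per day for a given interval in seconds.
    public static int RequestsPerDay(int intervalSeconds)
    {
        if (intervalSeconds <= 0)
            throw new ArgumentOutOfRangeException(nameof(intervalSeconds));

        const int secondsPerDay = 24 * 60 * 60; // 86,400
        return secondsPerDay / intervalSeconds;
    }
}
```

Calling `PollingCost.RequestsPerDay(5)` reproduces the 17,280 requests/day quoted for a 5-second interval.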

2. Webhook System Architecture Overview

A production-ready webhook system is far more than just "send an HTTP POST." It comprises multiple components working together to ensure reliability, security, and observability.

graph LR
    A[Event Source] -->|Publish| B[Event Queue]
    B --> C[Webhook Dispatcher]
    C -->|HTTP POST| D[Consumer Endpoint]
    D -->|2xx OK| E[Mark Delivered]
    D -->|4xx/5xx/Timeout| F[Retry Queue]
    F -->|Exponential Backoff| C
    F -->|Max retries exceeded| G[Dead Letter Queue]
    C --> H[Delivery Log]

    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#ff9800,stroke:#fff,color:#fff

Webhook system architecture with retry and dead letter queue

Key components:

Event Source: Where events originate (payment succeeded, user registered, file uploaded...)
Event Queue: Buffer between event source and dispatcher, ensuring no events are lost during traffic spikes
Webhook Dispatcher: Dequeues events, builds payloads, sends HTTP POST to consumers
Retry Queue: Holds failed deliveries, schedules retries with exponential backoff
Dead Letter Queue (DLQ): Stores events that exhausted all retry attempts — requires human intervention
Delivery Log: Complete send/receive history for debugging and auditing
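
The routing decision in the diagram (delivered, retry, or dead letter) can be sketched as a small pure function. The names `DeliveryOutcome`, `DeliveryRouter`, and the `maxAttempts` parameter are assumptions made for illustration. Like the diagram, this sketch treats every non-2xx response as retryable; a real dispatcher may prefer to stop early on some 4xx codes.

```csharp
using System;

public enum DeliveryOutcome { Delivered, Retry, DeadLetter }

public static class DeliveryRouter
{
    // statusCode is null when the request timed out or the connection failed.
    // attempt is 1-based and counts the attempt that just finished.
    public static DeliveryOutcome Route(int? statusCode, int attempt, int maxAttempts)
    {
        // Any 2xx response marks the delivery as done.
        if (statusCode is >= 200 and < 300)
            return DeliveryOutcome.Delivered;

        // Everything else is retried until the attempt budget is used up.
        return attempt >= maxAttempts
            ? DeliveryOutcome.DeadLetter
            : DeliveryOutcome.Retry;
    }
}
```

Keeping this decision free of I/O makes it easy to unit-test separately from the HTTP client and the queues.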

3. Designing Webhook Payloads — Not Too Much, Not Too Little

There are three common approaches to designing payloads:

Approach	Description	Pros	Cons
Fat payload	Send all data in the webhook	Consumer doesn't need additional API calls	Large payload, risk of stale data
Thin payload	Send only event type + resource ID	Small payload, data always fresh	Consumer must call API for details
Hybrid ⭐	Event type + ID + snapshot of most commonly used fields	Balance between convenience and performance	Must clearly document which fields are included

Stripe uses the hybrid approach — sending an object snapshot in the webhook while recommending consumers call the API to verify:

{
  "id": "evt_1R2x3Y4Z5",
  "type": "payment_intent.succeeded",
  "created": 1713700000,
  "data": {
    "object": {
      "id": "pi_abc123",
      "amount": 5000,
      "currency": "usd",
      "status": "succeeded",
      "metadata": { "order_id": "ORD-2026-001" }
    }
  }
}

Best Practice: Always Include These Fields

id (unique event ID for idempotency), type (event type), created (timestamp), data (resource snapshot or ID). Add api_version if your API supports versioning.
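
One way to model this envelope on the consumer side is a small record deserialized with `System.Text.Json`. The type and parser names below (`WebhookEnvelope`, `WebhookEnvelopeParser`) are hypothetical, and the validation rule (require `id` and `type`) is an assumption rather than something the payload format itself mandates.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical envelope type; property names mirror the JSON fields listed above.
public sealed record WebhookEnvelope(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("created")] long Created,
    [property: JsonPropertyName("data")] JsonElement Data,
    [property: JsonPropertyName("api_version")] string? ApiVersion = null);

public static class WebhookEnvelopeParser
{
    // Parses the raw body and rejects envelopes without an event ID or type.
    public static WebhookEnvelope Parse(string json)
    {
        var envelope = JsonSerializer.Deserialize<WebhookEnvelope>(json)
            ?? throw new JsonException("Payload was JSON null.");

        if (string.IsNullOrEmpty(envelope.Id) || string.IsNullOrEmpty(envelope.Type))
            throw new JsonException("Missing required field 'id' or 'type'.");

        return envelope;
    }
}
```

Keeping `data` as a `JsonElement` lets the handler pick a concrete type per event `type` later, instead of committing to one shape up front.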

4. Idempotency — The Most Critical Problem When Receiving Webhooks

Webhooks can be delivered more than once (at-least-once delivery). Network timeouts, retries, or the consumer returning 200 but the sender not receiving the response — all lead to duplicate deliveries. Consumers must handle this idempotently.

sequenceDiagram
    participant S as Webhook Sender
    participant C as Consumer
    participant DB as Database

    S->>C: POST /webhook (event_id: evt_001)
    C->>DB: Has evt_001 been processed?
    DB-->>C: Not found
    C->>DB: INSERT processed_events(evt_001)
    C->>DB: Execute business logic
    C-->>S: 200 OK

    Note over S: Timeout — never received 200

    S->>C: POST /webhook (event_id: evt_001) [RETRY]
    C->>DB: Has evt_001 been processed?
    DB-->>C: Already processed!
    C-->>S: 200 OK (skip processing)

Idempotency flow — event_id is the key to avoiding duplicate processing

Implementing idempotency in .NET:

public class WebhookController : ControllerBase
{
    [HttpPost("webhook")]
    public async Task<IActionResult> HandleWebhook(
        [FromBody] WebhookPayload payload)
    {
        // Step 1: Check idempotency
        var alreadyProcessed = await _db.ProcessedEvents
            .AnyAsync(e => e.EventId == payload.Id);

        if (alreadyProcessed)
            return Ok(); // Return 200 so sender doesn't retry

        // Step 2: Process within a transaction
        await using var transaction = await _db.Database
            .BeginTransactionAsync();

        try
        {
            _db.ProcessedEvents.Add(new ProcessedEvent
            {
                EventId = payload.Id,
                EventType = payload.Type,
                ProcessedAt = DateTime.UtcNow
            });

            await ProcessEvent(payload);
            await _db.SaveChangesAsync();
            await transaction.CommitAsync();

            return Ok();
        }
        catch
        {
            await transaction.RollbackAsync();
            return StatusCode(500);
        }
    }
}

Watch Out: Race Conditions with Concurrent Webhooks

If two requests with the same event_id arrive simultaneously, both check "not processed" → both INSERT. Solution: use a UNIQUE constraint on EventId and handle the duplicate key exception, or use a distributed lock (Redis SETNX) for more complex scenarios.
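
The "claim first, then process" idea behind the UNIQUE constraint can be sketched without a database. In the example below, `ConcurrentDictionary.TryAdd` stands in for an INSERT that succeeds once and fails with a duplicate key error afterwards. `ProcessedEventStore` and `IdempotentHandler` are illustrative names, not types from the article's code base.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// In-memory stand-in for a table with a UNIQUE constraint on the event ID.
public sealed class ProcessedEventStore
{
    private readonly ConcurrentDictionary<string, DateTime> _claimed = new();

    // Returns true only for the first caller that claims the event ID.
    public bool TryClaim(string eventId) => _claimed.TryAdd(eventId, DateTime.UtcNow);
}

public static class IdempotentHandler
{
    // Runs the handler at most once per event ID, even when deliveries race.
    // Returns false for duplicates; the endpoint should still answer 200 for them.
    public static async Task<bool> HandleOnceAsync(
        ProcessedEventStore store, string eventId, Func<Task> handler)
    {
        if (!store.TryClaim(eventId))
            return false;

        await handler();
        return true;
    }
}
```

With a real database, the claim row and the business writes would normally commit in the same transaction, so a failed handler rolls the claim back and a later retry can process the event.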

5. Retry Strategy — Exponential Backoff with Jitter

When delivery fails, you shouldn't retry immediately (thundering herd problem) or at fixed intervals (still causes spikes). The standard pattern is exponential backoff with jitter:

public class RetryPolicy
{
    private static readonly int[] BaseDelaysSeconds = { 10, 30, 60, 300, 900, 3600, 7200 };

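    // attemptNumber is 1-based: retry #1 uses the first entry (10s),
    // matching the table below. Out-of-range values are clamped.
    public static TimeSpan GetDelay(int attemptNumber)
    {
        var index = Math.Clamp(attemptNumber - 1, 0, BaseDelaysSeconds.Length - 1);
        var baseDelay = BaseDelaysSeconds[index];
        // Jitter: random factor in [0.75, 1.25), i.e. ±25%, to prevent thundering herd
        var jitter = Random.Shared.NextDouble() * 0.5 + 0.75;
        return TimeSpan.FromSeconds(baseDelay * jitter);
    }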
}

Retry #	Base Delay	With Jitter (range)	Purpose
1	10s	7.5s – 12.5s	Transient error (network blip)
2	30s	22.5s – 37.5s	Service restarting
3	1 min	45s – 75s	Minor outage
4	5 min	3.75m – 6.25m	Deployment in progress
5	15 min	11.25m – 18.75m	Moderate outage
6	1 hour	45m – 75m	Extended outage
7	2 hours	1.5h – 2.5h	Last attempt before DLQ

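Total retry window: approximately 3.4 hours (the base delays sum to 12,100 seconds, or roughly 2.5 to 4.2 hours once jitter is applied). Stripe retries for up to 3 days in live mode. GitHub does not retry failed deliveries automatically; they have to be redelivered manually or through its API. Adjust the schedule based on your SLA.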
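
To check a schedule against an SLA, the total window can be computed directly from the delay table rather than estimated. The helper below is a sketch: `RetryWindow` is a made-up name, and it assumes the same seven base delays and the same symmetric ±25% jitter as the policy above.

```csharp
using System;
using System.Linq;

public static class RetryWindow
{
    // Same schedule as BaseDelaysSeconds in the retry policy above.
    private static readonly int[] BaseDelaysSeconds = { 10, 30, 60, 300, 900, 3600, 7200 };

    // Total waiting time across all retries: shortest, nominal, and longest
    // case for a symmetric jitter fraction (0.25 means ±25%).
    public static (TimeSpan Min, TimeSpan Nominal, TimeSpan Max) Total(double jitterFraction = 0.25)
    {
        double nominal = BaseDelaysSeconds.Sum();
        return (
            TimeSpan.FromSeconds(nominal * (1 - jitterFraction)),
            TimeSpan.FromSeconds(nominal),
            TimeSpan.FromSeconds(nominal * (1 + jitterFraction)));
    }
}
```

Extending the array (for example with 6-hour and 12-hour entries) and re-running `Total()` shows quickly how many attempts are needed to reach a multi-day window.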

6. Webhook Security — Signature Verification

Your webhook endpoint is a public URL — anyone who knows it can send fake requests. You must verify that requests come from a legitimate sender.

The most common pattern: HMAC-SHA256 signature.

graph LR
    A[Sender] -->|1. HMAC-SHA256 payload + secret| B[Signature]
    A -->|2. Send payload + signature header| C[Consumer]
    C -->|3. HMAC-SHA256 payload + shared secret| D[Expected Signature]
    C -->|4. Compare B == D?| E{Match?}
    E -->|Yes| F[Process]
    E -->|No| G[Reject 401]

    style A fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#ff9800,stroke:#fff,color:#fff

HMAC-SHA256 signature verification flow

using System.Security.Cryptography;
using System.Text;

public class WebhookSignatureValidator
{
    public static bool Validate(string payload, string signature,
        string secret)
    {
        // A missing signature header must fail validation, not throw
        if (string.IsNullOrEmpty(signature))
            return false;

        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var computedHash = hmac.ComputeHash(
            Encoding.UTF8.GetBytes(payload));
        var computedSignature = "sha256=" +
            Convert.ToHexString(computedHash).ToLowerInvariant();

        // Timing-safe comparison to prevent timing attacks
        // (returns false when the two lengths differ)
        return CryptographicOperations
            .FixedTimeEquals(
                Encoding.UTF8.GetBytes(computedSignature),
                Encoding.UTF8.GetBytes(signature));
    }
}

// In the controller:
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
    var payload = await new StreamReader(Request.Body)
        .ReadToEndAsync();
    var signature = Request.Headers["X-Webhook-Signature"]
        .FirstOrDefault();

    if (!WebhookSignatureValidator.Validate(
        payload, signature, _config["WebhookSecret"]))
        return Unauthorized();

    var data = JsonSerializer.Deserialize<WebhookPayload>(payload);
    // Continue processing...
}

Common Mistake: Using == to Compare Signatures

String comparison with == short-circuits at the first differing character — an attacker can measure response times to brute-force each byte (timing attack). Always use CryptographicOperations.FixedTimeEquals (.NET) or crypto.timingSafeEqual (Node.js).
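
For completeness, here is a sketch of both halves of the scheme: the sender computing the header value and the consumer verifying it in constant time. `WebhookSigner` is an illustrative name, and the `sha256=` prefix with lowercase hex follows the validator shown earlier; other providers use different header formats.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text;

public static class WebhookSigner
{
    // Sender side: "sha256=" + lowercase hex HMAC-SHA256 of the raw body.
    public static string Sign(string payload, string secret)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(payload));
        return "sha256=" + Convert.ToHexString(hash).ToLowerInvariant();
    }

    // Consumer side: recompute the signature and compare without short-circuiting.
    public static bool Verify(string payload, string? signatureHeader, string secret)
    {
        if (string.IsNullOrEmpty(signatureHeader))
            return false;

        byte[] expected = Encoding.UTF8.GetBytes(Sign(payload, secret));
        byte[] actual = Encoding.UTF8.GetBytes(signatureHeader);

        // FixedTimeEquals returns false for different lengths and otherwise
        // takes the same time wherever the first mismatch occurs.
        return CryptographicOperations.FixedTimeEquals(expected, actual);
    }
}
```

The payload passed to `Verify` should be the raw request body exactly as received; re-serializing parsed JSON can change whitespace or key order and break the signature.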

7. Timestamp Validation — Preventing Replay Attacks

Signature verification alone isn't enough. An attacker can capture a valid request and replay it later. The solution: include a timestamp in the signed payload and reject old requests.

public bool IsTimestampValid(long webhookTimestamp,
    int toleranceSeconds = 300)
{
    var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
    return Math.Abs(now - webhookTimestamp) <= toleranceSeconds;
}

// Combined: sign = HMAC(timestamp + "." + payload)
// Header: X-Webhook-Signature: t=1713700000,v1=abc123...

Stripe uses exactly this pattern — with a default tolerance of 5 minutes (300 seconds). If a request is older than 5 minutes, it's rejected immediately even if the signature is valid.
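
Putting the two checks together, a consumer might parse the `t=...,v1=...` header, reject stale timestamps, and then verify the HMAC over `timestamp + "." + payload`. The sketch below assumes that header layout and a hex-encoded HMAC-SHA256; `TimestampedSignature` and its methods are hypothetical names. The current time is passed in as a parameter so the logic can be tested deterministically.

```csharp
#nullable enable
using System;
using System.Security.Cryptography;
using System.Text;

public static class TimestampedSignature
{
    // Sender side: builds "t=<unix seconds>,v1=<hex HMAC of '<t>.<payload>'>".
    public static string BuildHeader(string payload, string secret, long timestamp)
        => $"t={timestamp},v1={ComputeHex(payload, secret, timestamp)}";

    // Consumer side: checks header shape, timestamp freshness, then the signature.
    public static bool Verify(string payload, string? header, string secret,
        long nowUnixSeconds, int toleranceSeconds = 300)
    {
        if (string.IsNullOrEmpty(header)) return false;

        long? timestamp = null;
        string? v1 = null;
        foreach (var part in header.Split(','))
        {
            var kv = part.Split('=', 2);
            if (kv.Length != 2) continue;
            var key = kv[0].Trim();
            if (key == "t" && long.TryParse(kv[1], out var t)) timestamp = t;
            else if (key == "v1") v1 = kv[1].Trim();
        }
        if (timestamp is null || v1 is null) return false;

        // Reject requests outside the tolerance window (replay protection).
        if (Math.Abs(nowUnixSeconds - timestamp.Value) > toleranceSeconds) return false;

        var expected = ComputeHex(payload, secret, timestamp.Value);
        return CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(expected), Encoding.UTF8.GetBytes(v1));
    }

    private static string ComputeHex(string payload, string secret, long timestamp)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        byte[] hash = hmac.ComputeHash(Encoding.UTF8.GetBytes($"{timestamp}.{payload}"));
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

Because the timestamp is part of the signed string, an attacker cannot simply rewrite `t` to make an old capture look fresh without invalidating `v1`.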

8. Consumer-Side Processing — Respond Fast, Process Later

The golden rule: return 200 OK within 5 seconds. If business logic is complex, don't process it in the request handler — enqueue and process asynchronously.

graph LR
    A[Webhook Request] --> B[Controller]
    B -->|Verify signature| C{Valid?}
    C -->|No| D[Return 401]
    C -->|Yes| E[Save to local queue]
    E --> F[Return 200 OK]
    E --> G[Background Worker]
    G --> H[Process business logic]
    H --> I[Update database]
    H --> J[Send notifications]

    style A fill:#e94560,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff

"Accept then Process" pattern — return 200 first, process later

// Minimal controller — verify + enqueue only
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
    var payload = await ReadAndVerifySignature();
    if (payload == null) return Unauthorized();

    // Save raw event to database/queue
    await _db.WebhookEvents.AddAsync(new WebhookEvent
    {
        EventId = payload.Id,
        EventType = payload.Type,
        RawPayload = payload.RawJson,
        Status = "pending",
        ReceivedAt = DateTime.UtcNow
    });
    await _db.SaveChangesAsync();

    return Ok(); // Respond ASAP
}

// Background service for async processing
public class WebhookProcessorService : BackgroundService
{
    protected override async Task ExecuteAsync(
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var pending = await _db.WebhookEvents
                .Where(e => e.Status == "pending")
                .OrderBy(e => e.ReceivedAt)
                .Take(50)
                .ToListAsync(ct);

            foreach (var evt in pending)
            {
                try
                {
                    await ProcessEvent(evt);
                    evt.Status = "processed";
                }
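                catch (Exception ex)
                {
                    evt.ErrorMessage = ex.Message;
                    evt.RetryCount++;
                    // Keep the event "pending" so the next poll retries it;
                    // mark it "failed" only after 5 attempts (an illustrative cap)
                    evt.Status = evt.RetryCount >= 5 ? "failed" : "pending";
                }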
            }

            await _db.SaveChangesAsync(ct);
            await Task.Delay(1000, ct);
        }
    }
}

9. Building a Webhook Sender

If you're building a platform that needs to provide webhooks to customers, here are the essential components:

9.1 Subscription Management

CREATE TABLE webhook_subscriptions (
    id BIGINT IDENTITY PRIMARY KEY,
    tenant_id BIGINT NOT NULL,
    url NVARCHAR(2048) NOT NULL,
    secret NVARCHAR(256) NOT NULL,
    events NVARCHAR(MAX) NOT NULL,  -- ["order.created","payment.succeeded"]
    is_active BIT DEFAULT 1,
    created_at DATETIME2 DEFAULT GETUTCDATE(),
    INDEX ix_tenant_active (tenant_id, is_active)
);

CREATE TABLE webhook_deliveries (
    id BIGINT IDENTITY PRIMARY KEY,
    subscription_id BIGINT FOREIGN KEY REFERENCES webhook_subscriptions(id),
    event_id NVARCHAR(128) NOT NULL,
    event_type NVARCHAR(128) NOT NULL,
    payload NVARCHAR(MAX),
    status NVARCHAR(20) DEFAULT 'pending',
    attempt_count INT DEFAULT 0,
    next_retry_at DATETIME2,
    last_response_code INT,
    last_response_body NVARCHAR(MAX),
    created_at DATETIME2 DEFAULT GETUTCDATE(),
    INDEX ix_status_retry (status, next_retry_at)
);
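
Since the `events` column stores a JSON array of event types, the dispatcher needs a way to decide whether a given subscription should receive an event. A minimal sketch, assuming exact, case-sensitive matching with no wildcard support (`SubscriptionMatcher` is an illustrative name):

```csharp
using System;
using System.Linq;
using System.Text.Json;

public static class SubscriptionMatcher
{
    // eventsJson is the raw value of the events column,
    // e.g. ["order.created","payment.succeeded"].
    public static bool IsSubscribed(string eventsJson, string eventType, bool isActive = true)
    {
        // Inactive subscriptions never receive deliveries.
        if (!isActive) return false;

        var events = JsonSerializer.Deserialize<string[]>(eventsJson)
            ?? Array.Empty<string>();
        return events.Contains(eventType, StringComparer.Ordinal);
    }
}
```

At larger scale it is usually cheaper to filter in the database or to cache the parsed list per subscription than to deserialize the column for every event.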

9.2 Circuit Breaker per Subscription

When a consumer's endpoint continuously fails, you shouldn't retry forever — apply the circuit breaker pattern to temporarily pause delivery and notify the consumer.

State	Condition	Behavior
Closed (normal)	Failure rate < 50% in last 10 minutes	Send webhooks normally
Open (paused)	5 consecutive delivery failures	Skip delivery, queue events, send alert email to consumer
Half-Open (probing)	After 30 minutes in Open state	Try sending 1 event: if OK → Closed, if fail → Open again
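
The three states in the table can be sketched as a small state machine kept per subscription. This version implements only the "5 consecutive failures" trigger and the 30-minute cool-down; the failure-rate condition is left out. The class name and its members are assumptions for illustration, and the sketch is not thread-safe.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SubscriptionCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SubscriptionCircuitBreaker(int failureThreshold = 5, TimeSpan? openDuration = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration ?? TimeSpan.FromMinutes(30);
    }

    // Returns true if a delivery may be attempted at the given time.
    // Once the cool-down has elapsed, the circuit moves to HalfOpen and
    // deliveries are allowed again until a result is recorded.
    public bool AllowDelivery(DateTimeOffset now)
    {
        if (State == CircuitState.Open && now - _openedAt >= _openDuration)
            State = CircuitState.HalfOpen;
        return State != CircuitState.Open;
    }

    // Any success closes the circuit and resets the failure streak.
    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;
    }

    // A failed probe, or reaching the threshold, (re)opens the circuit.
    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = now;
        }
    }
}
```

The dispatcher would call `AllowDelivery` before each send and `RecordSuccess` or `RecordFailure` afterwards; events skipped while the circuit is open stay queued for later delivery.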

10. Monitoring & Observability

A webhook system without monitoring is like driving at night without headlights. Critical metrics to track:

P99 Latency: Time from event to delivery (target: <2s)
Success Rate: % first-attempt delivery success (target: >95%)
DLQ Size: Events in dead letter queue (target: near 0)
Active Circuits: Subscriptions currently circuit-broken

// Using the .NET Metrics API (System.Diagnostics.Metrics)
using System.Diagnostics.Metrics;

public class WebhookMetrics
{
    private static readonly Meter Meter = new("Webhook.Delivery");

    public static readonly Counter<long> DeliveryAttempts =
        Meter.CreateCounter<long>("webhook.delivery.attempts");

    public static readonly Counter<long> DeliverySuccesses =
        Meter.CreateCounter<long>("webhook.delivery.successes");

    public static readonly Histogram<double> DeliveryDuration =
        Meter.CreateHistogram<double>("webhook.delivery.duration_ms");

    public static readonly UpDownCounter<long> DlqSize =
        Meter.CreateUpDownCounter<long>("webhook.dlq.size");
}

11. Build vs Buy — Self-Built vs Managed Services

You don't always need to build webhook infrastructure from scratch. Here's a comparison to help you decide:

Criteria	Self-built	Managed (Svix, Hookdeck)	Cloud-native (Azure Event Grid, AWS SNS)
Initial cost	High (2-4 weeks dev)	Low (integrate in <1 day)	Low (pay-per-use)
Customization	Full control	Limited to vendor API	Moderate
Retry & DLQ	Must implement yourself	Built-in	Built-in
Monitoring	Build your own dashboard	Dashboard + alerts included	Integrated with CloudWatch / Azure Monitor
Scale	Depends on your infra	Auto-scale	Auto-scale, global
Best for	Large teams, specific requirements	Startups, fast MVPs	Already on AWS/Azure

12. Production-Ready Webhook Checklist

Webhook Sender Checklist

✓ HMAC-SHA256 signature for every delivery
✓ Timestamp in signed payload (anti-replay)
✓ Exponential backoff with jitter for retries
✓ Circuit breaker per subscription
✓ Dead letter queue + alerting
✓ Queryable delivery log (at least 30 days)
✓ Rate limiting per subscription (avoid overwhelming consumer)
✓ API for consumers to view delivery history and manually retry
✓ Webhook testing endpoint (echo server)

Webhook Consumer Checklist

✓ Verify signature BEFORE parsing payload
✓ Validate timestamp (reject requests older than 5 minutes)
✓ Idempotent processing based on event_id
✓ Respond 200 within 5 seconds, process async
✓ HTTPS endpoint mandatory
✓ Handle out-of-order delivery (event B arrives before A)
✓ Log every received webhook for debugging
✓ Alert when processing failure rate increases

Webhooks seem simple on the surface, but getting them right is far from easy. From idempotency and signature verification to retry strategies and circuit breakers — each layer has its own pitfalls. This guide gives you a complete blueprint for implementing production-grade webhooks in your systems.

References

#system design #API #Webhook #Security #.NET

# Webhook Design Patterns — Building Reliable Event Notification Systems

## 1. What Are Webhooks and Why Should You Care?

A webhook is an **HTTP callback** mechanism — when an event occurs in System A, it sends an HTTP POST request to a URL that System B has registered in advance. Unlike polling (where B repeatedly asks A "anything new?"), webhooks are **push-based**: A proactively notifies B the moment an event happens.

~85% SaaS platforms support webhooks (2026)

<500ms Average notification latency

10x Fewer API calls vs polling

99.9% Target delivery rate with retries

Major platforms like Stripe, GitHub, Shopify, and Twilio all use webhooks as the backbone of their integration ecosystem. When a payment succeeds on Stripe, a `payment_intent.succeeded` webhook fires to your server. When code is pushed to GitHub, a `push` webhook triggers your CI/CD pipeline.

#### Polling vs Webhook — The Classic Tradeoff

Suppose you need to know when an order is paid. With **polling**, your server calls the API every 5 seconds — that's 17,280 requests/day, 99% of which return nothing useful. With a **webhook**, you receive exactly 1 request when payment succeeds. Less load on both sides, near real-time response.

## 2. Webhook System Architecture Overview

A production-ready webhook system is far more than just "send an HTTP POST." It comprises multiple components working together to ensure **reliability**, **security**, and **observability**.

style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style G fill:#ff9800,stroke:#fff,color:#fff

```
Webhook system architecture with retry and dead letter queue

Key components:

- **Event Source:** Where events originate (payment succeeded, user registered, file uploaded...)
- **Event Queue:** Buffer between event source and dispatcher, ensuring no events are lost during traffic spikes
- **Webhook Dispatcher:** Dequeues events, builds payloads, sends HTTP POST to consumers
- **Retry Queue:** Holds failed deliveries, schedules retries with exponential backoff
- **Dead Letter Queue (DLQ):** Stores events that exhausted all retry attempts — requires human intervention
- **Delivery Log:** Complete send/receive history for debugging and auditing

## 3. Designing Webhook Payloads — Not Too Much, Not Too Little

There are two main schools of thought when designing payloads:

| Approach | Description | Pros | Cons |
| --- | --- | --- | --- |
| **Fat payload** | Send all data in the webhook | Consumer doesn't need additional API calls | Large payload, risk of stale data |
| **Thin payload** | Send only event type + resource ID | Small payload, data always fresh | Consumer must call API for details |
| **Hybrid** ⭐ | Event type + ID + snapshot of most commonly used fields | Balance between convenience and performance | Must clearly document which fields are included |

Stripe uses the **hybrid approach** — sending an object snapshot in the webhook while recommending consumers call the API to verify:

```json
{
  "id": "evt_1R2x3Y4Z5",
  "type": "payment_intent.succeeded",
  "created": 1713700000,
  "data": {
    "object": {
      "id": "pi_abc123",
      "amount": 5000,
      "currency": "usd",
      "status": "succeeded",
      "metadata": { "order_id": "ORD-2026-001" }
    }
  }
}
```

#### Best Practice: Always Include These Fields

`id` (unique event ID for idempotency), `type` (event type), `created` (timestamp), `data` (resource snapshot or ID). Add `api_version` if your API supports versioning.

## 4. Idempotency — The Most Critical Problem When Receiving Webhooks

Webhooks can be delivered **more than once** (at-least-once delivery). Network timeouts, retries, or the consumer returning 200 but the sender not receiving the response — all lead to duplicate deliveries. Consumers **must** handle this idempotently.

```
sequenceDiagram
    participant S as Webhook Sender
    participant C as Consumer
    participant DB as Database

S->>C: POST /webhook (event_id: evt_001)
    C->>DB: Has evt_001 been processed?
    DB-->>C: Not found
    C->>DB: INSERT processed_events(evt_001)
    C->>DB: Execute business logic
    C-->>S: 200 OK

Note over S: Timeout — never received 200

S->>C: POST /webhook (event_id: evt_001) [RETRY]
    C->>DB: Has evt_001 been processed?
    DB-->>C: Already processed!
    C-->>S: 200 OK (skip processing)

```
Idempotency flow — event_id is the key to avoiding duplicate processing

Implementing idempotency in .NET:

```csharp
public class WebhookController : ControllerBase
{
    [HttpPost("webhook")]
    public async Task<IActionResult> HandleWebhook(
        [FromBody] WebhookPayload payload)
    {
        // Step 1: Check idempotency
        var alreadyProcessed = await _db.ProcessedEvents
            .AnyAsync(e => e.EventId == payload.Id);

if (alreadyProcessed)
            return Ok(); // Return 200 so sender doesn't retry

// Step 2: Process within a transaction
        await using var transaction = await _db.Database
            .BeginTransactionAsync();

try
        {
            _db.ProcessedEvents.Add(new ProcessedEvent
            {
                EventId = payload.Id,
                EventType = payload.Type,
                ProcessedAt = DateTime.UtcNow
            });

await ProcessEvent(payload);
            await _db.SaveChangesAsync();
            await transaction.CommitAsync();

return Ok();
        }
        catch
        {
            await transaction.RollbackAsync();
            return StatusCode(500);
        }
    }
}
```

#### Watch Out: Race Conditions with Concurrent Webhooks

If two requests with the same event_id arrive simultaneously, both check "not processed" → both INSERT. Solution: use a **UNIQUE constraint** on `EventId` and handle the duplicate key exception, or use a **distributed lock** (Redis SETNX) for more complex scenarios.

## 5. Retry Strategy — Exponential Backoff with Jitter

When delivery fails, you shouldn't retry immediately (thundering herd problem) or at fixed intervals (still causes spikes). The standard pattern is **exponential backoff with jitter**:

```csharp
public class RetryPolicy
{
    private static readonly int[] BaseDelaysSeconds = { 10, 30, 60, 300, 900, 3600, 7200 };

public static TimeSpan GetDelay(int attemptNumber)
    {
        var index = Math.Min(attemptNumber, BaseDelaysSeconds.Length - 1);
        var baseDelay = BaseDelaysSeconds[index];
        // Jitter: ±25% to prevent thundering herd
        var jitter = Random.Shared.NextDouble() * 0.5 + 0.75;
        return TimeSpan.FromSeconds(baseDelay * jitter);
    }
}
```

| Retry # | Base Delay | With Jitter (range) | Purpose |
| --- | --- | --- | --- |
| 1 | 10s | 7.5s – 12.5s | Transient error (network blip) |
| 2 | 30s | 22.5s – 37.5s | Service restarting |
| 3 | 1 min | 45s – 75s | Minor outage |
| 4 | 5 min | 3.75m – 6.25m | Deployment in progress |
| 5 | 15 min | 11.25m – 18.75m | Moderate outage |
| 6 | 1 hour | 45m – 75m | Extended outage |
| 7 | 2 hours | 1.5h – 2.5h | Last attempt before DLQ |

Total retry window: approximately **4-5 hours**. Stripe retries for 72 hours, GitHub for 3 days — adjust based on your SLA.

## 6. Webhook Security — Signature Verification

Your webhook endpoint is a public URL — anyone who knows it can send fake requests. You **must** verify that requests come from a legitimate sender.

The most common pattern: **HMAC-SHA256 signature**.

```
graph LR
    A[Sender] -->|1. HMAC-SHA256 payload + secret| B[Signature]
    A -->|2. Send payload + signature header| C[Consumer]
    C -->|3. HMAC-SHA256 payload + shared secret| D[Expected Signature]
    C -->|4. Compare B == D?| E{Match?}
    E -->|Yes| F[Process]
    E -->|No| G[Reject 401]

style A fill:#e94560,stroke:#fff,color:#fff
    style C fill:#2c3e50,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#ff9800,stroke:#fff,color:#fff

```
HMAC-SHA256 signature verification flow

```csharp
public class WebhookSignatureValidator
{
    public static bool Validate(string payload, string signature,
        string secret)
    {
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var computedHash = hmac.ComputeHash(
            Encoding.UTF8.GetBytes(payload));
        var computedSignature = "sha256=" +
            Convert.ToHexString(computedHash).ToLowerInvariant();

// Timing-safe comparison to prevent timing attacks
        return CryptographicOperations
            .FixedTimeEquals(
                Encoding.UTF8.GetBytes(computedSignature),
                Encoding.UTF8.GetBytes(signature));
    }
}

// In the controller:
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
    var payload = await new StreamReader(Request.Body)
        .ReadToEndAsync();
    var signature = Request.Headers["X-Webhook-Signature"]
        .FirstOrDefault();

if (!WebhookSignatureValidator.Validate(
        payload, signature, _config["WebhookSecret"]))
        return Unauthorized();

var data = JsonSerializer.Deserialize<WebhookPayload>(payload);
    // Continue processing...
}
```

#### Common Mistake: Using == to Compare Signatures

String comparison with `==` short-circuits at the first differing character — an attacker can measure response times to brute-force each byte (timing attack). Always use `CryptographicOperations.FixedTimeEquals` (.NET) or `crypto.timingSafeEqual` (Node.js).

## 7. Timestamp Validation — Preventing Replay Attacks

Signature verification alone isn't enough. An attacker can **capture a valid request** and replay it later. The solution: include a timestamp in the signed payload and reject old requests.

```csharp
public bool IsTimestampValid(long webhookTimestamp,
    int toleranceSeconds = 300)
{
    var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
    return Math.Abs(now - webhookTimestamp) <= toleranceSeconds;
}

// Combined: sign = HMAC(timestamp + "." + payload)
// Header: X-Webhook-Signature: t=1713700000,v1=abc123...
```
Stripe uses exactly this pattern — with a default tolerance of 5 minutes (300 seconds). If a request is older than 5 minutes, it's rejected immediately even if the signature is valid.

## 8. Consumer-Side Processing — Respond Fast, Process Later

The golden rule: **return 200 OK within 5 seconds**. If business logic is complex, don't process it in the request handler — enqueue and process asynchronously.

```
graph LR
    A[Webhook Request] --> B[Controller]
    B -->|Verify signature| C{Valid?}
    C -->|No| D[Return 401]
    C -->|Yes| E[Save to local queue]
    E --> F[Return 200 OK]
    E --> G[Background Worker]
    G --> H[Process business logic]
    H --> I[Update database]
    H --> J[Send notifications]

style A fill:#e94560,stroke:#fff,color:#fff
    style F fill:#4CAF50,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff

```
"Accept then Process" pattern — return 200 first, process later

```csharp
// Minimal controller — verify + enqueue only
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
    var payload = await ReadAndVerifySignature();
    if (payload == null) return Unauthorized();

// Save raw event to database/queue
    await _db.WebhookEvents.AddAsync(new WebhookEvent
    {
        EventId = payload.Id,
        EventType = payload.Type,
        RawPayload = payload.RawJson,
        Status = "pending",
        ReceivedAt = DateTime.UtcNow
    });
    await _db.SaveChangesAsync();

return Ok(); // Respond ASAP
}

// Background service for async processing
public class WebhookProcessorService : BackgroundService
{
    protected override async Task ExecuteAsync(
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var pending = await _db.WebhookEvents
                .Where(e => e.Status == "pending")
                .OrderBy(e => e.ReceivedAt)
                .Take(50)
                .ToListAsync(ct);

foreach (var evt in pending)
            {
                try
                {
                    await ProcessEvent(evt);
                    evt.Status = "processed";
                }
                catch (Exception ex)
                {
                    evt.Status = "failed";
                    evt.ErrorMessage = ex.Message;
                    evt.RetryCount++;
                }
            }

await _db.SaveChangesAsync(ct);
            await Task.Delay(1000, ct);
        }
    }
}
```

## 9. Building a Webhook Sender

If you're building a platform that needs to provide webhooks to customers, here are the essential components:

### 9.1 Subscription Management

```sql
CREATE TABLE webhook_subscriptions (
    id BIGINT IDENTITY PRIMARY KEY,
    tenant_id BIGINT NOT NULL,
    url NVARCHAR(2048) NOT NULL,
    secret NVARCHAR(256) NOT NULL,
    events NVARCHAR(MAX) NOT NULL,  -- ["order.created","payment.succeeded"]
    is_active BIT DEFAULT 1,
    created_at DATETIME2 DEFAULT GETUTCDATE(),
    INDEX ix_tenant_active (tenant_id, is_active)
);

CREATE TABLE webhook_deliveries (
    id BIGINT IDENTITY PRIMARY KEY,
    subscription_id BIGINT FOREIGN KEY REFERENCES webhook_subscriptions(id),
    event_id NVARCHAR(128) NOT NULL,
    event_type NVARCHAR(128) NOT NULL,
    payload NVARCHAR(MAX),
    status NVARCHAR(20) DEFAULT 'pending',
    attempt_count INT DEFAULT 0,
    next_retry_at DATETIME2,
    last_response_code INT,
    last_response_body NVARCHAR(MAX),
    created_at DATETIME2 DEFAULT GETUTCDATE(),
    INDEX ix_status_retry (status, next_retry_at)
);
```

### 9.2 Circuit Breaker per Subscription

When a consumer's endpoint continuously fails, you shouldn't retry forever — apply the **circuit breaker pattern** to temporarily pause delivery and notify the consumer.

| State | Condition | Behavior |
| --- | --- | --- |
| **Closed** (normal) | Failure rate < 50% in last 10 minutes | Send webhooks normally |
| **Open** (paused) | 5 consecutive delivery failures | Skip delivery, queue events, send alert email to consumer |
| **Half-Open** (probing) | After 30 minutes in Open state | Try sending 1 event: if OK → Closed, if fail → Open again |
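
The state transitions in the table can be sketched as a small state machine. This is a simplified illustration, not a complete implementation: it opens the circuit on 5 consecutive failures and probes after 30 minutes, as in the table, but it does not track the 10-minute failure rate, queue events, or send alerts. `SubscriptionCircuitBreaker` and its members are hypothetical names.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// One instance per subscription. Not thread-safe: guard it with a lock
// or process each subscription on a single worker.
public class SubscriptionCircuitBreaker
{
    private const int FailureThreshold = 5;                                   // from the table
    private static readonly TimeSpan OpenDuration = TimeSpan.FromMinutes(30); // from the table

    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    // Returns true if a delivery may be attempted at time `now`.
    // After OpenDuration in Open, the breaker moves to HalfOpen and allows
    // attempts until the probe result is recorded.
    public bool AllowDelivery(DateTimeOffset now)
    {
        if (State == CircuitState.Open && now - _openedAt >= OpenDuration)
            State = CircuitState.HalfOpen;

        return State != CircuitState.Open;
    }

    // Any success closes the circuit and resets the failure streak.
    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;
    }

    // A failed probe reopens immediately; otherwise open at the threshold.
    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= FailureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = now;
        }
    }
}
```

Passing `now` in explicitly, instead of reading the system clock inside the class, keeps the breaker easy to unit test. In a multi-instance sender, the state would need to live in shared storage such as the subscription row.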

## 10. Monitoring & Observability

A webhook system without monitoring is like driving at night without headlights. Critical metrics to track:

- **P99 Latency**: time from event to delivery (target: <2s)
- **Success Rate**: % of deliveries that succeed on the first attempt (target: >95%)
- **DLQ Size**: events in the dead letter queue (target: near 0)
- **Active Circuits**: subscriptions currently circuit-broken

```csharp
// Using the .NET Metrics API (UpDownCounter requires .NET 7 or later)
using System.Diagnostics.Metrics;

public class WebhookMetrics
{
    private static readonly Meter Meter = new("Webhook.Delivery");

    // Every delivery attempt, including retries
    public static readonly Counter<long> DeliveryAttempts =
        Meter.CreateCounter<long>("webhook.delivery.attempts");

    // Attempts that succeeded
    public static readonly Counter<long> DeliverySuccesses =
        Meter.CreateCounter<long>("webhook.delivery.successes");

    // Delivery duration in milliseconds
    public static readonly Histogram<double> DeliveryDuration =
        Meter.CreateHistogram<double>("webhook.delivery.duration_ms");

    // Current number of events in the dead letter queue
    public static readonly UpDownCounter<long> DlqSize =
        Meter.CreateUpDownCounter<long>("webhook.dlq.size");
}
```
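
As a rough illustration of how such instruments might be recorded around a delivery, the sketch below wraps a send operation with a counter and a stopwatch. It declares its own instruments so that it compiles on its own. The `InstrumentedDelivery` class, the `send` delegate, the `event_type` tag, and the meter name are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

public static class InstrumentedDelivery
{
    // Hypothetical meter name, kept separate from "Webhook.Delivery".
    private static readonly Meter Meter = new("Webhook.Delivery.Sketch");
    private static readonly Counter<long> Attempts =
        Meter.CreateCounter<long>("webhook.delivery.attempts");
    private static readonly Counter<long> Successes =
        Meter.CreateCounter<long>("webhook.delivery.successes");
    private static readonly Histogram<double> DurationMs =
        Meter.CreateHistogram<double>("webhook.delivery.duration_ms");

    // `send` performs the HTTP call and returns true when delivery succeeded.
    public static async Task<bool> SendAsync(string eventType, Func<Task<bool>> send)
    {
        var tag = new KeyValuePair<string, object?>("event_type", eventType);
        Attempts.Add(1, tag);
        var stopwatch = Stopwatch.StartNew();
        try
        {
            var delivered = await send();
            if (delivered) Successes.Add(1, tag);
            return delivered;
        }
        finally
        {
            // Recorded for failures and exceptions too, so slow timeouts stay visible.
            DurationMs.Record(stopwatch.Elapsed.TotalMilliseconds, tag);
        }
    }
}
```

This version counts every attempt. To derive the first-attempt success rate listed above, one option is to add an attempt-number tag and filter on it in the dashboard.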

## 11. Build vs Buy — Self-Built vs Managed Services

You don't always need to build webhook infrastructure from scratch. Here's a comparison to help you decide:

| Criteria | Self-built | Managed (Svix, Hookdeck) | Cloud-native (Azure Event Grid, AWS SNS) |
| --- | --- | --- | --- |
| **Initial cost** | High (2-4 weeks dev) | Low (integrate in <1 day) | Low (pay-per-use) |
| **Customization** | Full control | Limited to vendor API | Moderate |
| **Retry & DLQ** | Must implement yourself | Built-in | Built-in |
| **Monitoring** | Build your own dashboard | Dashboard + alerts included | Integrated with CloudWatch / Azure Monitor |
| **Scale** | Depends on your infra | Auto-scale | Auto-scale, global |
| **Best for** | Large teams, specific requirements | Startups, fast MVPs | Already on AWS/Azure |

## 12. Production-Ready Webhook Checklist

### Webhook Sender Checklist

✓ HMAC-SHA256 signature for every delivery  
✓ Timestamp in signed payload (anti-replay)  
✓ Exponential backoff with jitter for retries  
✓ Circuit breaker per subscription  
✓ Dead letter queue + alerting  
✓ Queryable delivery log (at least 30 days)  
✓ Rate limiting per subscription (avoid overwhelming consumer)  
✓ API for consumers to view delivery history and manually retry  
✓ Webhook testing endpoint (echo server)

### Webhook Consumer Checklist

✓ Verify signature BEFORE parsing payload  
✓ Validate timestamp (reject requests older than 5 minutes)  
✓ Idempotent processing based on event_id  
✓ Respond 200 in under 5 seconds, then process asynchronously  
✓ HTTPS endpoint mandatory  
✓ Handle out-of-order delivery (event B arrives before A)  
✓ Log every received webhook for debugging  
✓ Alert when processing failure rate increases
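
The checklist item on out-of-order delivery can be handled in several ways. One simple option, sketched here under the assumption that each event carries a resource identifier and a creation timestamp, is to ignore any event that is not newer than the last one applied for the same resource. `OutOfOrderGuard`, `resourceId`, and `occurredAt` are hypothetical names, and a production version would persist this state in the database instead of in memory.

```csharp
using System;
using System.Collections.Generic;

// Tracks the newest event applied per resource and rejects anything older.
// In-memory and not thread-safe; for illustration only.
public class OutOfOrderGuard
{
    private readonly Dictionary<string, DateTimeOffset> _lastApplied = new();

    // Returns true if the event is newer than anything already applied
    // for this resource, and remembers it as the latest.
    public bool ShouldApply(string resourceId, DateTimeOffset occurredAt)
    {
        if (_lastApplied.TryGetValue(resourceId, out var last) && occurredAt <= last)
            return false; // stale or duplicate event: skip it

        _lastApplied[resourceId] = occurredAt;
        return true;
    }
}
```

This last-write-wins approach suits events that carry a full snapshot of the resource. For events that each describe an increment, fetching the current state from the sender's API when a gap is detected is usually safer.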

## References

- [Stripe Webhook Documentation](https://docs.stripe.com/webhooks)
- [GitHub Webhooks Guide](https://docs.github.com/en/webhooks)
- [Svix — Webhook Infrastructure Blog](https://www.svix.com/blog/)
- [Azure Event Grid Overview — Microsoft Learn](https://learn.microsoft.com/en-us/azure/event-grid/overview)
- [Hookdeck — Webhook Best Practices Blog](https://hookdeck.com/blog)
