Webhook Design Patterns — Building Reliable Event Notification Systems
Posted on: 4/21/2026 2:13:47 PM
Table of contents
- 1. What Are Webhooks and Why Should You Care?
- 2. Webhook System Architecture Overview
- 3. Designing Webhook Payloads — Not Too Much, Not Too Little
- 4. Idempotency — The Most Critical Problem When Receiving Webhooks
- 5. Retry Strategy — Exponential Backoff with Jitter
- 6. Webhook Security — Signature Verification
- 7. Timestamp Validation — Preventing Replay Attacks
- 8. Consumer-Side Processing — Respond Fast, Process Later
- 9. Building a Webhook Sender
- 10. Monitoring & Observability
- 11. Build vs Buy — Self-Built vs Managed Services
- 12. Production-Ready Webhook Checklist
1. What Are Webhooks and Why Should You Care?
A webhook is an HTTP callback mechanism — when an event occurs in System A, it sends an HTTP POST request to a URL that System B has registered in advance. Unlike polling (where B repeatedly asks A "anything new?"), webhooks are push-based: A proactively notifies B the moment an event happens.
Major platforms like Stripe, GitHub, Shopify, and Twilio all use webhooks as the backbone of their integration ecosystem. When a payment succeeds on Stripe, a payment_intent.succeeded webhook fires to your server. When code is pushed to GitHub, a push webhook triggers your CI/CD pipeline.
Polling vs Webhook — The Classic Tradeoff
Suppose you need to know when an order is paid. With polling, your server calls the API every 5 seconds — that's 17,280 requests/day, 99% of which return nothing useful. With a webhook, you receive exactly 1 request when payment succeeds. Less load on both sides, near real-time response.
2. Webhook System Architecture Overview
A production-ready webhook system is far more than just "send an HTTP POST." It comprises multiple components working together to ensure reliability, security, and observability.
graph LR
A[Event Source] -->|Publish| B[Event Queue]
B --> C[Webhook Dispatcher]
C -->|HTTP POST| D[Consumer Endpoint]
D -->|2xx OK| E[Mark Delivered]
D -->|4xx/5xx/Timeout| F[Retry Queue]
F -->|Exponential Backoff| C
F -->|Max retries exceeded| G[Dead Letter Queue]
C --> H[Delivery Log]
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style C fill:#2c3e50,stroke:#fff,color:#fff
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style G fill:#ff9800,stroke:#fff,color:#fff
Webhook system architecture with retry and dead letter queue
Key components:
- Event Source: Where events originate (payment succeeded, user registered, file uploaded...)
- Event Queue: Buffer between event source and dispatcher, ensuring no events are lost during traffic spikes
- Webhook Dispatcher: Dequeues events, builds payloads, sends HTTP POST to consumers
- Retry Queue: Holds failed deliveries, schedules retries with exponential backoff
- Dead Letter Queue (DLQ): Stores events that exhausted all retry attempts — requires human intervention
- Delivery Log: Complete send/receive history for debugging and auditing
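The routing between "Mark Delivered", the Retry Queue, and the DLQ can be sketched as one small decision function. This is a minimal sketch rather than a prescribed implementation: the DeliveryOutcome enum, the DeliveryRouter.Classify helper, and the maxAttempts parameter are hypothetical names introduced for illustration. Like the diagram, it treats every non-2xx response and every timeout as retryable.

```csharp
using System;

// Hypothetical names for illustration; not part of any library.
public enum DeliveryOutcome { Delivered, Retry, DeadLetter }

public static class DeliveryRouter
{
    // statusCode is null when the request timed out or the connection failed.
    // attempt is 1-based and counts the attempt that just finished.
    public static DeliveryOutcome Classify(int? statusCode, int attempt, int maxAttempts)
    {
        if (statusCode is >= 200 and < 300)
            return DeliveryOutcome.Delivered;   // 2xx: mark delivered

        return attempt >= maxAttempts
            ? DeliveryOutcome.DeadLetter        // retries exhausted: park in the DLQ
            : DeliveryOutcome.Retry;            // otherwise schedule another attempt
    }
}
```

A dispatcher would call Classify after each HTTP attempt and write the result to the delivery log before acting on it.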
3. Designing Webhook Payloads — Not Too Much, Not Too Little
There are two main schools of thought when designing payloads:
| Approach | Description | Pros | Cons |
|---|---|---|---|
| Fat payload | Send all data in the webhook | Consumer doesn't need additional API calls | Large payload, risk of stale data |
| Thin payload | Send only event type + resource ID | Small payload, data always fresh | Consumer must call API for details |
| Hybrid ⭐ | Event type + ID + snapshot of most commonly used fields | Balance between convenience and performance | Must clearly document which fields are included |
Stripe uses the hybrid approach — sending an object snapshot in the webhook while recommending consumers call the API to verify:
{
  "id": "evt_1R2x3Y4Z5",
  "type": "payment_intent.succeeded",
  "created": 1713700000,
  "data": {
    "object": {
      "id": "pi_abc123",
      "amount": 5000,
      "currency": "usd",
      "status": "succeeded",
      "metadata": { "order_id": "ORD-2026-001" }
    }
  }
}
Best Practice: Always Include These Fields
id (unique event ID for idempotency), type (event type), created (timestamp), data (resource snapshot or ID). Add api_version if your API supports versioning.
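On the consumer side, those fields map naturally onto a small envelope type. The sketch below assumes System.Text.Json; the WebhookEnvelope record and WebhookEnvelopeParser helper are hypothetical names, and the data field is kept as a raw JsonElement so that each event type can be deserialized later.

```csharp
#nullable enable
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical envelope type; the JSON names mirror the fields listed above.
public sealed record WebhookEnvelope(
    [property: JsonPropertyName("id")] string Id,
    [property: JsonPropertyName("type")] string Type,
    [property: JsonPropertyName("created")] long Created,
    [property: JsonPropertyName("data")] JsonElement Data,
    [property: JsonPropertyName("api_version")] string? ApiVersion = null);

public static class WebhookEnvelopeParser
{
    public static WebhookEnvelope Parse(string json)
    {
        var envelope = JsonSerializer.Deserialize<WebhookEnvelope>(json)
            ?? throw new JsonException("Payload was JSON null.");

        // Missing properties deserialize to null, so check the required ones explicitly.
        if (string.IsNullOrEmpty(envelope.Id) || string.IsNullOrEmpty(envelope.Type))
            throw new JsonException("Missing required field: id or type.");

        return envelope;
    }
}
```

Rejecting envelopes without an id early matters because every later step (idempotency, logging, retries) keys on it.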
4. Idempotency — The Most Critical Problem When Receiving Webhooks
Webhooks can be delivered more than once (at-least-once delivery). Network timeouts, retries, or the consumer returning 200 but the sender not receiving the response — all lead to duplicate deliveries. Consumers must handle this idempotently.
sequenceDiagram
participant S as Webhook Sender
participant C as Consumer
participant DB as Database
S->>C: POST /webhook (event_id: evt_001)
C->>DB: Has evt_001 been processed?
DB-->>C: Not found
C->>DB: INSERT processed_events(evt_001)
C->>DB: Execute business logic
C-->>S: 200 OK
Note over S: Timeout — never received 200
S->>C: POST /webhook (event_id: evt_001) [RETRY]
C->>DB: Has evt_001 been processed?
DB-->>C: Already processed!
C-->>S: 200 OK (skip processing)
Idempotency flow — event_id is the key to avoiding duplicate processing
Implementing idempotency in .NET:
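public class WebhookController : ControllerBase
{
    // AppDbContext is an assumed name for the EF Core context that exposes ProcessedEvents.
    private readonly AppDbContext _db;

    public WebhookController(AppDbContext db) => _db = db;

    [HttpPost("webhook")]
    public async Task<IActionResult> HandleWebhook(
        [FromBody] WebhookPayload payload)
    {
        // Step 1: Check idempotency (fast path for events already handled)
        var alreadyProcessed = await _db.ProcessedEvents
            .AnyAsync(e => e.EventId == payload.Id);
        if (alreadyProcessed)
            return Ok(); // Return 200 so sender doesn't retry

        // Step 2: Process within a transaction
        await using var transaction = await _db.Database
            .BeginTransactionAsync();
        try
        {
            _db.ProcessedEvents.Add(new ProcessedEvent
            {
                EventId = payload.Id,
                EventType = payload.Type,
                ProcessedAt = DateTime.UtcNow
            });
            // Insert the marker first: with a UNIQUE index on EventId, a concurrent
            // duplicate fails here, before any business logic has run.
            await _db.SaveChangesAsync();

            await ProcessEvent(payload);
            await _db.SaveChangesAsync();
            await transaction.CommitAsync();
            return Ok();
        }
        catch
        {
            // Rolling back also removes the marker, so a retry can process the event.
            await transaction.RollbackAsync();
            return StatusCode(500);
        }
    }
}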
Watch Out: Race Conditions with Concurrent Webhooks
If two requests with the same event_id arrive simultaneously, both check "not processed" → both INSERT. Solution: use a UNIQUE constraint on EventId and handle the duplicate key exception, or use a distributed lock (Redis SETNX) for more complex scenarios.
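The essence of the UNIQUE-constraint fix is an atomic "claim": exactly one caller may win for a given event ID. The sketch below illustrates that property in memory with ConcurrentDictionary.TryAdd standing in for the database INSERT. InMemoryEventClaims and ClaimDemo are hypothetical names, and an in-process dictionary is only an analogy: it does not survive restarts or span multiple servers, which is why production systems push the claim into the database or Redis.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// In-memory stand-in for a table with a UNIQUE index on event_id (illustration only).
public sealed class InMemoryEventClaims
{
    private readonly ConcurrentDictionary<string, DateTime> _claimed = new();

    // TryAdd is atomic: it returns true for exactly one caller per eventId.
    public bool TryClaim(string eventId) => _claimed.TryAdd(eventId, DateTime.UtcNow);
}

public static class ClaimDemo
{
    // Simulates several concurrent deliveries of the same event
    // and counts how many of them win the claim.
    public static int CountWinners(int concurrentDeliveries, string eventId)
    {
        var claims = new InMemoryEventClaims();
        int winners = 0;
        Parallel.For(0, concurrentDeliveries, _ =>
        {
            if (claims.TryClaim(eventId))
                Interlocked.Increment(ref winners);
        });
        return winners;
    }
}
```

However many deliveries race, only one passes the claim and runs the business logic; the others should return 200 and skip processing.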
5. Retry Strategy — Exponential Backoff with Jitter
When delivery fails, you shouldn't retry immediately (thundering herd problem) or at fixed intervals (still causes spikes). The standard pattern is exponential backoff with jitter:
public class RetryPolicy
{
    private static readonly int[] BaseDelaysSeconds = { 10, 30, 60, 300, 900, 3600, 7200 };

    // attemptNumber is 1-based: retry 1 waits about 10s, retry 7 about 2 hours.
    public static TimeSpan GetDelay(int attemptNumber)
    {
        // Convert to a 0-based index and clamp it to the schedule bounds.
        var index = Math.Clamp(attemptNumber - 1, 0, BaseDelaysSeconds.Length - 1);
        var baseDelay = BaseDelaysSeconds[index];
        // Jitter: ±25% to prevent thundering herd
        var jitter = Random.Shared.NextDouble() * 0.5 + 0.75;
        return TimeSpan.FromSeconds(baseDelay * jitter);
    }
}
| Retry # | Base Delay | With Jitter (range) | Purpose |
|---|---|---|---|
| 1 | 10s | 7.5s – 12.5s | Transient error (network blip) |
| 2 | 30s | 22.5s – 37.5s | Service restarting |
| 3 | 1 min | 45s – 75s | Minor outage |
| 4 | 5 min | 3.75m – 6.25m | Deployment in progress |
| 5 | 15 min | 11.25m – 18.75m | Moderate outage |
| 6 | 1 hour | 45m – 75m | Extended outage |
| 7 | 2 hours | 1.5h – 2.5h | Last attempt before DLQ |
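Total retry window: about 3.4 hours (10s + 30s + 1m + 5m + 15m + 1h + 2h = 12,100 seconds), or roughly 2.5 to 4.2 hours once jitter is applied. For comparison, Stripe retries for up to 3 days in live mode, while GitHub does not retry failed deliveries automatically and instead lets you redeliver them manually or through its API. Adjust the schedule to your SLA.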
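The total window follows directly from the schedule, so it is worth computing rather than estimating. A minimal sketch, assuming the seven-step schedule from the table and a symmetric jitter fraction; RetryWindow is a hypothetical helper name.

```csharp
using System;
using System.Linq;

public static class RetryWindow
{
    // Same schedule as the retry table, in seconds.
    private static readonly int[] BaseDelaysSeconds = { 10, 30, 60, 300, 900, 3600, 7200 };

    // Returns the shortest, nominal, and longest total wait across all retries,
    // given jitter of plus or minus jitterFraction on every delay.
    public static (TimeSpan Min, TimeSpan Nominal, TimeSpan Max) Compute(double jitterFraction = 0.25)
    {
        double total = BaseDelaysSeconds.Sum();
        return (TimeSpan.FromSeconds(total * (1 - jitterFraction)),
                TimeSpan.FromSeconds(total),
                TimeSpan.FromSeconds(total * (1 + jitterFraction)));
    }
}
```

Printing RetryWindow.Compute().Nominal.TotalHours after changing the schedule is a quick way to check that the window still matches the SLA you promise consumers.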
6. Webhook Security — Signature Verification
Your webhook endpoint is a public URL — anyone who knows it can send fake requests. You must verify that requests come from a legitimate sender.
The most common pattern: HMAC-SHA256 signature.
graph LR
A[Sender] -->|1. HMAC-SHA256 payload + secret| B[Signature]
A -->|2. Send payload + signature header| C[Consumer]
C -->|3. HMAC-SHA256 payload + shared secret| D[Expected Signature]
C -->|4. Compare B == D?| E{Match?}
E -->|Yes| F[Process]
E -->|No| G[Reject 401]
style A fill:#e94560,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style F fill:#4CAF50,stroke:#fff,color:#fff
style G fill:#ff9800,stroke:#fff,color:#fff
HMAC-SHA256 signature verification flow
// Requires: using System.Security.Cryptography; using System.Text;
public class WebhookSignatureValidator
{
    public static bool Validate(string payload, string? signature,
        string secret)
    {
        // A missing header can never match; reject it before hashing.
        if (string.IsNullOrEmpty(signature))
            return false;

        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var computedHash = hmac.ComputeHash(
            Encoding.UTF8.GetBytes(payload));
        var computedSignature = "sha256=" +
            Convert.ToHexString(computedHash).ToLowerInvariant();

        // Timing-safe comparison to prevent timing attacks
        return CryptographicOperations
            .FixedTimeEquals(
                Encoding.UTF8.GetBytes(computedSignature),
                Encoding.UTF8.GetBytes(signature));
    }
}

// In the controller:
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
    // Read the raw body: the signature covers the exact bytes that were sent,
    // not a re-serialized object.
    using var reader = new StreamReader(Request.Body);
    var payload = await reader.ReadToEndAsync();
    var signature = Request.Headers["X-Webhook-Signature"]
        .FirstOrDefault();
    if (!WebhookSignatureValidator.Validate(
            payload, signature, _config["WebhookSecret"]))
        return Unauthorized();

    var data = JsonSerializer.Deserialize<WebhookPayload>(payload);
    // Continue processing...
    return Ok();
}
Common Mistake: Using == to Compare Signatures
String comparison with == short-circuits at the first differing character — an attacker can measure response times to brute-force each byte (timing attack). Always use CryptographicOperations.FixedTimeEquals (.NET) or crypto.timingSafeEqual (Node.js).
7. Timestamp Validation — Preventing Replay Attacks
Signature verification alone isn't enough. An attacker can capture a valid request and replay it later. The solution: include a timestamp in the signed payload and reject old requests.
public bool IsTimestampValid(long webhookTimestamp,
int toleranceSeconds = 300)
{
var now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
return Math.Abs(now - webhookTimestamp) <= toleranceSeconds;
}
// Combined: sign = HMAC(timestamp + "." + payload)
// Header: X-Webhook-Signature: t=1713700000,v1=abc123...
Stripe uses exactly this pattern — with a default tolerance of 5 minutes (300 seconds). If a request is older than 5 minutes, it's rejected immediately even if the signature is valid.
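Putting the two checks together, a verifier parses the header, rejects stale timestamps, and only then compares signatures. The sketch below follows the header format from the comment above (t=...,v1=...) and signs timestamp + "." + payload. TimestampedSignature is a hypothetical helper, not Stripe's library, and the current time is passed in as a parameter so the logic can be tested deterministically.

```csharp
#nullable enable
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class TimestampedSignature
{
    // Computes lowercase hex of HMAC-SHA256(secret, "<timestamp>.<payload>").
    public static string Sign(string payload, long timestamp, string secret)
    {
        var signedText = timestamp.ToString(CultureInfo.InvariantCulture) + "." + payload;
        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        return Convert.ToHexString(hmac.ComputeHash(Encoding.UTF8.GetBytes(signedText)))
            .ToLowerInvariant();
    }

    // Parses "t=<unix seconds>,v1=<hex>" and checks freshness, then the signature.
    public static bool Verify(string payload, string? header, string secret,
        long nowUnixSeconds, int toleranceSeconds = 300)
    {
        if (string.IsNullOrEmpty(header)) return false;

        long? timestamp = null;
        string? v1 = null;
        foreach (var part in header.Split(','))
        {
            var kv = part.Split('=', 2);
            if (kv.Length != 2) continue;
            var key = kv[0].Trim();
            if (key == "t" && long.TryParse(kv[1], NumberStyles.Integer,
                    CultureInfo.InvariantCulture, out var t))
                timestamp = t;
            else if (key == "v1")
                v1 = kv[1].Trim();
        }
        if (timestamp is null || v1 is null) return false;

        // Reject replays: the signed timestamp must be within the tolerance window.
        if (Math.Abs(nowUnixSeconds - timestamp.Value) > toleranceSeconds) return false;

        var expected = Sign(payload, timestamp.Value, secret);
        return CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(expected), Encoding.UTF8.GetBytes(v1));
    }
}
```

Because the timestamp is part of the signed text, an attacker cannot refresh an old request by editing the t= value: the signature would no longer match.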
8. Consumer-Side Processing — Respond Fast, Process Later
The golden rule: return 200 OK within 5 seconds. If business logic is complex, don't process it in the request handler — enqueue and process asynchronously.
graph LR
A[Webhook Request] --> B[Controller]
B -->|Verify signature| C{Valid?}
C -->|No| D[Return 401]
C -->|Yes| E[Save to local queue]
E --> F[Return 200 OK]
E --> G[Background Worker]
G --> H[Process business logic]
H --> I[Update database]
H --> J[Send notifications]
style A fill:#e94560,stroke:#fff,color:#fff
style F fill:#4CAF50,stroke:#fff,color:#fff
style G fill:#2c3e50,stroke:#fff,color:#fff
"Accept then Process" pattern — return 200 first, process later
// Minimal controller — verify + enqueue only
[HttpPost("webhook")]
public async Task<IActionResult> HandleWebhook()
{
var payload = await ReadAndVerifySignature();
if (payload == null) return Unauthorized();
// Save raw event to database/queue
await _db.WebhookEvents.AddAsync(new WebhookEvent
{
EventId = payload.Id,
EventType = payload.Type,
RawPayload = payload.RawJson,
Status = "pending",
ReceivedAt = DateTime.UtcNow
});
await _db.SaveChangesAsync();
return Ok(); // Respond ASAP
}
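// Background service for async processing
public class WebhookProcessorService : BackgroundService
{
    private readonly IServiceScopeFactory _scopeFactory;

    public WebhookProcessorService(IServiceScopeFactory scopeFactory)
        => _scopeFactory = scopeFactory;

    protected override async Task ExecuteAsync(
        CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            // Hosted services are singletons, so create a scope per batch to get
            // a scoped DbContext (AppDbContext is an assumed name).
            using var scope = _scopeFactory.CreateScope();
            var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();

            var pending = await db.WebhookEvents
                .Where(e => e.Status == "pending")
                .OrderBy(e => e.ReceivedAt)
                .Take(50)
                .ToListAsync(ct);
            foreach (var evt in pending)
            {
                try
                {
                    await ProcessEvent(evt);
                    evt.Status = "processed";
                }
                catch (Exception ex)
                {
                    // Failed events are not picked up again by this loop;
                    // requeue them separately based on RetryCount.
                    evt.Status = "failed";
                    evt.ErrorMessage = ex.Message;
                    evt.RetryCount++;
                }
            }
            await db.SaveChangesAsync(ct);
            await Task.Delay(1000, ct);
        }
    }
}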
9. Building a Webhook Sender
If you're building a platform that needs to provide webhooks to customers, here are the essential components:
9.1 Subscription Management
CREATE TABLE webhook_subscriptions (
id BIGINT IDENTITY PRIMARY KEY,
tenant_id BIGINT NOT NULL,
url NVARCHAR(2048) NOT NULL,
secret NVARCHAR(256) NOT NULL,
events NVARCHAR(MAX) NOT NULL, -- ["order.created","payment.succeeded"]
is_active BIT DEFAULT 1,
created_at DATETIME2 DEFAULT GETUTCDATE(),
INDEX ix_tenant_active (tenant_id, is_active)
);
CREATE TABLE webhook_deliveries (
id BIGINT IDENTITY PRIMARY KEY,
subscription_id BIGINT FOREIGN KEY REFERENCES webhook_subscriptions(id),
event_id NVARCHAR(128) NOT NULL,
event_type NVARCHAR(128) NOT NULL,
payload NVARCHAR(MAX),
status NVARCHAR(20) DEFAULT 'pending',
attempt_count INT DEFAULT 0,
next_retry_at DATETIME2,
last_response_code INT,
last_response_body NVARCHAR(MAX),
created_at DATETIME2 DEFAULT GETUTCDATE(),
INDEX ix_status_retry (status, next_retry_at)
);
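With a subscription's url and secret in hand, the dispatcher's core job is to build a signed request. The following is a minimal sketch under a few assumptions: WebhookRequestFactory is a hypothetical helper, the header name X-Webhook-Signature and the t=...,v1=... format are the ones used earlier in this article, and the payload is signed as timestamp + "." + payload. It builds the request without sending it, so the signing logic can be tested without a network.

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;

public static class WebhookRequestFactory
{
    // Builds a POST carrying the JSON payload and a timestamped HMAC-SHA256 signature.
    public static HttpRequestMessage Build(string url, string secret,
        string payloadJson, long unixTimestamp)
    {
        var ts = unixTimestamp.ToString(CultureInfo.InvariantCulture);

        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(secret));
        var hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(ts + "." + payloadJson));
        var signature = Convert.ToHexString(hash).ToLowerInvariant();

        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            // Send exactly the string that was signed; re-serializing could change the bytes.
            Content = new StringContent(payloadJson, Encoding.UTF8, "application/json")
        };
        request.Headers.TryAddWithoutValidation(
            "X-Webhook-Signature", $"t={ts},v1={signature}");
        return request;
    }
}
```

The dispatcher would pass the result to a shared HttpClient with a short timeout, then record the status code in webhook_deliveries.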
9.2 Circuit Breaker per Subscription
When a consumer's endpoint continuously fails, you shouldn't retry forever — apply the circuit breaker pattern to temporarily pause delivery and notify the consumer.
| State | Condition | Behavior |
|---|---|---|
| Closed (normal) | Failure rate < 50% in last 10 minutes | Send webhooks normally |
| Open (paused) | 5 consecutive delivery failures | Skip delivery, queue events, send alert email to consumer |
| Half-Open (probing) | After 30 minutes in Open state | Try sending 1 event: if OK → Closed, if fail → Open again |
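The three states in the table can be sketched as a small state machine. This is a simplified sketch under stated assumptions: SubscriptionCircuitBreaker is a hypothetical class, it implements only the "5 consecutive failures" and "30 minutes" rules (not the failure-rate window), it is not thread-safe, and it does not limit Half-Open to a single in-flight probe.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

public sealed class SubscriptionCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public SubscriptionCircuitBreaker(int failureThreshold = 5, TimeSpan? openDuration = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration ?? TimeSpan.FromMinutes(30);
    }

    // Returns true if a delivery may be attempted at the given time.
    public bool AllowDelivery(DateTimeOffset now)
    {
        if (State == CircuitState.Open && now - _openedAt >= _openDuration)
            State = CircuitState.HalfOpen;   // cool-down elapsed: allow a probe
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;         // a successful probe closes the circuit
    }

    public void RecordFailure(DateTimeOffset now)
    {
        _consecutiveFailures++;
        // A failed probe, or too many failures in a row, opens the circuit.
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = now;
        }
    }
}
```

The dispatcher would keep one breaker per subscription, call AllowDelivery before each send, and leave events queued while it returns false.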
10. Monitoring & Observability
A webhook system without monitoring is like driving at night without headlights. Critical metrics to track:
// Using .NET Metrics API
public class WebhookMetrics
{
private static readonly Meter Meter = new("Webhook.Delivery");
public static readonly Counter<long> DeliveryAttempts =
Meter.CreateCounter<long>("webhook.delivery.attempts");
public static readonly Counter<long> DeliverySuccesses =
Meter.CreateCounter<long>("webhook.delivery.successes");
public static readonly Histogram<double> DeliveryDuration =
Meter.CreateHistogram<double>("webhook.delivery.duration_ms");
public static readonly UpDownCounter<long> DlqSize =
Meter.CreateUpDownCounter<long>("webhook.dlq.size");
}
11. Build vs Buy — Self-Built vs Managed Services
You don't always need to build webhook infrastructure from scratch. Here's a comparison to help you decide:
| Criteria | Self-built | Managed (Svix, Hookdeck) | Cloud-native (Azure Event Grid, AWS SNS) |
|---|---|---|---|
| Initial cost | High (2-4 weeks dev) | Low (integrate in <1 day) | Low (pay-per-use) |
| Customization | Full control | Limited to vendor API | Moderate |
| Retry & DLQ | Must implement yourself | Built-in | Built-in |
| Monitoring | Build your own dashboard | Dashboard + alerts included | Integrated with CloudWatch / Azure Monitor |
| Scale | Depends on your infra | Auto-scale | Auto-scale, global |
| Best for | Large teams, specific requirements | Startups, fast MVPs | Already on AWS/Azure |
12. Production-Ready Webhook Checklist
Webhook Sender Checklist
✓ HMAC-SHA256 signature for every delivery
✓ Timestamp in signed payload (anti-replay)
✓ Exponential backoff with jitter for retries
✓ Circuit breaker per subscription
✓ Dead letter queue + alerting
✓ Queryable delivery log (at least 30 days)
✓ Rate limiting per subscription (avoid overwhelming consumer)
✓ API for consumers to view delivery history and manually retry
✓ Webhook testing endpoint (echo server)
Webhook Consumer Checklist
✓ Verify signature BEFORE parsing payload
✓ Validate timestamp (reject requests older than 5 minutes)
✓ Idempotent processing based on event_id
✓ Respond 200 within 5 seconds, process async
✓ HTTPS endpoint mandatory
✓ Handle out-of-order delivery (event B arrives before A)
✓ Log every received webhook for debugging
✓ Alert when processing failure rate increases
Webhooks seem simple on the surface, but getting them right is far from easy. From idempotency and signature verification to retry strategies and circuit breakers — each layer has its own pitfalls. This guide gives you a complete blueprint for implementing production-grade webhooks in your systems.