Resilience Patterns on .NET 10 — Polly, Circuit Breaker, and Retry for Microservices

Posted on: 4/17/2026 1:09:18 PM

Table of contents

1. Why do we need Resilience Patterns?
   - Resilience ≠ just retry
2. Polly v8 — A brand-new architecture
3. Fast integration with AddStandardResilienceHandler
   - Why two timeouts?
4. Circuit Breaker — A smart electrical-breaker mechanism
   - 4.1. Detailed Circuit Breaker configuration
     - Watch out for MinimumThroughput
   - 4.2. Dynamic Break Duration
5. The Retry Pattern — The art of retrying properly
   - 5.1. Exponential Backoff + Jitter
   - 5.2. Disable retries for unsafe methods
     - When CAN you retry POST?
6. Hedging — Parallel requests to reduce latency
7. Timeout strategy — Two-layer protection
   - Order matters!
8. Integrating with OpenTelemetry
9. Dynamic reload — Changing config at runtime
10. Production best practices
11. A real-world example — Resilient e-commerce architecture
    - Fallback for Notification
12. Anti-patterns to avoid

Conclusion

In a microservices architecture, services call each other over HTTP all the time. But the network is never fully reliable: timeouts, overloaded servers, DNS flaps, or simply a rolling deployment. Without a strategy for handling transient faults, a single slow service can bring down the whole system in a domino effect. This article digs into Resilience Patterns on .NET 10 with Polly and the Microsoft.Extensions.Http.Resilience package, the de facto standard toolkit in .NET for letting an application recover from transient failures on its own.

350M+ Polly downloads on NuGet

5 layers Standard Resilience Pipeline

v10.4 Microsoft.Extensions.Http.Resilience

< 3ms Average overhead per request

1. Why do we need Resilience Patterns?

Imagine an e-commerce system with 20 microservices. The Order service calls Payment, Payment calls Fraud Detection, Fraud Detection calls ML Scoring. When ML Scoring slows down to 10 seconds instead of the usual 200 ms, what happens?

```mermaid
graph LR
    A[Order Service] -->|HTTP| B[Payment Service]
    B -->|HTTP| C[Fraud Detection]
    C -->|HTTP| D["ML Scoring (slow ⚠️)"]
    D -.->|10s timeout| C
    C -.->|thread blocked| B
    B -.->|thread pool exhausted| A
    A -.->|503 to user| E[Client]
    style D fill:#ff9800,stroke:#e65100,color:#fff
    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#e94560,stroke:#fff,color:#fff
```

Cascading failure — one slow service drags the entire chain down

Without resilience patterns, each service accumulates requests waiting on its downstream. Each of those requests ties up a connection, memory, and (wherever code blocks synchronously) a thread-pool thread. Once those resources run out, the service can't serve any request, even ones unrelated to ML Scoring. That's a cascading failure, and it can take down an entire system in minutes.

Resilience ≠ just retry

Many developers think resilience is simply "try again on failure". In reality, retrying incorrectly makes things worse — thousands of clients retrying in unison create a retry storm that crushes an already-overloaded service. Resilience patterns are the smart combination of retry, circuit breaker, timeout, rate limiter, and fallback.

2. Polly v8 — A brand-new architecture

Polly v8 (the version Microsoft.Extensions.Http.Resilience builds on) was redesigned around the Resilience Pipeline architecture. It supersedes the older Policy-based API, which the Polly package still ships for backward compatibility. Pipelines let you stack multiple strategies in a defined order, with each strategy operating independently.

```mermaid
graph TB
    subgraph Pipeline["Resilience Pipeline"]
        direction TB
        RL["1. Rate Limiter"] --> TT["2. Total Timeout (30s)"]
        TT --> RT["3. Retry (3 retries, exponential)"]
        RT --> CB["4. Circuit Breaker"]
        CB --> AT["5. Attempt Timeout (10s)"]
    end
    REQ["HTTP Request"] --> RL
    AT --> SVC["Downstream Service"]
    style Pipeline fill:#f8f9fa,stroke:#e0e0e0
    style REQ fill:#e94560,stroke:#fff,color:#fff
    style SVC fill:#2c3e50,stroke:#fff,color:#fff
    style RL fill:#fff,stroke:#e94560,color:#2c3e50
    style TT fill:#fff,stroke:#e94560,color:#2c3e50
    style RT fill:#fff,stroke:#e94560,color:#2c3e50
    style CB fill:#fff,stroke:#e94560,color:#2c3e50
    style AT fill:#fff,stroke:#e94560,color:#2c3e50
```

The 5 strategies in the Standard Resilience Pipeline (outermost → innermost)

3. Fast integration with AddStandardResilienceHandler

The fastest way to add resilience to an HttpClient on .NET 10 is AddStandardResilienceHandler() from the Microsoft.Extensions.Http.Resilience package. One line of code gives you five layers of protection with sensible defaults.

Program.cs

```csharp
// Requires the Microsoft.Extensions.Http.Resilience NuGet package
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register a typed HttpClient (PaymentClient is your own client class)
// with the standard resilience pipeline
builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddStandardResilienceHandler();

var app = builder.Build();
await app.RunAsync();
```

With the defaults, the pipeline consists of:

| Order | Strategy | Default | Effect |
| --- | --- | --- | --- |
| 1 | Rate Limiter | 1,000 concurrent permits, no queue | Prevents the client from firing too many simultaneous requests |
| 2 | Total Timeout | 30 seconds | Caps total time including retries |
| 3 | Retry | 3 retries, exponential backoff + jitter, 2 s base delay | Automatically retries on transient errors |
| 4 | Circuit Breaker | 10% failure ratio, 100 min throughput, 30 s sampling, 5 s break | Trips when the downstream fails repeatedly |
| 5 | Attempt Timeout | 10 seconds | Caps the time for a single attempt |

Why two timeouts?

Attempt Timeout (10 s) bounds each attempt: if one request exceeds 10 s, it's cancelled so the next retry can start. Total Timeout (30 s) is the overall time budget: no matter how many retries run, the whole operation ends within 30 s. Without it, 3 retries mean up to 4 attempts × 10 s = 40 s. Add backoff delays of roughly 2 + 4 + 8 s with the default 2 s base delay, and a single call could take around 54 s.
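This arithmetic is easy to get wrong, so here is a small sketch that computes the worst case for a retry loop with no total timeout. It assumes the library defaults of 3 retries and a 10 s attempt timeout, plus a 2 s base delay that doubles on each retry (my understanding of the HttpRetryStrategyOptions default). It ignores jitter. `WorstCase` is a hypothetical helper, not a Polly API.

```csharp
using System;

// Worst case for a retry loop WITHOUT a total timeout: every attempt runs
// until the attempt timeout fires, and every retry waits its full backoff.
// Jitter is ignored here; Polly randomizes the real delays.
TimeSpan WorstCase(int maxRetries, TimeSpan attemptTimeout, TimeSpan baseDelay)
{
    int attempts = maxRetries + 1;                  // first try + retries
    TimeSpan total = attemptTimeout * attempts;
    for (int retry = 0; retry < maxRetries; retry++)
        total += baseDelay * Math.Pow(2, retry);    // 2 s, 4 s, 8 s, ...
    return total;
}

var worst = WorstCase(maxRetries: 3,
                      attemptTimeout: TimeSpan.FromSeconds(10),
                      baseDelay: TimeSpan.FromSeconds(2));
Console.WriteLine($"Worst case without a total timeout: {worst.TotalSeconds} s");
// prints: Worst case without a total timeout: 54 s
```

Four attempts at 10 s, plus 2 + 4 + 8 s of backoff, gives 54 s. A 30 s total timeout cuts that budget almost in half.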

4. Circuit Breaker — A smart electrical-breaker mechanism

The circuit breaker is arguably the most important resilience pattern. Hitting an ailing service again and again only piles on more load. Instead, the circuit breaker trips automatically and fails fast, much like an electrical breaker that cuts the circuit to protect the wiring.

```mermaid
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure ratio exceeds threshold
    Open --> HalfOpen : After break duration
    HalfOpen --> Closed : Probe request succeeds
    HalfOpen --> Open : Probe request fails
    Open --> Isolated : Manual isolate
    Isolated --> Closed : Manual close

    note right of Closed : Requests pass normally\nSampling failure ratio
    note right of Open : Requests rejected immediately\nThrows BrokenCircuitException
    note right of HalfOpen : Allows one probe request\nDecides close/reopen
```

The Circuit Breaker state machine — 4 states
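To make the transitions concrete, here is a deliberately simplified, single-threaded sketch of that state machine. It is not Polly's implementation. It uses a fixed sampling window instead of Polly's rolling one, has no locking, omits the manual Isolated state, and receives the current time as a parameter so it can be tested. The thresholds (20% failure ratio, minimum throughput 8, 10 s sampling, 30 s break) match the PaymentPipeline example in section 4.1. `TryAcquire` and `Record` are hypothetical names.

```csharp
using System;

// Thresholds (same values as the PaymentPipeline example in section 4.1)
const double FailureRatio = 0.2;
const int MinimumThroughput = 8;
var samplingDuration = TimeSpan.FromSeconds(10);
var breakDuration = TimeSpan.FromSeconds(30);

var state = "Closed";
int successes = 0, failures = 0;
bool probeInFlight = false;
DateTime windowStart = DateTime.MinValue, openedAt = DateTime.MinValue;

// Asks the breaker whether a call may go out at time `now`.
bool TryAcquire(DateTime now)
{
    if (state == "Open" && now - openedAt >= breakDuration)
        state = "HalfOpen";                 // break elapsed: time to probe
    if (state == "Open") return false;      // fail fast
    if (state == "HalfOpen")
    {
        if (probeInFlight) return false;    // only one probe at a time
        probeInFlight = true;
    }
    return true;
}

// Reports the outcome of a call that TryAcquire let through.
void Record(bool success, DateTime now)
{
    if (state == "HalfOpen")
    {
        probeInFlight = false;
        if (success) { state = "Closed"; successes = failures = 0; windowStart = now; }
        else { state = "Open"; openedAt = now; }
        return;
    }
    if (now - windowStart >= samplingDuration)
    {
        successes = failures = 0;           // start a fresh sampling window
        windowStart = now;
    }
    if (success) successes++; else failures++;
    int total = successes + failures;
    // MinimumThroughput guard: with too few calls the ratio is ignored.
    if (total >= MinimumThroughput && (double)failures / total >= FailureRatio)
    {
        state = "Open";
        openedAt = now;
    }
}

var t0 = new DateTime(2026, 1, 1, 12, 0, 0);
for (int i = 0; i < 8; i++)
{
    TryAcquire(t0);
    Record(success: i < 6, t0);             // 6 successes, then 2 failures (25%)
}
Console.WriteLine(state);                           // Open
Console.WriteLine(TryAcquire(t0.AddSeconds(5)));    // False: still breaking
Console.WriteLine(TryAcquire(t0.AddSeconds(31)));   // True: the probe goes out
Record(success: true, t0.AddSeconds(31));
Console.WriteLine(state);                           // Closed
```

The run trips the circuit on the eighth call, because 2 failures out of 8 is 25%. Five seconds later it still fails fast. Once the 30 s break has elapsed, it lets a single probe through, and that probe's success closes the circuit again.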

4.1. Detailed Circuit Breaker configuration

PaymentClientResilience.cs

```csharp
using System.Net;
using System.Net.Http;
using Microsoft.Extensions.Http.Resilience;
using Polly;
using Polly.Timeout;

builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddResilienceHandler("PaymentPipeline", static pipelineBuilder =>
    {
        // Strategies added first are outermost. Adding Retry before the
        // circuit breaker means every attempt is counted, not just the final outcome.

        // Retry: up to 5 retries with exponential backoff + jitter
        pipelineBuilder.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 5,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(500),
            UseJitter = true
        });

        // Circuit Breaker: trip when 20% of requests fail
        pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.2,                              // 20% failure → open circuit
            SamplingDuration = TimeSpan.FromSeconds(10),     // 10-second sampling window
            MinimumThroughput = 8,                           // Need at least 8 requests for stats
            BreakDuration = TimeSpan.FromSeconds(30),        // Keep circuit open for 30s
            // Count network errors and attempt timeouts too, not only bad status codes
            ShouldHandle = static args => ValueTask.FromResult(
                args.Outcome.Exception is HttpRequestException or TimeoutRejectedException ||
                args.Outcome.Result?.StatusCode is
                    HttpStatusCode.RequestTimeout or
                    HttpStatusCode.TooManyRequests or
                    HttpStatusCode.InternalServerError or
                    HttpStatusCode.ServiceUnavailable)
        });

        // Timeout: 5 seconds per attempt (innermost)
        pipelineBuilder.AddTimeout(TimeSpan.FromSeconds(5));
    });
```

Watch out for MinimumThroughput

If MinimumThroughput isn't reached within SamplingDuration, the circuit breaker ignores the failure ratio. This keeps the circuit from tripping under low traffic (e.g. 2 failed requests out of only 2 total in 10 s). Set the value to match your service's real traffic.

4.2. Dynamic Break Duration

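Instead of a fixed 30-second break, you can lengthen the break each time the circuit re-opens. This amounts to exponential backoff at the circuit-breaker level. Be careful which counter you use: BreakDurationGeneratorArguments.FailureCount is the number of failures in the current sampling window, not the number of times the circuit has opened. Recent Polly v8 releases also expose HalfOpenAttempts, which counts consecutive half-open probes. Check that your Polly version has it:

```csharp
pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
    BreakDurationGenerator = static args =>
    {
        // Assumes HalfOpenAttempts = failed probes since the circuit last closed
        // (0 on the first opening): 15s, 30s, 60s, then capped at 120s
        var duration = TimeSpan.FromSeconds(
            Math.Min(15 * Math.Pow(2, args.HalfOpenAttempts), 120));
        return ValueTask.FromResult(duration);
    }
});
```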
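You can check the doubling schedule itself in isolation. This sketch applies the same 15 s, doubling, 120 s cap formula to a plain integer, so no circuit breaker is needed. `reopenCount` is a hypothetical count of how many times the circuit has already re-opened (0 for the first opening), and `BreakFor` is a hypothetical helper.

```csharp
using System;

// 15 s doubled for each consecutive re-opening, capped at 120 s.
TimeSpan BreakFor(int reopenCount) =>
    TimeSpan.FromSeconds(Math.Min(15 * Math.Pow(2, reopenCount), 120));

for (int n = 0; n < 5; n++)
    Console.WriteLine($"opening #{n + 1}: {BreakFor(n).TotalSeconds} s");
// opening #1: 15 s, #2: 30 s, #3: 60 s, #4: 120 s, #5: 120 s (capped)
```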

5. The Retry Pattern — The art of retrying properly

Retry sounds simple, but done wrong it causes disasters. Three golden rules:

1. Exponential backoff: grow the gap between retries.
2. Jitter: add randomness to avoid thundering herds.
3. Idempotency: only retry safe or idempotent operations.

5.1. Exponential Backoff + Jitter

Without jitter, 1,000 clients retrying at the same 1 s, 2 s, 4 s, 8 s marks create periodic traffic spikes. Jitter spreads retries randomly, reducing server pressure.

```mermaid
graph LR
    subgraph Without_Jitter["No Jitter"]
        A1["t=1s: 1000 req at once"] --> A2["t=2s: 1000 req at once"] --> A3["t=4s: 1000 req at once"]
    end
    subgraph With_Jitter["With Jitter"]
        B1["t=0.8-1.2s: 1000 req spread out"] --> B2["t=1.6-2.4s: 1000 req spread out"] --> B3["t=3.2-4.8s: 1000 req spread out"]
    end
    style Without_Jitter fill:#ffebee,stroke:#c62828
    style With_Jitter fill:#e8f5e9,stroke:#2e7d32
    style A1 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style A2 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style A3 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style B1 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
    style B2 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
    style B3 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
```

Jitter spreads retry traffic, preventing retry storms
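Here is a minimal sketch of exponential backoff with ±20% uniform jitter, the spread pictured in the diagram above. Polly's built-in jitter uses a different formula (a decorrelated-jitter algorithm), so treat this only as an illustration of the idea. `BackoffWithJitter` is a hypothetical helper.

```csharp
using System;

// Delay before retry number `retry` (0-based): base * 2^retry, scaled by a
// random factor in [0.8, 1.2) so clients that failed together drift apart.
TimeSpan BackoffWithJitter(int retry, TimeSpan baseDelay, Random random)
{
    double exponential = baseDelay.TotalMilliseconds * Math.Pow(2, retry);
    double factor = 0.8 + random.NextDouble() * 0.4;
    return TimeSpan.FromMilliseconds(exponential * factor);
}

var rng = new Random(42); // fixed seed only to make the demo repeatable
for (int client = 1; client <= 3; client++)
{
    var delays = new string[3];
    for (int n = 0; n < 3; n++)
        delays[n] = $"{BackoffWithJitter(n, TimeSpan.FromSeconds(1), rng).TotalSeconds:F2}s";
    Console.WriteLine($"client {client}: {string.Join(", ", delays)}");
}
```

Each client draws a fresh random factor for every retry. Clients that failed at the same instant therefore stop retrying in lockstep, while the exponential growth still gives the server room to recover.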

5.2. Disable retries for unsafe methods

A POST that creates an order, retried 3 times, could create up to 4 orders (the original attempt plus 3 retries). Microsoft.Extensions.Http.Resilience provides a clean API to prevent this:

```csharp
builder.Services
    .AddHttpClient<OrderClient>()
    .AddStandardResilienceHandler(options =>
    {
        // Disables retries for POST, PATCH, PUT, DELETE and CONNECT;
        // GET, HEAD, OPTIONS and TRACE are still retried
        options.Retry.DisableForUnsafeHttpMethods();
    });
```

When CAN you retry POST?

If the downstream API supports an idempotency key (e.g. Stripe's Idempotency-Key header or PayPal's PayPal-Request-Id), you can retry POST safely, because the server deduplicates requests that carry the same key. In that case, don't disable POST retries. Instead, generate one unique key per logical operation and make sure every retry of that operation resends the same key.
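The important detail is that the key identifies the operation, not the attempt. Here is a minimal sketch. It assumes the downstream reads a header named `Idempotency-Key` (Stripe's convention), and the `WithIdempotencyKey` helper and the endpoint URL are hypothetical. The key is generated once and attached to the request before it enters the resilience pipeline, so every attempt carries the same value. Calling the helper again never replaces it.

```csharp
using System;
using System.Linq;
using System.Net.Http;

// Attaches an idempotency key unless one is already present, so a second
// call (e.g. from a retry path) never replaces the original key.
HttpRequestMessage WithIdempotencyKey(HttpRequestMessage message, string key)
{
    if (!message.Headers.Contains("Idempotency-Key"))
        message.Headers.Add("Idempotency-Key", key);
    return message;
}

// One key per logical operation ("create order 1234"), generated ONCE.
string operationKey = Guid.NewGuid().ToString();
using var request = new HttpRequestMessage(HttpMethod.Post, "https://payment-api.internal/charges");

// Simulated attempts: every retry path reuses the same request and key.
for (int attempt = 1; attempt <= 3; attempt++)
{
    WithIdempotencyKey(request, operationKey);
    string sent = request.Headers.GetValues("Idempotency-Key").Single();
    Console.WriteLine($"attempt {attempt}: Idempotency-Key = {sent}");
}
```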

6. Hedging — Parallel requests to reduce latency

Hedging is an advanced strategy: when the first request is slow, fire additional requests in parallel to another endpoint (or the same one) and take whichever responds first. It is particularly useful with multiple replicas or a multi-region deployment.

Program.cs — Hedging with A/B routing

```csharp
using Microsoft.Extensions.Http.Resilience;

builder.Services
    .AddHttpClient<SearchClient>()
    .AddStandardHedgingHandler(routingBuilder =>
    {
        routingBuilder.ConfigureWeightedGroups(options =>
        {
            // Pick an endpoint by weight for every attempt, hedged ones included
            options.SelectionMode = WeightedGroupSelectionMode.EveryAttempt;
            options.Groups.Add(new WeightedUriEndpointGroup
            {
                Endpoints =
                {
                    new() { Uri = new("https://search-primary.internal"), Weight = 70 },
                    new() { Uri = new("https://search-secondary.internal"), Weight = 30 }
                }
            });
        });
    });
```

| Characteristic | Retry | Hedging |
| --- | --- | --- |
| When to fire the next request? | After the previous one fails | After a delay (default 2 s) regardless of the previous |
| Concurrent requests | 1 | Many (up to 10 by default) |
| Primary use case | Transient failures, 500/408/429 responses | High tail latency, multi-region, read-heavy |
| Cost | Low: sequential requests | Higher: parallel requests consume resources |
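The mechanics in the table can be illustrated with plain `Task.WhenAny`. This is a minimal sketch, not how `AddStandardHedgingHandler` works internally. It hedges exactly once, and it treats a fast failure of the first request as final, where a real hedging strategy would move on to another attempt. `HedgeAsync` and the simulated replicas are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Starts `primary`; if it has not finished after `hedgingDelay`, starts
// `secondary` in parallel and returns whichever completes first.
async Task<T> HedgeAsync<T>(
    Func<CancellationToken, Task<T>> primary,
    Func<CancellationToken, Task<T>> secondary,
    TimeSpan hedgingDelay)
{
    using var cts = new CancellationTokenSource();
    Task<T> first = primary(cts.Token);
    Task delay = Task.Delay(hedgingDelay, cts.Token);

    if ((await Task.WhenAny(first, delay)) == first)
    {
        cts.Cancel();                           // fast path: stop the pending delay
        return await first;
    }

    Task<T> second = secondary(cts.Token);      // the hedged request
    Task<T> winner = await Task.WhenAny(first, second);
    cts.Cancel();                               // tell the slower request to give up
    return await winner;
}

string result = await HedgeAsync(
    async ct => { await Task.Delay(1000, ct); return "primary (slow replica)"; },
    async ct => { await Task.Delay(50, ct); return "secondary (fast replica)"; },
    TimeSpan.FromMilliseconds(100));
Console.WriteLine(result);  // secondary (fast replica)
```

The trade-off from the table is visible here: the hedged request costs a second call to a replica, and in exchange the caller waits about 150 ms instead of a full second.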

7. Timeout strategy — Two-layer protection

Timeouts sound simple, but they cause some of the most common bugs. With no timeout, a request can hang indefinitely, along with the connection and memory it holds. With a timeout that is too short, legitimate requests get cancelled. Polly addresses this with two layers of timeout:

```csharp
using Microsoft.Extensions.Http.Resilience;
using Polly;
using Polly.Timeout;

builder.Services
    .AddHttpClient<ReportClient>()
    .AddResilienceHandler("ReportPipeline", pipeline =>
    {
        // Total timeout (outermost): the whole operation, including retries, ≤ 60s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(60),
            Name = "TotalTimeout"
        });

        // Retry up to 3 times
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromSeconds(1)
        });

        // Per-attempt timeout (innermost): each individual request ≤ 15s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(15),
            Name = "AttemptTimeout"
        });
    });
```

Order matters!

Polly nests strategies in the order you add them: the first strategy added is the outermost. Total Timeout must therefore be added first (before Retry), and Attempt Timeout last (after Retry). If you reverse them, the total timeout only covers one attempt instead of the whole operation.
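The nesting rule can be demonstrated without Polly. In this sketch each hypothetical "strategy" is just a function that wraps the rest of the pipeline. `Build` wraps them so that the first one in the list ends up outermost, mirroring how the strategies added first to a Polly pipeline builder sit outermost. `Named` and `Build` are illustrative names, not Polly APIs.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// A stand-in "strategy": logs on the way in and out, calling the rest of the pipeline in between.
Func<Func<string>, Func<string>> Named(string name) =>
    next => () =>
    {
        log.Add($"enter {name}");
        string result = next();
        log.Add($"exit {name}");
        return result;
    };

// Wraps from the last strategy to the first, so strategies[0] ends up outermost.
Func<string> Build(IReadOnlyList<Func<Func<string>, Func<string>>> strategies, Func<string> callback)
{
    Func<string> pipeline = callback;
    for (int i = strategies.Count - 1; i >= 0; i--)
        pipeline = strategies[i](pipeline);
    return pipeline;
}

var run = Build(
    new[] { Named("TotalTimeout"), Named("Retry"), Named("AttemptTimeout") },
    () => { log.Add("HTTP call"); return "200 OK"; });

Console.WriteLine(run());                    // 200 OK
Console.WriteLine(string.Join(" -> ", log));
// enter TotalTimeout -> enter Retry -> enter AttemptTimeout -> HTTP call
//   -> exit AttemptTimeout -> exit Retry -> exit TotalTimeout
```

Because TotalTimeout is entered first and exited last, it spans everything Retry does. AttemptTimeout only ever sees a single HTTP call.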

8. Integrating with OpenTelemetry

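Resilience patterns are of little use if you can't see how they behave. Polly v8 reports metrics through System.Diagnostics.Metrics under a meter named Polly, plus logs for resilience events, and OpenTelemetry can export both. Distributed traces come from the HttpClient instrumentation.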

Program.cs — Observability

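```csharp
// Packages: OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.Http,
// OpenTelemetry.Exporter.OpenTelemetryProtocol, OpenTelemetry.Exporter.Prometheus.AspNetCore
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddMeter("Polly");           // Polly metrics: resilience events, attempt durations
        metrics.AddMeter("System.Net.Http"); // HttpClient metrics
        metrics.AddPrometheusExporter();     // serve it with app.MapPrometheusScrapingEndpoint()
    })
    .WithTracing(tracing =>
    {
        tracing.AddHttpClientInstrumentation();
        tracing.AddOtlpExporter();
    });
```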

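Key Polly metrics to watch. These are the names as emitted by the Polly meter; the Prometheus exporter renders the dots as underscores and may add unit suffixes:

| Metric | Meaning | Alert when |
| --- | --- | --- |
| `resilience.polly.strategy.events` with `event.name` = `OnRetry` | A retry was scheduled | Sudden spike → downstream is struggling |
| `resilience.polly.strategy.events` with `event.name` = `OnCircuitOpened` | A circuit breaker opened | No matching `OnCircuitClosed` within 5 minutes |
| `resilience.polly.strategy.events` with `event.name` = `OnRateLimiterRejected` | The rate limiter rejected a request | Sustained non-zero rate |
| `resilience.polly.strategy.attempt.duration` | Time per execution attempt | p99 close to the attempt timeout |
| `resilience.polly.pipeline.duration` | Time for the whole pipeline, retries included | p99 close to the total timeout |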

9. Dynamic reload — Changing config at runtime

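One of the most useful features of Microsoft.Extensions.Http.Resilience on .NET 10 is that you can change resilience configuration without restarting the app. You bind the standard handler's options to a configuration section (for example in appsettings.json). Changes are then picked up at runtime whenever the configuration source reloads on change, which appsettings.json does by default.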

appsettings.json

```json
{
  "PaymentResilience": {
    "Retry": {
      "BackoffType": "Exponential",
      "UseJitter": true,
      "MaxRetryAttempts": 3,
      "Delay": "00:00:01"
    },
    "CircuitBreaker": {
      "FailureRatio": 0.1,
      "SamplingDuration": "00:00:30",
      "MinimumThroughput": 100,
      "BreakDuration": "00:00:05"
    },
    "TotalRequestTimeout": {
      "Timeout": "00:00:30"
    }
  }
}
```

Program.cs — Binding the config

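```csharp
var resilienceSection = builder.Configuration.GetSection("PaymentResilience");

// Bind the section to THIS client's standard pipeline. A plain
// Configure<HttpStandardResilienceOptions>(section) only sets the unnamed
// options instance, which the per-client handler does not read.
builder.Services
    .AddHttpClient<PaymentClient>()
    .AddStandardResilienceHandler(resilienceSection);
```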

When ops need to raise retries from 3 to 5 or extend the timeout from 30 s to 60 s, they only change configuration. There are no code changes and no rebuilds, and when the configuration source reloads on change, no restart either.
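A config change can now alter the pipeline without a code review, so it helps to check the values before shipping them. The library validates the standard options. To my knowledge, the rules include that the total request timeout must exceed the attempt timeout, and that the circuit breaker's sampling duration must be at least twice the attempt timeout. The sketch below re-implements just those two checks as a hypothetical pre-deployment helper (`CheckResilienceConfig`), parsing the same `hh:mm:ss` strings the JSON uses. The 10 s attempt timeout is the library default, assumed because the section above omits it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical pre-deployment check: mirrors two validation rules of the
// standard resilience options (as I understand them), not the full set.
List<string> CheckResilienceConfig(string totalTimeout, string attemptTimeout, string samplingDuration)
{
    var total = TimeSpan.Parse(totalTimeout, CultureInfo.InvariantCulture);
    var attempt = TimeSpan.Parse(attemptTimeout, CultureInfo.InvariantCulture);
    var sampling = TimeSpan.Parse(samplingDuration, CultureInfo.InvariantCulture);
    var errors = new List<string>();

    if (total <= attempt)
        errors.Add($"Total timeout {total} must be greater than attempt timeout {attempt}.");
    if (sampling < attempt * 2)
        errors.Add($"Sampling duration {sampling} must be at least 2x the attempt timeout {attempt}.");
    return errors;
}

// Values from the appsettings.json above, plus the default 10 s attempt timeout.
Console.WriteLine(CheckResilienceConfig("00:00:30", "00:00:10", "00:00:30").Count); // 0

// Raising the attempt timeout to 20 s without touching SamplingDuration fails.
foreach (var error in CheckResilienceConfig("00:00:30", "00:00:20", "00:00:30"))
    Console.WriteLine(error);
```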

10. Production best practices

Rule 1

Always set a timeout. Every HTTP call must have a timeout. No exceptions. HttpClient's default isn't "none" — it's 100 s, which is still way too long. Set a total timeout matching the downstream's SLA.

Rule 2

A circuit breaker for every dependency. Each downstream service needs its own circuit breaker. Use named HttpClient + AddResilienceHandler to keep them separate — Payment's circuit should never affect Inventory.

Rule 3

Don't retry write operations (unless you have an idempotency key). POST, PUT, DELETE — each retry can produce a side effect. DisableForUnsafeHttpMethods() is the safest choice.

Rule 4

Monitor circuit-breaker state. An open circuit is an important warning sign. Wire up OpenTelemetry + alerts when the circuit stays open more than 5 minutes — at that point it's not a transient fault, it's a real incident.

Rule 5

Test resilience behavior. Use chaos engineering to inject timeouts and exceptions in staging. Simmy, Polly's fault-injection library, now ships its chaos strategies inside Polly v8. Don't wait for production to fail to find out whether your circuit breaker works.

Rule 6

Only one resilience handler per HttpClient. Don't stack AddStandardResilienceHandler() with AddResilienceHandler() — use RemoveAllResilienceHandlers() if you need a custom pipeline to replace the standard one.

11. A real-world example — Resilient e-commerce architecture

Below is a complete resilience setup for an e-commerce system with 3 downstream services, each with its own pipeline tuned to the traffic's characteristics:

Program.cs — Production-grade setup

```csharp
using Microsoft.Extensions.Http.Resilience;
using Polly;

var builder = WebApplication.CreateBuilder(args);

// Payment: critical, no POST retry, tight circuit breaker
builder.Services
    .AddHttpClient<PaymentClient>(c => c.BaseAddress = new("https://payment.internal"))
    .AddResilienceHandler("Payment", pipeline =>
    {
        var retry = new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(500),
            UseJitter = true
        };
        retry.DisableFor(HttpMethod.Post);  // DisableFor is a method, not a property: don't retry payment POSTs

        pipeline.AddTimeout(TimeSpan.FromSeconds(20));  // total budget (outermost)
        pipeline.AddRetry(retry);
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.15,
            MinimumThroughput = 20,
            SamplingDuration = TimeSpan.FromSeconds(15),
            BreakDuration = TimeSpan.FromSeconds(60)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(8));   // per attempt (innermost)
    });

// Inventory: read-heavy, retry freely
builder.Services
    .AddHttpClient<InventoryClient>(c => c.BaseAddress = new("https://inventory.internal"))
    .AddStandardResilienceHandler();

// Notification: non-critical, short timeout, fail silently
builder.Services
    .AddHttpClient<NotificationClient>(c => c.BaseAddress = new("https://notify.internal"))
    .AddResilienceHandler("Notification", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(5));   // total budget
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 1,
            Delay = TimeSpan.FromMilliseconds(200)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));   // per attempt
    });
```

Fallback for Notification

Notification is non-critical — even if an email fails to send, the order must still succeed. In practice, wrap the notification call in try-catch and log the error instead of letting the exception bubble up. Better yet: push notifications to a message queue (RabbitMQ, Azure Service Bus) for async processing.
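Here is a minimal sketch of that try-catch approach. It uses a hypothetical `NotifyBestEffortAsync` helper, with a delegate standing in for the real `NotificationClient` call. Any failure is logged and swallowed, except a cancellation requested by the caller, which still propagates. If you prefer to keep this logic inside the pipeline, Polly also offers a dedicated fallback strategy.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Runs a non-critical call. Any failure (HttpRequestException, a timeout,
// an open circuit, ...) is logged and reported as `false` instead of being
// thrown into the order flow. Cancellation requested by the caller still propagates.
async Task<bool> NotifyBestEffortAsync(
    Func<CancellationToken, Task> send,
    Action<string> log,
    CancellationToken cancellationToken)
{
    try
    {
        await send(cancellationToken);
        return true;
    }
    catch (Exception ex) when (!cancellationToken.IsCancellationRequested)
    {
        log($"Notification skipped: {ex.GetType().Name}: {ex.Message}");
        return false;
    }
}

// Simulated outage of the notification endpoint.
bool sent = await NotifyBestEffortAsync(
    _ => throw new HttpRequestException("notify.internal unreachable"),
    Console.WriteLine,
    CancellationToken.None);
Console.WriteLine($"Order confirmed; email sent: {sent}");
// Notification skipped: HttpRequestException: notify.internal unreachable
// Order confirmed; email sent: False
```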

12. Anti-patterns to avoid

| Anti-pattern | Problem | Solution |
| --- | --- | --- |
| Retry without backoff | Immediate retries → retry storm crushes the server | Always use exponential backoff + jitter |
| Retry everything | Retrying 404 or 401, errors that never self-heal | Only retry transient failures: 408, 429, 500, 502, 503, 504, HttpRequestException, and timeouts (Polly's TimeoutRejectedException) |
| Shared circuit breaker | One circuit breaker for all downstreams → Payment failures also trip Inventory | Each downstream gets a named HttpClient + its own circuit |
| Checking circuit state before Execute | Race condition, and the circuit never gets its HalfOpen probe | Always call Execute and let Polly handle state |
| Too many retries | 10 retries with a 5 s attempt timeout can take 55 s plus backoff; the user left long ago | 3-5 retries max, total timeout ≤ SLA |
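The "retry only what can heal" row can be expressed as a small predicate. This is a hypothetical sketch of the rule in the table, not the library's own check; `HttpRetryStrategyOptions` already ships with a transient-error predicate by default. Polly's timeout strategy reports `TimeoutRejectedException` rather than the BCL `TimeoutException` used here, so a Polly-based predicate would test for that type instead.

```csharp
#nullable enable
using System;
using System.Net;
using System.Net.Http;

// Retry only failures that can heal on their own. `status` is the response
// code (null if no response arrived); `error` is the exception (null if none).
bool IsRetryable(HttpStatusCode? status, Exception? error) => error switch
{
    HttpRequestException => true,   // connection refused/reset, DNS failure, ...
    TimeoutException => true,       // an attempt timed out
    null => status is HttpStatusCode.RequestTimeout   // 408
        or HttpStatusCode.TooManyRequests             // 429
        or HttpStatusCode.InternalServerError         // 500
        or HttpStatusCode.BadGateway                  // 502
        or HttpStatusCode.ServiceUnavailable          // 503
        or HttpStatusCode.GatewayTimeout,             // 504
    _ => false                      // any other exception: don't guess
};

Console.WriteLine(IsRetryable(HttpStatusCode.ServiceUnavailable, null)); // True
Console.WriteLine(IsRetryable(HttpStatusCode.NotFound, null));           // False
Console.WriteLine(IsRetryable(HttpStatusCode.Unauthorized, null));       // False
Console.WriteLine(IsRetryable(null, new HttpRequestException()));        // True
```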

Conclusion

Resilience patterns aren't a nice-to-have; they are a hard requirement for any microservices system running in production. With Polly v8 and Microsoft.Extensions.Http.Resilience on .NET 10, integration is easier than ever. One line of AddStandardResilienceHandler() covers the common cases, and custom pipelines cover the rest. Combined with OpenTelemetry for monitoring, you get a system that recovers from transient faults on its own and shows you when it is doing so.


# Resilience Patterns on .NET 10 — Polly, Circuit Breaker, and Retry for Microservices

In a microservices architecture, one service calling another over HTTP is a daily occurrence. But the network is never trustworthy — timeouts, overloaded servers, DNS flaps, or simply a deployment rolling update. Without a strategy for handling transient faults, a single slow service can bring the whole system down in a domino effect. This article digs into **Resilience Patterns** on .NET 10 with **Polly** and the **Microsoft.Extensions.Http.Resilience** package — the industry-standard toolkit that lets an application recover from failures on its own.

350M+ Polly downloads on NuGet

5 layers Standard Resilience Pipeline

v10.4 Microsoft.Extensions.Http.Resilience

< 3ms Average overhead per request

## 1. Why do we need Resilience Patterns?

Imagine an e-commerce system with 20 microservices. The *Order* service calls *Payment*, *Payment* calls *Fraud Detection*, *Fraud Detection* calls *ML Scoring*. When ML Scoring slows down to 10 seconds instead of the usual 200 ms, what happens?

```
Cascading failure — one slow service drags the entire chain down

Without resilience patterns, each service's thread pool fills up with requests waiting on downstream. When the thread pool is exhausted, the service can't serve any request — even ones unrelated to ML Scoring. That's a **cascading failure**, and it can take down an entire system in minutes.

#### Resilience ≠ just retry

Many developers think resilience is simply "try again on failure". In reality, retrying incorrectly makes things worse — thousands of clients retrying in unison create a **retry storm** that crushes an already-overloaded service. Resilience patterns are the smart combination of retry, circuit breaker, timeout, rate limiter, and fallback.

## 2. Polly v8 — A brand-new architecture

Polly v8 (current on .NET 10) has been rewritten from scratch around the **Resilience Pipeline** architecture — completely replacing the old Policy-based API. Pipelines let you stack multiple strategies in a defined order, with each strategy operating independently.

```
graph TB
    subgraph Pipeline["Resilience Pipeline"]
        direction TB
        RL["1. Rate Limiter"] --> TT["2. Total Timeout (30s)"]
        TT --> RT["3. Retry (3 attempts, exponential)"]
        RT --> CB["4. Circuit Breaker"]
        CB --> AT["5. Attempt Timeout (10s)"]
    end
    REQ["HTTP Request"] --> RL
    AT --> SVC["Downstream Service"]
    style Pipeline fill:#f8f9fa,stroke:#e0e0e0
    style REQ fill:#e94560,stroke:#fff,color:#fff
    style SVC fill:#2c3e50,stroke:#fff,color:#fff
    style RL fill:#fff,stroke:#e94560,color:#2c3e50
    style TT fill:#fff,stroke:#e94560,color:#2c3e50
    style RT fill:#fff,stroke:#e94560,color:#2c3e50
    style CB fill:#fff,stroke:#e94560,color:#2c3e50
    style AT fill:#fff,stroke:#e94560,color:#2c3e50

```
The 5 strategies in the Standard Resilience Pipeline (outermost → innermost)

## 3. Fast integration with AddStandardResilienceHandler

The fastest way to add resilience to an HttpClient on .NET 10 is `AddStandardResilienceHandler()` from the `Microsoft.Extensions.Http.Resilience` package. One line of code gives you five production-tuned layers of protection.

Program.cs

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register HttpClient with the standard resilience pipeline
builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddStandardResilienceHandler();

var app = builder.Build();
await app.RunAsync();
```
With the defaults, the pipeline will:

| Order | Strategy | Default | Effect |
| --- | --- | --- | --- |
| 1 | **Rate Limiter** | 1,000 concurrent permits | Prevents the client from firing too many simultaneous requests |
| 2 | **Total Timeout** | 30 seconds | Caps total time including retries |
| 3 | **Retry** | 3 attempts, exponential backoff + jitter | Automatically retries on transient errors |
| 4 | **Circuit Breaker** | 10% failure ratio, 100 min throughput | Trips when the downstream fails repeatedly |
| 5 | **Attempt Timeout** | 10 seconds | Caps the time for a single attempt |

#### Why two timeouts?

**Attempt Timeout** (10 s) bounds each attempt — if one request exceeds 10 s, it's cancelled to make room for the next retry. **Total Timeout** (30 s) is the overall "time budget" — regardless of retry count, total time stays under 30 s. Without a total timeout, 3 retries × 10 s = 30 s + backoff delay can stretch past 40 s.

## 4. Circuit Breaker — A smart electrical-breaker mechanism

```
stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure ratio exceeds threshold
    Open --> HalfOpen : After break duration
    HalfOpen --> Closed : Probe request succeeds
    HalfOpen --> Open : Probe request fails
    Open --> Isolated : Manual isolate
    Isolated --> Closed : Manual close

note right of Closed : Requests pass normally\nSampling failure ratio
    note right of Open : Requests rejected immediately\nThrows BrokenCircuitException
    note right of HalfOpen : Allows one probe request\nDecides close/reopen

```
The Circuit Breaker state machine — 4 states

### 4.1. Detailed Circuit Breaker configuration

PaymentClientResilience.cs

```csharp
builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddResilienceHandler("PaymentPipeline", static pipelineBuilder =>
    {
        // Circuit Breaker: trip when 20% of requests fail
        pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.2,                              // 20% failure → open circuit
            SamplingDuration = TimeSpan.FromSeconds(10),     // 10-second sampling window
            MinimumThroughput = 8,                           // Need at least 8 requests for stats
            BreakDuration = TimeSpan.FromSeconds(30),        // Keep circuit open for 30s
            ShouldHandle = static args => ValueTask.FromResult(args is
            {
                Outcome.Result.StatusCode:
                    HttpStatusCode.RequestTimeout or
                    HttpStatusCode.TooManyRequests or
                    HttpStatusCode.InternalServerError or
                    HttpStatusCode.ServiceUnavailable
            })
        });

// Retry: 5 attempts with exponential backoff
        pipelineBuilder.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 5,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(500),
            UseJitter = true
        });

// Timeout: 5 seconds per attempt
        pipelineBuilder.AddTimeout(TimeSpan.FromSeconds(5));
    });
```

#### Watch out for MinimumThroughput

If `MinimumThroughput` isn't reached within `SamplingDuration`, the circuit breaker **ignores the failure ratio**. This keeps the circuit from tripping under low traffic (e.g. 2 failed requests out of only 2 total in 10 s). Set the value to match your service's real traffic.

### 4.2. Dynamic Break Duration

Instead of a fixed 30-second break, you can increase the break duration as the circuit opens repeatedly — exponential backoff at the circuit-breaker level:

```csharp
pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
    BreakDurationGenerator = static args =>
    {
        // 1st: 15s, 2nd: 30s, 3rd: 60s, max 120s
        var duration = TimeSpan.FromSeconds(
            Math.Min(15 * Math.Pow(2, args.FailureCount - 1), 120));
        return ValueTask.FromResult(duration);
    }
});
```

## 5. The Retry Pattern — The art of retrying properly

Retry sounds simple but done wrong it causes disasters. Three golden rules:

Exponential Backoff — grow the gap between retries

Jitter Add randomness to avoid thundering herds

Idempotent Only retry safe/idempotent operations

### 5.1. Exponential Backoff + Jitter

Without jitter, 1,000 clients retrying at the same 1 s, 2 s, 4 s, 8 s marks create periodic traffic spikes. Jitter spreads retries randomly, reducing server pressure.

```
graph LR
    subgraph Without_Jitter["No Jitter"]
        A1["t=1s: 1000 req"] --> A2["t=2s: 1000 req"] --> A3["t=4s: 1000 req"]
    end
    subgraph With_Jitter["With Jitter"]
        B1["t=0.8-1.2s: ~330 req"] --> B2["t=1.6-2.4s: ~330 req"] --> B3["t=3.2-4.8s: ~340 req"]
    end
    style Without_Jitter fill:#ffebee,stroke:#c62828
    style With_Jitter fill:#e8f5e9,stroke:#2e7d32
    style A1 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style A2 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style A3 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style B1 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
    style B2 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
    style B3 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50

```
Jitter spreads retry traffic, preventing retry storms

### 5.2. Disable retries for unsafe methods

A POST that creates an order, retried 3 times, could create 3 orders. .NET 10 provides a clean API:

```csharp
builder.Services
    .AddHttpClient<OrderClient>()
    .AddStandardResilienceHandler(options =>
    {
        // Disable retries for POST, PUT, DELETE — only retry GET/HEAD
        options.Retry.DisableForUnsafeHttpMethods();
    });
```

#### When CAN you retry POST?

If the downstream API supports an **idempotency key** (e.g. Stripe, PayPal), you can retry POST safely because the server deduplicates based on the key. In that case, don't disable POST retries — just make sure every request carries a unique `Idempotency-Key` header.

## 6. Hedging — Parallel requests to reduce latency

Program.cs — Hedging with A/B routing

```csharp
builder.Services
    .AddHttpClient<SearchClient>()
    .AddStandardHedgingHandler(routingBuilder =>
    {
        routingBuilder.ConfigureWeightedGroups(options =>
        {
            options.SelectionMode = WeightedGroupSelectionMode.EveryAttempt;
            options.Groups.Add(new WeightedUriEndpointGroup
            {
                Endpoints =
                {
                    new() { Uri = new("https://search-primary.internal"), Weight = 70 },
                    new() { Uri = new("https://search-secondary.internal"), Weight = 30 }
                }
            });
        });
    });
```

| Characteristic | Retry | Hedging |
| --- | --- | --- |
| When to fire the next request? | After the previous one fails | After a delay (default 2 s) regardless of the previous |
| Concurrent requests | 1 | Many (up to 10 by default) |
| Primary use case | Transient failures, 500/408/429 responses | High tail latency, multi-region, read-heavy |
| Cost | Low — sequential requests | Higher — parallel requests consume resources |

## 7. Timeout strategy — Two-layer protection

Timeouts sound simple but are among the most common bugs. No timeout → threads block forever. Too short → legitimate requests get cancelled. .NET 10 + Polly solve this with two layers of timeout:

```csharp
builder.Services
    .AddHttpClient<ReportClient>()
    .AddResilienceHandler("ReportPipeline", pipeline =>
    {
        // Total timeout (outermost): the whole operation, including retries and backoff delays, ≤ 60 s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(60),
            Name = "TotalTimeout"
        });

        // Retry up to 3 times (4 attempts in total) with exponential backoff from a 1 s base
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromSeconds(1)
        });

        // Per-attempt timeout (innermost): each individual request ≤ 15 s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(15),
            Name = "AttemptTimeout"
        });
    });
```

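#### Order matters!

Strategies wrap one another in the order they are added: the first one added is the outermost. `TotalTimeout` is added first, so its 60 s budget covers every attempt plus the backoff delays between them. `AttemptTimeout` is added after the retry, so it applies to each attempt individually, and a timed-out attempt can still be retried. Add the 15 s timeout first instead and it would cap the entire retry sequence at 15 s.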
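A quick way to sanity-check the two layers is to compute the worst case by hand. The sketch below does that arithmetic, assuming every attempt runs until its attempt timeout fires and ignoring jitter (`HttpRetryStrategyOptions` enables jitter by default, so real delays vary around the exponential values). `TimeoutBudget` is a hypothetical helper, not part of Polly:

```csharp
using System;

// Back-of-the-envelope worst case for a "total timeout → retry → attempt timeout" pipeline.
public static class TimeoutBudget
{
    // Time spent if nothing outer intervenes: (retries + 1) attempts, each cut off by the
    // attempt timeout, plus exponential delays between them (base, 2×base, 4×base, ...).
    public static TimeSpan WorstCase(int maxRetryAttempts, TimeSpan attemptTimeout, TimeSpan baseDelay)
    {
        var total = TimeSpan.Zero;
        for (var attempt = 0; attempt <= maxRetryAttempts; attempt++)
        {
            total += attemptTimeout;
            if (attempt < maxRetryAttempts)
            {
                total += TimeSpan.FromTicks(baseDelay.Ticks << attempt); // baseDelay × 2^attempt
            }
        }
        return total;
    }

    // What the caller actually waits: the outermost total timeout caps the worst case.
    public static TimeSpan CallerWait(TimeSpan worstCase, TimeSpan totalTimeout) =>
        worstCase < totalTimeout ? worstCase : totalTimeout;
}
```

With `ReportPipeline`'s numbers, `WorstCase(3, 15 s, 1 s)` is 4 × 15 s + (1 + 2 + 4) s = 67 s. That exceeds the 60 s total timeout, so in the worst case the total timeout ends the last attempt early and the caller waits at most 60 s.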

## 8. Integrating with OpenTelemetry

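Resilience patterns mean nothing if you can't see how they're behaving. Polly v8 reports what every strategy does as metrics, through a meter named `Polly`, and as log events. OpenTelemetry exports those metrics alongside HttpClient's own, while traces for the HTTP calls themselves come from the HttpClient instrumentation.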

Program.cs — Observability

```csharp
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddMeter("Polly");           // Polly metrics: resilience events plus attempt and pipeline durations
        metrics.AddMeter("System.Net.Http"); // HttpClient metrics
        metrics.AddPrometheusExporter();
    })
    .WithTracing(tracing =>
    {
        tracing.AddHttpClientInstrumentation();
        tracing.AddOtlpExporter();
    });
```
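The Prometheus exporter registered above only collects metrics; Prometheus still needs an endpoint to scrape. A minimal sketch of the rest of `Program.cs`, assuming the `OpenTelemetry.Exporter.Prometheus.AspNetCore` package, which provides `MapPrometheusScrapingEndpoint` (serving `/metrics` by default):

```csharp
var app = builder.Build();

// Expose the collected metrics (including the Polly meter) for Prometheus to scrape
app.MapPrometheusScrapingEndpoint();

app.Run();
```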
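
Key Polly metrics to watch (OpenTelemetry instrument names; the Prometheus exporter turns the dots into underscores and may add unit or `_total` suffixes):

| Metric | Meaning | Alert when |
| --- | --- | --- |
| `resilience.polly.strategy.events` | Counter of resilience events; the `event.name` tag identifies each one (`OnRetry`, `OnTimeout`, `OnCircuitOpened`, `OnRateLimiterRejected`, ...) | `OnRetry` rate spikes → downstream is struggling |
| `resilience.polly.strategy.attempt.duration` | Duration of each attempt made by the retry and hedging strategies | p99 approaches the attempt timeout |
| `resilience.polly.pipeline.duration` | Duration of the whole pipeline execution, retries included | p99 approaches the total timeout |
| `resilience.polly.strategy.events` where `event.name` = `OnCircuitOpened` | A circuit breaker opened | No matching `OnCircuitClosed` within 5 minutes |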

## 9. Dynamic reload — Changing config at runtime

One of Polly's most powerful features on .NET 10: you can change resilience configuration without restarting the app. Config binds from `appsettings.json` and auto-reloads when the file changes.

appsettings.json

```json
{
  "PaymentResilience": {
    "Retry": {
      "BackoffType": "Exponential",
      "UseJitter": true,
      "MaxRetryAttempts": 3,
      "Delay": "00:00:01"
    },
    "CircuitBreaker": {
      "FailureRatio": 0.1,
      "SamplingDuration": "00:00:30",
      "MinimumThroughput": 100,
      "BreakDuration": "00:00:05"
    },
    "TotalRequestTimeout": {
      "Timeout": "00:00:30"
    }
  }
}
```

Program.cs — Binding the config

```csharp
var resilienceSection = builder.Configuration.GetSection("PaymentResilience");

// Bind the section to this client's own standard-pipeline options;
// the handler picks up changes when the configuration reloads.
builder.Services
    .AddHttpClient<PaymentClient>()
    .AddStandardResilienceHandler(resilienceSection);
```

When ops need to raise retries from 3 to 5 or extend the total timeout from 30 s to 60 s, they only change the configuration: the handler picks up the new values on reload, with no code changes, no rebuild, and no restart.
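Custom pipelines built with `AddResilienceHandler` can reload too, but they have to opt in. A minimal sketch, assuming the `ResilienceHandlerContext` overload of `AddResilienceHandler` with its `EnableReloads` and `GetOptions` methods; `CatalogClient`, the `"CatalogRetry"` options name, and the `CatalogResilience:Retry` section are hypothetical:

```csharp
using Microsoft.Extensions.Http.Resilience;
using Polly;

// Named retry options bound from a (hypothetical) configuration section
builder.Services.Configure<HttpRetryStrategyOptions>(
    "CatalogRetry",
    builder.Configuration.GetSection("CatalogResilience:Retry"));

builder.Services
    .AddHttpClient<CatalogClient>() // hypothetical typed client
    .AddResilienceHandler("CatalogPipeline", (pipeline, context) =>
    {
        // Rebuild this pipeline whenever the "CatalogRetry" options change
        context.EnableReloads<HttpRetryStrategyOptions>("CatalogRetry");

        // Build the retry strategy from the current option values
        var retryOptions = context.GetOptions<HttpRetryStrategyOptions>("CatalogRetry");
        pipeline.AddRetry(retryOptions);
    });
```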

## 10. Production best practices

Rule 1

**Always set a timeout.** Every HTTP call must have a timeout. No exceptions. HttpClient's default isn't "none" — it's 100 s, which is still way too long. Set a total timeout matching the downstream's SLA.
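With the standard handler, matching the SLA is a matter of two options. A minimal sketch, assuming a hypothetical `ShippingClient` whose downstream promises answers within 5 s:

```csharp
builder.Services
    .AddHttpClient<ShippingClient>() // hypothetical typed client
    .AddStandardResilienceHandler(options =>
    {
        // The whole call, retries and backoff included, must fit the 5 s SLA
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(5);
        // Each attempt gets a smaller slice of that budget
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(2);
    });
```

The standard options are validated at startup, so an attempt timeout larger than the total timeout fails early instead of silently misbehaving.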

Rule 2

**A circuit breaker for every dependency.** Each downstream service needs its own circuit breaker. Use named HttpClient + `AddResilienceHandler` to keep them separate — Payment's circuit should never affect Inventory.

Rule 3

**Don't retry write operations** (unless you have an idempotency key). POST, PUT, DELETE — each retry can produce a side effect. `DisableForUnsafeHttpMethods()` is the safest choice.

Rule 4

**Monitor circuit-breaker state.** An open circuit is an important warning sign. Wire up OpenTelemetry + alerts when the circuit stays open more than 5 minutes — at that point it's not a transient fault, it's a real incident.

Rule 5

**Test resilience behavior.** Use chaos engineering (Simmy — Polly's fault-injection library) to inject timeouts and exceptions in staging. Don't wait for production to fail to find out whether your circuit breaker works.
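Since Polly 8.3, the Simmy chaos strategies ship in Polly itself as `AddChaosLatency`, `AddChaosFault`, and `AddChaosOutcome`. A minimal sketch that injects faults only in Staging, assuming those extensions live in the `Polly.Simmy` namespace; `RecommendationClient` and the 5% rate are hypothetical. The chaos strategies go last, inside the one pipeline, so they sit innermost and look like a misbehaving downstream:

```csharp
using System.Net;
using Microsoft.Extensions.Http.Resilience;
using Polly;
using Polly.Simmy;

// Only inject faults in the Staging environment
var injectChaos = builder.Environment.IsStaging();

builder.Services
    .AddHttpClient<RecommendationClient>() // hypothetical typed client
    .AddResilienceHandler("Recommendation", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(10)); // total
        pipeline.AddRetry(new HttpRetryStrategyOptions { MaxRetryAttempts = 2 });
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));  // per attempt

        if (injectChaos)
        {
            const double InjectionRate = 0.05; // each chaos strategy fires on ~5% of attempts
            pipeline
                .AddChaosLatency(InjectionRate, TimeSpan.FromSeconds(5)) // longer than the 3 s attempt timeout
                .AddChaosFault(InjectionRate, () => new HttpRequestException("Chaos: injected fault"))
                .AddChaosOutcome(InjectionRate, () => new HttpResponseMessage(HttpStatusCode.ServiceUnavailable));
        }
    });
```

Each injected failure should show up as a retry or timeout event in the metrics above; if it doesn't, the pipeline isn't doing what you think.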

Rule 6

**Only one resilience handler per HttpClient.** Don't stack `AddStandardResilienceHandler()` with `AddResilienceHandler()` — use `RemoveAllResilienceHandlers()` if you need a custom pipeline to replace the standard one.
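A common way to end up with two handlers is a global default registered with `ConfigureHttpClientDefaults`, plus a custom pipeline on one client such as the `ReportClient` from section 7. A minimal sketch of the replacement pattern, assuming `RemoveAllResilienceHandlers()` as named in the rule above:

```csharp
using Microsoft.Extensions.Http.Resilience;
using Polly;

// Every HttpClient in the app gets the standard pipeline by default...
builder.Services.ConfigureHttpClientDefaults(http => http.AddStandardResilienceHandler());

// ...except ReportClient, which replaces it with its own pipeline instead of stacking both
builder.Services
    .AddHttpClient<ReportClient>()
    .RemoveAllResilienceHandlers()
    .AddResilienceHandler("ReportPipeline", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(60)); // total
        pipeline.AddRetry(new HttpRetryStrategyOptions { MaxRetryAttempts = 3 });
        pipeline.AddTimeout(TimeSpan.FromSeconds(15)); // per attempt
    });
```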

## 11. A real-world example — Resilient e-commerce architecture

Below is a complete resilience setup for an e-commerce system with 3 downstream services, each with its own pipeline tuned to the traffic's characteristics:

Program.cs — Production-grade setup

```csharp
using Microsoft.Extensions.Http.Resilience;
using Polly;

var builder = WebApplication.CreateBuilder(args);

// Payment: critical, no POST retry, tight circuit breaker
builder.Services
    .AddHttpClient<PaymentClient>(c => c.BaseAddress = new("https://payment.internal"))
    .AddResilienceHandler("Payment", pipeline =>
    {
        // Total timeout (outermost): the whole call, retries included, ≤ 20 s
        pipeline.AddTimeout(TimeSpan.FromSeconds(20));

        var retry = new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(500),
            UseJitter = true
        };
        retry.DisableFor(HttpMethod.Post); // Don't retry payment POSTs
        pipeline.AddRetry(retry);

        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.15,
            MinimumThroughput = 20,
            SamplingDuration = TimeSpan.FromSeconds(15),
            BreakDuration = TimeSpan.FromSeconds(60)
        });

        // Per-attempt timeout (innermost): each request ≤ 8 s
        pipeline.AddTimeout(TimeSpan.FromSeconds(8));
    });

// Inventory: read-heavy, retry freely
builder.Services
    .AddHttpClient<InventoryClient>(c => c.BaseAddress = new("https://inventory.internal"))
    .AddStandardResilienceHandler();

// Notification: non-critical, short timeout, fail silently
builder.Services
    .AddHttpClient<NotificationClient>(c => c.BaseAddress = new("https://notify.internal"))
    .AddResilienceHandler("Notification", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(5));
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 1,
            Delay = TimeSpan.FromMilliseconds(200)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));
    });
```

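#### Fallback for Notification

Notification is the one dependency whose failure should never surface to the user: an order must not fail because a confirmation message could not be sent. The Notification pipeline above only bounds how long the call can take; after the last retry, a timeout exception or an error response still reaches `NotificationClient`. To actually fail silently, wrap the call in a fallback: Polly's fallback strategy, or a try/catch in `NotificationClient` that logs the error and moves on.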
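A minimal sketch of the Polly variant: the Notification registration from the setup above with a fallback added first, so it wraps the timeouts and the retry. It assumes Polly v8's `FallbackStrategyOptions<T>`, `PredicateBuilder<T>`, and `Outcome.FromResultAsValueTask`; returning a synthetic `202 Accepted` is an illustrative choice, not a convention:

```csharp
using System.Net;
using Microsoft.Extensions.Http.Resilience;
using Polly;
using Polly.Fallback;
using Polly.Timeout;

builder.Services
    .AddHttpClient<NotificationClient>(c => c.BaseAddress = new("https://notify.internal"))
    .AddResilienceHandler("Notification", pipeline =>
    {
        // Added first, so it wraps everything below: any handled failure becomes a synthetic 202
        pipeline.AddFallback(new FallbackStrategyOptions<HttpResponseMessage>
        {
            ShouldHandle = new PredicateBuilder<HttpResponseMessage>()
                .Handle<HttpRequestException>()
                .Handle<TimeoutRejectedException>()
                .HandleResult(response => !response.IsSuccessStatusCode),
            FallbackAction = _ => Outcome.FromResultAsValueTask(
                new HttpResponseMessage(HttpStatusCode.Accepted))
        });

        pipeline.AddTimeout(TimeSpan.FromSeconds(5));
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 1,
            Delay = TimeSpan.FromMilliseconds(200)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));
    });
```

Caller cancellation (`OperationCanceledException`) is deliberately not handled, so a cancelled request still cancels. The trade-off is visibility: the caller can't tell a real 202 from a swallowed failure, so log inside the strategy's `OnFallback` callback if you need to know how often it happens.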

## 12. Anti-patterns to avoid

| Anti-pattern | Problem | Solution |
| --- | --- | --- |
| Retry without backoff | Immediate retries → retry storm crushes the server | Always use exponential backoff + jitter |
| Retry everything | Retrying 404 or 401: errors that never self-heal | Only retry transient failures: 408, 429, 5xx (500, 502, 503, 504), `HttpRequestException`, and timeouts (`TimeoutRejectedException`) |
| Shared circuit breaker | One circuit breaker for all downstreams → Payment failures also trip Inventory | Each downstream gets a named HttpClient + its own circuit |
| Checking circuit state before Execute | Race condition + blocks HalfOpen transition | Always call Execute and let Polly handle state |
| Too many retries | 10 retries means 11 attempts; at a 5 s timeout each, that's 55 s before any backoff delay, and the user left long ago | 3 to 5 retries max, total timeout ≤ SLA |
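The "Retry everything" row boils down to a status-code check. A minimal, dependency-free sketch of that rule; `TransientHttp` is a hypothetical helper, and note that the default `ShouldHandle` of `HttpRetryStrategyOptions` already applies essentially this rule (plus `HttpRequestException` and timeouts), so you usually only write a custom predicate to narrow it further:

```csharp
using System.Net;

// Classifies HTTP status codes as worth retrying (transient) or not
public static class TransientHttp
{
    public static bool IsTransient(HttpStatusCode status) =>
        status == HttpStatusCode.RequestTimeout      // 408
        || status == HttpStatusCode.TooManyRequests  // 429
        || (int)status >= 500;                       // 5xx: 500, 502, 503, 504, ...
}
```

404 and 401 fall through to `false`: retrying them only repeats a request that will fail the same way.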

## Conclusion

Resilience patterns aren't a "nice-to-have"; they are a **hard requirement** for any microservices system running in production. With Polly v8 and `Microsoft.Extensions.Http.Resilience` on .NET 10, integration is easier than ever: a single call to `AddStandardResilienceHandler()` covers most use cases, and custom pipelines cover the rest. Combined with OpenTelemetry for monitoring, you get a system that recovers from transient faults on its own and shows you when it can't.

**References:**

- [Polly Documentation — pollydocs.org](https://www.pollydocs.org/)
- [Build resilient HTTP apps — Microsoft Learn](https://learn.microsoft.com/en-us/dotnet/core/resilience/http-resilience)
- [Polly GitHub Repository](https://github.com/App-vNext/Polly)
- [Microsoft.Extensions.Http.Resilience — NuGet](https://www.nuget.org/packages/Microsoft.Extensions.Http.Resilience)
- [Introduction to resilient app development — Microsoft Learn](https://learn.microsoft.com/en-us/dotnet/core/resilience/)

Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.