Resilience Patterns on .NET 10 — Polly, Circuit Breaker, and Retry for Microservices

Posted on: 4/17/2026 1:09:18 PM

In a microservices architecture, one service calling another over HTTP is a daily occurrence. But the network is never trustworthy: requests time out, servers get overloaded, DNS flaps, or a rolling deployment briefly takes instances away. Without a strategy for handling transient faults, a single slow service can bring the whole system down in a domino effect. This article digs into resilience patterns on .NET 10 with Polly and the Microsoft.Extensions.Http.Resilience package, a widely used toolkit that helps an application recover from transient failures on its own.

350M+ Polly downloads on NuGet
5 layers Standard Resilience Pipeline
v10.4 Microsoft.Extensions.Http.Resilience
< 3ms Average overhead per request

1. Why do we need Resilience Patterns?

Imagine an e-commerce system with 20 microservices. The Order service calls Payment, Payment calls Fraud Detection, Fraud Detection calls ML Scoring. When ML Scoring slows down to 10 seconds instead of the usual 200 ms, what happens?

graph LR
    A[Order Service] -->|HTTP| B[Payment Service]
    B -->|HTTP| C[Fraud Detection]
    C -->|HTTP| D["ML Scoring (slow ⚠️)"]
    D -.->|10s timeout| C
    C -.->|thread blocked| B
    B -.->|thread pool exhausted| A
    A -.->|503 to user| E[Client]
    style D fill:#ff9800,stroke:#e65100,color:#fff
    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#e94560,stroke:#fff,color:#fff

Cascading failure — one slow service drags the entire chain down

Without resilience patterns, each service's thread pool fills up with requests waiting on downstream. When the thread pool is exhausted, the service can't serve any request — even ones unrelated to ML Scoring. That's a cascading failure, and it can take down an entire system in minutes.

Resilience ≠ just retry

Many developers think resilience is simply "try again on failure". In reality, retrying incorrectly makes things worse — thousands of clients retrying in unison create a retry storm that crushes an already-overloaded service. Resilience patterns are the smart combination of retry, circuit breaker, timeout, rate limiter, and fallback.

2. Polly v8 — A brand-new architecture

Polly v8 (current on .NET 10) was redesigned around the Resilience Pipeline architecture, which supersedes the old Policy-based API (the legacy API still ships for backward compatibility). Pipelines let you stack multiple strategies in a defined order, with each strategy operating independently.

graph TB
    subgraph Pipeline["Resilience Pipeline"]
        direction TB
        RL["1. Rate Limiter"] --> TT["2. Total Timeout (30s)"]
        TT --> RT["3. Retry (3 attempts, exponential)"]
        RT --> CB["4. Circuit Breaker"]
        CB --> AT["5. Attempt Timeout (10s)"]
    end
    REQ["HTTP Request"] --> RL
    AT --> SVC["Downstream Service"]
    style Pipeline fill:#f8f9fa,stroke:#e0e0e0
    style REQ fill:#e94560,stroke:#fff,color:#fff
    style SVC fill:#2c3e50,stroke:#fff,color:#fff
    style RL fill:#fff,stroke:#e94560,color:#2c3e50
    style TT fill:#fff,stroke:#e94560,color:#2c3e50
    style RT fill:#fff,stroke:#e94560,color:#2c3e50
    style CB fill:#fff,stroke:#e94560,color:#2c3e50
    style AT fill:#fff,stroke:#e94560,color:#2c3e50

The 5 strategies in the Standard Resilience Pipeline (outermost → innermost)

3. Fast integration with AddStandardResilienceHandler

The fastest way to add resilience to an HttpClient on .NET 10 is AddStandardResilienceHandler() from the Microsoft.Extensions.Http.Resilience package. One line of code gives you five production-tuned layers of protection.

Program.cs
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register HttpClient with the standard resilience pipeline
builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddStandardResilienceHandler();

var app = builder.Build();
await app.RunAsync();

With the defaults, the pipeline will:

Order | Strategy | Default | Effect
1 | Rate Limiter | 1,000 concurrent permits | Prevents the client from firing too many simultaneous requests
2 | Total Timeout | 30 seconds | Caps total time including retries
3 | Retry | 3 retries (4 attempts in total), exponential backoff + jitter | Automatically retries on transient errors
4 | Circuit Breaker | 10% failure ratio, minimum throughput of 100 requests | Trips when the downstream fails repeatedly
5 | Attempt Timeout | 10 seconds | Caps the time for a single attempt

Why two timeouts?

Attempt Timeout (10 s) bounds each attempt: if one request exceeds 10 s, it's cancelled to make room for the next retry. Total Timeout (30 s) is the overall "time budget": regardless of retry count, total time stays within 30 s. Without a total timeout, 3 retries mean up to 4 attempts × 10 s = 40 s, plus roughly 14 s of backoff at the default 2 s base delay (2 s + 4 s + 8 s before jitter), so a single call could take around 54 s.
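The time budget can be worked out with a few lines of C#. This is only a sketch of the arithmetic: WorstCaseWithoutTotalTimeout is a hypothetical helper, not part of Polly, and it assumes plain doubling backoff from a 2 s base delay and ignores jitter.

```csharp
using System;
using System.Linq;

// Hypothetical helper: worst-case wall-clock time when no total timeout is set.
static TimeSpan WorstCaseWithoutTotalTimeout(
    int maxRetryAttempts, TimeSpan attemptTimeout, TimeSpan baseDelay)
{
    // The initial call plus one attempt per retry.
    int attempts = maxRetryAttempts + 1;

    // Nominal exponential backoff before jitter: baseDelay * 2^0, 2^1, 2^2, ...
    double backoffSeconds = Enumerable.Range(0, maxRetryAttempts)
        .Sum(i => baseDelay.TotalSeconds * Math.Pow(2, i));

    return TimeSpan.FromSeconds(attempts * attemptTimeout.TotalSeconds + backoffSeconds);
}

var worst = WorstCaseWithoutTotalTimeout(
    maxRetryAttempts: 3,
    attemptTimeout: TimeSpan.FromSeconds(10),
    baseDelay: TimeSpan.FromSeconds(2));

Console.WriteLine($"Worst case without a total timeout: {worst.TotalSeconds} s"); // 54 s
```

With the 30 s total timeout in place, the same call is cut off at 30 s no matter how many attempts are still pending.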

4. Circuit Breaker — A smart electrical-breaker mechanism

Circuit Breaker is arguably the most important resilience pattern. Rather than keep hitting an ailing service (piling on load), the circuit breaker trips automatically and fails fast, like an electrical breaker cutting the circuit to protect hardware.

stateDiagram-v2
    [*] --> Closed
    Closed --> Open : Failure ratio exceeds threshold
    Open --> HalfOpen : After break duration
    HalfOpen --> Closed : Probe request succeeds
    HalfOpen --> Open : Probe request fails
    Open --> Isolated : Manual isolate
    Isolated --> Closed : Manual close

    note right of Closed : Requests pass normally\nSampling failure ratio
    note right of Open : Requests rejected immediately\nThrows BrokenCircuitException
    note right of HalfOpen : Allows one probe request\nDecides close/reopen

The Circuit Breaker state machine — 4 states

4.1. Detailed Circuit Breaker configuration

PaymentClientResilience.cs
using System.Net;
using Microsoft.Extensions.Http.Resilience;
using Polly;

builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddResilienceHandler("PaymentPipeline", static pipelineBuilder =>
    {
        // Strategies run in the order they are added: the first one is outermost.

        // Retry (outermost): up to 5 retries with exponential backoff and jitter
        pipelineBuilder.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 5,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(500),
            UseJitter = true
        });

        // Circuit Breaker: sits inside Retry so it samples every attempt
        pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.2,                              // 20% failure → open circuit
            SamplingDuration = TimeSpan.FromSeconds(10),     // 10-second sampling window
            MinimumThroughput = 8,                           // Need at least 8 requests for stats
            BreakDuration = TimeSpan.FromSeconds(30),        // Keep circuit open for 30s
            // Overriding ShouldHandle replaces the default predicate, so only these
            // status codes count as failures; exceptions such as timeouts do not.
            ShouldHandle = static args => ValueTask.FromResult(args is
            {
                Outcome.Result.StatusCode:
                    HttpStatusCode.RequestTimeout or
                    HttpStatusCode.TooManyRequests or
                    HttpStatusCode.InternalServerError or
                    HttpStatusCode.ServiceUnavailable
            })
        });

        // Timeout (innermost): 5 seconds per attempt
        pipelineBuilder.AddTimeout(TimeSpan.FromSeconds(5));
    });

Watch out for MinimumThroughput

If MinimumThroughput isn't reached within SamplingDuration, the circuit breaker ignores the failure ratio. This keeps the circuit from tripping under low traffic (e.g. 2 failed requests out of only 2 total in 10 s). Set the value to match your service's real traffic.
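To make the interaction between FailureRatio and MinimumThroughput concrete, here is a simplified model of the trip decision. It is an illustration only: ShouldTrip is a hypothetical function, Polly's real implementation tracks a rolling window rather than two counters, and treating "at or above the ratio" as the trip condition is an assumption of this sketch.

```csharp
using System;

// Simplified model: decide whether the circuit should open for one sampling window.
static bool ShouldTrip(int failures, int total, double failureRatio, int minimumThroughput)
{
    // Too few requests in the window: the ratio is not statistically meaningful.
    if (total < minimumThroughput)
    {
        return false;
    }

    return (double)failures / total >= failureRatio;
}

// 2 of 2 requests failed, but traffic is below MinimumThroughput = 8.
Console.WriteLine(ShouldTrip(failures: 2, total: 2, failureRatio: 0.2, minimumThroughput: 8));   // False

// 3 of 10 failed: enough traffic and 30% is above the 20% threshold.
Console.WriteLine(ShouldTrip(failures: 3, total: 10, failureRatio: 0.2, minimumThroughput: 8));  // True

// 1 of 10 failed: enough traffic, but 10% is below the threshold.
Console.WriteLine(ShouldTrip(failures: 1, total: 10, failureRatio: 0.2, minimumThroughput: 8));  // False
```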

4.2. Dynamic Break Duration

Instead of a fixed 30-second break, you can increase the break duration as the circuit opens repeatedly — exponential backoff at the circuit-breaker level:

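pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
    BreakDurationGenerator = static args =>
    {
        // HalfOpenAttempts counts how often the circuit has gone half-open since it
        // last closed, so consecutive opens wait 15s, 30s, 60s, then cap at 120s.
        // (FailureCount is the number of failures in the sampling window, not the number of opens.)
        var duration = TimeSpan.FromSeconds(
            Math.Min(15 * Math.Pow(2, args.HalfOpenAttempts), 120));
        return ValueTask.FromResult(duration);
    }
});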
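The schedule itself is easy to check in isolation. The snippet below is a sketch of the arithmetic only; BreakDurationFor is a hypothetical helper and timesReopened is a zero-based counter assumed for illustration.

```csharp
using System;

// Hypothetical helper: 15 s doubled for every consecutive reopen, capped at 120 s.
static TimeSpan BreakDurationFor(int timesReopened) =>
    TimeSpan.FromSeconds(Math.Min(15 * Math.Pow(2, timesReopened), 120));

for (int i = 0; i < 5; i++)
{
    Console.WriteLine($"open #{i + 1}: {BreakDurationFor(i).TotalSeconds} s");
}
// open #1: 15 s, #2: 30 s, #3: 60 s, #4: 120 s, #5: 120 s
```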

5. The Retry Pattern — The art of retrying properly

Retry sounds simple, but done wrong it causes disasters. Three golden rules:

Exponential backoff: grow the gap between retries
Jitter: add randomness to avoid thundering herds
Idempotency: only retry safe or idempotent operations

5.1. Exponential Backoff + Jitter

Without jitter, 1,000 clients retrying at the same 1 s, 2 s, 4 s, 8 s marks create periodic traffic spikes. Jitter spreads retries randomly, reducing server pressure.

graph LR
    subgraph Without_Jitter["No Jitter"]
        A1["t=1s: 1000 req"] --> A2["t=2s: 1000 req"] --> A3["t=4s: 1000 req"]
    end
    subgraph With_Jitter["With Jitter"]
        B1["t=0.8-1.2s: ~330 req"] --> B2["t=1.6-2.4s: ~330 req"] --> B3["t=3.2-4.8s: ~340 req"]
    end
    style Without_Jitter fill:#ffebee,stroke:#c62828
    style With_Jitter fill:#e8f5e9,stroke:#2e7d32
    style A1 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style A2 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style A3 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
    style B1 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
    style B2 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
    style B3 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50

Jitter spreads retry traffic, preventing retry storms
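As a rough illustration of the idea, the sketch below computes an exponential delay and spreads it by ±20%, matching the ranges in the diagram. DelayWithJitter is a hypothetical helper, and this uniform spread is an assumption chosen for clarity; it is not the jitter formula Polly applies when UseJitter is enabled.

```csharp
using System;

// Nominal exponential delay for a zero-based retry index, spread uniformly by ±20%.
static TimeSpan DelayWithJitter(int retryIndex, TimeSpan baseDelay, Random rng)
{
    double nominalMs = baseDelay.TotalMilliseconds * Math.Pow(2, retryIndex);
    double factor = 0.8 + rng.NextDouble() * 0.4; // uniform in [0.8, 1.2)
    return TimeSpan.FromMilliseconds(nominalMs * factor);
}

var random = new Random();
for (int i = 0; i < 3; i++)
{
    var delay = DelayWithJitter(i, TimeSpan.FromSeconds(1), random);
    Console.WriteLine($"retry {i + 1}: wait {delay.TotalMilliseconds:F0} ms");
}
```

Because every client draws its own random factor, 1,000 clients that failed at the same instant no longer come back at the same instant.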

5.2. Disable retries for unsafe methods

A POST that creates an order, retried 3 times, could create up to 4 orders (the original call plus 3 retries). Microsoft.Extensions.Http.Resilience provides a clean API for this:

builder.Services
    .AddHttpClient<OrderClient>()
    .AddStandardResilienceHandler(options =>
    {
        // Disable retries for POST, PATCH, PUT, DELETE, and CONNECT; GET, HEAD, OPTIONS, and TRACE are still retried
        options.Retry.DisableForUnsafeHttpMethods();
    });

When CAN you retry POST?

If the downstream API supports an idempotency key (e.g. Stripe, PayPal), you can retry POST safely because the server deduplicates based on the key. In that case, don't disable POST retries. Just make sure each logical operation carries its own unique Idempotency-Key header and that every retry of that operation resends the same key; a fresh key per attempt would defeat the deduplication.
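One way to attach the key is a small DelegatingHandler that stamps each outgoing POST once. This is a minimal sketch: IdempotencyKeyHandler is a hypothetical class, and the header name and key format are assumptions that must match whatever the downstream API documents.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler: adds one Idempotency-Key per logical POST request.
public sealed class IdempotencyKeyHandler : DelegatingHandler
{
    // Assumed header name; check the downstream API's documentation.
    private const string HeaderName = "Idempotency-Key";

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Add the key only when the caller has not supplied one, so handlers
        // further down the chain (including retries) resend the same value.
        if (request.Method == HttpMethod.Post && !request.Headers.Contains(HeaderName))
        {
            request.Headers.Add(HeaderName, Guid.NewGuid().ToString("N"));
        }

        return base.SendAsync(request, cancellationToken);
    }
}
```

To use it, register the handler as a transient service and add it with AddHttpMessageHandler<IdempotencyKeyHandler>() before the resilience handler, so that it sits outside the retry strategy and the key is generated once per operation rather than once per attempt.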

6. Hedging — Parallel requests to reduce latency

Hedging is an advanced strategy: when the first request is slow, fire additional requests in parallel to another endpoint (or the same one) and take whichever responds first. Particularly useful with multiple replicas or a multi-region deployment.

Program.cs — Hedging with A/B routing
builder.Services
    .AddHttpClient<SearchClient>()
    .AddStandardHedgingHandler(routingBuilder =>
    {
        routingBuilder.ConfigureWeightedGroups(options =>
        {
            options.SelectionMode = WeightedGroupSelectionMode.EveryAttempt;
            options.Groups.Add(new WeightedUriEndpointGroup
            {
                Endpoints =
                {
                    new() { Uri = new("https://search-primary.internal"), Weight = 70 },
                    new() { Uri = new("https://search-secondary.internal"), Weight = 30 }
                }
            });
        });
    });

Characteristic | Retry | Hedging
When to fire the next request? | After the previous one fails | After a delay (default 2 s), even if the previous one is still running
Concurrent requests | 1 | Several (1 extra hedged attempt by default, configurable via MaxHedgedAttempts)
Primary use case | Transient failures, 500/408/429 responses | High tail latency, multi-region, read-heavy
Cost | Low: sequential requests | Higher: parallel requests consume resources
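The core mechanism behind hedging can be sketched with plain tasks. This is a simplified illustration, not how the hedging handler is implemented: HedgeAsync is a hypothetical helper that supports exactly one hedged attempt, and in this sketch whichever request finishes first wins even if it failed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: start the primary call, and if it has not finished after
// hedgeDelay, start a secondary call and return whichever completes first.
static async Task<T> HedgeAsync<T>(
    Func<CancellationToken, Task<T>> primary,
    Func<CancellationToken, Task<T>> secondary,
    TimeSpan hedgeDelay)
{
    using var cts = new CancellationTokenSource();
    try
    {
        Task<T> first = primary(cts.Token);
        Task delay = Task.Delay(hedgeDelay, cts.Token);

        if (await Task.WhenAny(first, delay) == first)
        {
            return await first; // primary answered before the hedge delay
        }

        Task<T> second = secondary(cts.Token);
        Task<T> winner = await Task.WhenAny(first, second);
        return await winner;
    }
    finally
    {
        cts.Cancel(); // stop the delay timer and whichever request is still running
    }
}

// Simulated endpoints: the primary is slow, the secondary is fast.
string result = await HedgeAsync(
    async ct => { await Task.Delay(500, ct); return "primary"; },
    async ct => { await Task.Delay(50, ct); return "secondary"; },
    TimeSpan.FromMilliseconds(100));

Console.WriteLine(result); // secondary
```

The cost column in the table follows directly from this shape: while the hedge is in flight, two requests are consuming downstream capacity for a single logical call.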

7. Timeout strategy — Two-layer protection

Timeouts sound simple but are among the most common bugs. No timeout → threads block forever. Too short → legitimate requests get cancelled. .NET 10 + Polly solve this with two layers of timeout:

using Microsoft.Extensions.Http.Resilience;
using Polly;
using Polly.Timeout;

builder.Services
    .AddHttpClient<ReportClient>()
    .AddResilienceHandler("ReportPipeline", pipeline =>
    {
        // Total timeout: the whole operation (including retries) ≤ 60s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(60),
            Name = "TotalTimeout"
        });

        // Retry up to 3 times
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromSeconds(1)
        });

        // Per-attempt timeout: each individual request ≤ 15s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(15),
            Name = "AttemptTimeout"
        });
    });

Order matters!

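Polly runs strategies in the order they are added, so the first one added is the outermost. Total Timeout must be added before Retry so that it wraps every attempt and the backoff between them; Attempt Timeout must be added after Retry so that it applies to each attempt separately. If the two are swapped, the 15 s limit would cap the whole operation and the 60 s limit would apply to each attempt, cutting retries short.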
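The nesting can be pictured with ordinary delegates. The toy below does not use Polly at all; Named is a hypothetical helper that only records when each layer is entered and left, to show that the first strategy added wraps everything added after it.

```csharp
using System;
using System.Collections.Generic;

var log = new List<string>();

// Each "strategy" wraps the next delegate, the way a pipeline strategy wraps the rest of the pipeline.
Func<Action, Action> Named(string name) => next => () =>
{
    log.Add($"enter {name}");
    next();
    log.Add($"exit {name}");
};

// Listed in the order they would be added: the first is the outermost.
var strategies = new[] { Named("TotalTimeout"), Named("Retry"), Named("AttemptTimeout") };

// Build from the inside out so that strategies[0] ends up as the outer wrapper.
Action pipeline = () => log.Add("HTTP call");
for (int i = strategies.Length - 1; i >= 0; i--)
{
    pipeline = strategies[i](pipeline);
}

pipeline();
Console.WriteLine(string.Join(" > ", log));
// enter TotalTimeout > enter Retry > enter AttemptTimeout > HTTP call > exit AttemptTimeout > exit Retry > exit TotalTimeout
```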

8. Integrating with OpenTelemetry

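Resilience patterns mean nothing if you can't see how they're behaving. Polly v8 reports telemetry through the standard .NET metrics and logging APIs, so OpenTelemetry can collect it: metrics under the Polly meter and structured log events for every strategy. Request traces come from the HttpClient instrumentation.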

Program.cs — Observability
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddMeter("Polly");           // Polly metrics: retry count, circuit state, etc.
        metrics.AddMeter("System.Net.Http"); // HttpClient metrics
        metrics.AddPrometheusExporter();
    })
    .WithTracing(tracing =>
    {
        tracing.AddHttpClientInstrumentation();
        tracing.AddOtlpExporter();
    });

Key Polly metrics to watch:

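Metric | Meaning | Alert when
resilience.polly.strategy.events | Counter of strategy events, tagged with event.name (OnRetry, OnTimeout, OnCircuitOpened, OnRateLimiterRejected, ...) | OnRetry rate spikes or OnCircuitOpened appears → downstream is struggling
resilience.polly.strategy.attempt.duration | Time per attempt | p99 approaches the attempt timeout
resilience.polly.pipeline.duration | Time for the whole pipeline execution, including retries | p99 approaches the total timeout

Polly does not publish a circuit-state gauge; track the state through the OnCircuitOpened, OnCircuitHalfOpened, and OnCircuitClosed events. Exporters such as Prometheus may rewrite the dots in these names as underscores.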

9. Dynamic reload — Changing config at runtime

One of Polly's most powerful features on .NET 10: you can change resilience configuration without restarting the app. Config binds from appsettings.json and auto-reloads when the file changes.

appsettings.json
{
  "PaymentResilience": {
    "Retry": {
      "BackoffType": "Exponential",
      "UseJitter": true,
      "MaxRetryAttempts": 3,
      "Delay": "00:00:01"
    },
    "CircuitBreaker": {
      "FailureRatio": 0.1,
      "SamplingDuration": "00:00:30",
      "MinimumThroughput": 100,
      "BreakDuration": "00:00:05"
    },
    "TotalRequestTimeout": {
      "Timeout": "00:00:30"
    }
  }
}
Program.cs — Binding the config
var resilienceSection = builder.Configuration.GetSection("PaymentResilience");

builder.Services
    .AddHttpClient<PaymentClient>()
    // Binding the section through the handler targets this client's named options;
    // a plain Configure<HttpStandardResilienceOptions>(section) would only set the unnamed default.
    .AddStandardResilienceHandler(resilienceSection);

When ops need to bump retries from 3 to 5 or extend the timeout from 30 s to 60 s, they update the configuration and the pipeline picks up the new values on reload: no code changes, no rebuilds, no restart.

10. Production best practices

Rule 1
Always set a timeout. Every HTTP call must have a timeout. No exceptions. HttpClient's default isn't "none" — it's 100 s, which is still way too long. Set a total timeout matching the downstream's SLA.
Rule 2
A circuit breaker for every dependency. Each downstream service needs its own circuit breaker. Use named HttpClient + AddResilienceHandler to keep them separate — Payment's circuit should never affect Inventory.
Rule 3
Don't retry write operations (unless you have an idempotency key). POST, PUT, DELETE — each retry can produce a side effect. DisableForUnsafeHttpMethods() is the safest choice.
Rule 4
Monitor circuit-breaker state. An open circuit is an important warning sign. Wire up OpenTelemetry + alerts when the circuit stays open more than 5 minutes — at that point it's not a transient fault, it's a real incident.
Rule 5
Test resilience behavior. Use chaos engineering (Simmy — Polly's fault-injection library) to inject timeouts and exceptions in staging. Don't wait for production to fail to find out whether your circuit breaker works.
Rule 6
Only one resilience handler per HttpClient. Don't stack AddStandardResilienceHandler() with AddResilienceHandler() — use RemoveAllResilienceHandlers() if you need a custom pipeline to replace the standard one.

11. A real-world example — Resilient e-commerce architecture

Below is a complete resilience setup for an e-commerce system with 3 downstream services, each with its own pipeline tuned to the traffic's characteristics:

Program.cs — Production-grade setup
var builder = WebApplication.CreateBuilder(args);

// Payment: critical, no POST retry, tight circuit breaker
builder.Services
    .AddHttpClient<PaymentClient>(c => c.BaseAddress = new("https://payment.internal"))
    .AddResilienceHandler("Payment", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(20));
        var retryOptions = new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 2,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(500),
            UseJitter = true
        };
        retryOptions.DisableFor(HttpMethod.Post);  // Don't retry payment POSTs
        pipeline.AddRetry(retryOptions);
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.15,
            MinimumThroughput = 20,
            SamplingDuration = TimeSpan.FromSeconds(15),
            BreakDuration = TimeSpan.FromSeconds(60)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(8));
    });

// Inventory: read-heavy, retry freely
builder.Services
    .AddHttpClient<InventoryClient>(c => c.BaseAddress = new("https://inventory.internal"))
    .AddStandardResilienceHandler();

// Notification: non-critical, short timeout, fail silently
builder.Services
    .AddHttpClient<NotificationClient>(c => c.BaseAddress = new("https://notify.internal"))
    .AddResilienceHandler("Notification", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(5));
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 1,
            Delay = TimeSpan.FromMilliseconds(200)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));
    });

Fallback for Notification

Notification is non-critical — even if an email fails to send, the order must still succeed. In practice, wrap the notification call in try-catch and log the error instead of letting the exception bubble up. Better yet: push notifications to a message queue (RabbitMQ, Azure Service Bus) for async processing.
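The try-catch wrapper described above might look like the sketch below. TryNotifyAsync and its delegate parameters are hypothetical names used for illustration; a real service would pass its NotificationClient call and an ILogger. Note that catching Exception also swallows cancellation, which may or may not be what you want.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: run a non-critical call and report failure instead of throwing.
static async Task<bool> TryNotifyAsync(Func<Task> sendNotification, Action<Exception> logError)
{
    try
    {
        await sendNotification();
        return true;
    }
    catch (Exception ex)
    {
        logError(ex); // record the failure, but do not fail the order
        return false;
    }
}

// Simulate a notification service that does not answer.
bool sent = await TryNotifyAsync(
    () => throw new TimeoutException("notification service did not answer"),
    ex => Console.WriteLine($"notification skipped: {ex.Message}"));

Console.WriteLine($"order completed, notification sent: {sent}");
// notification skipped: notification service did not answer
// order completed, notification sent: False
```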

12. Anti-patterns to avoid

Anti-pattern | Problem | Solution
Retry without backoff | Immediate retries → retry storm crushes the server | Always use exponential backoff + jitter
Retry everything | Retrying 404 or 401, errors that never self-heal | Only retry 408, 429, 500, 502, 503, TimeoutException
Shared circuit breaker | One circuit breaker for all downstreams → Payment failures also trip Inventory | Each downstream gets a named HttpClient + its own circuit
Checking circuit state before Execute | Race condition + blocks HalfOpen transition | Always call Execute and let Polly handle state
Too many retries | 10 retries × 5 s timeout is up to 55 s of attempts (11 tries) before backoff; the user left long ago | 3-5 retries max, total timeout ≤ SLA

Conclusion

Resilience patterns aren't a "nice-to-have"; they are a hard requirement for any microservices system running in production. With Polly v8 and Microsoft.Extensions.Http.Resilience on .NET 10, integration is straightforward: one line of AddStandardResilienceHandler() covers the common cases, and custom pipelines cover the rest. Combined with OpenTelemetry for monitoring, you get a system that recovers from transient faults on its own and shows you when it cannot.
