Resilience Patterns on .NET 10 — Polly, Circuit Breaker, and Retry for Microservices
Posted on: 4/17/2026 1:09:18 PM
Table of contents
- 1. Why do we need Resilience Patterns?
- 2. Polly v8 — A brand-new architecture
- 3. Fast integration with AddStandardResilienceHandler
- 4. Circuit Breaker — A smart electrical-breaker mechanism
- 5. The Retry Pattern — The art of retrying properly
- 6. Hedging — Parallel requests to reduce latency
- 7. Timeout strategy — Two-layer protection
- 8. Integrating with OpenTelemetry
- 9. Dynamic reload — Changing config at runtime
- 10. Production best practices
- 11. A real-world example — Resilient e-commerce architecture
- 12. Anti-patterns to avoid
- Conclusion
In a microservices architecture, one service calling another over HTTP is a daily occurrence. But the network is never fully trustworthy: requests time out, servers get overloaded, DNS flaps, or a rolling deployment briefly takes instances offline. Without a strategy for handling transient faults, a single slow service can bring the whole system down in a domino effect. This article digs into Resilience Patterns on .NET 10 with Polly and the Microsoft.Extensions.Http.Resilience package, a widely used toolkit that helps an application recover from failures on its own.
1. Why do we need Resilience Patterns?
Imagine an e-commerce system with 20 microservices. The Order service calls Payment, Payment calls Fraud Detection, Fraud Detection calls ML Scoring. When ML Scoring slows down to 10 seconds instead of the usual 200 ms, what happens?
graph LR
A[Order Service] -->|HTTP| B[Payment Service]
B -->|HTTP| C[Fraud Detection]
C -->|HTTP| D["ML Scoring (slow ⚠️)"]
D -.->|10s timeout| C
C -.->|thread blocked| B
B -.->|thread pool exhausted| A
A -.->|503 to user| E[Client]
style D fill:#ff9800,stroke:#e65100,color:#fff
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style E fill:#e94560,stroke:#fff,color:#fff
Cascading failure — one slow service drags the entire chain down
Without resilience patterns, every service in the chain accumulates in-flight requests waiting on its downstream. Each waiting request holds a connection and memory, and a thread-pool thread as well wherever the code blocks synchronously. Once those resources are exhausted, the service can't serve any request, even ones unrelated to ML Scoring. That's a cascading failure, and it can take down an entire system in minutes.
Resilience ≠ just retry
Many developers think resilience is simply "try again on failure". In reality, retrying incorrectly makes things worse — thousands of clients retrying in unison create a retry storm that crushes an already-overloaded service. Resilience patterns are the smart combination of retry, circuit breaker, timeout, rate limiter, and fallback.
2. Polly v8 — A brand-new architecture
Polly v8 (current on .NET 10) has been rewritten from scratch around the Resilience Pipeline architecture, which supersedes the old Policy-based API (the legacy API remains only for backward compatibility). Pipelines let you stack multiple strategies in a defined order, with each strategy operating independently.
graph TB
subgraph Pipeline["Resilience Pipeline"]
direction TB
RL["1. Rate Limiter"] --> TT["2. Total Timeout (30s)"]
TT --> RT["3. Retry (3 attempts, exponential)"]
RT --> CB["4. Circuit Breaker"]
CB --> AT["5. Attempt Timeout (10s)"]
end
REQ["HTTP Request"] --> RL
AT --> SVC["Downstream Service"]
style Pipeline fill:#f8f9fa,stroke:#e0e0e0
style REQ fill:#e94560,stroke:#fff,color:#fff
style SVC fill:#2c3e50,stroke:#fff,color:#fff
style RL fill:#fff,stroke:#e94560,color:#2c3e50
style TT fill:#fff,stroke:#e94560,color:#2c3e50
style RT fill:#fff,stroke:#e94560,color:#2c3e50
style CB fill:#fff,stroke:#e94560,color:#2c3e50
style AT fill:#fff,stroke:#e94560,color:#2c3e50
The 5 strategies in the Standard Resilience Pipeline (outermost → innermost)
3. Fast integration with AddStandardResilienceHandler
The fastest way to add resilience to an HttpClient on .NET 10 is AddStandardResilienceHandler() from the Microsoft.Extensions.Http.Resilience package. One line of code gives you five production-tuned layers of protection.
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register HttpClient with the standard resilience pipeline
builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddStandardResilienceHandler();

var app = builder.Build();
await app.RunAsync();
With the defaults, the pipeline will:
| Order | Strategy | Default | Effect |
|---|---|---|---|
| 1 | Rate Limiter | 1,000 concurrent permits | Prevents the client from firing too many simultaneous requests |
| 2 | Total Timeout | 30 seconds | Caps total time including retries |
| 3 | Retry | 3 attempts, exponential backoff + jitter | Automatically retries on transient errors |
| 4 | Circuit Breaker | 10% failure ratio, 100 min throughput | Trips when the downstream fails repeatedly |
| 5 | Attempt Timeout | 10 seconds | Caps the time for a single attempt |
Why two timeouts?
Attempt Timeout (10 s) bounds each attempt: if one request exceeds 10 s, it's cancelled to make room for the next retry. Total Timeout (30 s) is the overall "time budget": regardless of retry count, total time stays under 30 s. Without a total timeout, 3 retries mean up to 4 attempts (the initial one plus 3 retries), so 4 × 10 s = 40 s, and the backoff delays between attempts can stretch that past 50 s.
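To make the time budget concrete, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes the defaults from the table (10 s attempt timeout, 3 retries), a 2 s base delay that doubles on each retry, and no jitter. The helper name `WorstCaseWithoutTotalTimeout` is purely illustrative and not part of any library.

```csharp
using System;

// Worst case when every attempt runs into the attempt timeout and there is
// no total timeout: (1 initial attempt + N retries) * attempt timeout,
// plus the backoff delay that precedes each retry.
static TimeSpan WorstCaseWithoutTotalTimeout(
    int maxRetryAttempts, TimeSpan attemptTimeout, TimeSpan baseDelay)
{
    var total = attemptTimeout * (maxRetryAttempts + 1);
    for (var retry = 0; retry < maxRetryAttempts; retry++)
    {
        // Exponential backoff without jitter: base, 2x base, 4x base, ...
        total += baseDelay * Math.Pow(2, retry);
    }
    return total;
}

var worstCase = WorstCaseWithoutTotalTimeout(
    maxRetryAttempts: 3,
    attemptTimeout: TimeSpan.FromSeconds(10),
    baseDelay: TimeSpan.FromSeconds(2));

// 40 s of attempts + (2 + 4 + 8) s of backoff
Console.WriteLine(worstCase.TotalSeconds); // 54
```

With a 30 s total timeout in front of the retry strategy, the caller gets an answer after 30 s at the latest instead of waiting out the full 54 s.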
4. Circuit Breaker — A smart electrical-breaker mechanism
Circuit Breaker is the most important resilience pattern. Rather than keep hitting an ailing service (piling on load), the circuit breaker auto-trips and fails fast — like an electrical breaker cutting the circuit to protect hardware.
stateDiagram-v2
[*] --> Closed
Closed --> Open : Failure ratio exceeds threshold
Open --> HalfOpen : After break duration
HalfOpen --> Closed : Probe request succeeds
HalfOpen --> Open : Probe request fails
Open --> Isolated : Manual isolate
Isolated --> Closed : Manual close
note right of Closed : Requests pass normally\nSampling failure ratio
note right of Open : Requests rejected immediately\nThrows BrokenCircuitException
note right of HalfOpen : Allows one probe request\nDecides close/reopen
The Circuit Breaker state machine — 4 states
4.1. Detailed Circuit Breaker configuration
// Requires: using System.Net; using Microsoft.Extensions.Http.Resilience; using Polly;
builder.Services
    .AddHttpClient<PaymentClient>(client =>
    {
        client.BaseAddress = new Uri("https://payment-api.internal");
    })
    .AddResilienceHandler("PaymentPipeline", static pipelineBuilder =>
    {
        // Strategies run in the order they are added (first = outermost).
        // Retry is added first so it wraps the breaker, matching the standard
        // pipeline order; the breaker then observes every individual attempt.

        // Retry: up to 5 retries (6 attempts in total) with exponential backoff
        pipelineBuilder.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 5,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromMilliseconds(500),
            UseJitter = true
        });

        // Circuit Breaker: trip when 20% of requests fail
        pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.2,                          // 20% failure → open circuit
            SamplingDuration = TimeSpan.FromSeconds(10), // 10-second sampling window
            MinimumThroughput = 8,                       // Need at least 8 requests for stats
            BreakDuration = TimeSpan.FromSeconds(30),    // Keep circuit open for 30s
            // A custom ShouldHandle replaces the default predicate, so exceptions
            // such as HttpRequestException are NOT counted as failures here.
            ShouldHandle = static args => ValueTask.FromResult(args is
            {
                Outcome.Result.StatusCode:
                    HttpStatusCode.RequestTimeout or
                    HttpStatusCode.TooManyRequests or
                    HttpStatusCode.InternalServerError or
                    HttpStatusCode.ServiceUnavailable
            })
        });

        // Timeout: 5 seconds per attempt
        pipelineBuilder.AddTimeout(TimeSpan.FromSeconds(5));
    });
Watch out for MinimumThroughput
If MinimumThroughput isn't reached within SamplingDuration, the circuit breaker ignores the failure ratio. This keeps the circuit from tripping under low traffic (e.g. 2 failed requests out of only 2 total in 10 s). Set the value to match your service's real traffic.
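The gating effect of MinimumThroughput can be sketched in a few lines. This is a simplified model of a single sampling window, not Polly's actual implementation (which tracks a rolling window); the function name `ShouldOpenCircuit` is hypothetical, and treating a ratio exactly equal to the threshold as a trip is an assumption.

```csharp
using System;

// Simplified decision rule of a ratio-based circuit breaker for ONE sampling
// window. It only shows how MinimumThroughput gates the failure-ratio check.
static bool ShouldOpenCircuit(
    int failures, int totalRequests, double failureRatio, int minimumThroughput)
{
    if (totalRequests < minimumThroughput)
    {
        return false; // too few samples in the window: the ratio is ignored
    }
    return (double)failures / totalRequests >= failureRatio;
}

// 2 failures out of 2 requests is a 100% failure ratio, but below the minimum of 8.
Console.WriteLine(ShouldOpenCircuit(2, 2, 0.2, 8));  // False
// 3 failures out of 10 requests: enough traffic, 30% is above the 20% threshold.
Console.WriteLine(ShouldOpenCircuit(3, 10, 0.2, 8)); // True
// 1 failure out of 10 requests: enough traffic, but only 10% failed.
Console.WriteLine(ShouldOpenCircuit(1, 10, 0.2, 8)); // False
```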
4.2. Dynamic Break Duration
Instead of a fixed 30-second break, you can increase the break duration as the circuit opens repeatedly — exponential backoff at the circuit-breaker level:
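pipelineBuilder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
{
    BreakDurationGenerator = static args =>
    {
        // 1st break: 15s, 2nd: 30s, 3rd: 60s, then capped at 120s.
        // HalfOpenAttempts counts consecutive failed probes (0 on the first break).
        // FailureCount is the number of failures in the sampling window, so it
        // would not produce this schedule.
        var duration = TimeSpan.FromSeconds(
            Math.Min(15 * Math.Pow(2, args.HalfOpenAttempts), 120));
        return ValueTask.FromResult(duration);
    }
});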
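The schedule in the comment (15 s, 30 s, 60 s, capped at 120 s) can be checked with plain arithmetic. The sketch below assumes the exponent is the number of consecutive times the circuit has reopened, starting at 0; `BreakDurationFor` and `reopenCount` are hypothetical names used only for illustration.

```csharp
using System;

// Break duration for the n-th consecutive reopening (0-based):
// doubles from 15 s and is capped at 120 s.
static TimeSpan BreakDurationFor(int reopenCount) =>
    TimeSpan.FromSeconds(Math.Min(15 * Math.Pow(2, reopenCount), 120));

for (var n = 0; n < 5; n++)
{
    Console.WriteLine($"{n}: {BreakDurationFor(n).TotalSeconds} s");
}
// Prints 15, 30, 60, 120, 120 seconds for n = 0..4
```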
5. The Retry Pattern — The art of retrying properly
Retry sounds simple, but done wrong it causes disasters. The golden rules:
5.1. Exponential Backoff + Jitter
Without jitter, 1,000 clients retrying at the same 1 s, 2 s, 4 s, 8 s marks create periodic traffic spikes. Jitter spreads retries randomly, reducing server pressure.
graph LR
subgraph Without_Jitter["No Jitter"]
A1["t=1s: 1000 req"] --> A2["t=2s: 1000 req"] --> A3["t=4s: 1000 req"]
end
subgraph With_Jitter["With Jitter"]
B1["t=0.8-1.2s: ~330 req"] --> B2["t=1.6-2.4s: ~330 req"] --> B3["t=3.2-4.8s: ~340 req"]
end
style Without_Jitter fill:#ffebee,stroke:#c62828
style With_Jitter fill:#e8f5e9,stroke:#2e7d32
style A1 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
style A2 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
style A3 fill:#ffcdd2,stroke:#c62828,color:#2c3e50
style B1 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
style B2 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
style B3 fill:#c8e6c9,stroke:#2e7d32,color:#2c3e50
Jitter spreads retry traffic, preventing retry storms
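As a rough illustration of the idea, the sketch below computes exponential backoff with ±20% jitter, matching the windows in the diagram (0.8-1.2 s, 1.6-2.4 s, 3.2-4.8 s). This is a simplified formula chosen for readability and should not be read as the algorithm Polly uses when UseJitter is enabled; `BackoffWithJitter` is a hypothetical helper.

```csharp
using System;

// Exponential backoff with +/-20% jitter.
// retryIndex 0 -> about 1 s, 1 -> about 2 s, 2 -> about 4 s (for a 1 s base delay).
static TimeSpan BackoffWithJitter(int retryIndex, TimeSpan baseDelay, Random random)
{
    var exponentialMs = baseDelay.TotalMilliseconds * Math.Pow(2, retryIndex);
    var factor = 0.8 + random.NextDouble() * 0.4; // uniform in [0.8, 1.2)
    return TimeSpan.FromMilliseconds(exponentialMs * factor);
}

var rng = new Random();
for (var i = 0; i < 3; i++)
{
    var delay = BackoffWithJitter(i, TimeSpan.FromSeconds(1), rng);
    Console.WriteLine($"retry {i + 1}: wait {delay.TotalMilliseconds:F0} ms");
}
```

Because every client draws its own random factor, retries that would have landed on the same instant are spread across each window.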
5.2. Disable retries for unsafe methods
A POST that creates an order, if retried 3 times, could create up to 4 orders. Microsoft.Extensions.Http.Resilience provides a clean API for this:
builder.Services
    .AddHttpClient<OrderClient>()
    .AddStandardResilienceHandler(options =>
    {
        // Disable retries for unsafe methods (POST, PUT, PATCH, DELETE, CONNECT);
        // safe methods such as GET and HEAD are still retried
        options.Retry.DisableForUnsafeHttpMethods();
    });
When CAN you retry POST?
If the downstream API supports an idempotency key (e.g. Stripe, PayPal), you can retry POST safely because the server deduplicates based on the key. In that case, don't disable POST retries. Just make sure each logical operation carries its own Idempotency-Key header, and that every retry of that operation resends the same key; a fresh key per attempt would defeat the deduplication.
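One way to get this right is to generate the key once, when the request is built, rather than inside anything that runs per attempt. The sketch below uses only System.Net.Http; the endpoint URL, the JSON body, and the helper name `CreateOrderRequest` are made up for illustration, and it assumes that headers set on the request before it enters the resilience handler are carried by each retry.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Text;

// Builds the order request once. The key identifies the logical operation,
// not the individual HTTP attempt.
static HttpRequestMessage CreateOrderRequest(Uri endpoint, string jsonBody)
{
    var message = new HttpRequestMessage(HttpMethod.Post, endpoint)
    {
        Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
    };
    message.Headers.Add("Idempotency-Key", Guid.NewGuid().ToString());
    return message;
}

using var orderRequest = CreateOrderRequest(
    new Uri("https://orders.example/api/orders"), // hypothetical endpoint
    "{\"sku\":\"A-1\",\"quantity\":1}");

var idempotencyKey = orderRequest.Headers.GetValues("Idempotency-Key").Single();
Console.WriteLine($"Idempotency-Key: {idempotencyKey}");
```

If the caller itself retries at a higher level (for example after a process restart), the key would need to be persisted with the order so the same value can be sent again.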
6. Hedging — Parallel requests to reduce latency
Hedging is an advanced strategy: when the first request is slow, fire additional requests in parallel to another endpoint (or the same one) and take whichever responds first. Particularly useful with multiple replicas or a multi-region deployment.
builder.Services
    .AddHttpClient<SearchClient>()
    .AddStandardHedgingHandler(routingBuilder =>
    {
        routingBuilder.ConfigureWeightedGroups(options =>
        {
            options.SelectionMode = WeightedGroupSelectionMode.EveryAttempt;
            options.Groups.Add(new WeightedUriEndpointGroup
            {
                Endpoints =
                {
                    new() { Uri = new("https://search-primary.internal"), Weight = 70 },
                    new() { Uri = new("https://search-secondary.internal"), Weight = 30 }
                }
            });
        });
    });
| Characteristic | Retry | Hedging |
|---|---|---|
| When to fire the next request? | After the previous one fails | When the previous one has not responded within the hedging delay (default 2 s), or as soon as it fails |
| Concurrent requests | 1 | Several (1 hedged attempt by default, configurable up to 10) |
| Primary use case | Transient failures, 500/408/429 responses | High tail latency, multi-region, read-heavy |
| Cost | Low — sequential requests | Higher — parallel requests consume resources |
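The difference from retry is easiest to see in code. The following is a minimal sketch of the hedging idea using only Task.WhenAny; it is not how the hedging handler is implemented, and `HedgeAsync`, `SlowPrimary`, and `FastSecondary` are hypothetical names. For brevity it handles a single hedged attempt and does not fall back to the other task if the first one to finish has failed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Starts the primary call; if it has not finished within hedgingDelay,
// starts the secondary in parallel and returns whichever finishes first.
static async Task<T> HedgeAsync<T>(
    Func<CancellationToken, Task<T>> primary,
    Func<CancellationToken, Task<T>> secondary,
    TimeSpan hedgingDelay)
{
    using var cts = new CancellationTokenSource();
    var first = primary(cts.Token);

    // Give the primary a head start.
    var headStart = Task.Delay(hedgingDelay, cts.Token);
    if (await Task.WhenAny(first, headStart) == first)
    {
        cts.Cancel(); // stop the timer
        return await first;
    }

    // Primary is slow: race it against a second, parallel request.
    var second = secondary(cts.Token);
    var completed = await Task.WhenAny(first, second);
    cts.Cancel(); // cancel the loser
    return await completed;
}

static async Task<string> SlowPrimary(CancellationToken ct)
{
    await Task.Delay(TimeSpan.FromSeconds(2), ct);
    return "primary";
}

static async Task<string> FastSecondary(CancellationToken ct)
{
    await Task.Delay(TimeSpan.FromMilliseconds(20), ct);
    return "secondary";
}

var result = await HedgeAsync<string>(
    SlowPrimary, FastSecondary, TimeSpan.FromMilliseconds(100));
Console.WriteLine(result); // secondary
```

Note the cost shown in the table: for a short while two requests are in flight for one logical call, which is why hedging suits read-heavy, idempotent operations.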
7. Timeout strategy — Two-layer protection
Timeouts sound simple but are among the most common bugs. No timeout → threads block forever. Too short → legitimate requests get cancelled. .NET 10 + Polly solve this with two layers of timeout:
builder.Services
    .AddHttpClient<ReportClient>()
    // The lambda parameter is named "pipeline" so it does not shadow the
    // outer application "builder".
    .AddResilienceHandler("ReportPipeline", pipeline =>
    {
        // Total timeout: the whole operation (including retries) ≤ 60s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(60),
            Name = "TotalTimeout"
        });

        // Up to 3 retries (4 attempts in total)
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            BackoffType = DelayBackoffType.Exponential,
            Delay = TimeSpan.FromSeconds(1)
        });

        // Per-attempt timeout: each individual request ≤ 15s
        pipeline.AddTimeout(new TimeoutStrategyOptions
        {
            Timeout = TimeSpan.FromSeconds(15),
            Name = "AttemptTimeout"
        });
    });
Order matters!
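Polly runs strategies in the order they are added: the first one added is the outermost, the last one added sits closest to the actual HTTP call. Total Timeout must therefore be added before Retry (outermost), and Attempt Timeout after Retry (innermost). If the two are swapped, the 60 s budget restarts on every attempt while the 15 s limit applies to the whole operation, retries included.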
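The same layering can be sketched without Polly, using linked cancellation tokens: one token source for the whole operation, and a fresh one per attempt. This is only an illustration of the nesting, with backoff omitted for brevity; `RunWithTwoTimeoutsAsync` and `FlakyOperation` are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Returns the number of attempts used. Throws OperationCanceledException when
// the total budget runs out or the retries are exhausted.
static async Task<int> RunWithTwoTimeoutsAsync(
    Func<CancellationToken, Task> operation,
    TimeSpan totalTimeout, TimeSpan attemptTimeout, int maxRetryAttempts)
{
    // Outer layer: one budget for the whole operation, created once.
    using var totalCts = new CancellationTokenSource(totalTimeout);
    var attempts = 0;
    while (true)
    {
        attempts++;
        // Inner layer: a fresh timeout per attempt, linked to the outer budget.
        using var attemptCts =
            CancellationTokenSource.CreateLinkedTokenSource(totalCts.Token);
        attemptCts.CancelAfter(attemptTimeout);
        try
        {
            await operation(attemptCts.Token);
            return attempts;
        }
        catch (OperationCanceledException)
            when (!totalCts.IsCancellationRequested && attempts <= maxRetryAttempts)
        {
            // Only this attempt timed out and retries remain: go around again.
        }
    }
}

var calls = 0;
async Task FlakyOperation(CancellationToken ct)
{
    calls++;
    // The first two calls hang far beyond the attempt timeout; the third is fast.
    var duration = calls < 3 ? TimeSpan.FromSeconds(30) : TimeSpan.FromMilliseconds(10);
    await Task.Delay(duration, ct);
}

var attemptsUsed = await RunWithTwoTimeoutsAsync(
    FlakyOperation,
    totalTimeout: TimeSpan.FromSeconds(5),
    attemptTimeout: TimeSpan.FromMilliseconds(100),
    maxRetryAttempts: 3);
Console.WriteLine(attemptsUsed); // 3
```

Creating the total token source inside the loop instead would reproduce the mistake described above: the overall budget would restart on every attempt.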
8. Integrating with OpenTelemetry
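Resilience patterns mean nothing if you can't see how they're behaving. Polly v8 emits metrics and structured logs for every strategy through the standard .NET metrics and logging APIs, which OpenTelemetry can collect; handlers registered through AddResilienceHandler or AddStandardResilienceHandler have this telemetry enabled automatically.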
builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics =>
    {
        metrics.AddMeter("Polly");            // Polly metrics: strategy events, attempt and pipeline durations
        metrics.AddMeter("System.Net.Http");  // HttpClient metrics
        metrics.AddPrometheusExporter();
    })
    .WithTracing(tracing =>
    {
        tracing.AddHttpClientInstrumentation();
        tracing.AddOtlpExporter();
    });
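Key Polly metrics to watch (instrument names as emitted on the "Polly" meter; exporters such as Prometheus may rewrite the dots as underscores):
| Metric | Meaning | Alert when |
|---|---|---|
| resilience.polly.strategy.events | Counter of strategy events, tagged with event.name (for example OnRetry, OnCircuitOpened, OnTimeout, OnRateLimiterRejected) | OnRetry or OnRateLimiterRejected spikes → downstream is struggling; OnCircuitOpened with no OnCircuitClosed for more than 5 minutes |
| resilience.polly.strategy.attempt.duration | Time per attempt | p99 approaches the attempt timeout |
| resilience.polly.pipeline.duration | Time for the whole pipeline execution, including retries | p99 approaches the total timeout |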
9. Dynamic reload — Changing config at runtime
One of Polly's most powerful features on .NET 10: you can change resilience configuration without restarting the app. Config binds from appsettings.json and auto-reloads when the file changes.
{
  "PaymentResilience": {
    "Retry": {
      "BackoffType": "Exponential",
      "UseJitter": true,
      "MaxRetryAttempts": 3,
      "Delay": "00:00:01"
    },
    "CircuitBreaker": {
      "FailureRatio": 0.1,
      "SamplingDuration": "00:00:30",
      "MinimumThroughput": 100,
      "BreakDuration": "00:00:05"
    },
    "TotalRequestTimeout": {
      "Timeout": "00:00:30"
    }
  }
}
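var resilienceSection = builder.Configuration.GetSection("PaymentResilience");

builder.Services
    .AddHttpClient<PaymentClient>()
    // Bind this client's standard pipeline to the section. The handler uses
    // named options, so configuring the unnamed HttpStandardResilienceOptions
    // via Services.Configure<T>() would not reach it.
    .AddStandardResilienceHandler(resilienceSection);
When ops need to bump retries from 3 to 5 or extend the timeout from 30 s to 60 s, they just edit the configuration: no code changes, no rebuilds, and, with a reloadable configuration source, no restart.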
10. Production best practices
- Give each downstream its own named HttpClient and AddResilienceHandler to keep them separate: Payment's circuit should never affect Inventory.
- For requests that are not idempotent, DisableForUnsafeHttpMethods() is the safest choice.
- Don't combine AddStandardResilienceHandler() with AddResilienceHandler() on the same client; use RemoveAllResilienceHandlers() if you need a custom pipeline to replace the standard one.
11. A real-world example — Resilient e-commerce architecture
Below is a complete resilience setup for an e-commerce system with 3 downstream services, each with its own pipeline tuned to the traffic's characteristics:
var builder = WebApplication.CreateBuilder(args);

// Payment: critical, no POST retry, tight circuit breaker
var paymentRetry = new HttpRetryStrategyOptions
{
    MaxRetryAttempts = 2,
    BackoffType = DelayBackoffType.Exponential,
    Delay = TimeSpan.FromMilliseconds(500),
    UseJitter = true
};
// DisableFor is an extension method on the options, not a settable property
paymentRetry.DisableFor(HttpMethod.Post); // Don't retry payment POSTs

builder.Services
    .AddHttpClient<PaymentClient>(c => c.BaseAddress = new("https://payment.internal"))
    .AddResilienceHandler("Payment", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(20)); // total budget
        pipeline.AddRetry(paymentRetry);
        pipeline.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            FailureRatio = 0.15,
            MinimumThroughput = 20,
            SamplingDuration = TimeSpan.FromSeconds(15),
            BreakDuration = TimeSpan.FromSeconds(60)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(8));  // per attempt
    });

// Inventory: read-heavy, retry freely
builder.Services
    .AddHttpClient<InventoryClient>(c => c.BaseAddress = new("https://inventory.internal"))
    .AddStandardResilienceHandler();

// Notification: non-critical, short timeout, fail silently
builder.Services
    .AddHttpClient<NotificationClient>(c => c.BaseAddress = new("https://notify.internal"))
    .AddResilienceHandler("Notification", pipeline =>
    {
        pipeline.AddTimeout(TimeSpan.FromSeconds(5));  // total budget
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 1,
            Delay = TimeSpan.FromMilliseconds(200)
        });
        pipeline.AddTimeout(TimeSpan.FromSeconds(3));  // per attempt
    });
Fallback for Notification
Notification is non-critical — even if an email fails to send, the order must still succeed. In practice, wrap the notification call in try-catch and log the error instead of letting the exception bubble up. Better yet: push notifications to a message queue (RabbitMQ, Azure Service Bus) for async processing.
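The try-catch wrapper mentioned above might look like the sketch below. `TryNotifyAsync` is a hypothetical helper, and the delegates stand in for the real notification client and logger. Catching every exception is a deliberate simplification; production code may prefer to let cancellation propagate.

```csharp
using System;
using System.Threading.Tasks;

// Runs a non-critical action and reports failure instead of throwing,
// so the caller (the order flow) can carry on.
static async Task<bool> TryNotifyAsync(
    Func<Task> sendNotification, Action<Exception> logError)
{
    try
    {
        await sendNotification();
        return true;
    }
    catch (Exception ex)
    {
        logError(ex); // record the failure, but do not fail the order
        return false;
    }
}

// Simulate a notification service that is down.
var sent = await TryNotifyAsync(
    () => throw new TimeoutException("notification service did not respond"),
    ex => Console.WriteLine($"Notification failed: {ex.Message}"));

Console.WriteLine(sent ? "notified" : "order completed without notification");
// Notification failed: notification service did not respond
// order completed without notification
```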
12. Anti-patterns to avoid
| Anti-pattern | Problem | Solution |
|---|---|---|
| Retry without backoff | Immediate retries → retry storm crushes the server | Always use exponential backoff + jitter |
| Retry everything | Retrying 404 or 401, errors that never self-heal | Only retry 408, 429, 5xx responses, and network or timeout exceptions |
| Shared circuit breaker | One circuit breaker for all downstreams → Payment failures also trip Inventory | Each downstream gets a named HttpClient + its own circuit |
| Checking circuit state before Execute | Race condition + blocks HalfOpen transition | Always call Execute and let Polly handle state |
| Too many retries | 10 retries × 5 s timeout means 11 attempts and 55 s of waiting before backoff is even counted; the user left long ago | 3-5 retries max, total timeout ≤ SLA |
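For the "retry everything" row, the decision can be captured in a small predicate. This is a sketch, not the library's built-in rule: `IsTransient` is a hypothetical helper, and it treats every 5xx status as transient, which is a broader assumption than an explicit list of codes.

```csharp
using System;
using System.Net;

// True for statuses that may succeed on a later attempt;
// false for client errors that will not fix themselves.
static bool IsTransient(HttpStatusCode status) =>
    status == HttpStatusCode.RequestTimeout        // 408
    || status == HttpStatusCode.TooManyRequests    // 429
    || (int)status >= 500;                         // 5xx

Console.WriteLine(IsTransient(HttpStatusCode.ServiceUnavailable)); // True
Console.WriteLine(IsTransient(HttpStatusCode.TooManyRequests));    // True
Console.WriteLine(IsTransient(HttpStatusCode.NotFound));           // False
Console.WriteLine(IsTransient(HttpStatusCode.Unauthorized));       // False
```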
Conclusion
Resilience patterns aren't a "nice-to-have"; they are a hard requirement for any microservices system running in production. With Polly v8 and Microsoft.Extensions.Http.Resilience on .NET 10, integration is straightforward: one line of AddStandardResilienceHandler() covers the common cases, and full customization handles the rest. Combined with OpenTelemetry for monitoring, you get a system that recovers from transient faults on its own and shows you when it does.