OpenTelemetry in .NET: Traces, Metrics, Logs
How to wire OpenTelemetry into ASP.NET Core for distributed traces, metrics, and structured logs - the single instrumentation pipeline that powers your alerts.
Table of contents
- When does observability become non-negotiable?
- What numbers should I budget for the observability tier?
- What does the minimal observability pipeline look like?
- What is the .NET 10 wiring for OpenTelemetry?
- What custom traces and metrics should I add?
- How do I turn observability into useful alerts?
- What failure modes does observability itself introduce?
- When is observability investment premature?
- Where should you go from here?
The first time you have to debug a production issue without observability, you understand why every chapter in this series quotes a metric. Without traces, metrics, and logs you are guessing. With them, the post-mortem writes itself. This chapter wires OpenTelemetry into ASP.NET Core in the shape that has become the 2026 standard.
When does observability become non-negotiable?
Three signals.
The service is in production. Even one customer means an outage costs reputation. Without metrics you have no alert; without traces you have no fix; without logs you have no explanation.
The service has more than one instance. Logging to a local file on each box stops working the moment requests are spread across replicas. You need a central pipeline.
The service depends on other services. A slow checkout that turns out to be a slow third-party payment provider takes hours to diagnose without distributed traces and minutes with them.
If your code is a one-off script that runs once and you read the console, you do not need this chapter. Anything else, you do.
What numbers should I budget for the observability tier?
| Signal | Volume per request | Cost driver | Storage cost |
| --- | --- | --- | --- |
| Metrics | ~constant | unique label values | cheap |
| Traces | 1 trace | sampling rate | expensive |
| Logs | ~5-20 lines | line size + count | medium |
For a service at 10K QPS:
- Metrics: a few thousand series x 30s scrape = trivial.
- Traces at 100% = 36M traces/hour. At 1% sampling = 360K/hour, far more affordable.
- Logs at 10 lines/request = 100K lines/s. Structured logs at 200 bytes each = 20 MB/s = 1.7 TB/day. Filter and sample. (The arithmetic is worked through in the sketch below.)
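Those figures are back-of-the-envelope arithmetic you can sanity-check in a few lines; the sketch below simply replays the same assumptions (10K QPS, 10 log lines per request, 200 bytes per line, 1% trace sampling):

```csharp
// Back-of-the-envelope volume check using the assumptions above.
const double qps = 10_000;
const double logLinesPerRequest = 10;
const double bytesPerLogLine = 200;
const double traceSamplingRate = 0.01;

double tracesPerHour = qps * 3_600;                                // 36M at 100%
double sampledTracesPerHour = tracesPerHour * traceSamplingRate;   // 360K at 1%
double logLinesPerSecond = qps * logLinesPerRequest;               // 100K lines/s
double logBytesPerDay = logLinesPerSecond * bytesPerLogLine * 86_400;

Console.WriteLine($"Sampled traces/hour: {sampledTracesPerHour:N0}");
Console.WriteLine($"Log volume/day: {logBytesPerDay / 1e12:0.0} TB");   // ~1.7 TB
```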
Never log at INFO level on the hot path. WARN and above for steady state; DEBUG only when investigating.
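One way to hold that line in code is a minimal-level policy in the Serilog setup used later in this chapter; a sketch only - in practice the same thresholds usually live in appsettings and flow in through ReadFrom.Configuration, and the override namespace here is illustrative:

```csharp
using Serilog;
using Serilog.Events;

// Steady state: Warning and above on the hot path; keep host lifecycle messages.
builder.Host.UseSerilog((ctx, lc) => lc
    .MinimumLevel.Warning()
    .MinimumLevel.Override("Microsoft.Hosting.Lifetime", LogEventLevel.Information)
    // Drop a single namespace to Debug only while investigating
    // ("MyService.Checkout" is an illustrative name):
    //.MinimumLevel.Override("MyService.Checkout", LogEventLevel.Debug)
    .WriteTo.Console());
```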
What does the minimal observability pipeline look like?
```mermaid
flowchart LR
  App[ASP.NET Core] -->|OTel SDK| Collector[OTel Collector]
  Collector -->|metrics| Prom[(Prometheus)]
  Collector -->|traces| Jaeger[(Jaeger / Tempo)]
  Collector -->|logs| Loki[(Loki / OpenSearch)]
  Prom --> Grafana
  Jaeger --> Grafana
  Loki --> Grafana
  Grafana --> Alert[Alerts -> PagerDuty]
```
The application emits via OTel; the Collector demultiplexes by signal type into purpose-built backends; Grafana is the unified dashboard. The same pipeline supports Datadog, Honeycomb, Azure Monitor - swap the backend without touching app code.
What is the .NET 10 wiring for OpenTelemetry?
One block in Program.cs:
```csharp
using System.Reflection;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using Serilog;

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService(
        serviceName: builder.Environment.ApplicationName,
        serviceVersion: Assembly.GetExecutingAssembly().GetName().Version?.ToString()))
    .WithTracing(t => t
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        .AddSource("MyService.*")
        .SetSampler(new TraceIdRatioBasedSampler(0.01)) // 1% head-based
        .AddOtlpExporter())
    .WithMetrics(m => m
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation() // GC, thread pool
        .AddProcessInstrumentation()
        .AddMeter("MyService.*")
        .AddOtlpExporter());

// Logging through Serilog with OTel sink
builder.Host.UseSerilog((ctx, lc) => lc
    .ReadFrom.Configuration(ctx.Configuration)
    .Enrich.FromLogContext()
    .WriteTo.Console()
    .WriteTo.OpenTelemetry(o => o.Endpoint = "http://collector:4317"));
```
Three details. The default ASP.NET Core instrumentation gives you the entire RED method (rate, errors, duration) without writing code. EF Core instrumentation includes the SQL text - sanitise PII out of it before exporting. Sampling at 1% head-based is the most cost-effective default; combine it with tail sampling on errors in the Collector for richer error traces.
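One refinement worth knowing for that last detail: wrapping the ratio sampler in a ParentBasedSampler keeps a whole distributed trace consistent - root spans are sampled at 1%, and child spans follow whatever the upstream service already decided. A sketch of the tracing half only, under the same assumptions as the block above:

```csharp
// Sketch: respect upstream sampling decisions, sample new root traces at 1%.
builder.Services.AddOpenTelemetry()
    .WithTracing(t => t
        .AddAspNetCoreInstrumentation()
        .AddSource("MyService.*")
        .SetSampler(new ParentBasedSampler(new TraceIdRatioBasedSampler(0.01)))
        .AddOtlpExporter());
```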
What custom traces and metrics should I add?
Three types worth the effort:
```csharp
// Custom span around a business operation.
// The source name is illustrative; it must match the AddSource("MyService.*") pattern above.
private static readonly ActivitySource ActivitySource = new("MyService.Checkout");

using var activity = ActivitySource.StartActivity("checkout.process");
activity?.SetTag("order.id", orderId);
activity?.SetTag("order.amount", amount);
// ... business code

// Custom metric for a business KPI
private static readonly Meter Meter = new("MyService.Business");
private static readonly Counter<long> OrdersPlaced =
    Meter.CreateCounter<long>("orders_placed_total");

OrdersPlaced.Add(1, new KeyValuePair<string, object?>("payment_method", "stripe"));

// Structured log with correlation
log.LogInformation(
    "Order {OrderId} placed by user {UserId} for {Amount:0.00}",
    orderId, userId, amount);
```
The ActivitySource ties the custom span into the trace tree automatically; the Meter exposes a counter that Prometheus scrapes. The structured log preserves the correlation ID injected by ASP.NET Core - Grafana can jump from a trace to its logs.
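One addition worth making to the custom span: when the business operation throws, mark the span as an error before it ends, because error-focused sampling and the error-rate alerts below both key off span status. A minimal sketch reusing the activity from above (ProcessCheckout is a hypothetical call):

```csharp
try
{
    ProcessCheckout(orderId); // hypothetical business call
}
catch (Exception ex)
{
    // Mark the span as failed and attach the exception message as an event
    // (event name and tag follow the OpenTelemetry exception conventions).
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    activity?.AddEvent(new ActivityEvent("exception",
        tags: new ActivityTagsCollection { ["exception.message"] = ex.Message }));
    throw;
}
```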
How do I turn observability into useful alerts?
Three rules.
One: alert on user-visible symptoms, not internal causes. "5xx rate above 0.5% for 5 minutes" is good. "Garbage collection time above 1%" is not - it might be fine, depending on the workload. SRE's "symptom-based alerting" is the right framing. The signals from the caching layer - hit rate, misses per second - are the right type: visible to the user as latency.
Two: alert on SLO burn rate, not raw thresholds. A 1% error rate is fine if your SLO is 99% over 30 days; it is a fire if your SLO is 99.99%. Compute the burn rate against the SLO and alert when you are spending error budget too fast.
Three: every alert must have a runbook. If the on-call engineer cannot answer "what do I do?", the alert is noise. Pair every alert with a Confluence page or wiki link explaining diagnosis and fix.
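To make rule two concrete, here is the burn-rate arithmetic as a minimal sketch; the 99.9% SLO, the observed error rate, and the 14.4x fast-burn threshold are illustrative numbers (that threshold is the common multi-window value that exhausts a 30-day budget in roughly two days):

```csharp
// Error-budget burn rate: 1x means the budget lasts exactly the SLO window.
const double slo = 0.999;                       // 99.9% success over 30 days
const double errorBudget = 1 - slo;             // 0.1% of requests may fail
double observedErrorRate = 0.005;               // e.g. 0.5% 5xx over the last hour

double burnRate = observedErrorRate / errorBudget;  // 5x here
bool page = burnRate >= 14.4;                       // fast-burn page threshold

Console.WriteLine($"Burn rate: {burnRate:0.0}x, page: {page}");
```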
What failure modes does observability itself introduce?
- Cardinality explosion - a metric label with userId produces one series per user, melting Prometheus. Mitigation: never label with high-cardinality fields; aggregate.
- Trace sampling bias - 1% sampling misses rare slow requests. Mitigation: tail-based sampling that keeps all errors and all slow traces.
- Log PII leakage - logging full request bodies dumps emails, tokens, and payment data. Mitigation: structured logging with explicit fields; no LogInformation("Body: {Body}", body).
- Observability outage drops alerts - your alert pipeline fails silently and you do not know your service is down. Mitigation: dead-man's-switch alerts (the alert should fire daily; if it doesn't, something is wrong - see the heartbeat sketch below).
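One concrete form of that dead-man's-switch is a heartbeat metric the service emits on a timer; the alert condition is the heartbeat going missing, so a dead pipeline is itself what pages. A minimal sketch - the MyService.Heartbeat meter name is illustrative and would need to match the AddMeter("MyService.*") pattern from the wiring above:

```csharp
using System.Diagnostics.Metrics;
using Microsoft.Extensions.Hosting;

// Emits one heartbeat per minute for the life of the process.
// Alert on "no increase in heartbeat_total for N minutes" in Grafana.
public sealed class HeartbeatService : BackgroundService
{
    private static readonly Meter Meter = new("MyService.Heartbeat");
    private static readonly Counter<long> Heartbeat =
        Meter.CreateCounter<long>("heartbeat_total");

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            Heartbeat.Add(1);
            await Task.Delay(TimeSpan.FromMinutes(1), stoppingToken);
        }
    }
}

// Registration in Program.cs:
// builder.Services.AddHostedService<HeartbeatService>();
```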
When is observability investment premature?
When the service is single-instance, low-traffic, and not in production. A local dev box reading console logs is fine. Add OpenTelemetry the day you deploy to staging - not before, not after. Wiring it into a service that already serves users is harder than wiring it from day one. The QPS estimate from chapter 2 tells you when "production" is real enough to require this.
Where should you go from here?
Next chapter: rate limiting in .NET - the second ops building block. Together with observability, rate limiting is what keeps your service alive when something upstream goes wrong. After that, the case-study chapters compose every block from foundations through ops into real, complete designs.