
OpenTelemetry in .NET: Traces, Metrics, Logs

How to wire OpenTelemetry into ASP.NET Core for distributed traces, metrics, and structured logs - the single instrumentation pipeline that powers your alerts.

Table of contents
  1. When does observability become non-negotiable?
  2. What numbers should I budget for the observability tier?
  3. What does the minimal observability pipeline look like?
  4. What is the .NET 10 wiring for OpenTelemetry?
  5. What custom traces and metrics should I add?
  6. How do I turn observability into useful alerts?
  7. What failure modes does observability itself introduce?
  8. When is observability investment premature?
  9. Where should you go from here?

The first time you have to debug a production issue without observability, you understand why every chapter in this series quotes a metric. Without traces, metrics, and logs you are guessing. With them, the post-mortem writes itself. This chapter wires OpenTelemetry into ASP.NET Core in the shape that has become the 2026 standard.

When does observability become non-negotiable?

Three signals.

The service is in production. Even one customer means an outage costs reputation. Without metrics you have no alert; without traces you have no fix; without logs you have no explanation.

The service has more than one instance. Logging to a local file on each box stops working the moment requests are spread across replicas. You need a central pipeline.

The service depends on other services. A slow checkout that turns out to be a slow third-party payment provider takes hours to diagnose without distributed traces and minutes with them.

If your code is a one-off script that runs once and you read the console, you do not need this chapter. Anything else, you do.

What numbers should I budget for the observability tier?

Signal     Volume per req     Cost driver           Storage cost
Metrics    ~constant          unique label values   cheap
Traces     1 trace            sampling rate         expensive
Logs       ~5-20 lines        line size + count     medium

For a service at 10K QPS:

- Metrics: volume stays roughly flat as traffic grows; the bill scales with unique label values, not QPS.
- Traces: 1 per request is 36M traces/hour unsampled; at 1% head-based sampling, 100 traces/s.
- Logs: 5-20 lines per request is 50K-200K log lines/s - which is why the next rule exists.

Never log at INFO level on the hot path. WARN and above for steady state; DEBUG only when investigating.
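That policy in Serilog, as a sketch - the MyService.Checkout namespace is hypothetical, and the full wiring block appears later in the chapter:

// Steady state: WARN and above. Flip one namespace to DEBUG only while investigating.
builder.Host.UseSerilog((ctx, lc) => lc
    .MinimumLevel.Warning()
    .MinimumLevel.Override("MyService.Checkout",          // hypothetical namespace
        Serilog.Events.LogEventLevel.Debug)
    .WriteTo.OpenTelemetry(o => o.Endpoint = "http://collector:4317"));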

What does the minimal observability pipeline look like?

flowchart LR
    App[ASP.NET Core] -->|OTel SDK| Collector[OTel Collector]
    Collector -->|metrics| Prom[(Prometheus)]
    Collector -->|traces| Jaeger[(Jaeger / Tempo)]
    Collector -->|logs| Loki[(Loki / OpenSearch)]
    Prom --> Grafana
    Jaeger --> Grafana
    Loki --> Grafana
    Grafana --> Alert[Alerts -> PagerDuty]

The application emits via OTel; the Collector demultiplexes by signal type into purpose-built backends; Grafana is the unified dashboard. The same pipeline supports Datadog, Honeycomb, Azure Monitor - swap the backend without touching app code.

What is the .NET 10 wiring for OpenTelemetry?

One block in Program.cs:

using System.Reflection;   // Assembly, for the service version lookup below
using Serilog;             // UseSerilog host extension

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService(
        serviceName: builder.Environment.ApplicationName,
        serviceVersion: Assembly.GetExecutingAssembly().GetName().Version?.ToString()))
    .WithTracing(t => t
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation()
        .AddSource("MyService.*")
        .SetSampler(new TraceIdRatioBasedSampler(0.01))   // 1% head-based
        .AddOtlpExporter())
    .WithMetrics(m => m
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()                      // GC, thread pool
        .AddProcessInstrumentation()
        .AddMeter("MyService.*")
        .AddOtlpExporter());

// Logging through Serilog with OTel sink
builder.Host.UseSerilog((ctx, lc) => lc
    .ReadFrom.Configuration(ctx.Configuration)
    .Enrich.FromLogContext()
    .WriteTo.Console()
    .WriteTo.OpenTelemetry(o => o.Endpoint = "http://collector:4317"));

Three details. First, the stock ASP.NET Core instrumentation gives you the full RED set (rate, errors, duration) per endpoint without a line of custom code. Second, the EF Core instrumentation captures the SQL text - sanitise PII out of it before exporting. Third, 1% head-based sampling is the most cost-effective default; combine it with tail sampling on errors so failed requests keep full traces.
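Two of those knobs in code - a sketch, assuming your version of the EF Core instrumentation package exposes the SetDbStatementForText option:

builder.Services.AddOpenTelemetry()
    .WithTracing(t => t
        .AddEntityFrameworkCoreInstrumentation(o =>
            o.SetDbStatementForText = false)          // keep raw SQL (and any PII in it) out of spans
        .SetSampler(new ParentBasedSampler(           // honour the caller's sampling decision,
            new TraceIdRatioBasedSampler(0.01))));    // otherwise sample 1% of new root traces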

What custom traces and metrics should I add?

Three types worth the effort:

// Custom span around a business operation
// (requires: using System.Diagnostics; using System.Diagnostics.Metrics;)
// The source name must match the AddSource("MyService.*") pattern above.
private static readonly ActivitySource ActivitySource = new("MyService.Checkout");

using var activity = ActivitySource.StartActivity("checkout.process");
activity?.SetTag("order.id", orderId);
activity?.SetTag("order.amount", amount);
// ... business code

// Custom metric for a business KPI
// The meter name must match the AddMeter("MyService.*") pattern above.
private static readonly Meter Meter = new("MyService.Business");
private static readonly Counter<long> OrdersPlaced =
    Meter.CreateCounter<long>("orders_placed_total");

OrdersPlaced.Add(1, new KeyValuePair<string, object?>("payment_method", "stripe"));

// Structured log with correlation (log is the injected ILogger<T>)
log.LogInformation(
    "Order {OrderId} placed by user {UserId} for {Amount:0.00}",
    orderId, userId, amount);

The ActivitySource ties the custom span into the trace tree automatically; the Meter exposes a counter that Prometheus scrapes. The structured log preserves the correlation ID injected by ASP.NET Core - Grafana can jump from a trace to its logs.
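For latency-style business KPIs, a histogram follows the same pattern. A sketch reusing the Meter above; the name checkout_duration_ms is illustrative, not from this chapter:

// A duration histogram alongside the counter above.
private static readonly Histogram<double> CheckoutDuration =
    Meter.CreateHistogram<double>("checkout_duration_ms", unit: "ms");

var sw = Stopwatch.StartNew();
// ... business code
CheckoutDuration.Record(sw.Elapsed.TotalMilliseconds,
    new KeyValuePair<string, object?>("payment_method", "stripe"));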

How do I turn observability into useful alerts?

Three rules.

One: alert on user-visible symptoms, not internal causes. "5xx rate above 0.5% for 5 minutes" is a good alert. "Garbage collection time above 1%" is not - it might be fine, depending on the workload. The SRE literature calls this symptom-based alerting, and it is the right framing. The signals from the caching layer - hit rate, misses per second - are the right kind: the user feels them as latency.

Two: alert on SLO burn rate, not raw thresholds. A 1% error rate is fine if your SLO is 99% over 30 days; it is a fire if your SLO is 99.99%. Compute the burn rate against the SLO and alert when you are spending error budget too fast (the arithmetic is sketched after rule three).

Three: every alert must have a runbook. If the on-call engineer cannot answer "what do I do?", the alert is noise. Pair every alert with a Confluence page or wiki link explaining diagnosis and fix.
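The burn-rate arithmetic from rule two, as a sketch - the SLO and error rate here are hypothetical numbers, not from this chapter:

// Burn rate = observed error rate / allowed error rate.
double slo = 0.999;                                  // 99.9% over a 30-day window
double errorBudget = 1.0 - slo;                      // 0.1% of requests may fail
double observedErrorRate = 0.01;                     // 1% of requests currently failing
double burnRate = observedErrorRate / errorBudget;   // 10x
// At 10x burn the 30-day budget is gone in 3 days: page immediately.
// At exactly 1x you would spend the whole budget over the window: warn, don't page.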

What failure modes does observability itself introduce?

Three, all self-inflicted. Label cardinality: the cost table above names unique label values as the metrics cost driver, and a single user-ID label on a counter multiplies the series count by the size of your user base. Hot-path logging: an INFO line per request at 10K QPS is exactly the log volume the budget section warns about. Unsampled traces: exporting every trace turns the most expensive signal into your largest storage bill, which is why the wiring above sets a 1% sampler from day one.

When is observability investment premature?

When the service is single-instance, low-traffic, and not in production. A local dev box reading console logs is fine. Add OpenTelemetry the day you deploy to staging - not before, not after. Wiring it into a service that already serves users is harder than wiring it from day one. The QPS estimate from chapter 2 tells you when "production" is real enough to require this.

Where should you go from here?

Next chapter: rate limiting in .NET - the second ops building block. Together with observability, rate limiting is what keeps your service alive when something upstream goes wrong. After that, the case-study chapters compose every block from foundations through ops into real, complete designs.

Frequently asked questions

Three signals - which one matters most?
Metrics for alerting (rate, latency p99, error count - cheap to aggregate, fast to query). Traces for debugging (where did this request slow down). Logs for context (the exact exception, the input that broke). Most teams over-invest in logs and under-invest in metrics; the right ratio is 'metrics first, traces second, logs as supporting evidence'.
OpenTelemetry vs Serilog vs Application Insights?
Not vs - layered. OpenTelemetry is the protocol and instrumentation. Serilog is a logging library that emits via OTel. Application Insights is a backend that ingests OTel data. The choice is mostly 'which backend' (App Insights, Datadog, Honeycomb, Grafana Cloud, self-hosted Jaeger+Prometheus). Code stays the same; OTel decouples instrumentation from vendor.
What metrics should I emit by default?
RED for every endpoint: Rate (req/s), Errors (req/s with status >=500), Duration (latency histogram). Plus saturation signals: CPU, memory, GC pause time, thread pool queue length. The .NET instrumentation packages emit all of this for free. Custom metrics are for business KPIs (orders/min, signups/hour) - rarely needed for technical alerting.
How much does observability cost in production?
Mostly storage. Traces are expensive at scale - 1 trace per request at 10K QPS is 25M traces/hour. Sampling cuts that 100x with little loss; 1% head-based sampling plus error sampling at 100% is the standard. Metrics are cheap (aggregated by the time they hit storage). Logs are middle - structured, sampled at the source for chatty paths.