OpenTelemetry — The Open Observability Standard Dominating Distributed Systems

Posted on: 4/27/2026 1:11:58 PM

  • 95%: adoption rate among new cloud-native projects (2026)
  • 12+: programming languages with SDK support
  • v1.0: Declarative Configuration reaches stable
  • v0.8: OBI (eBPF) beta, launched at KubeCon EU 2026

1. What is OpenTelemetry and Why Should You Care?

OpenTelemetry (OTel) is a CNCF (Cloud Native Computing Foundation) open-source project that provides standardized APIs, SDKs, and tools for collecting telemetry data — including traces, metrics, and logs — from distributed applications and infrastructure.

Before OTel, each observability vendor (Datadog, New Relic, Dynatrace, Jaeger, Zipkin...) had its own proprietary agent and SDK. This created severe vendor lock-in: switching backends meant rewriting instrumentation code across every service. OTel solves this by providing a vendor-neutral abstraction layer — you instrument once and export to any backend via the OTLP (OpenTelemetry Protocol) standard.

Why 2026 is the Golden Moment

April 2026 marks two major milestones: Declarative Configuration officially reaching stable (v1.0.0) and OBI (eBPF Instrumentation) launching in beta at KubeCon EU. Combined with .NET 10 auto-instrumentation v1.15.0, OTel is now production-ready across all major languages.

2. The Three Pillars: Traces, Metrics, Logs

OTel unifies the three most critical signal types in observability:

graph LR
    subgraph Signals["Three OTel Pillars"]
        T["🔍 Traces<br/>Track requests<br/>across services"]
        M["📊 Metrics<br/>Measure performance<br/>over time"]
        L["📝 Logs<br/>Detailed events<br/>with context"]
    end
    T -->|"trace_id"| C["Correlation<br/>Engine"]
    M -->|"exemplar"| C
    L -->|"trace_id"| C
    C --> I["Complete<br/>Insight"]
    style T fill:#e94560,stroke:#fff,color:#fff
    style M fill:#2c3e50,stroke:#fff,color:#fff
    style L fill:#4CAF50,stroke:#fff,color:#fff
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style I fill:#f8f9fa,stroke:#e94560,color:#2c3e50

OpenTelemetry's three observability pillars, correlated via trace_id

Traces (Distributed Tracing)

A trace records the complete journey of a request as it travels through multiple services. Each trace contains multiple spans — each span represents a unit of work (API call, database query, message queue processing...). Spans are linked via trace_id and parent_span_id, forming a call tree.
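To make the parent/child linkage concrete, here is a minimal .NET sketch (names like Demo.Checkout are illustrative; the ActivityListener stands in for a full SDK setup so the activities are actually created):

// Minimal sketch: parent/child spans via System.Diagnostics.
// OTel .NET maps Activity to span; starting an activity while another
// is current sets parent_span_id automatically.
using System.Diagnostics;

using var listener = new ActivityListener
{
    ShouldListenTo = _ => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        ActivitySamplingResult.AllDataAndRecorded
};
ActivitySource.AddActivityListener(listener);

using var source = new ActivitySource("Demo.Checkout");

using (var parent = source.StartActivity("HandleCheckout"))   // root span
using (var child = source.StartActivity("ChargeCard"))        // child span
{
    Console.WriteLine($"trace_id: {child?.TraceId}");
    Console.WriteLine($"parent:   {child?.ParentSpanId} == {parent?.SpanId}");
}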

Metrics

Metrics are numerical measurements over time: request latency (histogram), error rate (counter), active connections (gauge), CPU usage... OTel supports both push (OTLP export) and pull (Prometheus scrape) models.
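A quick .NET illustration of those three instrument kinds (meter and instrument names are made up; values only reach a backend once a MeterProvider subscribes via .AddMeter("Demo.Api")):

// Counter, histogram, and gauge with System.Diagnostics.Metrics
using System.Collections.Generic;
using System.Diagnostics.Metrics;

var meter = new Meter("Demo.Api");

var requests = meter.CreateCounter<long>("http.server.requests");           // counter: monotonic count
var latency  = meter.CreateHistogram<double>("http.server.latency", "ms");  // histogram: distribution
var active   = 0;
meter.CreateObservableGauge("db.connections.active", () => active);         // gauge: sampled on read

requests.Add(1, new KeyValuePair<string, object?>("http.route", "/orders"));
latency.Record(42.5);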

Logs

OTel doesn't replace existing logging frameworks (Serilog, NLog, log4j...) but adds context — injecting trace_id and span_id into every log entry. When a request fails, you can jump straight from trace to detailed logs without manual grep.
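A small sketch of what that correlation looks like in .NET (class and names are illustrative): any log written while a span is active carries that span's IDs once the OTLP logging exporter from section 6 is wired up.

// Logs written inside an active span inherit its trace_id/span_id
using System.Diagnostics;
using Microsoft.Extensions.Logging;

public class PaymentWorker
{
    private static readonly ActivitySource Source = new("Demo.Payments");
    private readonly ILogger<PaymentWorker> _logger;

    public PaymentWorker(ILogger<PaymentWorker> logger) => _logger = logger;

    public void Process(string orderId)
    {
        using var activity = Source.StartActivity("ProcessPayment");
        // Activity.Current is set here, so this record carries the span's
        // TraceId/SpanId; in the backend you can jump from trace to log.
        _logger.LogWarning("Gateway timeout for order {OrderId}, retrying", orderId);
    }
}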

3. OpenTelemetry Architecture Overview

graph TB
    subgraph Apps["Applications"]
        A1["Service A<br/>.NET 10"]
        A2["Service B<br/>Node.js"]
        A3["Service C<br/>Go"]
    end
    subgraph SDK["OTel SDK / Auto-Instrumentation"]
        S1["SDK .NET"]
        S2["SDK JS"]
        S3["SDK Go"]
    end
    subgraph OBI_Layer["OBI (eBPF)"]
        OBI["Zero-Code<br/>eBPF Probes"]
    end
    A1 --> S1
    A2 --> S2
    A3 --> S3
    A1 -.->|"kernel hooks"| OBI
    A2 -.->|"kernel hooks"| OBI
    A3 -.->|"kernel hooks"| OBI
    S1 -->|"OTLP"| COL["OTel Collector"]
    S2 -->|"OTLP"| COL
    S3 -->|"OTLP"| COL
    OBI -->|"OTLP"| COL
    subgraph Backends["Backends"]
        J["Jaeger / Tempo"]
        P["Prometheus / Mimir"]
        L["Loki / Elasticsearch"]
    end
    COL --> J
    COL --> P
    COL --> L
    J --> G["Grafana Dashboard"]
    P --> G
    L --> G
    style A1 fill:#e94560,stroke:#fff,color:#fff
    style A2 fill:#2c3e50,stroke:#fff,color:#fff
    style A3 fill:#4CAF50,stroke:#fff,color:#fff
    style OBI fill:#ff9800,stroke:#fff,color:#fff
    style COL fill:#16213e,stroke:#fff,color:#fff
    style G fill:#e94560,stroke:#fff,color:#fff

End-to-end architecture: SDK + OBI → Collector → Backends → Visualization

The OTel architecture consists of 4 main layers:

  • Instrumentation Layer: SDK integrated into code (manual/auto) or OBI hooking at the kernel level
  • OTLP Protocol: Standard transport protocol (gRPC or HTTP/protobuf)
  • Collector: Receives, processes (filter, enrich, batch), then exports telemetry
  • Backend: Storage and query — Jaeger, Tempo, Prometheus, Loki, Elasticsearch...

4. Declarative Configuration — Now Stable

In April 2026, OpenTelemetry officially marked Declarative Configuration as stable (v1.0.0). This allows configuring OTel via a YAML file instead of dozens of scattered environment variables.

Key Benefit

A single YAML file replaces dozens of environment variables. Easy to version control, easy to review, easy to replicate across environments.

YAML Configuration Example

# otel-config.yaml — Declarative Configuration v1.0
file_format: "0.4"

tracer_provider:
  processors:
    - batch:
        schedule_delay: 5000
        export_timeout: 30000
        max_queue_size: 2048
        max_export_batch_size: 512
      exporter:
        otlp:
          protocol: grpc
          endpoint: http://otel-collector:4317
          compression: gzip

meter_provider:
  readers:
    - periodic:
        interval: 60000
        exporter:
          otlp:
            protocol: grpc
            endpoint: http://otel-collector:4317

logger_provider:
  processors:
    - batch:
        exporter:
          otlp:
            protocol: grpc
            endpoint: http://otel-collector:4317

resource:
  attributes:
    service.name: my-api
    service.version: "2.1.0"
    deployment.environment: production

Activate with a single environment variable:

export OTEL_CONFIG_FILE=/etc/otel/otel-config.yaml

Stable Components

| Component | Description | Status |
|---|---|---|
| JSON Schema (opentelemetry-configuration) | Data model schema, version 1.0.0 | ✅ Stable |
| YAML Representation | File-based configuration format | ✅ Stable |
| In-Memory Model | SDK in-memory config representation | ✅ Stable |
| ConfigProperties | Generic YAML mapping node | ✅ Stable |
| PluginComponentProvider | Custom plugin reference mechanism | ✅ Stable |
| OTEL_CONFIG_FILE | Activation environment variable | ✅ Stable |

Language Support

| Language | Declarative Config Status |
|---|---|
| Java | ✅ Complete (agent + SDK) |
| Go | ✅ Complete (Collector internal) |
| C++ | ✅ Complete |
| JavaScript | ✅ Complete |
| PHP | ✅ Complete |
| .NET | 🔄 In development |
| Python | 🔄 In development |

5. OBI: eBPF Instrumentation — Zero-Code Observability

OBI (OpenTelemetry eBPF Instrumentation) is the biggest leap in observability for 2026. Built on Grafana Beyla (donated to OTel in late 2025), OBI uses eBPF to hook directly into the Linux kernel — collecting traces and metrics without modifying a single line of application code.

graph TB
    subgraph KernelSpace["Kernel Space"]
        UP["uprobes<br/>SSL_read/SSL_write"]
        KP["kprobes<br/>tcp_sendmsg/recvmsg"]
        TP["tracepoints<br/>scheduling, fs events"]
    end
    subgraph Maps["eBPF Maps"]
        PEA["perf_event_array"]
        RB["ring_buffer"]
    end
    subgraph UserSpace["User Space — OBI Agent (Go)"]
        MR["Map Reader"]
        SB["Span Builder"]
        FE["Filter & Enrich"]
        EX["OTLP Exporter"]
    end
    UP --> PEA
    KP --> PEA
    TP --> RB
    PEA --> MR
    RB --> MR
    MR --> SB
    SB --> FE
    FE --> EX
    EX -->|"gRPC/HTTP"| COL["OTel Collector"]
    style UP fill:#e94560,stroke:#fff,color:#fff
    style KP fill:#e94560,stroke:#fff,color:#fff
    style TP fill:#e94560,stroke:#fff,color:#fff
    style PEA fill:#2c3e50,stroke:#fff,color:#fff
    style RB fill:#2c3e50,stroke:#fff,color:#fff
    style EX fill:#4CAF50,stroke:#fff,color:#fff
    style COL fill:#16213e,stroke:#fff,color:#fff

OBI's two-layer architecture: kernel probes → user-space agent → OTLP export

How Does OBI Work?

OBI operates on a two-layer model:

  • Kernel Space: uprobes intercept SSL_read/SSL_write (reading TLS traffic), kprobes monitor tcp_sendmsg/tcp_recvmsg, tracepoints record scheduling and filesystem events
  • User Space: A Go agent reads data from eBPF maps, builds spans, applies filtering/enrichment, then exports via OTLP

System Requirements

| Requirement | Details |
|---|---|
| Kernel | Linux 5.8+ (RHEL/Rocky 4.18+ with backport); BTF mandatory |
| Architecture | amd64, arm64 (Graviton, Ampere) |
| Privileges | root, or CAP_BPF + CAP_SYS_PTRACE |
| Pod config | hostPID: true |
| Resources (typical) | CPU: 100m–500m, Memory: 256Mi–512Mi |
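Translated into a Kubernetes manifest, those requirements look roughly like the sketch below. Treat it as a shape rather than a recipe: the image tag is a placeholder, only OTEL_EXPORTER_OTLP_ENDPOINT is a standard variable, and the OBI docs are the source of truth for the agent's real image and configuration keys.

# Hedged sketch: OBI as a DaemonSet (image and names are placeholders)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: obi
  namespace: observability
spec:
  selector:
    matchLabels:
      app: obi
  template:
    metadata:
      labels:
        app: obi
    spec:
      hostPID: true                        # required, per the table above
      containers:
        - name: obi
          image: otel/ebpf-instrument:beta # placeholder tag
          securityContext:
            capabilities:
              add: ["BPF", "SYS_PTRACE"]   # alternative to running as root
          env:
            - name: OTEL_EXPORTER_OTLP_ENDPOINT
              value: http://otel-collector:4317
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi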

Supported Protocols

Application Layer

HTTP/gRPC with automatic RED metrics, TLS-encrypted traffic (kernel-level SSL hooks), protocol-agnostic span generation

Database Protocols

PostgreSQL (pgx), MySQL, MongoDB, Redis, Couchbase — native server spans without ORM instrumentation

Emerging (planned for 1.0)

GenAI APIs (OpenAI, Anthropic), Message brokers (MQTT, AMQP, NATS, Redis Pub/Sub)

SDK vs. OBI — When to Use What?

| Criteria | SDK (Traditional) | OBI (eBPF) |
|---|---|---|
| Code changes | Required per service | None — kernel-level hooks |
| Third-party binaries | No visibility | Automatic visibility |
| Custom business events | ✅ Flexible | ❌ Protocol-level only |
| Payload access | App-defined | Kernel-level SSL/DB capture |
| Deploy workflow | Rebuild + redeploy | Config-driven, DaemonSet |
| OS support | Any OS | Linux only (kernel 5.8+) |

Important Note

OBI complements rather than replaces SDK instrumentation. In production, run both in parallel: OBI for automatic infrastructure visibility, SDK for business-level events and custom spans.

6. OpenTelemetry Integration in .NET 10 / ASP.NET Core

.NET has one of the best OTel support ecosystems thanks to System.Diagnostics — the native foundation that the OTel .NET SDK builds upon. With auto-instrumentation v1.15.0 (April 2026), you can instrument an ASP.NET Core app in minutes.

Basic Setup

// Program.cs — .NET 10 + OpenTelemetry
using OpenTelemetry;
using OpenTelemetry.Trace;
using OpenTelemetry.Metrics;
using OpenTelemetry.Logs;
using OpenTelemetry.Resources;

var builder = WebApplication.CreateBuilder(args);

// Resource: identify the service (shared by traces, metrics, and logs)
builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("my-api", serviceVersion: "2.1.0"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSqlClientInstrumentation(o => o.SetDbStatementForText = true)
        .AddRedisInstrumentation()
        .AddOtlpExporter(o =>
        {
            o.Endpoint = new Uri("http://otel-collector:4317");
            o.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.Grpc;
        }))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddOtlpExporter())
    .WithLogging(logging => logging
        .AddOtlpExporter());

var app = builder.Build();
app.MapControllers();
app.Run();

Zero-Code Auto-Instrumentation

If you don't want to modify Program.cs, use zero-code instrumentation via environment variables:

# Dockerfile or docker-compose.yml
ENV CORECLR_ENABLE_PROFILING=1
ENV CORECLR_PROFILER={918728DD-259F-4A6A-AC2B-B85E1B658318}
ENV CORECLR_PROFILER_PATH=/otel-dotnet/linux-x64/OpenTelemetry.AutoInstrumentation.Native.so
ENV DOTNET_ADDITIONAL_DEPS=/otel-dotnet/AdditionalDeps
ENV DOTNET_SHARED_STORE=/otel-dotnet/store
ENV OTEL_DOTNET_AUTO_HOME=/otel-dotnet
ENV OTEL_SERVICE_NAME=my-api
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

Custom Spans for Business Logic

using System.Diagnostics;

public class OrderService
{
    private static readonly ActivitySource Source = new("MyApp.OrderService");
    private readonly IOrderRepository _repository; // injected data access (interface name illustrative)

    public OrderService(IOrderRepository repository) => _repository = repository;

    public async Task<Order> PlaceOrderAsync(OrderRequest request)
    {
        using var activity = Source.StartActivity("PlaceOrder");
        activity?.SetTag("order.customer_id", request.CustomerId);
        activity?.SetTag("order.items_count", request.Items.Count);

        var order = await _repository.CreateAsync(request);

        activity?.SetTag("order.id", order.Id);
        activity?.SetTag("order.total", order.Total);
        activity?.AddEvent(new ActivityEvent("OrderCreated"));

        return order;
    }
}

7. OpenTelemetry Collector — The Telemetry Processing Hub

The OTel Collector is an essential production component. It acts as a gateway — receiving telemetry from multiple sources, processing (filter, transform, batch, sample), then exporting to multiple backends.

graph LR
    subgraph Receivers["Receivers"]
        R1["OTLP<br/>(gRPC/HTTP)"]
        R2["Prometheus<br/>scrape"]
        R3["Jaeger<br/>thrift"]
    end
    subgraph Processors["Processors"]
        P1["Batch"]
        P2["Filter"]
        P3["Attributes<br/>Enrich"]
        P4["Tail Sampling"]
    end
    subgraph Exporters["Exporters"]
        E1["OTLP → Tempo"]
        E2["Prometheus<br/>Remote Write"]
        E3["Loki"]
    end
    R1 --> P1
    R2 --> P1
    R3 --> P1
    P1 --> P2
    P2 --> P3
    P3 --> P4
    P4 --> E1
    P4 --> E2
    P4 --> E3
    style R1 fill:#e94560,stroke:#fff,color:#fff
    style R2 fill:#2c3e50,stroke:#fff,color:#fff
    style R3 fill:#4CAF50,stroke:#fff,color:#fff
    style P4 fill:#ff9800,stroke:#fff,color:#fff
    style E1 fill:#16213e,stroke:#fff,color:#fff
    style E2 fill:#16213e,stroke:#fff,color:#fff
    style E3 fill:#16213e,stroke:#fff,color:#fff

Collector pipeline: Receivers → Processors → Exporters

Production Collector Configuration

# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 1024
    timeout: 5s

  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'

  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: latency-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push

  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter, tail_sampling, resource, batch]  # batch last: tail sampling must see individual spans first
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [loki]

Tail Sampling — Cost Optimization Strategy

Instead of random head sampling at the SDK level, tail sampling at the Collector lets you keep 100% of error traces and 100% of slow traces (>1s), while only sampling 10% of normal traces. Dramatically reduces storage costs without losing critical information.
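A back-of-the-envelope illustration: at 1 million traces/day with 1% errors and 2% slow traces, the policies above retain 10,000 error traces, 20,000 slow traces, and roughly 10% of the remaining 970,000. That is about 127,000 traces total, roughly 13% of raw volume, with every failure still captured.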

8. Production-Grade Deployment Strategy

Deployment Patterns

graph TB
    subgraph Pattern1["Pattern 1: Sidecar"]
        P1A["App Container"] --> P1C["Collector Sidecar"]
        P1C --> P1B["Backend"]
    end

    subgraph Pattern2["Pattern 2: DaemonSet"]
        P2A1["App Pod 1"] --> P2C["Collector<br/>DaemonSet"]
        P2A2["App Pod 2"] --> P2C
        P2C --> P2B["Backend"]
    end

    subgraph Pattern3["Pattern 3: Gateway"]
        P3A["Collector<br/>DaemonSet"] --> P3G["Collector<br/>Gateway"]
        P3G --> P3B1["Backend 1"]
        P3G --> P3B2["Backend 2"]
    end

    style P1C fill:#e94560,stroke:#fff,color:#fff
    style P2C fill:#e94560,stroke:#fff,color:#fff
    style P3G fill:#e94560,stroke:#fff,color:#fff

Three common deployment patterns for OTel Collector

| Pattern | Pros | Cons | Best For |
|---|---|---|---|
| Sidecar | Good isolation, per-service config | High resource overhead | Multi-tenant, strict compliance |
| DaemonSet | Resource efficient, easy to manage | Shared config, noisy neighbor | Most K8s workloads |
| Gateway | Central control, multi-backend routing | Single point of failure | Large orgs, multiple backends |
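For Pattern 3, the node-local agents typically run a thin pipeline and forward everything to the gateway, where the heavy processors live. A minimal agent-side sketch (the otel-gateway service name is illustrative):

# Agent (DaemonSet) side: forward raw telemetry to the central gateway
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  otlp/gateway:
    endpoint: otel-gateway.observability.svc:4317   # illustrative service name
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/gateway]

Tail sampling in particular belongs on the gateway tier: the sampling decision requires every span of a trace to arrive at the same instance, which a per-node agent cannot guarantee.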

Production Deployment Checklist

  1. Start with auto-instrumentation — SDK or zero-code. No need for custom spans right away
  2. Deploy Collector before backend — always go through the Collector, never export directly from SDK to backend
  3. Configure tail sampling early — keep 100% error traces, sample normal traces to control costs
  4. Filter health check spans — /health, /ready, /metrics create enormous noise
  5. Add resource attributes — service.name, service.version, deployment.environment are mandatory
  6. Monitor the monitor — the Collector itself needs observability (self-telemetry, /healthz endpoint; see the sketch after this list)
  7. Consider OBI — run in parallel for third-party services and baseline visibility
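For item 6, the Collector can watch itself. A minimal sketch using the health_check extension and the built-in service.telemetry section (ports shown are the defaults):

# Collector self-observability
extensions:
  health_check:
    endpoint: 0.0.0.0:13133    # liveness/readiness probes point here

service:
  extensions: [health_check]
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed          # the Collector's own metrics, served on :8888 by default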

9. Comparing OTel with Other Solutions

| Criteria | OpenTelemetry | Datadog Agent | Elastic APM | AWS X-Ray |
|---|---|---|---|---|
| Vendor lock-in | ❌ None | ✅ High | Medium | ✅ AWS only |
| License cost | Free (OSS) | Paid | Basic free | Pay per usage |
| Language support | 12+ languages | 10+ languages | 7 languages | 5 languages |
| eBPF instrumentation | ✅ OBI | ✅ USM | | |
| Declarative config | ✅ Stable | Custom YAML | Custom YAML | Custom JSON |
| Community | CNCF, 1000+ contributors | Proprietary | Elastic community | AWS only |
| Backend flexibility | Any OTLP backend | Datadog only | Primarily Elastic | AWS only |

10. Conclusion

OpenTelemetry has evolved from a "promising project" to an indispensable standard for every distributed system. With stable declarative configuration, OBI eBPF instrumentation in beta, and a mature SDK ecosystem across all major languages — the cost of adopting OTel has never been lower, while the value it delivers grows clearer by the day.

Whether you're building on .NET, Node.js, Go, or any other stack — start with auto-instrumentation + Collector. Once you have baseline visibility, add custom spans for business events and consider OBI for the infrastructure layer. Observability isn't a luxury — it's the foundation for operating distributed systems effectively.

Action Summary

  • Step 1: Install OTel SDK / auto-instrumentation for your current stack
  • Step 2: Deploy OTel Collector (DaemonSet or sidecar)
  • Step 3: Configure tail sampling + filter health checks
  • Step 4: Connect to a backend (Grafana LGTM stack is free for self-hosting)
  • Step 5: Try OBI on a Linux cluster for zero-code visibility
