OpenTelemetry — The Open Observability Standard Dominating Distributed Systems
Posted on: April 27, 2026
Table of contents
- 1. What is OpenTelemetry and Why Should You Care?
- 2. The Three Pillars: Traces, Metrics, Logs
- 3. OpenTelemetry Architecture Overview
- 4. Declarative Configuration — Now Stable
- 5. OBI: eBPF Instrumentation — Zero-Code Observability
- 6. OpenTelemetry Integration in .NET 10 / ASP.NET Core
- 7. OpenTelemetry Collector — The Telemetry Processing Hub
- 8. Production-Grade Deployment Strategy
- 9. Comparing OTel with Other Solutions
- 10. Conclusion
1. What is OpenTelemetry and Why Should You Care?
OpenTelemetry (OTel) is a CNCF (Cloud Native Computing Foundation) open-source project that provides standardized APIs, SDKs, and tools for collecting telemetry data — including traces, metrics, and logs — from distributed applications and infrastructure.
Before OTel, each observability vendor (Datadog, New Relic, Dynatrace, Jaeger, Zipkin...) had its own proprietary agent and SDK. This created severe vendor lock-in: switching backends meant rewriting instrumentation code across every service. OTel solves this by providing a vendor-neutral abstraction layer — you instrument once and export to any backend via the OTLP (OpenTelemetry Protocol) standard.
Why 2026 is the Golden Moment
April 2026 marks two major milestones: Declarative Configuration officially reaching stable (v1.0.0) and OBI (eBPF Instrumentation) launching in beta at KubeCon EU. Combined with .NET 10 auto-instrumentation v1.15.0, OTel is now production-ready across all major languages.
2. The Three Pillars: Traces, Metrics, Logs
OTel unifies the three most critical signal types in observability:
graph LR
subgraph Signals["Three OTel Pillars"]
T["🔍 Traces
Track requests
across services"]
M["📊 Metrics
Measure performance
over time"]
L["📝 Logs
Detailed events
with context"]
end
T -->|"trace_id"| C["Correlation
Engine"]
M -->|"exemplar"| C
L -->|"trace_id"| C
C --> I["Complete
Insight"]
style T fill:#e94560,stroke:#fff,color:#fff
style M fill:#2c3e50,stroke:#fff,color:#fff
style L fill:#4CAF50,stroke:#fff,color:#fff
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style I fill:#f8f9fa,stroke:#e94560,color:#2c3e50
OpenTelemetry's three observability pillars, correlated via trace_id
Traces (Distributed Tracing)
A trace records the complete journey of a request as it travels through multiple services. Each trace contains multiple spans — each span represents a unit of work (API call, database query, message queue processing...). Spans are linked via trace_id and parent_span_id, forming a call tree.
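In .NET, for instance, this structure maps directly onto ActivitySource and Activity (the span API the OTel SDK listens to). A minimal, self-contained sketch — the source name Demo.Checkout is illustrative, and the explicit listener is only needed when running outside the full OTel SDK:
using System.Diagnostics;
// Minimal listener so StartActivity returns non-null outside the full OTel SDK
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = _ => true,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData
});
var source = new ActivitySource("Demo.Checkout");
using (var parent = source.StartActivity("HandleRequest"))
using (var child = source.StartActivity("QueryDatabase"))
{
    // Both spans share one trace_id; the child records its parent's span_id
    Console.WriteLine($"trace_id:       {child!.TraceId}");
    Console.WriteLine($"parent_span_id: {child.ParentSpanId} == parent {parent!.SpanId}");
}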
Metrics
Metrics are numerical measurements over time: request latency (histogram), error rate (counter), active connections (gauge), CPU usage... OTel supports both push (OTLP export) and pull (Prometheus scrape) models.
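In .NET these instrument types map onto System.Diagnostics.Metrics, the API the OTel SDK reads from. A minimal sketch with illustrative names — in a real app the meter must be registered with .AddMeter("Demo.Api") to be exported:
using System.Diagnostics.Metrics;
var meter = new Meter("Demo.Api");
// Counter — monotonically increasing (errors, requests served)
Counter<long> errors = meter.CreateCounter<long>("http.server.errors");
// Histogram — distribution over time (request latency)
Histogram<double> latency = meter.CreateHistogram<double>("http.server.duration", unit: "ms");
// Observable gauge — sampled on demand (active connections)
int activeConnections = 3;
meter.CreateObservableGauge("http.server.active_connections", () => activeConnections);
errors.Add(1);
latency.Record(123.4);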
Logs
OTel doesn't replace existing logging frameworks (Serilog, NLog, log4j...) but adds context — injecting trace_id and span_id into every log entry. When a request fails, you can jump straight from trace to detailed logs without manual grep.
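A short .NET illustration (the PaymentService class and its names are hypothetical): any ILogger call made while a request's Activity is current gets the ambient trace_id/span_id attached by the OTel log pipeline — printing the TraceId here is only to make the correlation visible:
using System.Diagnostics;
using Microsoft.Extensions.Logging;
public class PaymentService
{
    private readonly ILogger<PaymentService> _logger;
    public PaymentService(ILogger<PaymentService> logger) => _logger = logger;
    public void Charge(string orderId)
    {
        // Activity.Current is set by ASP.NET Core for the ongoing request;
        // the OTel log exporter stamps its trace_id/span_id on this record automatically
        _logger.LogInformation("Charging order {OrderId} (trace {TraceId})",
            orderId, Activity.Current?.TraceId);
    }
}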
3. OpenTelemetry Architecture Overview
graph TB
subgraph Apps["Applications"]
A1["Service A
.NET 10"]
A2["Service B
Node.js"]
A3["Service C
Go"]
end
subgraph SDK["OTel SDK / Auto-Instrumentation"]
S1["SDK .NET"]
S2["SDK JS"]
S3["SDK Go"]
end
subgraph OBI_Layer["OBI (eBPF)"]
OBI["Zero-Code
eBPF Probes"]
end
A1 --> S1
A2 --> S2
A3 --> S3
A1 -.->|"kernel hooks"| OBI
A2 -.->|"kernel hooks"| OBI
A3 -.->|"kernel hooks"| OBI
S1 -->|"OTLP"| COL["OTel Collector"]
S2 -->|"OTLP"| COL
S3 -->|"OTLP"| COL
OBI -->|"OTLP"| COL
subgraph Backends["Backends"]
J["Jaeger / Tempo"]
P["Prometheus / Mimir"]
L["Loki / Elasticsearch"]
end
COL --> J
COL --> P
COL --> L
J --> G["Grafana Dashboard"]
P --> G
L --> G
style A1 fill:#e94560,stroke:#fff,color:#fff
style A2 fill:#2c3e50,stroke:#fff,color:#fff
style A3 fill:#4CAF50,stroke:#fff,color:#fff
style OBI fill:#ff9800,stroke:#fff,color:#fff
style COL fill:#16213e,stroke:#fff,color:#fff
style G fill:#e94560,stroke:#fff,color:#fff
End-to-end architecture: SDK + OBI → Collector → Backends → Visualization
The OTel architecture consists of 4 main layers:
- Instrumentation Layer: SDK integrated into code (manual/auto) or OBI hooking at the kernel level
- OTLP Protocol: Standard transport protocol (gRPC or HTTP/protobuf)
- Collector: Receives, processes (filter, enrich, batch), then exports telemetry
- Backend: Storage and query — Jaeger, Tempo, Prometheus, Loki, Elasticsearch...
4. Declarative Configuration — Now Stable
In April 2026, OpenTelemetry officially marked Declarative Configuration as stable (v1.0.0). This allows configuring OTel via a YAML file instead of dozens of scattered environment variables.
Key Benefit
A single YAML file replaces dozens of environment variables. Easy to version control, easy to review, easy to replicate across environments.
YAML Configuration Example
# otel-config.yaml — Declarative Configuration v1.0
file_format: "0.4"

tracer_provider:
  processors:
    - batch:
        schedule_delay: 5000
        export_timeout: 30000
        max_queue_size: 2048
        max_export_batch_size: 512
        exporter:
          otlp:
            protocol: grpc
            endpoint: http://otel-collector:4317
            compression: gzip

meter_provider:
  readers:
    - periodic:
        interval: 60000
        exporter:
          otlp:
            protocol: grpc
            endpoint: http://otel-collector:4317

logger_provider:
  processors:
    - batch:
        exporter:
          otlp:
            protocol: grpc
            endpoint: http://otel-collector:4317

resource:
  attributes:
    - name: service.name
      value: my-api
    - name: service.version
      value: "2.1.0"
    - name: deployment.environment
      value: production
Activate with a single environment variable:
export OTEL_CONFIG_FILE=/etc/otel/otel-config.yaml
Stable Components
| Component | Description | Status |
|---|---|---|
| JSON Schema (opentelemetry-configuration) | Data model schema version 1.0.0 | ✅ Stable |
| YAML Representation | File-based configuration format | ✅ Stable |
| In-Memory Model | SDK in-memory config representation | ✅ Stable |
| ConfigProperties | Generic YAML mapping node | ✅ Stable |
| PluginComponentProvider | Custom plugin reference mechanism | ✅ Stable |
| OTEL_CONFIG_FILE | Activation environment variable | ✅ Stable |
Language Support
| Language | Declarative Config Status |
|---|---|
| Java | ✅ Complete (agent + SDK) |
| Go | ✅ Complete (Collector internal) |
| C++ | ✅ Complete |
| JavaScript | ✅ Complete |
| PHP | ✅ Complete |
| .NET | 🔄 In development |
| Python | 🔄 In development |
5. OBI: eBPF Instrumentation — Zero-Code Observability
OBI (OpenTelemetry eBPF Instrumentation) is the biggest leap in observability for 2026. Inherited from Grafana Beyla (donated to OTel in late 2025), OBI uses eBPF to hook directly into the Linux kernel — collecting traces and metrics without modifying a single line of code.
graph TB
subgraph KernelSpace["Kernel Space"]
UP["uprobes
SSL_read/SSL_write"]
KP["kprobes
tcp_sendmsg/recvmsg"]
TP["tracepoints
scheduling, fs events"]
end
subgraph Maps["eBPF Maps"]
PEA["perf_event_array"]
RB["ring_buffer"]
end
subgraph UserSpace["User Space — OBI Agent (Go)"]
MR["Map Reader"]
SB["Span Builder"]
FE["Filter & Enrich"]
EX["OTLP Exporter"]
end
UP --> PEA
KP --> PEA
TP --> RB
PEA --> MR
RB --> MR
MR --> SB
SB --> FE
FE --> EX
EX -->|"gRPC/HTTP"| COL["OTel Collector"]
style UP fill:#e94560,stroke:#fff,color:#fff
style KP fill:#e94560,stroke:#fff,color:#fff
style TP fill:#e94560,stroke:#fff,color:#fff
style PEA fill:#2c3e50,stroke:#fff,color:#fff
style RB fill:#2c3e50,stroke:#fff,color:#fff
style EX fill:#4CAF50,stroke:#fff,color:#fff
style COL fill:#16213e,stroke:#fff,color:#fff
OBI's two-layer architecture: kernel probes → user-space agent → OTLP export
How Does OBI Work?
OBI operates on a two-layer model:
- Kernel Space: uprobes intercept SSL_read/SSL_write (reading TLS traffic), kprobes monitor tcp_sendmsg/tcp_recvmsg, tracepoints record scheduling and filesystem events
- User Space: A Go agent reads data from eBPF maps, builds spans, applies filtering/enrichment, then exports via OTLP
System Requirements
| Requirement | Details |
|---|---|
| Kernel | Linux 5.8+ (RHEL/Rocky 4.18+ with backported eBPF); BTF (BPF Type Format) required |
| Architecture | amd64, arm64 (Graviton, Ampere) |
| Privileges | root or CAP_BPF + CAP_SYS_PTRACE |
| Pod config | hostPID: true |
| Resources (typical) | CPU: 100m–500m, Memory: 256Mi–512Mi |
Supported Protocols
Application Layer
HTTP/gRPC with automatic RED (Rate, Errors, Duration) metrics, TLS-encrypted traffic via kernel-level SSL hooks, protocol-agnostic span generation
Database Protocols
PostgreSQL (pgx), MySQL, MongoDB, Redis, Couchbase — native server spans without ORM instrumentation
Emerging (planned for 1.0)
GenAI APIs (OpenAI, Anthropic), Message brokers (MQTT, AMQP, NATS, Redis Pub/Sub)
SDK vs. OBI — When to Use What?
| Criteria | SDK (Traditional) | OBI (eBPF) |
|---|---|---|
| Code changes | Required per service | None — kernel-level hooks |
| Third-party binaries | No visibility | Automatic visibility |
| Custom business events | ✅ Flexible | ❌ Protocol-level only |
| Payload access | App-defined | Kernel-level SSL/DB capture |
| Deploy workflow | Rebuild + redeploy | Config-driven, DaemonSet |
| OS support | Any OS | Linux only (kernel 5.8+) |
Important Note
OBI complements rather than replaces SDK instrumentation. In production, run both in parallel: OBI for automatic infrastructure visibility, SDK for business-level events and custom spans.
6. OpenTelemetry Integration in .NET 10 / ASP.NET Core
.NET has one of the best OTel support ecosystems thanks to System.Diagnostics — the ActivitySource and Meter APIs built into the base class library, which the OTel .NET SDK consumes directly. With auto-instrumentation v1.15.0 (April 2026), you can instrument an ASP.NET Core app in minutes.
Basic Setup
// Program.cs — .NET 10 + OpenTelemetry
using OpenTelemetry;
using OpenTelemetry.Exporter;
using OpenTelemetry.Logs;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddControllers();

builder.Services.AddOpenTelemetry()
    // Resource: identify the service on every exported signal
    .ConfigureResource(r => r.AddService("my-api", serviceVersion: "2.1.0"))
    // Traces
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSqlClientInstrumentation(o => o.SetDbStatementForText = true)
        .AddRedisInstrumentation()
        .AddOtlpExporter(o =>
        {
            o.Endpoint = new Uri("http://otel-collector:4317");
            o.Protocol = OtlpExportProtocol.Grpc;
        }))
    // Metrics
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddOtlpExporter())
    // Logs
    .WithLogging(logging => logging
        .AddOtlpExporter());

var app = builder.Build();
app.MapControllers();
app.Run();
Zero-Code Auto-Instrumentation
If you don't want to modify Program.cs, use zero-code instrumentation via environment variables:
# Dockerfile or docker-compose.yml
ENV CORECLR_ENABLE_PROFILING=1
ENV CORECLR_PROFILER={918728DD-259F-4A6A-AC2B-B85E1B658318}
ENV CORECLR_PROFILER_PATH=/otel-dotnet/linux-x64/OpenTelemetry.AutoInstrumentation.Native.so
ENV DOTNET_ADDITIONAL_DEPS=/otel-dotnet/AdditionalDeps
ENV DOTNET_SHARED_STORE=/otel-dotnet/store
ENV OTEL_DOTNET_AUTO_HOME=/otel-dotnet
ENV OTEL_SERVICE_NAME=my-api
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
Custom Spans for Business Logic
using System.Diagnostics;

public class OrderService
{
    // One ActivitySource per component; register it in Program.cs with .AddSource("MyApp.OrderService")
    private static readonly ActivitySource Source = new("MyApp.OrderService");

    private readonly IOrderRepository _repository;

    public OrderService(IOrderRepository repository) => _repository = repository;

    public async Task<Order> PlaceOrderAsync(OrderRequest request)
    {
        // StartActivity returns null when nothing listens — hence the ?. guards
        using var activity = Source.StartActivity("PlaceOrder");
        activity?.SetTag("order.customer_id", request.CustomerId);
        activity?.SetTag("order.items_count", request.Items.Count);

        var order = await _repository.CreateAsync(request);

        activity?.SetTag("order.id", order.Id);
        activity?.SetTag("order.total", order.Total);
        activity?.AddEvent(new ActivityEvent("OrderCreated"));
        return order;
    }
}
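A natural extension, sketched here with the same hypothetical types: record failures on the span so it is marked as errored — which is exactly what a tail-sampling error policy (section 7) keys on. Note that Activity.AddException requires .NET 9 or later:
public async Task<Order> PlaceOrderAsync(OrderRequest request)
{
    using var activity = Source.StartActivity("PlaceOrder");
    try
    {
        var order = await _repository.CreateAsync(request);
        activity?.SetStatus(ActivityStatusCode.Ok);
        return order;
    }
    catch (Exception ex)
    {
        // Error status is what a tail-sampling error policy keys on
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        activity?.AddException(ex); // .NET 9+; on older runtimes use AddEvent
        throw;
    }
}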
7. OpenTelemetry Collector — The Telemetry Processing Hub
The OTel Collector is an essential production component. It acts as a gateway — receiving telemetry from multiple sources, processing (filter, transform, batch, sample), then exporting to multiple backends.
graph LR
subgraph Receivers["Receivers"]
R1["OTLP
(gRPC/HTTP)"]
R2["Prometheus
scrape"]
R3["Jaeger
thrift"]
end
subgraph Processors["Processors"]
P1["Batch"]
P2["Filter"]
P3["Attributes
Enrich"]
P4["Tail Sampling"]
end
subgraph Exporters["Exporters"]
E1["OTLP → Tempo"]
E2["Prometheus
Remote Write"]
E3["Loki"]
end
R1 --> P1
R2 --> P1
R3 --> P1
P1 --> P2
P2 --> P3
P3 --> P4
P4 --> E1
P4 --> E2
P4 --> E3
style R1 fill:#e94560,stroke:#fff,color:#fff
style R2 fill:#2c3e50,stroke:#fff,color:#fff
style R3 fill:#4CAF50,stroke:#fff,color:#fff
style P4 fill:#ff9800,stroke:#fff,color:#fff
style E1 fill:#16213e,stroke:#fff,color:#fff
style E2 fill:#16213e,stroke:#fff,color:#fff
style E3 fill:#16213e,stroke:#fff,color:#fff
Collector pipeline: Receivers → Processors → Exporters
Production Collector Configuration
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    send_batch_size: 1024
    timeout: 5s
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/health"'
        - 'attributes["http.route"] == "/ready"'
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: error-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: latency-policy
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      # batch goes last, after filtering and sampling decisions
      processors: [filter, tail_sampling, resource, batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [loki]
Tail Sampling — Cost Optimization Strategy
Instead of random head sampling at the SDK level, tail sampling at the Collector lets you keep 100% of error traces and 100% of slow traces (>1s) while sampling only 10% of normal traces. As a rough illustration: with a 1% error rate, you retain all errors plus a tenth of the remaining 99% — about 11% of total volume — dramatically cutting storage costs without losing the traces that matter.
8. Production-Grade Deployment Strategy
Deployment Patterns
graph TB
subgraph Pattern1["Pattern 1: Sidecar"]
P1A["App Container"] --> P1C["Collector Sidecar"]
P1C --> P1B["Backend"]
end
subgraph Pattern2["Pattern 2: DaemonSet"]
P2A1["App Pod 1"] --> P2C["Collector
DaemonSet"]
P2A2["App Pod 2"] --> P2C
P2C --> P2B["Backend"]
end
subgraph Pattern3["Pattern 3: Gateway"]
P3A["Collector
DaemonSet"] --> P3G["Collector
Gateway"]
P3G --> P3B1["Backend 1"]
P3G --> P3B2["Backend 2"]
end
style P1C fill:#e94560,stroke:#fff,color:#fff
style P2C fill:#e94560,stroke:#fff,color:#fff
style P3G fill:#e94560,stroke:#fff,color:#fff
Three common deployment patterns for OTel Collector
| Pattern | Pros | Cons | Best For |
|---|---|---|---|
| Sidecar | Good isolation, per-service config | High resource overhead | Multi-tenant, strict compliance |
| DaemonSet | Resource efficient, easy to manage | Shared config, noisy neighbor | Most K8s workloads |
| Gateway | Central control, multi-backend routing | Single point of failure | Large orgs, multiple backends |
Production Deployment Checklist
- Start with auto-instrumentation — SDK or zero-code. No need for custom spans right away
- Put a Collector in front of the backend — always route telemetry through the Collector; never export directly from the SDK to the backend
- Configure tail sampling early — keep 100% error traces, sample normal traces to control costs
- Filter health check spans — /health, /ready, /metrics create enormous noise (see the SDK-level sketch after this list)
- Add resource attributes — service.name, service.version, deployment.environment are mandatory
- Monitor the monitor — the Collector itself needs observability (self-telemetry, /healthz endpoint)
- Consider OBI — run in parallel for third-party services and baseline visibility
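For the health-check filtering item above, a minimal SDK-level sketch that complements the Collector-side filter from section 7 — the Filter callback returns true for requests that should produce a span:
// In Program.cs — drop health-check requests before a span is even created
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation(o =>
            o.Filter = ctx =>
                !ctx.Request.Path.StartsWithSegments("/health") &&
                !ctx.Request.Path.StartsWithSegments("/ready")));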
9. Comparing OTel with Other Solutions
| Criteria | OpenTelemetry | Datadog Agent | Elastic APM | AWS X-Ray |
|---|---|---|---|---|
| Vendor lock-in | None | High | Medium | High (AWS only) |
| License cost | Free (OSS) | Paid | Basic free | Pay per usage |
| Language support | 12+ languages | 10+ languages | 7 languages | 5 languages |
| eBPF instrumentation | ✅ OBI | ✅ USM | ❌ | ❌ |
| Declarative config | ✅ Stable | Custom YAML | Custom YAML | Custom JSON |
| Community | CNCF, 1000+ contributors | Proprietary | Elastic community | AWS only |
| Backend flexibility | Any OTLP backend | Datadog only | Primarily Elastic | AWS only |
10. Conclusion
OpenTelemetry has evolved from a "promising project" to an indispensable standard for every distributed system. With stable declarative configuration, OBI eBPF instrumentation in beta, and a mature SDK ecosystem across all major languages — the cost of adopting OTel has never been lower, while the value it delivers grows clearer by the day.
Whether you're building on .NET, Node.js, Go, or any other stack — start with auto-instrumentation + Collector. Once you have baseline visibility, add custom spans for business events and consider OBI for the infrastructure layer. Observability isn't a luxury — it's the foundation for operating distributed systems effectively.
Action Summary
- Step 1: Install OTel SDK / auto-instrumentation for your current stack
- Step 2: Deploy OTel Collector (DaemonSet or sidecar)
- Step 3: Configure tail sampling + filter health checks
- Step 4: Connect to a backend (Grafana LGTM stack is free for self-hosting)
- Step 5: Try OBI on a Linux cluster for zero-code visibility
References
- OpenTelemetry — Declarative Configuration is Stable!
- OpenTelemetry eBPF Instrumentation (OBI) Documentation
- InfoQ — OpenTelemetry Declarative Configuration Reaches Stability Milestone
- DEV Community — OBI Complete Guide: KubeCon EU 2026 Beta Launch
- Microsoft Learn — .NET Observability with OpenTelemetry
- Grafana Labs — OpenTelemetry: What's New and Next in 2026
- OpenTelemetry — eBPF Instrumentation 2026 Goals