Grafana 13 & LGTM Stack — Building a Comprehensive Observability System in 2026
Posted on: 4/26/2026 11:14:56 PM
Table of contents
- 1. What is the LGTM Stack?
- 2. Grafana Alloy — The Next-Gen Collector
- 3. Grafana 13 — Key Improvements
- 4. Loki — Redesigned Architecture with Kafka-backed Ingestion
- 5. Deploying the LGTM Stack with Docker Compose
- 6. Instrumenting .NET Applications with OpenTelemetry
- 7. Beyla — Auto-instrumentation Without Code Changes
- 8. Effective Alerting Strategy
- 9. Running LGTM Stack in Production
- 10. Cost Comparison: LGTM Stack vs. SaaS
- 11. Conclusion
As software systems grow more complex — microservices, containers, serverless, edge computing — observability becomes a survival requirement. It's no longer just about checking logs or CPU usage — you need to correlate metrics, logs, traces, and profiles within a single unified interface. Grafana 13, just unveiled at GrafanaCON 2026 (April 21, 2026), along with the LGTM stack (Loki + Grafana + Tempo + Mimir) and Grafana Alloy, provides the most powerful open-source answer to this challenge.
1. What is the LGTM Stack?
LGTM is the informal name for Grafana Labs' open-source observability toolkit, where each component handles a specific signal:
| Component | Signal | Role | Equivalent |
|---|---|---|---|
| Loki | Logs | Log storage and query system that only indexes labels instead of full-text — dramatically reducing storage costs | Elasticsearch, Splunk |
| Grafana | Visualization | Dashboards, alerting, exploration — unified interface for all signals | Kibana, Datadog Dashboard |
| Tempo | Traces | Distributed tracing backend, stores traces without indexing — low cost | Jaeger, Zipkin, Datadog APM |
| Mimir | Metrics | Long-term Prometheus storage, horizontal scaling, multi-tenant | Thanos, Cortex, VictoriaMetrics |
Beyond these four core components, the stack also includes Pyroscope (continuous profiling), Beyla (eBPF auto-instrumentation), Faro (frontend observability), and most importantly — Grafana Alloy as the central collector.
graph TD
subgraph Applications
A1[".NET App"]
A2["Vue.js SPA"]
A3["Background Workers"]
end
subgraph "Grafana Alloy (Collector)"
AL["Alloy
OTLP + Prometheus"]
end
subgraph "LGTM Backend"
M["Mimir
Metrics"]
L["Loki
Logs"]
T["Tempo
Traces"]
P["Pyroscope
Profiles"]
end
G["Grafana 13
Dashboard + Alerting"]
A1 -->|OTLP| AL
A2 -->|Faro SDK| AL
A3 -->|OTLP| AL
AL -->|remote_write| M
AL -->|loki.write| L
AL -->|otlp| T
AL -->|pyroscope.write| P
M --> G
L --> G
T --> G
P --> G
style AL fill:#e94560,stroke:#fff,color:#fff
style G fill:#2c3e50,stroke:#fff,color:#fff
style M fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style L fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style T fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style P fill:#f8f9fa,stroke:#e94560,color:#2c3e50
LGTM Stack architecture with Grafana Alloy as the central collector
2. Grafana Alloy — The Next-Gen Collector
Grafana Alloy is Grafana Labs' open-source distribution of the OpenTelemetry Collector and the successor to Grafana Agent, with significantly broader capabilities. Alloy was voted the most-used vendor distribution in the 2026 OpenTelemetry community survey.
2.1. Why Choose Alloy Over the Vanilla OTel Collector?
| Criteria | Vanilla OTel Collector | Grafana Alloy |
|---|---|---|
| Configuration | Static YAML | River language (programmable) + YAML via OTel Engine mode |
| Pipeline | Receivers → Processors → Exporters | Flexible component graph with branching/merging |
| Prometheus | Requires additional receiver | Native Prometheus scraping + remote_write |
| Auto-discovery | Limited | Built-in Kubernetes service discovery, Docker, Consul |
| Debugging | CLI flags | Built-in UI at port 12345 showing component graph |
| Profiles | Not supported | Native Pyroscope integration |
2.2. Basic Alloy Configuration
Alloy configuration uses the River language — declarative yet programmable (variables, conditions, functions):
// Receive telemetry via OTLP (gRPC + HTTP)
otelcol.receiver.otlp "default" {
grpc { endpoint = "0.0.0.0:4317" }
http { endpoint = "0.0.0.0:4318" }
output {
metrics = [otelcol.processor.batch.default.input]
logs = [otelcol.processor.batch.default.input]
traces = [otelcol.processor.batch.default.input]
}
}
// Batch to reduce network calls
otelcol.processor.batch "default" {
timeout = "5s"
send_batch_size = 1000
output {
metrics = [otelcol.exporter.otlphttp.mimir.input]
logs = [otelcol.exporter.otlphttp.loki.input]
traces = [otelcol.exporter.otlp.tempo.input]
}
}
// Export metrics to Mimir
otelcol.exporter.otlphttp "mimir" {
client {
endpoint = "http://mimir:9009/otlp"
}
}
// Export logs to Loki
otelcol.exporter.otlphttp "loki" {
client {
endpoint = "http://loki:3100/otlp"
}
}
// Export traces to Tempo
otelcol.exporter.otlp "tempo" {
client {
endpoint = "tempo:4317"
tls { insecure = true }
}
}
OTel Engine Mode — New in 2026
If your team is already familiar with OTel Collector YAML, Alloy now supports OpenTelemetry Engine mode — allowing you to use standard OTel Collector YAML configuration directly without rewriting to River. This makes migration from OTel Collector to Alloy nearly zero-effort.
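For teams taking that route, the pipeline from section 2.2 expressed as standard Collector YAML would look roughly like this — a sketch; the endpoints simply reuse the hosts from the River example above:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
exporters:
  otlphttp/mimir:
    endpoint: http://mimir:9009/otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  # Same fan-out as the River config: batch once, then export per signal
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/mimir]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]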
3. Grafana 13 — Key Improvements
Grafana 13 was announced at GrafanaCON 2026 with a wave of new features focusing on three pillars: faster time-to-value, governance at scale, and ecosystem expansion.
3.1. Dynamic Dashboards (GA)
Previously, you had to clone and manually edit dashboards for each environment/team/service. Dynamic Dashboards are now the default — dashboards automatically adapt based on variables and user context. The new layout engine automatically migrates all existing dashboards to the new schema.
3.2. Git Sync (GA) — Dashboard as Code
The most anticipated feature: bidirectional sync between Grafana and Git repositories (GitHub, GitLab, Bitbucket). Every UI dashboard change auto-commits to Git, and vice versa — pushing from Git updates dashboards. Combined with the new dashboard schema and versioned API, this is a game-changer for teams adopting GitOps for observability.
sequenceDiagram
participant Dev as Developer
participant Git as GitHub/GitLab
participant G13 as Grafana 13
participant Alert as Alert Manager
Dev->>Git: Push dashboard JSON
Git->>G13: Webhook trigger sync
G13->>G13: Validate & deploy dashboard
G13->>Alert: Update alert rules
Note over G13: Dashboard goes live immediately
G13->>G13: User edits on UI
G13->>Git: Auto-commit changes
Git->>Dev: PR notification
Bidirectional Git Sync workflow in Grafana 13
3.3. Suggested Dashboards & Templates
Grafana 13 solves the "blank page" problem — when you connect a new data source, the system suggests dashboard templates based on the data type. Built-in support for standard methodologies:
- USE Method (Utilization, Saturation, Errors) — for infrastructure monitoring
- RED Method (Rate, Errors, Duration) — for service monitoring (see the example rules after this list)
- DORA Metrics — for DevOps performance (deployment frequency, lead time, MTTR, change failure rate)
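To make the RED method concrete, here is a minimal sketch of Prometheus-format recording rules that you could load into Mimir's ruler. The metric names (http_server_requests_total, http_server_request_duration_seconds_bucket) and the service label are assumptions — substitute whatever your instrumentation actually emits:
groups:
  - name: red-method
    rules:
      # Rate: requests per second, per service (metric name is an assumption)
      - record: service:http_requests:rate5m
        expr: sum by (service) (rate(http_server_requests_total[5m]))
      # Errors: share of responses with a 5xx status code
      - record: service:http_errors:ratio5m
        expr: |
          sum by (service) (rate(http_server_requests_total{status=~"5.."}[5m]))
          /
          sum by (service) (rate(http_server_requests_total[5m]))
      # Duration: p99 latency derived from histogram buckets
      - record: service:http_latency:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (service, le) (rate(http_server_request_duration_seconds_bucket[5m])))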
3.4. Other Notable Features
| Feature | Status | Description |
|---|---|---|
| Grafana Advisor | GA | Automated health checks: detect failing data sources, outdated plugins, misconfigured SSO |
| Panel Styles | Preview | Apply preset styles to time series, gauge, stat, bar chart with one click |
| Annotation Clustering | GA | Group dense annotations into scrollable tooltips |
| Graphviz Panel | Private Preview | DOT language diagrams with live data mapping |
| Assistant On-Premises | GA | AI assistant for Enterprise/OSS, supporting SQL expressions |
| IBM DB2 Data Source | Preview | Direct Grafana connection to IBM DB2 — expanding into enterprise legacy |
4. Loki — Redesigned Architecture with Kafka-backed Ingestion
Loki has always stood out with its "like Prometheus, but for logs" philosophy — only indexing labels, not log content, dramatically reducing storage costs compared to Elasticsearch. At GrafanaCON 2026, Grafana Labs announced a major Loki architecture redesign:
4.1. Kafka-backed Ingestion
The new ingestion layer uses Kafka as an intermediate buffer. Benefits:
- Durability: Logs won't be lost when Loki ingesters restart or crash
- Backpressure handling: Kafka naturally handles burst traffic without over-provisioning ingesters
- Replay: Re-index logs from Kafka offsets when needed
4.2. New Query Engine & Scheduler
The new query planner splits queries across partitions and executes them in parallel, substantially reducing query latency on large log volumes.
Logline Acquisition
Grafana Labs recently acquired Logline — a precision search technology for large-scale log datasets. This capability is expected to integrate into Loki in upcoming releases, bringing full-text search without traditional full-text indexing overhead.
5. Deploying the LGTM Stack with Docker Compose
Here's a minimal Docker Compose configuration to run the entire LGTM stack on a single server:
version: "3.8"
services:
# Grafana Alloy - Collector
alloy:
image: grafana/alloy:latest
volumes:
- ./alloy-config.river:/etc/alloy/config.river
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "12345:12345" # Alloy UI
command: run --server.http.listen-addr=0.0.0.0:12345 /etc/alloy/config.river
# Mimir - Metrics
mimir:
image: grafana/mimir:latest
command: -config.file=/etc/mimir/mimir.yaml
volumes:
- ./mimir.yaml:/etc/mimir/mimir.yaml
- mimir-data:/data
ports:
- "9009:9009"
# Loki - Logs
loki:
image: grafana/loki:latest
command: -config.file=/etc/loki/loki.yaml
volumes:
- ./loki.yaml:/etc/loki/loki.yaml
- loki-data:/loki
ports:
- "3100:3100"
# Tempo - Traces
tempo:
image: grafana/tempo:latest
command: -config.file=/etc/tempo/tempo.yaml
volumes:
- ./tempo.yaml:/etc/tempo/tempo.yaml
- tempo-data:/var/tempo
ports:
- "3200:3200" # Tempo API
- "9095:9095" # gRPC
# Grafana - Visualization
grafana:
image: grafana/grafana:13.0.0
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
volumes:
- ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/ds.yaml
- grafana-data:/var/lib/grafana
ports:
- "3000:3000"
volumes:
mimir-data:
loki-data:
tempo-data:
grafana-data:
5.1. Provisioning Data Sources
The grafana-datasources.yaml file auto-connects Grafana to backends:
apiVersion: 1
datasources:
- name: Mimir
type: prometheus
access: proxy
url: http://mimir:9009/prometheus
isDefault: true
jsonData:
httpMethod: POST
- name: Loki
type: loki
access: proxy
url: http://loki:3100
jsonData:
derivedFields:
- name: TraceID
matcherRegex: "traceID=(\\w+)"
url: "$${__value.raw}"
datasourceUid: tempo
urlDisplayLabel: "View Trace"
- name: Tempo
type: tempo
access: proxy
uid: tempo
url: http://tempo:3200
jsonData:
tracesToLogsV2:
datasourceUid: loki
filterByTraceID: true
tracesToMetrics:
datasourceUid: mimir
spanStartTimeShift: "-1h"
spanEndTimeShift: "1h"
Cross-signal Correlation
The configuration above creates bidirectional links between logs ↔ traces ↔ metrics. When viewing a trace in Tempo, you can jump to the corresponding log in Loki (via traceID) and to metrics in Mimir (via time range). This is the core power of LGTM — signal correlation within a single interface.
6. Instrumenting .NET Applications with OpenTelemetry
To send telemetry from a .NET application to the LGTM stack via Alloy, install these NuGet packages:
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.SqlClient
Configure in Program.cs:
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry()
.ConfigureResource(r => r.AddService("my-api"))
.WithTracing(tracing => tracing
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddSqlClientInstrumentation(o => o.SetDbStatementForText = true)
.AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317")))
.WithMetrics(metrics => metrics
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddRuntimeInstrumentation()
.AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317")));
builder.Logging.AddOpenTelemetry(logging =>
{
logging.IncludeFormattedMessage = true;
logging.AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317"));
});
With just this code, your application will automatically send metrics, traces, and logs via OTLP to Alloy, which then fans out to Mimir, Tempo, and Loki.
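If the API runs as a container next to the Compose stack from section 5, you can also supply the OTLP endpoint through the standard OpenTelemetry environment variables rather than hard-coding the URI — a sketch, with my-api as a hypothetical service name:
# Added under services: in the Compose file from section 5
my-api:
  build: .
  environment:
    # Standard OTel SDK variables, read automatically by the .NET exporter
    - OTEL_SERVICE_NAME=my-api
    - OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4317
    - OTEL_EXPORTER_OTLP_PROTOCOL=grpc
  depends_on:
    - alloy
Note that explicit endpoint assignments in Program.cs take precedence; drop the o.Endpoint lines if you want the SDK to fall back to these variables.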
7. Beyla — Auto-instrumentation Without Code Changes
It's not always possible (or desirable) to add SDKs to applications. Grafana Beyla uses eBPF to automatically collect metrics and traces at the kernel level — with absolutely no source code changes or service restarts required.
graph LR
subgraph "Host / Kubernetes Node"
K["Kernel (eBPF probes)"]
APP1["Service A
(any language)"]
APP2["Service B
(any language)"]
B["Beyla Agent"]
end
K -.->|hook syscalls| B
APP1 -.->|"HTTP/gRPC calls"| K
APP2 -.->|"HTTP/gRPC calls"| K
B -->|OTLP| AL["Grafana Alloy"]
style B fill:#e94560,stroke:#fff,color:#fff
style AL fill:#2c3e50,stroke:#fff,color:#fff
style K fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Beyla uses eBPF hooks to collect telemetry without instrumentation
Beyla is particularly useful when:
- You need to monitor third-party services without source code access
- You want baseline metrics/traces immediately before adding detailed instrumentation
- Running polyglot microservices (Go, Java, .NET, Node.js, Python...) and wanting uniform telemetry
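A sketch of adding Beyla to the Compose file from section 5, targeting the hypothetical my-api service from section 6 — BEYLA_OPEN_PORT is Beyla's documented way to select the target process by listening port, but treat the exact variables and privileges here as assumptions to verify against the current docs:
# Added under services: in the Compose file from section 5
beyla:
  image: grafana/beyla:latest
  privileged: true          # eBPF probes need elevated kernel privileges
  pid: "service:my-api"     # share the target container's PID namespace
  environment:
    - BEYLA_OPEN_PORT=8080                           # instrument the process listening on this port
    - OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4318  # ship metrics/traces to Alloy via OTLP/HTTP
  depends_on:
    - alloy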
8. Effective Alerting Strategy
Beautiful dashboards without good alerting are just "eye candy." Grafana 13 improves alerting with provenance support (Kubernetes-style API) and tighter Git Sync integration. Here are alert design principles:
8.1. Alert Pyramid
graph TD
P1["P1 — Page immediately
Service down, error rate > 5%
Phone call + Slack"]
P2["P2 — Handle within the hour
Latency p99 > 2s, disk > 85%
Slack channel"]
P3["P3 — Review when free
Memory trending up, cert expiring
Email digest"]
P4["P4 — Informational
Deployment success, scaling events
Dashboard annotation"]
P1 --> P2 --> P3 --> P4
style P1 fill:#e94560,stroke:#fff,color:#fff
style P2 fill:#ff9800,stroke:#fff,color:#fff
style P3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style P4 fill:#f8f9fa,stroke:#e0e0e0,color:#888
Alert Pyramid — severity tiering strategy
Alert Fatigue — Enemy #1
If your team receives more than 10 P1/P2 alerts per day, people will start ignoring them all — including the truly critical ones. Rule of thumb: every P1 alert must include a clear runbook link and should only fire when immediate action is required. If no action is needed, it's not a P1.
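Putting these principles into practice, here is a minimal sketch of a P1 rule in the Prometheus-compatible format that Mimir's ruler accepts. It reuses the hypothetical service:http_errors:ratio5m recording rule from section 3.3, and the runbook URL is a placeholder:
groups:
  - name: p1-alerts
    rules:
      - alert: HighErrorRate
        # Fires only after the error ratio has stayed above 5% for 5 minutes
        expr: service:http_errors:ratio5m > 0.05
        for: 5m
        labels:
          severity: p1
        annotations:
          summary: "{{ $labels.service }} error rate above 5%"
          runbook_url: https://wiki.example.com/runbooks/high-error-rate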
9. Running LGTM Stack in Production
9.1. Reference Sizing
| Scale | Metrics/s | Logs (GB/day) | Trace spans/s | Recommended Setup |
|---|---|---|---|---|
| Small (≤ 20 services) | 50K | 10 | 5K | Single-node Docker Compose, 4 vCPU, 16GB RAM |
| Medium (20-100 services) | 500K | 100 | 50K | Kubernetes, 3-node cluster, object storage (S3/GCS) |
| Large (100+ services) | 5M+ | 1TB+ | 500K+ | Microservices mode, dedicated read/write path, Kafka ingestion |
9.2. Object Storage for Long-term Retention
Loki, Mimir, and Tempo all support object storage (S3, GCS, Azure Blob, MinIO) for long-term retention. This is key to keeping costs low — local disk is only used for cache/WAL, while primary data sits on object storage at ~$0.023/GB/month (S3 Standard).
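For Loki, that typically means pointing its common storage block at a bucket — a sketch with placeholder bucket name and region; Mimir and Tempo have analogous (but not identical) storage sections in their own configs:
# loki.yaml — excerpt (sketch): chunks and index live on S3, local disk holds only WAL/cache
common:
  storage:
    s3:
      region: us-east-1
      bucketnames: my-loki-chunks   # placeholder — create this bucket first
      # credentials resolved from env vars or an IAM role
schema_config:
  configs:
    - from: "2026-01-01"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h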
9.3. Retention Strategy
# Loki — retain logs for 30 days
limits_config:
retention_period: 720h
# Mimir — retain metrics for 1 year
limits:
compactor_blocks_retention_period: 8760h
# Tempo — retain traces for 14 days (traces are typically queried recent)
compactor:
compaction:
block_retention: 336h
10. Cost Comparison: LGTM Stack vs. SaaS
| Solution | Estimated Cost / Month (Medium) | Notes |
|---|---|---|
| Datadog | $3,000 - $8,000 | Per host + log volume + APM span pricing |
| New Relic | $2,000 - $5,000 | Per GB ingested + user seat pricing |
| Elastic Cloud | $1,500 - $4,000 | Per capacity (RAM + storage) |
| Self-hosted LGTM | $200 - $500 | Infrastructure only (VMs/K8s + object storage). Requires ops team |
| Grafana Cloud (Free tier) | $0 | 10K metrics, 50GB logs, 50GB traces/month — sufficient for small projects |
Grafana Cloud Free Tier
If you're not ready to self-host, Grafana Cloud offers a generous free tier: 10,000 active metrics, 50GB logs, 50GB traces per month. Enough for side projects or early-stage startups — and you can migrate to self-hosted anytime since the entire stack is open-source.
11. Conclusion
The LGTM Stack with Grafana 13 delivers a comprehensive, open-source observability solution at significantly lower cost than SaaS platforms. The improvements in Grafana 13 — Dynamic Dashboards, Git Sync, Suggested Templates — reduce setup time and help teams get value from their telemetry sooner. Loki's new Kafka-backed architecture and parallel query engine have narrowed the performance gap with Elasticsearch while keeping storage costs at a fraction of the price.
The key to success: start small with Docker Compose, instrument using standard OpenTelemetry (to avoid vendor lock-in), use Alloy as the central collector, and scale to Kubernetes when needed. Observability isn't something you "add later" — it should be a first-class citizen in your system architecture.