Grafana 13 & LGTM Stack — Building a Comprehensive Observability System in 2026

Posted on: April 26, 2026

As software systems grow more complex — microservices, containers, serverless, edge computing — observability becomes a survival requirement. It's no longer just about checking logs or CPU usage — you need to correlate metrics, logs, traces, and profiles within a single unified interface. Grafana 13, just unveiled at GrafanaCON 2026 (April 21, 2026), along with the LGTM stack (Loki + Grafana + Tempo + Mimir) and Grafana Alloy, provides the most powerful open-source answer to this challenge.

  • 35M+ Grafana users worldwide
  • 7,000+ enterprise customers (NVIDIA, Microsoft, Anthropic...)
  • 170+ supported data sources
  • 77% of organizations choosing open source for observability

1. What is the LGTM Stack?

LGTM is the informal name for Grafana Labs' open-source observability toolkit, where each component handles a specific signal:

Component | Signal | Role | Equivalent
Loki | Logs | Log storage and query system that indexes only labels, not full text — dramatically reducing storage costs | Elasticsearch, Splunk
Grafana | Visualization | Dashboards, alerting, exploration — a unified interface for all signals | Kibana, Datadog Dashboards
Tempo | Traces | Distributed tracing backend that stores traces without indexing — low cost | Jaeger, Zipkin, Datadog APM
Mimir | Metrics | Long-term Prometheus storage; horizontally scalable, multi-tenant | Thanos, Cortex, VictoriaMetrics

Beyond these four core components, the stack also includes Pyroscope (continuous profiling), Beyla (eBPF auto-instrumentation), Faro (frontend observability), and most importantly — Grafana Alloy as the central collector.

graph TD
    subgraph Applications
        A1[".NET App"]
        A2["Vue.js SPA"]
        A3["Background Workers"]
    end

    subgraph "Grafana Alloy (Collector)"
        AL["Alloy
OTLP + Prometheus"] end subgraph "LGTM Backend" M["Mimir
Metrics"] L["Loki
Logs"] T["Tempo
Traces"] P["Pyroscope
Profiles"] end G["Grafana 13
Dashboard + Alerting"] A1 -->|OTLP| AL A2 -->|Faro SDK| AL A3 -->|OTLP| AL AL -->|remote_write| M AL -->|loki.write| L AL -->|otlp| T AL -->|pyroscope.write| P M --> G L --> G T --> G P --> G style AL fill:#e94560,stroke:#fff,color:#fff style G fill:#2c3e50,stroke:#fff,color:#fff style M fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style L fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style T fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style P fill:#f8f9fa,stroke:#e94560,color:#2c3e50

LGTM Stack architecture with Grafana Alloy as the central collector

2. Grafana Alloy — The Next-Gen Collector

Grafana Alloy is Grafana Labs' open-source distribution of the OpenTelemetry Collector and the successor to Grafana Agent, with significantly more capability. Alloy was voted the most-used vendor distribution in the 2026 OpenTelemetry community survey.

2.1. Why Choose Alloy Over the Vanilla OTel Collector?

Criteria | Vanilla OTel Collector | Grafana Alloy
Configuration | Static YAML | River language (programmable) + YAML via OTel Engine mode
Pipeline | Receivers → Processors → Exporters | Flexible component graph with branching/merging
Prometheus | Requires additional receiver | Native Prometheus scraping + remote_write
Auto-discovery | Limited | Built-in service discovery for Kubernetes, Docker, Consul
Debugging | CLI flags | Built-in UI on port 12345 showing the component graph
Profiles | Not supported | Native Pyroscope integration

2.2. Basic Alloy Configuration

Alloy configuration uses the River language — declarative yet programmable (variables, conditions, functions):

// Receive telemetry via OTLP (gRPC + HTTP)
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }

  output {
    metrics = [otelcol.processor.batch.default.input]
    logs    = [otelcol.processor.batch.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

// Batch to reduce network calls
otelcol.processor.batch "default" {
  timeout = "5s"
  send_batch_size = 1000

  output {
    metrics = [otelcol.exporter.otlphttp.mimir.input]
    logs    = [otelcol.exporter.otlphttp.loki.input]
    traces  = [otelcol.exporter.otlp.tempo.input]
  }
}

// Export metrics to Mimir
otelcol.exporter.otlphttp "mimir" {
  client {
    endpoint = "http://mimir:9009/otlp"
  }
}

// Export logs to Loki
otelcol.exporter.otlphttp "loki" {
  client {
    endpoint = "http://loki:3100/otlp"
  }
}

// Export traces to Tempo
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}
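
The comparison table above highlighted Alloy's native Prometheus support: scraping /metrics endpoints needs no extra receiver. Here's a minimal sketch in the same River syntax — the target myapp:8080 is an assumed example, and the Mimir push path matches the Compose setup later in this post:

// Scrape a Prometheus /metrics endpoint directly — no OTel receiver needed
prometheus.scrape "apps" {
  targets    = [{"__address__" = "myapp:8080"}]
  forward_to = [prometheus.remote_write.mimir.receiver]
}

// Push scraped samples to Mimir via Prometheus remote_write
prometheus.remote_write "mimir" {
  endpoint {
    url = "http://mimir:9009/api/v1/push"
  }
}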

OTel Engine Mode — New in 2026

If your team is already familiar with OTel Collector YAML, Alloy now supports OpenTelemetry Engine mode — allowing you to use standard OTel Collector YAML configuration directly without rewriting to River. This makes migration from OTel Collector to Alloy nearly zero-effort.
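
For comparison, here's the pipeline from section 2.2 expressed as standard OTel Collector YAML — the kind of configuration Engine mode accepts directly (component names and endpoints mirror the River example above):

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000

exporters:
  otlphttp/mimir:
    endpoint: http://mimir:9009/otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/mimir]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]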

3. Grafana 13 — Key Improvements

Grafana 13 was announced at GrafanaCON 2026 with a wave of new features focusing on three pillars: faster time-to-value, governance at scale, and ecosystem expansion.

3.1. Dynamic Dashboards (GA)

Previously, you had to clone and manually edit dashboards for each environment/team/service. Dynamic Dashboards are now the default — dashboards automatically adapt based on variables and user context. The new layout engine automatically migrates all existing dashboards to the new schema.

3.2. Git Sync (GA) — Dashboard as Code

The most anticipated feature: bidirectional sync between Grafana and Git repositories (GitHub, GitLab, Bitbucket). Every UI dashboard change auto-commits to Git, and vice versa — pushing from Git updates dashboards. Combined with the new dashboard schema and versioned API, this is a game-changer for teams adopting GitOps for observability.

sequenceDiagram
    participant Dev as Developer
    participant Git as GitHub/GitLab
    participant G13 as Grafana 13
    participant Alert as Alert Manager

    Dev->>Git: Push dashboard JSON
    Git->>G13: Webhook trigger sync
    G13->>G13: Validate & deploy dashboard
    G13->>Alert: Update alert rules
    Note over G13: Dashboard goes live immediately

    G13->>G13: User edits on UI
    G13->>Git: Auto-commit changes
    Git->>Dev: PR notification

Bidirectional Git Sync workflow in Grafana 13
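
Git Sync itself is wired up through the Grafana UI by connecting a repository and branch. If you want a dashboards-as-code baseline without it — or on older Grafana versions — the long-standing file-based provisioning still works as a one-directional alternative. A minimal provider sketch (the path is illustrative):

apiVersion: 1
providers:
  - name: dashboards-as-code
    type: file
    allowUiUpdates: false          # Git (via CI) is the source of truth
    options:
      path: /etc/grafana/dashboards
      foldersFromFilesStructure: true

Your CI pipeline copies dashboard JSON from the repo into that path; Grafana picks up changes automatically.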

3.3. Suggested Dashboards & Templates

Grafana 13 solves the "blank page" problem — when you connect a new data source, the system suggests dashboard templates based on the data type. Built-in support for standard methodologies:

  • USE Method (Utilization, Saturation, Errors) — for infrastructure monitoring
  • RED Method (Rate, Errors, Duration) — for service monitoring (see the recording-rule sketch after this list)
  • DORA Metrics — for DevOps performance (deployment frequency, lead time, MTTR, change failure rate)
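
To make the RED method concrete, here are Prometheus-style recording rules you could load into Mimir's ruler — the metric names (http_requests_total, http_request_duration_seconds_bucket) are assumptions; substitute whatever your instrumentation emits:

groups:
  - name: red-method
    rules:
      # Rate: requests per second, per service
      - record: service:request_rate:5m
        expr: sum(rate(http_requests_total[5m])) by (service)
      # Errors: 5xx responses per second, per service
      - record: service:error_rate:5m
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
      # Duration: p99 latency, per service
      - record: service:latency_p99:5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))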

3.4. Other Notable Features

Feature | Status | Description
Grafana Advisor | GA | Automated health checks: detects failing data sources, outdated plugins, misconfigured SSO
Panel Styles | Preview | Apply preset styles to time series, gauge, stat, and bar chart panels with one click
Annotation Clustering | GA | Groups dense annotations into scrollable tooltips
Graphviz Panel | Private Preview | DOT-language diagrams with live data mapping
Assistant On-Premises | GA | AI assistant for Enterprise/OSS, supporting SQL expressions
IBM DB2 Data Source | Preview | Direct Grafana connection to IBM DB2 — expanding into enterprise legacy systems

4. Loki — Redesigned Architecture with Kafka-backed Ingestion

Loki has always stood out with its "like Prometheus, but for logs" philosophy — it indexes only labels, not log content, dramatically reducing storage costs compared to Elasticsearch. At GrafanaCON 2026, Grafana Labs announced a major redesign of Loki's architecture:

4.1. Kafka-backed Ingestion

The new ingestion layer uses Kafka as an intermediate buffer. Benefits:

  • Durability: Logs won't be lost when Loki ingesters restart or crash
  • Backpressure handling: Kafka naturally handles burst traffic without over-provisioning ingesters
  • Replay: Re-index logs from Kafka offsets when needed

4.2. New Query Engine & Scheduler

The new query planner distributes work across partitions and executes in parallel, delivering:

  • 20× less data scanned
  • 10× faster aggregated queries

Logline Acquisition

Grafana Labs recently acquired Logline — a precision search technology for large-scale log datasets. This capability is expected to integrate into Loki in upcoming releases, bringing full-text search without traditional full-text indexing overhead.

5. Deploying the LGTM Stack with Docker Compose

Here's a minimal Docker Compose configuration to run the entire LGTM stack on a single server:

version: "3.8"

services:
  # Grafana Alloy - Collector
  alloy:
    image: grafana/alloy:latest
    volumes:
      - ./alloy-config.river:/etc/alloy/config.river
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "12345:12345" # Alloy UI
    command: run --server.http.listen-addr=0.0.0.0:12345 /etc/alloy/config.river

  # Mimir - Metrics
  mimir:
    image: grafana/mimir:latest
    command: -config.file=/etc/mimir/mimir.yaml
    volumes:
      - ./mimir.yaml:/etc/mimir/mimir.yaml
      - mimir-data:/data
    ports:
      - "9009:9009"

  # Loki - Logs
  loki:
    image: grafana/loki:latest
    command: -config.file=/etc/loki/loki.yaml
    volumes:
      - ./loki.yaml:/etc/loki/loki.yaml
      - loki-data:/loki
    ports:
      - "3100:3100"

  # Tempo - Traces
  tempo:
    image: grafana/tempo:latest
    command: -config.file=/etc/tempo/tempo.yaml
    volumes:
      - ./tempo.yaml:/etc/tempo/tempo.yaml
      - tempo-data:/var/tempo
    ports:
      - "3200:3200"   # Tempo API
      - "9095:9095"   # gRPC

  # Grafana - Visualization
  grafana:
    image: grafana/grafana:13.0.0
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/ds.yaml
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"

volumes:
  mimir-data:
  loki-data:
  tempo-data:
  grafana-data:
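
The Compose file mounts mimir.yaml, loki.yaml, and tempo.yaml, which you need to supply yourself. As one illustration, here's a minimal single-binary loki.yaml sketch — filesystem storage, suitable for local testing only (Mimir and Tempo accept similarly minimal monolithic configs):

auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-04-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h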

5.1. Provisioning Data Sources

The grafana-datasources.yaml file auto-connects Grafana to backends:

apiVersion: 1
datasources:
  - name: Mimir
    type: prometheus
    access: proxy
    url: http://mimir:9009/prometheus
    isDefault: true
    jsonData:
      httpMethod: POST

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: "traceID=(\\w+)"
          url: "$${__value.raw}"
          datasourceUid: tempo
          urlDisplayLabel: "View Trace"

  - name: Tempo
    type: tempo
    access: proxy
    uid: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki
        filterByTraceID: true
      tracesToMetrics:
        datasourceUid: mimir
        spanStartTimeShift: "-1h"
        spanEndTimeShift: "1h"

Cross-signal Correlation

The configuration above creates bidirectional links between logs ↔ traces ↔ metrics. When viewing a trace in Tempo, you can jump to the corresponding log in Loki (via traceID) and to metrics in Mimir (via time range). This is the core power of LGTM — signal correlation within a single interface.

6. Instrumenting .NET Applications with OpenTelemetry

To send telemetry from a .NET application to the LGTM stack via Alloy, install these NuGet packages:

dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.SqlClient
dotnet add package OpenTelemetry.Instrumentation.Runtime

Configure in Program.cs:

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("my-api"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddSqlClientInstrumentation(o => o.SetDbStatementForText = true)
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317")))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation()
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317")));

builder.Logging.AddOpenTelemetry(logging =>
{
    logging.IncludeFormattedMessage = true;
    logging.AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317"));
});

var app = builder.Build();
app.Run();

With just this code, your application will automatically send metrics, traces, and logs via OTLP to Alloy, which then fans out to Mimir, Tempo, and Loki.
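
Rather than hardcoding the endpoint, you can also rely on the standard OpenTelemetry environment variables — the .NET OTLP exporter reads OTEL_EXPORTER_OTLP_ENDPOINT when no endpoint is set in code, and the SDK picks up OTEL_SERVICE_NAME. A sketch of the corresponding Compose service (the my-api image name is hypothetical):

  my-api:
    image: my-api:latest    # hypothetical application image
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4317
      - OTEL_SERVICE_NAME=my-api
    depends_on:
      - alloy

This keeps the telemetry destination out of application code, so the same image works in any environment.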

7. Beyla — Auto-instrumentation Without Code Changes

It's not always possible (or desirable) to add SDKs to applications. Grafana Beyla uses eBPF to automatically collect metrics and traces at the kernel level — with absolutely no source code changes or service restarts required.

graph LR
    subgraph "Host / Kubernetes Node"
        K["Kernel (eBPF probes)"]
        APP1["Service A
(any language)"] APP2["Service B
(any language)"] B["Beyla Agent"] end K -.->|hook syscalls| B APP1 -.->|"HTTP/gRPC calls"| K APP2 -.->|"HTTP/gRPC calls"| K B -->|OTLP| AL["Grafana Alloy"] style B fill:#e94560,stroke:#fff,color:#fff style AL fill:#2c3e50,stroke:#fff,color:#fff style K fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Beyla uses eBPF hooks to collect telemetry without instrumentation

Beyla is particularly useful when:

  • You need to monitor third-party services without source code access
  • You want baseline metrics/traces immediately before adding detailed instrumentation
  • Running polyglot microservices (Go, Java, .NET, Node.js, Python...) and wanting uniform telemetry
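
A minimal way to try it is as an extra Compose service alongside the stack from section 5. This sketch assumes the target service listens on port 8080 (BEYLA_OPEN_PORT tells Beyla which process to instrument, identified by its open port):

  beyla:
    image: grafana/beyla:latest
    privileged: true    # eBPF probes need elevated privileges
    pid: "host"         # must see the target process's PID namespace
    environment:
      - BEYLA_OPEN_PORT=8080
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4318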

8. Effective Alerting Strategy

Beautiful dashboards without good alerting are just "eye candy." Grafana 13 improves alerting with provisioned rule management (via Kubernetes-style APIs) and tighter Git Sync integration. Here are the core principles of alert design:

8.1. Alert Pyramid

graph TD
    P1["P1 — Page immediately
Service down, error rate > 5%
Phone call + Slack"] P2["P2 — Handle within the hour
Latency p99 > 2s, disk > 85%
Slack channel"] P3["P3 — Review when free
Memory trending up, cert expiring
Email digest"] P4["P4 — Informational
Deployment success, scaling events
Dashboard annotation"] P1 --> P2 --> P3 --> P4 style P1 fill:#e94560,stroke:#fff,color:#fff style P2 fill:#ff9800,stroke:#fff,color:#fff style P3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style P4 fill:#f8f9fa,stroke:#e0e0e0,color:#888

Alert Pyramid — severity tiering strategy

Alert Fatigue — Enemy #1

If your team receives more than 10 P1/P2 alerts per day, people will start ignoring them all — including the truly critical ones. Rule of thumb: every P1 alert must include a clear runbook link and should only fire when immediate action is required. If no action is needed, it's not a P1.
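
To make the runbook rule concrete, here's a Prometheus-format rule group you could load into Mimir's ruler — metric names, thresholds, and the runbook URL are all illustrative:

groups:
  - name: service-availability
    rules:
      # P1 — page immediately; runbook link is mandatory
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: P1
        annotations:
          summary: "Service error rate above 5% for 5 minutes"
          runbook_url: "https://wiki.example.com/runbooks/high-error-rate"
      # P2 — handle within the hour
      - alert: HighLatencyP99
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 10m
        labels:
          severity: P2
        annotations:
          summary: "p99 latency above 2s"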

9. Running LGTM Stack in Production

9.1. Reference Sizing

Scale | Metrics/s | Logs (GB/day) | Trace spans/s | Recommended setup
Small (≤ 20 services) | 50K | 10 | 5K | Single-node Docker Compose, 4 vCPU, 16 GB RAM
Medium (20–100 services) | 500K | 100 | 50K | Kubernetes, 3-node cluster, object storage (S3/GCS)
Large (100+ services) | 5M+ | 1,000+ | 500K+ | Microservices mode, dedicated read/write paths, Kafka ingestion

9.2. Object Storage for Long-term Retention

Loki, Mimir, and Tempo all support object storage (S3, GCS, Azure Blob, MinIO) for long-term retention. This is key to keeping costs low — local disk is only used for cache/WAL, while primary data sits on object storage at ~$0.023/GB/month (S3 Standard).
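
For example, pointing Loki at S3 takes a few lines in loki.yaml — the bucket and region are placeholders, and env-var expansion requires starting Loki with -config.expand-env=true:

common:
  storage:
    s3:
      region: us-east-1
      bucketnames: my-loki-chunks
      access_key_id: ${AWS_ACCESS_KEY_ID}        # needs -config.expand-env=true
      secret_access_key: ${AWS_SECRET_ACCESS_KEY}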

9.3. Retention Strategy

# Loki — retain logs for 30 days
limits_config:
  retention_period: 720h

# Mimir — retain metrics for 1 year
limits:
  compactor_blocks_retention_period: 8760h

# Tempo — retain traces for 14 days (traces are usually queried soon after they're written)
compactor:
  compaction:
    block_retention: 336h

10. Cost Comparison: LGTM Stack vs. SaaS

Solution | Estimated cost / month (Medium scale) | Notes
Datadog | $3,000–$8,000 | Per-host + log volume + APM span pricing
New Relic | $2,000–$5,000 | Per GB ingested + per-seat pricing
Elastic Cloud | $1,500–$4,000 | Capacity-based (RAM + storage)
Self-hosted LGTM | $200–$500 | Infrastructure only (VMs/K8s + object storage); requires an ops team
Grafana Cloud (free tier) | $0 | 10K metrics, 50 GB logs, 50 GB traces/month — sufficient for small projects

Grafana Cloud Free Tier

If you're not ready to self-host, Grafana Cloud offers a generous free tier: 10,000 active metrics, 50GB logs, 50GB traces per month. Enough for side projects or early-stage startups — and you can migrate to self-hosted anytime since the entire stack is open-source.

11. Conclusion

The LGTM Stack with Grafana 13 delivers a comprehensive, open-source observability solution at significantly lower cost than SaaS platforms. The improvements in Grafana 13 — Dynamic Dashboards, Git Sync, Suggested Templates — reduce setup time and accelerate data exploitation. Loki with its new Kafka-backed architecture and parallel query engine has narrowed the performance gap with Elasticsearch while keeping storage costs many times cheaper.

The key to success: start small with Docker Compose, instrument using standard OpenTelemetry (to avoid vendor lock-in), use Alloy as the central collector, and scale to Kubernetes when needed. Observability isn't something you "add later" — it should be a first-class citizen in your system architecture.
