Grafana 13 & LGTM Stack — Building a Comprehensive Observability System in 2026
Posted on: 4/26/2026 11:14:56 PM
Table of contents
- 1. What is the LGTM Stack?
- 2. Grafana Alloy — The Next-Gen Collector
- 3. Grafana 13 — Key Improvements
- 4. Loki — Redesigned Architecture with Kafka-backed Ingestion
- 5. Deploying the LGTM Stack with Docker Compose
- 6. Instrumenting .NET Applications with OpenTelemetry
- 7. Beyla — Auto-instrumentation Without Code Changes
- 8. Effective Alerting Strategy
- 9. Running LGTM Stack in Production
- 10. Cost Comparison: LGTM Stack vs. SaaS
- 11. Conclusion
As software systems grow more complex — microservices, containers, serverless, edge computing — observability becomes a survival requirement. It's no longer just about checking logs or CPU usage — you need to correlate metrics, logs, traces, and profiles within a single unified interface. Grafana 13, just unveiled at GrafanaCON 2026 (April 21, 2026), along with the LGTM stack (Loki + Grafana + Tempo + Mimir) and Grafana Alloy, provides the most powerful open-source answer to this challenge.
1. What is the LGTM Stack?
LGTM is the informal name for Grafana Labs' open-source observability toolkit, where each component handles a specific signal:
| Component | Signal | Role | Equivalent |
|---|---|---|---|
| Loki | Logs | Log storage and query system that only indexes labels instead of full-text — dramatically reducing storage costs | Elasticsearch, Splunk |
| Grafana | Visualization | Dashboards, alerting, exploration — unified interface for all signals | Kibana, Datadog Dashboard |
| Tempo | Traces | Distributed tracing backend, stores traces without indexing — low cost | Jaeger, Zipkin, Datadog APM |
| Mimir | Metrics | Long-term Prometheus storage, horizontal scaling, multi-tenant | Thanos, Cortex, VictoriaMetrics |
Beyond these four core components, the stack also includes Pyroscope (continuous profiling), Beyla (eBPF auto-instrumentation), Faro (frontend observability), and most importantly — Grafana Alloy as the central collector.
graph TD
subgraph Applications
A1[".NET App"]
A2["Vue.js SPA"]
A3["Background Workers"]
end
subgraph "Grafana Alloy (Collector)"
AL["Alloy
OTLP + Prometheus"]
end
subgraph "LGTM Backend"
M["Mimir
Metrics"]
L["Loki
Logs"]
T["Tempo
Traces"]
P["Pyroscope
Profiles"]
end
G["Grafana 13
Dashboard + Alerting"]
A1 -->|OTLP| AL
A2 -->|Faro SDK| AL
A3 -->|OTLP| AL
AL -->|remote_write| M
AL -->|loki.write| L
AL -->|otlp| T
AL -->|pyroscope.write| P
M --> G
L --> G
T --> G
P --> G
style AL fill:#e94560,stroke:#fff,color:#fff
style G fill:#2c3e50,stroke:#fff,color:#fff
style M fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style L fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style T fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style P fill:#f8f9fa,stroke:#e94560,color:#2c3e50
LGTM Stack architecture with Grafana Alloy as the central collector
2. Grafana Alloy — The Next-Gen Collector
Grafana Alloy is Grafana Labs' open-source distribution of the OpenTelemetry Collector and the successor to Grafana Agent, with significantly broader capabilities. Alloy was voted the most-used vendor distribution in the 2026 OpenTelemetry community survey.
2.1. Why Choose Alloy Over the Vanilla OTel Collector?
| Criteria | Vanilla OTel Collector | Grafana Alloy |
|---|---|---|
| Configuration | Static YAML | River language (programmable) + YAML via OTel Engine mode |
| Pipeline | Receivers → Processors → Exporters | Flexible component graph with branching/merging |
| Prometheus | Requires additional receiver | Native Prometheus scraping + remote_write |
| Auto-discovery | Limited | Built-in Kubernetes service discovery, Docker, Consul |
| Debugging | CLI flags | Built-in UI at port 12345 showing component graph |
| Profiles | Not supported | Native Pyroscope integration |
2.2. Basic Alloy Configuration
Alloy configuration uses the River language — declarative yet programmable (variables, conditions, functions):
// Receive telemetry via OTLP (gRPC + HTTP)
otelcol.receiver.otlp "default" {
grpc { endpoint = "0.0.0.0:4317" }
http { endpoint = "0.0.0.0:4318" }
output {
metrics = [otelcol.processor.batch.default.input]
logs = [otelcol.processor.batch.default.input]
traces = [otelcol.processor.batch.default.input]
}
}
// Batch to reduce network calls
otelcol.processor.batch "default" {
timeout = "5s"
send_batch_size = 1000
output {
metrics = [otelcol.exporter.otlphttp.mimir.input]
logs = [otelcol.exporter.otlphttp.loki.input]
traces = [otelcol.exporter.otlp.tempo.input]
}
}
// Export metrics to Mimir
otelcol.exporter.otlphttp "mimir" {
client {
endpoint = "http://mimir:9009/otlp"
}
}
// Export logs to Loki
otelcol.exporter.otlphttp "loki" {
client {
endpoint = "http://loki:3100/otlp"
}
}
// Export traces to Tempo
otelcol.exporter.otlp "tempo" {
client {
endpoint = "tempo:4317"
tls { insecure = true }
}
}
OTel Engine Mode — New in 2026
If your team is already familiar with OTel Collector YAML, Alloy now supports OpenTelemetry Engine mode — allowing you to use standard OTel Collector YAML configuration directly without rewriting to River. This makes migration from OTel Collector to Alloy nearly zero-effort.
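For teams taking that route, the pipeline from section 2.2 expressed as standard Collector YAML would look roughly like this — a sketch; the endpoints simply reuse the hosts from the River example above:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
exporters:
  otlphttp/mimir:
    endpoint: http://mimir:9009/otlp
  otlphttp/loki:
    endpoint: http://loki:3100/otlp
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  # Same fan-out as the River config: batch once, then export per signal
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/mimir]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]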
3. Grafana 13 — Key Improvements
Grafana 13 was announced at GrafanaCON 2026 with a wave of new features focusing on three pillars: faster time-to-value, governance at scale, and ecosystem expansion.
3.1. Dynamic Dashboards (GA)
Previously, you had to clone and manually edit dashboards for each environment/team/service. Dynamic Dashboards are now the default — dashboards automatically adapt based on variables and user context. The new layout engine automatically migrates all existing dashboards to the new schema.
3.2. Git Sync (GA) — Dashboard as Code
The most anticipated feature: bidirectional sync between Grafana and Git repositories (GitHub, GitLab, Bitbucket). Every UI dashboard change auto-commits to Git, and vice versa — pushing from Git updates dashboards. Combined with the new dashboard schema and versioned API, this is a game-changer for teams adopting GitOps for observability.
sequenceDiagram
participant Dev as Developer
participant Git as GitHub/GitLab
participant G13 as Grafana 13
participant Alert as Alert Manager
Dev->>Git: Push dashboard JSON
Git->>G13: Webhook trigger sync
G13->>G13: Validate & deploy dashboard
G13->>Alert: Update alert rules
Note over G13: Dashboard goes live immediately
G13->>G13: User edits on UI
G13->>Git: Auto-commit changes
Git->>Dev: PR notification
Bidirectional Git Sync workflow in Grafana 13
3.3. Suggested Dashboards & Templates
Grafana 13 solves the "blank page" problem — when you connect a new data source, the system suggests dashboard templates based on the data type. Built-in support for standard methodologies:
- USE Method (Utilization, Saturation, Errors) — for infrastructure monitoring
- RED Method (Rate, Errors, Duration) — for service monitoring (see the example rules after this list)
- DORA Metrics — for DevOps performance (deployment frequency, lead time, MTTR, change failure rate)
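To make the RED method concrete, here is a minimal sketch of Prometheus-format recording rules that you could load into Mimir's ruler. The metric names (http_server_requests_total, http_server_request_duration_seconds_bucket) and the service label are assumptions — substitute whatever your instrumentation actually emits:
groups:
  - name: red-method
    rules:
      # Rate: requests per second, per service (metric name is an assumption)
      - record: service:http_requests:rate5m
        expr: sum by (service) (rate(http_server_requests_total[5m]))
      # Errors: share of responses with a 5xx status code
      - record: service:http_errors:ratio5m
        expr: |
          sum by (service) (rate(http_server_requests_total{status=~"5.."}[5m]))
          /
          sum by (service) (rate(http_server_requests_total[5m]))
      # Duration: p99 latency derived from histogram buckets
      - record: service:http_latency:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (service, le) (rate(http_server_request_duration_seconds_bucket[5m])))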
3.4. Other Notable Features
| Feature | Status | Description |
|---|---|---|
| Grafana Advisor | GA | Automated health checks: detect failing data sources, outdated plugins, misconfigured SSO |
| Panel Styles | Preview | Apply preset styles to time series, gauge, stat, bar chart with one click |
| Annotation Clustering | GA | Group dense annotations into scrollable tooltips |
| Graphviz Panel | Private Preview | DOT language diagrams with live data mapping |
| Assistant On-Premises | GA | AI assistant for Enterprise/OSS, supporting SQL expressions |
| IBM DB2 Data Source | Preview | Direct Grafana connection to IBM DB2 — expanding into enterprise legacy |
4. Loki — Redesigned Architecture with Kafka-backed Ingestion
Loki has always stood out with its "like Prometheus, but for logs" philosophy — only indexing labels, not log content, dramatically reducing storage costs compared to Elasticsearch. At GrafanaCON 2026, Grafana Labs announced a major Loki architecture redesign:
4.1. Kafka-backed Ingestion
The new ingestion layer uses Kafka as an intermediate buffer. Benefits:
- Durability: Logs won't be lost when Loki ingesters restart or crash
- Backpressure handling: Kafka naturally handles burst traffic without over-provisioning ingesters
- Replay: Re-index logs from Kafka offsets when needed
4.2. New Query Engine & Scheduler
The new query planner splits queries across partitions and executes them in parallel, substantially reducing query latency on large log volumes.
Logline Acquisition
Grafana Labs recently acquired Logline — a precision search technology for large-scale log datasets. This capability is expected to integrate into Loki in upcoming releases, bringing full-text search without traditional full-text indexing overhead.
5. Deploying the LGTM Stack with Docker Compose
Here's a minimal Docker Compose configuration to run the entire LGTM stack on a single server:
version: "3.8"
services:
# Grafana Alloy - Collector
alloy:
image: grafana/alloy:latest
volumes:
- ./alloy-config.river:/etc/alloy/config.river
ports:
- "4317:4317" # OTLP gRPC
- "4318:4318" # OTLP HTTP
- "12345:12345" # Alloy UI
command: run --server.http.listen-addr=0.0.0.0:12345 /etc/alloy/config.river
# Mimir - Metrics
mimir:
image: grafana/mimir:latest
command: -config.file=/etc/mimir/mimir.yaml
volumes:
- ./mimir.yaml:/etc/mimir/mimir.yaml
- mimir-data:/data
ports:
- "9009:9009"
# Loki - Logs
loki:
image: grafana/loki:latest
command: -config.file=/etc/loki/loki.yaml
volumes:
- ./loki.yaml:/etc/loki/loki.yaml
- loki-data:/loki
ports:
- "3100:3100"
# Tempo - Traces
tempo:
image: grafana/tempo:latest
command: -config.file=/etc/tempo/tempo.yaml
volumes:
- ./tempo.yaml:/etc/tempo/tempo.yaml
- tempo-data:/var/tempo
ports:
- "3200:3200" # Tempo API
- "9095:9095" # gRPC
# Grafana - Visualization
grafana:
image: grafana/grafana:13.0.0
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
volumes:
- ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/ds.yaml
- grafana-data:/var/lib/grafana
ports:
- "3000:3000"
volumes:
mimir-data:
loki-data:
tempo-data:
grafana-data:
5.1. Provisioning Data Sources
The grafana-datasources.yaml file auto-connects Grafana to backends:
apiVersion: 1
datasources:
- name: Mimir
type: prometheus
access: proxy
url: http://mimir:9009/prometheus
isDefault: true
jsonData:
httpMethod: POST
- name: Loki
type: loki
access: proxy
url: http://loki:3100
jsonData:
derivedFields:
- name: TraceID
matcherRegex: "traceID=(\\w+)"
url: "$${__value.raw}"
datasourceUid: tempo
urlDisplayLabel: "View Trace"
- name: Tempo
type: tempo
access: proxy
uid: tempo
url: http://tempo:3200
jsonData:
tracesToLogsV2:
datasourceUid: loki
filterByTraceID: true
tracesToMetrics:
datasourceUid: mimir
spanStartTimeShift: "-1h"
spanEndTimeShift: "1h"
Cross-signal Correlation
The configuration above creates bidirectional links between logs ↔ traces ↔ metrics. When viewing a trace in Tempo, you can jump to the corresponding log in Loki (via traceID) and to metrics in Mimir (via time range). This is the core power of LGTM — signal correlation within a single interface.
6. Instrumenting .NET Applications with OpenTelemetry
To send telemetry from a .NET application to the LGTM stack via Alloy, install these NuGet packages:
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Instrumentation.Http
dotnet add package OpenTelemetry.Instrumentation.SqlClient
Configure in Program.cs:
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddOpenTelemetry()
.ConfigureResource(r => r.AddService("my-api"))
.WithTracing(tracing => tracing
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddSqlClientInstrumentation(o => o.SetDbStatementForText = true)
.AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317")))
.WithMetrics(metrics => metrics
.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddRuntimeInstrumentation()
.AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317")));
builder.Logging.AddOpenTelemetry(logging =>
{
logging.IncludeFormattedMessage = true;
logging.AddOtlpExporter(o => o.Endpoint = new Uri("http://alloy:4317"));
});
With just this code, your application will automatically send metrics, traces, and logs via OTLP to Alloy, which then fans out to Mimir, Tempo, and Loki.
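If the API runs as a container next to the Compose stack from section 5, you can also supply the OTLP endpoint through the standard OpenTelemetry environment variables rather than hard-coding the URI — a sketch, with my-api as a hypothetical service name:
# Added under services: in the Compose file from section 5
my-api:
  build: .
  environment:
    # Standard OTel SDK variables, read automatically by the .NET exporter
    - OTEL_SERVICE_NAME=my-api
    - OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4317
    - OTEL_EXPORTER_OTLP_PROTOCOL=grpc
  depends_on:
    - alloy
Note that explicit endpoint assignments in Program.cs take precedence; drop the o.Endpoint lines if you want the SDK to fall back to these variables.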
7. Beyla — Auto-instrumentation Without Code Changes
It's not always possible (or desirable) to add SDKs to applications. Grafana Beyla uses eBPF to automatically collect metrics and traces at the kernel level — with absolutely no source code changes or service restarts required.
graph LR
subgraph "Host / Kubernetes Node"
K["Kernel (eBPF probes)"]
APP1["Service A
(any language)"]
APP2["Service B
(any language)"]
B["Beyla Agent"]
end
K -.->|hook syscalls| B
APP1 -.->|"HTTP/gRPC calls"| K
APP2 -.->|"HTTP/gRPC calls"| K
B -->|OTLP| AL["Grafana Alloy"]
style B fill:#e94560,stroke:#fff,color:#fff
style AL fill:#2c3e50,stroke:#fff,color:#fff
style K fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Beyla uses eBPF hooks to collect telemetry without instrumentation
Beyla is particularly useful when:
- You need to monitor third-party services without source code access
- You want baseline metrics/traces immediately before adding detailed instrumentation
- Running polyglot microservices (Go, Java, .NET, Node.js, Python...) and wanting uniform telemetry
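A sketch of adding Beyla to the Compose file from section 5, targeting the hypothetical my-api service from section 6 — BEYLA_OPEN_PORT is Beyla's documented way to select the target process by listening port, but treat the exact variables and privileges here as assumptions to verify against the current docs:
# Added under services: in the Compose file from section 5
beyla:
  image: grafana/beyla:latest
  privileged: true          # eBPF probes need elevated kernel privileges
  pid: "service:my-api"     # share the target container's PID namespace
  environment:
    - BEYLA_OPEN_PORT=8080                           # instrument the process listening on this port
    - OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy:4318  # ship metrics/traces to Alloy via OTLP/HTTP
  depends_on:
    - alloy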
8. Effective Alerting Strategy
Beautiful dashboards without good alerting are just "eye candy." Grafana 13 improves alerting with provenance support (Kubernetes-style API) and tighter Git Sync integration. Here are alert design principles:
8.1. Alert Pyramid
graph TD
P1["P1 — Page immediately
Service down, error rate > 5%
Phone call + Slack"]
P2["P2 — Handle within the hour
Latency p99 > 2s, disk > 85%
Slack channel"]
P3["P3 — Review when free
Memory trending up, cert expiring
Email digest"]
P4["P4 — Informational
Deployment success, scaling events
Dashboard annotation"]
P1 --> P2 --> P3 --> P4
style P1 fill:#e94560,stroke:#fff,color:#fff
style P2 fill:#ff9800,stroke:#fff,color:#fff
style P3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style P4 fill:#f8f9fa,stroke:#e0e0e0,color:#888
Alert Pyramid — severity tiering strategy
Alert Fatigue — Enemy #1
If your team receives more than 10 P1/P2 alerts per day, people will start ignoring them all — including the truly critical ones. Rule of thumb: every P1 alert must include a clear runbook link and should only fire when immediate action is required. If no action is needed, it's not a P1.
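Putting these principles into practice, here is a minimal sketch of a P1 rule in the Prometheus-compatible format that Mimir's ruler accepts. It reuses the hypothetical service:http_errors:ratio5m recording rule from section 3.3, and the runbook URL is a placeholder:
groups:
  - name: p1-alerts
    rules:
      - alert: HighErrorRate
        # Fires only after the error ratio has stayed above 5% for 5 minutes
        expr: service:http_errors:ratio5m > 0.05
        for: 5m
        labels:
          severity: p1
        annotations:
          summary: "{{ $labels.service }} error rate above 5%"
          runbook_url: https://wiki.example.com/runbooks/high-error-rate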
9. Running LGTM Stack in Production
9.1. Reference Sizing
| Scale | Metrics/s | Logs (GB/day) | Trace spans/s | Recommended Setup |
|---|---|---|---|---|
| Small (≤ 20 services) | 50K | 10 | 5K | Single-node Docker Compose, 4 vCPU, 16GB RAM |
| Medium (20-100 services) | 500K | 100 | 50K | Kubernetes, 3-node cluster, object storage (S3/GCS) |
| Large (100+ services) | 5M+ | 1TB+ | 500K+ | Microservices mode, dedicated read/write path, Kafka ingestion |
9.2. Object Storage for Long-term Retention
Loki, Mimir, and Tempo all support object storage (S3, GCS, Azure Blob, MinIO) for long-term retention. This is key to keeping costs low — local disk is only used for cache/WAL, while primary data sits on object storage at ~$0.023/GB/month (S3 Standard).
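For Loki, that typically means pointing its common storage block at a bucket — a sketch with placeholder bucket name and region; Mimir and Tempo have analogous (but not identical) storage sections in their own configs:
# loki.yaml — excerpt (sketch): chunks and index live on S3, local disk holds only WAL/cache
common:
  storage:
    s3:
      region: us-east-1
      bucketnames: my-loki-chunks   # placeholder — create this bucket first
      # credentials resolved from env vars or an IAM role
schema_config:
  configs:
    - from: "2026-01-01"
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h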
9.3. Retention Strategy
# Loki — retain logs for 30 days
limits_config:
retention_period: 720h
# Mimir — retain metrics for 1 year
limits:
compactor_blocks_retention_period: 8760h
# Tempo — retain traces for 14 days (traces are typically queried recent)
compactor:
compaction:
block_retention: 336h
10. Cost Comparison: LGTM Stack vs. SaaS
| Solution | Estimated Cost / Month (Medium) | Notes |
|---|---|---|
| Datadog | $3,000 - $8,000 | Per host + log volume + APM span pricing |
| New Relic | $2,000 - $5,000 | Per GB ingested + user seat pricing |
| Elastic Cloud | $1,500 - $4,000 | Per capacity (RAM + storage) |
| Self-hosted LGTM | $200 - $500 | Infrastructure only (VMs/K8s + object storage). Requires ops team |
| Grafana Cloud (Free tier) | $0 | 10K metrics, 50GB logs, 50GB traces/month — sufficient for small projects |
Grafana Cloud Free Tier
If you're not ready to self-host, Grafana Cloud offers a generous free tier: 10,000 active metrics, 50GB logs, 50GB traces per month. Enough for side projects or early-stage startups — and you can migrate to self-hosted anytime since the entire stack is open-source.
11. Conclusion
The LGTM Stack with Grafana 13 delivers a comprehensive, open-source observability solution at significantly lower cost than SaaS platforms. The improvements in Grafana 13 — Dynamic Dashboards, Git Sync, Suggested Templates — reduce setup time and help teams get value from their telemetry sooner. Loki's new Kafka-backed architecture and parallel query engine have narrowed the performance gap with Elasticsearch while keeping storage costs at a fraction of the price.
The key to success: start small with Docker Compose, instrument using standard OpenTelemetry (to avoid vendor lock-in), use Alloy as the central collector, and scale to Kubernetes when needed. Observability isn't something you "add later" — it should be a first-class citizen in your system architecture.