Prometheus + Grafana — Building a Production Monitoring Stack

Posted on: 4/25/2026 4:32:04 PM

  • Prometheus v3.x (CNCF Graduated)
  • Grafana v12 (100+ data sources)
  • Pull-based metrics collection model
  • PromQL: powerful query language

1. Why Do You Need a Monitoring Stack?

Monitoring isn't "nice-to-have" — it's a mandatory requirement for any production system. Without monitoring, you only know something's wrong when customers complain — by then it's too late.

Prometheus + Grafana is the world's most popular monitoring combo, used at Uber, Spotify, DigitalOcean, CERN and thousands of other companies. Both are CNCF Graduated projects, completely free and battle-tested in production with millions of time series.

Prometheus ≠ Grafana

Prometheus collects and stores metrics (time-series database + scraping engine). Grafana visualizes metrics into dashboards and manages alerting. The two tools complement each other — they don't replace one another.

2. Prometheus Architecture — Pull-Based Model

Unlike most monitoring tools (push-based), Prometheus uses a pull model: it actively scrapes metrics from targets (applications, servers) at fixed intervals.

graph LR
    subgraph Targets
        A1[ASP.NET Core App<br/>/metrics endpoint]
        A2[Node Exporter<br/>Linux system metrics]
        A3[SQL Server Exporter<br/>DB metrics]
        A4[Redis Exporter<br/>Cache metrics]
    end
    P[Prometheus Server<br/>Scrape + Store + Query] -->|Pull every 15s| A1
    P -->|Pull every 15s| A2
    P -->|Pull every 15s| A3
    P -->|Pull every 15s| A4
    P --> AM[Alertmanager<br/>Route alerts]
    AM --> S[Slack / Email / PagerDuty]
    P --> G[Grafana<br/>Dashboards + Explore]
    style P fill:#e94560,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff
    style AM fill:#ff9800,stroke:#fff,color:#fff
    style A1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A4 fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Figure 1: Prometheus Architecture — Pull metrics from targets, store in TSDB, expose to Grafana and Alertmanager

Advantages of the pull model:

  • Service discovery: Prometheus auto-discovers new targets (via Kubernetes, Consul, DNS)
  • Easier debugging: Access /metrics endpoint in browser to see raw metrics
  • No agent required: Applications just expose an HTTP endpoint, no separate agent needed
  • Target health: If scrape fails → immediately know the target is down
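
The target-health point is directly queryable: every scrape target exports a synthetic up series (standard Prometheus behavior; the job name below matches the scrape config in section 4.3):

```promql
# 1 = last scrape succeeded, 0 = target unreachable
up{job="aspnet-app"}

# Number of targets currently down, per job
count by (job) (up == 0)
```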

3. Four Types of Prometheus Metrics

Type      | Description                                          | Example                        | Common PromQL
Counter   | Value only increases (resets on restart)             | Total requests, total errors   | rate(http_requests_total[5m])
Gauge     | Value goes up and down freely                        | CPU usage, memory, queue size  | node_memory_MemFree_bytes
Histogram | Distributes values into buckets                      | Response time (P50, P95, P99)  | histogram_quantile(0.95, ...)
Summary   | Like a histogram, but quantiles computed client-side | Response time (pre-calculated) | http_request_duration_seconds{quantile="0.95"}

Histogram vs Summary

Always prefer Histogram because it allows server-side quantile calculation (aggregatable across instances). Summary computes quantiles on the client → cannot aggregate across multiple instances. Prometheus 3.x also supports Native Histograms with higher precision and more efficient storage.

4. Integrating Prometheus with ASP.NET Core

4.1 Installation

dotnet add package prometheus-net.AspNetCore

// Program.cs
var builder = WebApplication.CreateBuilder(args);

var app = builder.Build();

// Collect built-in HTTP metrics (request count, duration, in-flight requests)
app.UseHttpMetrics();

// Expose /metrics endpoint for Prometheus scraping
app.MapMetrics(); // → http://localhost:5000/metrics

app.MapGet("/api/orders", async (AppDbContext db) =>
{
    return await db.Orders.ToListAsync();
});

app.Run();

4.2 Custom Metrics

public static class AppMetrics
{
    // Counter — count requests by endpoint and status
    public static readonly Counter HttpRequestsTotal = Metrics.CreateCounter(
        "app_http_requests_total",
        "Total HTTP requests processed",
        new CounterConfiguration
        {
            LabelNames = new[] { "method", "endpoint", "status_code" }
        });

    // Histogram — measure response time
    public static readonly Histogram RequestDuration = Metrics.CreateHistogram(
        "app_request_duration_seconds",
        "HTTP request duration in seconds",
        new HistogramConfiguration
        {
            LabelNames = new[] { "method", "endpoint" },
            Buckets = new[] { 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 }
        });

    // Gauge — active connections
    public static readonly Gauge ActiveConnections = Metrics.CreateGauge(
        "app_active_connections",
        "Number of active connections");

    // Gauge — queue size
    public static readonly Gauge QueueSize = Metrics.CreateGauge(
        "app_background_queue_size",
        "Number of items in background processing queue");
}

// Middleware for automatic metrics
// Register early in the pipeline: app.UseMiddleware<MetricsMiddleware>();
public class MetricsMiddleware
{
    private readonly RequestDelegate _next;

    public MetricsMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // Prefer the route template over the raw path where possible:
        // raw paths with IDs (/api/orders/123) explode label cardinality
        var path = context.Request.Path.Value ?? "/";
        var method = context.Request.Method;

        AppMetrics.ActiveConnections.Inc();
        try
        {
            using (AppMetrics.RequestDuration
                .WithLabels(method, path)
                .NewTimer())
            {
                await _next(context);
            }

            AppMetrics.HttpRequestsTotal
                .WithLabels(method, path, context.Response.StatusCode.ToString())
                .Inc();
        }
        finally
        {
            // Decrement even when the downstream pipeline throws
            AppMetrics.ActiveConnections.Dec();
        }
    }
}

4.3 Prometheus Scrape Configuration

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'aspnet-app'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['order-service:5000', 'payment-service:5000']
        labels:
          environment: 'production'

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Kubernetes service discovery
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
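
With the relabel rules above, a pod opts into scraping via annotations. A sketch of the matching pod metadata (the prometheus.io annotation keys are the convention this config assumes):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"   # matched by the keep rule
    prometheus.io/path: "/metrics" # copied into __metrics_path__
```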

5. PromQL — The Metrics Query Language

PromQL (Prometheus Query Language) is a purpose-built language for querying time-series data. Here are the most practical queries:

5.1 Request Rate and Error Rate

# Request rate (requests/second) over last 5 minutes
rate(app_http_requests_total[5m])

# Request rate per endpoint
sum by (endpoint) (rate(app_http_requests_total[5m]))

# Error rate (% of requests returning 5xx)
sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(app_http_requests_total[5m]))
* 100

# Availability (% successful requests)
# Note the outer parentheses: without them, * binds tighter than -,
# and 1 - ratio * 100 gives a large negative number instead of a percentage
(
  1 - (
    sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
    /
    sum(rate(app_http_requests_total[5m]))
  )
) * 100

5.2 Latency Percentiles

# P50 (median) response time
histogram_quantile(0.50,
  sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
)

# P95 response time
histogram_quantile(0.95,
  sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
)

# P99 response time per endpoint
histogram_quantile(0.99,
  sum by (le, endpoint) (rate(app_request_duration_seconds_bucket[5m]))
)

# Average response time
sum(rate(app_request_duration_seconds_sum[5m]))
/
sum(rate(app_request_duration_seconds_count[5m]))

5.3 Resource Monitoring

# CPU usage per pod (Kubernetes)
sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace="production"}[5m])
) * 100

# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
/ node_memory_MemTotal_bytes * 100

# Disk usage percentage
(node_filesystem_size_bytes - node_filesystem_avail_bytes)
/ node_filesystem_size_bytes * 100

PromQL Golden Rule: rate() first, aggregate second

Always compute rate() BEFORE sum(). If you do it backwards (sum before rate), results will be incorrect because counter resets between instances get "swallowed" by the aggregation. This is the most common PromQL mistake.
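
A concrete before/after using the request counter from section 4 (note that the wrong form even needs subquery syntax to parse, which is itself a warning sign):

```promql
# Wrong: aggregating first hides per-instance counter resets from rate()
rate(sum(app_http_requests_total)[5m:])

# Right: take the per-series rate, then aggregate
sum(rate(app_http_requests_total[5m]))
```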

6. Alerting — Smart Notifications

6.1 Alert Rules

# alert-rules.yml
groups:
  - name: app-alerts
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(app_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate > 5% for 5 minutes"
          description: "Current error rate: {{ $value | humanizePercentage }}"

      # High latency
      - alert: HighLatencyP95
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency > 2 seconds"
          description: "Current P95: {{ $value | humanizeDuration }}"

      # Pod down
      - alert: TargetDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }}/{{ $labels.instance }} is down"

      # Memory pressure
      - alert: HighMemoryUsage
        expr: |
          (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
          / node_memory_MemTotal_bytes > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage > 90%"

      # Disk almost full
      - alert: DiskSpaceLow
        expr: |
          (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Disk space < 10% on {{ $labels.mountpoint }}"

6.2 Alertmanager — Route and Deduplicate Alerts

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'

  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      group_wait: 0s
      repeat_interval: 5m

    - match:
        severity: warning
      receiver: 'slack-warnings'
      repeat_interval: 4h

receivers:
  - name: 'default'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts'

  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'xxx'
        severity: '{{ .GroupLabels.severity }}'

  - name: 'slack-warnings'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts-warning'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']

Alerting Anti-patterns

Avoid alert fatigue: If a team receives >20 alerts/day, most will be ignored. Every alert must be actionable — if you receive an alert and don't need to do anything, remove it. Use for: 5m or longer to prevent flapping (alerts toggling on/off due to temporary spikes).

7. Grafana Dashboards

7.1 RED Method Dashboard

Every service needs a dashboard following the RED method — 3 core metrics:

Metric   | Meaning             | PromQL
Rate     | Requests per second | sum(rate(app_http_requests_total[5m]))
Errors   | Error percentage    | sum(rate(...{status_code=~"5.."}[5m])) / sum(rate(...[5m]))
Duration | Latency percentiles | histogram_quantile(0.95, sum by (le) (rate(..._bucket[5m])))
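
Written out in full with the metric names from section 4, the three panel queries are:

```promql
# Rate: requests per second
sum(rate(app_http_requests_total[5m]))

# Errors: fraction of requests returning 5xx
sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
  / sum(rate(app_http_requests_total[5m]))

# Duration: P95 latency in seconds
histogram_quantile(0.95,
  sum by (le) (rate(app_request_duration_seconds_bucket[5m])))
```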

7.2 USE Method for Infrastructure

Every resource (CPU, Memory, Disk, Network) should be measured using the USE method:

Metric      | CPU                   | Memory     | Disk
Utilization | % CPU busy            | % RAM used | % disk used
Saturation  | Load average / cores  | Swap usage | I/O queue depth
Errors      | CPU throttling events | OOM kills  | I/O errors
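
A sketch of how some of those cells map to Node Exporter metrics (these are standard Node Exporter metric names, but exact availability depends on which collectors are enabled):

```promql
# CPU utilization: % busy
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# CPU saturation: 5-minute load average relative to core count
node_load5 / count by (instance) (node_cpu_seconds_total{mode="idle"})

# Memory errors: OOM kills observed by the kernel
rate(node_vmstat_oom_kill[5m])
```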

8. Deploying on Kubernetes

# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager + Node Exporter)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=securePassword \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
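
The same settings can live in a values file instead of --set flags (key paths follow the chart's values schema):

```yaml
# values.yaml
grafana:
  adminPassword: securePassword
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
```

Then: helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace -f values.yaml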

graph TB
    subgraph Kubernetes Cluster
        subgraph monitoring namespace
            P[Prometheus<br/>StatefulSet]
            G[Grafana<br/>Deployment]
            AM[Alertmanager<br/>StatefulSet]
            NE[Node Exporter<br/>DaemonSet]
            KSM[Kube-State-Metrics<br/>Deployment]
        end
        subgraph production namespace
            subgraph Pod
                APP[ASP.NET Core App]
            end
        end
        APP -->|/metrics| P
        NE -->|system metrics| P
        KSM -->|k8s state| P
        P -->|alerts| AM
        P -->|data source| G
        AM -->|notify| EXT[Slack / PagerDuty]
    end
    style P fill:#e94560,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff
    style AM fill:#ff9800,stroke:#fff,color:#fff
    style APP fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Figure 2: kube-prometheus-stack on Kubernetes — all-in-one monitoring solution

8.1 ServiceMonitor for ASP.NET Core

# Auto-discover and scrape ASP.NET Core apps
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aspnet-apps
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/monitored: "true"
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
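
For the selector above to match anything, the application's Service needs the label and a named port. A minimal sketch (the service name and pod selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
  labels:
    app.kubernetes.io/monitored: "true"  # matched by the ServiceMonitor
spec:
  selector:
    app: order-service
  ports:
    - name: http       # must match "port: http" in the ServiceMonitor
      port: 5000
      targetPort: 5000
```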

9. Recording Rules — Performance Optimization

When PromQL queries are complex and run frequently (dashboard refreshing every 10s), use recording rules to pre-compute:

# recording-rules.yml
groups:
  - name: app-recording
    interval: 30s
    rules:
      # Pre-compute request rate per endpoint
      - record: app:http_request_rate:5m
        expr: sum by (endpoint) (rate(app_http_requests_total[5m]))

      # Pre-compute error rate
      - record: app:http_error_rate:5m
        expr: |
          sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(app_http_requests_total[5m]))

      # Pre-compute P95 latency
      - record: app:http_latency_p95:5m
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
          )

      # Pre-compute P99 latency per endpoint
      - record: app:http_latency_p99_by_endpoint:5m
        expr: |
          histogram_quantile(0.99,
            sum by (le, endpoint) (rate(app_request_duration_seconds_bucket[5m]))
          )

Naming Convention for Recording Rules

Standard format: level:metric_name:operations. For example, in app:http_request_rate:5m, app is the aggregation level, http_request_rate is the metric, and 5m is the window. Proper naming lets the team understand at a glance what a metric represents without reading the original PromQL.
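
Alert rules and dashboards can then reference the recorded series directly. For example, the HighErrorRate alert from section 6.1 simplifies to:

```yaml
- alert: HighErrorRate
  expr: app:http_error_rate:5m > 0.05  # pre-computed by the recording rule
  for: 5m
```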

10. Production Best Practices

10.1 Metric Naming

  • Use application prefix: orderservice_requests_total instead of requests_total
  • Include units in name: _seconds, _bytes, _total (counters)
  • Never use high-cardinality labels (user_id, request_id) — will cause Prometheus OOM
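
Cardinality problems are measurable before they cause trouble; both queries below are plain PromQL over the series Prometheus already stores:

```promql
# Active series for one metric
count(app_http_requests_total)

# Top 10 metric names by series count across the whole server
topk(10, count by (__name__) ({__name__=~".+"}))
```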

10.2 Storage and Retention

  • Local storage: 15-30 days retention is sufficient for most use cases
  • Long-term storage: Use Thanos or Cortex if you need metrics retention >30 days
  • Estimate: ~1-2 bytes/sample × samples/s × retention → plan storage accordingly
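
Plugging hypothetical numbers into that estimate: 500 targets × 1,000 series each, scraped every 15 s, is about 33,000 samples/s; over 30 days (about 2.6 million seconds) at ~1.5 bytes/sample that comes to 33,000 × 1.5 × 2,600,000 ≈ 1.3 × 10¹¹ bytes, roughly 130 GB of local storage.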

10.3 High Availability

  • Run 2 Prometheus instances scraping the same targets → dedup at Thanos/Grafana Cloud
  • Alertmanager runs in cluster mode (3 instances) to avoid duplicate notifications
  • Grafana is stateless — easy horizontal scaling, state stored in PostgreSQL

Component     | Replicas (Production)  | Recommended Resources
Prometheus    | 2 (HA pair)            | 2 CPU, 8GB RAM, 50GB SSD
Alertmanager  | 3 (cluster)            | 0.5 CPU, 256MB RAM
Grafana       | 2+                     | 1 CPU, 1GB RAM
Node Exporter | 1 per node (DaemonSet) | 0.1 CPU, 64MB RAM

Conclusion

Prometheus + Grafana isn't just a monitoring tool — it's the observability foundation for your entire system. Start by exposing /metrics in ASP.NET Core, gradually add custom metrics following the RED method, set up meaningful alerting rules (actionable, not spammy), and build dashboards that help the team detect issues as fast as possible.

With kube-prometheus-stack on Kubernetes, you can have a full monitoring setup in minutes. The hard part isn't installation — it's choosing the right metrics to track and writing alert rules that don't cause alert fatigue.
