Prometheus + Grafana — Building a Production Monitoring Stack

Posted on: 4/25/2026 4:32:04 PM

  • Prometheus v3.x (CNCF Graduated)
  • Grafana v12 (100+ data sources)
  • Pull-based metrics collection model
  • PromQL: powerful query language

1. Why Do You Need a Monitoring Stack?

Monitoring isn't "nice-to-have" — it's a mandatory requirement for any production system. Without monitoring, you only know something's wrong when customers complain — by then it's too late.

Prometheus + Grafana is the world's most popular monitoring combo, used at Uber, Spotify, DigitalOcean, CERN and thousands of other companies. Both are CNCF Graduated projects, completely free and battle-tested in production with millions of time series.

Prometheus ≠ Grafana

Prometheus collects and stores metrics (time-series database + scraping engine). Grafana visualizes metrics into dashboards and manages alerting. The two tools complement each other — they don't replace one another.

2. Prometheus Architecture — Pull-Based Model

Unlike most monitoring tools (push-based), Prometheus uses a pull model: it actively scrapes metrics from targets (applications, servers) at fixed intervals.

graph LR
    subgraph Targets
        A1[ASP.NET Core App<br/>/metrics endpoint]
        A2[Node Exporter<br/>Linux system metrics]
        A3[SQL Server Exporter<br/>DB metrics]
        A4[Redis Exporter<br/>Cache metrics]
    end
    P[Prometheus Server<br/>Scrape + Store + Query] -->|Pull every 15s| A1
    P -->|Pull every 15s| A2
    P -->|Pull every 15s| A3
    P -->|Pull every 15s| A4
    P --> AM[Alertmanager<br/>Route alerts]
    AM --> S[Slack / Email / PagerDuty]
    P --> G[Grafana<br/>Dashboards + Explore]
    style P fill:#e94560,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff
    style AM fill:#ff9800,stroke:#fff,color:#fff
    style A1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style A4 fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Figure 1: Prometheus Architecture — Pull metrics from targets, store in TSDB, expose to Grafana and Alertmanager

Advantages of the pull model:

  • Service discovery: Prometheus auto-discovers new targets (via Kubernetes, Consul, DNS)
  • Easier debugging: Access /metrics endpoint in browser to see raw metrics
  • No agent required: Applications just expose an HTTP endpoint, no separate agent needed
  • Target health: If scrape fails → immediately know the target is down
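
The target-health point is directly queryable: every scrape target exports a synthetic up series (standard Prometheus behavior; the job name below matches the scrape config in section 4.3):

```promql
# 1 = last scrape succeeded, 0 = target unreachable
up{job="aspnet-app"}

# Number of targets currently down, per job
count by (job) (up == 0)
```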

3. Four Types of Prometheus Metrics

Type      | Description                                          | Example                        | Common PromQL
Counter   | Value only increases (resets on restart)             | Total requests, total errors   | rate(http_requests_total[5m])
Gauge     | Value goes up and down freely                        | CPU usage, memory, queue size  | node_memory_MemFree_bytes
Histogram | Distributes values into buckets                      | Response time (P50, P95, P99)  | histogram_quantile(0.95, ...)
Summary   | Like a histogram, but quantiles computed client-side | Response time (pre-calculated) | http_request_duration_seconds{quantile="0.95"}

Histogram vs Summary

Always prefer Histogram because it allows server-side quantile calculation (aggregatable across instances). Summary computes quantiles on the client → cannot aggregate across multiple instances. Prometheus 3.x also supports Native Histograms with higher precision and more efficient storage.

4. Integrating Prometheus with ASP.NET Core

4.1 Installation

dotnet add package prometheus-net.AspNetCore

// Program.cs
var builder = WebApplication.CreateBuilder(args);

var app = builder.Build();

// Collect built-in HTTP metrics (request count, duration, in-flight requests)
app.UseHttpMetrics();

// Expose /metrics endpoint for Prometheus scraping
app.MapMetrics(); // → http://localhost:5000/metrics

app.MapGet("/api/orders", async (AppDbContext db) =>
{
    return await db.Orders.ToListAsync();
});

app.Run();

4.2 Custom Metrics

public static class AppMetrics
{
    // Counter — count requests by endpoint and status
    public static readonly Counter HttpRequestsTotal = Metrics.CreateCounter(
        "app_http_requests_total",
        "Total HTTP requests processed",
        new CounterConfiguration
        {
            LabelNames = new[] { "method", "endpoint", "status_code" }
        });

    // Histogram — measure response time
    public static readonly Histogram RequestDuration = Metrics.CreateHistogram(
        "app_request_duration_seconds",
        "HTTP request duration in seconds",
        new HistogramConfiguration
        {
            LabelNames = new[] { "method", "endpoint" },
            Buckets = new[] { 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 }
        });

    // Gauge — active connections
    public static readonly Gauge ActiveConnections = Metrics.CreateGauge(
        "app_active_connections",
        "Number of active connections");

    // Gauge — queue size
    public static readonly Gauge QueueSize = Metrics.CreateGauge(
        "app_background_queue_size",
        "Number of items in background processing queue");
}

// Middleware for automatic metrics
// Register early in the pipeline: app.UseMiddleware<MetricsMiddleware>();
public class MetricsMiddleware
{
    private readonly RequestDelegate _next;

    public MetricsMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // Prefer the route template over the raw path where possible:
        // raw paths with IDs (/api/orders/123) explode label cardinality
        var path = context.Request.Path.Value ?? "/";
        var method = context.Request.Method;

        AppMetrics.ActiveConnections.Inc();
        try
        {
            using (AppMetrics.RequestDuration
                .WithLabels(method, path)
                .NewTimer())
            {
                await _next(context);
            }

            AppMetrics.HttpRequestsTotal
                .WithLabels(method, path, context.Response.StatusCode.ToString())
                .Inc();
        }
        finally
        {
            // Decrement even when the downstream pipeline throws
            AppMetrics.ActiveConnections.Dec();
        }
    }
}

4.3 Prometheus Scrape Configuration

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'aspnet-app'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['order-service:5000', 'payment-service:5000']
        labels:
          environment: 'production'

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Kubernetes service discovery
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
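
With the relabel rules above, a pod opts into scraping via annotations. A sketch of the matching pod metadata (the prometheus.io annotation keys are the convention this config assumes):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"   # matched by the keep rule
    prometheus.io/path: "/metrics" # copied into __metrics_path__
```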

5. PromQL — The Metrics Query Language

PromQL (Prometheus Query Language) is a purpose-built language for querying time-series data. Here are the most practical queries:

5.1 Request Rate and Error Rate

# Request rate (requests/second) over last 5 minutes
rate(app_http_requests_total[5m])

# Request rate per endpoint
sum by (endpoint) (rate(app_http_requests_total[5m]))

# Error rate (% of requests returning 5xx)
sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(app_http_requests_total[5m]))
* 100

# Availability (% successful requests)
# Note the outer parentheses: without them, * binds tighter than -,
# and 1 - ratio * 100 gives a large negative number instead of a percentage
(
  1 - (
    sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
    /
    sum(rate(app_http_requests_total[5m]))
  )
) * 100

5.2 Latency Percentiles

# P50 (median) response time
histogram_quantile(0.50,
  sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
)

# P95 response time
histogram_quantile(0.95,
  sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
)

# P99 response time per endpoint
histogram_quantile(0.99,
  sum by (le, endpoint) (rate(app_request_duration_seconds_bucket[5m]))
)

# Average response time
sum(rate(app_request_duration_seconds_sum[5m]))
/
sum(rate(app_request_duration_seconds_count[5m]))

5.3 Resource Monitoring

# CPU usage per pod (Kubernetes)
sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace="production"}[5m])
) * 100

# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
/ node_memory_MemTotal_bytes * 100

# Disk usage percentage
(node_filesystem_size_bytes - node_filesystem_avail_bytes)
/ node_filesystem_size_bytes * 100

PromQL Golden Rule: rate() first, aggregate second

Always compute rate() BEFORE sum(). If you do it backwards (sum before rate), results will be incorrect because counter resets between instances get "swallowed" by the aggregation. This is the most common PromQL mistake.
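
A concrete before/after using the request counter from section 4 (note that the wrong form even needs subquery syntax to parse, which is itself a warning sign):

```promql
# Wrong: aggregating first hides per-instance counter resets from rate()
rate(sum(app_http_requests_total)[5m:])

# Right: take the per-series rate, then aggregate
sum(rate(app_http_requests_total[5m]))
```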

6. Alerting — Smart Notifications

6.1 Alert Rules

# alert-rules.yml
groups:
  - name: app-alerts
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(app_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate > 5% for 5 minutes"
          description: "Current error rate: {{ $value | humanizePercentage }}"

      # High latency
      - alert: HighLatencyP95
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency > 2 seconds"
          description: "Current P95: {{ $value | humanizeDuration }}"

      # Pod down
      - alert: TargetDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }}/{{ $labels.instance }} is down"

      # Memory pressure
      - alert: HighMemoryUsage
        expr: |
          (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
          / node_memory_MemTotal_bytes > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage > 90%"

      # Disk almost full
      - alert: DiskSpaceLow
        expr: |
          (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Disk space < 10% on {{ $labels.mountpoint }}"

6.2 Alertmanager — Route and Deduplicate Alerts

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'

  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      group_wait: 0s
      repeat_interval: 5m

    - match:
        severity: warning
      receiver: 'slack-warnings'
      repeat_interval: 4h

receivers:
  - name: 'default'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts'

  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'xxx'
        severity: '{{ .GroupLabels.severity }}'

  - name: 'slack-warnings'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts-warning'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']

Alerting Anti-patterns

Avoid alert fatigue: If a team receives >20 alerts/day, most will be ignored. Every alert must be actionable — if you receive an alert and don't need to do anything, remove it. Use for: 5m or longer to prevent flapping (alerts toggling on/off due to temporary spikes).

7. Grafana Dashboards

7.1 RED Method Dashboard

Every service needs a dashboard following the RED method — 3 core metrics:

Metric   | Meaning             | PromQL
Rate     | Requests per second | sum(rate(app_http_requests_total[5m]))
Errors   | Error percentage    | sum(rate(...{status_code=~"5.."}[5m])) / sum(rate(...[5m]))
Duration | Latency percentiles | histogram_quantile(0.95, sum by (le) (rate(..._bucket[5m])))
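
Written out in full with the metric names from section 4, the three panel queries are:

```promql
# Rate: requests per second
sum(rate(app_http_requests_total[5m]))

# Errors: fraction of requests returning 5xx
sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
  / sum(rate(app_http_requests_total[5m]))

# Duration: P95 latency in seconds
histogram_quantile(0.95,
  sum by (le) (rate(app_request_duration_seconds_bucket[5m])))
```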

7.2 USE Method for Infrastructure

Every resource (CPU, Memory, Disk, Network) should be measured using the USE method:

Metric      | CPU                   | Memory     | Disk
Utilization | % CPU busy            | % RAM used | % disk used
Saturation  | Load average / cores  | Swap usage | I/O queue depth
Errors      | CPU throttling events | OOM kills  | I/O errors
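
A sketch of how some of those cells map to Node Exporter metrics (these are standard Node Exporter metric names, but exact availability depends on which collectors are enabled):

```promql
# CPU utilization: % busy
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# CPU saturation: 5-minute load average relative to core count
node_load5 / count by (instance) (node_cpu_seconds_total{mode="idle"})

# Memory errors: OOM kills observed by the kernel
rate(node_vmstat_oom_kill[5m])
```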

8. Deploying on Kubernetes

# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager + Node Exporter)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=securePassword \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
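
The same settings can live in a values file instead of --set flags (key paths follow the chart's values schema):

```yaml
# values.yaml
grafana:
  adminPassword: securePassword
prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
```

Then: helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace -f values.yaml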

graph TB
    subgraph Kubernetes Cluster
        subgraph monitoring namespace
            P[Prometheus<br/>StatefulSet]
            G[Grafana<br/>Deployment]
            AM[Alertmanager<br/>StatefulSet]
            NE[Node Exporter<br/>DaemonSet]
            KSM[Kube-State-Metrics<br/>Deployment]
        end
        subgraph production namespace
            subgraph Pod
                APP[ASP.NET Core App]
            end
        end
        APP -->|/metrics| P
        NE -->|system metrics| P
        KSM -->|k8s state| P
        P -->|alerts| AM
        P -->|data source| G
        AM -->|notify| EXT[Slack / PagerDuty]
    end
    style P fill:#e94560,stroke:#fff,color:#fff
    style G fill:#2c3e50,stroke:#fff,color:#fff
    style AM fill:#ff9800,stroke:#fff,color:#fff
    style APP fill:#f8f9fa,stroke:#e94560,color:#2c3e50

Figure 2: kube-prometheus-stack on Kubernetes — all-in-one monitoring solution

8.1 ServiceMonitor for ASP.NET Core

# Auto-discover and scrape ASP.NET Core apps
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aspnet-apps
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/monitored: "true"
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
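
For the selector above to match anything, the application's Service needs the label and a named port. A minimal sketch (the service name and pod selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
  labels:
    app.kubernetes.io/monitored: "true"  # matched by the ServiceMonitor
spec:
  selector:
    app: order-service
  ports:
    - name: http       # must match "port: http" in the ServiceMonitor
      port: 5000
      targetPort: 5000
```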

9. Recording Rules — Performance Optimization

When PromQL queries are complex and run frequently (dashboard refreshing every 10s), use recording rules to pre-compute:

# recording-rules.yml
groups:
  - name: app-recording
    interval: 30s
    rules:
      # Pre-compute request rate per endpoint
      - record: app:http_request_rate:5m
        expr: sum by (endpoint) (rate(app_http_requests_total[5m]))

      # Pre-compute error rate
      - record: app:http_error_rate:5m
        expr: |
          sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(app_http_requests_total[5m]))

      # Pre-compute P95 latency
      - record: app:http_latency_p95:5m
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
          )

      # Pre-compute P99 latency per endpoint
      - record: app:http_latency_p99_by_endpoint:5m
        expr: |
          histogram_quantile(0.99,
            sum by (le, endpoint) (rate(app_request_duration_seconds_bucket[5m]))
          )

Naming Convention for Recording Rules

Standard format: level:metric_name:operations. For example, in app:http_request_rate:5m, app is the aggregation level, http_request_rate is the metric, and 5m is the window. Proper naming lets the team understand at a glance what a metric represents without reading the original PromQL.
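
Alert rules and dashboards can then reference the recorded series directly. For example, the HighErrorRate alert from section 6.1 simplifies to:

```yaml
- alert: HighErrorRate
  expr: app:http_error_rate:5m > 0.05  # pre-computed by the recording rule
  for: 5m
```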

10. Production Best Practices

10.1 Metric Naming

  • Use application prefix: orderservice_requests_total instead of requests_total
  • Include units in name: _seconds, _bytes, _total (counters)
  • Never use high-cardinality labels (user_id, request_id) — will cause Prometheus OOM
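
Cardinality problems are measurable before they cause trouble; both queries below are plain PromQL over the series Prometheus already stores:

```promql
# Active series for one metric
count(app_http_requests_total)

# Top 10 metric names by series count across the whole server
topk(10, count by (__name__) ({__name__=~".+"}))
```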

10.2 Storage and Retention

  • Local storage: 15-30 days retention is sufficient for most use cases
  • Long-term storage: Use Thanos or Cortex if you need metrics retention >30 days
  • Estimate: ~1-2 bytes/sample × samples/s × retention → plan storage accordingly
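
Plugging hypothetical numbers into that estimate: 500 targets × 1,000 series each, scraped every 15 s, is about 33,000 samples/s; over 30 days (about 2.6 million seconds) at ~1.5 bytes/sample that comes to 33,000 × 1.5 × 2,600,000 ≈ 1.3 × 10¹¹ bytes, roughly 130 GB of local storage.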

10.3 High Availability

  • Run 2 Prometheus instances scraping the same targets → dedup at Thanos/Grafana Cloud
  • Alertmanager runs in cluster mode (3 instances) to avoid duplicate notifications
  • Grafana is stateless — easy horizontal scaling, state stored in PostgreSQL

Component     | Replicas (Production)  | Recommended Resources
Prometheus    | 2 (HA pair)            | 2 CPU, 8GB RAM, 50GB SSD
Alertmanager  | 3 (cluster)            | 0.5 CPU, 256MB RAM
Grafana       | 2+                     | 1 CPU, 1GB RAM
Node Exporter | 1 per node (DaemonSet) | 0.1 CPU, 64MB RAM

Conclusion

Prometheus + Grafana isn't just a monitoring tool — it's the observability foundation for your entire system. Start by exposing /metrics in ASP.NET Core, gradually add custom metrics following the RED method, set up meaningful alerting rules (actionable, not spammy), and build dashboards that help the team detect issues as fast as possible.

With kube-prometheus-stack on Kubernetes, you can have a full monitoring setup in minutes. The hard part isn't installation — it's choosing the right metrics to track and writing alert rules that don't cause alert fatigue.
