Prometheus + Grafana — Building a Production Monitoring Stack
Posted on: 4/25/2026 4:32:04 PM
Table of contents
- 1. Why Do You Need a Monitoring Stack?
- 2. Prometheus Architecture — Pull-Based Model
- 3. Four Types of Prometheus Metrics
- 4. Integrating Prometheus with ASP.NET Core
- 5. PromQL — The Metrics Query Language
- 6. Alerting — Smart Notifications
- 7. Grafana Dashboards
- 8. Deploying on Kubernetes
- 9. Recording Rules — Performance Optimization
- 10. Production Best Practices
- Conclusion
1. Why Do You Need a Monitoring Stack?
Monitoring isn't "nice-to-have" — it's a mandatory requirement for any production system. Without monitoring, you only know something's wrong when customers complain — by then it's too late.
Prometheus + Grafana is the world's most popular monitoring combo, used at Uber, Spotify, DigitalOcean, CERN and thousands of other companies. Both are CNCF Graduated projects, completely free and battle-tested in production with millions of time series.
Prometheus ≠ Grafana
Prometheus collects and stores metrics (time-series database + scraping engine). Grafana visualizes metrics into dashboards and manages alerting. The two tools complement each other — they don't replace one another.
2. Prometheus Architecture — Pull-Based Model
Unlike push-based monitoring systems such as Graphite or StatsD, Prometheus uses a pull model: it actively scrapes metrics from targets (applications, servers) at fixed intervals.
graph LR
subgraph Targets
A1[ASP.NET Core App
/metrics endpoint]
A2[Node Exporter
Linux system metrics]
A3[SQL Server Exporter
DB metrics]
A4[Redis Exporter
Cache metrics]
end
P[Prometheus Server
Scrape + Store + Query] -->|Pull every 15s| A1
P -->|Pull every 15s| A2
P -->|Pull every 15s| A3
P -->|Pull every 15s| A4
P --> AM[Alertmanager
Route alerts]
AM --> S[Slack / Email / PagerDuty]
P --> G[Grafana
Dashboards + Explore]
style P fill:#e94560,stroke:#fff,color:#fff
style G fill:#2c3e50,stroke:#fff,color:#fff
style AM fill:#ff9800,stroke:#fff,color:#fff
style A1 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style A2 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style A3 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style A4 fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 1: Prometheus Architecture — Pull metrics from targets, store in TSDB, expose to Grafana and Alertmanager
Advantages of the pull model:
- Service discovery: Prometheus auto-discovers new targets (via Kubernetes, Consul, DNS)
- Easier debugging: Access the /metrics endpoint in a browser to see raw metrics
- No agent required: Applications just expose an HTTP endpoint, no separate agent needed
- Target health: If scrape fails → immediately know the target is down
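What Prometheus scrapes is a plain-text exposition format. An illustrative (not verbatim) sample of what a /metrics endpoint returns, using the custom metric names defined in section 4:

```text
# HELP app_http_requests_total Total HTTP requests processed
# TYPE app_http_requests_total counter
app_http_requests_total{method="GET",endpoint="/api/orders",status_code="200"} 1027
# HELP app_active_connections Number of active connections
# TYPE app_active_connections gauge
app_active_connections 12
```

Each scrape stores these samples with a timestamp, forming the time series that PromQL later queries.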
3. Four Types of Prometheus Metrics
| Type | Description | Example | Common PromQL |
|---|---|---|---|
| Counter | Value only increases (resets on restart) | Total requests, total errors | rate(http_requests_total[5m]) |
| Gauge | Value goes up and down freely | CPU usage, memory, queue size | node_memory_MemFree_bytes |
| Histogram | Distributes values into buckets | Response time (P50, P95, P99) | histogram_quantile(0.95, ...) |
| Summary | Similar to histogram, quantiles computed client-side | Response time (pre-calculated) | http_request_duration_seconds{quantile="0.95"} |
Histogram vs Summary
Always prefer Histogram because it allows server-side quantile calculation (aggregatable across instances). Summary computes quantiles on the client → cannot aggregate across multiple instances. Prometheus 3.x also supports Native Histograms with higher precision and more efficient storage.
4. Integrating Prometheus with ASP.NET Core
4.1 Installation
dotnet add package prometheus-net.AspNetCore
// Program.cs
using Microsoft.EntityFrameworkCore;
using Prometheus; // from the prometheus-net.AspNetCore package

var builder = WebApplication.CreateBuilder(args);
// (DbContext registration omitted for brevity)
var app = builder.Build();

// Expose /metrics endpoint for Prometheus scraping
app.MapMetrics(); // → http://localhost:5000/metrics

app.MapGet("/api/orders", async (AppDbContext db) =>
    await db.Orders.ToListAsync());

app.Run();
4.2 Custom Metrics
using Prometheus;

public static class AppMetrics
{
    // Counter — count requests by endpoint and status
    public static readonly Counter HttpRequestsTotal = Metrics.CreateCounter(
        "app_http_requests_total",
        "Total HTTP requests processed",
        new CounterConfiguration
        {
            LabelNames = new[] { "method", "endpoint", "status_code" }
        });

    // Histogram — measure response time
    public static readonly Histogram RequestDuration = Metrics.CreateHistogram(
        "app_request_duration_seconds",
        "HTTP request duration in seconds",
        new HistogramConfiguration
        {
            LabelNames = new[] { "method", "endpoint" },
            Buckets = new[] { 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 }
        });

    // Gauge — active connections
    public static readonly Gauge ActiveConnections = Metrics.CreateGauge(
        "app_active_connections",
        "Number of active connections");

    // Gauge — queue size
    public static readonly Gauge QueueSize = Metrics.CreateGauge(
        "app_background_queue_size",
        "Number of items in background processing queue");
}
// Middleware for automatic metrics
public class MetricsMiddleware
{
    private readonly RequestDelegate _next;
    public MetricsMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        // Note: the raw path risks high-cardinality labels;
        // prefer the route template where available
        var path = context.Request.Path.Value ?? "/";
        var method = context.Request.Method;

        AppMetrics.ActiveConnections.Inc();
        try
        {
            using (AppMetrics.RequestDuration
                .WithLabels(method, path)
                .NewTimer())
            {
                await _next(context);
            }

            AppMetrics.HttpRequestsTotal
                .WithLabels(method, path, context.Response.StatusCode.ToString())
                .Inc();
        }
        finally
        {
            // Decrement even if the request throws
            AppMetrics.ActiveConnections.Dec();
        }
    }
}

// Register in Program.cs, before the endpoints:
// app.UseMiddleware<MetricsMiddleware>();
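The middleware covers the counter, histogram, and connection gauge; the queue-size gauge is typically maintained by whatever owns the queue. A sketch assuming a System.Threading.Channels-based background queue (the bounded Channel and the Order type are assumptions, not from the original code):

```csharp
using System.Threading.Channels;

public class OrderQueueWorker : BackgroundService
{
    private readonly Channel<Order> _queue;
    public OrderQueueWorker(Channel<Order> queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var order in _queue.Reader.ReadAllAsync(stoppingToken))
        {
            // Gauges are set to an absolute value, not incremented.
            // Reader.Count requires a bounded channel (CanCount == true).
            AppMetrics.QueueSize.Set(_queue.Reader.Count);
            // ... process the order ...
        }
    }
}
```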
4.3 Prometheus Scrape Configuration
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'aspnet-app'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['order-service:5000', 'payment-service:5000']
        labels:
          environment: 'production'

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Kubernetes service discovery
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
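The kubernetes-pods job keys off pod annotations: only annotated pods are kept, and the metrics path can be overridden per pod. A pod opts in with metadata like this (a minimal sketch):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
```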
5. PromQL — The Metrics Query Language
PromQL (Prometheus Query Language) is a purpose-built language for querying time-series data. Here are the most practical queries:
5.1 Request Rate and Error Rate
# Request rate (requests/second) over last 5 minutes
rate(app_http_requests_total[5m])
# Request rate per endpoint
sum by (endpoint) (rate(app_http_requests_total[5m]))
# Error rate (% of requests returning 5xx)
sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
/
sum(rate(app_http_requests_total[5m]))
* 100
# Availability (% successful requests) — note the ratio must be
# subtracted from 1 BEFORE multiplying by 100
(
  1 -
  sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
  /
  sum(rate(app_http_requests_total[5m]))
) * 100
5.2 Latency Percentiles
# P50 (median) response time
histogram_quantile(0.50,
sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
)
# P95 response time
histogram_quantile(0.95,
sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
)
# P99 response time per endpoint
histogram_quantile(0.99,
sum by (le, endpoint) (rate(app_request_duration_seconds_bucket[5m]))
)
# Average response time
sum(rate(app_request_duration_seconds_sum[5m]))
/
sum(rate(app_request_duration_seconds_count[5m]))
5.3 Resource Monitoring
# CPU usage per pod (Kubernetes)
sum by (pod) (
rate(container_cpu_usage_seconds_total{namespace="production"}[5m])
) * 100
# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
/ node_memory_MemTotal_bytes * 100
# Disk usage percentage
(node_filesystem_size_bytes - node_filesystem_avail_bytes)
/ node_filesystem_size_bytes * 100
PromQL Golden Rule: rate() first, aggregate second
Always compute rate() BEFORE sum(). If you do it backwards (sum before rate), results will be incorrect because counter resets between instances get "swallowed" by the aggregation. This is the most common PromQL mistake.
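To make the ordering concrete, using the request counter from section 4:

```promql
# Correct: rate() per series first (handles counter resets), then aggregate
sum by (endpoint) (rate(app_http_requests_total[5m]))

# Incorrect: aggregating the raw counters first. This needs a subquery to even
# express, and when any one instance restarts the sum drops, which rate()
# misreads as a counter reset and turns into a bogus spike.
rate(sum by (endpoint) (app_http_requests_total)[5m:])
```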
6. Alerting — Smart Notifications
6.1 Alert Rules
# alert-rules.yml
groups:
  - name: app-alerts
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(app_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate > 5% for 5 minutes"
          description: "Current error rate: {{ $value | humanizePercentage }}"

      # High latency
      - alert: HighLatencyP95
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
          ) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency > 2 seconds"
          description: "Current P95: {{ $value | humanizeDuration }}"

      # Target down
      - alert: TargetDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }}/{{ $labels.instance }} is down"

      # Memory pressure
      - alert: HighMemoryUsage
        expr: |
          (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
          / node_memory_MemTotal_bytes > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage > 90%"

      # Disk almost full
      - alert: DiskSpaceLow
        expr: |
          (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: "Disk space < 10% on {{ $labels.mountpoint }}"
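Rule files are easy to break with a stray indent; Prometheus ships promtool to validate them before a reload:

```shell
promtool check rules alert-rules.yml
# Checking the main config also verifies that referenced rule files parse
promtool check config prometheus.yml
```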
6.2 Alertmanager — Route and Deduplicate Alerts
# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      group_wait: 0s
      repeat_interval: 5m
    - match:
        severity: warning
      receiver: 'slack-warnings'
      repeat_interval: 4h

receivers:
  - name: 'default'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts'
  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'xxx'
        severity: '{{ .GroupLabels.severity }}'
  - name: 'slack-warnings'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/xxx'
        channel: '#alerts-warning'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
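Alertmanager has an analogous validator, amtool, which catches routing and receiver mistakes before deploy:

```shell
amtool check-config alertmanager.yml
```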
Alerting Anti-patterns
Avoid alert fatigue: If a team receives >20 alerts/day, most will be ignored. Every alert must be actionable — if you receive an alert and don't need to do anything, remove it. Use for: 5m or longer to prevent flapping (alerts toggling on/off due to temporary spikes).
7. Grafana Dashboards
7.1 RED Method Dashboard
Every service needs a dashboard following the RED method — 3 core metrics:
| Metric | Meaning | PromQL |
|---|---|---|
| Rate | Requests per second | sum(rate(app_http_requests_total[5m])) |
| Errors | Error percentage | sum(rate(...{status=~"5.."}[5m])) / sum(rate(...[5m])) |
| Duration | Latency percentiles | histogram_quantile(0.95, sum by (le) (rate(..._bucket[5m]))) |
7.2 USE Method for Infrastructure
Every resource (CPU, Memory, Disk, Network) should be measured using the USE method:
| Metric | CPU | Memory | Disk |
|---|---|---|---|
| Utilization | % CPU busy | % RAM used | % disk used |
| Saturation | Load average / cores | Swap usage | I/O queue depth |
| Errors | CPU throttling events | OOM kills | I/O errors |
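A few node_exporter queries that map onto the table above (metric names are standard node_exporter names; exact labels can vary by exporter version):

```promql
# CPU saturation: 5-minute load average relative to core count
node_load5 / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})

# Memory saturation: swap in use
node_memory_SwapTotal_bytes - node_memory_SwapFree_bytes

# Disk errors: filesystem device errors (where exposed)
node_filesystem_device_error
```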
8. Deploying on Kubernetes
# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager + Node Exporter)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=securePassword \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
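After the install completes, verify the pods and reach the UIs locally. The service names below follow the release name monitoring and may differ in your cluster:

```shell
kubectl get pods -n monitoring

# Grafana at http://localhost:3000 (login: admin / securePassword)
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Prometheus at http://localhost:9090
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090
```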
graph TB
subgraph Kubernetes Cluster
subgraph monitoring namespace
P[Prometheus
StatefulSet]
G[Grafana
Deployment]
AM[Alertmanager
StatefulSet]
NE[Node Exporter
DaemonSet]
KSM[Kube-State-Metrics
Deployment]
end
subgraph production namespace
subgraph Pod
APP[ASP.NET Core App]
APP -->|/metrics| P
end
end
NE -->|system metrics| P
KSM -->|k8s state| P
P -->|alerts| AM
P -->|data source| G
AM -->|notify| EXT[Slack / PagerDuty]
end
style P fill:#e94560,stroke:#fff,color:#fff
style G fill:#2c3e50,stroke:#fff,color:#fff
style AM fill:#ff9800,stroke:#fff,color:#fff
style APP fill:#f8f9fa,stroke:#e94560,color:#2c3e50
Figure 2: kube-prometheus-stack on Kubernetes — all-in-one monitoring solution
8.1 ServiceMonitor for ASP.NET Core
# Auto-discover and scrape ASP.NET Core apps
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aspnet-apps
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/monitored: "true"
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
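A ServiceMonitor selects Services, not Pods, so the app's Service needs the matching label and a named port. A sketch (the service name and selector are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
  labels:
    app.kubernetes.io/monitored: "true"
spec:
  selector:
    app: order-service
  ports:
    - name: http        # must match the port name in the ServiceMonitor
      port: 5000
      targetPort: 5000
```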
9. Recording Rules — Performance Optimization
When PromQL queries are complex and run frequently (dashboard refreshing every 10s), use recording rules to pre-compute:
# recording-rules.yml
groups:
  - name: app-recording
    interval: 30s
    rules:
      # Pre-compute request rate per endpoint
      - record: app:http_request_rate:5m
        expr: sum by (endpoint) (rate(app_http_requests_total[5m]))

      # Pre-compute error rate
      - record: app:http_error_rate:5m
        expr: |
          sum(rate(app_http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(app_http_requests_total[5m]))

      # Pre-compute P95 latency
      - record: app:http_latency_p95:5m
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(app_request_duration_seconds_bucket[5m]))
          )

      # Pre-compute P99 latency per endpoint
      - record: app:http_latency_p99_by_endpoint:5m
        expr: |
          histogram_quantile(0.99,
            sum by (le, endpoint) (rate(app_request_duration_seconds_bucket[5m]))
          )
Naming Convention for Recording Rules
Standard format: level:metric_name:operations. For example app:http_request_rate:5m — app is the aggregation level, http_request_rate is the metric, 5m is the window. Proper naming helps the team immediately understand what a metric is without reading the original PromQL.
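Dashboards and alert rules can then reference the pre-computed series directly, which is both faster to evaluate and easier to read:

```promql
# Alert on the recorded error rate instead of re-evaluating the full expression
app:http_error_rate:5m > 0.05
```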
10. Production Best Practices
10.1 Metric Naming
- Use an application prefix: orderservice_requests_total instead of requests_total
- Include units in the name: _seconds, _bytes, _total (for counters)
- Never use high-cardinality labels (user_id, request_id) — they will cause Prometheus OOM
10.2 Storage and Retention
- Local storage: 15-30 days retention is sufficient for most use cases
- Long-term storage: Use Thanos or Cortex if you need metrics retention >30 days
- Estimate: ~1-2 bytes/sample × samples/s × retention → plan storage accordingly
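A back-of-the-envelope example of that estimate, assuming 500,000 active series scraped every 15s with 30-day retention:

```text
samples/s  = 500,000 series / 15s   ≈ 33,333
bytes/s    = 33,333 × ~2 bytes      ≈ 67 KB/s
30 days    = 67 KB/s × 2,592,000 s  ≈ 173 GB
             (add headroom for the WAL and compaction, e.g. plan ~250 GB)
```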
10.3 High Availability
- Run 2 Prometheus instances scraping the same targets → dedup at Thanos/Grafana Cloud
- Alertmanager runs in cluster mode (3 instances) to avoid duplicate notifications
- Grafana is stateless — easy horizontal scaling, state stored in PostgreSQL
| Component | Replicas (Production) | Recommended Resources |
|---|---|---|
| Prometheus | 2 (HA pair) | 2 CPU, 8GB RAM, 50GB SSD |
| Alertmanager | 3 (cluster) | 0.5 CPU, 256MB RAM |
| Grafana | 2+ | 1 CPU, 1GB RAM |
| Node Exporter | 1 per node (DaemonSet) | 0.1 CPU, 64MB RAM |
Conclusion
Prometheus + Grafana isn't just a monitoring tool — it's the observability foundation for your entire system. Start by exposing /metrics in ASP.NET Core, gradually add custom metrics following the RED method, set up meaningful alerting rules (actionable, not spammy), and build dashboards that help the team detect issues as fast as possible.
With kube-prometheus-stack on Kubernetes, you can have a full monitoring setup in minutes. The hard part isn't installation — it's choosing the right metrics to track and writing alert rules that don't cause alert fatigue.
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.