Load Testing for Distributed Systems — k6, NBomber and Performance Testing Strategies
Posted on: 4/25/2026 1:14:20 PM
Table of contents
- 1. Why Load Testing Matters for Distributed Systems
- 2. Types of Performance Testing
- 3. Grafana k6 — Modern Load Testing with JavaScript/TypeScript
- 4. NBomber — Native .NET Load Testing for C# Developers
- 5. k6 vs NBomber Comparison
- 6. Load Testing Strategies for Microservices
- 7. Integrating Load Testing into CI/CD Pipelines
- 8. Analyzing Results and Debugging Bottlenecks
- 9. Best Practices for Production Load Testing
- 10. Conclusion
- References
You just deployed a new microservice — everything runs smoothly on staging with response times under 200ms and zero errors. But when production hits 10,000 concurrent users, the system collapses within 3 minutes. Load testing is the defense layer that helps you discover these weaknesses before they become production incidents.
This article dives deep into performance testing strategies for distributed systems, comparing two leading tools — Grafana k6 (JavaScript/TypeScript) and NBomber (.NET/C#) — along with practical patterns for integrating load testing into your CI/CD pipeline.
1. Why Load Testing Matters for Distributed Systems
In monolithic architecture, bottlenecks typically reside at a single point — the database or application server. But with distributed systems, things get much more complex: one slow service can cause cascading failures across the entire call chain, and connection pool exhaustion at one node can affect dozens of downstream services.
graph LR
A[Client
10K req/s] --> B[API Gateway]
B --> C[Auth Service]
B --> D[Product Service]
B --> E[Order Service]
D --> F[(Database)]
E --> F
E --> G[Payment Service]
G --> H[External Payment API]
style A fill:#e94560,stroke:#fff,color:#fff
style B fill:#2c3e50,stroke:#fff,color:#fff
style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style F fill:#16213e,stroke:#fff,color:#fff
style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style H fill:#ff9800,stroke:#fff,color:#fff
Typical microservices architecture — every node can be a bottleneck under high load
Load testing for distributed systems needs to answer these questions:
- Maximum throughput — How many requests/second can the system handle before response time exceeds the acceptable threshold?
- Breaking point — At what load level does the system start returning errors or timeouts?
- Cascading failure — When one service is overloaded, how far does the impact spread?
- Resource saturation — Which service exhausts CPU, memory, or connection pool first?
- Recovery time — After load decreases, how long does it take the system to return to normal state?
2. Types of Performance Testing
Each test type serves a different purpose. Understanding each one helps you design a test suite that matches your system's SLA requirements.
graph TB
subgraph Types["Performance Testing Types"]
A["🔥 Smoke Test
Basic validation
1-5 VUs"]
B["📊 Load Test
Average load
Target VUs"]
C["💪 Stress Test
Beyond limits
2-3x Target"]
D["⚡ Spike Test
Sudden surge
0 → Max → 0"]
E["🕐 Soak Test
Long-running
4-24 hours"]
F["🎯 Breakpoint Test
Find limits
Continuous ramp"]
end
style A fill:#4CAF50,stroke:#fff,color:#fff
style B fill:#2196F3,stroke:#fff,color:#fff
style C fill:#ff9800,stroke:#fff,color:#fff
style D fill:#e94560,stroke:#fff,color:#fff
style E fill:#9C27B0,stroke:#fff,color:#fff
style F fill:#16213e,stroke:#fff,color:#fff
6 main performance testing types — from basic validation to finding system breaking points
| Test Type | Objective | Load Model | Duration | When to Use |
|---|---|---|---|---|
| Smoke Test | Verify script correctness | 1-5 constant VUs | 1-3 min | After every test script change |
| Load Test | Evaluate normal performance | Ramp up → steady → ramp down | 10-30 min | Every sprint/release |
| Stress Test | Find system limits | 2-3x normal load | 15-30 min | Before major launches |
| Spike Test | Test sudden traffic surges | 0 → peak → 0 in seconds | 5-10 min | Flash sales, viral events |
| Soak Test | Detect memory/connection leaks | Sustained average load | 4-24 hours | Before production releases |
| Breakpoint Test | Determine max capacity | Continuous ramp until failure | Variable | Capacity planning |
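The load models in the table map directly onto stage arrays. As a rough sketch, here is how the load-test and spike-test profiles might be generated in plain TypeScript — the `{ duration, target }` shape mirrors k6's `ramping-vus` stages, and the durations are illustrative defaults, not recommendations:

```typescript
// Sketch: generate stage arrays matching the load models in the table above.
// The { duration, target } shape mirrors k6's ramping-vus stages.
interface Stage { duration: string; target: number; }

function loadTestStages(targetVUs: number): Stage[] {
  return [
    { duration: '2m', target: targetVUs },  // ramp up
    { duration: '10m', target: targetVUs }, // steady state
    { duration: '2m', target: 0 },          // ramp down
  ];
}

function spikeTestStages(peakVUs: number): Stage[] {
  return [
    { duration: '10s', target: 0 },       // baseline
    { duration: '10s', target: peakVUs }, // sudden surge
    { duration: '30s', target: peakVUs }, // hold the spike
    { duration: '10s', target: 0 },       // drop back
  ];
}
```

Keeping profiles as small generator functions like these makes it easy to reuse one script for several test types by switching the stage array.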
Recommended Testing Order
Always start with a Smoke Test to ensure the script works correctly, then run a Load Test to establish a baseline, before moving to Stress/Spike/Soak for advanced scenarios. Don't jump straight to stress testing — you'll waste time debugging your test script instead of debugging the system.
3. Grafana k6 — Modern Load Testing with JavaScript/TypeScript
3.1 Why Choose k6?
k6 is an open-source load testing tool from Grafana Labs, written in Go with test scripts authored in JavaScript/TypeScript. Its biggest strength is the high-performance engine: because virtual users are lightweight goroutines rather than OS threads or full browser sessions, a single machine can drive thousands of VUs while using markedly less CPU and memory than JVM-based tools such as JMeter.
Since version 1.0, k6 supports native TypeScript — you get type safety and IDE autocomplete without a separate build step.
3.2 Writing Your First Test Script
// load-test.ts — k6 load test for an API endpoint
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
// Custom metrics
const errorRate = new Rate('errors');
const responseTime = new Trend('response_time_ms');
// Test configuration
export const options = {
scenarios: {
average_load: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 100 }, // Ramp up to 100 VUs
{ duration: '5m', target: 100 }, // Stay at 100 VUs
{ duration: '2m', target: 0 }, // Ramp down
],
},
},
thresholds: {
http_req_duration: ['p(95)<500', 'p(99)<1000'],
errors: ['rate<0.01'], // Error rate < 1%
response_time_ms: ['p(95)<400'],
},
};
export default function () {
const res = http.get('https://api.example.com/products', {
headers: { 'Authorization': `Bearer ${__ENV.API_TOKEN}` },
tags: { name: 'GetProducts' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'response time < 500ms': (r) => r.timings.duration < 500,
'body has products': (r) => r.json().length > 0,
});
errorRate.add(res.status !== 200);
responseTime.add(res.timings.duration);
sleep(1); // Think time between requests
}
3.3 Advanced Scenarios and Executors
k6 provides multiple executor types for different load models:
export const options = {
scenarios: {
// Scenario 1: Constant arrival rate — control RPS
constant_rps: {
executor: 'constant-arrival-rate',
rate: 200, // 200 iterations/second
timeUnit: '1s',
duration: '5m',
preAllocatedVUs: 50,
maxVUs: 200,
},
// Scenario 2: Spike test
spike: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '10s', target: 0 },
{ duration: '10s', target: 500 }, // Spike to 500
{ duration: '30s', target: 500 },
{ duration: '10s', target: 0 }, // Drop to 0
],
startTime: '6m', // Start after scenario 1
},
},
};
Constant VUs vs Constant Arrival Rate
Constant VUs (ramping-vus): each VU completes one iteration then starts another. RPS depends on response time — slower server → lower RPS. Use when simulating concurrent users.
Constant Arrival Rate: guarantees exactly N iterations/second regardless of response time. k6 spawns additional VUs as needed. Use when measuring the system under fixed throughput — ideal for SLA testing.
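The relationship between the two models follows Little's Law: concurrency ≈ arrival rate × time each iteration spends in the system (response time plus think time). A quick sanity check in plain TypeScript (not a k6 API — just the arithmetic behind `preAllocatedVUs`/`maxVUs` sizing; the numbers are illustrative):

```typescript
// Little's Law: concurrency = arrival rate × time in system.
// Estimates how many VUs a constant-arrival-rate scenario needs to
// sustain a target rate, given the average iteration duration
// (response time + think time).
function requiredVUs(targetRps: number, avgIterationSeconds: number): number {
  return Math.ceil(targetRps * avgIterationSeconds);
}

// 200 iterations/s where each iteration takes ~0.3s response + 1s sleep:
const vus = requiredVUs(200, 1.3); // ≈ 260 VUs
```

In practice you would set `preAllocatedVUs` near this estimate and `maxVUs` comfortably above it, so k6 never has to drop iterations when the server slows down.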
3.4 Browser Testing with k6
k6 integrates a Playwright-compatible browser API, allowing hybrid tests — combining protocol-level (HTTP) and browser-level testing in the same scenario:
import { browser } from 'k6/browser';
import http from 'k6/http';
export const options = {
scenarios: {
browser_test: {
executor: 'constant-vus',
exec: 'browser_test', // without exec, k6 looks for a default export
vus: 5,
duration: '3m',
options: { browser: { type: 'chromium' } },
},
api_test: {
executor: 'constant-arrival-rate',
exec: 'api_test',
rate: 100,
timeUnit: '1s',
duration: '3m',
preAllocatedVUs: 20,
},
},
};
export function api_test() {
http.get('https://api.example.com/products');
}
export async function browser_test() {
const page = await browser.newPage();
await page.goto('https://app.example.com/dashboard');
// Measure real Web Vitals
const lcp = await page.evaluate(() => {
return new Promise(resolve => {
new PerformanceObserver(list => {
const entries = list.getEntries();
resolve(entries[entries.length - 1].startTime);
}).observe({ type: 'largest-contentful-paint', buffered: true });
});
});
console.log(`LCP: ${lcp}ms`);
await page.close();
}
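Once you collect LCP samples like this, Google's published Web Vitals thresholds (good ≤ 2500 ms, poor > 4000 ms) give a simple rating you can assert on. A small helper in plain TypeScript:

```typescript
// Rate an LCP sample against Google's Web Vitals thresholds:
// good ≤ 2500ms, needs-improvement ≤ 4000ms, poor above that.
type LcpRating = 'good' | 'needs-improvement' | 'poor';

function rateLcp(lcpMs: number): LcpRating {
  if (lcpMs <= 2500) return 'good';
  if (lcpMs <= 4000) return 'needs-improvement';
  return 'poor';
}
```

Feeding the measured LCP into a custom Trend metric with a threshold (for example `p(75)<2500`, since Web Vitals are assessed at the 75th percentile) turns the browser scenario into a real quality gate rather than a log line.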
4. NBomber — Native .NET Load Testing for C# Developers
4.1 When to Choose NBomber
If your team primarily uses .NET and wants to write load tests in C#/F# — leveraging IDE support, debugging, and NuGet packages — NBomber is the ideal choice. NBomber works as a .NET library, installed via NuGet, and test scenarios run like regular unit tests.
4.2 Writing Scenarios with NBomber
using NBomber.CSharp;
using NBomber.Http.CSharp;
var httpClient = new HttpClient();
var scenario = Scenario.Create("get_products", async context =>
{
var request = Http.CreateRequest("GET", "https://api.example.com/products")
.WithHeader("Authorization", "Bearer " + Environment.GetEnvironmentVariable("API_TOKEN"));
var response = await Http.Send(httpClient, request);
return response;
})
.WithLoadSimulations(
Simulation.RampingInject(rate: 100, interval: TimeSpan.FromSeconds(1),
during: TimeSpan.FromMinutes(2)),
Simulation.Inject(rate: 100, interval: TimeSpan.FromSeconds(1),
during: TimeSpan.FromMinutes(5)),
Simulation.RampingInject(rate: 0, interval: TimeSpan.FromSeconds(1),
during: TimeSpan.FromMinutes(2))
);
NBomberRunner
.RegisterScenarios(scenario)
.WithReportFormats(ReportFormat.Html, ReportFormat.Csv)
.WithReportFolder("./reports")
.Run();
4.3 Multi-Protocol Testing
NBomber's strength lies in testing any protocol — HTTP, gRPC, WebSocket, database, message queue — within the same scenario:
// Assumes grpcClient and the RabbitMQ channel were initialized during setup.
var scenario = Scenario.Create("mixed_workload", async context =>
{
// Step 1: Call REST API
var apiResponse = await Http.Send(httpClient,
Http.CreateRequest("GET", "https://api.example.com/products"));
if (apiResponse.StatusCode != "200")
return Response.Fail();
// Step 2: Call gRPC service
var grpcResponse = await grpcClient.GetProductDetailsAsync(
new ProductRequest { Id = context.ScenarioInfo.ThreadNumber });
// Step 3: Publish message to RabbitMQ
channel.BasicPublish(exchange: "", routingKey: "orders",
body: Encoding.UTF8.GetBytes($"order-{context.InvocationNumber}"));
return Response.Ok(statusCode: "200",
sizeBytes: apiResponse.SizeBytes + grpcResponse.CalculateSize());
})
.WithLoadSimulations(
Simulation.KeepConstant(copies: 50, during: TimeSpan.FromMinutes(10))
);
5. k6 vs NBomber Comparison
| Criteria | Grafana k6 | NBomber |
|---|---|---|
| Language | JavaScript / TypeScript | C# / F# |
| Engine | Go (goroutines) | .NET (async/await) |
| Installation | Standalone binary | NuGet package |
| IDE Support | VS Code + extensions | Visual Studio / Rider (full debug) |
| Browser Testing | Yes (browser module, Chromium-based) | Yes (via Playwright NuGet) |
| Protocols | HTTP, WebSocket, gRPC (extension) | Any (HTTP, gRPC, WS, DB, MQ...) |
| Distributed Testing | k6 Cloud or k6-operator (K8s) | NBomber Cluster |
| Reporting | Grafana dashboards, JSON, CSV | HTML, CSV, TXT, Markdown |
| CI/CD | Native (exit code based on thresholds) | xUnit/NUnit runner, threshold assertions |
| Pricing | OSS free, Cloud paid | Free (personal), $99/month (business) |
| Best For | Multi-language teams, DevOps-driven | .NET teams, developer-driven testing |
Practical Recommendation
If your team primarily uses .NET and wants load tests running as unit tests in the CI pipeline → choose NBomber. If your team is multi-language or prefers fast TypeScript scripting → choose k6. Many large organizations use both: k6 for quick API-level testing in CI, NBomber for complex integration testing that requires debugging.
6. Load Testing Strategies for Microservices
6.1 Layered Testing Model
graph TB
subgraph L1["Layer 1 — Component Testing"]
A["Test individual
services"]
B["Mock dependencies"]
C["Measure baseline
response time"]
end
subgraph L2["Layer 2 — Integration Testing"]
D["Test chains of
2-3 services"]
E["Real dependencies"]
F["Measure end-to-end
latency"]
end
subgraph L3["Layer 3 — System Testing"]
G["Test entire
system"]
H["Production-like
traffic"]
I["Measure throughput
& error rate"]
end
L1 --> L2 --> L3
style A fill:#4CAF50,stroke:#fff,color:#fff
style B fill:#4CAF50,stroke:#fff,color:#fff
style C fill:#4CAF50,stroke:#fff,color:#fff
style D fill:#2196F3,stroke:#fff,color:#fff
style E fill:#2196F3,stroke:#fff,color:#fff
style F fill:#2196F3,stroke:#fff,color:#fff
style G fill:#e94560,stroke:#fff,color:#fff
style H fill:#e94560,stroke:#fff,color:#fff
style I fill:#e94560,stroke:#fff,color:#fff
3-layer performance testing model for microservices
6.2 Designing Realistic Test Scenarios
The most common mistake in load testing is creating traffic patterns that don't resemble reality. Production traffic is rarely uniform — there are typically hot paths (20% of endpoints receiving 80% of traffic) and cold paths.
// k6: Simulating realistic traffic patterns
import http from 'k6/http';
import { sleep } from 'k6';
const BASE_URL = __ENV.BASE_URL || 'https://api.example.com';
export const options = {
scenarios: {
// 70% traffic: Browse products (reads)
browse: {
executor: 'constant-arrival-rate',
rate: 700,
timeUnit: '1s',
duration: '10m',
preAllocatedVUs: 100,
exec: 'browseProducts',
},
// 20% traffic: Search (read + compute)
search: {
executor: 'constant-arrival-rate',
rate: 200,
timeUnit: '1s',
duration: '10m',
preAllocatedVUs: 50,
exec: 'searchProducts',
},
// 10% traffic: Checkout (writes)
checkout: {
executor: 'constant-arrival-rate',
rate: 100,
timeUnit: '1s',
duration: '10m',
preAllocatedVUs: 30,
exec: 'checkout',
},
},
};
export function browseProducts() {
const categoryId = Math.floor(Math.random() * 20) + 1;
http.get(`${BASE_URL}/products?category=${categoryId}`);
sleep(Math.random() * 3 + 1); // Think time 1-4s
}
export function searchProducts() {
const terms = ['laptop', 'phone', 'tablet', 'headphone', 'camera'];
const q = terms[Math.floor(Math.random() * terms.length)];
http.get(`${BASE_URL}/search?q=${q}`);
sleep(Math.random() * 2 + 0.5);
}
export function checkout() {
const payload = JSON.stringify({
productId: Math.floor(Math.random() * 1000) + 1,
quantity: Math.floor(Math.random() * 3) + 1,
});
http.post(`${BASE_URL}/orders`, payload, {
headers: { 'Content-Type': 'application/json' },
});
sleep(Math.random() * 5 + 2); // Longer think time for checkout
}
6.3 Thresholds and SLA Validation
Thresholds transform load tests from "reports" into "quality gates" — if metrics exceed the threshold, the test fails and the CI pipeline stops.
export const options = {
thresholds: {
// Global thresholds
http_req_duration: [
'p(50)<200', // 50th percentile < 200ms
'p(95)<500', // 95th percentile < 500ms
'p(99)<1000', // 99th percentile < 1s
],
http_req_failed: ['rate<0.01'], // <1% errors
// Per-endpoint thresholds
'http_req_duration{name:GetProducts}': ['p(95)<300'],
'http_req_duration{name:Checkout}': ['p(95)<800'],
'http_req_duration{name:Search}': ['p(95)<600'],
// Custom metrics
'errors{scenario:checkout}': ['rate<0.005'], // Checkout: <0.5% errors
},
};
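Under the hood, a threshold is just a predicate over an aggregated metric: collect samples, compute the percentile, compare against the bound, and fail the run if it doesn't hold. A toy evaluator for the `p(95)<500`-style expressions above (plain TypeScript sketch, not k6's internals):

```typescript
// Toy evaluator for k6-style threshold expressions such as "p(95)<500".
// Illustrates the quality-gate idea only — k6's real implementation
// supports more metrics and operators.
function percentile(sortedMs: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sortedMs.length); // nearest-rank method
  return sortedMs[Math.max(0, rank - 1)];
}

function checkThreshold(expr: string, samplesMs: number[]): boolean {
  const m = expr.match(/^p\((\d+)\)<(\d+)$/);
  if (!m) throw new Error(`unsupported expression: ${expr}`);
  const sorted = [...samplesMs].sort((a, b) => a - b);
  return percentile(sorted, Number(m[1])) < Number(m[2]);
}

// 100 latency samples from 10ms to 1000ms in steps of 10:
const samples = Array.from({ length: 100 }, (_, i) => (i + 1) * 10);
checkThreshold('p(95)<500', samples); // p95 = 950ms, so the gate fails
```

The important property is the boolean outcome: k6 maps failed thresholds to a non-zero exit code, which is what lets a CI pipeline stop on a performance regression without any extra scripting.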
7. Integrating Load Testing into CI/CD Pipelines
graph LR
A[Code Push] --> B[Build & Unit Test]
B --> C[Deploy to Staging]
C --> D[Smoke Test
k6 / NBomber]
D -->|Pass| E[Load Test
10 min]
D -->|Fail| F[Alert & Stop]
E -->|Thresholds OK| G[Deploy to Prod]
E -->|Thresholds Fail| F
G --> H[Canary Test
5% traffic]
H -->|Healthy| I[Full Rollout]
H -->|Degraded| J[Rollback]
style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
style D fill:#4CAF50,stroke:#fff,color:#fff
style E fill:#2196F3,stroke:#fff,color:#fff
style F fill:#e94560,stroke:#fff,color:#fff
style G fill:#4CAF50,stroke:#fff,color:#fff
style H fill:#ff9800,stroke:#fff,color:#fff
style I fill:#4CAF50,stroke:#fff,color:#fff
style J fill:#e94560,stroke:#fff,color:#fff
Load testing in CI/CD — smoke test gate first, load test gate second
7.1 GitHub Actions + k6
# .github/workflows/load-test.yml
name: Load Test
on:
pull_request:
branches: [main]
jobs:
load-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Deploy to staging
run: ./deploy-staging.sh
- name: Install k6
run: |
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
--keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D68
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
| sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6
- name: Run smoke test
run: k6 run --tag testid=smoke tests/smoke.ts
env:
API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}
- name: Run load test
run: k6 run tests/load.ts
env:
API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}
- name: Upload results
if: always()
uses: actions/upload-artifact@v4
with:
name: k6-results
path: results/
7.2 .NET CI + NBomber
// LoadTests/ProductApiLoadTest.cs — runs as xUnit test
public class ProductApiLoadTest
{
[Fact]
public void Products_endpoint_handles_100rps()
{
var httpClient = new HttpClient();
var scenario = Scenario.Create("get_products", async context =>
{
var response = await Http.Send(httpClient,
Http.CreateRequest("GET", "https://staging.api.example.com/products"));
return response;
})
.WithLoadSimulations(
Simulation.Inject(rate: 100, interval: TimeSpan.FromSeconds(1),
during: TimeSpan.FromMinutes(5))
);
var stats = NBomberRunner
.RegisterScenarios(scenario)
.Run();
var scnStats = stats.ScenarioStats[0];
// Assert SLA
Assert.True(scnStats.Ok.Latency.Percent95 < 500,
$"P95 latency {scnStats.Ok.Latency.Percent95}ms exceeds 500ms threshold");
Assert.True(scnStats.Fail.Request.Percent < 1,
$"Error rate {scnStats.Fail.Request.Percent}% exceeds 1% threshold");
}
}
8. Analyzing Results and Debugging Bottlenecks
8.1 Critical Metrics to Monitor
| Metric | Meaning | Suggested Threshold | When Exceeded |
|---|---|---|---|
| p50 (median) | Median response time | < 200ms | System-wide slowness |
| p95 | 95% of requests are faster than this | < 500ms | Tail latency impacting UX |
| p99 | 99% of requests are faster than this | < 1000ms | Outliers from GC, cold start, DB locks |
| Error rate | % of failed requests (4xx, 5xx, timeout) | < 1% | Service overloaded or bugs |
| Throughput (RPS) | Actual requests per second | ≥ target SLA | Bottleneck at compute or I/O |
| Active VUs | Virtual users currently active | As planned | VUs stuck waiting usually indicate a connection or socket leak |
8.2 Bottleneck Detection Patterns
Common Warning Signs
Response time increases linearly with VUs → CPU-bound: the application is serializing processing. Review async code or scale horizontally.
Response time spikes sharply at N VUs → Connection pool exhaustion: the database or downstream service has run out of connections. Check MaxPoolSize and HttpClient lifecycle.
Error rate spikes while response time drops → Circuit breaker is open or requests are being rejected early. Good for system self-protection, but thresholds need tuning.
Memory grows steadily during soak test → Memory leak: typically caused by unsubscribed event handlers, HttpClient created repeatedly, or cache without an eviction policy.
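The soak-test pattern above can be checked mechanically: fit a line to memory readings sampled at a fixed interval, and a persistently positive slope suggests a leak. A rough sketch in plain TypeScript (the 0.5 MB-per-sample threshold is an arbitrary illustration, not a standard value):

```typescript
// Least-squares slope of memory samples taken at a fixed interval.
// A steadily positive slope across a long soak run suggests a leak.
function slope(samplesMb: number[]): number {
  const n = samplesMb.length;
  const xMean = (n - 1) / 2;
  const yMean = samplesMb.reduce((a, b) => a + b, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - xMean) * (samplesMb[i] - yMean);
    den += (i - xMean) ** 2;
  }
  return num / den;
}

// Threshold in MB per sampling interval — illustrative, tune per system.
function looksLikeLeak(samplesMb: number[], mbPerSample = 0.5): boolean {
  return slope(samplesMb) > mbPerSample;
}

// Memory climbing ~2 MB per sample across the run:
looksLikeLeak([500, 502, 504, 506, 508, 510]); // → true
```

A linear fit also smooths over GC sawtooth patterns that make eyeballing a memory graph misleading: what matters is the trend of the baseline, not the individual peaks.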
9. Best Practices for Production Load Testing
9.1 Golden Rules
- Test on a production-like environment — Same hardware specs, same data volume, same network topology. Testing on a laptop with 100 DB rows tells you nothing.
- Use realistic data — Create datasets that reflect real distributions: full product catalogs, diverse user profiles, search queries from actual access logs.
- Measure from the client side, not the server — Server metrics show only processing time; the user's experience also includes network latency, DNS resolution, and the TLS handshake.
- Warm up before measuring — JIT compilation (.NET), connection pool initialization, and cache warming all affect results if you start measuring from the first request.
- Run multiple iterations, take averages — Single-run results have high variance. Run at least 3 times with the same config, remove outliers, and report average results.
9.2 Anti-Patterns to Avoid
- Testing without think time — Real users don't fire requests continuously. Add sleep() calls of 1-5s between requests to simulate reading time.
- Hardcoded test data — Calling the same endpoint with the same parameters → 100% cache hits → results don't reflect reality.
- Skipping ramp-up — Firing 10,000 VUs simultaneously creates a thundering herd — that's not load testing, that's a DDoS. Always ramp up gradually.
- Testing only the happy path — Production has 404s, 401s, timeouts, and malformed requests. Test scripts should include error scenarios.
- Not monitoring server-side resources — Load tests only produce output metrics. Combine with APM (Application Performance Monitoring) to know server CPU/memory/disk I/O levels.
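The first two anti-patterns share a fix: randomize both pacing and inputs. A minimal sketch of the pattern the k6 functions in section 6.2 use (plain TypeScript; the names and ranges are illustrative, not from any library):

```typescript
// Randomized think time and parameterized test data — counters the
// "no think time" and "hardcoded data" anti-patterns above.
function thinkTimeSeconds(min: number, max: number): number {
  return min + Math.random() * (max - min); // uniform in [min, max)
}

function randomProductId(catalogSize: number): number {
  return Math.floor(Math.random() * catalogSize) + 1; // 1..catalogSize
}
```

For search traffic, drawing query terms from a weighted list sampled out of real access logs goes one step further than uniform randomness and keeps cache-hit ratios close to production.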
10. Conclusion
Load testing isn't a one-time activity before launch that you then forget about. In distributed systems, every new service addition, database schema change, or dependency upgrade can affect overall performance. Integrating load testing into your CI/CD pipeline — starting with a smoke test gate, progressing to a load test gate — is the only way to ensure performance regressions are caught early.
Whether you choose k6 for JavaScript/TypeScript flexibility or NBomber for the power of the .NET ecosystem, the most important thing is to start — a simple smoke test in a CI pipeline is more valuable than a perfect load testing plan that never gets executed.
References
- Grafana k6 Documentation — Official Docs
- NBomber — Distributed load testing framework for .NET
- k6.io — Load testing for engineering teams
- Grafana Cloud k6 — Performance & Load Testing
- k6 GitHub Repository — Open Source
- NBomber GitHub Repository — .NET Load Testing
- Best API Load Testing Tools 2026 — PFLB Comparison
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.