Load Testing for Distributed Systems — k6, NBomber and Performance Testing Strategies

Posted on: 4/25/2026 1:14:20 PM

You just deployed a new microservice — everything runs smoothly on staging with response times under 200ms and zero errors. But when production hits 10,000 concurrent users, the system collapses within 3 minutes. Load testing is the defense layer that helps you discover these weaknesses before they become production incidents.

This article dives deep into performance testing strategies for distributed systems, comparing two leading tools — Grafana k6 (JavaScript/TypeScript) and NBomber (.NET/C#) — along with practical patterns for integrating load testing into your CI/CD pipeline.


1. Why Load Testing Matters for Distributed Systems

In monolithic architecture, bottlenecks typically reside at a single point — the database or application server. But with distributed systems, things get much more complex: one slow service can cause cascading failures across the entire call chain, and connection pool exhaustion at one node can affect dozens of downstream services.
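The cascading behavior above has a simple queueing-theory intuition. Here is a back-of-the-envelope sketch (our own illustration, using the textbook M/M/1 formula, not something the architecture diagram below specifies):

```typescript
// M/M/1 queue: with service rate mu (req/s) and arrival rate lambda (req/s),
// the mean time in system is W = 1 / (mu - lambda) seconds.
// As utilization approaches 1, latency grows without bound -- which is how one
// saturated node stalls every caller upstream of it.

function meanLatencyMs(mu: number, lambda: number): number {
  if (lambda >= mu) return Infinity; // unstable: the backlog grows forever
  return 1000 / (mu - lambda);
}

// A service rated at 1000 req/s feels instant at half load,
// then degrades non-linearly as traffic creeps toward capacity:
for (const lambda of [500, 900, 990, 999]) {
  console.log(`${lambda} req/s -> ~${meanLatencyMs(1000, lambda)}ms`);
}
```

Note the jump from 900 to 999 req/s: throughput rises about 11% while mean latency grows 100x. This is why breaking points feel sudden in production.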

graph LR
    A[Client<br/>10K req/s] --> B[API Gateway]
    B --> C[Auth Service]
    B --> D[Product Service]
    B --> E[Order Service]
    D --> F[(Database)]
    E --> F
    E --> G[Payment Service]
    G --> H[External Payment API]
    style A fill:#e94560,stroke:#fff,color:#fff
    style B fill:#2c3e50,stroke:#fff,color:#fff
    style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#16213e,stroke:#fff,color:#fff
    style G fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style H fill:#ff9800,stroke:#fff,color:#fff

Typical microservices architecture — every node can be a bottleneck under high load

Load testing for distributed systems needs to answer these questions:

  • Maximum throughput — How many requests/second can the system handle before response time exceeds the acceptable threshold?
  • Breaking point — At what load level does the system start returning errors or timeouts?
  • Cascading failure — When one service is overloaded, how far does the impact spread?
  • Resource saturation — Which service exhausts CPU, memory, or connection pool first?
  • Recovery time — After load decreases, how long does it take the system to return to normal state?

2. Types of Performance Testing

Each test type serves a different purpose. Understanding each one helps you design a test suite that matches your system's SLA requirements.

graph TB
    subgraph Types["Performance Testing Types"]
        A["🔥 Smoke Test<br/>Basic validation<br/>1-5 VUs"]
        B["📊 Load Test<br/>Average load<br/>Target VUs"]
        C["💪 Stress Test<br/>Beyond limits<br/>2-3x Target"]
        D["⚡ Spike Test<br/>Sudden surge<br/>0 → Max → 0"]
        E["🕐 Soak Test<br/>Long-running<br/>4-24 hours"]
        F["🎯 Breakpoint Test<br/>Find limits<br/>Continuous ramp"]
    end
    style A fill:#4CAF50,stroke:#fff,color:#fff
    style B fill:#2196F3,stroke:#fff,color:#fff
    style C fill:#ff9800,stroke:#fff,color:#fff
    style D fill:#e94560,stroke:#fff,color:#fff
    style E fill:#9C27B0,stroke:#fff,color:#fff
    style F fill:#16213e,stroke:#fff,color:#fff

6 main performance testing types — from basic validation to finding system breaking points

| Test Type | Objective | Load Model | Duration | When to Use |
|---|---|---|---|---|
| Smoke Test | Verify script correctness | 1-5 constant VUs | 1-3 min | After every test script change |
| Load Test | Evaluate normal performance | Ramp up → steady → ramp down | 10-30 min | Every sprint/release |
| Stress Test | Find system limits | 2-3x normal load | 15-30 min | Before major launches |
| Spike Test | Test sudden traffic surges | 0 → peak → 0 in seconds | 5-10 min | Flash sales, viral events |
| Soak Test | Detect memory/connection leaks | Sustained average load | 4-24 hours | Before production releases |
| Breakpoint Test | Determine max capacity | Continuous ramp until failure | Variable | Capacity planning |

Always start with a Smoke Test to ensure the script works correctly, then run a Load Test to establish a baseline, before moving to Stress/Spike/Soak for advanced scenarios. Don't jump straight to stress testing — you'll waste time debugging your test script instead of debugging the system.

3. Grafana k6 — Modern Load Testing with JavaScript/TypeScript

3.1 Why Choose k6?

k6 is an open-source load testing tool from Grafana Labs, written in Go but allowing scripting in JavaScript/TypeScript. Its biggest strength is the high-performance engine — using 70% less CPU than similar tools, allowing thousands of virtual users on a single machine.

Since version 1.0, k6 supports native TypeScript — you get type safety and IDE autocomplete without a separate build step.

3.2 Writing Your First Test Script

// load-test.ts — k6 load test for an API endpoint
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('errors');
const responseTime = new Trend('response_time_ms');

// Test configuration
export const options = {
  scenarios: {
    average_load: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },  // Ramp up to 100 VUs
        { duration: '5m', target: 100 },  // Stay at 100 VUs
        { duration: '2m', target: 0 },    // Ramp down
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    errors: ['rate<0.01'],  // Error rate < 1%
    response_time_ms: ['p(95)<400'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/products', {
    headers: { 'Authorization': `Bearer ${__ENV.API_TOKEN}` },
    tags: { name: 'GetProducts' },
  });

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'body has products': (r) => (r.json() as unknown[]).length > 0,
  });

  errorRate.add(res.status !== 200);
  responseTime.add(res.timings.duration);

  sleep(1); // Think time between requests
}

3.3 Advanced Scenarios and Executors

k6 provides multiple executor types for different load models:

export const options = {
  scenarios: {
    // Scenario 1: Constant arrival rate — control RPS
    constant_rps: {
      executor: 'constant-arrival-rate',
      rate: 200,           // 200 iterations/second
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 50,
      maxVUs: 200,
    },
    // Scenario 2: Spike test
    spike: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '10s', target: 0 },
        { duration: '10s', target: 500 },  // Spike to 500
        { duration: '30s', target: 500 },
        { duration: '10s', target: 0 },    // Drop to 0
      ],
      startTime: '6m', // Start after scenario 1
    },
  },
};

Constant VUs vs Constant Arrival Rate

VU-based executors (constant-vus, ramping-vus): each VU finishes one iteration and immediately starts the next, so RPS depends on response time — a slower server yields lower RPS. Use these when simulating a fixed number of concurrent users.

Constant Arrival Rate (constant-arrival-rate): starts exactly N iterations per second regardless of response time, spawning extra VUs as needed up to maxVUs. Use it when measuring the system under a fixed throughput — ideal for SLA testing.
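A practical corollary of Little's law: the number of concurrently running iterations equals the arrival rate times the mean iteration duration, which is the minimum VU count an arrival-rate executor needs. The sketch below is a sizing rule of thumb; the 1.5x headroom factor is our own assumption, not a k6 default:

```typescript
// Minimum VUs for a constant-arrival-rate scenario, by Little's law:
// concurrency = rate (iterations/s) x mean iteration duration (s).
// Headroom covers response-time spikes so k6 does not drop iterations.

function estimateVUs(ratePerSec: number, meanIterationSec: number, headroom = 1.5): number {
  return Math.ceil(ratePerSec * meanIterationSec * headroom);
}

// 200 iterations/s with ~250ms per iteration:
console.log(estimateVUs(200, 0.25, 1)); // bare minimum: 50 VUs
console.log(estimateVUs(200, 0.25));    // with headroom: 75 VUs
```

If iteration duration grows under load and the pool is exhausted, k6 reports dropped iterations; raising maxVUs (or fixing the bottleneck) is the remedy.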

3.4 Browser Testing with k6

k6 integrates a Playwright-compatible browser API, allowing hybrid tests — combining protocol-level (HTTP) and browser-level testing in the same scenario:

import { browser } from 'k6/browser';
import http from 'k6/http';

export const options = {
  scenarios: {
    browser_test: {
      executor: 'constant-vus',
      exec: 'browser_test', // named scenarios must point at an exported function
      vus: 5,
      duration: '3m',
      options: { browser: { type: 'chromium' } },
    },
    api_test: {
      executor: 'constant-arrival-rate',
      exec: 'api_test',
      rate: 100,
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 20,
    },
  },
};

export async function browser_test() {
  const page = await browser.newPage();
  await page.goto('https://app.example.com/dashboard');

  // Measure real Web Vitals
  const lcp = await page.evaluate(() => {
    return new Promise(resolve => {
      new PerformanceObserver(list => {
        const entries = list.getEntries();
        resolve(entries[entries.length - 1].startTime);
      }).observe({ type: 'largest-contentful-paint', buffered: true });
    });
  });

  console.log(`LCP: ${lcp}ms`);
  await page.close();
}

export function api_test() {
  // Protocol-level half of the hybrid test (endpoint is illustrative)
  http.get('https://app.example.com/api/health');
}

4. NBomber — Native .NET Load Testing for C# Developers

4.1 When to Choose NBomber

If your team primarily uses .NET and wants to write load tests in C#/F# — leveraging IDE support, debugging, and NuGet packages — NBomber is the ideal choice. NBomber works as a .NET library, installed via NuGet, and test scenarios run like regular unit tests.

4.2 Writing Scenarios with NBomber

using NBomber.Contracts.Stats; // ReportFormat
using NBomber.CSharp;
using NBomber.Http.CSharp;

var httpClient = new HttpClient();

var scenario = Scenario.Create("get_products", async context =>
{
    var request = Http.CreateRequest("GET", "https://api.example.com/products")
        .WithHeader("Authorization", "Bearer " + Environment.GetEnvironmentVariable("API_TOKEN"));

    var response = await Http.Send(httpClient, request);

    return response;
})
.WithLoadSimulations(
    Simulation.RampingInject(rate: 100, interval: TimeSpan.FromSeconds(1),
                             during: TimeSpan.FromMinutes(2)),
    Simulation.Inject(rate: 100, interval: TimeSpan.FromSeconds(1),
                      during: TimeSpan.FromMinutes(5)),
    Simulation.RampingInject(rate: 0, interval: TimeSpan.FromSeconds(1),
                             during: TimeSpan.FromMinutes(2))
);

NBomberRunner
    .RegisterScenarios(scenario)
    .WithReportFormats(ReportFormat.Html, ReportFormat.Csv)
    .WithReportFolder("./reports")
    .Run();

4.3 Multi-Protocol Testing

NBomber's strength lies in testing any protocol — HTTP, gRPC, WebSocket, database, message queue — within the same scenario:

// Assumes grpcClient (gRPC client) and channel (RabbitMQ channel) are created elsewhere
var scenario = Scenario.Create("mixed_workload", async context =>
{
    // Step 1: Call REST API
    var apiResponse = await Http.Send(httpClient,
        Http.CreateRequest("GET", "https://api.example.com/products"));

    if (apiResponse.StatusCode != "200")
        return Response.Fail();

    // Step 2: Call gRPC service
    var grpcResponse = await grpcClient.GetProductDetailsAsync(
        new ProductRequest { Id = context.ScenarioInfo.ThreadNumber });

    // Step 3: Publish message to RabbitMQ
    channel.BasicPublish(exchange: "", routingKey: "orders",
        body: Encoding.UTF8.GetBytes($"order-{context.InvocationNumber}"));

    return Response.Ok(statusCode: "200",
        sizeBytes: apiResponse.SizeBytes + grpcResponse.CalculateSize());
})
.WithLoadSimulations(
    Simulation.KeepConstant(copies: 50, during: TimeSpan.FromMinutes(10))
);

5. k6 vs NBomber Comparison

| Criteria | Grafana k6 | NBomber |
|---|---|---|
| Language | JavaScript / TypeScript | C# / F# |
| Engine | Go (goroutines) | .NET (async/await) |
| Installation | Standalone binary | NuGet package |
| IDE Support | VS Code + extensions | Visual Studio / Rider (full debug) |
| Browser Testing | Yes (built-in Chromium) | Yes (via Playwright NuGet) |
| Protocols | HTTP, WebSocket, gRPC (extension) | Any (HTTP, gRPC, WS, DB, MQ...) |
| Distributed Testing | k6 Cloud or k6-operator (K8s) | NBomber Cluster |
| Reporting | Grafana dashboards, JSON, CSV | HTML, CSV, TXT, Markdown |
| CI/CD | Native (exit code based on thresholds) | xUnit/NUnit runner, threshold assertions |
| Pricing | OSS free, Cloud paid | Free (personal), $99/month (business) |
| Best For | Multi-language teams, DevOps-driven | .NET teams, developer-driven testing |

Practical Recommendation

If your team primarily uses .NET and wants load tests running as unit tests in the CI pipeline → choose NBomber. If your team is multi-language or prefers fast TypeScript scripting → choose k6. Many large organizations use both: k6 for quick API-level testing in CI, NBomber for complex integration testing that requires debugging.

6. Load Testing Strategies for Microservices

6.1 Layered Testing Model

graph TB
    subgraph L1["Layer 1 — Component Testing"]
        A["Test individual<br/>services"]
        B["Mock dependencies"]
        C["Measure baseline<br/>response time"]
    end
    subgraph L2["Layer 2 — Integration Testing"]
        D["Test chains of<br/>2-3 services"]
        E["Real dependencies"]
        F["Measure end-to-end<br/>latency"]
    end
    subgraph L3["Layer 3 — System Testing"]
        G["Test entire<br/>system"]
        H["Production-like<br/>traffic"]
        I["Measure throughput<br/>& error rate"]
    end
    L1 --> L2 --> L3
    style A fill:#4CAF50,stroke:#fff,color:#fff
    style B fill:#4CAF50,stroke:#fff,color:#fff
    style C fill:#4CAF50,stroke:#fff,color:#fff
    style D fill:#2196F3,stroke:#fff,color:#fff
    style E fill:#2196F3,stroke:#fff,color:#fff
    style F fill:#2196F3,stroke:#fff,color:#fff
    style G fill:#e94560,stroke:#fff,color:#fff
    style H fill:#e94560,stroke:#fff,color:#fff
    style I fill:#e94560,stroke:#fff,color:#fff

3-layer performance testing model for microservices

6.2 Designing Realistic Test Scenarios

The most common mistake in load testing is creating traffic patterns that don't resemble reality. Production traffic is rarely uniform — there are typically hot paths (20% of endpoints receiving 80% of traffic) and cold paths.
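One lightweight way to encode such a hot-path distribution inside a single VU function, instead of splitting traffic across scenarios, is a weighted picker. A sketch in plain TypeScript (endpoint names and weights are illustrative):

```typescript
// Pick an item with probability proportional to its weight.
// An optional roll parameter makes the function deterministic for testing.

type Weighted<T> = { item: T; weight: number };

function pickWeighted<T>(entries: Weighted<T>[], roll: number = Math.random()): T {
  const total = entries.reduce((sum, e) => sum + e.weight, 0);
  let r = roll * total;
  for (const e of entries) {
    if ((r -= e.weight) < 0) return e.item;
  }
  return entries[entries.length - 1].item; // guard against float rounding
}

// 70/20/10 split mirroring the hot-path/cold-path ratio described above:
const endpoints: Weighted<string>[] = [
  { item: '/products', weight: 70 },
  { item: '/search',   weight: 20 },
  { item: '/orders',   weight: 10 },
];

console.log(pickWeighted(endpoints)); // usually '/products'
```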

// k6: Simulating realistic traffic patterns
import http from 'k6/http';
import { sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'https://api.example.com';

export const options = {
  scenarios: {
    // 70% traffic: Browse products (reads)
    browse: {
      executor: 'constant-arrival-rate',
      rate: 700,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 100,
      exec: 'browseProducts',
    },
    // 20% traffic: Search (read + compute)
    search: {
      executor: 'constant-arrival-rate',
      rate: 200,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      exec: 'searchProducts',
    },
    // 10% traffic: Checkout (writes)
    checkout: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 30,
      exec: 'checkout',
    },
  },
};

export function browseProducts() {
  const categoryId = Math.floor(Math.random() * 20) + 1;
  http.get(`${BASE_URL}/products?category=${categoryId}`);
  sleep(Math.random() * 3 + 1); // Think time 1-4s
}

export function searchProducts() {
  const terms = ['laptop', 'phone', 'tablet', 'headphone', 'camera'];
  const q = terms[Math.floor(Math.random() * terms.length)];
  http.get(`${BASE_URL}/search?q=${q}`);
  sleep(Math.random() * 2 + 0.5);
}

export function checkout() {
  const payload = JSON.stringify({
    productId: Math.floor(Math.random() * 1000) + 1,
    quantity: Math.floor(Math.random() * 3) + 1,
  });
  http.post(`${BASE_URL}/orders`, payload, {
    headers: { 'Content-Type': 'application/json' },
  });
  sleep(Math.random() * 5 + 2); // Longer think time for checkout
}

6.3 Thresholds and SLA Validation

Thresholds transform load tests from "reports" into "quality gates" — if metrics exceed the threshold, the test fails and the CI pipeline stops.

export const options = {
  thresholds: {
    // Global thresholds
    http_req_duration: [
      'p(50)<200',   // 50th percentile < 200ms
      'p(95)<500',   // 95th percentile < 500ms
      'p(99)<1000',  // 99th percentile < 1s
    ],
    http_req_failed: ['rate<0.01'],  // <1% errors

    // Per-endpoint thresholds
    'http_req_duration{name:GetProducts}': ['p(95)<300'],
    'http_req_duration{name:Checkout}': ['p(95)<800'],
    'http_req_duration{name:Search}': ['p(95)<600'],

    // Custom metrics
    'errors{scenario:checkout}': ['rate<0.005'], // Checkout: <0.5% errors
  },
};

7. Integrating Load Testing into CI/CD Pipelines

graph LR
    A[Code Push] --> B[Build & Unit Test]
    B --> C[Deploy to Staging]
    C --> D[Smoke Test<br/>k6 / NBomber]
    D -->|Pass| E[Load Test<br/>10 min]
    D -->|Fail| F[Alert & Stop]
    E -->|Thresholds OK| G[Deploy to Prod]
    E -->|Thresholds Fail| F
    G --> H[Canary Test<br/>5% traffic]
    H -->|Healthy| I[Full Rollout]
    H -->|Degraded| J[Rollback]
    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style D fill:#4CAF50,stroke:#fff,color:#fff
    style E fill:#2196F3,stroke:#fff,color:#fff
    style F fill:#e94560,stroke:#fff,color:#fff
    style G fill:#4CAF50,stroke:#fff,color:#fff
    style H fill:#ff9800,stroke:#fff,color:#fff
    style I fill:#4CAF50,stroke:#fff,color:#fff
    style J fill:#e94560,stroke:#fff,color:#fff

Load testing in CI/CD — smoke test gate first, load test gate second

7.1 GitHub Actions + k6

# .github/workflows/load-test.yml
name: Load Test
on:
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to staging
        run: ./deploy-staging.sh

      - name: Install k6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg \
            --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D68
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" \
            | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update && sudo apt-get install k6

      - name: Run smoke test
        run: k6 run --tag testid=smoke tests/smoke.ts
        env:
          API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}

      - name: Run load test
        run: |
          mkdir -p results
          k6 run --out json=results/load.json tests/load.ts
        env:
          API_TOKEN: ${{ secrets.STAGING_API_TOKEN }}

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: results/

7.2 .NET CI + NBomber

// LoadTests/ProductApiLoadTest.cs — runs as xUnit test
using NBomber.CSharp;
using NBomber.Http.CSharp;
using Xunit;

public class ProductApiLoadTest
{
    [Fact]
    public void Products_endpoint_handles_100rps()
    {
        var httpClient = new HttpClient();

        var scenario = Scenario.Create("get_products", async context =>
        {
            var response = await Http.Send(httpClient,
                Http.CreateRequest("GET", "https://staging.api.example.com/products"));
            return response;
        })
        .WithLoadSimulations(
            Simulation.Inject(rate: 100, interval: TimeSpan.FromSeconds(1),
                              during: TimeSpan.FromMinutes(5))
        );

        var stats = NBomberRunner
            .RegisterScenarios(scenario)
            .Run();

        var scnStats = stats.ScenarioStats[0];

        // Assert SLA
        Assert.True(scnStats.Ok.Latency.Percent95 < 500,
            $"P95 latency {scnStats.Ok.Latency.Percent95}ms exceeds 500ms threshold");
        Assert.True(scnStats.Fail.Request.Percent < 1,
            $"Error rate {scnStats.Fail.Request.Percent}% exceeds 1% threshold");
    }
}

8. Analyzing Results and Debugging Bottlenecks

8.1 Critical Metrics to Monitor

| Metric | Meaning | Suggested Threshold | Signal When Violated |
|---|---|---|---|
| p50 (median) | Median response time | < 200ms | System-wide slowness |
| p95 | 95% of requests are faster than this | < 500ms | Tail latency impacting UX |
| p99 | 99% of requests are faster than this | < 1000ms | Outliers from GC, cold start, DB locks |
| Error rate | % of failed requests (4xx, 5xx, timeout) | < 1% | Service overloaded or bugs |
| Throughput (RPS) | Actual requests per second | ≥ target SLA | Bottleneck at compute or I/O |
| Active VUs | Virtual users currently active | As planned | Stuck VU = connection leak |
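To make the percentile rows concrete, here is a minimal nearest-rank implementation (tools often interpolate instead, so reported values can differ slightly from this sketch):

```typescript
// Nearest-rank percentile: sort samples, take the value at rank ceil(p/100 * n).

function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Ten latency samples (ms) with one slow outlier:
const latencies = [120, 95, 180, 210, 145, 990, 130, 160, 110, 155];
console.log(percentile(latencies, 50)); // 145 -- the outlier barely moves the median
console.log(percentile(latencies, 95)); // 990 -- but it owns the tail
```

This is also why p95/p99 belong in thresholds while plain averages do not: a single slow dependency can hide behind a healthy mean.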

8.2 Bottleneck Detection Patterns

Common Warning Signs

Response time increases linearly with VUs → CPU-bound: the application is serializing processing. Review async code or scale horizontally.

Response time spikes sharply at N VUs → Connection pool exhaustion: the database or downstream service has run out of connections. Check MaxPoolSize and HttpClient lifecycle.

Error rate spikes while response time drops → Circuit breaker is open or requests are being rejected early. Good for system self-protection, but thresholds need tuning.

Memory grows steadily during soak test → Memory leak: typically caused by unsubscribed event handlers, HttpClient created repeatedly, or cache without an eviction policy.
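The last warning sign can be turned into a cheap automated check: fit a least-squares line to periodic memory samples collected during the soak test and flag a persistently positive slope. This heuristic is our own sketch, not a built-in feature of k6 or NBomber:

```typescript
// Least-squares slope of memory samples taken at a fixed interval.
// Units: MB gained per sampling interval; near zero means a stable plateau.

function slopePerSample(samples: number[]): number {
  const n = samples.length;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((s, y) => s + y, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (i - meanX) * (samples[i] - meanY);
    den += (i - meanX) ** 2;
  }
  return num / den;
}

const steady  = [512, 515, 510, 514, 511, 513]; // plateaus after warm-up: fine
const leaking = [512, 540, 568, 601, 633, 660]; // climbs every interval: suspect

console.log(slopePerSample(steady).toFixed(2));
console.log(slopePerSample(leaking).toFixed(2)); // ~30 MB per interval
```

In practice you would sample from your APM at, say, one-minute intervals and alert only when the slope stays positive after the warm-up window.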

9. Best Practices for Production Load Testing

9.1 Golden Rules

  1. Test on a production-like environment — Same hardware specs, same data volume, same network topology. Testing on a laptop with 100 DB rows tells you nothing.
  2. Use realistic data — Create datasets that reflect real distributions: full product catalogs, diverse user profiles, search queries from actual access logs.
  3. Measure from the client side, not the server — Server metrics show processing response time, but user experience includes network latency, DNS resolution, and TLS handshake.
  4. Warm up before measuring — JIT compilation (.NET), connection pool initialization, and cache warming all affect results if you start measuring from the first request.
  5. Run multiple iterations, take averages — Single-run results have high variance. Run at least 3 times with the same config, remove outliers, and report average results.
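Rule 5 can be mechanized with a small helper: drop the best and worst run, then average the rest (our own sketch; the sample values are illustrative):

```typescript
// Trimmed mean across repeated runs: discard one outlier from each end.
// With exactly 3 runs this degenerates to the median, so prefer 5+ runs.

function trimmedMean(runs: number[]): number {
  if (runs.length < 3) throw new Error('need at least 3 runs');
  const sorted = [...runs].sort((a, b) => a - b);
  const kept = sorted.slice(1, -1);
  return kept.reduce((s, v) => s + v, 0) / kept.length;
}

// p95 latency (ms) from five runs of the same config; one run hit a noisy
// neighbor and would wreck a plain average:
console.log(trimmedMean([480, 495, 2100, 470, 505]).toFixed(1)); // 493.3
```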

9.2 Anti-Patterns to Avoid

  1. Testing without think time — Real users don't fire requests continuously. Add sleep(1-5s) between requests to simulate reading time.
  2. Hardcoded test data — Calling the same endpoint with the same parameters → 100% cache hits → results don't reflect reality.
  3. Skipping ramp-up — Firing 10,000 VUs simultaneously creates a thundering herd — that's not load testing, that's a DDoS. Always ramp up gradually.
  4. Testing only the happy path — Production has 404s, 401s, timeouts, and malformed requests. Test scripts should include error scenarios.
  5. Not monitoring server-side resources — Load tests only produce output metrics. Combine with APM (Application Performance Monitoring) to know server CPU/memory/disk I/O levels.

10. Conclusion

Load testing isn't a one-time activity before launch that you then forget about. In distributed systems, every new service addition, database schema change, or dependency upgrade can affect overall performance. Integrating load testing into your CI/CD pipeline — starting with a smoke test gate, progressing to a load test gate — is the only way to ensure performance regressions are caught early.

Whether you choose k6 for JavaScript/TypeScript flexibility or NBomber for the power of the .NET ecosystem, the most important thing is to start — a simple smoke test in a CI pipeline is more valuable than a perfect load testing plan that never gets executed.
