Jun 12, 2026 · load test reports

By Performate

How to Read Load Test Reports and Turn Results into Action

Decode k6 summary metrics—latency, throughput, checks, thresholds—and translate patterns into engineering actions stakeholders understand.

The deck shows green p95 charts—but nobody wrote down target VUs, dropped iterations, or which route owned the tail. Leadership approves the release; production still hurts on /cart/checkout because the report answered “was latency high?” instead of “did we simulate peak checkout, and where did it fail?”

A useful report answers three questions: Did we simulate the traffic we intended? Did user-visible behavior stay acceptable? If not, which subsystem deserves the next hour of investigation? k6’s built-in summary aggregates HTTP metrics, checks, and thresholds (see the k6 documentation on metrics and results output). In this guide you will learn how to read that summary in order, spot classic failure patterns, and write up results that stakeholders can act on, without hype.

Warm up with p95 vs p99 latency and cross-check methodology against common load testing mistakes. When sizing is still fuzzy, pair this with how many virtual users.

Why report order matters more than another chart

Metrics interact. High http_req_failed distorts latency percentiles. Dropped iterations in arrival-rate mode mean you measured under-fed load, not “fast at peak.” Checks can pass while thresholds fail—valid but slow responses. Reading averages before scenario fidelity invites false confidence.

Think of the report as an incident timeline:

  1. Scenario fidelity: duration, executor, stages, achieved VUs/iterations.
  2. Failure rate: http_req_failed and check failures by tag.
  3. Latency shape: p95/p99 segmented by route or journey.
  4. Throughput: http_reqs vs latency climb (saturation signature).

Skipping step one is like diagnosing production without knowing which deploy was live.
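
The saturation signature in step 4 can be sketched as a small check over per-step readings. Everything here is illustrative: the `steps` data is invented, `findSaturationStep` is a hypothetical helper, and the 10% and 25% cut-offs are assumptions rather than k6 defaults.

```javascript
// Invented per-step readings; in practice, one entry per load step from your own exports.
const steps = [
  { vus: 10, rps: 100, p95: 200 },
  { vus: 20, rps: 195, p95: 210 },
  { vus: 30, rps: 205, p95: 420 },
  { vus: 40, rps: 207, p95: 900 },
];

// Returns the first step where throughput gain stalls while p95 keeps climbing,
// or null if no step matches. Cut-offs are illustrative assumptions.
function findSaturationStep(readings, maxRpsGain = 0.10, minLatencyGain = 0.25) {
  for (let i = 1; i < readings.length; i++) {
    const prev = readings[i - 1];
    const curr = readings[i];
    const rpsGain = (curr.rps - prev.rps) / prev.rps;
    const latencyGain = (curr.p95 - prev.p95) / prev.p95;
    if (rpsGain < maxRpsGain && latencyGain > minLatencyGain) {
      return curr;
    }
  }
  return null;
}

const saturated = findSaturationStep(steps);
console.log(saturated ? `Saturation near ${saturated.vus} VUs` : 'No saturation detected');
```

With the invented data, throughput barely moves between 20 and 30 VUs while p95 doubles, so the 30-VU step is flagged. Tune the cut-offs to your own service before trusting the result.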

Sample size matters for percentiles. p99 on a two-minute smoke with dozens of requests is noise; document minimum duration and iteration counts beside percentile slides so reviewers do not chase ghosts. When in doubt, extend the steady stage or raise arrival rate modestly until tagged routes have enough samples—then re-read tails (p95 vs p99).
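
A rough way to sanity-check sample size is to count how many requests actually land beyond the percentile you are quoting. The helpers below are hypothetical, and the floor of 30 tail samples is an arbitrary illustrative choice, not a statistical guarantee.

```javascript
// Expected number of samples that fall beyond a given percentile.
function samplesBeyondPercentile(totalRequests, percentile) {
  return (totalRequests * (100 - percentile)) / 100;
}

// Requests needed so that at least `minTailSamples` land beyond the percentile.
function minRequestsFor(percentile, minTailSamples) {
  return Math.ceil((minTailSamples * 100) / (100 - percentile));
}

// A smoke run with 60 requests has less than one sample beyond p99.
console.log(samplesBeyondPercentile(60, 99));

// Requests per tagged route for 30 samples beyond p99 (30 is an assumed floor).
console.log(minRequestsFor(99, 30));
```

If a tagged route cannot reach the required count within the steady stage, quote p95 for that route instead and say why.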

When pretty percentiles hide the wrong test

Teams present aggregate http_req_duration while checkout was 10% of traffic. Marketing pages dominate the average; checkout tails disappear. Tag scenarios (route, journey, version) so summaries align with API bottleneck analysis.
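
To see how a 10% route can vanish from an aggregate percentile, here is a small sketch over invented samples. It uses a simple nearest-rank percentile, which is a simplification and may not match how k6 computes its own percentiles.

```javascript
// Nearest-rank percentile over an array of numbers (simplified for illustration).
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Invented data: 90% browse traffic, 10% checkout with a slow tail.
const samples = [
  ...Array(90).fill({ route: 'browse', ms: 100 }),
  ...Array(8).fill({ route: 'checkout', ms: 400 }),
  ...Array(2).fill({ route: 'checkout', ms: 2000 }),
];

const aggregateP95 = percentile(samples.map((s) => s.ms), 95);
const checkoutP95 = percentile(
  samples.filter((s) => s.route === 'checkout').map((s) => s.ms),
  95
);

console.log({ aggregateP95, checkoutP95 });
```

The aggregate p95 looks healthy at 400 ms while the checkout-only p95 is 2000 ms, which is the gap that route tags expose.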

Practical k6 implementation: thresholds, checks, and tagged routes

Encode SLO gates in thresholds; encode logical correctness in checks. Export summaries with percentile trends for comparisons across builds.

Example script (illustrative—not production-ready). Fictional SLO numbers; adapt to your service.

What this example demonstrates:

  • Checks on status and a JSON fragment; thresholds on tagged p95/p99.
  • Grouped routes so the summary splits browse vs checkout.
  • Scenario metadata via tags (build, env) for report filenames and CI artifacts.
  • Strict failure rate before latency gates so tails are meaningful.
import http from 'k6/http';
import { check, group, sleep } from 'k6';

const BASE = __ENV.API_BASE || 'https://staging.example.com';
const BUILD = __ENV.BUILD_SHA || 'local';

export const options = {
  scenarios: {
    report_demo: {
      executor: 'ramping-vus',
      stages: [
        { duration: '1m', target: 10 },
        { duration: '5m', target: 40 },
        { duration: '1m', target: 0 },
      ],
      tags: { build: BUILD, env: __ENV.ENV || 'staging' },
    },
  },
  thresholds: {
    http_req_failed: ['rate<0.01'],
    'http_req_duration{route:browse}': ['p(95)<450', 'p(99)<800'],
    'http_req_duration{route:checkout}': ['p(95)<900', 'p(99)<1400'],
    checks: ['rate>0.99'],
  },
};

export default function () {
  group('browse', () => {
    const res = http.get(`${BASE}/catalog`, { tags: { route: 'browse' } });
    check(res, {
      'browse status 2xx': (r) => r.status >= 200 && r.status < 300,
      'browse has items': (r) => r.body && r.body.includes('items'),
    });
    sleep(1);
  });

  group('checkout', () => {
    const body = JSON.stringify({ sku: 'SKU-1', qty: 1 });
    const res = http.post(`${BASE}/checkout`, body, {
      headers: { 'Content-Type': 'application/json' },
      tags: { route: 'checkout' },
    });
    check(res, {
      'checkout status 2xx': (r) => r.status >= 200 && r.status < 300,
    });
    sleep(0.5);
  });
}
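
To carry the build and environment into the report filename, k6's handleSummary hook can return an object whose keys are output paths. The sketch below keeps the naming logic in a plain function so it can be tried outside k6; `summaryOutputs` and the filename pattern are hypothetical choices, not a k6 convention.

```javascript
// Builds the object a k6 handleSummary hook would return:
// keys are output paths, values are the file contents.
function summaryOutputs(data, build, env) {
  const fileName = `summary-${env}-${build}.json`;
  return { [fileName]: JSON.stringify(data, null, 2) };
}

// In the k6 script above, this could be wired up roughly as:
//
//   export function handleSummary(data) {
//     return summaryOutputs(data, BUILD, __ENV.ENV || 'staging');
//   }

// Stand-alone check with a stub payload (not real k6 output).
const outputs = summaryOutputs({ metrics: {} }, 'abc123', 'staging');
console.log(Object.keys(outputs));
```

Note that defining handleSummary replaces the default console summary unless you also return a `stdout` entry, so confirm the behaviour you want on your k6 version.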

Patterns that work

  • Read http_req_failed first, then percentiles; segment by tags present in the script.
  • Attach scenario parameters (executor, stages, rates, env) to every PDF or slide deck.
  • Use --summary-trend-stats="p(95),p(99)" in CI for comparable tails run-over-run.
  • Translate to user stories for executives: “4% of simulated checkout sessions waited >1s.”

Anti-patterns to avoid

  • Screenshots without VU/RPS context—stakeholders compare incompatible runs.
  • Celebrating passing checks when thresholds failed (slow-but-valid).
  • Ignoring dropped iterations on arrival-rate executors (arrival-rate tracking).
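
One way to stop dropped iterations from going unnoticed is to gate on them. The sketch below builds options for a constant-arrival-rate scenario with a threshold on the dropped_iterations counter. The helper name, scenario name, rate, and VU numbers are all illustrative assumptions; check how the gate behaves on your k6 version when nothing is dropped.

```javascript
// Builds k6 options for an arrival-rate scenario that fails the run
// if any iteration is dropped. All numbers are placeholders.
function arrivalRateOptions(ratePerSecond, duration, maxVUs) {
  return {
    scenarios: {
      peak_checkout: {
        executor: 'constant-arrival-rate',
        rate: ratePerSecond,       // iterations started per timeUnit
        timeUnit: '1s',
        duration,
        preAllocatedVUs: Math.ceil(maxVUs / 2),
        maxVUs,
      },
    },
    thresholds: {
      // Dropped iterations mean the target rate was not actually delivered.
      dropped_iterations: ['count==0'],
    },
  };
}

// In a k6 script: export const options = arrivalRateOptions(50, '5m', 200);
const demoOptions = arrivalRateOptions(50, '5m', 200);
console.log(JSON.stringify(demoOptions.thresholds));
```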

Pro tip (example command):

k6 run report-demo.js --summary-export=summary.json --summary-trend-stats="p(95),p(99)"

What this command demonstrates: machine-readable summary JSON for dashboards plus percentile trends for diffing builds.
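
Diffing two exported summaries can be sketched as below. The JSON layout differs between export mechanisms and k6 versions, so this hypothetical `diffTails` helper accepts stats either directly on the metric or under a `values` key. The sample objects are invented, not real run output.

```javascript
// Reads one stat from a summary object, tolerating flat or nested layouts.
function readStat(summary, metric, stat) {
  const m = summary.metrics[metric];
  if (!m) return undefined;
  const values = m.values || m;
  return values[stat];
}

// Compares tail percentiles for one metric between two builds.
function diffTails(prev, curr, metric, stats = ['p(95)', 'p(99)']) {
  const out = {};
  for (const stat of stats) {
    const before = readStat(prev, metric, stat);
    const after = readStat(curr, metric, stat);
    out[stat] = { before, after, changePct: ((after - before) / before) * 100 };
  }
  return out;
}

// Invented summaries for two builds.
const metricName = 'http_req_duration{route:checkout}';
const prevBuild = { metrics: { [metricName]: { 'p(95)': 400, 'p(99)': 800 } } };
const currBuild = { metrics: { [metricName]: { values: { 'p(95)': 500, 'p(99)': 1200 } } } };

const tailDiff = diffTails(prevBuild, currBuild, metricName);
console.log(tailDiff);
```

Only compare runs with the same scenario parameters; a diff across different stages or rates says nothing about the build.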

Decision framework: which metric to escalate first

Signal in report | Likely meaning | Next action
http_req_failed spike early | Auth, routing, or bad deploy | Fix checks; compare status codes by tag
p95 up, p99 much worse | Tail-sensitive dependency (DB, partner API) | Trace tagged route; see bottleneck analysis
Throughput flat, latency up | Saturation (pools, CPU, locks) | Profile server; consider capacity planning
Checks fail, latency OK | Functional regression | Pair with contract tests (contract vs performance)
Thresholds fail, checks pass | Slow valid responses | Optimize path or scale; revisit SLO

Escalate failure rate first whenever http_req_failed exceeds its budget: latency percentiles computed over error responses are misleading.

Escalate tagged p99 when user-visible journeys have tight tail SLOs (checkout, payments).

Escalate scenario fidelity when VUs or iterations did not reach documented targets.
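
The three escalation rules above can be sketched as one function that returns what to raise, in the order the rules are listed. The `run` object and its field names are hypothetical; map them to whatever your export actually contains.

```javascript
// Returns the escalations triggered by a run, in the order the rules are listed.
function escalations(run) {
  const out = [];
  if (run.failedRate > run.failedBudget) {
    out.push('failure rate');
  }
  if (run.failedThresholds.some((t) => t.includes('p(99)'))) {
    out.push('tagged p99');
  }
  if (run.achievedVUs < run.targetVUs || run.droppedIterations > 0) {
    out.push('scenario fidelity');
  }
  return out;
}

// Invented run: error budget blown and the checkout p99 gate failed.
const exampleRun = {
  failedRate: 0.03,
  failedBudget: 0.01,
  failedThresholds: ['http_req_duration{route:checkout}: p(99)<1400'],
  targetVUs: 40,
  achievedVUs: 40,
  droppedIterations: 0,
};

console.log(escalations(exampleRun));
```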

Communicate upward without hype

Translate metrics into user stories: “Under simulated peak checkout traffic, 4% of sessions saw waits above one second—primarily on /cart/checkout.” Attach scenario parameters so execs cannot confuse runs. For vocabulary on trade-offs, share throughput vs latency alongside the report appendix.
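
Generating that sentence from run counts keeps the wording consistent between reports. A minimal sketch, assuming you already have per-journey session counts; the function and field names are hypothetical.

```javascript
// Turns raw counts into a one-sentence story for non-engineering readers.
function toUserStory({ journey, route, totalSessions, slowSessions, thresholdMs }) {
  // Percentage rounded to one decimal place.
  const pct = Math.round((slowSessions / totalSessions) * 1000) / 10;
  return (
    `Under simulated peak ${journey} traffic, ${pct}% of sessions ` +
    `waited longer than ${thresholdMs} ms, primarily on ${route}.`
  );
}

// Invented counts for illustration.
const story = toUserStory({
  journey: 'checkout',
  route: '/cart/checkout',
  totalSessions: 1000,
  slowSessions: 40,
  thresholdMs: 1000,
});

console.log(story);
```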

Pre-release checklist

  • Export includes executor model, stages, duration, and achieved load (VUs/RPS).
  • Summary segmented by route/journey tags used in the script.
  • Threshold table copied with pass/fail and actual p95/p99 values.
  • Checks and http_req_failed reviewed before latency slides go to leadership.
  • Build SHA or release tag recorded next to the report filename.
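
The checklist can be enforced mechanically before a report goes out. This is a sketch under assumptions: the field names below are hypothetical labels for the checklist items, not keys from any k6 export.

```javascript
// Hypothetical field names standing in for the checklist items above.
const REQUIRED_FIELDS = [
  'executor',
  'stages',
  'duration',
  'achievedLoad',
  'routeTags',
  'thresholdResults',
  'build',
];

// Returns the checklist fields that are absent or empty in a report object.
function missingReportFields(report) {
  return REQUIRED_FIELDS.filter((key) => {
    const value = report[key];
    if (value === undefined || value === null) return true;
    return Array.isArray(value) && value.length === 0;
  });
}

// Invented report metadata with no build recorded and no route tags.
const draftReport = {
  executor: 'ramping-vus',
  stages: ['1m:10', '5m:40', '1m:0'],
  duration: '7m',
  achievedLoad: { vus: 40 },
  routeTags: [],
  thresholdResults: [{ name: 'p(95)<900', ok: true, actual: 640 }],
};

console.log(missingReportFields(draftReport));
```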

How Performate standardizes reports engineering and product both trust

Raw k6 console output is enough for developers; product and leadership need the same charts with scenario context baked in.

Example: from k6 run to stakeholder-ready export

  1. Run the scenario from the desktop app with tags already applied in the visual editor. Problem solved: route/journey splits match how QA thinks about flows.
  2. Open the integrated report and filter by route:checkout (or your tag convention). Problem solved: tails visible without manual log parsing.
  3. Pin the scenario panel (VUs, stages, rates) into the export footer. Problem solved: “green chart” debates end when parameters travel with the PDF.
  4. Compare previous build side by side for p95/p99 on critical tags. Problem solved: regressions obvious without rebuilding spreadsheets.
  5. Share one export to engineering and product—same numbers, same labels.
  6. Export k6 + summary JSON for CI artifacts when gates fail (load testing in CI/CD).

Closing takeaway

Reports are actionable when fidelity → failures → tagged tails → story is explicit. Thresholds encode gates; checks encode correctness; tags encode ownership.

After your next run, write three sentences: intended load, worst tagged p99, and the subsystem on-call should open first—before anyone asks for “the green slide.”
