By Performate
How to Read Load Test Reports and Turn Results into Action
Decode k6 summary metrics—latency, throughput, checks, thresholds—and translate patterns into engineering actions stakeholders understand.
The deck shows green p95 charts—but nobody wrote down target VUs, dropped iterations, or which route owned the tail. Leadership approves the release; production still hurts on /cart/checkout because the report answered “was latency high?” instead of “did we simulate peak checkout, and where did it fail?”
A useful report answers three questions: Did we simulate the traffic we intended? Did user-visible behavior stay acceptable? If not, what subsystem deserves the next hour of investigation? k6’s built-in summary aggregates HTTP metrics, checks, and thresholds (metrics, results output). In this guide you will learn how to read that summary in order, spot classic failure patterns, and export narratives stakeholders can act on without hype.
Warm up with p95 vs p99 latency and cross-check methodology against common load testing mistakes. When sizing is still fuzzy, pair this with how many virtual users.
Why report order matters more than another chart
Metrics interact. High http_req_failed distorts latency percentiles. Dropped iterations in arrival-rate mode mean you measured under-fed load, not “fast at peak.” Checks can pass while thresholds fail—valid but slow responses. Reading averages before scenario fidelity invites false confidence.
Think of the report as an incident timeline:
- Scenario fidelity: duration, executor, stages, achieved VUs/iterations.
- Failure rate: http_req_failed and check failures by tag.
- Latency shape: p95/p99 segmented by route or journey.
- Throughput: http_reqs vs latency climb (saturation signature).
Skipping step one is like diagnosing production without knowing which deploy was live.
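Step one can be made mechanical. The sketch below makes three assumptions. First, the input mirrors the data object k6 passes to handleSummary(), where aggregates sit under each metric's values field. Second, gauges expose values.max and counters expose values.count. Third, plan is a hypothetical object holding your intended peak VUs and minimum iteration count.

```javascript
// Step one of the reading order: did we simulate the load we intended?
// ASSUMPTION: `data` mirrors the object k6 passes to handleSummary(), where
// gauges expose values.max and counters expose values.count.
// `plan` is a hypothetical object you fill in from the test design.
function checkFidelity(data, plan) {
  const m = data.metrics;
  const findings = [];

  // Peak active VUs (the vus gauge), not the preallocated pool (vus_max).
  const peakVUs = m.vus ? m.vus.values.max : 0;
  if (peakVUs < plan.targetVUs) {
    findings.push(`VUs peaked at ${peakVUs}, below the target of ${plan.targetVUs}`);
  }

  // Completed iterations across all scenarios.
  const iterations = m.iterations ? m.iterations.values.count : 0;
  if (iterations < plan.minIterations) {
    findings.push(`${iterations} iterations, fewer than the planned ${plan.minIterations}`);
  }

  // dropped_iterations is only emitted when k6 could not start iterations on
  // schedule (typically arrival-rate executors), so treat absence as zero.
  const dropped = m.dropped_iterations ? m.dropped_iterations.values.count : 0;
  if (dropped > 0) {
    findings.push(`${dropped} dropped iterations: offered load was below the configured rate`);
  }

  return { ok: findings.length === 0, findings };
}
```

In a k6 script, a function like this could run inside handleSummary(data), so fidelity findings are printed before anyone looks at a percentile.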
Sample size matters for percentiles. p99 on a two-minute smoke with dozens of requests is noise; document minimum duration and iteration counts beside percentile slides so reviewers do not chase ghosts. When in doubt, extend the steady stage or raise arrival rate modestly until tagged routes have enough samples—then re-read tails (p95 vs p99).
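To see why, count how many samples a tail percentile actually rests on. This sketch uses the nearest-rank percentile definition. k6's own Trend percentiles interpolate between neighbouring samples, so its numbers can differ. Treat the output as an illustration of the sampling problem, not a reproduction of k6's summary.

```javascript
// Nearest-rank percentile: the value at 1-based rank ceil(p/100 * n).
// (k6 interpolates between neighbouring samples, so its numbers can differ.)
function nearestRank(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs stay exact in floating point.
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil((p * sorted.length) / 100)));
  return sorted[rank - 1];
}

// Number of samples ranked above the p-th percentile: the evidence the tail rests on.
function tailSampleCount(n, p) {
  return n - Math.ceil((p * n) / 100);
}

// A short smoke run with 50 requests: a single 5 s outlier alone sets p99.
const smoke = Array(49).fill(100).concat([5000]);
console.log(nearestRank(smoke, 95)); // 100
console.log(nearestRank(smoke, 99)); // 5000
console.log(tailSampleCount(50, 99)); // 0
console.log(tailSampleCount(10000, 99)); // 100
```

The minimum tail count a team accepts before trusting p99 is a convention, not a k6 rule. Record whatever number you choose next to the percentile slide.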
When pretty percentiles hide the wrong test
Teams present aggregate http_req_duration while checkout was 10% of traffic. Marketing pages dominate the average; checkout tails disappear. Tag scenarios (route, journey, version) so summaries align with API bottleneck analysis.
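A quick way to see the effect is to group raw latency samples by tag before averaging. This minimal sketch uses made-up numbers: 90 browse requests at 100 ms and 10 checkout requests at 1,500 ms. The sample shape { tags, ms } is an assumption chosen for illustration.

```javascript
// Group latency samples by one tag key and report count and mean per group,
// alongside the untagged aggregate that a single summary line would show.
function statsByTag(samples, tagKey) {
  const groups = {};
  let total = 0;
  for (const s of samples) {
    const key = s.tags[tagKey] !== undefined ? s.tags[tagKey] : "(untagged)";
    if (!groups[key]) groups[key] = { count: 0, sum: 0 };
    groups[key].count += 1;
    groups[key].sum += s.ms;
    total += s.ms;
  }
  const byTag = {};
  for (const [key, g] of Object.entries(groups)) {
    byTag[key] = { count: g.count, avgMs: g.sum / g.count };
  }
  const overallAvgMs = samples.length > 0 ? total / samples.length : 0;
  return { overall: { count: samples.length, avgMs: overallAvgMs }, byTag };
}

// Illustrative mix: checkout is 10% of requests but much slower.
const samples = [
  ...Array.from({ length: 90 }, () => ({ tags: { route: "browse" }, ms: 100 })),
  ...Array.from({ length: 10 }, () => ({ tags: { route: "checkout" }, ms: 1500 })),
];
const stats = statsByTag(samples, "route");
console.log(stats.overall.avgMs); // 240
console.log(stats.byTag.checkout.avgMs); // 1500
```

The aggregate 240 ms looks healthy. The checkout group, averaging 1,500 ms, is where users actually wait.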
Practical k6 implementation: thresholds, checks, and tagged routes
Encode SLO gates in thresholds; encode logical correctness in checks. Export summaries with percentile trends for comparisons across builds.
Example script (illustrative—not production-ready). Fictional SLO numbers; adapt to your service.
What this example demonstrates:
- Checks on status and a JSON fragment; thresholds on tagged p95/p99.
- Grouped routes so the summary splits browse vs checkout.
- Scenario metadata via tags (build, env) for report filenames and CI artifacts.
- A strict failure-rate gate alongside the latency gates, so tails are meaningful.
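```javascript
import http from 'k6/http';
import { check, group, sleep } from 'k6';

// Target host and build label come from -e flags, e.g. -e API_BASE=... -e BUILD_SHA=...
const BASE = __ENV.API_BASE || 'https://staging.example.com';
const BUILD = __ENV.BUILD_SHA || 'local';

export const options = {
  scenarios: {
    report_demo: {
      executor: 'ramping-vus',
      // Ramp to 10 VUs over 1m, climb to 40 VUs over 5m, then ramp down over 1m.
      stages: [
        { duration: '1m', target: 10 },
        { duration: '5m', target: 40 },
        { duration: '1m', target: 0 },
      ],
      // Scenario tags are attached to every sample, so exports carry build and env.
      tags: { build: BUILD, env: __ENV.ENV || 'staging' },
    },
  },
  thresholds: {
    // Error budget: check this before trusting the latency gates below.
    http_req_failed: ['rate<0.01'],
    // Per-route latency gates (fictional SLOs) on tag-filtered sub-metrics.
    'http_req_duration{route:browse}': ['p(95)<450', 'p(99)<800'],
    'http_req_duration{route:checkout}': ['p(95)<900', 'p(99)<1400'],
    // Functional correctness: more than 99% of checks must pass.
    checks: ['rate>0.99'],
  },
};

export default function () {
  group('browse', () => {
    // The route tag feeds the http_req_duration{route:browse} sub-metric.
    const res = http.get(`${BASE}/catalog`, { tags: { route: 'browse' } });
    check(res, {
      'browse status 2xx': (r) => r.status >= 200 && r.status < 300,
      // Cheap body check: the payload mentions the "items" field.
      'browse has items': (r) => r.body && r.body.includes('items'),
    });
    sleep(1); // think time between page views
  });

  group('checkout', () => {
    const body = JSON.stringify({ sku: 'SKU-1', qty: 1 });
    const res = http.post(`${BASE}/checkout`, body, {
      headers: { 'Content-Type': 'application/json' },
      tags: { route: 'checkout' },
    });
    check(res, {
      'checkout status 2xx': (r) => r.status >= 200 && r.status < 300,
    });
    sleep(0.5);
  });
}
```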
Patterns that work
- Read http_req_failed first, then percentiles; segment by tags present in the script.
- Attach scenario parameters (executor, stages, rates, env) to every PDF or slide deck.
- Use --summary-trend-stats="p(95),p(99)" in CI for comparable tails run-over-run.
- Translate to user stories for executives: “4% of simulated checkout sessions waited >1s.”
Anti-patterns to avoid
- Screenshots without VU/RPS context—stakeholders compare incompatible runs.
- Celebrating passing checks when thresholds failed (slow-but-valid).
- Ignoring dropped iterations on arrival-rate executors (arrival-rate tracking).
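The second anti-pattern is easy to catch mechanically. This sketch assumes the handleSummary() data shape. In that shape, each metric that has thresholds carries a thresholds map from expression to { ok }, and the checks metric exposes values.rate. The 0.99 check budget mirrors the checks threshold in the example script.

```javascript
// Flags the "slow but valid" pattern: checks passed, latency thresholds failed.
// ASSUMPTION: `data` mirrors handleSummary() input, where a metric with
// thresholds has `thresholds: { "<expression>": { ok: boolean } }` and the
// checks metric exposes values.rate (share of passing checks).
function findSlowButValid(data, checkBudget = 0.99) {
  const m = data.metrics;
  // Strictly greater, to mirror a threshold written as rate>0.99.
  const checksOk = Boolean(m.checks) && m.checks.values.rate > checkBudget;
  const failedLatency = [];
  for (const [name, metric] of Object.entries(m)) {
    // Only latency metrics (including tagged sub-metrics) count as "slow".
    if (!name.startsWith("http_req_duration") || !metric.thresholds) continue;
    for (const [expr, result] of Object.entries(metric.thresholds)) {
      if (!result.ok) failedLatency.push(`${name} ${expr}`);
    }
  }
  return { checksOk, failedLatency, slowButValid: checksOk && failedLatency.length > 0 };
}
```

When slowButValid is true, the release discussion is about speed, not correctness. The failedLatency list is what belongs on the slide.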
Pro tip (example command):
```shell
k6 run report-demo.js --summary-export=summary.json --summary-trend-stats="p(95),p(99)"
```
What this command demonstrates: machine-readable summary JSON for dashboards plus percentile trends for diffing builds.
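With two such exports, one from the previous build and one from the current build, the run-over-run comparison can be scripted. This sketch assumes the layout of --summary-export files, in which trend statistics such as "p(95)" sit directly on each metric object. It also assumes tagged sub-metrics appear under names like http_req_duration{route:checkout} when thresholds reference them. Verify both points against a real export from your k6 version. The 10% tolerance is an arbitrary example.

```javascript
// Compare tail percentiles of http_req_duration metrics between two parsed
// --summary-export objects and list the ones that grew past a tolerance.
// ASSUMPTION: trend stats like "p(95)" are plain numbers on each metric object.
function diffTails(prev, curr, stats = ["p(95)", "p(99)"], tolerancePct = 10) {
  const regressions = [];
  for (const [name, metric] of Object.entries(curr.metrics)) {
    if (!name.startsWith("http_req_duration")) continue;
    const baseline = prev.metrics[name];
    if (!baseline) continue; // tag not present in the previous run
    for (const stat of stats) {
      const before = baseline[stat];
      const after = metric[stat];
      if (typeof before !== "number" || typeof after !== "number" || before === 0) continue;
      const changePct = ((after - before) / before) * 100;
      if (changePct > tolerancePct) {
        regressions.push({ metric: name, stat, before, after, changePct });
      }
    }
  }
  return regressions;
}
```

In CI, both files could be loaded with Node's fs.readFileSync and JSON.parse. The job would then fail whenever the returned list is non-empty.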
Decision framework: which metric to escalate first
| Signal in report | Likely meaning | Next action |
|---|---|---|
| http_req_failed spike early | Auth, routing, or bad deploy | Fix checks; compare status codes by tag |
| p95 up, p99 much worse | Tail-sensitive dependency (DB, partner API) | Trace tagged route; see bottleneck analysis |
| Throughput flat, latency up | Saturation (pools, CPU, locks) | Profile server; consider capacity planning |
| Checks fail, latency OK | Functional regression | Pair with contract tests (contract vs performance) |
| Thresholds fail, checks pass | Slow valid responses | Optimize path or scale; revisit SLO |
Escalate failure rate first whenever http_req_failed exceeds its budget, because latency percentiles that include error responses misrepresent the user journey.
Escalate tagged p99 when user-visible journeys have tight tail SLOs (checkout, payments).
Escalate scenario fidelity when VUs or iterations did not reach documented targets.
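The table and the three escalation rules can be encoded as ordered rules. This is a sketch only, and several parts of it are assumptions to adapt. The signal names are invented here. "Much worse" is defined as a p99 more than 2x the p95. Throughput and latency trends are passed in as booleans, because the end-of-test summary alone does not show how they moved over time.

```javascript
// Each rule mirrors one row of the decision table (plus the fidelity rule),
// listed with the failure-rate rule first, as the escalation guidance says.
// Signal names and the default 2x tail ratio are illustrative assumptions.
const RULES = [
  {
    signal: "http_req_failed over budget",
    when: (s) => s.failRate > s.failBudget,
    action: "Auth, routing, or bad deploy: fix checks; compare status codes by tag",
  },
  {
    signal: "p99 much worse than p95",
    when: (s) => s.p99 > (s.tailRatio || 2) * s.p95,
    action: "Tail-sensitive dependency: trace the tagged route",
  },
  {
    signal: "throughput flat, latency up",
    when: (s) => s.throughputFlat && s.latencyUp,
    action: "Saturation: profile the server; consider capacity planning",
  },
  {
    signal: "checks fail, latency OK",
    when: (s) => !s.checksPass && s.latencyOk,
    action: "Functional regression: pair with contract tests",
  },
  {
    signal: "thresholds fail, checks pass",
    when: (s) => !s.thresholdsPass && s.checksPass,
    action: "Slow valid responses: optimize the path or scale; revisit the SLO",
  },
  {
    signal: "load below documented target",
    when: (s) => !s.loadReachedTarget,
    action: "Scenario fidelity: fix the scenario and rerun before trusting percentiles",
  },
];

// Returns every matching row, in rule order.
function nextActions(signals) {
  return RULES.filter((r) => r.when(signals)).map(({ signal, action }) => ({ signal, action }));
}
```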
Communicate upward without hype
Translate metrics into user stories: “Under simulated peak checkout traffic, 4% of sessions saw waits above one second—primarily on /cart/checkout.” Attach scenario parameters so execs cannot confuse runs. For vocabulary on trade-offs, share throughput vs latency alongside the report appendix.
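The percentage in a sentence like that has to come from raw samples, because the end-of-test summary reports percentiles rather than the share of requests above a limit. This sketch assumes NDJSON lines written by k6 run --out json=results.json. Each sample line is expected to look like {"type":"Point","metric":"http_req_duration","data":{"value":...,"tags":{...}}}. The sketch counts requests, not sessions; a session-level figure would need per-iteration grouping. The route tag value and the 1,000 ms limit are examples.

```javascript
// ASSUMPTION: `lines` come from `k6 run --out json=results.json` (one JSON
// object per line). Only "Point" records for http_req_duration are counted;
// "Metric" declaration lines and other metrics are skipped.
function shareOverLimit(lines, route, limitMs) {
  let total = 0;
  let over = 0;
  for (const line of lines) {
    if (line.trim() === "") continue;
    const rec = JSON.parse(line);
    if (rec.type !== "Point" || rec.metric !== "http_req_duration") continue;
    const tags = rec.data.tags || {};
    if (tags.route !== route) continue;
    total += 1;
    if (rec.data.value > limitMs) over += 1;
  }
  return { total, over, share: total > 0 ? over / total : 0 };
}

// Turns the counts into a sentence stakeholders can act on.
// Note: this counts requests, not sessions.
function userStory(result, route, limitMs) {
  const pct = (result.share * 100).toFixed(1);
  return `Under the simulated load, ${pct}% of ${route} requests (${result.over} of ${result.total}) took longer than ${limitMs} ms.`;
}
```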
Pre-release checklist
- Export includes executor model, stages, duration, and achieved load (VUs/RPS).
- Summary segmented by route/journey tags used in the script.
- Threshold table copied with pass/fail and actual p95/p99 values.
- Checks and http_req_failed reviewed before latency slides go to leadership.
- Build SHA or release tag recorded next to the report filename.
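If reports are assembled by a script, the checklist can gate the export. This is a sketch over a hypothetical report manifest. The field names are illustrative and are not a k6 or Performate format.

```javascript
// Checklist items as [manifest field, human-readable label].
// The manifest shape is hypothetical; adapt field names to your pipeline.
const CHECKLIST = [
  ["executor", "executor model"],
  ["stages", "stages"],
  ["duration", "duration"],
  ["achievedVUs", "achieved VUs"],
  ["achievedRps", "achieved RPS"],
  ["routeTags", "route/journey tags"],
  ["thresholds", "threshold table"],
  ["failuresReviewed", "checks and http_req_failed review"],
  ["build", "build SHA or release tag"],
];

// Empty strings, empty arrays, false, null, and undefined count as missing; 0 does not.
function isMissing(value) {
  return value === undefined || value === null || value === "" || value === false ||
    (Array.isArray(value) && value.length === 0);
}

// Returns the labels of checklist items the manifest does not satisfy.
function missingChecklistItems(manifest) {
  const missing = CHECKLIST.filter(([field]) => isMissing(manifest[field])).map(([, label]) => label);
  // Every threshold row needs its pass/fail result and the actual measured value.
  for (const row of manifest.thresholds || []) {
    if (typeof row.passed !== "boolean" || typeof row.actual !== "number") {
      missing.push(`pass/fail and actual value for ${row.metric} ${row.expr}`);
    }
  }
  return missing;
}
```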
How Performate standardizes reports engineering and product both trust
Raw k6 console output is enough for developers; product and leadership need the same charts with scenario context baked in.
Example: from k6 run to stakeholder-ready export
- Run the scenario from the desktop app with tags already applied in the visual editor. Problem solved: route/journey splits match how QA thinks about flows.
- Open the integrated report and filter by route:checkout (or your tag convention). Problem solved: tails visible without manual log parsing.
- Pin the scenario panel (VUs, stages, rates) into the export footer. Problem solved: “green chart” debates end when parameters travel with the PDF.
- Compare the previous build side by side for p95/p99 on critical tags. Problem solved: regressions are obvious without rebuilding spreadsheets.
- Share one export with engineering and product: same numbers, same labels.
- Export k6 + summary JSON for CI artifacts when gates fail (load testing in CI/CD).
Closing takeaway
Reports are actionable when fidelity → failures → tagged tails → story is explicit. Thresholds encode gates; checks encode correctness; tags encode ownership.
After your next run, write three sentences: intended load, worst tagged p99, and the subsystem on-call should open first—before anyone asks for “the green slide.”
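The sketch below drafts all three sentences, but only the worst tagged p99 comes from the summary. The intended load and the subsystem to open first are hypothetical inputs (a plan description and a tag-to-owner map) that a human still has to supply. The sketch assumes the handleSummary() data shape. It also assumes p(99) is among the trend stats, for example via --summary-trend-stats as in the pro tip command; k6's default trend stats stop at p(95). Finally, it expects tagged sub-metrics to exist because thresholds reference them.

```javascript
// ASSUMPTION: handleSummary()-style data, with p(99) present in values and
// tagged sub-metrics named like "http_req_duration{route:checkout}".
// `plan` and `owners` are hypothetical inputs supplied by the team.
function threeSentences(data, plan, owners) {
  let worst = null;
  for (const [name, metric] of Object.entries(data.metrics)) {
    // Only tagged sub-metrics; the untagged aggregate hides route ownership.
    const match = name.match(/^http_req_duration\{(.+)\}$/);
    if (!match) continue;
    const p99 = metric.values["p(99)"];
    if (typeof p99 !== "number") continue;
    if (!worst || p99 > worst.p99) worst = { tag: match[1], p99 };
  }
  if (!worst) return "No tagged p99 found: add route tags, thresholds, and p(99) trend stats first.";
  const owner = owners[worst.tag] || "the owning team (unmapped tag)";
  return [
    `Intended load: ${plan.description}.`,
    `Worst tagged p99: ${worst.tag} at ${Math.round(worst.p99)} ms.`,
    `On-call should open ${owner} first.`,
  ].join(" ");
}
```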
Ready to optimize your API performance?
Use Performate to generate consistent reports you can share with engineering and product after every run.