GraphQL Load Testing with k6: Queries, Batching, and Failure Modes

Model resolver hotspots, batch POST storms, and complexity limits—REST intuition misleads GraphQL teams.

REST load tests often hammer one URL with predictable JSON. GraphQL shifts complexity server-side: the same /graphql endpoint accepts radically different resolver graphs, batch arrays, and persisted-query hashes. GraphQL load testing with k6 means varying query shapes and depths, tagging operation names, and watching failure modes REST teams rarely see—N+1 resolver storms, complexity rejections, and batch POST amplification.

In this guide you will learn why GraphQL performance is a query-shape problem, how to send realistic POST bodies with k6, when batching helps or hurts, and which thresholds should gate releases before mobile clients ship a new persisted query bundle.

Why GraphQL breaks REST intuition under load

A shallow viewer { id } query can look fast while production queries fan out across lists and dataloaders. Several failure modes follow from that:

Resolver graphs multiply database round-trips; moderate RPS with rising CPU often signals N+1 patterns.
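Query cost engines reject documents whose depth or field cost exceeds policy. Depending on the server, the rejection arrives as HTTP 400 or as HTTP 200 with an errors array, so error rates can spike suddenly with no resource saturation behind them.
Persisted-query mismatches appear when the gateway disables ad-hoc documents: mobile apps send hashes that staging no longer recognizes.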
Batch HTTP arrays multiply operations per request; one POST becomes ten resolver trees—great for efficiency, dangerous for hotspots.
Subscriptions and live queries (if used) change connection profiles—HTTP-only tests miss them unless you model separately.
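The batch amplification in the list above is simple arithmetic, but it is easy to lose track of when choosing arrival rates. A minimal sketch, assuming the gateway executes every operation in each batch array; the function names are illustrative and not part of k6:

```javascript
// Resolver trees executed per second for a given HTTP arrival rate.
// Assumes every operation in each batch array is executed by the gateway.
function operationsPerSecond(httpRps, batchSize) {
  return httpRps * batchSize;
}

// HTTP arrival rate needed to reach a target operation rate with batching.
// Rounded up so the target is met or slightly exceeded.
function batchRpsFor(targetOpsPerSecond, batchSize) {
  if (batchSize < 1) throw new Error('batchSize must be at least 1');
  return Math.ceil(targetOpsPerSecond / batchSize);
}

console.log(operationsPerSecond(3, 10)); // 30
console.log(batchRpsFor(20, 10)); // 2
```

Three batch POSTs per second with ten operations each put the same resolver load on the backend as thirty single-operation requests, which is the number to compare against a single-operation baseline.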

Hammering a health-check query proves the server is up—not that checkout or catalog lists meet SLOs. Pair GraphQL scenarios with API bottleneck analysis and pagination load testing when connections expose cursor edges.

Practical k6 implementation: operations, batching, and tags

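Core k6 has no GraphQL-specific engine, and it does not need one: plain http.post calls with JSON bodies are clearer than opaque helpers (see the k6 HTTP requests documentation). Tag each request with operation and depth (or a complexity bucket) so that summaries and thresholds can split percentiles by query shape.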
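One way to assign the depth tag consistently is to derive it from the document text rather than label each query by hand. The sketch below is a deliberately naive estimate that counts nested curly braces; the function names and the cutoff of 3 are assumptions for illustration, and a gateway's own complexity score is a better source when it is available.

```javascript
// Naive nesting-depth estimate: the deepest level of curly braces.
// Assumption: the document has no braces inside string literals or
// input-object arguments, which would inflate the count.
function selectionDepth(query) {
  let depth = 0;
  let max = 0;
  for (const ch of query) {
    if (ch === '{') {
      depth += 1;
      if (depth > max) max = depth;
    } else if (ch === '}') {
      depth -= 1;
    }
  }
  return max;
}

// Map a document to a tag value. The cutoff is an arbitrary illustration;
// choose one from your own traffic.
function depthBucket(query, deepFrom = 3) {
  return selectionDepth(query) >= deepFrom ? 'deep' : 'shallow';
}

console.log(depthBucket('query ViewerShallow { viewer { id displayName } }')); // shallow
console.log(depthBucket('query C { catalog { edges { node { id } } } }')); // deep
```

The returned string can be passed as the depth tag on each request, so a new document lands in the right threshold family without a manual label.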

Example script (illustrative, not production-ready). It uses a fictional endpoint and placeholder thresholds. Adapt documents, auth, and complexity policies to your gateway.

What this example demonstrates:

Named operations: Tags such as operation:ViewerShallow and operation:CatalogList map failures to product areas.
Depth mix: Each iteration picks a shallow or deep document at random from a SharedArray to mimic mobile and web clients.
Batch POST: A second scenario sends an array of operations to stress gateway batch limits.
Marker header: Sends a fictional X-GraphQL-Complexity header with a fixed value so gateway logs can be correlated with load-test traffic; replace it with whatever header your gateway actually reads.
Separate thresholds: Deep queries get looser p99 than shallow reads—reflecting product SLOs.

import http from 'k6/http';
import { check, sleep } from 'k6';
import { SharedArray } from 'k6/data';

const GQL = __ENV.GQL_URL || 'https://staging.example.com/graphql';
const TOKEN = __ENV.API_TOKEN || 'replace-me';

const documents = new SharedArray('gql_docs', function () {
  return [
    {
      operation: 'ViewerShallow',
      depth: 'shallow',
      body: {
        operationName: 'ViewerShallow',
        query: `query ViewerShallow { viewer { id displayName } }`,
        variables: {},
      },
    },
    {
      operation: 'CatalogList',
      depth: 'deep',
      body: {
        operationName: 'CatalogList',
        query: `query CatalogList($first: Int!) {
          catalog(first: $first) {
            edges { node { id title variants { sku price } } }
          }
        }`,
        variables: { first: 25 },
      },
    },
  ];
});

export const options = {
  scenarios: {
    single_ops: {
      executor: 'ramping-arrival-rate',
      startRate: 5,
      timeUnit: '1s',
      preAllocatedVUs: 30,
      maxVUs: 100,
      stages: [
        { duration: '2m', target: 20 },
        { duration: '5m', target: 20 },
        { duration: '2m', target: 0 },
      ],
      exec: 'singleOperation',
    },
    batch_storm: {
      executor: 'constant-arrival-rate',
      rate: Number(__ENV.BATCH_RPS || 3),
      timeUnit: '1s',
      duration: '4m',
      preAllocatedVUs: 20,
      maxVUs: 60,
      exec: 'batchOperations',
      startTime: '9m',
    },
  },
  thresholds: {
    'http_req_duration{depth:shallow}': ['p(99)<400'],
    'http_req_duration{depth:deep}': ['p(99)<1200'],
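    http_req_failed: ['rate<0.02'],
    // Failed checks only gate the run when a threshold references them (placeholder budget).
    checks: ['rate>0.99'],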
  },
};

function gqlPost(payload, tags) {
  return http.post(GQL, JSON.stringify(payload), {
    headers: {
      Authorization: `Bearer ${TOKEN}`,
      'Content-Type': 'application/json',
      'X-GraphQL-Complexity': 'load-test',
    },
    tags,
  });
}

export function singleOperation() {
  const doc = documents[Math.floor(Math.random() * documents.length)];
  const res = gqlPost(doc.body, {
    name: 'graphql',
    operation: doc.operation,
    depth: doc.depth,
  });

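  // Parse once and defensively: a proxy error page is not JSON, and res.json() would throw.
  let json = null;
  try {
    json = res.json();
  } catch (e) {
    json = null;
  }

  check(res, {
    'no GraphQL errors': () =>
      json !== null && (!json.errors || json.errors.length === 0),
    // `data` can be missing (undefined) or null; treat both as a failure.
    'data present': () => json !== null && json.data != null,
  });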

  sleep(0.3);
}

export function batchOperations() {
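  // SharedArray documents only length, index access, and for-of iteration,
  // so build the batch with a loop instead of relying on Array methods.
  const batchBody = [];
  for (const d of documents) {
    batchBody.push(d.body);
  }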
  const res = gqlPost(batchBody, {
    name: 'graphql_batch',
    operation: 'BatchBundle',
    depth: 'batch',
  });

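  check(res, {
    'batch status 200': (r) => r.status === 200,
    // A batch reply should be an array of results, none of them carrying errors.
    'batch results clean': (r) => {
      try {
        const results = r.json();
        return Array.isArray(results) && results.every((x) => !x.errors || x.errors.length === 0);
      } catch (e) {
        return false;
      }
    },
  });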

  sleep(0.5);
}
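The checks in the script treat a response as either clean or not. During triage it often helps to separate the failure kinds, because a complexity rejection and a partially resolved response point at different owners. A minimal sketch of such a classifier for single-operation responses follows; the function name and category labels are assumptions, not a k6 or GraphQL standard.

```javascript
// Classify a single-operation GraphQL response from its status and raw body.
// Returns 'http_error', 'invalid_body', 'rejected', 'partial', or 'ok'.
function classifyGraphQLResponse(status, bodyText) {
  if (status < 200 || status >= 300) return 'http_error';
  let json;
  try {
    json = JSON.parse(bodyText);
  } catch (e) {
    return 'invalid_body'; // e.g. an HTML error page from a proxy
  }
  if (json === null || typeof json !== 'object') return 'invalid_body';
  const hasErrors = Array.isArray(json.errors) && json.errors.length > 0;
  const hasData = json.data !== undefined && json.data !== null;
  if (hasErrors && !hasData) return 'rejected'; // e.g. validation or complexity failure
  if (hasErrors && hasData) return 'partial'; // some fields resolved, some failed
  return hasData ? 'ok' : 'invalid_body';
}

console.log(classifyGraphQLResponse(200, '{"data":{"viewer":{"id":"1"}}}')); // ok
console.log(classifyGraphQLResponse(200, '{"errors":[{"message":"too complex"}]}')); // rejected
```

In a k6 script the inputs would be res.status and res.body, and the returned label can feed a check or a custom metric per category.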

If the gateway enforces complexity limits, track its complexity scores in gateway logs and correlate them with k6 tags during triage. When production accepts only allowlisted persisted queries, staging tests must use the same allowlist that mobile ships.
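A hash-only request body for a persisted query might look like the sketch below. It assumes the gateway accepts the Apollo-style persistedQuery extension; other gateways identify persisted documents differently (for example by a plain document id), so check the wire format before reusing this. The manifest object and its placeholder hash are hypothetical.

```javascript
// Build a hash-only request body in the Apollo-style persisted query shape.
// Assumption: the gateway under test accepts this extension format.
function persistedQueryBody(operationName, sha256Hash, variables = {}) {
  return {
    operationName,
    variables,
    extensions: {
      persistedQuery: { version: 1, sha256Hash },
    },
  };
}

// Hypothetical manifest entry; real hashes come from the bundle mobile ships.
const manifest = { CatalogList: 'replace-with-hash-from-manifest' };
const body = persistedQueryBody('CatalogList', manifest.CatalogList, { first: 25 });
console.log(JSON.stringify(body));
```

Because the body carries no query text, a staging gateway with a stale allowlist answers with an error instead of data, which is the mismatch this test is meant to catch before release.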

Decision framework: single ops vs batch vs depth spikes

Situation	Recommended action
Web + mobile with different documents	Weight the document list (SharedArray) by operation counts from analytics
Gateway enforces complexity	Include rejected queries in the mix; set thresholds on `http_req_failed` and on a check for GraphQL `errors`
BFF batches client operations	Dedicated `batch_storm` scenario at low RPS; watch CPU vs single-op baseline
New persisted query release	Smoke-test the new hash set under load before app store submission
Read-heavy with connection pagination	Add cursor-paged operations; pair with the pagination load testing guide

Use single-operation scenarios for baseline SLOs per named operation—your primary product dashboards.

Use batch scenarios when production clients actually batch—otherwise you are testing a fantasy hotspot.

Use depth spikes when marketing enables richer product cards—shallow-only tests miss N+1 regressions.

Observability and pre-release checklist

GraphQL regressions hide inside errors arrays with HTTP 200. Before you raise RPS:

Document operation mix percentages and which documents represent each client surface.
Tag k6 with operation and depth (or complexity bucket) for per-family thresholds.
Fail checks on errors length, not only HTTP status—partial data breaks UX silently.
Compare batch vs single-op CPU at equal business RPS—not equal HTTP request count.
Archive the persisted-query manifest version together with the git SHA for each run, for example as part of a CI/CD smoke test after schema SDL merges.
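Per-family thresholds from the checklist can be generated from a small SLO table instead of being typed by hand for each operation. The sketch below is one way to do that; the helper name, the p99 budgets, and the 1% check-failure budget are placeholders, not recommendations.

```javascript
// Build a k6 thresholds object from a per-operation p99 budget in milliseconds.
function buildThresholds(p99ByOperation) {
  const thresholds = {
    http_req_failed: ['rate<0.02'],
    // Without a threshold on `checks`, failed checks are reported
    // but do not fail the run.
    checks: ['rate>0.99'],
  };
  for (const [operation, p99] of Object.entries(p99ByOperation)) {
    thresholds[`http_req_duration{operation:${operation}}`] = [`p(99)<${p99}`];
  }
  return thresholds;
}

console.log(buildThresholds({ ViewerShallow: 400, CatalogList: 1200 }));
```

In a k6 script the result would be assigned to the thresholds key of the exported options object, so adding an operation to the SLO table adds its gate automatically.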

How Performate iterates GraphQL collections with threshold discipline

Evolving SDL and operation lists outpace hand-edited scripts. A concrete workflow for the catalog and checkout operations discussed above:

Import a GraphQL or Postman collection with named operations CatalogList and CheckoutSummary (and a batch folder if clients batch). Problem solved: operations stay visual as SDL changes.
Create single_ops scenario with ramping arrival rate to 20 req/s and attach weighted requests matching analytics. Problem solved: shape matches product traffic without rewriting executor blocks.
Add optional batch_storm scenario at 3 req/s using the batch request—toggle on when gateway batching ships. Problem solved: batch hotspots tested deliberately, not accidentally.
Set tags per request: operation:*, depth:shallow|deep|batch. Problem solved: reports align with k6 thresholds in this article.
Run and compare shallow vs deep p99 in the integrated report before mobile releases a new persisted bundle. Problem solved: one export for backend and mobile leads.
Export k6 for CI smoke after gateway policy changes—same operations, pipeline-aligned thresholds.

In short: iterate GraphQL collections against staging gateways with threshold discipline.

Closing takeaway

GraphQL load testing is a query-shape and batching problem on a single HTTP path. Vary documents, tag operations, fail on GraphQL errors, and stress batch arrays only when clients send them—REST-style single-endpoint thinking will miss the resolver graph that actually burns CPU.

Run this week’s operation mix against staging before your next persisted-query or complexity-policy change—and note which operation still owns the p99 your SLO names.

