Vivian Voss

GraphQL: The Query Tax

javascript architecture web performance

The Invoice ■ Episode 15

"Query exactly what you need! One endpoint! No over-fetching!"

Splendid. Let us examine what you are actually paying for.

In 2012, Facebook had a rather specific problem. Their iOS News Feed consumed data from hundreds of internal microservices over mobile bandwidth that was, at the time, neither fast nor cheap. The existing REST endpoints returned too much data, too many of them needed to be called in sequence, and the mobile team was spending more time choreographing API calls than building features. So Lee Byron, Dan Schafer, and Nick Schrock built GraphQL to solve it.

The solution was brilliant. For Facebook. You have 12 REST endpoints and a fetch() call. But do carry on.

The N+1 Invoice

Consider a modest query: fetch 25 users with their posts. In REST, this is two calls: GET /users returns the list, and a bulk GET /posts?userIds=1..25 returns the posts for all of them. Two requests, two responses, two cache entries. Straightforward.

In GraphQL, you write one query. Elegant. The resolver for users fires once and returns 25 rows. Then the resolver for posts fires once per user. That is 26 database queries for one API call. This is the N+1 problem, and it is not a bug. It is how resolvers work by design.
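
The mechanics fit in a few lines. The resolver map below follows the standard graphql-js shape; the `db` object is a hypothetical stand-in that counts round-trips instead of talking to a real database:

```javascript
// Sketch of why resolvers produce N+1 queries. `db` is a fake that
// counts round-trips; the resolver map is standard graphql-js shape.
let queries = 0;
const db = {
  query(sql) {
    queries += 1;
    // Fake rows: 25 users for the list query, one post otherwise.
    return sql.startsWith('SELECT * FROM users')
      ? Array.from({ length: 25 }, (_, i) => ({ id: i + 1 }))
      : [{ title: 'a post' }];
  },
};

const resolvers = {
  Query: {
    users: () => db.query('SELECT * FROM users LIMIT 25'),
  },
  User: {
    // Fires once per user in the parent list: 25 users, 25 queries.
    posts: (user) => db.query(`SELECT * FROM posts WHERE user_id = ${user.id}`),
  },
};

// What the executor effectively does for { users { posts } }:
const users = resolvers.Query.users();           // query #1
users.forEach((u) => resolvers.User.posts(u));   // queries #2..#26
console.log(queries); // 26
```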

[Figure: 25 users with posts, by query count. REST: 2 HTTP requests, 2 database queries. GraphQL: 1 HTTP request, 26 database queries (1 users query + 25 posts resolvers).]

The fix exists. It is called DataLoader, a batching utility that collects individual resolver calls and combines them into bulk queries. With DataLoader, those 26 queries become 2. Splendid. But DataLoader is not built into GraphQL. It is not part of the specification. It is not enabled by default. It is a separate library that you must install, configure, and integrate into every resolver that touches a database. It is, in the most generous interpretation, homework.
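
The batching trick itself is small enough to sketch. This is not the real dataloader package, just the pattern it implements: every load() made in the same tick joins one batch, and the batch function fires once:

```javascript
// Minimal sketch of the DataLoader batching pattern (not the real
// dataloader package). Calls made in the same tick are queued and
// handed to one batch function instead of firing N queries.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values, same order
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve) => {
      this.queue.push({ key, resolve });
      // Flush once per tick: the first load() schedules it.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.flush());
      }
    });
  }
  async flush() {
    const batch = this.queue;
    this.queue = [];
    const values = await this.batchFn(batch.map((item) => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}

// Hypothetical usage: 25 per-user post lookups become one bulk query.
let queryCount = 0;
const postsLoader = new TinyLoader(async (userIds) => {
  queryCount += 1; // one SELECT ... WHERE user_id IN (...) per batch
  return userIds.map((id) => [`posts for user ${id}`]);
});

const userIds = Array.from({ length: 25 }, (_, i) => i + 1);
Promise.all(userIds.map((id) => postsLoader.load(id))).then(() => {
  console.log(queryCount); // 1 batched query instead of 25
});
```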

Without it, a relation-heavy GraphQL endpoint performs measurably worse than the REST equivalent it was meant to replace. For standard CRUD operations, REST runs at roughly half the latency and sustains around 70 per cent more requests per second. The query language that promised efficiency costs you efficiency.

The Caching Invoice

REST uses HTTP caching. ETags, Cache-Control, CDN layers. GET requests to unique URLs, cacheable by design. The browser caches them. The CDN caches them. The reverse proxy caches them. Three layers of caching infrastructure, built into the web itself, requiring precisely zero configuration from you.

GraphQL uses POST to a single endpoint. Every query, every mutation, one URL: /graphql. HTTP caching does not work because every request is a POST to the same address with a different body. The browser cannot cache it. The CDN cannot cache it. The reverse proxy cannot cache it. HTTP, the most battle-tested caching infrastructure in the history of computing, has been demoted to a dumb tunnel.

[Figure: HTTP caching, who benefits. REST (GET /users): browser cache, CDN cache, reverse proxy, ETags, Cache-Control all work. GraphQL (POST /graphql): all broken.]

The replacement is Apollo's normalised cache, persisted queries, or custom caching layers that you build and maintain yourself. 56 per cent of teams report caching challenges with GraphQL. One rather suspects the other 44 per cent have not noticed yet.

And Apollo Client, the library that provides this custom caching, weighs 43 KB gzipped. fetch() ships with every browser at 0 KB. You are paying 43 kilobytes to restore functionality that HTTP provided for free before you broke it.

The Security Invoice

A 128-byte nested query can consume 10 seconds of CPU time. No authentication required. The query is syntactically valid. The schema permits it. The server dutifully executes it.

{
  users {
    posts {
      comments {
        author {
          posts {
            comments {
              author { name }
            }
          }
        }
      }
    }
  }
}

This is a valid GraphQL query. It is also a denial-of-service attack. The recursive relationship between users, posts, comments, and authors creates exponential depth that REST never exposes, because REST endpoints are flat by design. You cannot accidentally nest GET /users six levels deep.

80 per cent of GraphQL APIs are vulnerable to denial-of-service through query depth. Most frameworks ship with no default depth limit. You must build depth limiting, cost analysis, and complexity-based rate limiting. Traditional rate limiting by endpoint does not work when every request hits the same URL. The OWASP GraphQL Cheat Sheet reads like a confession of design decisions that were never made.
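
Depth limiting itself is not complicated; the cost is that you must remember to build it. A toy sketch of the idea (a production limiter such as graphql-depth-limit walks the parsed AST, not raw braces, but the principle is identical):

```javascript
// Toy depth limiter: walk the query text and reject anything nested
// past a cap. Brace counting stands in for a real AST walk here.
function queryDepth(query) {
  let depth = 0;
  let max = 0;
  for (const ch of query) {
    if (ch === '{') max = Math.max(max, ++depth);
    if (ch === '}') depth--;
  }
  return max;
}

function enforceDepthLimit(query, limit = 5) {
  const depth = queryDepth(query);
  if (depth > limit) {
    throw new Error(`Query depth ${depth} exceeds limit ${limit}`);
  }
  return query;
}

// The recursive query from above, eight brace levels deep:
const hostile =
  '{ users { posts { comments { author { posts { comments { author { name } } } } } } } }';
// enforceDepthLimit(hostile, 5) throws; the DoS query dies at the door.
```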

The Monitoring Invoice

GraphQL returns HTTP 200. Always. Even when your application is on fire.

Errors live in a JSON array inside the response body. Your monitoring dashboard shows 100 per cent success rate. Your users see failures. Your on-call engineer sleeps through the night. Your customers do not. The HTTP status code, that universal language of success and failure that every tool in the ecosystem understands, has been reduced to a decorative constant.
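
Every GraphQL client ends up writing some version of the following unwrapping. A sketch, with graphqlFetch as a hypothetical helper name:

```javascript
// Sketch of the unwrapping every GraphQL client must do: the HTTP
// layer says 200 either way, so you inspect the body's `errors`
// array yourself and re-raise a real failure your monitoring can see.
async function graphqlFetch(url, query, fetchImpl = fetch) {
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  const payload = await res.json(); // status was 200 regardless
  if (payload.errors && payload.errors.length > 0) {
    // Surface the failure as an actual error, not a decorative 200.
    throw new Error(`GraphQL error: ${payload.errors[0].message}`);
  }
  return payload.data;
}
```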

[Figure: What your dashboard sees. REST: success 200, not found 404, server error 500. GraphQL: success 200, not found 200, server error 200.]

The Alternative

REST with OpenAPI 3.0: self-documenting, typed client generation, HTTP caching built in. fetch() ships with every browser at 0 KB. The specification is stable. The tooling is mature. The caching works. The status codes mean what they say. 83 per cent of web services use REST. Not because they have not heard of GraphQL, but because they evaluated the trade-off and chose accordingly.

GraphQL solves a real problem: aggregating hundreds of services behind a single query interface. If you have that problem, use it. If you have hundreds of microservices, a mobile client on constrained bandwidth, and a frontend team that needs to iterate independently of the backend, GraphQL is genuinely excellent. Facebook built it for exactly this scenario, and it works beautifully there.

If you do not have that problem, you are paying the invoice for someone else's architecture.

The Pattern

Facebook built GraphQL for a mobile feed consuming hundreds of services over constrained bandwidth. In 2015, they open-sourced it. The industry adopted it without the problem. A dashboard with 12 endpoints gained a query language, a schema layer, a type system, a normalised cache, a depth limiter, a cost analyser, and 43 KB of client library. The original fetch() call is still there, underneath it all, wondering what it did wrong.

The query language is excellent. The question is whether you have queries worth asking.
