Vivian Voss

The Kubernetes Tax

kubernetes docker cloud devops

The Invoice · Episode 02

“We need Kubernetes.”

That sentence has cost more engineering hours than any distributed systems problem it was deployed to solve. It sounds authoritative. It sounds inevitable. And it is the sentence that transforms a perfectly functional deployment (one that ran with systemctl start myapp) into an orchestration platform with 81 distinct resource types, each carrying its own YAML schema, lifecycle hooks, and failure modes.

Let us examine the invoice.

The Complexity Tariff

Before Kubernetes, deploying an application looked like this:

systemctl start myapp

One line. One command. The service starts. It restarts on failure if you have written a competent unit file. Logs appear in journalctl. The whole affair takes roughly four seconds, including the time to type it.
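A competent unit file, for reference, runs to about a dozen lines. The sketch below is illustrative; the service name, binary path, and user are assumptions, not anything from a real deployment:

```ini
# /etc/systemd/system/myapp.service (hypothetical name and paths)
[Unit]
Description=My application
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/myapp
Restart=on-failure
RestartSec=2
User=myapp

[Install]
WantedBy=multi-user.target
```

`Restart=on-failure` is the entire self-healing story, and `journalctl -u myapp` is the entire observability story.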

After Kubernetes, deploying the same application requires 200 or more lines of YAML spread across five to eight files: a Deployment, a Service, an Ingress, a ConfigMap, a Secret, a HorizontalPodAutoscaler, a PersistentVolumeClaim, and, depending on how seriously one takes the liturgy, a NetworkPolicy, a ServiceAccount, and a PodDisruptionBudget. Each file has its own apiVersion, its own kind, and its own quietly expanding surface area of things that can go wrong.
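For a sense of scale, here is just the first of those files: a Deployment, pared down to its minimum. The names and image reference are placeholders for illustration:

```yaml
# deployment.yaml -- one of the 5-8 files; names and image are hypothetical
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:1.0.0
          ports:
            - containerPort: 8080
```

That is the terse version of one resource. Multiply by the Service, the Ingress, the ConfigMap, and the rest, and the 200 lines arrive quickly.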

[Figure: before/after deployment comparison. Before: systemctl start myapp; 1 line, 1 command, ~4 seconds. After: Deployment, Service, Ingress, ConfigMap, Secret, HPA, PVC, NetworkPolicy; 200+ lines of YAML across 5-8 files, 81 resource types, roughly 200x more configuration.]

The YAML, incidentally, is not merely verbose. It drifts. The configuration committed to Git and the configuration actually running in the cluster are, in any sufficiently mature deployment, two different things. Discovering which of the two is the source of truth requires kubectl, stern, k9s, Lens, and, in the more colourful incidents, a prayer. One does not debug Kubernetes so much as interrogate it, and it is not always forthcoming.

The Platform Tax

Kubernetes does not run itself. It requires a platform team: engineers whose full-time occupation is maintaining the system that runs your system. The going rate is two to four engineers at €80,000 to €120,000 per annum. That is €160,000 to €480,000 in annual salary before a single cloud invoice has been opened. Before a single line of product code has been deployed. Before a single customer has been acquired.

These are not people writing features. They are not shipping products. They are maintaining the infrastructure that maintains the infrastructure. It is rather like hiring a full-time mechanic to keep the lorry running that delivers your milk, when the dairy is across the road.

Developers who previously deployed with git push now open Jira tickets. “DevOps”, a term coined to eliminate the wall between development and operations, became, in practice, “Dev waits for Ops.” The wall was not torn down. It was rebuilt in YAML and given a service-level agreement.

The Utilisation Scandal

Here is the number that should appear on every quarterly board report and never does.

According to Datadog’s 2023 State of Kubernetes report, average cluster utilisation is 13 per cent. Thirteen. That is 87 per cent of paid compute sitting idle. You are provisioning 7.7 times the resources you actually use, and the cloud provider is sending an invoice for every idle cycle.

[Figure: average Kubernetes cluster utilisation. Used: 13%. Idle: 87%. Paying for 7.7x what you use. Source: Datadog State of Kubernetes 2023.]

Let that settle. For every euro of compute your application actually consumes, you are paying €7.70. Not because the application requires it, but because Kubernetes requires headroom, redundancy, node overhead, system daemons, and the comforting illusion that autoscaling will handle the rest. The cloud provider, understandably, does not object.
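The 7.7 figure is nothing more exotic than the reciprocal of the utilisation rate, as a one-liner confirms:

```shell
# Overprovisioning factor at 13% utilisation: 100 / 13
awk 'BEGIN { printf "%.1fx\n", 100 / 13 }'
# prints "7.7x"
```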

The Service Mesh Surcharge

Once the cluster is running, someone will suggest a service mesh. Istio, Linkerd, Consul Connect. The options are plentiful and the sales pitch is consistent: observability, mTLS, traffic management, the full ecclesiastical vestments of modern infrastructure.

What the pitch omits is the latency. According to Istio’s own performance documentation, a service mesh adds 5 to 15 milliseconds of latency to every internal call. Every service-to-service request now passes through a sidecar proxy that inspects, encrypts, logs, and routes the traffic. This is not free. It is a per-request tax, and it compounds with every hop in the call chain.

A request that traverses five services accumulates 25 to 75 milliseconds of mesh overhead alone, before any business logic has executed. The certificate rotation that underpins mTLS, meanwhile, fails silently at three in the morning. Nobody notices until the dashboards go red, and by then the postmortem is already writing itself.
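The compounding is plain multiplication, using the five-hop chain assumed above:

```shell
# Mesh overhead across a 5-service call chain at 5-15 ms per hop
awk 'BEGIN { hops = 5; printf "%d-%d ms\n", hops * 5, hops * 15 }'
# prints "25-75 ms"
```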

The Hidden Invoice

The line items above are the ones that appear on spreadsheets. The hidden costs are more insidious.

Cognitive load. Kubernetes has 81 distinct resource types. Each type has its own API group, versioning scheme, and set of behaviours that are documented, in the generous sense of the word, on kubernetes.io. A developer who once needed to understand their application now needs to understand their application and the orchestration platform, and the networking model, and the storage provisioner, and the RBAC policy that determines whether they are permitted to look at any of it.

Tooling proliferation. Debugging a Kubernetes deployment requires kubectl for cluster state, stern for log aggregation, k9s for terminal-based cluster management, Lens for a graphical view, Prometheus for metrics, Grafana for dashboards, and Jaeger for distributed tracing. That is seven tools to diagnose what journalctl -u myapp would have shown you in a single command.

Opportunity cost. Every hour a senior engineer spends debugging a CrashLoopBackOff is an hour not spent building the product that justifies the infrastructure. The platform team does not generate revenue. It maintains the conditions under which revenue generation is theoretically possible. The distinction matters.

[Figure: the annual Kubernetes invoice. Platform team: 2-4 engineers × €80-120k, €160-480k/yr. Cloud compute: 87% paid idle (7.7x overprovisioned). Debugging toolchain: 7 tools (kubectl, stern, k9s, Lens, Prometheus, Grafana, Jaeger) to replace journalctl. Without Kubernetes: VPS + systemd + journalctl.]

The Origin Story Nobody Reads

Kubernetes descends from Borg, Google’s internal cluster management system. Borg was designed for Google’s scale: millions of containers across global data centres, serving billions of requests, managed by thousands of engineers. The paper makes this context abundantly clear. Google open-sourced the concept, the CNCF adopted it, and an industry that mostly runs a handful of services on a few machines decided it needed the same tooling as one of the largest distributed systems operations on Earth.

This is rather like observing that Formula One teams use telemetry and pit crews, and concluding that the school run requires the same.

Ninety-nine point nine per cent of companies will never have Google’s problems. They will, however, have Google’s infrastructure costs, because they adopted Google’s tooling without Google’s constraints, Google’s scale, or Google’s budget.

The Alternatives That Already Exist

The uncomfortable truth is that the alternatives are not exotic. They are not cutting-edge. They are, in many cases, older than Kubernetes itself and considerably more battle-tested.

A VPS with systemd handles deployment, process supervision, log management, and automatic restarts. It has done so reliably since 2010. The configuration is a single unit file, not a folder of YAML manifests. Scaling means adding another VPS and a load balancer, not another abstraction layer.

Docker Compose orchestrates multi-container applications on a single host with a clarity that Kubernetes has never managed. It is not suitable for global scale. It is, however, perfectly suitable for the 95 per cent of applications that do not operate at global scale and never will.
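The equivalent of those five to eight manifests, on a single host, is one file. The services and images below are placeholders, not a recommendation of any particular stack:

```yaml
# docker-compose.yml -- hypothetical two-service stack
services:
  myapp:
    image: registry.example.com/myapp:1.0.0
    ports:
      - "8080:8080"
    restart: unless-stopped
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me
    volumes:
      - db-data:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  db-data:
```

`docker compose up -d` starts the stack; `docker compose logs` replaces the seven-tool diagnostic procession.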

FreeBSD jails have provided process isolation since the year 2000, more than a decade before the industry collectively decided that containers were a novelty. Jails are lightweight, well-documented, and require no orchestration platform. They simply work, in the profoundly unglamorous way that well-engineered systems tend to.
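A jail is configured in a few lines of jail.conf(5). The path, hostname, and address below are assumptions for the sake of the sketch:

```
# /etc/jail.conf -- hypothetical minimal jail definition
myapp {
    path = "/usr/local/jails/myapp";
    host.hostname = "myapp.example.com";
    ip4.addr = "192.168.1.10";
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    mount.devfs;
}
```

`jail -c myapp` brings it up. No control plane, no sidecars, no reconciliation loop.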

None of these alternatives have a foundation, a landscape diagram, or an annual conference. They have something rather more useful: simplicity that scales to the actual problem.

The Verdict

Kubernetes is a tool. A powerful one, designed for a specific class of problems at a specific scale. The difficulty is that its adoption has been driven not by the presence of those problems but by the fear of appearing to lack ambition. “We use Kubernetes” has become a status marker, a signal of technical sophistication, an item on the conference talk abstract. That it costs half a million euros in platform engineering before a single feature ships is treated as the price of modernity rather than what it actually is: waste.

The best infrastructure is the one you do not notice. If you are spending more time maintaining the platform than building the product, the platform has become the product, and nobody is buying it.

Thirteen per cent utilisation. Two hundred lines of YAML. Half a million in platform salaries. Seven debugging tools. And underneath it all, an application that once started with a single command and ran for years without complaint.

The invoice is on the table. Whether anyone reads it is, as always, another matter entirely.