Vivian Voss

The Workaround That Became Architecture

nodejs docker containers freebsd

There is a moment, familiar to anyone who has shipped a small service in the last decade, that is worth looking at plainly. You have written a program that does one thing. It listens on a port and answers. To run it, you write a Dockerfile, build an image, push it to a registry, and describe to an orchestrator how many replicas to schedule, how to health-check them, how to route to them. The program is fifty lines. The apparatus around it is the architecture. And almost nobody stops to ask whether the program needed any of it, because containing everything is simply how things are done now.

This is the third of the four breaks the first piece named, and it is the one most often mistaken for good practice rather than a workaround. To see why, it helps to go back to a single decision, made in good faith, in 2009.

The Decision of 2009

In November 2009, Ryan Dahl presented Node.js at JSConf EU. The idea was elegant and, for its problem, correct: a single-threaded event loop with non-blocking I/O, so that one process could hold thousands of idle connections without a thread per connection. For an I/O-bound workload, waiting on the network rather than the processor, it was a genuinely good answer, and it had the further appeal of letting a team write one language on both sides of the wire.

The difficulty is that it did not stay an answer to that question. Within a few years Node was the default runtime for very nearly everything, I/O-bound or not, and a model that was right for holding idle sockets was now being asked to run whole applications on a single core.

It also carried a consequence that took a few years to become visible. A single-threaded runtime runs its application code on one core. By 2009 the multi-core processor was already the ordinary desktop part, and servers routinely had several cores. A Node process, by design, ran its JavaScript on one of them and left the rest idle. This was not an oversight; it was the simplification that made the model safe: no shared-memory threads, no locks, no data races. The cost of that safety was that the runtime could not, on its own, use the machine it ran on.

The Workaround

The question that followed was reasonable: how do you use the other cores? The answer arrived in layers. First came the cluster module (Node 0.6, 2011), which forks worker processes, conventionally one per core, and shares a listening socket between them. That is already a workaround: several processes pretending to be one server, because the runtime cannot spread itself. Then, with Docker (Solomon Hykes, 2013), came a tidier packaging of the same idea: put each instance in its own container, and let an orchestrator place the containers across cores and machines and restart the ones that fall over.

[Figure: One Core, or All of Them. Node.js, one core per process: N processes or containers to fill N cores. Go / Rust, one binary: one process on all cores, a scheduler in the runtime fills the box. Caption: The sprawl is a language limit, generalised into a way of building.]

And because the convenient way to build a container is to start from a distribution base image, each one tends to carry a whole userland with it: its own libraries, its own shell, its own package manager, and therefore its own attack surface, its own configuration drift, its own list of vulnerabilities to patch. You set out to ship one process and shipped a small operating system to keep it company, then ran a zoo of them to stand in for a single application.

[Figure: What Ships in the Container. Container, a small OS: your process plus a shell, package manager, base-image libraries, config and state, attack surface and patch burden. Jail, one binary: your process and nothing else around it. Caption: One process, wearing a whole operating system.]

None of this was simply wrong, and on Linux it had genuine uses. Docker packaged away the works-on-my-machine tax, the deployment inconsistency that a fragmented distribution landscape produces, and it did so well enough to change the industry in a year. It is worth noticing what kind of tax that is. A system with a single base and one package tree, FreeBSD among them, never had the problem in the same form, because the machine you built on and the machine you shipped to ran the same base system from the same tree. Docker paid down a Linux tax, and much of the world took the payment for a universal law.

But watch what happened to the unit of thought. The container stopped being the way you shipped a process and became the way you reasoned about the system. Scaling meant more containers. A service meant a container. The default for a program that could have been one process on one machine became an image, a registry, a scheduler, and a control loop, whether or not the program had any reason to be distributed. The workaround for a runtime that could not use its cores had quietly become the architecture for everything.

The Languages That Did Not Need It

The clean test of whether something is architecture or workaround is to ask whether a different choice removes the need for it. Here it does.

Go, announced by Google in November 2009, the same month Node.js was presented, ships a scheduler in the runtime: goroutines are multiplexed onto operating-system threads across all the cores available to one process, so a single binary saturates the machine without a cluster and without a container per core. Rust reaches the same place through an async runtime such as Tokio: one binary, all cores, no external supervisor to fan work across processes. In both, the thing Node needed the container to do is done inside the language or its runtime, for free, invisibly. I run a small web server of my own, a single Rust binary described in an earlier piece, and it uses every core on its host without a cluster module or a scheduler in front of it. It is not a product and I am not selling it; it is only a demonstration that the sprawl is not compulsory.
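The difference is easy to see from inside one Go process. The sketch below is illustrative, not the author's server: `countPrimes` and `parallelCount` are hypothetical names, and trial division is chosen only because it is CPU-bound. It runs the same work twice, first with `runtime.GOMAXPROCS(1)`, which allows only one OS thread to execute Go code at a time (roughly the position of a single-threaded runtime), then with one per logical CPU, which on most machines is also the default. There is no cluster, no second process and no container per core; the runtime's scheduler decides where the goroutines run.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// countPrimes counts the primes in [lo, hi) by trial division. It is deliberately
// CPU-bound, so the only way to finish sooner is to run on more cores.
func countPrimes(lo, hi int) int {
	count := 0
	for x := lo; x < hi; x++ {
		if x < 2 {
			continue
		}
		isPrime := true
		for d := 2; d*d <= x; d++ {
			if x%d == 0 {
				isPrime = false
				break
			}
		}
		if isPrime {
			count++
		}
	}
	return count
}

// parallelCount splits [0, limit) into chunks, one goroutine each, and sums the
// results. How many goroutines run at the same moment is decided by the Go
// scheduler and GOMAXPROCS, not by this function.
func parallelCount(limit, chunks int) int {
	size := (limit + chunks - 1) / chunks
	results := make([]int, chunks)
	var wg sync.WaitGroup
	for i := 0; i < chunks; i++ {
		lo, hi := i*size, (i+1)*size
		if hi > limit {
			hi = limit
		}
		wg.Add(1)
		go func(i, lo, hi int) {
			defer wg.Done()
			results[i] = countPrimes(lo, hi) // each goroutine writes only its own slot
		}(i, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	const limit = 2_000_000
	chunks := runtime.NumCPU() * 4 // more chunks than cores, so the scheduler can balance load

	for _, procs := range []int{1, runtime.NumCPU()} {
		prev := runtime.GOMAXPROCS(procs) // cap the threads executing Go code at once
		start := time.Now()
		n := parallelCount(limit, chunks)
		fmt.Printf("GOMAXPROCS=%d: %d primes below %d in %v\n", procs, n, limit, time.Since(start))
		runtime.GOMAXPROCS(prev) // restore the previous setting
	}
}
```

On a multi-core machine the second run should typically finish in a fraction of the first run's wall time while the count stays the same; the exact speedup depends on the hardware, and this sketch is not a benchmark.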

So the sprawl is not a fact about distributed systems. It is a fact about a language that could not, by itself, use a whole computer, and about an industry that generalised one runtime's limitation into a universal way of building. That is worth saying without heat: the people who reached for the container were solving a real problem with the best tool to hand. The error was not the reach. It was the promotion of the reach to a principle.

The Isolation Was Already Solved

The usual defence of the container is that it is about isolation, not cores: one service cannot corrupt another, dependencies are pinned, the blast radius is bounded. That is a real virtue, and it is also not new, and it did not require any of this. Kernel-native isolation of exactly this kind shipped in FreeBSD 4.0 in 2000, in the form of jails, from work by Poul-Henning Kamp: a confined userland, its own file system view, its own network identity, enforced by the kernel at almost no cost and to an enterprise-grade standard that has held in production ever since. The isolation the container advertises was a solved problem before the container existed. What was new in 2013 was the packaging and the momentum, not the boundary.

The Bill, at Scale

Take this off the single machine and multiply it. Because a Node process holds one core, saturating a modern server means running many of them, as processes or as containers, each carrying its own runtime, its own heap, its own garbage collector, its own memory floor before it has done a thing. A language with a scheduler does the same work in one process on the same box. On one server the difference looks like waste you could tolerate.
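The arithmetic behind that floor can be made explicit with a deliberately crude model. In the sketch below the type and function names are hypothetical and the figures in `main` are placeholders, not measurements: each process or container pays a fixed floor (runtime, garbage-collector overhead, and for a container its userland), and the working memory the requests need is treated as the same total either way.

```go
package main

import "fmt"

// Footprint is a hypothetical, deliberately crude memory model.
type Footprint struct {
	FloorMB   float64 // paid once per process or container, before any work is done
	WorkingMB float64 // memory the workload itself needs, assumed equal in total either way
}

// ProcessPerCore models one process (or container) per core, each paying its own floor.
func ProcessPerCore(f Footprint, cores int) float64 {
	return float64(cores)*f.FloorMB + f.WorkingMB
}

// SingleProcess models one binary whose runtime schedules across all cores: one floor.
func SingleProcess(f Footprint) float64 {
	return f.FloorMB + f.WorkingMB
}

func main() {
	// Placeholder figures for illustration only; substitute your own measurements.
	f := Footprint{FloorMB: 60, WorkingMB: 400}
	for _, cores := range []int{4, 16, 64} {
		perCore, single := ProcessPerCore(f, cores), SingleProcess(f)
		fmt.Printf("%2d cores: per-core %5.0f MB, single %5.0f MB, excess %5.0f MB\n",
			cores, perCore, single, perCore-single)
	}
}
```

Under this model the excess is exactly (cores - 1) × floor: it grows with the size of the box while the work stays the same. That is the shape of the argument above, not a figure for any real deployment.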

[Figure: The Bill, Multiplied. To fill one server: many containers, each with its own runtime, heap, GC and userland, each one patched and paid for; or one scheduler process on all cores with one heap, doing the same work with one thing to patch. Caption: Multiplied across every deployment: data centres of power, at no gain in speed.]

Multiplied across the number of Node-and-container deployments now running, it stops being tolerable. On any honest estimate it is data centres of power and silicon spent to operate a workaround for a language decision, and spent for no gain in speed, because a runtime that uses its own cores behind the same non-blocking model, isolated by a kernel-native jail, answers at least as quickly and often faster. And power is only the measured part of the bill. The rest is a userland per container to keep patched and configured, where a single binary in a jail would have needed none. It may be the most quietly expensive over-provisioning in modern computing, and almost none of it is charged to the decision that caused it.

The Walk-Backs

The first piece in this series already named the returns, so a sentence will do here: when the sprawl grew expensive enough to measure, the teams that could afford to measure it walked back.

[Figure: The Walk-Backs. Amazon Prime Video (2023): distributed pipeline to one process, about 90% cheaper. Segment (2018): 140 services to one service. Istio (2020): split control plane to one binary, istiod. Caption: Each walk-back is proof the sprawl was a choice, not a price the problem forced.]

Amazon's Prime Video quality team collapsed a distributed pipeline into a single process and cut cost by about ninety per cent; Segment folded a hundred and forty services into one; Istio merged its own control plane into a single binary. The point for this piece is narrow and worth keeping: each walk-back is a proof that the sprawl was a choice. A thing you can undo by decision was never forced on you by the nature of the problem.

The Point

The container earned its place. It solved deployment, it bounded dependencies, and where a system is genuinely large and genuinely many-teamed, orchestrating it is honest work. What was sold beyond that, and bought without much argument, is the idea that a program should be wrapped, imaged, scheduled and supervised by default, when the truthful description of a great many services is one process that ought to use its cores and does not, because of a language decision made in 2009 and never revisited.

Lean here is not anti-container. It is the discipline of asking, before the Dockerfile, what the program actually is. Often it is a single binary that saturates its machine and isolates cleanly, and the whole apparatus was a costume for a workaround.

Next Wednesday: the layers that hide what they should bound.