Vivian Voss

Technical Beauty: FreeBSD Jails

Technical Beauty ■ Episode 04

In 1999, a Danish programmer solved a problem. Not a fashionable problem. Not a problem that would attract venture capital or inspire a logo with a whale on it. A real problem: chroot was not secure enough for multi-tenant hosting.

Poul-Henning Kamp built FreeBSD Jails. Kernel-level process isolation. Near-zero overhead. No daemon. No overlay network. No orchestration layer. No YAML. The solution shipped in FreeBSD 4.0 in March 2000, and the API has not broken since.

Fourteen years after Kamp’s work, in 2013, Docker arrived with considerably more fanfare and considerably more moving parts. The industry declared containerisation a revolution. FreeBSD administrators checked the date and carried on.

The Problem

The Unix chroot system call, introduced in 1979, changes the apparent root directory for a process. It was never designed as a security mechanism. It was designed for building and testing. The name is honest: it changes the root. It does not isolate the process, restrict its system calls, limit its network access, or prevent it from escaping if it has root privileges inside the changed root. A process with sufficient determination and root privileges (on Linux, the CAP_SYS_CHROOT capability) can simply chroot again, walk upwards, and break free. The escape is well-documented, trivially reproducible, and has been known since the 1980s.

R&D Associates, a hosting provider, needed proper isolation. They needed to give customers root access inside their own environments without giving them access to the host system, other customers, or the network stack. chroot could not do this. Virtual machines could, but at the cost of running an entire operating system per tenant. Kamp’s insight was that the kernel already had all the necessary enforcement mechanisms. They simply needed to be composed correctly.

The Architecture

A jail is a partition of the kernel’s resource namespace. Not a separate kernel. Not a daemon that intercepts system calls. Not a userspace process pretending to be an operating system. The FreeBSD kernel itself enforces the boundaries. A jailed process sees its own filesystem root, its own process table, its own network interfaces (via VNET), its own users. It cannot see, signal, or interact with processes outside its jail. The root user inside a jail is root only within that jail; to the host, it is merely a confined user ID with delusions of authority.

The implementation is startlingly direct. The original paper by Kamp and Robert Watson describes the mechanism in a few pages. The kernel maintains a jail structure per confined environment. System calls check the calling process’s jail membership and restrict access accordingly. There is no interception layer. There is no ptrace-based sandboxing. There is no LD_PRELOAD trickery. The kernel knows. The kernel enforces. The boundary is not a suggestion; it is a property of the process itself.
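The membership check can be pictured with a toy model. This is a userspace sketch in Python, not FreeBSD kernel code: the Prison and Process classes, their fields, and the can_see function are hypothetical names chosen for illustration. The rule it assumes is that a process may see a target only if the target lives in the same jail or in a jail nested beneath it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prison:
    """Toy stand-in for the kernel's per-jail structure (hypothetical fields)."""
    jid: int
    name: str
    parent: Optional["Prison"] = None


@dataclass(frozen=True)
class Process:
    pid: int
    prison: Prison


HOST = Prison(jid=0, name="host")  # the unjailed host environment


def can_see(viewer: Process, target: Process) -> bool:
    """Allow access only if the target is in the viewer's jail or below it."""
    p: Optional[Prison] = target.prison
    while p is not None:
        if p.jid == viewer.prison.jid:
            return True
        p = p.parent  # walk up towards the host
    return False


def visible_pids(viewer: Process, table: list[Process]) -> list[int]:
    """What a ps-like listing would show this viewer."""
    return [proc.pid for proc in table if can_see(viewer, proc)]


web = Prison(jid=1, name="webserver", parent=HOST)
db = Prison(jid=2, name="database", parent=HOST)
table = [Process(1, HOST), Process(100, web), Process(200, db)]

print(visible_pids(Process(100, web), table))  # [100]: the web jail sees only itself
print(visible_pids(Process(1, HOST), table))   # [1, 100, 200]: the host sees everything
```

The point of the sketch is where the check lives: in the same code path that answers the request, with no separate component to consult.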

[Figure: Architecture: Jails vs Docker. Layers between your process and the kernel.]

FreeBSD Jails: Application (pid 1 in jail) → FreeBSD Kernel (jail(2) enforcement built in). 1 layer. Zero daemons. Near-native performance.

Docker: Application → runc (OCI runtime) → containerd (container runtime) → dockerd (Docker daemon) → OverlayFS → iptables → cgroups + namespaces → Linux Kernel. 6 layers. 3 daemons. Each layer adds latency and attack surface.

The fastest path between a process and the kernel is no path at all.

This architectural directness is the source of every advantage that follows. Performance, security, simplicity: all three are consequences of the same decision. When there is nothing between your process and the kernel, there is nothing to go wrong, nothing to restart, and nothing to consume resources pretending to be infrastructure.

The Configuration

A jail is created with jail(8) and configured in jail.conf. A working configuration:

webserver {
    host.hostname = "web.example.com";
    ip4.addr = "10.0.0.10";
    path = "/jails/webserver";
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    mount.devfs;
}

Eight lines, braces included. A hostname, an IP address, a filesystem path, start and stop commands, and device filesystem access. No registry to pull from. No image layers to assemble. No daemon to keep running in the background hoping it will not segfault at three in the morning. The jail starts when you tell it to start. It stops when you tell it to stop. It does not phone home, check for updates, or require a subscription.
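Because the format is this regular, stamping out blocks for many jails needs very little machinery. The following is a minimal sketch in Python, assuming a hypothetical helper named render_jail; it handles only quoted string values and bare boolean parameters such as mount.devfs, and it does no escaping of quotes inside values.

```python
def render_jail(name: str, params: dict[str, object]) -> str:
    """Render one jail.conf block from a name and a parameter mapping."""
    lines = [f"{name} {{"]
    for key, value in params.items():
        if value is True:
            lines.append(f"    {key};")  # bare boolean parameter
        elif value is False:
            continue  # leave the parameter at its default
        else:
            lines.append(f'    {key} = "{value}";')
    lines.append("}")
    return "\n".join(lines)


webserver = render_jail("webserver", {
    "host.hostname": "web.example.com",
    "ip4.addr": "10.0.0.10",
    "path": "/jails/webserver",
    "exec.start": "/bin/sh /etc/rc",
    "exec.stop": "/bin/sh /etc/rc.shutdown",
    "mount.devfs": True,
})
print(webserver)
```

Run as written, this prints the same block shown above. A second jail is a second call with a different name, address, and path.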

With VNET, each jail receives its own complete network stack: its own routing table, its own firewall rules, its own interfaces. Not a NAT bridge. Not an overlay network. A proper, independent network stack that the kernel partitions at the same level as the process namespace. A jailed service binds to its own IP address. ifconfig inside the jail shows only that jail’s interfaces. The networking model is not a workaround. It is the real thing.

The ZFS Integration

FreeBSD Jails and ZFS are not merely compatible. They are complementary in a way that suggests someone designed them to work together, which, in the manner of all good Unix tools, nobody did. They simply compose.

Each jail lives on a ZFS dataset. Snapshots are instantaneous and cost nothing until data diverges. Cloning a jail is a ZFS clone: a copy-on-write reference to the parent snapshot that consumes zero additional space until the clone writes its first byte. Creating a new jail from a template takes seconds, not minutes. Rolling back a broken update takes one command: zfs rollback. Not a rebuild. Not a re-pull. Not a prayer.
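The template workflow amounts to three zfs subcommands: snapshot, clone, and rollback. The sketch below, in Python, only assembles the command lines and prints them unless asked to execute. The dataset names (zroot/jails/template, zroot/jails/webserver), the snapshot name, and the helper functions are assumptions for illustration, not a layout the article prescribes.

```python
import subprocess


def zfs_clone_plan(template: str, snapshot: str, new_dataset: str) -> list[list[str]]:
    """Commands to snapshot a template dataset and clone it for a new jail."""
    snap = f"{template}@{snapshot}"
    return [
        ["zfs", "snapshot", snap],
        ["zfs", "clone", snap, new_dataset],
    ]


def zfs_rollback_plan(dataset: str, snapshot: str) -> list[list[str]]:
    """Command to return a jail's dataset to an earlier snapshot."""
    return [["zfs", "rollback", f"{dataset}@{snapshot}"]]


def run(plan: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical dataset layout: one template, one jail cloned from it.
clone_plan = zfs_clone_plan("zroot/jails/template", "base", "zroot/jails/webserver")
rollback_plan = zfs_rollback_plan("zroot/jails/webserver", "pre-upgrade")

run(clone_plan)      # dry run: prints the two commands
run(rollback_plan)   # dry run: prints the rollback command
```

One caveat worth knowing: zfs rollback targets the most recent snapshot by default. Returning to an older one requires the -r flag, which destroys the snapshots taken after it.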

This is not a feature that was designed by a product team after three quarters of user research. It is what happens when two tools that do their jobs correctly happen to operate on the same substrate. ZFS manages storage. Jails manage isolation. The composition is the feature. Nobody had to write a plugin.

The Performance

A 2020 benchmark study measured FreeBSD Jails against Docker on identical hardware. The results were as subtle as a freight train. Jails achieved approximately 63,000 requests per second at near-native performance. Docker, with its daemon, its overlay filesystem, its NAT bridge, and its collection of userspace intermediaries, consumed measurably more CPU for the same workload.

The reason is not that Jails are optimised. The reason is that there is nothing to optimise away. A jailed process makes a system call. The kernel checks the jail structure. The call proceeds or is denied. There is no daemon in the middle forwarding requests. There is no overlay filesystem translating layer references into actual file locations. There is no network address translation rewriting packet headers. The overhead of a jail is, in practical terms, the cost of a pointer dereference and a conditional check. On modern hardware, that cost is measured in nanoseconds.

Docker’s overhead, by contrast, is structural. The daemon must run. containerd must run. The overlay filesystem must resolve layers. The NAT bridge must translate addresses. Each layer adds latency, CPU cycles, and memory consumption. These are not bugs. They are architecture. They cannot be optimised away because they are the product itself.

The Stability

The jail(2) system call was introduced in FreeBSD 4.0, released March 2000. The API has remained stable for twenty-five years. The jail.conf format arrived later, in FreeBSD 9.1, and files written for it then remain structurally valid on FreeBSD 14. Features have been added (VNET networking and hierarchical jails in FreeBSD 8, per-jail resource limits via rctl(8) in FreeBSD 9) but nothing has been removed. Nothing has been renamed. Nothing has been deprecated and replaced with a “v2 API” that requires rewriting all existing configurations.

Twenty-five years of stability is not an accident. It is a commitment by the FreeBSD project to treat the system call interface as a contract. Users who configured jails in 2001 are not punished for the crime of having adopted the technology early. They are rewarded with a quarter-century of their configurations continuing to work. In the Docker ecosystem, by contrast, the storage driver has changed three times, the networking model has been rewritten twice, and docker-compose has undergone a migration from Python to Go that broke every CI pipeline that relied on the original version.

[Figure: The Containerisation Timeline. From chroot to Docker: 34 years, one solved problem.]

1979  chroot: Filesystem isolation only. Not a security boundary.
1999  FreeBSD Jails: Kernel-level isolation. PHK for R&D Associates.
2000  FreeBSD 4.0: Jails ship. Stable API begins. Still stable.
2002  Linux namespaces: mount namespace first. 11 years of additions.
2006  cgroups: Resource limits. Google engineers. Merged 2008.
2008  LXC: First Linux container runtime. Namespaces + cgroups.
2013  Docker: LXC wrapper + image format + registry + marketing.

14 years. Jails solved the problem in 1999. Docker solved the marketing in 2013.

The Contrast

To run a Docker container on Linux, one requires: dockerd (the Docker daemon), containerd (the container runtime daemon), runc (the OCI runtime), an overlay filesystem driver, a NAT bridge for networking, iptables rules for port mapping, and cgroups plus namespaces for the actual isolation that the kernel provides but Docker wraps in three layers of indirection.

To run a FreeBSD jail, one requires: the kernel. Which is already running.

This is not a simplification for rhetorical effect. A jail starts with jail -c or service jail start. There is no daemon to install. There is no service to enable. There is no socket to worry about (the Docker socket is, infamously, a root-equivalent attack surface). The kernel provides isolation because that is what kernels do. Docker provides isolation by running three daemons that collectively ask the kernel to do what it could have done directly, had anyone thought to use the system calls that already existed.
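To make the jail -c path concrete: jail(8) accepts parameters as name=value arguments, with boolean parameters given as a bare name. The Python sketch below builds such a command line without running it. The helper jail_create_argv is hypothetical, and the parameter values are borrowed from the configuration earlier in this article.

```python
import shlex


def jail_create_argv(name: str, params: dict[str, object]) -> list[str]:
    """Build an argument vector for creating a jail with jail -c."""
    argv = ["jail", "-c", f"name={name}"]
    for key, value in params.items():
        # Booleans are passed as a bare name; everything else as key=value.
        argv.append(key if value is True else f"{key}={value}")
    return argv


argv = jail_create_argv("webserver", {
    "path": "/jails/webserver",
    "host.hostname": "web.example.com",
    "ip4.addr": "10.0.0.10",
    "mount.devfs": True,
    "exec.start": "/bin/sh /etc/rc",
})

# Shell-quoted form, suitable for pasting into a root shell on the host.
print(shlex.join(argv))
```

Nothing in the list names a daemon, a socket, or a registry. The command talks to the kernel and exits.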

The irony is not lost on the FreeBSD community. Linux needed three separate kernel subsystems (namespaces, cgroups, seccomp) developed over eleven years, plus a userspace daemon stack, to approximate what FreeBSD shipped as a single, coherent kernel feature in 2000. The Linux approach is not wrong; it works. But it is the difference between a room designed with a door and a room where someone cut a hole in the wall, fitted a frame, attached hinges, and called it a design pattern.

The Quiet Adoption

FreeBSD Jails have no marketing department. They have no corporate sponsor pushing them at conferences. They have no logo, no certification programme, no “Jail Desktop” for developers who want to run their IDE inside a container for reasons that remain mysterious. They have, instead, a handbook chapter and a quarter-century of production deployments.

Netflix serves video to 260 million subscribers from FreeBSD on bare metal. Not because FreeBSD has better branding, but because when you are delivering terabits per second, the overhead of a container daemon is not a rounding error; it is a line item on the infrastructure bill. The PlayStation 4 and PlayStation 5 run on a FreeBSD-derived operating system. WhatsApp, before its acquisition, ran on FreeBSD. These are not hobby deployments. They are systems where performance and stability are not negotiable, and where the industry’s preferred container runtime was quietly evaluated and quietly declined.

The Lesson

FreeBSD Jails are technically beautiful not because they are old, though twenty-five years of stability is its own form of beauty. They are beautiful because they are correct. The kernel is the right place for isolation. A system call is the right interface. ZFS is the right storage layer. None of this required a daemon, a registry, a build tool, an orchestrator, or a conference talk explaining why you need all five.

The industry chose Docker. Not because Docker is technically superior, but because Docker is easier to sell. It has a logo. It has a company. It has a certification. It solves a problem that Linux created (no standard base system) by shipping the entire operating system with every application, which is the sort of solution that only makes sense if you have never seen the alternative.

The alternative has been running since 2000. It has not broken. It has not changed its API. It has not been acquired, relicensed, or pivoted into a platform play. It sits in the FreeBSD kernel, one system call deep, doing precisely what it was designed to do.

One system call. Zero daemons. Twenty-five years. No breaking changes. The most elegant containerisation is the one that never needed a container runtime.