Technical Beauty ⊣ Episode 38
A maintainer sends you a fix. Not as a branch on a server somewhere, not as a pull request awaiting your authentication token, not as an invitation to fetch a remote you have not heard of. A text file, attached to an email or pasted into an issue, with plus signs and minus signs and a few lines of plain context around them. You drop it onto your tree, type one command, and your code is current. No credentials, no network, no infrastructure.
The format that taught the world to ship a change in a dozen lines arrived in 1985, written by a man who would invent Perl two years later. It is, by some distance, the smallest unit of code distribution in widespread use, and it is the format every modern code-review tool, every commit, every diff in your terminal still speaks.
The Format
A patch is a recipe, not a snapshot. It does not contain the new file; it contains the change that turns the old file into the new one. That is the whole conceptual move, and it is the reason a fix to a million-line codebase can travel as twenty lines of email.
The file header names the source and destination paths.
Each hunk in the file is a small unit of change with its
own little header (the line ranges in the old and new
versions: @@ -42,7 +42,9 @@), followed by
the lines themselves: a few lines of context, each
prefixed with a space; the lines to remove, each
prefixed with -; the lines to add, each
prefixed with +; and a few more lines of
context to close. The context is what allows the apply
tool to find the right place in the file even if the
surrounding code has drifted by a few lines since the
patch was generated.
That is the entire vocabulary. A header, hunks, plus and minus and space. The whole standard fits on a page, and it has carried every code change in every open-source project for the better part of four decades.
The Surface
In daily use the idiom is two commands. Make a patch:
diff -u original modified > change.patch
Apply a patch:
patch -p1 < change.patch
The -u flag tells diff to use the unified
format, which is the one every modern tool produces. The
-p1 flag tells patch to strip one leading
directory from the paths recorded in the file, which is
how you make a patch generated against
a/src/main.c apply to
src/main.c in your tree. The
leading-directory game is a convention that grew up
around how patches were generated against named source
trees and is, after thirty years, simply the gesture one
learns.
A handful of further flags carry most of the rest of the daily work:
patch -R < change.patch # reverse: undo the patch
patch --dry-run -p1 < change.patch # check, do not write
patch -p1 -i change.patch # explicit input file
When a hunk does not fit cleanly, the patch tool tries to
apply it with what it calls line-fuzz: it allows the
context lines to be a little wrong, within a small
tolerance, on the theory that the file has drifted but
not changed beyond recognition. If even that fails, it
writes the rejected hunk to a .rej file
beside the target, plainly, with the failing hunk in the
original format. You can read the reject, edit by hand,
and try again. The tool is honest about its limits, and
it tells you exactly what it could not do. Beauty, here,
includes telling the truth about failure.
On FreeBSD
FreeBSD ships patch in the base system at
/usr/bin/patch, BSD-licensed, descended
directly from
Larry Wall's original source line.
OpenBSD and NetBSD carry the same lineage; macOS does
too. The tool is simply present on a fresh install, no
package required.
GNU patch,
part of the GNU project and licensed under the GPL, is a
separate fork from the same root. It has grown some
extra extensions over the years (additional file-name
heuristics, more flexible handling of edge cases) but
the on-disk format both tools read and write is the
same. A patch produced by git diff on one
machine applies on any machine with either
implementation. That interoperability after forty years
is not an accident; it is what one gets when the format
is small enough to specify completely and honestly.
This series prizes that property of FreeBSD's choices:
keep the lean BSD-licensed original in base, where it is
one of the few tools every engineer has on day one. The
GPL alternative is a pkg install away if
you need a specific extension. For the daily load, the
in-base tool is the whole tool.
The Lineage
Larry Wall posted patch 1.3 to the mod.sources newsgroup on 8 May 1985,
from his desk at NASA's Jet Propulsion Laboratory in
Pasadena. He had already written
rn, the news reader, the year before;
Perl was still two years away. Wall's patch
borrowed the diff output that the BSD
diff(1) tool already produced (in the older
context-diff format) and turned it from a thing humans
read into a thing programs apply. That move, treating
diff output as a machine-readable changeset rather than
a human report, is the conceptual reduction the rest of
the story rests on.
The format Wall worked with then was the context diff: each hunk listed several lines of context before and after the change, with the old and new versions in separate blocks. It worked, but it was verbose. In August 1990, Wayne Davison posted unidiff to comp.sources.misc, volume 14: a new format that interleaved the deletions and insertions into a single block, sharing the context between them, and saved roughly a quarter of the bytes on a typical patch. Richard Stallman folded unidiff support into GNU diff 1.15 in January 1991, the patch tool learned to read it shortly after, and the unified format has been the lingua franca ever since. Git produces it. GitHub shows it. Every code-review tool you have used speaks it.
The two implementations alive today (the BSD line carried in FreeBSD, OpenBSD, NetBSD and macOS; the GNU fork in the GNU project's repositories) both descend from Wall's 1985 source, and both remain interoperable to this day. A bug found in 2020 by Warner Losh, while restoring 2.11BSD, traced a thirty-five-year-old corner case in the original parsing and was triaged identically against both lines. The format is so small that two independent maintainer lines have kept it stable for forty-one years without drifting.
The standard worth keeping is the one small enough to be a sentence. Wall wrote one in 1985, Davison sharpened it in 1990, and every modern code review still depends on it. A format becomes universal when it is small enough to be a sentence. Wall's was.