Technical Beauty ■ Episode 26
In 1974, two programmers at Bell Labs had a problem. They needed to know what changed between two versions of a file. Not the files themselves. The difference.
James Hunt implemented the algorithm. Douglas McIlroy designed the framework around it. The result was diff: a tool that compares two files and outputs exactly what changed. Nothing more.
McIlroy had already invented Unix pipes in 1973: the idea that programs should pass text to each other through simple connectors. diff was the natural consequence: if everything flows as text, you need a tool that can describe how one text became another.
The Basics
diff old.conf new.conf
Two files in. One output: every line that differs. That is the entire interface.
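To make that output concrete, here is a small sketch. The file contents and the scratch directory are invented for illustration; only the diff invocation itself comes from the text above. It assumes a POSIX shell and a standard diff.

```shell
# Hypothetical sample files, created in a scratch directory.
tmp=$(mktemp -d)
printf 'host=example\nport=80\ndebug=false\n' > "$tmp/old.conf"
printf 'host=example\nport=8080\ndebug=false\n' > "$tmp/new.conf"

# diff exits 0 if the files match, 1 if they differ, 2 on trouble,
# so a non-zero status here is information, not failure.
status=0
normal=$(diff "$tmp/old.conf" "$tmp/new.conf") || status=$?

printf '%s\n' "$normal"
# 2c2
# < port=80
# ---
# > port=8080
echo "exit status: $status"
```

"2c2" says line 2 of the old file changed into line 2 of the new one. Lines starting with "<" come from the old file, lines starting with ">" from the new.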
diff -u old.conf new.conf
Unified format. Minus means removed, plus means added. This is what git diff shows you. This is what every pull request renders. The format dates from 1990. It has not changed since.
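A sketch of what the unified format looks like, again with invented file contents. The first two lines of real output are the "---" and "+++" file headers, which carry paths and timestamps that vary from run to run, so this example strips them and shows only the hunk.

```shell
# Hypothetical sample files, created in a scratch directory.
tmp=$(mktemp -d)
printf 'host=example\nport=80\ndebug=false\n' > "$tmp/old.conf"
printf 'host=example\nport=8080\ndebug=false\n' > "$tmp/new.conf"

# diff exits 1 when the files differ; "|| true" keeps that from
# aborting a script that runs under "set -e".
unified=$(diff -u "$tmp/old.conf" "$tmp/new.conf") || true

# Drop the two header lines (--- old, +++ new) and keep the hunk.
hunk=$(printf '%s\n' "$unified" | sed '1,2d')

printf '%s\n' "$hunk"
# @@ -1,3 +1,3 @@
#  host=example
# -port=80
# +port=8080
#  debug=false
```

The "@@ -1,3 +1,3 @@" line locates the hunk: it starts at line 1 and spans 3 lines in the old file, and likewise in the new one. Lines with a leading space are unchanged context.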
The Workflow That Built Open Source
diff -u old.conf new.conf > change.patch
patch < change.patch
Larry Wall wrote patch in 1985. Two commands: diff creates the patch, patch applies it. Before the web, before GitHub, before pull requests, developers exchanged patches via email and Usenet. Tiny text files describing changes. diff made them. patch applied them.
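The round trip can be sketched end to end. This assumes diff, patch and cmp are installed; the file contents and the name theirs.conf are made up to stand in for a recipient's own copy of the old file. Unlike the bare command above, the file to patch is named explicitly, so patch does not have to infer it from the headers.

```shell
# Hypothetical sample files, created in a scratch directory.
tmp=$(mktemp -d)
printf 'host=example\nport=80\ndebug=false\n' > "$tmp/old.conf"
printf 'host=example\nport=8080\ndebug=false\n' > "$tmp/new.conf"

# Author side: describe the change as a small text file.
# diff exits 1 because the files differ, hence "|| true".
diff -u "$tmp/old.conf" "$tmp/new.conf" > "$tmp/change.patch" || true

# Recipient side: they hold only their copy of the old file and the patch.
cp "$tmp/old.conf" "$tmp/theirs.conf"
patch "$tmp/theirs.conf" < "$tmp/change.patch"

# cmp -s prints nothing and exits 0 when the files are byte-identical.
if cmp -s "$tmp/theirs.conf" "$tmp/new.conf"; then
  result="patched copy matches new.conf"
else
  result="patched copy differs"
fi
echo "$result"
```

The patch file is the only thing that has to travel between the two sides, which is why it fitted so well into email and Usenet.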
Eric Raymond called patch "the single tool that did more than any other to enable collaborative development over the Internet."
The Algorithm
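In 1986, Eugene Myers published "An O(ND) Difference Algorithm and Its Variations" in Algorithmica. The key insight: finding the shortest edit script is equivalent to finding the shortest path in a graph. The running time is O(ND), where N is the combined length of the two files and D is the size of that shortest script. When files are similar (and they usually are), D is small, and this is dramatically faster than brute force.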
Git uses Myers by default. Almost every git diff you have ever read was computed by an algorithm from a 1986 paper.
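A rough sketch of the greedy search at the heart of that paper, assuming a POSIX shell with awk (plain shell has no arrays, so awk does the work). The function name myers_distance is invented for illustration. It reports only D, the number of line deletions plus insertions in a shortest edit script, and does not print the script. Real implementations in diff and Git add refinements on top of this idea; this is not their code.

```shell
# myers_distance OLD NEW
# Prints D: line deletions plus insertions in a shortest edit script.
# Assumes OLD and NEW are two different paths.
myers_distance() {
  awk '
    FILENAME == ARGV[1] { a[n++] = $0; next }   # lines of OLD, 0-indexed
                        { b[m++] = $0 }         # lines of NEW, 0-indexed
    END {
      n += 0; m += 0            # force numeric zero for empty files
      # v[k] is the furthest x reached so far on diagonal k = x - y,
      # where x counts consumed OLD lines and y consumed NEW lines.
      v[1] = 0
      for (d = 0; d <= n + m; d++) {
        for (k = -d; k <= d; k += 2) {
          if (k == -d || (k != d && v[k - 1] < v[k + 1]))
            x = v[k + 1]        # step down: insert a line from NEW
          else
            x = v[k - 1] + 1    # step right: delete a line from OLD
          y = x - k
          # Follow the diagonal for free while lines match.
          # Appending "" forces a string comparison.
          while (x < n && y < m && (a[x] "") == (b[y] "")) { x++; y++ }
          v[k] = x
          if (x >= n && y >= m) { print d; exit }
        }
      }
    }
  ' "$1" "$2"
}

# Hypothetical sample files: one line differs.
tmp=$(mktemp -d)
printf 'host=example\nport=80\ndebug=false\n' > "$tmp/old.conf"
printf 'host=example\nport=8080\ndebug=false\n' > "$tmp/new.conf"

distance=$(myers_distance "$tmp/old.conf" "$tmp/new.conf")
echo "$distance"   # 2: one line deleted, one line added
```

The outer loop tries edit scripts of size 0, 1, 2 and so on, and stops at the first size that reaches the end of both files. That is why similar files are cheap: the loop ends after D rounds, however long the files are.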
The Point
The same man who invented the pipe invented the tool that describes change. Every code review is a diff. Every pull request is a diff. Every CI pipeline that decides what to test starts with a diff. Every version control system since RCS either stores history as diffs or shows it to you as diffs.
GNU diffutils: 11,474 lines of code. Fifty-two years in production. No subscription. No breaking changes. No rewrite.
The tool that enabled collaborative software development is older than most of the people using it. Rather reassuring, that.