Vivian Voss

diff: The Tool That Enabled Collaboration

unix tooling devops

Technical Beauty ■ Episode 26

In 1974, two programmers at Bell Labs had a problem. They needed to know what changed between two versions of a file. Not the files themselves. The difference.

James Hunt implemented the algorithm. Douglas McIlroy designed the framework around it. The result was diff: a tool that compares two files and outputs exactly what changed. Nothing more.

McIlroy had already invented Unix pipes in 1973. The idea that programmes should pass text to each other through simple connectors. diff was the natural consequence: if everything flows as text, you need a tool that can describe how one text became another.

The Basics

diff old.conf new.conf

Two files in. One output: every line that differs. That is the entire interface.

diff -u old.conf new.conf

Unified format. Minus means removed, plus means added. This is what git diff shows you. This is what every pull request renders. The format dates from 1990. It has not changed since.

The Workflow That Built Open Source

diff -u old.conf new.conf > change.patch
patch < change.patch

Larry Wall wrote patch in 1985. Two commands: diff creates the patch, patch applies it. Before the web, before GitHub, before pull requests: developers exchanged patches via email and Usenet. Tiny text files describing changes. diff made them. patch applied them.

Eric Raymond called patch "the single tool that did more than any other to enable collaborative development over the Internet."

The diff Workflow File A old version File B new version diff 1974 .patch the delta patch RCS / Git delta storage

The Algorithm

In 1986, Eugene Myers published "An O(ND) Difference Algorithm" in Algorithmica. The key insight: finding the shortest edit script is equivalent to finding the shortest path in a graph. When files are similar (and they usually are), this is dramatically faster than brute force.

Git uses Myers by default. Every git diff you have ever read was computed by a paper from 1986.

The Chain

The Chain diff Built 1974 diff Hunt, McIlroy, Bell Labs 1982 RCS version history stored as diffs 1985 patch Larry Wall (later created Perl) 1986 Myers algorithm Algorithmica 1990 unified format Wayne Davison 2005 Git adopts Myers as default

The Point

The same man who invented the pipe invented the tool that describes change. Every code review is a diff. Every pull request is a diff. Every CI pipeline that decides what to test starts with a diff. Every version control system since RCS stores history as a sequence of diffs.

GNU diffutils: 11,474 lines of code. Fifty-two years in production. No subscription. No breaking changes. No rewrite.

The tool that enabled collaborative software development is older than most of the people using it. Rather reassuring, that.