Technical Beauty · Episode 25
The Missing Tool
By 2012, JSON had become the undisputed lingua franca of the web. Every API returned it. Every configuration file spoke it. Every log aggregator produced it. And the Unix toolbox, refined over forty years for line-oriented text, had precisely nothing for it.
grep sees lines. awk sees fields. sed sees patterns. None of them understands a tree.

A JSON document is not flat. It nests. It branches. It contains arrays within objects within arrays, and the moment you need to extract a value three levels deep, every classic Unix tool reaches for its hammer and discovers the nail is a fractal.
Stephen Dolan, a PhD student at the University of Cambridge, resolved this quietly in 2012. He did not write a viewer. He did not write a validator. He wrote a language.
A Language, Not a Utility
This is the distinction that earns jq its place among beautiful tools.
grep is a filter. sed is a transformer. awk is, admittedly, also a language, but one designed for tabular records. jq is a complete functional programming language with generators, backtracking, and immutable values, compiled to a stack-based bytecode VM.

It exists for precisely one purpose: transforming JSON on the command line. stdin in, stdout out. The Unix pipeline, applied to structured data.
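Those generators are worth a concrete look, because they are the evaluation model rather than a library feature. The expression .[] does not return an array of results; it yields each element of its input in turn, and everything downstream of the pipe runs once per yielded value. A minimal illustration, assuming a jq binary on the PATH:

```shell
# .[] is a generator: it yields 1, then 2, then 3,
# and the doubling filter downstream runs once per yielded value
echo '[1, 2, 3]' | jq '.[] | . * 2'
# prints:
# 2
# 4
# 6
```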
The comparison is instructive. To extract a nested value from an API response, Python needs two imports and a runtime:
python3 -c "import json,sys; print(json.load(sys.stdin)['results'][0]['name'])"
jq needs one expression:
jq '.results[0].name'
Same output. 79 characters versus 21. One requires a 50 MB runtime installed on the target machine. The other is an 822 KB binary that links against nothing but libc. One is a general-purpose language pressed into service for a specific job. The other was born for it.
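The one-liner is easy to verify end to end. A toy response standing in for the API (the data below is invented for illustration):

```shell
# extract the first result's name from a synthetic API response
echo '{"results": [{"name": "jq"}, {"name": "jaq"}]}' \
  | jq '.results[0].name'
# prints: "jq"
```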
The Vital Statistics
Approximately 510 KB of C source code, spread across 21 files. A single binary: 822 KB on macOS ARM, 2.1 MB on Linux when statically linked. 33,700+ GitHub stars and 214 contributors, under the MIT licence since its first release in 2012, unchanged.
Those are the numbers. Here is what they mean: jq is smaller than the lock file of many Node.js projects. It is an order of magnitude smaller than the node_modules folder of a "Hello World" Express application. And it has been parsing JSON reliably since before Docker existed, without a single breaking change to its core language.
The Spec That Arrived Twelve Years Late
The language had no formal specification until 2024. For twelve years, jq was defined entirely by its own source code. If you wanted to know what .foo | select(. > 3) did, you read the implementation, or you ran it. The documentation was useful. The source was canonical.
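Running it is exactly what the sentence suggests. select(f) passes its input through when f is truthy and yields nothing at all otherwise; failure is empty output, not an error. A quick check, assuming jq is installed:

```shell
# select passes the value through, or produces nothing -- no error either way
echo '{"foo": 5}' | jq '.foo | select(. > 3)'   # prints: 5
echo '{"foo": 2}' | jq '.foo | select(. > 3)'   # prints nothing, exits 0
```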
In March 2024, Michael Färber published the first formal specification on arXiv. Twelve years after Dolan wrote the implementation. The specification agreed with the code. Not the other way round.
Rather Unix, that.
The implication is worth pausing on. A language whose behaviour was so consistent, so unsurprising, so obviously correct that it ran for twelve years without a formal spec and nobody noticed the absence. The spec, when it arrived, was descriptive, not prescriptive. It documented what jq already did. Try that with JavaScript.
The Streaming Parser
This is the section where jq quietly demonstrates that being small does not mean being limited.
In regular mode, jq loads the entire JSON document into memory, builds the tree, and operates on it. For a typical API response, this is instant. For ten million JSON items, it consumes approximately 5.5 GB of memory.
In streaming mode (--stream), jq emits path-value pairs as it reads, without building the tree. The same ten million items: 2 MB. At three in the morning, with a 40 GB log file and a production server that would rather not swap, that difference is not academic.
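The path-value pairs are easy to see on a small document. Each leaf arrives as a [path, value] event, and a bare [path] event marks the close of a container, so jq never needs to hold more than one event at a time (again assuming jq on the PATH):

```shell
# --stream turns a tree into a flat sequence of events
echo '{"a": {"b": 1}}' | jq -c --stream
# prints:
# [["a","b"],1]
# [["a","b"]]
# [["a"]]
```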
The Sincerest Form of Flattery
When the community decided that jq could be improved, they did not replace the language. They rewrote the engine.
jaq, written in Rust by Michael Färber (the same person who later formalised the spec), is 5 to 10 times faster on some benchmarks. gojq, written in Go, is a pure reimplementation. Both speak the same syntax. Both accept the same filters. Both produce the same output.
This is the strongest evidence of design quality a tool can receive. One does not improve sed by inventing a new grammar. One rewrites the engine. The jq language was so precisely right that two independent reimplementations in two different languages adopted it wholesale. The design was correct the first time. The performance was not.
The Custodian Changes, the Tool Endures
Dolan left the project around 2019 for Jane Street, where he now works on the OCaml compiler. A PhD in Algebraic Subtyping, supervised by Alan Mycroft at Cambridge, followed by a career designing type systems for a trading firm. The pedigree explains the language: jq feels like it was written by someone who understands type theory. It was.
The community formed jqlang. Version 1.7 arrived in September 2023, after a five-year gap since 1.6. Version 1.8 followed in 2025. The tool did not notice the absence of its creator. It continued parsing. One might observe that this is rather the point of writing a tool that does one thing well.
The Point
Forty years of Unix tools for flat text. Forty years of grep and awk and sed and sort and cut and paste, each doing one thing brilliantly, all of them blind to the data format that had quietly taken over the world.
One PhD student at Cambridge added trees. He wrote a functional language with generators and backtracking, compiled it to a bytecode VM, shipped it as an 822 KB binary, and walked away to work on type theory. The community rewrote the engine twice, in two different languages, and kept the grammar untouched. Someone formalised the specification twelve years later and found it agreed with the implementation. The tool spent five years without its creator and continued parsing without complaint.
822 KB. Technical beauty emerges from reduction.