Technical Beauty ✦ Episode 35
You have typed tcpdump -ni em0 'tcp port 443'
at three in the morning, with a customer on the phone, and
watched the lines scroll past in a small green miracle.
The command was unremarkable. The thing that made it
possible to type at three in the morning has been quietly
doing the work for nearly four decades.
The point of this episode is not that tcpdump
is a useful command, which one rather hopes the reader
already knew. The point is that tcpdump is a
thin tool sitting on a thoughtfully layered architecture,
and the layering is the technical beauty. The thing the
network engineer types is the surface. The thing that
makes the surface work has not needed to be replaced since
1992.
A Short Origin Story
tcpdump was written in 1988 by Van Jacobson,
Craig Leres and Steven McCanne at the Network Research
Group of
Lawrence Berkeley Laboratory.
The same Van Jacobson who, that same year, published
"Congestion Avoidance and Control"
at SIGCOMM '88, the paper that fixed the Internet by
introducing slow start, fast retransmit, and the
congestion-window heuristics that keep TCP from collapsing
under load. tcpdump came out of the same
research programme: in order to understand what TCP was
actually doing on the wire, the researchers needed a tool
that could watch it, and the existing tools (Sun's Network
Interface Tap among them) were not up to the task.
The first version of tcpdump borrowed code
from Sun's etherfind and was rewritten over
time by McCanne, who eventually replaced the underlying
packet-capture mechanism with something new. That
something new became the BSD Packet Filter.
The BSD Packet Filter
In December 1992, McCanne and Jacobson wrote a paper titled "The BSD Packet Filter: A New Architecture for User-level Packet Capture". It was presented at the USENIX Winter 1993 Conference in San Diego, where it won the Best Student Paper award. It is, by any measure, one of the most quietly consequential systems papers of the 1990s.
The problem the paper addressed was simple. Packet capture had been done in userspace, with the kernel copying every packet across the system-call boundary and userspace then filtering it. On busy networks, this was crushingly expensive: most of the packets would be discarded by the userspace filter anyway, and the kernel had paid the cost of copying them in vain. Sun's Network Interface Tap (NIT) improved on this by evaluating a stack-based filter expression inside the kernel, but the evaluator itself was slow.
BPF replaced the stack-based evaluator with a register-based
pseudo-machine: a tiny instruction set running on a few
virtual registers, optimised for predicate evaluation. A
tcpdump filter expression like
tcp port 443 is compiled by libpcap, in
userspace, into a short BPF program. The program is
uploaded to the kernel once, at the start of the capture
session, and the kernel then runs it on every packet
arriving on the chosen interface. Packets that pass the
filter are copied to userspace; packets that fail are
dropped on the floor. The paper measured BPF as up to
twenty times faster than its predecessor's filter, and up
to a hundred times faster than NIT overall on the same
hardware.
The architecture is unusual for a few reasons. The pseudo-machine is small, simple and verifiable: a kernel can inspect a BPF program before running it and reject anything that loops or reaches outside its bounds. The filter compiler lives in userspace, where it can be replaced or improved without touching the kernel. The boundary between userspace and kernel runs through the filter, not around it, which keeps both halves small.
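The flavour of the pseudo-machine is easy to convey with a toy. The sketch below is a drastically simplified illustration in Python, not the real BPF instruction set or encoding: one accumulator, absolute loads, conditional jumps, and a return, running a filter that does roughly what the front half of a compiled tcp predicate does on an Ethernet frame.

```python
# A toy, drastically simplified BPF-style interpreter: one accumulator,
# absolute byte/halfword loads, conditional jumps, and a return.
# An illustration of the architecture, not the real instruction set.

def run_filter(prog, pkt):
    """Run a toy filter program over raw packet bytes.

    Returns the number of bytes to capture (0 means drop)."""
    acc = 0
    pc = 0
    while True:
        op, *args = prog[pc]
        if op == "ldh":            # load 16-bit big-endian value at offset
            off = args[0]
            acc = int.from_bytes(pkt[off:off + 2], "big")
        elif op == "ldb":          # load one byte at offset
            acc = pkt[args[0]]
        elif op == "jeq":          # skip jt/jf instructions on acc == k
            k, jt, jf = args
            pc += jt if acc == k else jf
        elif op == "ret":          # accept (snap length) or drop (0)
            return args[0]
        pc += 1

# Roughly what the front half of 'tcp' compiles to on Ethernet:
# is the EtherType IPv4, and is the IP protocol field TCP (6)?
is_tcp = [
    ("ldh", 12),              # EtherType at offset 12
    ("jeq", 0x0800, 0, 3),    # IPv4? fall through : jump to drop
    ("ldb", 23),              # IP protocol byte (14-byte Ethernet + 9)
    ("jeq", 6, 0, 1),         # TCP? fall through : jump to drop
    ("ret", 65535),           # accept, capture up to 65535 bytes
    ("ret", 0),               # drop
]

# A fake Ethernet+IPv4 header: just enough bytes for the filter to read.
frame = bytearray(34)
frame[12:14] = (0x0800).to_bytes(2, "big")   # EtherType: IPv4
frame[23] = 6                                 # protocol: TCP
```

Nothing here loops backwards and every load is at a fixed offset, which is precisely what makes the real thing cheap to verify before the kernel agrees to run it.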
libpcap, the Library That Ate the World
Around the same time, the LBL group extracted the
packet-capture interface from tcpdump into a
separate library: libpcap. The library
handles the platform-specific business of opening a
capture device, compiling filter expressions into BPF
programs, reading packets back from the kernel, and
parsing the capture-file format. On FreeBSD, OpenBSD,
NetBSD and macOS it uses the BPF device directly. On
Linux, where the BPF device historically did not exist, it
uses an equivalent kernel mechanism
(PF_PACKET, more recently extended via eBPF)
and presents the same API to userland.
The decision to factor libpcap out of
tcpdump is the move that made the rest of the
story possible. By 1995 or so, every serious network tool
that wanted to capture packets was using
libpcap instead of writing its own capture
code. The list grew steadily: tshark and Wireshark and
dumpcap. Zeek (formerly Bro, also originally
from LBL). snort and suricata
for intrusion detection. nmap for port
scanning. ngrep and dsniff. The
entire pcap-based observability ecosystem, from open
source to commercial network forensics. One does rather
get the impression nobody else has bothered to revisit the
problem, which is, in software, the highest compliment a
piece of work can receive.
The library's .pcap capture-file format became
the de facto standard for sharing packet captures between
tools. A .pcap file recorded by
tcpdump on a FreeBSD machine in 1996 can
still be opened by Wireshark on a Linux laptop in 2026.
Few file formats from the early 1990s have aged as
well.
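Part of why the format has aged well is that there is almost nothing to it. The sketch below parses the classic format (not pcapng, and ignoring the nanosecond-timestamp magic variant): a 24-byte global header, then a 16-byte record header and raw bytes per packet, with the magic number revealing the writer's byte order.

```python
import struct

def read_pcap(data):
    """Parse a classic .pcap capture (not pcapng) from bytes.

    Returns (link_type, [(ts_sec, ts_usec, packet_bytes), ...])."""
    magic = data[:4]
    if magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"            # written on a big-endian machine
    elif magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"            # written on a little-endian machine
    else:
        raise ValueError("not a classic pcap file")
    # Global header: magic, major, minor, thiszone, sigfigs,
    # snaplen, link type -- 24 bytes in total.
    _, major, minor, _, _, snaplen, linktype = struct.unpack(
        endian + "IHHiIII", data[:24])
    packets, off = [], 24
    while off < len(data):
        # Per-packet record: ts_sec, ts_usec, incl_len, orig_len.
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[off:off + 16])
        off += 16
        packets.append((ts_sec, ts_usec, data[off:off + incl_len]))
        off += incl_len
    return linktype, packets
```

That is the whole format: no index, no compression, no required metadata beyond the link type, which is a large part of why a 1996 capture still opens today.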
On FreeBSD
tcpdump and libpcap live in the
FreeBSD base system; the kernel provides BPF under
/dev/bpfN.
No ports, no packages, no plugin model. Capturing packets
on a fresh FreeBSD installation is as immediate as
tcpdump -ni em0. The same is true on OpenBSD
and NetBSD.
tcpdump -ni em0 'tcp port 443'
tcpdump -ni em0 'host 10.0.0.5 and not port 22'
tcpdump -ni em0 -w session.pcap 'port 443'
Compact enough for a terminal, structured enough for
awk and grep, composable by
being text on standard output. The man page is, by
network-tooling standards, frightfully short.
On a FreeBSD host, the relationship between the tool, the
library and the kernel device is openly inspectable. The
userland source for tcpdump lives under
/usr/src/contrib/tcpdump. libpcap
lives under /usr/src/contrib/libpcap. The BPF
kernel device is in /usr/src/sys/net/bpf.c. The whole
stack, from the filter language the operator types to the
in-kernel pseudo-machine that evaluates it, is one
repository away from any administrator who wants to
understand what is happening on their machine.
The Linux Branch
Linux did not have BPF in the original sense. It had
libpcap ported over PF_PACKET,
which worked but did not have the same kernel-level filter
mechanism. Through the 2000s, the Linux kernel slowly grew
its own filter mechanism, originally called Linux Socket
Filtering, structurally borrowed from BPF (and explicitly
credited as such in the kernel documentation).
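The borrowing is visible in the wire format. A classic filter program is handed to a Linux socket as an array of fixed 8-byte instructions, each a struct sock_filter (u16 opcode, two u8 jump offsets, u32 operand). The sketch below packs the same accept-IPv4-TCP filter from the 1993 design in that layout; actually attaching it via SO_ATTACH_FILTER needs a raw socket, privileges, and a struct sock_fprog pointer, all omitted here.

```python
import struct

# Classic BPF opcodes, as in <linux/filter.h> and the 1993 paper.
BPF_LD_H_ABS = 0x28   # A <- 16-bit halfword at absolute offset k
BPF_LD_B_ABS = 0x30   # A <- byte at absolute offset k
BPF_JEQ_K    = 0x15   # skip jt/jf instructions depending on A == k
BPF_RET_K    = 0x06   # return constant (bytes to accept; 0 = drop)

def pack_program(insns):
    """Pack (code, jt, jf, k) tuples in the struct sock_filter
    layout: u16 code, u8 jt, u8 jf, u32 k -- 8 bytes each."""
    return b"".join(struct.pack("=HBBI", code, jt, jf, k)
                    for code, jt, jf, k in insns)

# 'Accept IPv4 TCP' on an Ethernet-framed socket:
prog = pack_program([
    (BPF_LD_H_ABS, 0, 0, 12),       # A <- EtherType
    (BPF_JEQ_K,    0, 3, 0x0800),   # IPv4? next : drop
    (BPF_LD_B_ABS, 0, 0, 23),       # A <- IP protocol
    (BPF_JEQ_K,    0, 1, 6),        # TCP? next : drop
    (BPF_RET_K,    0, 0, 0xFFFF),   # accept up to 65535 bytes
    (BPF_RET_K,    0, 0, 0),        # drop
])
```

The accumulator, the absolute loads, the relative jumps: the kernel that accepts these bytes in 2026 is consuming, almost unchanged, the instruction format McCanne and Jacobson described.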
In December 2014, Alexei Starovoitov extended this
mechanism into what is now called
eBPF:
a wider register file, new instructions, a verifier in the
kernel, and the ability to attach BPF programs to kernel
events other than packet capture (system calls,
tracepoints, kprobes). eBPF is what powers a great deal of
contemporary Linux observability tooling: bcc,
bpftrace, the entire Cilium networking stack,
much of modern container observability. The Linux feature
that everyone has rather suddenly noticed in 2024 is BPF
with new opcodes and a verifier. The lineage runs back to
the same 1993 paper.
It is worth pausing on this. An architecture decision
made by two researchers at LBL in 1992,
originally to make tcpdump fast enough to
capture packets on a busy Ethernet, is the foundation of
more or less every modern observability story being
written in 2026. The thing the engineer types still looks
the same. The cleverness underneath has been quietly
extended and re-extended for three decades.
The Point
The technical beauty of tcpdump is not the
tool. The tool is small. The technical beauty is the way
the responsibilities are split.
The user types a filter expression and reads a one-line summary per packet, because that is what the operator's brain can hold at three in the morning. The library compiles the expression and parses the bytes, because that is the bit where the format details live. The kernel runs the filter and drops everything else, because that is the bit where performance matters. Each layer does the one thing it is in the best position to do, and the boundaries between them are clean enough that the layers can be replaced independently. eBPF replaced the kernel layer; the others stayed where they were.
A man who had also given the world TCP congestion control wrote, with two LBL colleagues, the tool every network engineer reaches for first and the architecture every serious network tool sits on. One rather suspects he knew what the kernel was actually for.
Filter language at the surface, compiler in userspace, register-based pseudo-machine in the kernel: thirty years and one major Linux extension later, the surface still looks the same.