The Unix Way ⊣ Episode 19
There is a class of error message that every engineer who
has touched a long-running system knows by reflex.
"Cannot remove: text file busy." "Address already in
use." "Mount point is busy." And, perhaps best of all,
the silent contradiction in which df
reports a filesystem at ninety per cent while
du can only account for half the bytes.
They are different messages, but they all refuse the
same question: which process is holding this open?
FreeBSD and Linux both answer the question, and they answer it properly. They keep their own tools, in their own shapes, for what is at heart the same enquiry to the kernel. The tools, and the shapes they prefer, are the substance of this episode.
FreeBSD: procstat (and fstat)
The modern tool is
procstat(1),
written by Robert N M Watson, which first appeared in FreeBSD
7.1 in 2009. Its queries now go through
libprocstat(3),
a C library that joined the base system in FreeBSD 9.0 (2012)
and that any program needing process state can link against
directly. In the base system, procstat, fstat and
fuser all draw from that one
common source, and each command-line utility is a
thin client of the library. procstat is
the per-process introspection front-end:
procstat -f 4711 # file-descriptor view, one process
procstat -af # file-descriptor view, all processes
procstat -t 4711 # thread view, with states
procstat -v 4711 # virtual-memory map
procstat -k 4711 # kernel stack
procstat -s 4711 # security credentials
Different flag, same tool, same library underneath.
procstat is the focused interview with one
process; the library is the quietly important half,
because the next tool that needs to ask the same
questions does not have to parse text or scrape
/proc. It links against
libprocstat, just as procstat and
fstat themselves do.
Its classical sibling is
fstat(1),
which has served the base system since 4.3BSD-Tahoe in
1988 and is still installed by default. Where
procstat is the per-process interview,
fstat is the system-wide ledger: every open
file, by every process, on every filesystem the kernel
knows. A handful of flags narrows the listing:
fstat -p 4711 # every open file held by pid 4711
fstat -u www # everything the www user has open
fstat /var/log/messages # every process holding this exact path
fstat -f /var # everything open on the filesystem that contains /var
fstat is wonderfully general, and for one
specific diagnostic it remains the first reach. The case
that most often justifies either tool in production is
the deleted-but-held file. A long-running daemon (a
database, a log forwarder, a web server) opens a
logfile, then someone or something unlinks the file from
the filesystem while the daemon still holds it open. The
directory entry is gone, so du cannot see
the bytes; but the inode and its blocks remain allocated
until the last fd is closed, so df still
counts them. The disk fills, the alert fires, no obvious
file is to blame.
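The mechanism is easy to reproduce on a scratch directory, on either platform. What follows is a minimal sketch rather than a diagnostic tool: the function name held_after_unlink and the file name app.log are invented for the illustration, and it assumes only a POSIX shell with mktemp available.

```shell
#!/bin/sh
# Sketch: reproduce a deleted-but-held file in a scratch directory.
# held_after_unlink and app.log are illustrative names, not part of any tool.
held_after_unlink() {
    dir=$(mktemp -d) || return 1
    printf 'still here\n' > "$dir/app.log"
    exec 3< "$dir/app.log"   # hold the file open on descriptor 3
    rm "$dir/app.log"        # unlink it: the directory entry disappears
    if [ -e "$dir/app.log" ]; then visible=yes; else visible=no; fi
    content=$(cat <&3)       # the data is still readable through the descriptor
    exec 3<&-                # closing the last descriptor lets the blocks go
    rmdir "$dir"
    printf '%s %s\n' "$visible" "$content"
}
```

Between the rm and the final exec, the path is gone but the bytes are not, which is exactly the window in which du and df disagree.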
fstat will show such a file with a
- in the fifth field of its output, the field the
filter below tests; confirm the column against the header line on your release first:
fstat -p 4711 | awk '$5=="-"' # files this pid holds but that are unlinked
The cure is to close the fd. In practice that means a
graceful restart, a SIGHUP to a daemon that
knows how to reopen its log, or in well-designed
services a deliberate logrotate hook. Identifying the
offender takes seconds with the right tool. Without it,
it takes an evening.
Linux: lsof
On Linux the same question is answered by
lsof(8),
written by Vic Abell at Purdue University and first
released to comp.sources.unix in 1991. Two
years before that, in 1989, he had published ports of
BSD's fstat and ofiles
commands to DYNIX, SunOS and ULTRIX, and
lsof is in a direct line of descent: it is,
recognisably, a generalisation of the BSD idea, written
to work across the many Unix-likes that the early 1990s
contained.
lsof is, in the kindest sense, the universal
hammer. It understands regular files, directories,
sockets (TCP, UDP, raw, Unix-domain), pipes, anonymous
inodes, character and block devices, the various
synthetic entries the kernel exposes through
/proc, and a great many things besides. The
flag vocabulary is dense; the basics suffice for most
working questions:
lsof -p 4711 # everything pid 4711 has open
lsof +D /var/log # everything open beneath a directory tree
lsof -i :443 # processes holding TCP/UDP port 443
lsof -u www-data # everything one user has open
lsof +L1 # files with link count < 1: deleted but held
lsof -n -P # skip DNS and service-name lookups (faster)
lsof +L1 is the Linux answer to the FreeBSD
awk '$5=="-"' pattern: name the
deleted-but-held files and the processes holding them.
It is the command you reach for when df and
du disagree, and it is the right reach the
first time.
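On a Linux box where lsof is not to hand, a rough approximation for a single process can be had from /proc/<pid>/fd, whose symbolic links carry a " (deleted)" suffix once the target has been unlinked. This is a sketch under that assumption, not a substitute for lsof: deleted_fds is a hypothetical helper name, it works on Linux only, and a file whose real name happens to end in " (deleted)" would fool it.

```shell
#!/bin/sh
# Sketch (Linux only): list descriptors of one process whose target is unlinked.
# deleted_fds is a hypothetical helper, not part of lsof or any package.
deleted_fds() {
    pid=$1
    for link in /proc/"$pid"/fd/*; do
        # A descriptor may close while we look; skip links we cannot read.
        target=$(readlink "$link") || continue
        case $target in
            *' (deleted)') printf '%s\t%s\n' "${link##*/}" "$target" ;;
        esac
    done
}
```

Calling deleted_fds 4711 prints one line per held descriptor: the descriptor number, a tab, and the path the file used to have. Reading another user's /proc/<pid>/fd generally needs root.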
The thing to understand about lsof on modern
Linux is that almost everything it reports is derived
from /proc. Each running process has a
directory /proc/<pid>/ containing a
fd/ subdirectory of symbolic links to the
actual files (or sockets, or pipes) that file
descriptors point to, and the kernel-maintained
net/ files describe socket state.
lsof is, in a real sense, a structured tour
of /proc, presented as one consistent
output. If /proc is the kernel's public
filing cabinet, lsof is the librarian who
knows where everything is.
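The socket half of that tour can be sketched the same way. The text tables under /proc/net list each socket's local address and port in hexadecimal, its state, and its inode; a listening socket has state 0A. The helper below, whose name listen_inodes is an assumption made for this illustration, reads such a table on standard input and prints the inodes of sockets listening on a given port. It is a simplified picture of the kind of lookup lsof -i performs, not a description of lsof's actual code.

```shell
#!/bin/sh
# Sketch: print inodes of sockets in LISTEN state (0A) on a given port,
# reading a /proc/net/tcp-style table from standard input.
# listen_inodes is a hypothetical helper name.
listen_inodes() {
    port_hex=$(printf '%04X' "$1")   # 443 becomes 01BB
    awk -v p="$port_hex" '
        NR > 1 && $4 == "0A" {
            n = split($2, a, ":")    # local address is HEXADDR:HEXPORT
            if (a[n] == p) print $10 # field 10 is the socket inode
        }'
}
```

Typical use would be listen_inodes 443 < /proc/net/tcp, followed by a search of /proc/*/fd for a link reading socket:[INODE] to name the owning process.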
The Shape
Two answers to the same question, two different shapes.
FreeBSD splits the work. procstat is the
focused per-process introspection, fstat
the broad system view that has been part of the base
system since 1988, libprocstat the C
library that any tool can link against to ask the same
questions in its own way. Each piece does one thing and
does it well; the pieces compose, and the next tool that
needs to know what process holds what does not need to
parse text or scrape /proc. It reads from
libprocstat, as procstat and
fstat do.
Linux folds the same work into one large, lovingly
maintained binary. lsof understands every
kind of open file and every kind of socket; on Linux it reads
the per-process directories and the socket tables under /proc
(on other Unix dialects it reads kernel memory instead), and
it presents the union as one consistent output. For most of
its history there has been no liblsof, because lsof is
itself the interface; its sustained existence as one
program, with one author and now one maintainer
organisation, is what makes its breadth possible.
Neither shape is wrong. The Linux model trades modularity
at the C-library layer for completeness at the
command-line layer, and the result is a single tool that
handles every variety of "open" the kernel exposes. The
FreeBSD model trades that single-binary breadth for a
layered design in which any program that needs process
state can ask the same library, in the same way, and the
user-facing tools (procstat,
fstat, fuser) are thin clients of that library. The first
is delightful when you have one weird question and a
hurry to answer it. The second is delightful when you
are writing the next tool, or when you want to be sure
that fstat and your monitoring agent
are both reading the same kernel reality.
The Point
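procstat and fstat are in the FreeBSD base system; lsof is packaged by every mainstream Linux distribution, though a minimal install may omit it. Neither asks you to install a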
runtime. Neither wants JSON. Whether your daily reach is
procstat -f or lsof +L1, the
operative discipline is the same: when a system refuses
to do what you asked, ask it precisely what is in the
way. The kernel will tell you, plainly, on either
platform.
The Unix way prefers parts that compose;
lsof prefers parts that arrive together. The right answer to "which one?" is "whichever ships with the box in front of you, and learn its flags before the page goes off."
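For those who work on both platforms, that advice fits in a few lines of shell. The sketch below only prints the per-process command named in this episode for a given kernel name; open_files_cmd is a hypothetical helper, and it deliberately runs nothing.

```shell
#!/bin/sh
# Sketch: print (not run) the per-process open-file command for a kernel name.
# open_files_cmd is a hypothetical helper name.
open_files_cmd() {
    os=$1
    pid=$2
    case $os in
        FreeBSD) printf 'procstat -f %s\n' "$pid" ;;
        Linux)   printf 'lsof -p %s\n' "$pid" ;;
        *)       printf 'unsupported: %s\n' "$os" >&2; return 1 ;;
    esac
}
```

A caller would pass it the output of uname -s and a pid, for example open_files_cmd "$(uname -s)" 4711.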