Vivian Voss

The Replacement: Shell Monitoring vs Prometheus

unix devops freebsd

The Replacement ■ Episode 06

Late 1990s. Sysadmin training in Germany. The instructors were former mainframe operators who had reluctantly accepted that Unix existed and was not going away. They taught us two things above all else: the shell is your interface, and GUIs are for people who do not understand what they are clicking.

We had a word for graphical administration tools: Klickibunti. Roughly translated: "clicky-colourful." It was not a compliment.

The Modern Monitoring Stack

Fast forward to 2026. You have a handful of servers. Perhaps five, perhaps ten. You would like to know when something goes wrong. The industry offers you a solution.

Prometheus scrapes metrics from exporters on every node. Grafana visualises them in dashboards that nobody looks at until the phone rings. Alertmanager routes alerts through a YAML configuration file that takes longer to write than the check itself. Loki aggregates your logs, because apparently reading them on the machine where they were written is insufficiently sophisticated.

Four services. Each with its own configuration, its own storage, its own failure modes, its own update cycle. A monitoring stack that, rather ironically, needs its own monitoring.

When did you last look at those dashboards, incidentally? Before or after the alert email arrived?

The Shell Alternative

Here is what I run instead. A POSIX shell script. It checks df for disk usage, zpool status for ZFS health, swapinfo for memory pressure. It verifies that each jail is running. It checks that critical network services respond. If anything exceeds a threshold or fails to answer, it sends an email. If nothing is wrong, it sends nothing. The listing below shows the disk, pool, and jail checks.

#!/bin/sh
# health-check.sh - runs every 5 minutes via cron

THRESHOLD=85
MAILTO="admin@example.com"
HOST=$(hostname)

# Disk usage
df -h | awk -v limit="$THRESHOLD" 'NR>1 && int($5)>limit {print $6": "$5}' | while read -r line; do
    echo "Disk warning on $HOST: $line" | mail -s "DISK: $HOST" "$MAILTO"
done

# ZFS pool health
# grep -q keeps the check silent, so cron does not mail stray output
zpool status -x | grep -qv "all pools are healthy" && \
    zpool status | mail -s "ZFS: $HOST" "$MAILTO"

# Jail check
for jail in web mail dns; do
    jls -j "$jail" > /dev/null 2>&1 || \
        echo "Jail $jail is DOWN" | mail -s "JAIL: $HOST" "$MAILTO"
done

Scheduled by cron. Five minutes. Every day. No daemon to babysit, no time-series database to rotate, no dashboard to refresh. No mail means no problem.

*/5 * * * * /usr/local/bin/health-check.sh

One line in the crontab. That is the entire monitoring infrastructure.
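
The listing above covers the disk, pool, and jail checks. The swap and service checks mentioned earlier are not shown, so here is a minimal sketch of how they might look. The 50 percent swap threshold and the function names swap_pct, check_swap, and check_port are assumptions for illustration, not part of the original script. The sketch assumes that swapinfo -k prints a header line followed by one line per device, with a Total line last when there is more than one device, and that nc is available for the port probe.

```shell
#!/bin/sh
# Sketch only: swap and service checks in the style of health-check.sh.
# SWAP_THRESHOLD and all function names are assumptions for illustration.

SWAP_THRESHOLD=50
MAILTO="admin@example.com"
HOST=$(hostname)

# Read `swapinfo -k` style output on stdin and print the usage percentage
# from the last data line (the "Total" line when several devices exist).
# Prints 0 when only the header is present, i.e. no swap is configured.
swap_pct() {
    awk 'NR > 1 { pct = $5 } END { print int(pct) }'
}

# Mail a warning when swap usage exceeds the threshold.
check_swap() {
    pct=$(swapinfo -k | swap_pct)
    if [ "$pct" -gt "$SWAP_THRESHOLD" ]; then
        echo "Swap usage on $HOST: ${pct}%" | mail -s "SWAP: $HOST" "$MAILTO"
    fi
}

# Mail a warning when a TCP port does not accept a connection within 3 seconds.
# $1 = host, $2 = port. This tests reachability only, not the protocol itself.
check_port() {
    nc -z -w 3 "$1" "$2" > /dev/null 2>&1 || \
        echo "Service $1:$2 is not answering" | mail -s "SERVICE: $HOST" "$MAILTO"
}
```

Inside the script these would be called as check_swap and, for a local mail server, something like check_port 127.0.0.1 25. Keeping the parsing in swap_pct separate from the mailing makes it possible to try the parser against saved swapinfo output on any machine.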

Figure: The Same Job. Two Approaches.

Modern Monitoring Stack: Prometheus (scraping) + Grafana (dashboards) + Alertmanager (routing) + Loki (logs) = 4 services, each needing config, storage, and updates.

Unix Shell Monitoring: cron (scheduling) + sh (checks) + mail (alerting) = 3 tools, pre-installed on every Unix system since the 1970s. 0 additional services to maintain.

ZFS: The Quotas That Save You at 3 AM

The beauty of this approach is that it pairs well with infrastructure that was designed to be administered, not merely observed. On FreeBSD, ZFS datasets with quotas mean that one misbehaving jail cannot consume all the disk space on the host. Each dataset has a ceiling. When it hits that ceiling, only that jail suffers. The host remains healthy. The other jails remain healthy.

# Set a 20G quota on the web jail dataset
zfs set quota=20G zroot/jails/web

# Check usage per dataset
zfs list -o name,used,quota,avail

This is not monitoring in the Grafana sense. This is architecture. You design the system so that failures are contained, and then you write a script to tell you when containment is under pressure. Prevention first, notification second.
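
As one way to notice when containment is under pressure, the check below compares each dataset's usage with its quota. It is a sketch under stated assumptions: the 90 percent threshold and the function names quota_pressure and check_quotas are invented for illustration, and it relies on zfs list -Hp printing tab-separated fields with exact byte values.

```shell
#!/bin/sh
# Sketch only: warn when a dataset approaches its ZFS quota.
# QUOTA_THRESHOLD and both function names are assumptions for illustration.

QUOTA_THRESHOLD=90
MAILTO="admin@example.com"
HOST=$(hostname)

# Read tab-separated "name<TAB>used<TAB>quota" lines (byte values) on stdin
# and print every dataset whose usage is at or above the threshold.
# Datasets without a quota (0, "none" or "-") are skipped.
quota_pressure() {
    awk -F '\t' -v limit="$QUOTA_THRESHOLD" '
        $3 + 0 > 0 {
            pct = int($2 * 100 / $3)
            if (pct >= limit) printf "%s: %d%% of quota\n", $1, pct
        }'
}

# Collect all datasets under pressure and send a single mail, or nothing.
check_quotas() {
    report=$(zfs list -Hp -o name,used,quota | quota_pressure)
    if [ -n "$report" ]; then
        echo "$report" | mail -s "QUOTA: $HOST" "$MAILTO"
    fi
}
```

A dataset using 19.2G of a 20G quota sits at about 96 percent and would be reported. Datasets without a quota are skipped, which is a reminder that the check only helps where a ceiling has been set.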

Figure: ZFS Quota Isolation. One full jail cannot starve the others.

zroot (host pool)
    jails/web     quota: 20G    19.2G / 20G    alert sent
    jails/mail    quota: 50G    18G / 50G      healthy
    jails/dns     quota: 5G     1G / 5G        healthy

The Scale Argument

I can hear the objection forming. "But this does not scale." Correct. If you manage 500 servers across multiple data centres, you need Prometheus. You need Grafana. You probably need an entire SRE team. This article is not for you. You already know what you need, and you have the budget to pay for it.

This article is for the rest of us. The ones running five servers, or ten. The freelancer with a VPS and a FreeBSD jail. The small team with a handful of machines that do real work. For us, the monitoring stack is not a solution. It is overhead. It is another thing that can break, another thing that needs updating, another thing that consumes resources that should be spent on the actual workload.

For five to ten servers, a shell script and a mail client are not a compromise. They are the correct tool for the job.

The best monitoring system is the one you actually read. If that is your inbox, use your inbox.