18 January 2026

The Replacement: Shell Monitoring vs Prometheus

unix devops freebsd

The Replacement ■ Episode 06

Late 1990s. Sysadmin training in Germany. The instructors were former mainframe operators who had reluctantly accepted that Unix existed and was not going away. They taught us two things above all else: the shell is your interface, and GUIs are for people who do not understand what they are clicking.

We had a word for graphical administration tools: Klickibunti. Roughly translated: "clicky-colourful." It was not a compliment.

The Modern Monitoring Stack

Fast forward to 2026. You have a handful of servers. Perhaps five, perhaps ten. You would like to know when something goes wrong. The industry offers you a solution.

Prometheus scrapes metrics from exporters on every node. Grafana visualises them in dashboards that nobody looks at until the phone rings. Alertmanager routes alerts through a YAML configuration file that takes longer to write than the check itself. Loki aggregates your logs, because apparently reading them on the machine where they were written is insufficiently sophisticated.

Four services. Each with its own configuration, its own storage, its own failure modes, its own update cycle. A monitoring stack that, rather ironically, needs its own monitoring.

When did you last look at those dashboards, incidentally? Before or after the alert email arrived?

The Shell Alternative

Here is what I run instead. A POSIX shell script. It checks df for disk usage, zpool status for ZFS health, swapinfo for swap pressure. It verifies that each jail is running. It checks that critical network services respond. If anything exceeds a threshold or fails to answer, it sends an email. If nothing is wrong, it sends nothing. The core of it, covering disks, pools and jails, looks like this:

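#!/bin/sh
# health-check.sh - runs every 5 minutes via cron

# cron starts jobs with a minimal PATH; zpool lives in /sbin, jls in /usr/sbin
PATH=/sbin:/bin:/usr/sbin:/usr/bin
export PATH

THRESHOLD=85
MAILTO="admin@example.com"
HOST=$(hostname)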

# Disk usage on real filesystems only (devfs always reports 100%)
df -h -t zfs,ufs | awk -v limit="$THRESHOLD" 'NR>1 && int($5)>limit {print $6": "$5}' | while read -r line; do
    echo "Disk warning on $HOST: $line" | mail -s "DISK: $HOST" "$MAILTO"
done

# ZFS pool health (zpool status -x prints only "all pools are healthy" when all is well)
zpool status -x | grep -qv "all pools are healthy" && \
    zpool status | mail -s "ZFS: $HOST" "$MAILTO"

# Jail check
for jail in web mail dns; do
    jls -j "$jail" > /dev/null 2>&1 || \
        echo "Jail $jail is DOWN" | mail -s "JAIL: $HOST" "$MAILTO"
done
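The excerpt above stops at jails. The swap and network checks described before it fit the same pattern, and a minimal sketch of both might look like this. It assumes swapinfo -k prints Device, 1K-blocks, Used, Avail and Capacity columns (plus a Total row when there are several swap devices), and that a plain TCP connect with FreeBSD's nc -z is a good enough "does it answer" test. The swap_pressure helper, the 50 per cent limit and the SERVICES list are hypothetical placeholders.

```shell
#!/bin/sh
# swap-and-services.sh - hypothetical companion to health-check.sh

MAILTO="admin@example.com"
SWAP_LIMIT=50                                      # hypothetical: percent of swap in use
SERVICES="127.0.0.1:25 127.0.0.1:53 127.0.0.1:80"  # hypothetical host:port list

# Read `swapinfo -k` output on stdin and print one line when total swap
# use exceeds the percentage given as $1. Device rows are summed; the
# header row and any "Total" row (present with several devices) are skipped.
swap_pressure() {
    awk -v limit="$1" '
        NR > 1 && $1 != "Total" { total += $2; used += $3 }
        END {
            if (total > 0 && used * 100 / total > limit)
                printf "swap %d%% used\n", used * 100 / total
        }'
}

# The live checks use FreeBSD tools (swapinfo, its nc), so run them only there.
if [ "$(uname -s)" = "FreeBSD" ]; then
    # cron's default PATH lacks /usr/sbin, where swapinfo lives
    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    export PATH
    HOST=$(uname -n)

    # Swap: mail only if swap_pressure printed a warning
    msg=$(swapinfo -k | swap_pressure "$SWAP_LIMIT")
    if [ -n "$msg" ]; then
        echo "$msg on $HOST" | mail -s "SWAP: $HOST" "$MAILTO"
    fi

    # Network: nc -z only checks that the TCP port accepts a connection
    for svc in $SERVICES; do
        nc -z -w 5 "${svc%:*}" "${svc##*:}" > /dev/null 2>&1 || \
            echo "Service $svc is not answering" | mail -s "NET: $HOST" "$MAILTO"
    done
fi
```

Keeping the awk logic in a function means it can be checked by piping in saved swapinfo output, without touching a live system. A port that accepts connections can still serve errors; for HTTP, a fetch of a known URL is the stricter probe.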

Scheduled by cron. Five minutes. Every day. No daemon to babysit, no time-series database to rotate, no dashboard to refresh. No mail means no problem.

*/5 * * * * /usr/local/bin/health-check.sh

One line in the crontab (a user crontab, edited with crontab -e; the system-wide /etc/crontab format adds a user field). That is the entire monitoring infrastructure.

ZFS: The Quotas That Save You at 3 AM

The beauty of this approach is that it pairs well with infrastructure that was designed to be administered, not merely observed. On FreeBSD, ZFS datasets with quotas mean that one misbehaving jail cannot consume all the disk space on the host. Each dataset has a ceiling. When it hits that ceiling, only that jail suffers. The host remains healthy. The other jails remain healthy.

# Set a 20G quota on the web jail dataset
zfs set quota=20G zroot/jails/web

# Check usage per dataset
zfs list -o name,used,quota,avail

This is not monitoring in the Grafana sense. This is architecture. You design the system so that failures are contained, and then you write a script to tell you when containment is under pressure. Prevention first, notification second.
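The "tell you when containment is under pressure" half can reuse the same cron-and-mail pattern: compare each dataset's used space with its quota. A minimal sketch, assuming the zroot/jails layout from the example above; the quota_pressure helper and the 90 per cent limit are hypothetical. It compares used with quota because both properties count snapshots and child datasets, so the ratio tracks the limit ZFS actually enforces.

```shell
#!/bin/sh
# quota-check.sh - hypothetical: warn when a dataset nears its ZFS quota

MAILTO="admin@example.com"
QUOTA_LIMIT=90         # hypothetical: percent of quota in use
PARENT="zroot/jails"   # dataset tree from the quota example

# Read `zfs list -H -p -o name,used,quota` output on stdin (tab-separated,
# exact byte counts) and print each dataset above the percentage in $1.
# A dataset without a quota reports 0 or "none"; either evaluates to 0
# in awk, so such datasets are skipped and no division by zero occurs.
quota_pressure() {
    awk -F '\t' -v limit="$1" '
        $3 + 0 > 0 && $2 * 100 / $3 > limit {
            printf "%s: %d%% of quota used\n", $1, $2 * 100 / $3
        }'
}

# Live check only on FreeBSD hosts
if [ "$(uname -s)" = "FreeBSD" ]; then
    PATH=/sbin:/bin:/usr/sbin:/usr/bin   # zfs lives in /sbin
    export PATH
    HOST=$(uname -n)
    zfs list -H -p -r -t filesystem -o name,used,quota "$PARENT" |
        quota_pressure "$QUOTA_LIMIT" |
        while read -r line; do
            echo "Quota warning on $HOST: $line" | mail -s "QUOTA: $HOST" "$MAILTO"
        done
fi
```

The -H and -p flags matter here: without them the output has a header and human-readable sizes such as 19G, which awk cannot divide reliably.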

The Scale Argument

I can hear the objection forming. "But this does not scale." Correct. If you manage 500 servers across multiple data centres, you need Prometheus. You need Grafana. You probably need an entire SRE team. This article is not for you. You already know what you need, and you have the budget to pay for it.

This article is for the rest of us. The ones running five servers, or ten. The freelancer with a VPS and a FreeBSD jail. The small team with a handful of machines that do real work. For us, the monitoring stack is not a solution. It is overhead. It is another thing that can break, another thing that needs updating, another thing that consumes resources that should be spent on the actual workload.

For five to ten servers, a shell script and a mail client is not a compromise. It is the correct tool for the job.

The best monitoring system is the one you actually read. If that is your inbox, use your inbox.

Practical notes. A health-check script typically covers disk usage (df -h), ZFS pool status (zpool status -x), swap pressure (swapinfo), jail availability (jls), and network service response (a simple fetch or nc probe). Run it via cron every five minutes for alerts, and once a day for a summary report.

When does a proper stack make sense? When you have dozens of servers, compliance requirements that demand audit trails and retention policies, cross-team correlation needs, or managers who want dashboards. Entirely valid reasons, all of them. But if your server count fits on one hand, a POSIX shell script and an honest relationship with your inbox will serve you rather well.
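For the once-daily summary report, a minimal sketch: it runs the same commands as the five-minute checks but mails their output unconditionally, so a missing report is itself a signal that cron or mail has stopped working. The script name, the section helper and the heading format are hypothetical, and the network probes are left out for brevity.

```shell
#!/bin/sh
# daily-report.sh - hypothetical once-a-day summary; always sends mail

MAILTO="admin@example.com"

# Print a heading, then the command's output (stderr included), then a blank line.
section() {
    printf '== %s ==\n' "$*"
    "$@" 2>&1
    printf '\n'
}

# The same checks as the five-minute script, reported whether or not
# anything is wrong, plus per-dataset quota usage.
report() {
    section df -h
    section zpool status -x
    section swapinfo -h
    section jls
    section zfs list -o name,used,quota,avail
}

# Live run only on FreeBSD hosts
if [ "$(uname -s)" = "FreeBSD" ]; then
    PATH=/sbin:/bin:/usr/sbin:/usr/bin   # zpool, zfs, jls, swapinfo
    export PATH
    report | mail -s "Daily report: $(uname -n)" "$MAILTO"
fi
```

It would sit alongside the five-minute job as a second crontab line, for example 0 7 * * * /usr/local/bin/daily-report.sh (the time and path are placeholders).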