Vivian Voss

The SSH Replacement

ssh wireguard security unix

The Replacement ■ Episode 02

“You maintain an automation framework to avoid writing forty lines of shell.”

Here is a deployment script:

#!/bin/sh
for host in web1 web2 web3; do
    ssh "$host" 'cd /app && git pull && make restart'
done

Three servers. Four lines. Zero dependencies beyond OpenSSH, which has shipped with every Unix-like operating system since 1999, and the Bourne shell, which has existed since 1971. The script will work in twenty years. It will work on a machine you have not purchased yet. It will work after every automation tool currently in fashion has been deprecated, rewritten, or quietly abandoned.

The alternative is Ansible. Which requires Python. Which requires YAML. Which requires Jinja2 templates. Which requires inventory files. Which requires role dependencies. Which requires Galaxy collections. Which requires a control node running a compatible Python version. Which requires you to remember whether become: yes goes at the play level or the task level. All of this, to run commands on remote servers.

The Dependency Stack

Let us be precise about what each approach actually requires on the machine that initiates the deployment.

[Figure: The Dependency Stack. Ansible: Python runtime, YAML parser, Jinja2 templates, inventory files, role dependencies, Galaxy collections, ansible.cfg (7 layers, N config files). ssh + shell: OpenSSH (1999) + sh (1971), in base, on every Unix, already there (2 tools, 0 dependencies).]

Ansible uses SSH under the hood. It connects to your servers via the same SSH protocol your four-line script uses. The difference is that Ansible wraps that connection in a Python application, a YAML-based DSL, a templating engine borrowed from Flask, and an ecosystem of community-maintained roles that may or may not be compatible with the version you installed last Tuesday.

The shell script wraps it in nothing. It calls ssh. That is the entire architecture.

What You Lose

Honesty first. Abandoning Ansible means abandoning several things that have genuine value.

Idempotency guarantees. Ansible’s modules are designed to be idempotent: running the same playbook twice produces the same state. A shell script that runs git pull && make restart is idempotent by accident, not by contract. If your deployment involves creating users, setting permissions, or configuring services, you must write your own guards. id -u deploy || useradd deploy is not difficult, but it is your responsibility. Ansible makes it the framework’s.
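The guard pattern costs one test per resource. A minimal sketch, using a temporary directory so it runs unprivileged; the useradd case from above needs root, so it appears only as a comment, and the path is made up for illustration:

```shell
#!/bin/sh
set -e

# The guard pattern: test for the desired state, act only when it is absent.
# Privileged resources follow the same shape, e.g.:
#   id -u deploy >/dev/null 2>&1 || useradd deploy

APP_DIR=/tmp/guard-demo     # hypothetical path, for illustration only

# Create the directory only if it is missing.
[ -d "$APP_DIR" ] || mkdir -p "$APP_DIR"

# Fix permissions only if they are wrong, so a rerun changes nothing.
# (stat -c is GNU; the fallback just forces the chmod on other systems.)
[ "$(stat -c %a "$APP_DIR" 2>/dev/null || echo unknown)" = "750" ] \
    || chmod 750 "$APP_DIR"
```

Running the script twice produces the same state. That is the contract Ansible provides, written out by hand.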

Declarative state management. Ansible describes the desired state: “this package is installed, this file has these contents, this service is running.” Shell describes actions: “install this, write that, start the other.” The declarative model is conceptually cleaner. It is also the reason your playbooks require a templating language, a variable precedence hierarchy with 22 levels of override, and a debugging flag that goes up to -vvvvv. Five v’s. The letter v, five times. One imagines the developers considered -vvvvvv but feared it would look sarcastic.
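Each declarative intent translates mechanically into a small idempotent shell function. A sketch with an illustrative ensure_file; the name is mine, not a standard utility:

```shell
#!/bin/sh
set -e

# "This file has these contents": rewrite only when the content differs,
# so a second run touches nothing and reports nothing.
ensure_file() {
    target=$1; content=$2
    if [ -e "$target" ] && [ "$(cat "$target")" = "$content" ]; then
        return 0                       # already in the desired state
    fi
    printf '%s' "$content" > "$target"
    echo "changed: $target"
}

ensure_file /tmp/demo-app.conf "port=8080"   # first run: writes the file
ensure_file /tmp/demo-app.conf "port=8080"   # second run: no-op
```

Package and service intents follow the same pattern: query the current state, act only on a mismatch.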

2,000 Galaxy roles. Ansible Galaxy is a package registry of community-contributed roles for installing software, configuring services, and provisioning infrastructure. Some of these roles are excellent. Some were last updated in 2019. Some depend on roles that depend on roles that depend on a Python library that requires a specific version of setuptools. The experience is not entirely dissimilar to node_modules, except with more YAML and fewer memes.

What You Gain

Zero dependencies beyond OpenSSH. No Python version conflicts. No pip install --user dance. No virtualenv for your deployment tool. No “ansible requires Python 3.10 but the control node ships 3.8” conversations at half past four on a Friday. The shell is there. SSH is there. They have been there since before most automation tools were conceived.

Scripts that work in twenty years. A shell script written in 2006 still runs. Unchanged. Unmodified. The POSIX shell specification is stable in the way that geological formations are stable. Ansible 2.9 playbooks do not run on Ansible 8. The migration guide is longer than most shell scripts.

Debugging with echo. When a shell script fails, you add set -x and read the output. When an Ansible playbook fails, you add -vvvv and receive several hundred lines of JSON describing the internal state of a Python application that connected to your server via SSH to run the command you could have typed yourself. The signal- to-noise ratio is not in Ansible’s favour.

[Figure: Debugging, side by side. ansible-playbook -vvvv: TASK [deploy], raw SSH EXEC lines (ssh -C -o ControlMaster=auto -o ControlPersist=60s ...), then roughly 200 lines of JSON per task ("changed": true, "stdout": "Already up to date.", "stderr": "", "rc": 0, ...). set -x with the ssh loop: the command echoed, its output (Already up to date. / make: restarting app), then the next host. The command. The output. Done. When something breaks at 3 AM, clarity is not optional.]

The Fifty-Server Line

There is a threshold, and it is worth drawing precisely. Below roughly fifty servers, shell scripts are trivially sufficient. The deployment loop is a for statement. The configuration is a list of hostnames. The error handling is set -e. The entire orchestration fits in a file shorter than most Ansible inventories.
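Even the hardened version stays short at that scale. A sketch; the SSH variable exists only so the loop can be dry-run (SSH=echo) and is an illustrative convention, not a standard one:

```shell
#!/bin/sh
set -u

SSH=${SSH:-ssh}     # override with SSH=echo to dry-run the loop

# Deploy to each host given as an argument; stop and name the first failure.
deploy() {
    for host in "$@"; do
        echo "deploying to $host"
        "$SSH" "$host" 'cd /app && git pull && make restart' \
            || { echo "FAILED on $host" >&2; return 1; }
    done
}

# Real invocation, once the hosts are reachable:
#   deploy web1 web2 web3
```

That is the entire orchestration layer: a function, a loop, and an exit code.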

Above fifty servers, particularly in heterogeneous environments with strict compliance requirements, mixed operating systems, and infrastructure that changes weekly, Ansible earns its keep. The declarative model scales. The idempotency guarantees prevent drift. The role system manages complexity that a flat shell script cannot. Nobody disputes this.

The dispute is with the vast majority of projects that fall well below that line. Most teams manage three to twenty servers. The servers run the same operating system. The deployment is git pull and a service restart. And yet: Ansible. Inventory files. Group variables. Role directories with eight mandatory subdirectories. A requirements.yml that pins Galaxy collection versions. A CI pipeline that installs Ansible before it can install your application.

The question is not whether Ansible is a good tool. It is an excellent tool. The question is whether you are using it to solve a problem you actually have, or one you might have someday, on servers you have not yet bought, for a scale you have not yet reached.

The Longevity Argument

Shell scripts checked into Git have a property that playbooks do not: they are complete. The script contains the commands. The commands use tools that ship with the operating system. There is no external registry, no version compatibility matrix, no “this module was removed in Ansible 7” surprise during an upgrade you did not ask for.

Consider the historical record. SSH has maintained backward compatibility across 27 years. The Bourne shell syntax has been stable for 55 years. The for loop, the if statement, the pipe: these constructs predate Ansible by four decades. They will outlive it by at least as many.

Ansible, by contrast, has broken backward compatibility between every major version. Playbooks written for 2.9 require migration for 8.0. Modules are deprecated, renamed, moved to collections, or silently removed. The upgrade path is documented, which is the polite way of saying the upgrade path is necessary.

A shell script that deploys your application in 2026 will deploy your application in 2046. An Ansible playbook that deploys your application in 2026 will require a migration guide, a Python version upgrade, and a Friday afternoon by 2030.

The Lingua Franca

Every Unix administrator can read a shell script. It is the lingua franca of systems work. The for loop is taught in the first week. The ssh command is taught in the first hour. There is no framework to learn, no DSL to memorise, no module index to consult.

Ansible playbooks require Ansible knowledge. The YAML syntax is YAML, but the semantics are Ansible: when clauses, register variables, with_items versus loop, become versus become_user, and the perennial question of whether the variable goes in group_vars, host_vars, the playbook, or the role defaults. The learning curve is not steep. It is wide. And it is specific to a tool that may not be the tool you use next year.

Shell is the tool you will use every year.

The Replacement

SSH plus a shell script does not add features to server automation. It removes intermediaries from it. There is no templating engine because your configuration files are files. There is no inventory format because your servers are in a variable. There is no role system because your commands are commands. There is no Galaxy because you do not need a package manager for twenty lines of shell.
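Concretely: the "inventory" is a variable and the "template" is a heredoc. A sketch with made-up values (app.conf contents, port 8080):

```shell
#!/bin/sh
set -eu

HOSTS="web1 web2 web3"       # the entire inventory format

PORT=8080                    # the entire variable hierarchy

# The "template": ordinary shell expansion inside a heredoc, no Jinja2.
render_conf() {
    cat <<EOF
server_name $1
listen $PORT
EOF
}

render_conf web1 > /tmp/web1.conf
# Real use: for h in $HOSTS; do render_conf "$h" | ssh "$h" 'cat >/etc/app.conf'; done
```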

The Ansible project is a remarkable piece of engineering. It solved real problems for real infrastructure at real scale. It continues to do so, and will continue to do so, for the environments that genuinely require it.

But for the majority of deployments (three servers, five servers, twenty servers, all running the same operating system, all deploying the same application) the replacement has been sitting in /usr/bin/ssh since 1999. Waiting patiently. Requiring nothing. Breaking nothing. Outliving everything.

You do not need Ansible, Python, YAML, Jinja2, inventory files, and Galaxy collections. You need SSH and a shell. Both have been there since before the problem was invented.