The Replacement ■ Episode 02
“You maintain an automation framework to avoid writing forty lines of shell.”
Here is a deployment script:
#!/bin/sh
for host in web1 web2 web3; do
    ssh "$host" 'cd /app && git pull && make restart'
done
Three servers. Four lines. Zero dependencies beyond OpenSSH, which has shipped with essentially every Unix-like operating system since 1999, and the Bourne shell, whose syntax has been stable since the late 1970s. The script will work in twenty years. It will work on a machine you have not purchased yet. It will work after every automation tool currently in fashion has been deprecated, rewritten, or quietly abandoned.
The alternative is
Ansible.
Which requires Python. Which requires YAML. Which requires Jinja2 templates.
Which requires inventory files. Which requires role dependencies. Which
requires Galaxy collections. Which requires a control node running
a compatible Python version. Which requires you to remember whether
become: yes goes at the play level or the task level.
All of this, to run commands on remote servers.
The Dependency Stack
Let us be precise about what each approach actually requires on the machine that initiates the deployment.
Ansible uses SSH under the hood. It connects to your servers via the same SSH protocol your four-line script uses. The difference is that Ansible wraps that connection in a Python application, a YAML-based DSL, a templating engine borrowed from Flask, and an ecosystem of community-maintained roles that may or may not be compatible with the version you installed last Tuesday.
The shell script wraps it in nothing. It calls ssh.
That is the entire architecture.
What You Lose
Honesty first. Abandoning Ansible means abandoning several things that have genuine value.
Idempotency guarantees. Ansible’s modules
are designed to be idempotent: running the same playbook
twice produces the same state. A shell script that runs
git pull && make restart is idempotent by
accident, not by contract. If your deployment involves creating
users, setting permissions, or configuring services, you must
write your own guards. id -u deploy || useradd deploy
is not difficult, but it is your responsibility. Ansible makes it
the framework’s.
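The guards Ansible provides by contract can be written by hand. A minimal sketch, assuming a systemd host; the user name deploy, the service name myapp, and the helper names are illustrative, not a standard:

```shell
#!/bin/sh
# Hand-rolled idempotency guards. 'deploy', 'myapp', and systemd
# are assumptions for the sketch, not requirements.
set -eu

ensure_user() {         # create a user only if it does not exist
    id -u "$1" >/dev/null 2>&1 || useradd "$1"
}

install_if_changed() {  # copy a file only when the contents differ
    cmp -s "$1" "$2" || cp "$1" "$2"
}

ensure_running() {      # start a service only if it is not active
    systemctl is-active --quiet "$1" || systemctl start "$1"
}
```

Each guard is a test followed by an action, joined with ||: if the state already holds, nothing runs. That is all idempotency is.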
Declarative state management. Ansible describes
the desired state: “this package is installed, this file
has these contents, this service is running.” Shell describes
actions: “install this, write that, start the other.”
The declarative model is conceptually cleaner. It is also the
reason your playbooks require a templating language, a variable
precedence hierarchy with
22 levels of override,
and a debugging flag that goes up to -vvvvv. Five
v’s. The letter v, five times. One imagines the developers
considered -vvvvvv but feared it would look sarcastic.
Thousands of Galaxy roles. Ansible Galaxy is a package
registry of community-contributed roles for installing software,
configuring services, and provisioning infrastructure. Some of
these roles are excellent. Some were last updated in 2019. Some
depend on roles that depend on roles that depend on a Python
library that requires a specific version of setuptools.
The experience is not entirely dissimilar to node_modules,
except with more YAML and fewer memes.
What You Gain
Zero dependencies beyond OpenSSH. No Python
version conflicts. No pip install --user dance.
No virtualenv for your deployment tool. No “ansible
requires Python 3.10 but the control node ships 3.8”
conversations at half past four on a Friday. The shell is
there. SSH is there. They have been there since before most
automation tools were conceived.
Scripts that work in twenty years. A shell script written in 2006 still runs. Unchanged. Unmodified. The POSIX shell specification is stable in the way that geological formations are stable. Ansible 2.9 playbooks do not run on Ansible 8. The migration guide is longer than most shell scripts.
Debugging with echo. When a shell
script fails, you add set -x and read the output.
When an Ansible playbook fails, you add -vvvv and
receive several hundred lines of JSON describing the internal
state of a Python application that connected to your server via
SSH to run the command you could have typed yourself. The signal-
to-noise ratio is not in Ansible’s favour.
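The entire shell debugging workflow fits in one line. A sketch, with echo standing in for the ssh call and invented hostnames:

```shell
#!/bin/sh
# set -x prints every command, with variables already expanded,
# to stderr before it runs. That is the whole debugger.
set -x
for host in web1 web2; do       # hostnames are illustrative
    echo "deploying to $host"   # stand-in for the ssh call
done
set +x
```

Each trace line shows exactly what the shell is about to execute, which is usually the only question you had.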
The Fifty-Server Line
There is a threshold, and it is worth drawing precisely. Below
roughly fifty servers, shell scripts are trivially sufficient.
The deployment loop is a for statement. The
configuration is a list of hostnames. The error handling is
set -e. The entire orchestration fits in a file
shorter than most Ansible inventories.
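That claim can be made concrete. A sketch of the whole orchestration, with invented hostnames and a dry-run hook; the SSH override and deploy_all are conveniences for this example, not a convention:

```shell
#!/bin/sh
# The entire orchestration below the fifty-server line.
# Hostnames and the remote command are illustrative.
set -eu

HOSTS="web1 web2 web3"
REMOTE_CMD='cd /app && git pull && make restart'
SSH=${SSH:-ssh}          # set SSH=echo for a dry run

deploy_all() {
    for host in $HOSTS; do
        $SSH "$host" "$REMOTE_CMD"
    done
}
```

A real script ends with a call to deploy_all; running it with SSH=echo prints each command instead of executing it, which is as much dry-run machinery as three servers need.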
Above fifty servers, particularly in heterogeneous environments with strict compliance requirements, mixed operating systems, and infrastructure that changes weekly, Ansible earns its keep. The declarative model scales. The idempotency guarantees prevent drift. The role system manages complexity that a flat shell script cannot. Nobody disputes this.
The dispute is with the vast majority of projects that fall
well below that line. Most teams manage three to twenty servers.
The servers run the same operating system. The deployment is
git pull and a service restart. And yet: Ansible.
Inventory files. Group variables. Role directories with eight
mandatory subdirectories. A requirements.yml that
pins Galaxy collection versions. A CI pipeline that installs
Ansible before it can install your application.
The question is not whether Ansible is a good tool. It is an excellent tool. The question is whether you are using it to solve a problem you actually have, or one you might have someday, on servers you have not yet bought, for a scale you have not yet reached.
The Longevity Argument
Shell scripts checked into Git have a property that playbooks do not: they are complete. The script contains the commands. The commands use tools that ship with the operating system. There is no external registry, no version compatibility matrix, no “this module was removed in Ansible 7” surprise during an upgrade you did not ask for.
Consider the historical record. SSH has shipped in some form
since 1995, and OpenSSH since 1999. Bourne shell syntax has
been stable since the late 1970s. The for loop, the
if statement, the pipe: these constructs
predate Ansible by more than three decades. They will outlive
it by at least as many.
Ansible, by contrast, has broken backward compatibility between every major version. Playbooks written for 2.9 require migration for 8.0. Modules are deprecated, renamed, moved to collections, or silently removed. The upgrade path is documented, which is the polite way of saying the upgrade path is necessary.
A shell script that deploys your application in 2026 will deploy your application in 2046. An Ansible playbook that deploys your application in 2026 will require a migration guide, a Python version upgrade, and a Friday afternoon by 2030.
The Lingua Franca
Every Unix administrator can read a shell script. It is the
lingua franca of systems work. The for loop is
taught in the first week. The ssh command is
taught in the first hour. There is no framework to learn, no
DSL to memorise, no module index to consult.
Ansible playbooks require Ansible knowledge. The YAML syntax
is YAML, but the semantics are Ansible: when
clauses, register variables, with_items
versus loop, become versus
become_user, and the perennial question of whether
the variable goes in group_vars,
host_vars, the playbook, or the role defaults.
The learning curve is not steep. It is wide. And it is
specific to a tool that may not be the tool you use next year.
Shell is the tool you will use every year.
The Replacement
SSH plus a shell script does not add features to server automation. It removes intermediaries from it. There is no templating engine because your configuration files are files. There is no inventory format because your servers are in a variable. There is no role system because your commands are commands. There is no Galaxy because you do not need a package manager for twenty lines of shell.
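The absence of a templating engine is survivable because the one templating feature most deployments actually use, substituting a value into a file, is a sed one-liner. A self-contained sketch; the file names, placeholder syntax, and port are all made up for the example:

```shell
#!/bin/sh
# Substituting one variable into a config without a template
# engine. File names and the @PORT@ placeholder are invented.
set -eu
PORT=8080

# Inline "template" so the example is self-contained:
printf 'listen @PORT@;\n' > app.conf.in

sed "s/@PORT@/$PORT/g" app.conf.in > app.conf
cat app.conf    # -> listen 8080;
```

When the substitutions outgrow a sed line, that is a data point for the fifty-server discussion above, not a reason to start there.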
The Ansible project is a remarkable piece of engineering. It solved real problems for real infrastructure at real scale. It continues to do so, and will continue to do so, for the environments that genuinely require it.
But for the majority of deployments (three servers,
five servers, twenty servers, all running the same operating
system, all deploying the same application) the
replacement has been sitting in /usr/bin/ssh
since 1999. Waiting patiently. Requiring nothing. Breaking
nothing. Outliving everything.
You do not need Ansible, Python, YAML, Jinja2, inventory files, and Galaxy collections. You need SSH and a shell. Both have been there since before the problem was invented.