Terribly sorry to be blunt, but here is what happens in every AI agent session I observe.
The agent reads a file it already has in context. It edits, then reads again to verify the edit it just made. It announces "I'll now proceed to" before proceeding to. Three tool calls where one would do nicely. Every action wrapped in a ceremony of confirmation that nobody requested.
The Defensive Loop
An AI agent tasked with editing a configuration file will, in a typical session, execute roughly three times as many operations as the task requires. Not because the task is complex. Because the model has been trained to hedge.
"Let me verify that for you." It already has the answer. "I want to be careful here." The careful thing would be to act. "That's an interesting point, but..." Nobody asked for your opinion on the interestingness of the point.
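The pattern is easy to spot mechanically. Here is a toy sketch that audits a hypothetical agent trace and flags re-reads of files the agent already has in context; the trace format and tool names are illustrative, not any real agent framework's API.

```python
# A hypothetical agent trace: (operation, file) pairs. The redundant
# entries mirror the defensive loop described above.
trace = [
    ("read", "config.yaml"),  # initial read: necessary
    ("read", "config.yaml"),  # re-read of a file already in context
    ("edit", "config.yaml"),  # the actual task
    ("read", "config.yaml"),  # verification re-read of its own edit
]

def count_redundant_reads(trace):
    """Count reads of files already read earlier in the same session."""
    seen, redundant = set(), 0
    for op, path in trace:
        if op == "read":
            if path in seen:
                redundant += 1
            seen.add(path)
    return redundant

print(count_redundant_reads(trace))  # -> 2
```

Two of the four operations in this trace do no work at all; only the edit and the first read are load-bearing.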
Each of these pleasantries costs tokens. Each token costs compute. Each compute cycle draws power from a data centre that is, at this very moment, humming at full capacity. Not to produce answers. To produce comfort.
The Numbers
The "please don't say please to the AI" debate, which made headlines in early 2025, focused on roughly six tokens per polite request. That is rounding error. The actual waste is the 2,000+ tokens the agent burns on every defensive loop: reading what it already knows, explaining what nobody asked, wrapping simple operations in verification ceremonies.
Multiply that by millions of sessions daily. The result is data centres processing pleasantries. CO2 for "let me just check that for you." Latency for explanations nobody reads.
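The gap between the two debates is back-of-envelope arithmetic. Only the ~6 polite tokens and the 2,000+ defensive-loop tokens come from the figures above; the daily session count is an illustrative assumption, not a measurement.

```python
polite_tokens = 6             # per session: "please", "thank you" (from the text)
defensive_tokens = 2_000      # per session: re-reads, hedging, ceremony (from the text)
sessions_per_day = 5_000_000  # assumed, for illustration only

polite_waste = polite_tokens * sessions_per_day        # 30,000,000 tokens/day
defensive_waste = defensive_tokens * sessions_per_day  # 10,000,000,000 tokens/day
print(defensive_waste // polite_waste)                 # -> 333
```

Whatever session count you assume, the ratio holds: the defensive loop burns roughly 333 times what the politeness debate was about.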
Directness is not rudeness. It is resource respect.
The Feedback Loop
And here is the part that ought to concern the people building these systems.
The training data is verbose code. Written by verbose developers. Who trained an AI. That now produces verbose code. That becomes training data.
The LLM creators are not fighting the cycle. They are in it. Whether the cause is moral sensibility, political pressure, or simply inherited inefficiency from the training corpus, the result is much the same: inflated output, distorted by the very tools meant to reduce complexity.
The Demoscene Comparison
The demoscene taught me something that applies directly here. A 4 KB intro leaves no space for filler. Code works or it does not. No buffer. No hedging. No "let me just check that for you." Every byte justifies its existence, or it gets cut.
Direct AI would cost less, run faster, output cleaner. The technology exists. The training choices to get there have not been made.
One does wonder how many data centres we could switch off without the pleasantry tax on every answer.
But I am sure that is a terribly interesting point and we should be careful here.