Terribly sorry to be blunt, but here is what happens in every AI agent session I observe.
The agent reads a file it already has in context. It edits, then reads again to verify the edit it just made. It announces "I'll now proceed to" before proceeding to. Three tool calls where one would do nicely. Every action wrapped in a ceremony of confirmation that nobody requested.
The Defensive Loop
An AI agent tasked with editing a configuration file will, in a typical session, execute roughly three times as many operations as the task requires. Not because the task is complex. Because the model has been trained to hedge.
"Let me verify that for you." It already has the answer. "I want to be careful here." The careful thing would be to act. "That's an interesting point, but..." Nobody asked for your opinion on the interestingness of the point.
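The pattern is easy to spot mechanically. Here is a toy sketch that audits a hypothetical agent trace and flags re-reads of files the agent already has in context; the trace format and tool names are illustrative, not any real agent framework's API.

```python
# A hypothetical agent trace: (operation, file) pairs. The redundant
# entries mirror the defensive loop described above.
trace = [
    ("read", "config.yaml"),  # initial read: necessary
    ("read", "config.yaml"),  # re-read of a file already in context
    ("edit", "config.yaml"),  # the actual task
    ("read", "config.yaml"),  # verification re-read of its own edit
]

def count_redundant_reads(trace):
    """Count reads of files already read earlier in the same session."""
    seen, redundant = set(), 0
    for op, path in trace:
        if op == "read":
            if path in seen:
                redundant += 1
            seen.add(path)
    return redundant

print(count_redundant_reads(trace))  # -> 2
```

Two of the four operations in this trace do no work at all; only the edit and the first read are load-bearing.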
Each of these pleasantries costs tokens. Each token costs compute. Each compute cycle draws power from a data centre that is, at this very moment, humming at full capacity. Not to produce answers. To produce comfort.
The Numbers
The "please don't say please to the AI" debate, which made headlines in early 2025, focused on roughly six tokens per polite request. That is rounding error. The actual waste is the 2,000+ tokens the agent burns on every defensive loop: reading what it already knows, explaining what nobody asked, wrapping simple operations in verification ceremonies.
Multiply that by millions of sessions daily. The result is data centres processing pleasantries. CO2 for "let me just check that for you." Latency for explanations nobody reads.
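The gap between the two debates is back-of-envelope arithmetic. Only the ~6 polite tokens and the 2,000+ defensive-loop tokens come from the figures above; the daily session count is an illustrative assumption, not a measurement.

```python
polite_tokens = 6             # per session: "please", "thank you" (from the text)
defensive_tokens = 2_000      # per session: re-reads, hedging, ceremony (from the text)
sessions_per_day = 5_000_000  # assumed, for illustration only

polite_waste = polite_tokens * sessions_per_day        # 30,000,000 tokens/day
defensive_waste = defensive_tokens * sessions_per_day  # 10,000,000,000 tokens/day
print(defensive_waste // polite_waste)                 # -> 333
```

Whatever session count you assume, the ratio holds: the defensive loop burns roughly 333 times what the politeness debate was about.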
Directness is not rudeness. It is resource respect.
The Feedback Loop
And here is the part that ought to concern the people building these systems.
The training data is verbose code. Written by verbose developers. Who trained an AI. That now produces verbose code. That becomes training data.
The LLM creators are not fighting the cycle. They are in it. Whether the cause is moral sensibility, political pressure, or simply inherited inefficiency from the training corpus, the result is much the same: inflated output, distorted by the very tools meant to reduce complexity.
The Demoscene Comparison
The demoscene taught me something that applies directly here. A 4 KB intro leaves no space for filler. Code works or it does not. No buffer. No hedging. No "let me just check that for you." Every byte justifies its existence, or it gets cut.
Direct AI would cost less, run faster, output cleaner. The technology exists. The training choices to get there have not been made.
One does wonder how many data centres we could switch off without the pleasantry tax on every answer.
But I am sure that is a terribly interesting point and we should be careful here.