In 1994, Kent Beck wrote SUnit for Smalltalk. The idea was admirably precise: test isolated units of logic. Pure functions. Parsers. Algorithms. Clear input, clear output, no ambiguity about what went wrong or where.
It was a tool for a specific job. And it was rather good at that job.
Then the industry got hold of it.
The Original
SUnit tested units: the clue, one might argue, is in the name. A function takes arguments, performs computation, returns a result. No database. No network. No filesystem. The test verifies that the computation is correct. If it fails, you know precisely where the fault lies, because there is nowhere else it could be.
This is the kind of code that rewards unit testing handsomely: a sorting algorithm, a date parser, a currency converter. Deterministic input, deterministic output. The test is a proof.
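This is what such a proof looks like in practice. A minimal sketch (the `to_minor_units` converter is a hypothetical example, not from any particular codebase): a pure function and the test that pins it down.

```python
# A pure function: deterministic input, deterministic output,
# no database, no network, no filesystem.
def to_minor_units(amount: str) -> int:
    """Convert a decimal currency string like '19.99' to integer cents."""
    major, _, minor = amount.partition(".")
    return int(major) * 100 + int(minor.ljust(2, "0")[:2])

# The test is a proof. If it fails, the fault is in the function,
# because there is nowhere else it could be.
def test_to_minor_units():
    assert to_minor_units("19.99") == 1999
    assert to_minor_units("5") == 500
    assert to_minor_units("0.5") == 50
```

Fast, precise, and genuinely informative when it fails.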
Beck himself was characteristically direct about the scope. On Stack Overflow in 2008, he wrote: “I get paid for code that works, not for tests.” And later, with the kind of precision one wishes the industry had inherited: “Test as little as possible to reach a given level of confidence.”
One does wonder whether anyone read that second sentence.
The Copy
By the mid-2000s, “test everything” had become doctrine. Coverage targets appeared: 80%, then 90%. Build pipelines broke on insufficient coverage. Code review checklists demanded unit tests for every function, every method, every branch. The question was no longer “does this code benefit from a unit test?” but “where is the unit test for this code?”
The distinction matters. The first is engineering. The second is compliance.
And here is where the pattern fractures. Library code, the kind Beck was testing, has no side effects. That is rather the point. Application code is nothing but side effects. An endpoint hits a database, calls an external API, sends an email, writes to a queue, and returns HTML. There is no “isolated unit” in that chain. It is a procession of side effects pretending to be a function.
The industry’s solution to this inconvenience was, with hindsight, magnificently absurd.
The Mock Epidemic
If reality is too complicated to test, replace reality. Mock the database. Mock the API. Mock the email service. Mock the queue. Now your unit test tests … the mocks.
Forty lines of mock setup. Two lines of assertion. The test does not verify that the code works. It verifies that the code calls the things you told it to call, in the order you told it to call them. It is a loyalty test for your implementation details.
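The shape of such a test, sketched with Python's `unittest.mock` (the `checkout` function and its collaborators are hypothetical, for illustration):

```python
from unittest.mock import MagicMock

# Hypothetical application code: a chain of side effects
# behind injected dependencies.
def checkout(db, payments, mailer, order_id):
    order = db.load_order(order_id)
    payments.charge(order["total"])
    mailer.send_receipt(order["email"])
    return "ok"

# The "unit" test: every collaborator replaced, nothing real exercised.
def test_checkout_with_mocks():
    db = MagicMock()
    db.load_order.return_value = {"total": 42, "email": "a@example.com"}
    payments, mailer = MagicMock(), MagicMock()

    assert checkout(db, payments, mailer, 7) == "ok"

    # These assertions pin the implementation, not the behaviour:
    # rename load_order, or reorder the calls, and the test breaks
    # even though the user-visible result is identical.
    db.load_order.assert_called_once_with(7)
    payments.charge.assert_called_once_with(42)
    mailer.send_receipt.assert_called_once_with("a@example.com")
```

The assertions at the bottom are the loyalty test: they verify which methods were called, with what, in what shape. Whether a payment was actually taken is nobody's concern.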
Refactor a method (extract a helper, rename a parameter, change the internal sequence) and fifty tests break. The behaviour is unchanged. The output is identical. The user notices nothing. But the mocks noticed everything, because the mocks are not testing behaviour. They are testing how the code works, not what it does.
This is not a safety net. It is a constraint on improvement.
The Kill Shot
Four hundred unit tests. All green. The build pipeline glows a reassuring shade of emerald. The team deploys with the confidence that only a perfect test suite can provide.
The checkout is broken.
No test ran the actual flow. Every dependency was mocked. The database mock returned what you told it to return. The payment API mock succeeded because you told it to succeed. The email mock sent nothing to no one, exactly as instructed. Four hundred tests, each one a small theatre production in which every actor reads from the script you wrote, and the audience applauds because nobody forgot their lines.
Meanwhile, on production, the curtain goes up and the stage is empty.
Four hundred green lights. Zero confidence. The mocks should have been the clue.
The Numbers Game
100% code coverage means every line was executed during testing. Not that every line works correctly. Not that the system functions. Not that a single user journey completes. Merely that the test runner visited every line, the way a tourist visits a cathedral: technically present, spiritually elsewhere.
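A small illustration of the gap (a hypothetical helper, invented for this example): a test suite with 100% line coverage that sails past a bug.

```python
# Hypothetical helper with a boundary bug: suppose the spec says
# free shipping kicks in AT 50, but the comparison is strictly
# greater-than.
def shipping_fee(order_total: float) -> float:
    if order_total > 50:   # bug: should be >= 50
        return 0.0
    return 4.95

# This suite executes every line -- 100% coverage -- and passes,
# because it never probes the boundary where the bug lives.
def test_shipping_fee():
    assert shipping_fee(80) == 0.0    # covers the if-branch
    assert shipping_fee(20) == 4.95   # covers the fall-through
```

Every line visited. The one input that matters, `50`, never tried. The coverage report says 100%; the customer at exactly 50 pays 4.95.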
The testing pyramid (many unit tests at the base, fewer integration tests in the middle, a handful of end-to-end tests at the top) was popularised by Google’s testing team. At Google, thousands of engineers share libraries that millions of projects depend upon. Unit testing those libraries is not merely sensible; it is essential. A broken utility function in a shared library cascades across the entire organisation.
You have a team of four building a shop.
The context is not the same. The pyramid was never meant to be a universal law. It was a strategy for a specific organisational structure at a specific scale. Copying the shape without the context is the defining pattern of this series.
The Timeline
The drift from Beck’s original intent to the industry’s interpretation is not a matter of opinion. It is a matter of record.
Twenty-one years from “test isolated logic” to “100% coverage as a key performance indicator.” The inventor said “test as little as possible.” The industry heard “test as much as possible” and built a compliance apparatus around the misunderstanding.
Kent Beck is the star witness against the dogmatisation of his own invention.
The Remedy
Unit tests are for units. Isolated logic. Pure computation. Functions where input determines output and nothing else intervenes. For these, unit tests remain the correct tool: fast, precise, and genuinely informative when they fail.
For everything else (the endpoints, the workflows, the chains of side effects that constitute actual application behaviour): integration tests. Fewer of them. Slower, yes. But they test what the user experiences, not what the mock framework expects.
Three lines. Real database. Real API. Real result. If it fails, the checkout is broken. If it passes, the checkout works. No theatre. No script. No audience applauding an empty stage.
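A sketch of the shape, under one concession so the example runs standalone: the “real database” here is in-process SQLite, standing in for whatever engine production uses, and `checkout` is a hypothetical stand-in for the application code. The point is the test, which runs the actual flow and checks the actual result.

```python
import sqlite3

# Hypothetical application code: reads a real table, writes a real row.
def checkout(conn, sku):
    row = conn.execute(
        "SELECT price FROM products WHERE sku = ?", (sku,)
    ).fetchone()
    conn.execute(
        "INSERT INTO orders (sku, total) VALUES (?, ?)", (sku, row[0])
    )
    return {"status": "paid", "total": row[0]}

def test_checkout():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (sku TEXT, price INTEGER)")
    conn.execute("CREATE TABLE orders (sku TEXT, total INTEGER)")
    conn.execute("INSERT INTO products VALUES ('A1', 1999)")

    # The three lines that matter: run the flow, check the result,
    # check the side effect.
    result = checkout(conn, "A1")
    assert result == {"status": "paid", "total": 1999}
    assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1
```

No mock told the database what to return. The row either exists or it does not.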
Unit tests are for units. Integration tests are for systems. The mocks should have been the clue that you were testing the wrong thing at the wrong level.