Vivian Voss

The Permanent Beta

architecture devops performance

Beta Stories ■ Episode 01

New series. Because software used to ship.

The Disc

Software used to come on a disc. A bug on a pressed CD was a recall, not a hotfix. You shipped when it worked, because shipping when it did not was expensive. Microsoft reportedly spent $300 million marketing Windows 95. A critical defect at that scale meant physical media returned to the factory, corrected, re-pressed, and re-shipped. The economics enforced a simple discipline: be finished before you ship, or pay the price after.

Then the internet removed the disc. And with it, the only mechanism that made "done" a requirement rather than a suggestion.

The Methodology

The Agile Manifesto (2001) arrived at precisely the right moment. "Responding to change over following a plan." What the disappearing disc made possible, methodology made respectable.

Google kept Gmail in "beta" for five years, from April 2004 to July 2009. By 2008, roughly half of Google's products carried the beta label. Larry Page eventually conceded that the label had become "arbitrary." Tim O'Reilly gave the phenomenon a name: "Perpetual Beta." He meant it as a compliment.

Flickr took the concept to its logical conclusion and upgraded from "beta" to "gamma." Quite.

Beta as a distinct phase vanished. No structured testing with actual users. No data from the target audience. At best, a code review from one or five colleagues. Software was no longer "released." It was permanently "in progress."

The Disc Era Transition Permanent Beta 1995 Windows 95 50 MB. On a disc. Done. 2001 Agile Manifesto "Respond to change" 2004 Gmail "beta" 5 years. 1 label. 2010 Continuous Delivery Ship constantly 2026 Weekly patches "Done" is a legacy concept disc → download → deploy → deploy → deploy → …

The Feedback Theatre

The sprint loop promises customer-centricity. The mechanism looks reasonable enough on a whiteboard:

Sprint ends with a review. Review generates feedback. Feedback becomes tickets. Tickets fill the next sprint. Repeat.

Splendid in theory. In practice, those tickets rarely come from actual customers. They come from adjacent departments with urgent requirements and no understanding of the architecture. The feature must be built. The architecture does not support it. So it bends. Then it breaks. Then somebody files a ticket for that, too.

80 per cent of features are rarely or never used (Pendo, 2019). The Standish Group found 64 per cent in 2002. The figure went up, not down. The industry spent $29.5 billion building features nobody wanted.

The Sprint Feedback Loop Sprint Review Feedback (from departments) Tickets Next Sprint No exit. No "done." Actual Customers (not consulted) 80% features unused (Pendo, 2019) $29.5B spent building them (Pendo / PRNewswire) 33% dev time on debt (Stripe, 2018)

The Consequence

There is no state called "done." Finishing is antithetical to the core metric: velocity. When velocity becomes a target, it ceases to be a measure. This is Goodhart's Law applied to software engineering, and the results are precisely what one would expect.

Windows 95: 50 MB. Windows 11: 64 GB. A 1,280-fold increase for an operating system whose fundamental job description has not materially changed. Snapchat: 4 MB to 203 MB. For disappearing photographs.

The Bloat Ledger Same job. More bytes. Every time. Windows 95: 50 MB (1995) 11: 64 GB (2025) 1,280x Office 97: 112 MB 365: 4 GB 36x Snapchat 4 MB (2012) 203 MB (2024) 50x Disappearing photographs. 203 megabytes.

Developers spend 33 per cent of their time on technical debt (Stripe, 2018). Not building. Repairing what the previous iteration broke. Stripe estimated the global GDP impact at $3 trillion. Three trillion dollars spent maintaining code that was never finished in the first place.

The Counter-Evidence

Software that is finished exists. You are using some of it.

TeX: version 3.141592653. Approaching pi. Frozen by design. Donald Knuth's standing instruction: upon his death, all remaining bugs become features. The version number converges to pi. The software converges to done.

SQLite: 156,000 lines of source. 92 million lines of tests. A 590:1 test-to-source ratio. Supported until 2050. Not because it is unfinished. Because someone decided that finished means you keep your promises.

curl: 266 releases in 27 years. Commands from 1998 still work. Daniel Stenberg ships improvements without breaking the contract with every developer, every script, and every automated pipeline that depends on the tool behaving tomorrow exactly as it behaves today.

Software That Chose to Be Finished π TeX 3.141592653 Frozen by design. Bugs become features. 590:1 SQLite test ratio 92M lines of tests. Supported until 2050. 266 curl releases 27 years. Backwards compatible. Always. The Unix tradition maintains this discipline: stable interfaces, backwards compatibility, defined scope. The commercial market does not.

The Unix tradition still maintains this discipline: stable interfaces, backwards compatibility, defined scope. The commercial market, by and large, does not. Chrome moved from a six-week release cycle to four weeks to weekly patches. 38 per cent of top Android applications update weekly. Elite DevOps teams deploy multiple times per day. Not because the software needs it. Because the methodology demands it.

The Pattern

Software shipped on discs was finished because it had to be. Software shipped over the internet is never finished because it does not have to be. Agile did not invent this. It legitimised it. The sprint loop keeps it running, fed by requirements from people who have never read the codebase.

Paul Valéry, 1933: "A work is never finished, only abandoned."

He was talking about poetry. We turned it into a methodology.

The question is not whether software can be finished. TeX, SQLite, and curl prove that it can. The question is whether anyone still has the commercial incentive to try.