Vivian Voss

MongoDB: The Reinvention of the Wheel


The Invoice ■ Episode 09

“But we don’t need relationships!”

Your data has relationships whether you model them or not. A customer has orders. An order has items. Items belong to a catalogue. These are not design decisions. They are facts about the domain. The only choice is whether you express them in a system designed to enforce them, or pretend they do not exist and discover the consequences at three in the morning.

In 1970, Edgar F. Codd published “A Relational Model of Data for Large Shared Data Banks”. The paper is fifty-six years old. The problems it solved have not changed. The data has not changed. Only the marketing has.

The Denormalisation Trap

MongoDB’s recommended pattern is to embed related data directly in the document. A customer’s address appears in the customer document, in each of their order documents, in the shipping documents, in the invoice documents. Five copies of the same address, stored in five places, governed by nothing.

The customer moves house. You update the customer document. The order documents still show the old address. The shipping documents reference a street that no longer applies. The invoice documents are now factually incorrect.

Figure: One Address, Five Documents. After the move, the customer document shows 10 New Road (updated), while order #1, order #2, the shipping document and the invoice document still show 5 Old Street (stale). In PostgreSQL, an addresses table holds one row: updated once, consistent everywhere.

MongoDB’s documentation calls this “eventual consistency.” A more honest description would be “inconsistent until someone notices.” In a normalised relational database, the address exists in precisely one place. You update it once. Every query that references it returns the current value. Not eventually. Immediately. This is not a feature of PostgreSQL. It is the entire point of normalisation, and it was solved in 1970.
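To make the contrast concrete, here is a minimal sketch in Python. The denormalised side is modelled as plain dictionaries rather than a real MongoDB collection, and the normalised side uses the standard-library sqlite3 module as a stand-in for PostgreSQL. The table and column names are illustrative assumptions.

```python
import sqlite3

# Denormalised: the same address copied into five documents (plain dicts here).
documents = {
    name: {"customer": "Ada", "address": "5 Old Street"}
    for name in ("customer", "order_1", "order_2", "shipping", "invoice")
}
# The customer moves house; only the customer document is updated.
documents["customer"]["address"] = "10 New Road"
stale = [name for name, doc in documents.items() if doc["address"] != "10 New Road"]

# Normalised: the address exists in exactly one row of an addresses table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, line TEXT NOT NULL);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                            address_id INTEGER NOT NULL REFERENCES addresses(id));
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL REFERENCES customers(id));
    INSERT INTO addresses VALUES (1, '5 Old Street');
    INSERT INTO customers VALUES (1, 'Ada', 1);
    INSERT INTO orders VALUES (1, 1), (2, 1);
""")
conn.execute("UPDATE addresses SET line = '10 New Road' WHERE id = 1")  # one update

# Every order sees the current address through the JOIN.
order_addresses = conn.execute("""
    SELECT o.id, a.line
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    JOIN addresses a ON c.address_id = a.id
    ORDER BY o.id
""").fetchall()

print(stale)            # ['order_1', 'order_2', 'shipping', 'invoice']
print(order_addresses)  # [(1, '10 New Road'), (2, '10 New Road')]
```

Four of the five copies go stale after one update; the normalised version has nothing to go stale.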

Reinventing the JOIN

The moment your document model encounters a relationship it cannot embed (and it will, because data has relationships whether you like it or not) you need to fetch related documents. In SQL, this is a JOIN. One query. The database engine optimises it. The query planner has had fifty years of research poured into making it fast.

In MongoDB, you write application code. You fetch the first document, extract the reference, fetch the second document, combine them in memory. Two round trips. No query planner. No optimisation. Just your code, doing badly what a database engine does well.

Or you use the Aggregation Pipeline with $lookup. Which is, and there is no polite way to phrase this, a JOIN. A proprietary, MongoDB-specific JOIN, expressed in nested JSON objects rather than declarative SQL, documented in MongoDB’s own format, and transferable to precisely zero other databases.

Figure: The Same Query, Two Ways

SQL (PostgreSQL). Two lines. Portable. Optimised.

    SELECT o.id, c.name, o.total
    FROM orders o JOIN customers c ON o.customer_id = c.id;

MongoDB Aggregation Pipeline. Proprietary. Verbose.

    db.orders.aggregate([
      { $lookup: { from: "customers", localField: "customer_id",
                   foreignField: "_id", as: "customer" } },
      { $unwind: "$customer" },
      { $project: { total: 1, name: "$customer.name" } }
    ])

MongoDB app-code fallback.

    const orders = await db.collection("orders").find().toArray();
    // then loop, fetch each customer, merge in memory

The SQL version is two lines. It is declarative, portable across every relational database on the planet, and understood by anyone who has spent ten minutes with the language. The Aggregation Pipeline version is a proprietary reinvention of the same operation, expressed in a syntax that transfers to nothing else. The app-code fallback is worse still: one query for the orders, then another for every customer they reference, manual merging in memory, and the quiet certainty that your hand-written loop is slower and buggier than a query planner with decades of optimisation behind it.
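A small sketch of the two strategies, using Python's sqlite3 as a stand-in for a real database server. Because an in-memory SQLite database has no network, the hypothetical run() helper simply records each statement as a proxy for a round trip; the schema and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL REFERENCES customers(id),
                         total REAL NOT NULL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 12.5);
""")

round_trips = []  # one entry per statement sent to the database

def run(sql, params=()):
    """Hypothetical helper: execute one statement and record it as a round trip."""
    round_trips.append(sql)
    return conn.execute(sql, params).fetchall()

# 1. Declarative JOIN: a single statement, planned by the database engine.
joined = run("""SELECT o.id, c.name, o.total
                FROM orders o JOIN customers c ON o.customer_id = c.id
                ORDER BY o.id""")
join_queries = len(round_trips)

# 2. Application-side "join": fetch the orders, then one lookup per order (N+1).
round_trips.clear()
merged = []
for order_id, customer_id, total in run("SELECT id, customer_id, total FROM orders ORDER BY id"):
    (name,) = run("SELECT name FROM customers WHERE id = ?", (customer_id,))[0]
    merged.append((order_id, name, total))
loop_queries = len(round_trips)

print(joined == merged)            # True
print(join_queries, loop_queries)  # 1 4
```

Both produce the same rows. With three orders the loop issues four statements against one for the JOIN, and the gap grows linearly with the number of orders; over a real network, each of those statements pays full latency.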

The Schemaless Illusion

“Schemaless” is MongoDB’s most celebrated feature. It is also its most misleading.

Your data has a schema. It always has a schema. The question is not whether a schema exists, but where it is enforced. In a relational database, the schema lives in the database itself. It is declared, versioned, and enforced at write time. If you attempt to insert a string where an integer belongs, the database refuses. If you attempt to reference a row that does not exist, the foreign key constraint stops you. The data is guaranteed to be structurally correct at rest.
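A sketch of write-time enforcement, using sqlite3 in place of PostgreSQL. One caveat: SQLite is dynamically typed and leaves foreign keys off unless asked, so this example adds an explicit CHECK on typeof() and turns on PRAGMA foreign_keys. PostgreSQL rejects the same bad rows from the column types and REFERENCES clause alone. The try_insert() helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        quantity INTEGER NOT NULL CHECK (typeof(quantity) = 'integer')
    );
    INSERT INTO customers VALUES (1, 'Ada');
""")

def try_insert(sql, params):
    """Return None if the database accepted the row, else its error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

ok = try_insert("INSERT INTO orders VALUES (?, ?, ?)", (1, 1, 3))
bad_type = try_insert("INSERT INTO orders VALUES (?, ?, ?)", (2, 1, "three"))
bad_ref = try_insert("INSERT INTO orders VALUES (?, ?, ?)", (3, 99, 1))

print(ok)        # None
print(bad_type)  # a "CHECK constraint failed" message
print(bad_ref)   # FOREIGN KEY constraint failed
```

Only the well-formed row is stored; the other two never reach disk, whichever code path tried to write them.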

In MongoDB, the schema lives in your application code. Every function that reads a document must validate its structure. Every write must be manually checked for consistency. The schema has not been removed. It has been moved from a system designed to enforce it (the database) to a system that is not: your application.
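For comparison, this is roughly what the schema looks like once it moves into application code: a hypothetical validate_order() helper that every read and write path has to remember to call. It is an illustrative sketch, not Mongoose's API.

```python
from typing import Any

# Hypothetical application-level schema for an "order" document.
ORDER_SCHEMA = {"customer_id": int, "quantity": int, "address": str}

def validate_order(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document looks valid."""
    problems = []
    for field, expected in ORDER_SCHEMA.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif type(doc[field]) is not expected:  # exact match, so True is not an int here
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(doc[field]).__name__}"
            )
    return problems

good = {"customer_id": 1, "quantity": 3, "address": "10 New Road"}
drifted = {"customer_id": "1", "qty": 3, "address": "10 New Road"}

print(validate_order(good))     # []
print(validate_order(drifted))  # ['customer_id: expected int, got str', 'missing field: quantity']
```

Note that the unexpected qty key passes silently unless the validator also checks for unknown fields, and nothing stops a code path from skipping the call entirely. A database constraint has neither gap.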

The proof is Mongoose, the most popular MongoDB ODM for Node.js, with over 1.8 million weekly downloads. Its primary feature: adding schemas to MongoDB. The community’s most-used tool for MongoDB exists to restore the very thing MongoDB removed. One could not write a more damning review if one tried.

The Timeline of Reinvention

The history is instructive, not because it is complicated, but because it is embarrassingly straightforward.

Codd publishes the relational model in 1970. Thirty-nine years of research, optimisation, and battle-testing follow. Then MongoDB 1.0 arrives in 2009, and until late 2012 (this is the part that bears emphasis) its official drivers default to unacknowledged writes: the client reports success without waiting for the server to confirm that anything has been written. Kyle Kingsbury’s Jepsen analysis remains essential reading for anyone who stores data they would prefer not to lose.

Multi-document transactions arrive in 2018 with MongoDB 4.0. ACID transactions across multiple records, a baseline feature of relational databases since the 1980s, took MongoDB nine years to implement. Also in 2018, MongoDB switches from the AGPL to the Server Side Public License, a licence the Open Source Initiative has never approved as open source. Your “open source” database is no longer open source by any standard definition.
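What a multi-record transaction buys you, sketched with sqlite3 standing in for PostgreSQL's BEGIN/COMMIT: an order insert and a stock decrement either both happen or neither does. The tables and the place_order() helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY,
                            stock INTEGER NOT NULL CHECK (stock >= 0));
    CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT NOT NULL,
                         quantity INTEGER NOT NULL);
    INSERT INTO inventory VALUES ('widget', 5);
""")

def place_order(order_id, sku, quantity):
    """Record the order and decrement stock together, or not at all."""
    try:
        with conn:  # one transaction: commit on success, roll back on any error
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, sku, quantity))
            conn.execute("UPDATE inventory SET stock = stock - ? WHERE sku = ?",
                         (quantity, sku))
        return True
    except sqlite3.IntegrityError:
        return False

first = place_order(1, "widget", 3)   # stock 5 -> 2
second = place_order(2, "widget", 4)  # stock would be -2: CHECK fails, the order insert is undone too

orders = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
stock = conn.execute("SELECT stock FROM inventory WHERE sku = 'widget'").fetchone()[0]
print(first, second, orders, stock)  # True False [(1,)] 2
```

The failed second order leaves no orphaned order row behind, because the rollback covers both tables.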

By 2023, MongoDB’s annual revenue approaches $1.7 billion, most of it from Atlas, its hosted service. The business model works. The engineering argument, however, has not improved.

Figure: The Reinvention Timeline
1970: Codd publishes the relational model.
2009: MongoDB 1.0.
Until late 2012: unacknowledged writes by default.
2018: multi-document ACID transactions (nine years late); switch to the SSPL (not recognised as open source).
2023: roughly $1.7B in annual revenue, mostly from Atlas.
Meanwhile, PostgreSQL (1996 to present): free, ACID since day one, JSONB for document-shaped data, real open source, no write-acknowledgement surprises. Everything MongoDB rebuilt, PostgreSQL never lost.

The PostgreSQL Counter-Argument

PostgreSQL ships JSONB. A native, indexable, queryable binary JSON type. You can store document-shaped data (the genuinely useful part of the document model) inside a relational database, alongside proper foreign keys, proper transactions, and proper ACID guarantees. No ODM required. No Aggregation Pipeline. No proprietary query syntax.

The document model is not the problem. The problem is making it the only model. Some data is genuinely document-shaped: logs, events, sensor readings, CMS content. PostgreSQL handles these with JSONB. The rest of your data (the customers, orders, inventory, and everything else with relationships) gets proper relational modelling, enforced at the database level.
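A rough sketch of the hybrid approach: document-shaped event payloads stored as JSON next to an ordinary foreign key. It uses SQLite's JSON functions (json_valid, json_extract) through Python's sqlite3 as a stand-in, and assumes a SQLite build with JSON support, which is built in by default since SQLite 3.38. In PostgreSQL the payload column would be JSONB, queried with operators such as ->> and indexable with GIN.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),  -- relational part
        payload TEXT NOT NULL CHECK (json_valid(payload))        -- document part
    );
    INSERT INTO customers VALUES (1, 'Ada');
""")

# Payloads may differ in shape from event to event.
conn.executemany(
    "INSERT INTO events (customer_id, payload) VALUES (?, ?)",
    [(1, json.dumps({"type": "login", "device": "phone"})),
     (1, json.dumps({"type": "checkout", "cart": {"items": 3}}))],
)

rows = conn.execute("""
    SELECT c.name,
           json_extract(e.payload, '$.type'),
           json_extract(e.payload, '$.cart.items')  -- NULL when the path is absent
    FROM events e JOIN customers c ON e.customer_id = c.id
    ORDER BY e.id
""").fetchall()

print(rows)  # [('Ada', 'login', None), ('Ada', 'checkout', 3)]
```

The customer reference is still enforced by the database, while the payload is free to vary; one engine serves both models.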

One database. Both models. Free. MongoDB Atlas, meanwhile, charges for every operation, every gigabyte, and every hour of uptime. The pricing page requires a calculator. PostgreSQL’s pricing page is a single word: free.

The Fair Concession

MongoDB is not without legitimate use cases. Logs, event streams, sensor data, time-series telemetry. Data that is genuinely document-shaped, append-mostly, and rarely joined. In these contexts, a document store is the natural fit, and MongoDB serves well.

The problem is scope. MongoDB was not marketed as a specialist tool for append-heavy, relationship-free data. It was marketed as a general-purpose database replacement. “Say goodbye to complex schemas!” ran the message, with the implication that schemas are the problem rather than the solution. And an industry that was tired of writing migration scripts believed it.

The Root Cause

How did competent engineers adopt a database that once shipped without write acknowledgement by default, and that calls the absence of schema enforcement a feature?

The same way they adopted every other tool in this series: the marketing arrived before the evaluation. “Schemaless” sounds like freedom. “Document model” sounds modern. “NoSQL” sounds like progress. The words are chosen with care. They bypass the technical question (“Does my data have relationships?”) and replace it with an emotional one: “Do you want to be modern?”

The answer to the first question is almost always yes. Your data has relationships. It has structure. It has constraints that must be enforced somewhere. The only question is whether you enforce them in a system built for the purpose, or scatter them across application code and hope for the best.

The Verdict

MongoDB stores relational data in a document store, then spends a decade rebuilding the relational features it discarded. The Aggregation Pipeline is SQL with worse syntax. Mongoose is the schema the database was supposed to make unnecessary. Multi-document transactions are ACID compliance, nine years late. The SSPL is an open-source licence that is not open source.

Codd solved this in 1970. PostgreSQL implements it for free, with JSONB for the bits that genuinely benefit from a document model. The wheel was invented. It was round. It worked.

MongoDB reinvented it, square, and charged $1.7 billion for the privilege. The invoice is on the table.