Vivian Voss

Technical Beauty: PostgreSQL

Technical Beauty ■ Episode 7

“Design the core for extensibility, and the extensions will design themselves.”

In 1986, a professor at Berkeley who had already built one influential database decided that the entire approach was wrong. Not the relational model itself. That was sound. The problem was what happened next: every conceivable feature was welded into the core, every use case anticipated at compile time, every capability inseparable from the monolith. The result was software that did everything, understood nothing in particular, and grew in precisely one direction: larger.

His name was Michael Stonebraker, and his response was POSTGRES, short for “Post-Ingres,” because even the name was a refusal to carry the old architecture forward. The project started at UC Berkeley. It would become the most consequential database engine of the last four decades.

The Man

Stonebraker’s career reads like a controlled demolition of received wisdom. He built Ingres in the 1970s, one of the first relational databases, contemporaneous with IBM’s System R. He then built POSTGRES to correct Ingres’s limitations. He subsequently built Mariposa (distributed query processing), Aurora (stream processing, later commercialised as StreamBase), C-Store (column-oriented storage, later commercialised as Vertica), H-Store (in-memory OLTP, later VoltDB), and SciDB (array-based analytics). Each project was a thesis in executable form. Each challenged whatever the industry had settled on as orthodoxy.

In 2014, the ACM awarded him the Turing Award, computing’s closest analogue to a Nobel Prize. The citation reads: “for fundamental contributions to the concepts and practices underlying modern database systems.” This is rather like citing Newton for “contributions to physics.” It is technically accurate and entirely insufficient.

The Architecture

The central insight of POSTGRES was not a new query language or a faster storage engine. It was a separation of concerns so thorough that it borders on philosophical: the core handles transactions, concurrency, the query planner, and the write-ahead log. Everything else (data types, index methods, operators, functions, procedural languages, foreign data wrappers) is an extension.

This is not a plugin system bolted on after the fact. It is the architecture itself. PostgreSQL does not have extensions. PostgreSQL is an extension architecture with an ACID-compliant kernel.

[Figure: PostgreSQL: The Extension Architecture. An ACID core (transactions, WAL, planner, MVCC) surrounded by its extension points (data types, index methods, PL/ languages, operators, FDW) and the extensions built on them (pgvector for AI, PostGIS, Citus, pgmq, TimescaleDB, pg_cron / pg_stat). Minimal core. Maximal surface. Extensibility is the architecture.]

The consequence is that PostgreSQL can become a geospatial database (with PostGIS), a vector database for machine learning (with pgvector), a time-series database (with TimescaleDB), a distributed query engine (with Citus), or a message queue (with pgmq and the built-in LISTEN/NOTIFY), without the core gaining a single line of code. Each extension slots into the same catalogue system, the same planner, the same transaction guarantees. You do not bolt capabilities onto PostgreSQL. You plug them in, and the kernel treats them as if they had always been there.

This is not accidental modularity. It is Stonebraker’s thesis made concrete: the database should provide the machinery of reliability and let the domain define everything else.
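The shape of that arrangement can be sketched in a few lines of Python. This is a toy model, not PostgreSQL’s implementation: the `Kernel` class, its method names, and the `install_point_extension` function are all hypothetical, invented for illustration. The kernel here knows nothing about points or distances. It keeps a catalogue, looks things up in it, and commits a block of inserts entirely or not at all.

```python
from contextlib import contextmanager
from math import hypot


class Kernel:
    """Toy stand-in for the core: a catalogue plus all-or-nothing writes."""

    def __init__(self):
        self.types = {}      # type name -> parser for text input
        self.operators = {}  # (symbol, type name) -> implementation
        self.rows = []       # committed (type name, value) pairs

    # Extension surface: the kernel stores what it is given and asks no questions.
    def register_type(self, name, parser):
        self.types[name] = parser

    def register_operator(self, symbol, type_name, func):
        self.operators[(symbol, type_name)] = func

    @contextmanager
    def transaction(self):
        """Commit every insert made inside the block, or none of them."""
        pending = []

        def insert(type_name, text):
            parse = self.types[type_name]  # catalogue lookup, not a hard-coded type
            pending.append((type_name, parse(text)))

        yield insert
        self.rows.extend(pending)  # reached only if the block raised nothing

    def apply(self, symbol, type_name, left, right):
        return self.operators[(symbol, type_name)](left, right)


def install_point_extension(kernel):
    """Hypothetical extension: a 2-D point type and a distance operator."""

    def parse_point(text):
        x, y = text.strip("()").split(",")
        return (float(x), float(y))

    def distance(a, b):
        return hypot(a[0] - b[0], a[1] - b[1])

    kernel.register_type("point", parse_point)
    kernel.register_operator("<->", "point", distance)


kernel = Kernel()
install_point_extension(kernel)

with kernel.transaction() as insert:
    insert("point", "(0,0)")
    insert("point", "(3,4)")

try:
    with kernel.transaction() as insert:
        insert("point", "(1,1)")
        insert("point", "not a point")  # the parser raises, so the whole block is discarded
except ValueError:
    pass

a, b = (value for _, value in kernel.rows)
print(len(kernel.rows), kernel.apply("<->", "point", a, b))  # prints: 2 5.0
```

The failed block leaves no trace, including the valid point inserted before the error. The extension defined the type, and the kernel supplied the guarantee without knowing what the type was.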

The Governance

PostgreSQL has been community-governed since 1996. There is no corporate owner. There is no single company that can change direction, relicense, or sunset the project. The core team has seven members. There are thirty-one committers. Decisions are made by technical consensus on public mailing lists. The governance model is, in its own way, as minimal as the architecture: no foundation with a seven-figure budget, no corporate advisory board, no “open core” with a premium tier. Just people who write code, review code, and argue about code in public.

This matters rather more than it might appear. Since 2018, the database landscape has undergone a licensing convulsion that has left much of the industry re-evaluating what “open source” actually means.

[Figure: Licence Stability: Who Changed the Deal. Timeline of open-source licence changes, 1996 to 2026. MongoDB: AGPL to SSPL (2018). Redis: BSD to dual licence (2024). Elastic: Apache 2.0 to SSPL (2021). PostgreSQL: PostgreSQL Licence (BSD), v1 through v17. MongoDB, Redis, Elasticsearch: corporate owners changed the licence. PostgreSQL: no corporate owner. No licence to change. Version 17. Same BSD licence as version 1.]

MongoDB moved from AGPL to the Server Side Public Licence in 2018, a licence specifically designed to prevent cloud providers from offering MongoDB as a service without paying. Whatever one thinks of the commercial logic, the effect is that the community’s contributions now serve a corporate licensing strategy.

Redis followed suit in 2024, switching from BSD to a dual licence (RSALv2 and SSPLv1) that restricts offering Redis as a competing hosted service. Again, the community built on the assumption of BSD. The assumption was revoked.

Elasticsearch changed from Apache 2.0 to a dual licence (SSPL and the Elastic License) in 2021, prompting Amazon to fork the project as OpenSearch. The result: two codebases, divided communities, and a permanent lesson in what happens when a single company controls the copyright.

PostgreSQL’s response to all of this is the most powerful response available: nothing. There is no corporate owner to change the licence. There is no board to overrule the community. The PostgreSQL Licence is functionally identical to the BSD licence. Version 17 ships under the same terms as version 1. The legal foundation is as minimal and as stable as the technical one.

The Numbers

Seven core team members. Thirty-one committers. ACID-compliant from day one. Seventeen major versions. Thirty-eight years of continuous development. The project has survived the object-database movement, the XML-database movement, the NoSQL movement, the NewSQL movement, and the current enthusiasm for embedding everything into a vector store. It survived them by not joining them. The extensions joined. The core remained.

This is not inertia. This is the discipline of a project that understood, from the very first design document, that features are liabilities when they live in the wrong layer. A geospatial index in the core is a maintenance burden for every user who does not need geospatial queries. A geospatial index as an extension is zero cost to everyone except the people who benefit from it. The economics of this separation are so overwhelmingly favourable that it is remarkable how few projects have replicated it.

The Extensions as Evidence

The quality of an extension architecture is not measured by how many extensions exist. It is measured by how ambitious the extensions dare to be.

PostGIS adds a complete geospatial stack (coordinate reference systems, spatial indices, topology, raster processing) and it is the reference implementation against which commercial GIS systems benchmark themselves. It does not patch the core. It uses the core’s type system, operator system, and index API as designed.

pgvector adds vector similarity search for machine learning embeddings. The AI industry’s current favourite party trick, semantic search over high-dimensional spaces, plugs into a database designed in 1986. Stonebraker did not anticipate large language models. He anticipated that someone, someday, would need a data type he had not imagined, and he designed the system so that adding one would not require his permission.
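Stripped of the vocabulary, similarity search is an ordering problem: rank rows by their distance to a query vector and keep the closest few. The sketch below shows the exact, brute-force version in plain Python, using Euclidean distance. It is an illustration of the idea only, not pgvector’s code; the function names and the three-dimensional embeddings are made up for the example.

```python
from math import sqrt


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(rows, query, k):
    """Return the ids of the k rows whose embedding is closest to query."""
    ranked = sorted(rows, key=lambda row: l2_distance(row[1], query))
    return [row_id for row_id, _ in ranked[:k]]


# Made-up three-dimensional embeddings; real ones run to hundreds of dimensions.
rows = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]

print(nearest(rows, [1.0, 0.0, 0.0], k=2))  # prints: ['doc-a', 'doc-b']
```

An extension such as pgvector performs this ranking inside the database, as an operator on a new data type, so the result can be filtered, joined, and committed like any other query.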

TimescaleDB turns PostgreSQL into a time-series database. Citus turns it into a distributed query engine. pgmq and the built-in LISTEN/NOTIFY turn it into a message broker. Each of these would justify a dedicated product at most companies. In the PostgreSQL ecosystem, they are CREATE EXTENSION statements.

The Principle

There is a phrase from the Unix tradition that applies here with unusual precision: mechanism, not policy. The kernel provides the mechanism: transactions, concurrency control, query planning, crash recovery. The extensions provide the policy: what data types matter, what indices suit the workload, what procedural language fits the team.

This separation is the reason PostgreSQL is simultaneously the most conservative and the most innovative database in production today. The core changes slowly, deliberately, with the caution appropriate to software that guards other people’s data. The extensions change at whatever pace their domain requires. The AI ecosystem moves fast. pgvector moves with it. The geospatial standards body moves slowly. PostGIS moves with it. Neither tempo affects the other. Neither compromises the transaction guarantees.

Stonebraker built a database that is, in the most literal sense, not a database. It is a platform for building databases. The difference is everything.

Extensibility over features. A minimal core that does one thing (keep your data safe) and an extension surface that does everything else. Thirty-eight years proved this right. Seventeen versions. Same licence. Same architecture. Still a core team of seven at the helm.