Technical Beauty ■ Episode 09
In 2009, the database world had a problem it refused to name. Relational databases were slow, and the diagnosis was always the same: not enough threads. Add more threads. Add a connection pool. Add a thread pool for the connection pool. Add a lock manager to coordinate the threads that coordinate the pools. The patient was drowning, and the prescribed treatment was more water.
Salvatore Sanfilippo, known as antirez, looked at this arrangement and arrived at a conclusion so obvious it bordered on impolite: the databases were not slow because they lacked threads. They were slow because the threads were fighting each other. His solution was not to add better threads. It was to remove them entirely.
The Man
Sanfilippo is a Sicilian developer whose career predates the web as most people understand it. Before Redis, he built hping, a network security tool used by penetration testers worldwide. He understood systems programming at the socket level, the level where bytes meet file descriptors and abstractions dissolve into system calls. This matters. Redis is not the product of a database theorist. It is the product of a systems programmer who understood what the kernel actually does when you ask it to wait.
Sanfilippo maintained Redis personally for more than a decade. He wrote the code. He answered the issues. He wrote the documentation. He resisted, with remarkable consistency, every suggestion to make Redis do more things in more ways for more use cases. The discipline of saying “no” to a feature request is not glamorous work, but it is the work that keeps a codebase from collapsing under the weight of its own accommodations.
The Architecture
Redis is single-threaded. One thread. One event loop. Every operation executes sequentially on a single CPU core. The industry’s immediate reaction to this design, in 2009, was disbelief. A database that uses one core? On a 64-core server? Surely this is a limitation, a prototype constraint that the author will fix in the next release?
It was not a limitation. It was the entire point.
Multi-threaded databases spend a non-trivial portion of their CPU time not processing data but managing access to data. Mutexes. Spinlocks. Read-write locks. Compare-and-swap loops. Cache-line bouncing between cores. Every thread that touches a shared data structure must first negotiate permission to do so, and that negotiation has a cost that scales superlinearly with the number of threads. On modern hardware, a lock contention event can cost hundreds of nanoseconds. At 100,000 operations per second, hundreds of nanoseconds per operation is the difference between fast and unusable.
Redis sidesteps the entire problem. One thread means one execution context. One execution context means no concurrent access to shared state. No concurrent access means no locks. No locks means no contention. No contention means the CPU spends its cycles doing the thing you asked it to do: reading and writing data. The event loop, built on epoll on Linux and kqueue on BSD, multiplexes thousands of client connections onto that single thread without blocking. A GET takes roughly one microsecond. A SET takes roughly the same. The event loop processes them in sequence, one after another, faster than the network can deliver them.
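The shape of that loop is easy to demonstrate. Python’s standard selectors module wraps the same kernel primitives (epoll on Linux, kqueue on BSD), so a few dozen lines can sketch the pattern: register interest, ask the kernel which descriptors are ready, touch only those. This is a sketch of the idea, not Redis’s actual event loop in ae.c.

```python
import selectors
import socket

def run_event_loop(server_socks, expected):
    """Serve whichever sockets are ready, one at a time, on one thread."""
    sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD
    for sock in server_socks:
        sock.setblocking(False)                  # never block on a read
        sel.register(sock, selectors.EVENT_READ)
    replies = []
    while len(replies) < expected:
        for key, _ in sel.select():      # kernel says these are readable
            data = key.fileobj.recv(4096)        # guaranteed not to block
            replies.append(data)                 # "execute the command"
            key.fileobj.send(b"+OK\r\n")         # reply, RESP-style
    sel.close()
    return replies

if __name__ == "__main__":
    # Three "clients"; only two have sent anything. The loop never sits
    # idle waiting on the silent one.
    pairs = [socket.socketpair() for _ in range(3)]
    pairs[0][1].send(b"SET a 1")
    pairs[2][1].send(b"GET a")
    served = run_event_loop([server for server, _ in pairs], expected=2)
    print(sorted(served))
```

The point of the demo is the absence: no thread per connection, no lock around `replies`, and the silent client costs nothing but a registered file descriptor.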
The Numbers
Approximately 90,000 lines of C. Fifteen years of production use. Sub-millisecond latency on commodity hardware. Over 100,000 operations per second on a single core, and substantially more with pipelining. No garbage collection pauses, because there is no garbage collector. No thread pool tuning, because there is no thread pool. No lock contention analysis, because there are no locks to contend.
The users read like a roll call of the internet’s infrastructure: X (formerly Twitter), GitHub, Stack Overflow, Instagram, and a rather long list of others that prefer not to discuss their infrastructure choices in public. When a social network serving hundreds of millions of requests per day chooses a single-threaded database, it is not because the engineering team misread the documentation. It is because the engineering team understood the documentation.
The Data Structures
Redis is frequently described as a key-value store, in much the same way that a Swiss Army knife is frequently described as a blade. The description is technically accurate and comprehensively misleading. Redis is a data structure server. The distinction matters.
A key-value store maps strings to strings. Redis maps strings to data structures: strings, hashes, lists, sets, sorted sets, bitmaps, HyperLogLogs, and streams. Each structure is implemented with the correct algorithm for its access patterns. Sorted sets use skip lists, providing O(log n) insertion and range queries. Lists are implemented as quicklists (linked lists of compact listpack nodes; ziplists before Redis 7.0), optimised for both ends. Hashes use a compact listpack encoding for small cardinalities and transition to hash tables when the data outgrows it. These are not afterthoughts bolted onto a string store. They are the reason Redis exists.
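The skip list deserves a closer look, because it is the least familiar of the lot. Below is a minimal sketch of the idea in Python — insert and range query in expected O(log n) — with the caveat that Redis’s real implementation (in t_zset.c) uses a higher maximum level, keeps backward pointers, and breaks score ties by member; none of that is reproduced here.

```python
import random

class SkipNode:
    def __init__(self, score, member, level):
        self.score = score
        self.member = member
        self.forward = [None] * level   # one forward pointer per level

class SkipList:
    MAX_LEVEL = 16   # sketch value; Redis uses a larger constant
    P = 0.25         # probability of promoting a node one level up

    def __init__(self):
        self.head = SkipNode(float("-inf"), None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        lvl = 1
        while random.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, score, member):
        # Record, per level, the last node strictly before the new score.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].score < score:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = SkipNode(score, member, lvl)
        for i in range(lvl):            # splice into each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def range_query(self, lo, hi):
        # Descend to the first node with score >= lo, then walk level 0.
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].score < lo:
                node = node.forward[i]
        node = node.forward[0]
        out = []
        while node and node.score <= hi:
            out.append((node.score, node.member))
            node = node.forward[0]
        return out
```

The tall levels act as an express lane over the sorted level-zero list, which is why a range query starts logarithmic and finishes linear in the size of the result — exactly the cost profile a leaderboard wants.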
A leaderboard is a sorted set. A session store is a hash with an expiry. A rate limiter is an incrementing key with a TTL. A message queue is a stream with consumer groups. Each of these problems, in a traditional database, requires a table schema, an index strategy, and a query plan. In Redis, each is a single command with constant-time or logarithmic complexity. The abstraction is not hiding complexity. The abstraction is the correct solution.
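The rate-limiter case fits in a screenful. Against a real server it is INCR plus EXPIRE on a per-client key; in the sketch below a plain dictionary stands in for the keyspace and an injectable clock stands in for the TTL, so the names are hypothetical but the fixed-window logic is the same.

```python
import time

class FixedWindowLimiter:
    """The INCR-plus-TTL pattern, emulated in-process for illustration."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}   # key -> (window_start, count)

    def allow(self, key):
        now = self.clock()
        start, count = self.counters.get(key, (now, 0))
        if now - start >= self.window:   # the "TTL" has expired
            start, count = now, 0
        count += 1                       # the "INCR"
        self.counters[key] = (start, count)
        return count <= self.limit
```

With Redis the counter and its expiry live server-side, so every application instance shares one window; the in-process version above only illustrates the shape of the command sequence.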
The Event Loop
The mechanism that makes single-threaded concurrency possible is the event loop, and it is worth understanding precisely what this means. Redis does not wait for I/O. It never blocks on a socket read. It never blocks on a socket write. It registers interest in file descriptor events with the kernel (via epoll on Linux, kqueue on FreeBSD and macOS) and then processes whichever descriptors are ready. If Client A’s data has arrived and Client B’s has not, Redis processes Client A and moves on. Client B will be processed when its data arrives. No thread sits idle waiting. No thread exists to sit idle.
This is the same model that powers nginx. It is the same model that Node.js brought to JavaScript that same year. It is, if one traces the lineage far enough, the model that Unix file descriptor multiplexing was designed for in the first place. The select() system call has existed since 4.2BSD in 1983. The idea that a single process can handle thousands of concurrent connections is not new. It is, in fact, more than four decades old. Redis merely applied it to a domain where the prevailing wisdom insisted it could not work.
The Persistence
“But it’s in-memory,” says the sceptic. “What happens when the power goes out?” This is a reasonable question, and Redis provides two answers, both of which the sceptic tends to find unsatisfying because they are not dramatic enough to justify the concern.
The first answer is RDB snapshots: periodic point-in-time dumps of the entire dataset to disc. Redis forks the process, the child serialises the data, the parent continues serving requests. Copy-on-write semantics mean the fork is nearly instant regardless of dataset size. The snapshot lands on disc. The parent never notices.
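The fork trick can be seen in miniature from any Unix process. In the sketch below (Python, os.fork, so Unix-only), the child’s view of the data is frozen at the instant of the fork while the parent keeps writing — the same copy-on-write behaviour that lets the RDB child serialise in peace.

```python
import os

def snapshot_demo():
    """Fork, let the child 'serialise' its frozen view, mutate in the parent."""
    data = {"counter": 1}
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: the snapshot process
        os.close(read_end)
        # The child serialises what it sees. Copy-on-write means this is
        # the state at the instant of the fork, whatever the parent does.
        os.write(write_end, str(data["counter"]).encode())
        os.close(write_end)
        os._exit(0)
    os.close(write_end)
    data["counter"] = 99                  # parent keeps serving writes
    snapshot = os.read(read_end, 64).decode()
    os.close(read_end)
    os.waitpid(pid, 0)
    return snapshot, data["counter"]

if __name__ == "__main__":
    print(snapshot_demo())
```

The child reports the pre-fork value even though the parent has long since moved on; the kernel only copies the pages the parent actually dirties.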
The second answer is the Append-Only File (AOF): a write-ahead log that records every mutation. On restart, Redis replays the log and reconstructs the dataset. The AOF can be fsync’d every second (the default, and sensible for most workloads) or after every write (for those who require the belt, the braces, and the structural engineer to verify both). In practice, most deployments use both mechanisms: RDB for fast restarts, AOF for minimal data loss. The combination provides durability guarantees that satisfy everyone except those who were not really asking about durability but about finding a reason not to use Redis.
The Scaling
A single Redis instance handles 100,000 operations per second. For most applications, this is more than sufficient. For those where it is not, Redis scales the way Unix scales: by running more processes. Redis Cluster shards data across multiple nodes using hash slots (16,384 of them, for those who appreciate specificity). Each node is an independent single-threaded instance. Each handles its own subset of the keyspace. The cluster coordinates via a gossip protocol. There is no central coordinator. There is no single point of failure. There is no distributed lock manager, because each shard owns its data exclusively.
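The slot arithmetic is concrete enough to show. Redis Cluster maps a key to one of the 16,384 slots with a CRC16 checksum (the XMODEM variant: polynomial 0x1021, initial value 0) taken modulo 16384, and hash tags — the part of the key between the first braces — let related keys land on the same slot. The sketch below follows that scheme; consult the cluster specification before relying on the edge cases.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM: poly 0x1021, init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster hash slots."""
    # Hash tags: if the key contains {...} with non-empty content, only
    # that content is hashed, forcing related keys onto one slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

Because the slot is a pure function of the key, any client can route a request to the right shard with no coordinator in the path — which is where the “no central coordinator” claim cashes out in practice.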
This is horizontal scaling without the usual horizontal complexity. No two-phase commit. No distributed transaction coordinator. No consensus protocol for every write. Each shard is autonomous, single-threaded, and fast. The cluster is a collection of independent fast things, not a single complicated slow thing pretending to be fast. The distinction is architectural, and it is the reason Redis Cluster works in practice rather than merely in conference presentations.
The Licence
For fifteen years, Redis was BSD-licensed. In March 2024, Redis Ltd changed the licence to a dual RSALv2/SSPLv1 model. The change was, depending on one’s perspective, either a necessary defence against cloud providers reselling open-source work without contributing back, or a corporate appropriation of a community project. Both perspectives contain truth. Neither contains all of it.
The community’s response was Valkey, a fork maintained by the Linux Foundation with contributions from AWS, Google, Oracle, and others. The architecture remains identical. The codebase diverged at Redis 7.2. The lesson is one the industry has learned before: a licence can change, but an architecture, once proven, is forked rather than abandoned. The single-threaded event loop will outlast the corporate entity that changed the licence, in the same way that Unix outlasted AT&T.
The Reduction
What makes Redis technically beautiful is not its speed, though it is fast. It is not its data structures, though they are well-chosen. It is not its ubiquity, though it is everywhere. What makes Redis beautiful is the clarity of a single architectural decision and the discipline to follow it through to every consequence.
One thread. Therefore no locks. Therefore no contention. Therefore predictable latency. Therefore simple debugging. Therefore auditable code. Therefore correct code. Each consequence follows from the one before it. The entire system is a syllogism in C, ninety thousand lines long, and every line follows from the premise that concurrency is not a threading problem but a design problem.
The industry spent decades throwing threads at latency and wondering why the latency got worse. Sanfilippo spent 2009 removing the threads and watching the latency disappear. The difference between the two approaches is not technical talent. It is the willingness to question the assumption that everyone else had stopped questioning.
One thread. One event loop. One hundred thousand operations per second. No locks required. The fastest path through a problem is sometimes around it.