Vivian Voss

Technical Beauty: Redis

redis architecture performance

Technical Beauty ■ Episode 09

In 2009, the database world had a problem it refused to name. Relational databases were slow, and the diagnosis was always the same: not enough threads. Add more threads. Add a connection pool. Add a thread pool for the connection pool. Add a lock manager to coordinate the threads that coordinate the pools. The patient was drowning, and the prescribed treatment was more water.

Salvatore Sanfilippo, known as antirez, looked at this arrangement and arrived at a conclusion so obvious it bordered on impolite: the databases were not slow because they lacked threads. They were slow because the threads were fighting each other. His solution was not to add better threads. It was to remove them entirely.

The Man

Sanfilippo is a Sicilian developer whose career predates the web as most people understand it. Before Redis, he built hping, a network security tool used by penetration testers worldwide. He understood systems programming at the socket level, the level where bytes meet file descriptors and abstractions dissolve into system calls. This matters. Redis is not the product of a database theorist. It is the product of a systems programmer who understood what the kernel actually does when you ask it to wait.

Sanfilippo maintained Redis personally for fifteen years. He wrote the code. He answered the issues. He wrote the documentation. He resisted, with remarkable consistency, every suggestion to make Redis do more things in more ways for more use cases. The discipline of saying “no” to a feature request is not glamorous work, but it is the work that keeps a codebase from collapsing under the weight of its own accommodations.

The Architecture

Redis is single-threaded. One thread. One event loop. Every operation executes sequentially on a single CPU core. The industry’s immediate reaction to this design, in 2009, was disbelief. A database that uses one core? On a 64-core server? Surely this is a limitation, a prototype constraint that the author will fix in the next release?

It was not a limitation. It was the entire point.

Multi-threaded databases spend a non-trivial portion of their CPU time not processing data but managing access to data. Mutexes. Spinlocks. Read-write locks. Compare-and-swap loops. Cache-line bouncing between cores. Every thread that touches a shared data structure must first negotiate permission to do so, and that negotiation has a cost that scales superlinearly with the number of threads. On modern hardware, a lock contention event can cost hundreds of nanoseconds. At 100,000 operations per second, hundreds of nanoseconds per operation is the difference between fast and unusable.

[Figure: The Concurrency Problem. Left, a multi-threaded database: threads 1…N compete for shared data through a lock manager; mutex waits cost roughly 200 ns each, cache lines bounce between cores, and contention scales O(n²), for a lock-bound ceiling around 40,000 ops/sec. Right, Redis: one event loop (epoll/kqueue) serves all client connections over in-memory data; zero locks, no cache-line bouncing, predictable latency, and a memory-bandwidth ceiling above 100,000 ops/sec. Caption: "The fastest lock is the one you never acquire. Redis removed the locks. The performance followed."]

Redis sidesteps the entire problem. One thread means one execution context. One execution context means no concurrent access to shared state. No concurrent access means no locks. No locks means no contention. No contention means the CPU spends its cycles doing the thing you asked it to do: reading and writing data. The event loop, built on epoll on Linux and kqueue on BSD, multiplexes thousands of client connections onto that single thread without blocking. A GET takes roughly one microsecond. A SET takes roughly the same. The event loop processes them in sequence, one after another, faster than the network can deliver them.

The Numbers

Approximately 90,000 lines of C. Fifteen years of production use. Sub-millisecond latency on commodity hardware. Over 100,000 operations per second on a single core, and substantially more with pipelining. No garbage collection pauses, because there is no garbage collector. No thread pool tuning, because there is no thread pool. No lock contention analysis, because there are no locks to contend.

The users read like a roll call of the internet’s infrastructure: X (formerly Twitter), GitHub, Stack Overflow, Instagram, and a rather long list of others that prefer not to discuss their infrastructure choices in public. When a social network serving hundreds of millions of requests per day chooses a single-threaded database, it is not because the engineering team misread the documentation. It is because the engineering team understood the documentation.

The Data Structures

Redis is frequently described as a key-value store, in much the same way that a Swiss Army knife is frequently described as a blade. The description is technically accurate and comprehensively misleading. Redis is a data structure server. The distinction matters.

A key-value store maps strings to strings. Redis maps strings to data structures: strings, hashes, lists, sets, sorted sets, bitmaps, hyperloglogs, and streams. Each structure is implemented with the correct algorithm for its access patterns. Sorted sets use skip lists, providing O(log n) insertion and range queries. Lists are implemented as quicklists (linked lists of ziplists), optimised for both ends. Hashes use a compact ziplist encoding for small cardinalities and transition to hash tables when the data outgrows it. These are not afterthoughts bolted onto a string store. They are the reason Redis exists.

[Figure: Data Structure Server, Not Key-Value Store. Six core structures, each with the correct algorithm for its access pattern:

  String      binary-safe, up to 512 MB          O(1) GET/SET (SDS encoding)
  Hash        field-value pairs per key          O(1) HGET/HSET (ziplist → hash table)
  List        doubly linked quicklist            O(1) LPUSH/RPOP, blocking ops
  Set         unique members, intersections      O(1) SADD/SISMEMBER (intset → hash table)
  Sorted set  score-ordered, range queries       O(log n) ZADD (skip list + hash table)
  Stream      append-only log, consumer groups   O(1) XADD (radix tree + listpack)

Caption: leaderboards → ZRANGEBYSCORE; sessions → HGETALL + EXPIRE; rate limiting → INCR + EXPIRE; message queues → XREADGROUP. The right data structure at the right time, in memory, sub-millisecond.]

A leaderboard is a sorted set. A session store is a hash with an expiry. A rate limiter is an incrementing key with a TTL. A message queue is a stream with consumer groups. Each of these problems, in a traditional database, requires a table schema, an index strategy, and a query plan. In Redis, each is a single command with constant-time or logarithmic complexity. The abstraction is not hiding complexity. The abstraction is the correct solution.

The Event Loop

The mechanism that makes single-threaded concurrency possible is the event loop, and it is worth understanding precisely what this means. Redis does not wait for I/O. It never blocks on a socket read. It never blocks on a socket write. It registers interest in file descriptor events with the kernel (via epoll on Linux, kqueue on FreeBSD and macOS) and then processes whichever descriptors are ready. If Client A’s data has arrived and Client B’s has not, Redis processes Client A and moves on. Client B will be processed when its data arrives. No thread sits idle waiting. No thread exists to sit idle.

This is the same model that powers nginx. It is the same model that Node.js, launched the same year, went on to make famous. It is, if one traces the lineage far enough, the model that Unix file descriptor multiplexing was designed for in the first place. The select() system call has existed since 4.2BSD in 1983. The idea that a single process can handle thousands of concurrent connections is not new. It is, in fact, more than forty years old. Redis merely applied it to a domain where the prevailing wisdom insisted it could not work.

The Persistence

“But it’s in-memory,” says the sceptic. “What happens when the power goes out?” This is a reasonable question, and Redis provides two answers, both of which the sceptic tends to find unsatisfying because they are not dramatic enough to justify the concern.

The first answer is RDB snapshots: periodic point-in-time dumps of the entire dataset to disc. Redis forks the process, the child serialises the data, the parent continues serving requests. Copy-on-write semantics mean the fork is nearly instant regardless of dataset size. The snapshot lands on disc. The parent never noticed.

The second answer is the Append-Only File (AOF): a write-ahead log that records every mutation. On restart, Redis replays the log and reconstructs the dataset. The AOF can be fsync’d every second (the default, and sensible for most workloads) or after every write (for those who require the belt, the braces, and the structural engineer to verify both). In practice, most deployments use both mechanisms: RDB for fast restarts, AOF for minimal data loss. The combination provides durability guarantees that satisfy everyone except those who were not really asking about durability but about finding a reason not to use Redis.

The Scaling

A single Redis instance handles 100,000 operations per second. For most applications, this is more than sufficient. For those where it is not, Redis scales the way Unix scales: by running more processes. Redis Cluster shards data across multiple nodes using hash slots (16,384 of them, for those who appreciate specificity). Each node is an independent single-threaded instance. Each handles its own subset of the keyspace. The cluster coordinates via a gossip protocol. There is no central coordinator. There is no single point of failure. There is no distributed lock manager, because each shard owns its data exclusively.

This is horizontal scaling without the usual horizontal complexity. No two-phase commit. No distributed transaction coordinator. No consensus protocol for every write. Each shard is autonomous, single-threaded, and fast. The cluster is a collection of independent fast things, not a single complicated slow thing pretending to be fast. The distinction is architectural, and it is the reason Redis Cluster works in practice rather than merely in conference presentations.

The Licence

For fifteen years, Redis was BSD-licensed. In March 2024, Redis Ltd changed the licence to a dual RSALv2/SSPLv1 model. The change was, depending on one’s perspective, either a necessary defence against cloud providers reselling open-source work without contributing back, or a corporate appropriation of a community project. Both perspectives contain truth. Neither contains all of it.

The community’s response was Valkey, a fork maintained by the Linux Foundation with contributions from AWS, Google, Oracle, and others. The architecture remains identical. The codebase diverged at Redis 7.2. The lesson is one the industry has learned before: a licence can change, but an architecture, once proven, is forked rather than abandoned. The single-threaded event loop will outlast the corporate entity that changed the licence, in the same way that Unix outlasted AT&T.

The Reduction

What makes Redis technically beautiful is not its speed, though it is fast. It is not its data structures, though they are well-chosen. It is not its ubiquity, though it is everywhere. What makes Redis beautiful is the clarity of a single architectural decision and the discipline to follow it through to every consequence.

One thread. Therefore no locks. Therefore no contention. Therefore predictable latency. Therefore simple debugging. Therefore auditable code. Therefore correct code. Each consequence follows from the one before it. The entire system is a syllogism in C, ninety thousand lines long, and every line follows from the premise that concurrency is not a threading problem but a design problem.

The industry spent decades throwing threads at latency and wondering why the latency got worse. Sanfilippo spent 2009 removing the threads and watching the latency disappear. The difference between the two approaches is not technical talent. It is the willingness to question the assumption that everyone else had stopped questioning.

One thread. One event loop. One hundred thousand operations per second. No locks required. The fastest path through a problem is sometimes around it.