Vivian Voss

Redis: The One-Thread Design

architecture performance

By Design ■ Episode 4

In 2009, Salvatore Sanfilippo had a problem. His startup, LLOOGG, offered real-time web analytics at a moment when Google Analytics did not provide it. This was not a minor differentiator. It was the entire product. Website owners could watch their visitors arrive, right now, which required performing hundreds of list push and pop operations per second, each demanding an immediate response.

MySQL could not keep up. The architecture was not wrong, precisely. It was simply built for a different problem. So Sanfilippo, working from his home in Sicily, prototyped a memory-based database in Tcl, roughly three hundred lines, and called it LMDB: LLOOGG Memory Database. The prototype worked. He rewrote it in C. He named the result Redis: Remote Dictionary Server.

He made it single-threaded. Deliberately.

The Complaint

"Redis runs on one thread. Modern servers have 64, 96, even 128 cores. Redis uses exactly one. That is a serious architectural limitation."

One does hear this. Usually from someone who has just provisioned a 64-core server, pointed a single Redis instance at it, and noticed that the other 63 cores are contributing nothing whatsoever to the enterprise. The complaint is coherent. The conclusion it implies, that Redis should therefore be multi-threaded, is not.

The Design

The choice was explicit. Sanfilippo built Redis around an event loop (kqueue on BSD and macOS, epoll on Linux, with select as a portable fallback) that processes commands sequentially. One command at a time. When a command arrives, it executes. When it completes, the next begins. No locking. No mutexes. No context switches between executions. No possibility of two operations corrupting shared state simultaneously.
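The pattern is compact enough to sketch. This is not Redis's actual implementation (Redis uses its own C event library, ae, atop epoll or kqueue); it is a minimal Python illustration of the same shape, using a socketpair as a stand-in for a client connection and a plain dict as the one shared data structure.

```python
# Minimal sketch of single-threaded, event-loop command dispatch.
# Illustrative only: real Redis is C, uses its own ae event loop,
# and speaks the RESP protocol rather than plain text lines.
import selectors
import socket

store = {}  # the one data structure; no other thread ever touches it

def execute(line: str) -> str:
    # Each command runs to completion before the next begins: no locks needed.
    parts = line.split()
    if parts[0] == "SET":
        store[parts[1]] = parts[2]
        return "+OK"
    if parts[0] == "GET":
        return store.get(parts[1], "(nil)")
    return "-ERR unknown command"

sel = selectors.DefaultSelector()
server_side, client_side = socket.socketpair()  # stand-in for a TCP connection
server_side.setblocking(False)
sel.register(server_side, selectors.EVENT_READ)

client_side.sendall(b"SET greeting hello\n")
client_side.sendall(b"GET greeting\n")

replies = []
while len(replies) < 2:
    # The loop multiplexes I/O; commands still execute strictly one at a time.
    for key, _ in sel.select(timeout=1):
        for line in key.fileobj.recv(4096).decode().splitlines():
            replies.append(execute(line))

print(replies)  # ['+OK', 'hello']
```

With thousands of registered sockets instead of one, the structure is unchanged: the selector reports which connections are readable, and the single thread drains them in turn.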

This is not a limitation of ambition. It is a consequence of understanding where the bottleneck actually lives.

In a memory-first system, the bottleneck is not computation. It is contention. Multi-threading does not eliminate work. It adds coordination overhead on top of it. Consider what the threading model actually costs:

The Coordination Tax, in approximate CPU cycles per operation:

1. Redis GET / SET: ~5 cycles
2. Cache line transfer: 100–300 cycles
3. Uncontended mutex: 100–1,000 cycles
4. Context switch: 2,000–8,000 cycles
5. Contested mutex: 10,000+ cycles

Redis eliminates rows 2 through 5 entirely. There is no mutex to acquire. There is no thread to wake. There is no cache line to transfer.
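The tax is easy to observe directly. The sketch below times the same increment with and without an uncontended lock; the absolute numbers depend entirely on the machine and the interpreter, so only the relative gap is meaningful.

```python
# Rough illustration of per-operation locking overhead, even with no
# contention at all. Timings are machine-dependent; the point is the gap.
import threading
import time

N = 200_000
counter = 0

t0 = time.perf_counter()
for _ in range(N):
    counter += 1
plain = time.perf_counter() - t0

counter = 0
lock = threading.Lock()
t0 = time.perf_counter()
for _ in range(N):
    with lock:          # acquire and release on every single operation
        counter += 1
locked = time.perf_counter() - t0

print(f"plain: {plain:.4f}s  locked: {locked:.4f}s")
```

In a contested, multi-threaded system the gap widens further, because the rows above stack: a cache line transfer to fetch the lock, a possible context switch while waiting for it, then the operation itself.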

Redis operates entirely in memory. A GET or SET completes in microseconds. At that speed, the overhead of coordinating multiple threads is not a rounding error. It is a significant fraction of the operation itself. By removing threads from command execution entirely, Redis removes the cost of coordinating them. The event loop provides I/O multiplexing without the overhead of one thread per connection. Sequential execution eliminates lock contention by making contention structurally impossible. The simplicity is the performance.

[Figure: One Thread, Many Connections. Clients 1 through N feed a single event loop (kqueue/epoll, one thread); commands such as SET key val, GET session, and INCR counter execute one at a time.]

No mutexes. No shared state. No contention. Thousands of concurrent connections. One execution thread. The event loop handles multiplexed I/O. Commands execute one at a time. Data structures are never shared. Compare with a thread pool: N threads, a shared hash table, one contested mutex.

The Trade-Off

Let us be honest about the costs, because Sanfilippo was.

A slow command blocks everything. KEYS on a production database containing one million entries stops the entire server for the duration of the scan. No other command executes. No other client receives a response. The server is, to a first approximation, unavailable. The documentation recommends SCAN for incremental iteration. Production incidents recommend reading the documentation, usually around 3 AM, usually after the monitoring alert has already fired.
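The difference between the two commands is the shape of the work, and it can be sketched in a few lines. This is a deliberately simplified model with an index-based cursor; real Redis uses a reverse-binary cursor over its hash table and may return duplicates during rehashing.

```python
# Sketch of SCAN-style incremental iteration: bounded work per call,
# so other commands can run between calls. Simplified index cursor;
# real Redis walks hash-table buckets with a reverse-binary cursor.
store = {f"user:{i}": i for i in range(10_000)}
keys = list(store)  # snapshot for the sketch; real Redis iterates in place

def scan(cursor: int, count: int = 1000):
    """Return (next_cursor, batch); a next_cursor of 0 means iteration is done."""
    batch = keys[cursor:cursor + count]
    nxt = cursor + count
    return (0 if nxt >= len(keys) else nxt), batch

cursor, seen = None, 0
while cursor != 0:
    cursor, batch = scan(cursor or 0)
    seen += len(batch)  # between calls, the server is free to serve everyone else

print(seen)  # 10000
```

KEYS is the degenerate case: the whole loop above executed inside one command, with every other client waiting for it to finish.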

Scaling across cores requires running multiple Redis instances and distributing keys across them via Redis Cluster or application-level sharding. The model is horizontal, not vertical. One instance per core, not one instance for all cores. This is operational complexity that a multi-threaded design would not require. It is a real cost. The design documentation has never pretended otherwise.
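The application-level version of that sharding is a few lines of client code. Redis Cluster's actual scheme maps CRC16(key) mod 16384 to hash slots, which are then assigned to instances; the sketch below substitutes CRC32 as a stable hash and maps keys to instances directly, which is an assumption for brevity, not the Cluster protocol.

```python
# Sketch of client-side key distribution across independent single-threaded
# instances. Redis Cluster really uses CRC16(key) mod 16384 hash slots;
# crc32 stands in here because it is in the standard library and stable
# across runs, unlike Python's builtin hash().
import zlib

NUM_INSTANCES = 4  # e.g. one instance pinned to each of four cores

def shard_for(key: str) -> int:
    return zlib.crc32(key.encode()) % NUM_INSTANCES

for key in ("session:42", "user:7", "leaderboard"):
    print(key, "->", "instance", shard_for(key))
```

Every key deterministically lands on one instance, so each instance still sees strictly sequential access to its own data. The contention-free guarantee survives; what it costs is the routing layer itself.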

The Proof

The numbers from the official Redis benchmark documentation: without pipelining, 180,000 SET operations per second on commodity hardware, with p50 latency below 0.15 milliseconds. With pipelining of 16 commands, over 1.5 million SET operations per second, p50 latency below 0.5 milliseconds. Consistently. Under load. Without a garbage collector pause, because there is no garbage collector.
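The pipelining multiplier falls out of simple arithmetic: the round trip, not the server, dominates the unpipelined case. The numbers below (0.1 ms round trip, 5 microseconds of server work per command) are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope model of why pipelining multiplies throughput.
# Assumed figures, not measurements: 0.1 ms RTT, 5 us of server work
# per command.
RTT = 0.000_1     # network round trip, seconds
WORK = 0.000_005  # server-side time per command, seconds

def ops_per_sec(pipeline_depth: int) -> float:
    # One round trip carries `pipeline_depth` commands.
    batch_time = RTT + pipeline_depth * WORK
    return pipeline_depth / batch_time

print(f"{ops_per_sec(1):,.0f} ops/s unpipelined")
print(f"{ops_per_sec(16):,.0f} ops/s with pipelining of 16")
```

Under these assumptions a pipeline of 16 amortises one round trip across 16 commands, an order-of-magnitude gain with zero change to the execution model: the server still runs one command at a time.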

Twitter and Instagram both adopted Redis in 2010. GitHub uses it for job queues. Stack Overflow runs it as their primary cache, serving billions of requests per month from commodity hardware.

In 2020, Redis 6.0 added I/O threading: network reads and writes now happen in parallel, but command execution remains single-threaded. The 37 to 112 per cent throughput improvement this delivered suggests that network I/O was the actual bottleneck in many workloads, not command execution. The single-threaded model, it turns out, was not the limiting factor.

In March 2024, Redis Ltd. changed the project's licence from BSD to SSPL, effectively removing it from open source. The community forked it as Valkey within weeks, backed by the Linux Foundation, with Amazon, Google, Oracle, and Ericsson among the founding supporters. The fork started from Redis 7.2.4. The single-threaded architecture came with it, unchanged, because nobody wanted to change it. Redis 8 subsequently returned to open source under AGPLv3 and reduced per-command latency by 5.4 to 87.4 per cent across 90 commands, through algorithmic improvements, not threading. Seventeen years after the first commit, the design is still being optimised within its own constraints.

The Principle

Contention is not a problem you solve with more threads. It is a problem you design out.

Sanfilippo identified the actual bottleneck in a memory-first system, coordination overhead rather than computation, and eliminated it by making coordination structurally impossible. The result processes millions of commands per second on a single core and delivers sub-millisecond latency under load. The engineering instinct is to add parallelism when performance matters. Redis demonstrates that removing the overhead of coordination can be more effective than adding it.

Last episode, Rust said no to null, to exceptions, to garbage collection, and to inheritance. Every refusal eliminated a category of bugs. Redis applies the same logic at the architecture level: say no to threads in the execution path, and eliminate the entire category of contention bugs. The constraint is not a concession. It is the guarantee.

Every application that has ever cached a session token, enforced a rate limit, maintained a leaderboard, or coordinated a distributed lock has relied on the fact that Redis will not corrupt its own data structures under concurrent access. Concurrent command execution is, by design, impossible. That is not incidental. It is the architecture.

The limitation is the architecture. The architecture is the feature.

In 2009, a Sicilian developer built a database in three hundred lines of Tcl because MySQL could not perform list operations fast enough. He rewrote it in C, made it single-threaded by design, and named it Redis. Seventeen years later, it runs at internet scale on one thread. The other 63 cores are still not needed.