Consistent Hashing

The math trick that makes growing a sharded system possible.

The problem with simple hashing

In the basic sharding setup we used shard = hash(user_id) % N. Pick a shard by taking the hash mod the number of shards.

Now you grow from 4 shards to 5. The formula becomes shard = hash(user_id) % 5.

The hash did not change. But the modulo did. Almost every key now belongs to a different shard. You have to move about 80 percent of your data. For a database with terabytes, that is a disaster. It takes hours or days of downtime and bandwidth.

There has to be a better way.

The ring idea

Consistent hashing works on a ring. Picture a circle that holds all possible hash values, from 0 to 2 to the 32.

Place each shard somewhere on the ring. You can hash the shard's name to pick the spot. Place each key on the ring by hashing the key. Each key belongs to the nearest shard going clockwise.

Adding a new shard? Put it on the ring. It "steals" a slice of keys from one neighbor. Only the keys between its position and the shard counterclockwise from it.

Only that slice of keys has to move. The other 95 percent of keys stay where they were. Adding a shard moves 1 out of N keys instead of (N - 1) out of N.

Removing a shard? Its keys flow to the next clockwise shard. Same small amount of movement.

Virtual nodes

Here is a refinement. A single shard placement can land in an unlucky spot on the ring. It can take a much bigger slice of keys than the others. Some shards end up with 30 percent of traffic. Others get 5 percent.

The fix is virtual nodes. Place each real shard at many positions on the ring. About 100 virtual positions per real shard is common. Keys are now spread across many small slices. Load is even across shards.

This is how real systems do consistent hashing. Each machine owns hundreds of virtual positions. Load stays even. Adding or removing nodes still moves only a small slice of data.

Where you see it in real systems

Consistent hashing is not just for databases.

Memcached and Redis Cluster use it to spread keys across cache nodes. Adding a cache node does not blow away the whole cache.

DynamoDB and Cassandra partition data across nodes using consistent hashing.

Load balancers with session affinity use it so the same user always lands on the same backend, even when backends come and go.

CDN edge selection uses it to decide which edge serves a given URL.

Anywhere you need to map items to buckets and care about not moving most items when the bucket count changes, consistent hashing is the answer.

It is an elegant idea from MIT in 1997. It quietly powers a lot of the internet.

Now build it yourself →