Consitent hashing
Last updated
Last updated
Now we understand the definition of consistent hashing, let us find out how it works. Assume SHA-1 is used as the hash function f, and the output range of the hash function is: x0, x1, x2, x3, ..., xn. In cryptography, SHA-1’s hash space goes from 0 to 2^160 - 1. That means x0 corresponds to 0, xn corresponds to 2^160 – 1, and all the other hash values in the middle fall between 0 and 2^160 - 1. By collecting both ends, we get a hash ring as shown
Using the same hash function f, we map servers based on server IP or name onto the ring.
One thing worth mentioning is that the hash function used here is different from the one in “the rehashing problem,” and there is no modular operation. As shown in the picture below, 4 cache keys (key0, key1, key2, and key3) are hashed onto the hash ring
To determine which server a key is stored on, we go clockwise from the key position on the ring until a server is found. The picture below explains this process. Going clockwise, key0 is stored on server 0; key1 is stored on server 1; key2 is stored on server 2 and key3 is stored on server 3.
Using the logic described above, adding a new server will only require the redistribution of a fraction of keys. In the picture below, after a new server 4 is added, only key0 needs to be redistributed. k1, k2, and k3 remain on the same servers. Let us take a close look at the logic. Before server 4 is added, key0 is stored on server 0. Now, key0 will be stored on server 4 because server 4 is the first server it encounters by going clockwise from key0’s position on the ring. The other keys are not redistributed based on a consistent hashing algorithm.
When a server is removed, only a small fraction of keys require redistribution with consistent hashing. In the picture below, when server 1 is removed, only key1 must be remapped to server 2. The rest of the keys are unaffected.