Computer Science

What is Consistent Hashing?

Consistent Hashing is a technique used in distributed systems to distribute data across a dynamic set of nodes (like servers) in such a way that minimizes reorganization when nodes are added or removed. It’s especially useful in scenarios like load balancing, distributed caching (e.g., Memcached), and sharding databases.


Why Use Consistent Hashing?

  1. Minimizes Rehashing:
    Traditional hashing algorithms reassign almost all keys when the number of nodes changes. Consistent hashing, however, only remaps a small portion of the keys.
  2. Scalability:
    It allows smooth scaling of systems by adding or removing nodes with minimal impact on data distribution.
  3. Fault Tolerance:
    If a node fails, only its data needs to be redistributed, not the entire dataset.

How Does Consistent Hashing Work?

  1. Hash Ring (Circle):
    Imagine all possible hash values arranged in a circle (0 to 2^32 - 1 for a 32-bit hash).
  2. Placing Nodes on the Ring:
    Each node (server) is assigned a position on the ring based on its hash value.
  3. Placing Keys on the Ring:
    Each data item (key) is also hashed and placed on the ring.
  4. Assigning Keys to Nodes:
    A key is assigned to the first node encountered moving clockwise on the ring from the key’s position. If the key’s hash is greater than the last node, it wraps around to the first node on the ring.

Example:

Imagine we have three servers: A, B, and C.

  1. Hashing Servers:
    • hash(A) = 20
    • hash(B) = 50
    • hash(C) = 80
  2. Adding Data:
    • For a key K1 with hash(K1) = 30, it will be stored on server B (the next clockwise node).
    • If hash(K2) = 70, it will go to server C.
    • If hash(K3) = 90, it wraps around to server A.
  3. Adding a New Server:
    • Add server D with hash(D) = 60.
    • Now, K2 (hash 70) will still go to C, but keys between 50 and 60 will go to D, minimizing data movement.

Virtual Nodes (VNodes):

To avoid uneven data distribution when node hash values are clustered, virtual nodes are used. Each physical node is assigned multiple virtual nodes with different hash values, ensuring better load balancing.


Python Code Example:

Here’s a basic implementation of Consistent Hashing in Python:

import hashlib
import bisect

class ConsistentHashing:
    def __init__(self, nodes=None, replicas=3):
        self.replicas = replicas
        self.ring = {}
        self.sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            key = self._hash(f"{node}:{i}")
            self.ring[key] = node
            bisect.insort(self.sorted_keys, key)

    def remove_node(self, node):
        for i in range(self.replicas):
            key = self._hash(f"{node}:{i}")
            del self.ring[key]
            self.sorted_keys.remove(key)

    def get_node(self, key):
        if not self.ring:
            return None
        hash_val = self._hash(key)
        index = bisect.bisect(self.sorted_keys, hash_val) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[index]]

# Example usage:
nodes = ["A", "B", "C"]
ch = ConsistentHashing(nodes)

print(ch.get_node("K1"))  # e.g., outputs 'B'
print(ch.get_node("K2"))  # e.g., outputs 'C'

# Add a new node
ch.add_node("D")
print(ch.get_node("K1"))  # Might still be 'B' or changed to 'D' based on hash

Java Code Example:

Here’s how you might implement Consistent Hashing in Java:

import java.security.MessageDigest;
import java.util.*;

public class ConsistentHashing {
    private final int numberOfReplicas;
    private final SortedMap<Integer, String> circle = new TreeMap<>();

    public ConsistentHashing(List<String> nodes, int numberOfReplicas) {
        this.numberOfReplicas = numberOfReplicas;
        for (String node : nodes) {
            addNode(node);
        }
    }

    private int hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes());
            return ByteBuffer.wrap(digest).getInt();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public void addNode(String node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hash(node + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hash(node + i));
        }
    }

    public String getNode(String key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hash(key);
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, String> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("A", "B", "C");
        ConsistentHashing ch = new ConsistentHashing(nodes, 3);

        System.out.println(ch.getNode("K1"));  // e.g., outputs 'B'
        System.out.println(ch.getNode("K2"));  // e.g., outputs 'C'

        ch.addNode("D");
        System.out.println(ch.getNode("K1"));  // Node might change depending on the hash
    }
}

When to Use Consistent Hashing:

  • Distributed Caches: Systems like Memcached use consistent hashing to distribute cache keys.
  • Sharding in Databases: Helps in partitioning data across multiple database nodes.
  • Load Balancers: Ensures requests are evenly distributed and can handle node failures gracefully.

Consistent Hashing is a fundamental concept for scalable, fault-tolerant distributed systems. By using the hash ring and virtual nodes, it ensures minimal disruption during scaling or failure scenarios.

Aquinas

Hello! I'm Aquinas, a lifelong learner who finds everything in the world fascinating. I can’t ignore my curiosity, and this blog is where I document my journey of learning, exploring, and understanding various topics. I don’t limit myself to a single field—I enjoy diving into science, philosophy, technology, the arts, and more. For me, learning isn’t just about gathering information; it’s about applying knowledge, analyzing it from different perspectives, and discovering new insights along the way. Through this blog, I hope to record my learning experiences, share ideas, and connect with others who have a similar passion for knowledge. Let’s embark on this journey of exploration together! 😊

Recent Posts

What Is EPS(Earnings Per Share)?

When analyzing a stock, one of the first financial indicators you’ll encounter is EPS, or Earnings Per Share. It’s one… Read More

8 months ago

What is Market Capitalization? Everything Investors Need to Know

When you look at a stock’s profile on a financial website, one of the first things you’ll see is its… Read More

8 months ago

The MIT License

In the world of open-source software, simplicity and flexibility are often just as important as legal protection. That’s why the… Read More

9 months ago

Mozilla Public License (MPL)

If you want your software to be open source, but still compatible with commercial use—and not as restrictive as the… Read More

9 months ago

The Apache License 2.0

When it comes to open-source software, developers and businesses alike need licenses that balance freedom, legal clarity, and long-term security.… Read More

9 months ago

BSD (Berkeley Software Distribution) License

If you’re working on open-source projects or choosing third-party libraries for your software, understanding software licenses is essential. Among the… Read More

9 months ago