
The Count-Min Sketch (CMS) is a probabilistic data structure used for frequency estimation in streaming and big data applications. It allows us to quickly approximate the frequency of items in a data stream while using sublinear memory.
CMS consists of:
[d x w] d: Number of hash functions (rows).w: Number of columns per row (hash table size).To increment the count of an item X:
d hash values (h1(X), h2(X), ..., hd(X)) to find column indices.To estimate the frequency of X:
d hash values (h1(X), h2(X), ..., hd(X)).Let’s say we process a data stream with words:
"apple", "banana", "apple", "cherry", "banana", "apple"
We construct a 3×10 Count-Min Sketch:
[ 0 1 2 3 4 5 6 7 8 9 ]
H1 -> [ 0 0 1 2 1 0 0 1 1 0 ]
H2 -> [ 1 0 0 1 0 2 1 0 0 0 ]
H3 -> [ 0 1 0 0 1 1 0 1 0 1 ]
"apple" → Take min(2,2,1) = 1"banana" → Take min(1,1,0) = 0import hashlib
import random
class CountMinSketch:
def __init__(self, width=10, depth=3):
self.width = width
self.depth = depth
self.table = [[0] * width for _ in range(depth)]
self.hash_seeds = [random.randint(0, 10000) for _ in range(depth)]
def _hash(self, item, seed):
"""Hash function using hashlib with a seed."""
return int(hashlib.md5((str(item) + str(seed)).encode()).hexdigest(), 16) % self.width
def update(self, item, count=1):
"""Increment the count of an item."""
for i in range(self.depth):
index = self._hash(item, self.hash_seeds[i])
self.table[i][index] += count
def estimate(self, item):
"""Estimate the frequency of an item."""
return min(self.table[i][self._hash(item, self.hash_seeds[i])] for i in range(self.depth))
# Example Usage
cms = CountMinSketch()
cms.update("apple", 3)
cms.update("banana", 2)
cms.update("apple", 1)
print("Estimated count of apple:", cms.estimate("apple")) # Approx. 4
print("Estimated count of banana:", cms.estimate("banana")) # Approx. 2
print("Estimated count of cherry:", cms.estimate("cherry")) # Approx. 0
import java.security.MessageDigest;
import java.util.Random;
class CountMinSketch {
private int width, depth;
private int[][] table;
private int[] hashSeeds;
private Random rand = new Random();
public CountMinSketch(int width, int depth) {
this.width = width;
this.depth = depth;
this.table = new int[depth][width];
this.hashSeeds = new int[depth];
for (int i = 0; i < depth; i++) {
hashSeeds[i] = rand.nextInt(10000);
}
}
private int hash(String item, int seed) {
try {
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] digest = md.digest((item + seed).getBytes());
return Math.abs(digest[0]) % width;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
public void update(String item, int count) {
for (int i = 0; i < depth; i++) {
int index = hash(item, hashSeeds[i]);
table[i][index] += count;
}
}
public int estimate(String item) {
int min = Integer.MAX_VALUE;
for (int i = 0; i < depth; i++) {
int index = hash(item, hashSeeds[i]);
min = Math.min(min, table[i][index]);
}
return min;
}
public static void main(String[] args) {
CountMinSketch cms = new CountMinSketch(10, 3);
cms.update("apple", 3);
cms.update("banana", 2);
cms.update("apple", 1);
System.out.println("Estimated count of apple: " + cms.estimate("apple")); // Approx. 4
System.out.println("Estimated count of banana: " + cms.estimate("banana")); // Approx. 2
System.out.println("Estimated count of cherry: " + cms.estimate("cherry")); // Approx. 0
}
}
| Feature | Count-Min Sketch | Hash Table |
|---|---|---|
| Space Efficiency | ✅ Very Efficient (O(log n)) | ❌ High (O(n)) |
| Update Time | ✅ O(1) | ✅ O(1) |
| Query Time | ✅ O(1) | ✅ O(1) |
| Accuracy | ❌ Approximate | ✅ Exact |
| Scalability | ✅ Works with massive data | ❌ Memory grows with data |
Count-Min Sketch is a powerful tool for real-world streaming analytics and large-scale data processing! 🚀
When analyzing a stock, one of the first financial indicators you’ll encounter is EPS, or Earnings Per Share. It’s one… Read More
When you look at a stock’s profile on a financial website, one of the first things you’ll see is its… Read More
In the world of open-source software, simplicity and flexibility are often just as important as legal protection. That’s why the… Read More
If you want your software to be open source, but still compatible with commercial use—and not as restrictive as the… Read More
When it comes to open-source software, developers and businesses alike need licenses that balance freedom, legal clarity, and long-term security.… Read More
If you’re working on open-source projects or choosing third-party libraries for your software, understanding software licenses is essential. Among the… Read More