Understanding Bloom Filter
It is a probabilistic space-efficient data structure with O(1) insertion and retrieval. It is used to find if a key exists (with false positives) or not (no false negatives). It can be used as a data layer to check if a key exists before querying the actual database (expensive read vs cheap read).
Pros-
Uses constant space (Space efficient).
O(1) insertion and retrieval
Cons-
- False positives. It can not accurately tell if a key exists for sure in the bloom filter. But it can tell if a key does not exist in there.
How it works?
It requires some hash functions and a binary hash table.
First, we set all the bits in hash table to 0
Let’s say we have two hash functions H1 and H2
Insertion:-
→ Insert “Hello”. H1(“Hello”) = 0, H2(“Hello”) = 3
→ Since bits at 0 and 3 positions are set to 0, we toggle it to 1.
→ Now the hash table looks like this
→ Insert “World”. H1(“World”) = 5, H2(“World”) = 3
→ Toggle bits. Now the hash table looks like this
Searching:-
→ Search “World”. H1(“World”) = 5, H2(“World”) = 3.
→ Check bits at 3rd and 5th index. Both are set to 1. So it would return 1. Maybe exists case
→ Search “Foo”. H1(“Foo”) = 0, H2(“Foo”) = 5
→ Check bits at 0th and 5th index. Though it will return 1. But “Foo” does not exist. False positive case.
Applications:-
It is mostly used in conjugation with primary DB to check if an element is present or not and then search the primary DB. It can be used at places where we are okay with false positives and a quick lookup can save the expensive DB read.
Used by Medium to show if an article is read by the user.
Malicious sites blocker by Google.
Password and username duplicate check.
Used to show target ads.
Default support with databases
Postgres — Postgres supports bloom index which is pretty useful for membership check cases. Link — https://www.postgresql.org/docs/current/bloom.html
Redis — Redis (4.0 +) has ReBloom library which has bloom filter application. Link — https://redis.com/blog/bloom-filter/
Some databases like HBase and Cassandra by default use bloom filter to filter out queries.
Subscribe to my newsletter
Read articles from Surabhi Suman directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Surabhi Suman
Surabhi Suman
Hey! I am a software engineer by profession. Passionate about Ruby, Database internals, optimization, distributed systems