Bloom Filters- An Intro

Bloom Filters are quite simply put space-efficient, probabilistic data structures which help answer the question of set membership. From a mathematical lens say that you need to verify whether an element x belongs to a set A , if the Bloom Filter as a data structure returns true for x, what this means is that either x is definitely in the set A or it may not be in the set A. An implication that follow from this is that Bloom Filters are quite efficient at preprocessing and filtering out large lists, for further detailed set operations such as joins and intersections.

Image Courtesy — https://www.slideshare.net/hamidfzm/bloom-filter-56450239 — Hamid Feizabadi

The next questions which arise are how can we create Bloom Filters ? What exactly is involved in Bloom Filter probabilistic data structures ?

For a start let us consider a set S such that:

Let B be a bit array of size m (m > 1), initialized with 0s. B’s elements are B[0], B[1], B[2], …, B[m–1]. The memory required for storing array B is only a fraction of that needed for storing the whole set S. By selecting the bigger bit vector (array B), it is possible to decrease the probability rate of false positives.

Let

  • {H1, H2, …, Hk} be a set of k hash functions. If Hi(xj) = a, then set B[a] = 1. SHA1, MD5, and Murmur as hash functions can be used as hash functions.

• To check if x ε S, check B at Hi(x). All k values must be 1.

  • It is possible to have a false positive; all k values are 1, but x is not in S.

Applications of Bloom Filters

Image Courtesy — https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutReads.html — www.datastax.com

Since Bloom Filters are a space efficient data structure they have a wide array of applications across various fields. For example, a Bloom Filter is used under the hood in the Apache Cassandra distributed database to facilitate read requests to the database. Some of the earliest applications of Bloom Filters include usage in the early development for spell checkers on a dictionary. Some of the more modern applications that they are used for include a variety of use cases varying from usage in network routers and applications to bioinformatics to blockchain implementations.

Co-founder of Helios P2P. Sri Lankan. Interested in Finance, Advanced Analytics, BI, Data Visualization, Computer Science, Statistics, and Design Thinking.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store