# Data Structures – Bloom Filter

## Description

A Bloom filter is a data structure which is similar to a set. The Bloom filter allows for the question Is this item a member of the set? to be answered quickly. The filter will never return No if the item is in the set; but, it may return Yes if the item is not in the set. This is one downside to using a Bloom filter, there is a chance for false positive results when checking if an item is in the set. An upside to using a Bloom filter is that adding an item to the set and checking if the item is in the set is a constant time O(n) operation.

## Example

The following example uses a Bloom filter to create a friend list. The example implementation uses three hashing functions. The hashing functions ingest a string (a friend name) and calculate a single value for the string based on the number of spots in the Bloom filter.

Create filter as an array of 10 indices. A `0` indicates no item is at that index.

`[0,0,0,0,0,0,0,0,0,0]`

A user adds David to the friend list. The string (`'David'`) is put through several hashing functions which return `0`, `4`, and `8`, respectively. The values from the hashing functions are used to update the filter array at those indices.

The filter indices are updated using those hashed values. A `1` indicates an item has been added at that index.

`[1,0,0,0,1,0,0,0,1,0]`

A user adds Rosie to the friend list. The hashing functions return `3`, `4`, and `6` for `'Rosie'`. The filter indices are updated using the hashed values.

`[1,0,0,1,1,0,1,0,1,0]`

There is a check if Chuck is a member of the friend list. The string `'Chuck'` is put through the hashing functions returning `1`, `3`, and `6`. When the filter array is checked at those indices, it returns `0`, `1`, and `1`. Because one index has a `0` value, Chuck is definitely not a member of the list.

There is a check if Maja is a member of the friend list. The string `'Maja'` is put through the hashing functions, returning `0`, `6`, and `8`. When the filter array is checked at those indices, it returns `1`, `1`, and `1`. Because all three indices have a value of `1`, Maja may already be a member of the list. This is a false positive result.

## Considerations

Bloom filters allow for quick look up to determine if a value is possibly a member of the set or definitely not a member of the set. The more items added to the Bloom filter the rate of false positive results when checking if an item is a member of the set will increase. One way to decrease the rate of false positive results is to increase the size of the array. Although this is a compromise because the larger the array, the greater memory it will occupy. It is necessary to determine an acceptable rate of false positive results for a given array size.