![]() |
In this article implementation of the Counting Bloom Filter is going to be discussed. Following are the topics that are going to be covered.
Why do we need Probabilistic Data Structures?In this Big Data Age, every company was processing a large amount of data each day. e.g., IBM has stated that people create 2.5 quintillion bytes of data every day, this number is growing constantly, so companies need some technique to quickly understand the data to find value and act on it. However, the traditional technologies which include data structures and algorithms, become ineffective when dealing with Big Data. The best solution for this type of problem is using Probabilistic Data Structures. A deterministic data structure can also perform all the operations that a probabilistic data structure does but only with low data sets. If the data set is too big and couldn’t fit into the memory, then the deterministic data structure fails and is simply not feasible. PDS(Probabilistic Data Structures) always provide approximated answers but with reliable ways to estimate possible errors. Fortunately, the potential losses and errors are fully compensated for by extremely low memory requirements, constant query time, and scaling, the factors that become essential in Big Data applications. What are some areas we apply to?As mentioned previously the advantage of PDS appears in Big Data Problems like
What is the membership type of problem?A membership problem for a dataset is a task to decide whether some element belongs to the dataset or not. Unlike deterministic data structures which find the place where the exact match occurred. In many of the scenarios, we don’t need to know which element from the set has been matched, only that a match has been made and, therefore it is possible to store only signatures of the elements rather than the whole value. How is counting Bloom Filter different from traditional Bloom Filter?The main disadvantage of classical Bloom Filter is deletion operation is not possible in that i.e., only the element can be inserted and whether the element is present in the data or not can be checked. Even if anyone try to change the bits in the bit array corresponding to the k positions it may lead to false negatives. Fortunately, missing deletion support is not a problem for many real-world applications, Counting Bloom Filter and its ImplementationThe most popular extension of the classical Bloom filter that supports deletion is the Counting Bloom filter, proposed by Li Fan, Pei Cao, Jussara Almeida, and Andrei Z. Broder in 2000. Counting Bloom Filter introduces an array of m counters {Cj}mj=1 corresponding to each bit in the filter’s array. The Counting Bloom filter allows approximating the number of times each element has been seen in the filter by incrementing the corresponding counter every time the element is added. The associated CountingBloomFilter data structure contains a bit array and the array of counters of length m, all initialized to zeros. When a new element is inserted into CountingBloomFilter, first compute its corresponding bit-positions, then for each position, we increment the associated counter, These are some basic ideas to code in any preferred language.
Now, finally the important deletion part. The deletion is quite similar to the insertion but in reverse. To delete an element x , we compute all k hash values hi = {hi(x )}ki=1 and decrease the corresponding counters. Below is the implementation of the above Algorithm Python3
In the above code for hashing the elements FNV hash function(you may also use the murmur hash function) is used which is widely used in probabilistic data structures. n is the expected no of elements that you are going to enter into the filter. Counter_size is the number of bits for each counter in the bucket. bucket_size is the m number of buckets we are going to use in the filter and finally, the no_hashfn is k the number of hash functions that we are going to use. Now test the class with some examples save the file as counting_bloom_filter.py now create another file for testing. Python3
Output: Properties: The Counting Bloom filter inherits all the properties of the classical Bloom filter, including false-positive error estimation and recommendations for the optimal choice of m and k refer. |
Reffered: https://www.geeksforgeeks.org
Python |
Type: | Geek |
Category: | Coding |
Sub Category: | Tutorial |
Uploaded by: | Admin |
Views: | 9 |