![]() |
Universal hashing is a technique used in computer science and information theory for designing hash functions. It is a family of hash functions that can be efficiently computed by using a randomly selected hash function from a set of hash functions. The goal of universal hashing is to minimize the chance of collisions between distinct keys, which can lead to degraded performance in hash table operations. In the traditional approach to hashing, a fixed hash function is used to map keys to an array index. However, if the distribution of keys is not uniform, collisions may occur frequently, causing the hash table to degrade into a linked list, which can severely impact performance. Universal hashing attempts to solve this problem by choosing a hash function at random from a family of hash functions. The family of hash functions is designed to minimize the probability of collisions, regardless of the distribution of keys. By randomly selecting a hash function from the family for each new key, the chance of collisions is further reduced. Universal hashing has several advantages, including:It provides a high degree of randomness in the selection of hash functions, which reduces the likelihood of collisions. However, there are also some disadvantages to using universal hashing, including:It can be computationally expensive to generate a large number of hash functions. Hashing is a great practical tool, with an interesting and subtle theory too. In addition to its use as a dictionary data structure, hashing also comes up in many different areas, including cryptography and complexity theory. Universal Hashing refers to selecting a hash function at random from a family of hash functions with a certain mathematical property. This ensures a minimum number of collisions.
A set H of hash functions is called a universal hash function family if the procedure “choose h ∈ H at random” is universal. (Here the key is identifying the set of functions with the uniform distribution over the set.) Theorem: If H is a set of the universal hash function family, then for any set S ⊆ U of size N, such that x ∈ U and y ∈ S, the expected number of collisions between x and y is at most N/M. Proof: Each y ∈ S (y ≠ x) has at most a 1/M chance of colliding with x by the definition of “universal”. So,
Corollary: If H is a set of the universal hash function family, then for any sequence of L insert, lookup, or delete operations in which there are at most M elements in the system at any time, the expected total cost of the L operations for a random h ∈ H is only O(L) (viewing the time to compute h as constant). For any given operation in the sequence, its expected cost is constant by the above theorem. Therefore, the expected total cost of the L operations is O(L) by the linearity of expectation. Constructing a universal hash family using the matrix method: Let’s say keys are u-bits long and the table size M is the power of 2, so an index is b-bits long with M = 2b. What we will do is pick h to be a random b-by-u binary matrix, and define h(x) = hx, where hx is calculated by adding some of the columns of h (doing vector addition over mod 2) where the 1 bits in x indicate which columns to add. (e.g., the 1st and 3rd columns of h are added in the below example). These matrices are short and fat. For instance: Now, take an arbitrary pair of keys (x, y) such that x ≠ y. They must differ someplace, let’s assume they differ in the ith coordinate, and for concreteness say xi = 0 and yi = 1. Imagine we first choose all of h but the ith column. Over the remaining choices of the ith column, h(x) is fixed. However, each of the 2b different settings of the ith column gives a different value of h(y) (in particular, every time we flip a bit in that column, we flip the corresponding bit in h(y)). So there is exactly a 1/2b chance that h(x) = h(y). |
Reffered: https://www.geeksforgeeks.org
Data Structures |
Type: | Geek |
Category: | Coding |
Sub Category: | Tutorial |
Uploaded by: | Admin |
Views: | 12 |