Short description: Analyze hash tables and random hashing processes; pigeonhole, birthday paradox.
Relevant notes: Topic G: Hash Tables and P vs NP
Analyze the complexity of a sequence of hash table insertions or retrievals under specific hash functions or under the Uniform Hashing Assumption. Analyze random hashing phenomena such as the birthday paradox.
Suppose we are hashing 64-bit integers to $\{0,1,\dots,1023\}$. We will use a naive hash function, $h(x) = x \bmod 1024$, where $\bmod$ is the "modulo" operator.
Part a. Give a set of $1024$ integers that are "perfectly hashed": each one goes to a different location. Briefly explain.
Part b. Give a set of $1024$ integers that are hashed in the worst way: they all go to the same location. Briefly explain.
Part c. Let us replace $1024$ by $n$. After building the hash table as in Part a, what is the asymptotic running time of calling Get() on all $n$ integers? And in Part b? Briefly explain.
Part a. For example, we can take $\{0,1,\dots,1023\}$. Each number is hashed to itself, so all $1024$ locations are distinct.
Part b. For example, $\{0, 1024, 2048, \dots\}$, that is, multiples of $1024$. All of these are hashed to slot 0.
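The two example sets can be checked directly. A minimal sketch (not part of the official solution), assuming the table size $1024$ and the hash $h(x) = x \bmod 1024$ from the problem statement:

```python
# Verify the example sets from Parts a and b under h(x) = x % 1024.
M = 1024

perfect = list(range(M))              # 0, 1, ..., 1023
slots = {x % M for x in perfect}
assert len(slots) == M                # all 1024 slots are distinct

worst = [M * i for i in range(M)]     # 0, 1024, 2048, ...
assert {x % M for x in worst} == {0}  # every key lands in slot 0
```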
Part c. For Part a, it is $O(n)$: for each of the $n$ Get() calls, we go to the hash location and find exactly one element there, so each call takes $O(1)$ time. For Part b, it is $O(n^2)$: every Get() call must traverse the single chain until it finds its key. Across the $n$ calls, the sought items sit at depths $1,2,\dots,n$ in the chain, so the total work is $O(1 + 2 + \cdots + n) = O(n^2)$.
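The counting argument in Part c can be observed concretely. The following sketch (the class and its names are illustrative, not from the course) builds a chained hash table that counts element comparisons during Get(), so the $n$ vs. $n(n+1)/2$ totals appear exactly:

```python
# A chained hash table that counts element comparisons in get(),
# to exhibit the O(n) vs O(n^2) behavior analyzed in Part c.
class ChainedTable:
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]
        self.comparisons = 0

    def put(self, x):
        self.buckets[x % self.m].append(x)

    def get(self, x):
        for y in self.buckets[x % self.m]:  # walk the chain
            self.comparisons += 1
            if y == x:
                return True
        return False

n = 1024

# Part a: perfectly hashed keys -> one comparison per get(), n total.
good = ChainedTable(n)
for x in range(n):
    good.put(x)
for x in range(n):
    good.get(x)
assert good.comparisons == n

# Part b: all keys share one chain -> 1 + 2 + ... + n comparisons total.
bad = ChainedTable(n)
for x in range(0, n * n, n):
    bad.put(x)
for x in range(0, n * n, n):
    bad.get(x)
assert bad.comparisons == n * (n + 1) // 2
```

Note that the worst-case total is exactly the arithmetic sum $n(n+1)/2$, matching the $O(1 + 2 + \cdots + n)$ bound in the solution.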
To pass, a solution must follow the course requirements for writing.
To pass, a solution must pass all three parts. For each part, the answer must be correct and clear. In Part c, the answers are $O(n)$ and $O(n^2)$ respectively. The explanations must be brief and to the point.
The numerical rubric for this problem is: