Short description: Analyze hash tables and random hashing processes; pigeonhole, birthday paradox.
Relevant notes: Topic G: Hash Tables and P vs NP
Analyze the complexity of a sequence of hash table insertions or retrievals under specific hash functions or under the Uniform Hashing Assumption. Analyze random hashing phenomena such as the birthday paradox.
Suppose we are hashing 64-bit integers to $\{0,1,\dots,1023\}$. We will use a naive hash function, $h(x) = x \bmod 1024$, where $\bmod$ is the "modulo" operator.
Part a. Give a set of $1024$ integers that are "perfectly hashed": each one goes to a different location. Briefly explain.
Part b. Give a set of $1024$ integers that are hashed in the worst way: they all go to the same location. Briefly explain.
Part c. Let us replace $1024$ by $n$. After building the hash table as in Part a, what is the asymptotic running time of calling Get() on all $n$ integers? And in Part b? Briefly explain.
Part a. For example, we can take $\{0,1,\dots,1023\}$. Each number is hashed to itself, so all $1024$ locations are distinct.
Part b. For example, $\{0, 1024, 2048, \dots\}$, that is, multiples of $1024$. All of these are hashed to slot 0.
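The two example sets can be checked directly. A minimal sketch (not part of the official solution), assuming the table size $1024$ and the hash $h(x) = x \bmod 1024$ from the problem statement:

```python
# Verify the example sets from Parts a and b under h(x) = x % 1024.
M = 1024

perfect = list(range(M))              # 0, 1, ..., 1023
slots = {x % M for x in perfect}
assert len(slots) == M                # all 1024 slots are distinct

worst = [M * i for i in range(M)]     # 0, 1024, 2048, ...
assert {x % M for x in worst} == {0}  # every key lands in slot 0
```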
Part c. For Part a, it is $O(n)$: for each of the $n$ Get() calls, we go to the hash location and find exactly one element there, so each call takes $O(1)$ time. For Part b, it is $O(n^2)$: every Get() call must traverse the single chain until it finds its key. Across the $n$ calls, the sought items sit at depths $1,2,\dots,n$ in the chain, so the total work is $O(1 + 2 + \cdots + n) = O(n^2)$.
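The counting argument in Part c can be observed concretely. The following sketch (the class and its names are illustrative, not from the course) builds a chained hash table that counts element comparisons during Get(), so the $n$ vs. $n(n+1)/2$ totals appear exactly:

```python
# A chained hash table that counts element comparisons in get(),
# to exhibit the O(n) vs O(n^2) behavior analyzed in Part c.
class ChainedTable:
    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]
        self.comparisons = 0

    def put(self, x):
        self.buckets[x % self.m].append(x)

    def get(self, x):
        for y in self.buckets[x % self.m]:  # walk the chain
            self.comparisons += 1
            if y == x:
                return True
        return False

n = 1024

# Part a: perfectly hashed keys -> one comparison per get(), n total.
good = ChainedTable(n)
for x in range(n):
    good.put(x)
for x in range(n):
    good.get(x)
assert good.comparisons == n

# Part b: all keys share one chain -> 1 + 2 + ... + n comparisons total.
bad = ChainedTable(n)
for x in range(0, n * n, n):
    bad.put(x)
for x in range(0, n * n, n):
    bad.get(x)
assert bad.comparisons == n * (n + 1) // 2
```

Note that the worst-case total is exactly the arithmetic sum $n(n+1)/2$, matching the $O(1 + 2 + \cdots + n)$ bound in the solution.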
To pass, a solution must follow the course requirements for writing.
To pass, a solution must pass all three parts. For each part, the answer must be correct and clear. In Part c, the answers are $O(n)$ and $O(n^2)$ respectively. The explanations must be brief and to the point.
The numerical rubric for this problem is: