CS 306 Quiz 2

CS 306, Computer Algorithms II

Quiz 2, 30 September 2008

Can keys be deleted from a Bloom filter? Explain your answer.
Nope. Bloom filters don’t store keys, making it impossible to tell which keys hash to an index. Inverting a bit for one key inverts it for all keys hashing to the same index, an unhappy property for a hash table to have.
Here’s a hash function for an open-addressed hash table:
h(k) = h₁(k + i) mod B
where h₁ is an auxilliary hash function, i is the number of steps in the probe sequence (starting from i = 0), and B is the size of the scatter table.

Explain the advantages and disadvantages of h in collision resolution when compared to linear probing.

Although it depends on the quality of h₁, the index sequence generated by h is much more unpredictable than the index sequence generated by linear probling, which is essentially h₁(k), (h₁(k) + 1) mod B, …, (h₁(k) + B) mod B. This should result in smaller cluster sizes.
The disadvantages are a particular key value may not uncover empty scatter-table buckets, resulting in false-negative insertion failures, and the index sequence may visit the same scatter-table bucket several times.
A colleague tells you a great idea: replace the linked lists in a chained scatter table with binary search trees. Do you think you colleague’s idea is great? Explain.

Not really. The main advantage binary search trees (bst) have over linked lists is better search time on average: order log n vs. order n; the worst-case search time is O(n) for both. Linked lists have faster insertion times: exactly constant vs. order log n on average and O(n) worst case.

N would have to be large for the bst search advantage to show up; that is, either the hash function is bad or the load factor is high. A bad hash function can only be fixed by using a better hash function. If the load factor is high, the bst overhead — at least one extra pointer per node — is better spent in increasing the scatter table size to reduce the load factor, in which case the bst search advantage is reduced and the linked list retains its insertion advantage, as well as the advantage of overall simplicity.

On the other hand, if your colleaue’s hash table uses big buckets, as might be the case for dynamic hash tables, then bst might have a significant advantage over linked lists.
True or false: given a chained hash table, bad hash functions lead to high load factors. Explain your answer.
False: the load factor depends on the total number of keys in the scatter table, not their distribution. A bad hash function may mal-distribute keys in the scatter table, but it won’t change the number of keys in the table.