CS 503, Advanced Programming I

Quiz 6, 18 April 2006


  1. Explain why it is you have to be careful when discussing load factors for Bloom filters.


    The load factor is the number of entries stored in the buckets divided by the number of buckets in the table. Bloom filters don't resolve collisions; when more than one element hashes to a bucket in a Bloom filter only the bucket involved gets set. This tends to underestimate the load factor, and while low load factors are normally good, in Bloom filters it indicates the potential for increased collisions, leading to more false positives.

    Most of the answers indicated a general notion of the relation between load factors and performance, but missed the Bloom filter's special collision behavior and its effect on the load factor.


  2. Suppose you were to implement a hash table in which adding and removing elements from the table occurs more frequently than searching for them. Explain how you would handle collisions under these conditions.


    The difference between adding, removing, and searching operations is a bit of distraction because they all require searching (adding may search for an open bucket). There are two choices: open addressing and chaining. Assume a good hash function; if the hash function is bad it doesn't matter what you do.

    Open addressing has the advantage that it doesn't require storage management, but it does require searching for an open bucket on collisions. In addition, open addressing needs some extra machinery to deal with clustering. Chaining flips the advantages and disadvantages; collisions are resolved simply (assuming the bucket linked lists are maintained simply by adding new elements at the head) but requires storage management for the bucket linked lists.

    A good answer to this question needed to cover both open addressing and chaining, which most answers didn't. All the answers that covered open addressing confused it with collision handling (linear or quadratic probing or double hashing), which is generally o.k., but not a good idea on a test.


  3. Explain why the birthday problem is relevant to hashing.


    See Nyhoff page 710.


  4. Describe a test case you would write to force the following code to fail.
    std::string word;
    char buf3[256];
    
    // whatever
    
    memset(buf3, '\0', 256);
    word.copy(buf3, 256);
    
    You may assume that you can place any string value you want in word and that any failure you induce will be detected.


    Two good rules for finding problems in code are:

      LI() Know your tools. LI() Look at the extreme cases.

    In this problem the tools are memset() and the string standard library, in particular the copy() member function. memset() sets a block of bytes to a particular value. string::copy() copies the character data in a string to an array of characters.

    When dealing with arrays, the usual extreme cases are empty and full. Because strings can contain an arbitrary number of characters, and so don't get full, the cases apply to buf3.

    Nothing bad goes wrong when word contains the empty string; copy() does nothing and memset() sets buf3[0] to the null byte, a proper empty array of characters in C. However, if word contains exactly 256 characters, it fills all of buf3. In addition, copy() does not append a null byte, so the character array in buf3 is not null terminated, which is the problem.

    Despite repeated hints that this question might be important, most of the answers were wrong.



This page last modified on 27 April 2006.