See the assignment turn-in page (last modified on 18 January 2006) for instructions on turning in your assignment.
const unsigned char_values[26] = { /* ??? */ } unsigned hash_function(const std::string & str) { assert(not str.empty()); const char first_char = str[0], last_char = str[str.size() - 1]; assert(isalpha(fist_char) and isalpha(last_char)); return str.size() + char_values[tolower(first_char)] + char_values[tolower(last_char)]; }
Two strings are said to collide if they are different strings but produce the same hash value; that is
s1 ≠ s2
but
hash_function(s1) = hash_function(s2)
Because letter-case is ignored, "Hello"
does not collide with
"hello"
: they are the same string.
char_values
?
Write a program that reads a set of words from std-in and writes to
std-out the values to be stored in char_values
; fifteen is the
largest value that can be assigned to a letter. The hash function
defined by the output values should not collide for any pair of the input
strings; that is, if
s1 ≠ s2
are strings given as input and hash_function
is using the character values
output, then
hash_function(s1) ≠ hash_function(s2)
In addition, your program should be efficient, which means it should take an amount of time that is within a factor of two (on the high side) of the amount of time taken by my program to find a set of values for the same input.
Given the definition of hash_function()
, it may be impossible to generate
non-colliding character values for some inputs. For example, the strings
"boat"
and "boot"
will always collide, even though they are
different, because the hash function views them as being identical. If your
program can't find a suitable set of character values, it should print a brief,
informative error message to std-err
and exit with no further output.
a
through z
of either case; letter case is ignored.
Words containing any non-letter characters are invalid; if your program reads
an invalid word it should print a brief, informative error message to
std-err and exit with no further processing.
std-out
one per line starting with the value
assigned to a
and ending with the value assigned to z
.
Brief, informative error messages should be sent to std-err
, preceded by
the characters "! "
(an exclamation point followed by a space).
gen-words
can be used to generate a set of words that can be used as input
to your program; gen-words
can be found in the public class directory
/export/home/class/CS-503/a2
. gen-words
accepts an optional command-line
argument which specifies the maximum number of words to generate. For example
$ ./gen-words 5
will generate sets that contain at most five words. Because gen-words
generates a new set of words each time it's run, you may find it convenient to
capture gen-words
output in a file, which can then be repeatedly input
into your program:
$ ./gen-words > words $ ./gen-cvals < words
gen-cvals
, my solution to the problem (compiled for Linux), can also be
found in the same directory.
You can use the Unix time command to determine your program's running time:
$ time ./gen-cvals < words 5 1 [ blah blah blah ] 0 0 4.29s real 4.29s user 0.01s system $
Use the real (total) time for comparisons; in this case, the program took 4.3 seconds to find the letter encodings. See the time man page for more details.
check-hash
in the same directory can be used to check your program's
output. It reads letter codes from std-in and accepts the name of a file
containing words on the command line. If it detects anything wrong, including
a collision between the given words using the letter codes, check-hash
prints an error message to std-err; if there are no problems, there is no
output. An example use:
$ ./gen-words > words $ ./gen-cvals < words | ./check-hash words $
This page last modified on 15 February 2006. |