Has your colleague invented a useful data structure or not? Explain your answer.
Unfortunately no, your colleague has not invented a useful heap data structure.
The problem is that the bottom level of a heap may be unordered, but it's not
random: each element i
in the bottom level must obey the heap property
h[parent(i)] >= h[i]
. Re-arranging the bottom half of the heap to form a
second heap may violate the heap property in the top half.
Take, for example, the max-heap 9, 5, 8, 3, 4, 7, 6. The last four elements in
the max-heap can be re-arranged to form the min-heap 7, 6, 4, 3 (with the
minimum heap top at the right this time). However, the combination of the two
heaps 9, 5, 8, 7, 6, 4, 3 violates the max-heap property because
h[parent(4)] = h(2) = 5 >= h[4] = 7
is false.
This is not to say that it wouldn't be possible to form a double-binary heap. All that needs to be done is to store the heap in descending order; for example, 9, 8, 7, 6, 5, 4, 3 is a double-binary heap. However, maintaining a sorted list in the presence of insertions and (perhaps) deletions requires O(n) work (although you can find the insertion point with O(log n) work, you may have to move O(n) elements to do the insertions), so the O(log n) heap behavior is lost.
Every tree of a binomial heap is a binomial tree, so we need to answer the question for binomial trees.
The binomial tree B0 has one node at level 0. Let's assume the binomial tree Bi has one node at level i and consider the binomial tree Bi + 1. Bi + 1 comprises the two binomial trees Bi, one of which is the left-most child of the other. By our assumption, the leftmost-child binomial tree has a single node at level i, which becomes a node at level i + 1 in Bi + 1. Ignoring its left-most child, the parent binomial tree has height i and so has no nodes at level i + 1. From this Bi + 1 has one node at level i + 1.
The binomial tree Bi has one node at level i and so does every tree in a binomial heap.
The load factor on the close-addressed table Tc is
ac = nc/mcwhere nc is the number of elements stored in Tc and mc is the number of slots in Tc. The amount of space used by Tc is
sc = 12nc + 4mc
Using similar notation, the load factor on the open-addressed table To is
ao = no/moBecause the To needs to hold the elements of Tc, no = nc. The amount of space used by To is
so = 8mo
The trick to converting from closed- to open-addressed hash tables is the load factor: ac can be greater than 1, while ao can be at most one. If the load factor on the converted table can't be made at most 1, then the close-addressed table can't be converted. Because both hash tables must store the same number of entries, the smallest ao occurs with the largest number of slots possible. This implies that
so = scor
8mo = 12nc + 4mcor
mo = (3nc + mc)/2With that slot count, the load factor for To is
nc/(3nc + mc)/2 <= 1or
2nc <= 3nc + mcor
-mc <= ncThis is always going to be true, which means that whenever Tc's load factor is at least 1, it should be converted to an open-addressed table having a load factor of at most 1. In retrospect, this is obvious because it takes at least 12 bytes to store an entry in in Tc, while it only takes 8 bytes to store it in To. Using Tc's space for To will always provide enough room to store all Tc's entries in To. In fact, assuming a thinner margin of safety was acceptable, your colleague need take only 12(ceiling(nc/3)) bytes from Tc for To because every two entries in Tc uses enough extra space (the space for the pointers) to store another entry.
What happens when ac is less than 1? The load factor on the converted table must be at most the load factor on the original table, or
ao <= acor
nc/mo <= nc/mcor
2/(mc + 3nc) <= 1/mcor
2mc <= mc + 3ncor
mc <= 3ncor
1/3 <= acThat is, when ac is less than 1, Tc should be converted only when there are three times as many elements as there are slots. In retrospect, this seems, if perhaps not obvious, at least reasonable. Converting four-byte mc slots into eight-byte mo slots reduces the number of slots by half. Because the number of entries is the same in each table, the load factor for the converted open-addressed table is higher than the load factor for the original, closed-address table. Reducing ao requires more slots, which come from the space used to store the entries in Tc; to reduce ao enough, there have to be enough entries in Tc.
To summarize, your colleague should convert any closed-address table with a load factor of at least 1/3, using all storage in the closed-address table for the open-address table. In addition, if the closed-address load factor is at least 1, the space requirements for the open-address table can be reduced to 12(ceiling(nc/3)) bytes, assuming your colleague is willing to run with a smaller margin of error.
Accessing string characters can be done in constant time. There are a constant number of character that can be used to distinguish among the month names. These two facts can combine to serve as the basis of a hash function.
Each month contains at least three characters, so it makes sense to try and distinguish among the months using the first three characters. The first character alone can't distinguish among the months because, for example, three months start with 'J'. The second character isn't any good either because, for example, three months have 'u' as their second character; similarly, two months have 'r' or 'n' as their third character.
Considering two of the three initial characters may work. The first and second characters don't distinguish between "June" and "July". The first and third characters don't distinguish between "June" and "January". Fortunately, the second and third characters distinguish between all twelve months:
j a n m a r m a y o c t f e b d e c s e p n o v a p r a u g j u l j u n
The hash function could be derived from the ASCII value of the characters
themselves, but the characters represents a range of 25 values from 0 ('a'
- 'a'
) to 24 ('y' - 'a'
), which requires 5 bits, resulting in a 10-bit
has value, which is two bits larger than it should be. To get an 8-bit hash
value, the characters will have to be mapped to something other than their
ASCII values.
There are 6 unique second characters, which takes three bits to represent (using binary to represent the character values):
There are ten third characters, so four bits are needed to give them all values (the characters 'c' and 'p' are also second characters; their values reflect the three-bit values assigned above):
a 000 c 001 e 010 o 011 p 100 u 101
b 0000 c 001 g 0010 l 0011 n 0101 p 100 r 0110 t 0111 v 1000 y 1001
A hash function using these character mappings
produces the month mappings// a b c d e f g h i j k l m n o p q r s t u v w x y z int c2i[] = { 0 0 1 0 2 0 2 0 0 0 0 3 0 5 3 4 0 6 0 7 5 8 0 0 9 0 } int hash(char m[]) return c2i[m[1] - 'a']*4 + c2i[m[2] - 'a']
jan 5 mar 6 may 9 oct 23 feb 32 dec 33 sep 36 nov 56 apr 70 aug 82 jul 83 jun 85
The hash table can be shortened up a bit by noting that it isn't necessary to distinguish all the third characters from each other; only the third characters with the same second character need to be distinguished. Because at most three third characters have the same second character, the third characters can be mapped into two bits (the values of 'o' and 'p' have been swapped to give 'p' a value with two significant bits when used as a third character; the value of 'c' already has at most two significant bits):
A hash function using this character mapping
a 000 c 001 e 010 o 100 p 011 u 101 b 00 c 001 g 10 l 01 n 00 p 011 r 01 t 00 v 00 y 10
produces the month mapping// a b c d e f g h i j k l m n o p q r s t u v w x y z int c2i[] = { 0 0 1 0 2 0 2 0 0 0 0 1 0 0 4 3 0 1 0 0 5 0 0 0 2 0 } int hash(char m[]) return c2i[m[1] - 'a']*4 + c2i[m[2] - 'a']
jan 0 mar 1 may 2 oct 4 feb 8 dec 9 sep 11 apr 13 nov 16 jun 20 jul 21 aug 22
Hash functions that have no collisions are known as perfect hash functions.
This page last modified on 12 March 2000.