Data Structures & Algorithms
CS 305 & 503

Quiz 4, 15 December 2010


  1. A colleague of yours, knowing that good hash functions are random, writes a hash function that, among other operations, multiplies the key value by a number randomly chosen in the range 1 to n, where n is the size of the scatter table (this multiplication takes place before the key value is reduced mod n). Is this a good idea? Justify your answer.


    It's not a good idea. A hash function's most important characteristic is that it be a function: if h(k) = v, then h(k) always equals v. A hash function that multiplies in a freshly chosen random number each time it computes v will likely produce different values for the same k on successive calls, so a key stored at one index would later be sought at another.
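    The failure is easy to demonstrate. In this sketch (the table size, multiplier, and key are illustrative, not from the quiz), the "randomized" hash maps the same key to many different indices across calls, while a deterministic hash does not:

```python
import random

TABLE_SIZE = 13  # hypothetical scatter-table size n

def bad_hash(key):
    # Multiplies by a number chosen randomly on each call, so the result
    # for the same key can differ between calls: not a function.
    multiplier = random.randint(1, TABLE_SIZE)
    return (key * multiplier) % TABLE_SIZE

def good_hash(key):
    # Deterministic: the same key always maps to the same index.
    return (key * 7) % TABLE_SIZE

key = 42
# good_hash is stable across calls...
assert good_hash(key) == good_hash(key)
# ...while bad_hash is not: a value stored under bad_hash(key) would
# likely be looked up at a different index later.
indices = {bad_hash(key) for _ in range(100)}
print(sorted(indices))  # typically many distinct indices for one key
```

    The randomness a good hash function needs is randomness across *different* keys, not across repeated calls with the same key.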

    The answers, approximately verbatim, and in no particular order:

    1. No. A good hash function — rather, a valid hash function — must produce the same hash code for similar input data. If a random multiplier is used then it would not produce the same value every time for the same key. By “similar input data” I mean data which is considered to have the same value.
    2. No, this is not a good idea. Good hash functions require random and widely spread index.1 If multiplies a random number, the key value will be changed, and the mod n will also be changed.

      1: Yes...? Wouldn't adding randomness make that more possible?

    3. Yes. Given that the other operations the hash function performs are generating a completely random hash value2 of the item. The multiplication of the number if randomly selected each time a hash is performed will cause the hash to be offset from the others. This offset will help avoid collisions when inserting items into the hash data structure, as well as cut down on grouping.

      2: It shouldn't be completely random, but random with respect to a key sequence. For any particular key, it should be a function.

    4. Yes. As we know the two different values collide when they produce the same hash index,3 so multiplying the key value by a number randomly chosen in the range 1 to n is a good idea. This will create unique keys for values.4

      3: They collide if the key values are different.

      4: How can that work if there are more keys than there are scatter-table indices, as there usually are?

    5. Yes, since the properties of a good hash function are to be random, uniform and wide spread. ???5 the hash function following all these properties is a good hash function.6

      5: Can't read it.

      6: Maybe, but what about being a function?

    6. It is not a good idea to multiply the key value by a random number from 1 to n because it will create a pattern that can be detected rather easily if the size of the scatter table is small.7 Also, this will not cause the hash function to be uniform because it could cause the values to be packed together, thereby creating a cluster.8

      7: Is that a problem? The hash indices are created from key values, so any pattern in a sequence of key values will produce a pattern of hash indices.

      8: What mechanism would cause multiplying by a random number to produce clusters?

    7. This would not be a good idea if it's going to take place before mod n because it can change the key value with the wrong random number9 before reducing it to mod n.

      9: What would be the right random number?

    8. Yes, this is a good idea. This helps because a key value is also referred by another index in the array.10 Another benefit to this idea is that the multiplied value is within the range of scatter table, which would to avoid the problem of going into an infinite loop.11

      10: By another index? What does that mean? Why does an index refer to the key value? Why is the index in the array, as opposed to of the array?

      11: How would an infinite loop occur otherwise?

    9. No, we use hash function to create index for key value. If we use (k*(1 to n)) % n there is a good chance k = bn,12 which means multiply the number randomly chosen does not change.13 The index key values with k will land in the same index. A good hash function requires widely spread index.

      12: What's a good chance? It seems bk has the same chance as any other value. And b is what?

      13: What doesn't change? How could multiplying anything by something other than 1 not change a value?

  2. Bubble sort can be modified to stop early when it detects that the array it's working on has become sorted. Can insertion sort be modified in a similar way so that it too will stop when it detects a sorted array? Justify your answer. Note that the modifications don't explicitly check for a sorted array, but rather detect some characteristic of the sort's usual operation and conclude that the array's sorted or not.


    Insertion sort can't be modified to take advantage of sorted array segments as easily as bubble sort can. Each bubble sort pass scans the entire unsorted array part, and it's the scan that allows early termination (if nothing's swapped, nothing's out of order, so count the swaps per scan and stop if the count's 0). Insertion sort, however, doesn't scan the unsorted array part, and so has no easy way to discover anything about it.
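    The swap-counting modification to bubble sort can be sketched as follows (a minimal illustration, not the exact code from class):

```python
def bubble_sort(a):
    """Bubble sort that stops early: a full pass with zero swaps
    means nothing is out of order, so the array is sorted."""
    n = len(a)
    for end in range(n - 1, 0, -1):
        swaps = 0
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
        if swaps == 0:  # the scan saw no inversions: done early
            break
    return a
```

    Note that the early exit is a by-product of the scan over the unsorted part; insertion sort's inner loop only touches the already-sorted prefix, so it has no comparable signal about the rest of the array.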

    The answers, approximately verbatim, and in no particular order:

    1. Because a bubble sort shuffles values in an array which can detect when it has become sorted.1 An insertion will select the next value size value in the array with out moving other values, with this action it can't stop early as a bubble sort.2

      1: Don't all sorts shuffle values in an array? What's so special about bubble sort?

      2: Are you perhaps thinking of selection sort?

    2. Yes, it can be modified in a similar way.3 Insertion sort will stop when it finished comparing each element in an array and swapping; it takes O(n²).4

      for (int ...)    ⇒ outer loop takes O(n).
        for(int ...)   → inner loop takes O(n).
                       ⇒ O(n²).
      

      3: What way is that?

      4: Yes? Why is that relevant?

    3. No. Bubble sort can stop early when it detects that the array has been sorted because the bubble sort doesn't need to compare the rest of the array.5 but insertion sort is different, even the array were sorted, we still need compare between the insert value and the sorted value in the array.

      5: “the rest of the array” means what?

    4. No, you cannot modify insertion sort similarly to bubble sort to stop when detects a sorted array. Because insertion sort is a sort-in-i/p sort.6 It sorts (finalize the position) in coming element at the time of insertion itself.

      Since insertion sort inserts new elements in sorted order, we cannot modify it to check remaining array is sorted or not.7

      6: What is a “sort-in-i/p sort?”

      7: This is the sentence you need; the previous paragraph was a waste of time.

    5. No. Insertion sort requires a comparison of the currently picked index as a min value to all other values in the sequence being sorted.8 This is what leads insertion sort to having O(n²) behavior.9

      8: Which part of the sequence?

      9: True, but what does having O(n²) behavior have to do with being able to stop early?

    6. Yes, insertion sort can be modified to stop when it detects a sorted array. Both bubble sort and insertion sort move many values around, and work in an similar way. Insertion sort slides the value to the right place, and can be modified to keep track of the sorted parts of the array.10

      10: True, but it's the unsorted part that's of interest.

    7. Can not, we use gap to insert one unsorted value to sorted values. Can not track the values in unsorted, have to go through unsorted values to fill in the gap.
    8. Bubble sort can detect a sorted array when it makes a pass through the sequence and does not perform a single swap.

      Insertion sort looks at the first value in the sequence of unsorted data, then inserts it at the appropriate location in the sorted sequence. The best you can do is have it sweep the sorted data right-to-left instead of left-to-right. This makes it so you only have to do one comparison for each value in a sorted sequence to determine if it's in the right location. However, there is no way to immediately determine the array is sorted based on a characteristic of the sort's usual operation.
    9. Yes, it is possible to stop insertion sort after it detects it a sorted array. Bubble sort starts from the bottom and goes up, sorting and swapping those values. Insertion sort goes through the array and sees a value that is smaller than value that is in the sort array and inserts the smaller value at the proper place. Once all values are sorted, there are no more unsorted values and the insertion sort would stop.11

      11: Yes, but that's when it should stop. The question was, can insertion sort figure out to stop sorting earlier?

  3. A heap sort producing ascending order can use either a min heap or a max heap. Which heap is the better choice? Justify your answer.


    Either will work, but max heaps are simpler. Heapsort builds the heap into the array being sorted. As elements are removed from the heap, the heap shrinks from the bottom (at the leaves), opening up space in the array. Assuming an ascending sort and the tree-array indexing given in class, space in the array opens up at the right end of the array; that is, for the larger values. Building a max heap opens up space at the proper end of the array for the values removed from the heap.

    A min heap would have to open up space at the left end of the array, which means the heap would have to build right-to-left (root-to-leaf) in the array. Right-to-left heaps are easy to do (just pretend the heap is left-to-right, then subtract each index from n − 1, where n is the array size, to flip the index around), but are marginally more complicated than left-to-right heaps.
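    A minimal in-place heapsort sketch using a max heap and the usual left-to-right indexing (children of index i at 2i + 1 and 2i + 2); the function names are illustrative:

```python
def sift_down(a, root, end):
    # Restore the max-heap property for the subtree rooted at `root`,
    # considering only indices < end.
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1  # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child

def heapsort(a):
    n = len(a)
    # Build a max heap in place, bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeatedly swap the max (at index 0) into the space opening up
    # at the right end of the array, then shrink the heap by one.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

    Each removal from the max heap lands the largest remaining value exactly in the array slot the shrinking heap has just vacated, which is why the max heap and the ascending order fit together so neatly.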

    The answers, approximately verbatim, and in no particular order:

    1. Max heap. Using heap we start from array beginning to end,
      1. If use min heap, pop the min value should be stored at beginning of array. We do this by
        1. remove min from heap,1
        2. add new unsorted value to the heap.2
      2. If use max heap, after find max value in the heap, we simply swap it with the end value of array.

      1: And do what with the min?

      2: From where does the new unsorted value come?

    2. Max heap is the better choice. Because max heap, the root is the largest one and all the children is less than parents. So we can put the root to the end of the order and after that every time we got the rest of largest values. We can put it into the end of the rest order.3

      3: And a min heap would fail how?

    3. A min heap is the best choice because we want a heap sort producing ascending order, not descending order.4

      4: True, but what about min heaps forces one ordering over another?

    4. For an ascending order heapsort, the better choice would be a min heap, since it is an ascending order you need to start with the smaller value, then chose the next greater value (basically start with the smallest value then chose the next smallest value, and so on).5

      5: True, but is that the way heapsort works?

    5. Min heap is the best choice. Being that we want the data to be sorted to ascending order, we want the least value at the first position in the sequence. A min heap has the least value at its head, since it guarantees all of its descendant values are at least its value. The array based implementation of a min heap further supports this as the head of the heap is stored at the first position in the sequence (index 0).6 This property, the heap property, continues recursively in the heap.7

      6: True, but then what happens?

      7: Yes, but now you have two heaps (the children of the root).

    6. Min heap would be a better choice because the property of min heap would help to produce a sort in an ascending order in more time efficient manner.8 The property of the min heap that every value in parent node is less than its subnodes. So, we know the values start from the smallest and increases ascendingly. So, using a min heap is a better choice, when all the values need to be ascending order.

      8: More efficient than what? A max heap perhaps?

    7. A max heap is the better choice because a heap sort repeatedly takes the max heap element and stores it in an open space. In a max heap, the value of the parent is always greater than the value of its child, therefore a heap sort can start at the root and work its way down to its children.9

      9: As opposed to always dealing with the root? What does working down to the children mean?

    8. Heap10 is a guaranteed O(n log n) complexity algorithm. For ascending behavior thus asymptotically both are same.11 But the expense of swapping elements is more in max heap compared to min heap.

      ∴ min heap is better.

      10: Do you mean heapsort?

      11: Both of what are the same?

    9. A min heap would be a better choice for arranging elements in ascending order. This would provide more efficient access to elements of lower value prior to elements of grater value.12 One would be able to access elements of lower value from a higher level13 then grater values instead of having to traverse fully to leaf nodes at the bottom of the tree.14

      12: More efficient in what way?

      13: To what does “higher level” refer?

      14: That's not the way heaps work. Why would you want to do that?

  4. Selection sort and bubble sort are both O(n²) sorts, but usually selection sort is faster than bubble sort on the same array. A colleague of yours explains the difference by noting that selection sort swaps values that are usually far away from each other, and so is more efficient than bubble sort. What do you think of your colleague's explanation? Justify your answer.


    The explanation almost completely misses the point. Both bubble sort and selection sort are O(n²) sorts, but only if you're counting data touches or comparisons. Both sorts repeatedly scan the unsorted array part, but bubble sort drags along the extreme value, while selection sort remembers its index. Counting data movements per scan, bubble sort is worst-case O(n) while selection sort is worst-case O(1), and that's why selection sort can be more efficient than bubble sort when data movements are expensive.
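    The difference in data movements can be counted directly. This sketch (function names and the test array are illustrative) tallies swaps for both sorts on a reverse-sorted array, bubble sort's worst case:

```python
def bubble_sort_swaps(a):
    # Count adjacent swaps: bubble sort drags the extreme value along,
    # paying a swap for every step it travels.
    a, swaps = a[:], 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swaps += 1
    return swaps

def selection_sort_swaps(a):
    # Selection sort only remembers the minimum's index during the scan
    # and pays at most one swap per pass.
    a, swaps = a[:], 0
    for start in range(len(a) - 1):
        m = min(range(start, len(a)), key=a.__getitem__)
        if m != start:
            a[start], a[m] = a[m], a[start]
            swaps += 1
    return swaps

data = list(range(50, 0, -1))  # reverse-sorted: worst case for bubble sort
print(bubble_sort_swaps(data), selection_sort_swaps(data))
```

    On 50 reverse-sorted values, bubble sort performs n(n − 1)/2 = 1225 swaps while selection sort performs at most n − 1; the comparison counts are the same, which is the point the colleague's "far away" explanation misses.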

    The answers, approximately verbatim, and in no particular order:

    1. I do not think my colleague's explanation is accurate. The reason selection sort is usually more efficient than bubble sort is because selection sort keeps track of the smallest value, whereas bubble sort moves much data because it “bubbles” each value into the correct place. Selection sort uses fewer assignments than bubble sort, causing selection sort to be cheaper as well.
    2. He is on the right track. Bubble sort is slow when a value being swapped is far from its final destination. This is because it swaps the value X times where X is the distance between the value and its destination. Worst case, a value is swapped (n - 1) times. Selection sort solves this issue by just pointing to values that need to be swapped an once it selected the proper values it preforms only one swap. It is the nature of bubble sort which limits its swaps to adjacent values that makes it slower than selection sort.
    3. I think my colleague's explanation is wrong. It does not matter if selection sort swaps values far away from each other or near each other. It swaps values when necessary,1 for example, if we want to sort element in ascending order, then if a > b then swaps it2
      1. largest element in sorted array.
      2. smallest element in unsorted array.

      1: But doesn't bubble sort do that too?

      2: Which it? A or b?

    4. Yes, I believe the explanation. Bubble sort starts from bottom of array, swap the lowest value to the top and starts again from the bottom of the array and goes to the top swapping the value and sorts an array.

      Selection sort starts from the top of an array and goes to the bottom, selects the values to be swapped in order to make a sorted array. Selection sort is thus faster, because it can select a value and swap it to the proper place in an array.
    5. The reason selection sort usually faster than bubble sort is because selection sort does not swap values in the inner loop. It swaps only after it finds the min value, so the number of swaps is less than bubble sort, not because swap values are far away from each other.
    6. I think that's right, because both bubble sort and selection sort swap values. Bubble sort compare two values which are nearby; selection sort can compare two values which are far away.3 For example, in a sequence list maybe only two value are not in order. So we can just swap the two values without comparing each one.4 So the selection sort should be more efficient than bubble sort in some particular array.

      3: But the cost of comparisons doesn't change from sort to sort, or with the distance.

      4: But how do you find the out-of-order elements?

    7. I think the explanation is not relevant. The selection sort is faster than bubble sort because it's number of swaps per iteration are less than bubble sort.

      Selection sort scans entire sequence and selects one max/min value and place in its position. Swaps = O(1) per iteration.

      Bubble sort compares every element and makes swaps at every new element till the max value reaches its position (i.e., the bubbling up behavior). Swaps = O(n) per iteration.

      Thus: The expense of swaps is the difference between selection and bubble sorts.
    8. I agree that selection sort is faster than bubble sort.5 This is because selection sort uses 2/3 less assignments than bubble sort.6

      5: But do you agree with why selection sort's faster?

      6: In the worst case, selection sort uses asymptotically fewer assignments than does bubble sort.

    9. Swapping values from “far away” is one property of selection sort that is different from bubble sort. It does allow more flexibility in the items that insertion sort7 may swap as opposed to bubble sort. The only swapping that bubble sort may do is on elements that neighbor each other. This characteristic causes bubble sort to perform more operations in the form of swaps to move an element the same number of index values that insertion sort could do in one swap. They are correct but in an over-simplified way without justification.

      7: Do you mean “selection sort?”

  5. Radix sort was described in class as working from left to right (from most significant to least significant) through the digits. Would the sort work correctly if it worked from right to left (from least significant to most significant)? Justify your answer.


    The answers, approximately verbatim, and in no particular order:

    1. Yes, the sort would work correctly because it sorts through decimal digit numbers.1 It puts elements into groups such as
      100s: 000-099, 100-199, ...2
      10s: 00-09, 10-19, ...
      So it doesn't matter because it puts elements into groups.3

      1: It works for any radix, although the sort needs adjusting for a particular radix.

      2: Right, but this is the first pass going left to right. This would be the last pass going from right to left.

      3: But what is the meaning assigned to those groups? And is that meaning consistent with the kind of sort (ascending or descending) being performed?

    2. No. Radix sort is doing bit comparison operations on the items within the structure.4 Sorting sequence is dependent upon the left-to-right characteristics of the bits each element is represented by within system memory. For example, ordering 3-digit numbers in ascending order would require 100's ordering then 10s ordering then ones ordering in standard left-to-right comparison. In right-to-left ordering items with the same 1s would group followed by same 10s grouping then 100s grouping, possibly only creating an order by their position within the sequence relative to the closest ordered grouping they were first part of.

      In a right-to-left ordering

      101100
      001010
      100001
      010101

      4: “Structure” means the items, or the sequence of items? Or something else?

    3. No. The sort can not work correctly because from the right to left the digits will change. for example: if we go through the order should be
      left → right        right → left
      000100
      001010
      010100
      011110
      100001
      101011
      110101
      111111
      So the most of order5 is different from left-to-right order.

      5: The order of what?

    4. No. By observing the comparisons of most significant digits first you are gaining extra information than just the order of that digit. The extra information is that every number whose most significant digit is less than another number of the same number of digit's most significant digit is therefore less than that number. That is key to the sorting algorithm since it is grouping the numbers into increasingly ordered groups.

      If you were to look at the least significant digits first, you would not gain this information. For example, if you look at the numbers 91 and 82, you can see 82 is less than 91 just by comparing the most significant digits 8 < 9. However, if you were to look at the least significant digits, the fact that 2 > 1 has no relation to the order of 91 and 82.
    5. No, assume have three-digit numbers.

      [picture]

      We start from right, get 4 0s top and 4 1s bottom. Then go to 2nd column and to 3rd column, the values in 3rd column is not ordered.
    6. No, the sort won't work correctly.

      Sorting from right-to-left (from least significant to most significant) may place same range element in different buckets which cannot be reunited again.

      E.g., seq = { 0 82 2 102 1000 100 20 1082 }

      [picture]

      The o/p6 is not sorted sequence, thus will not sort correctly.

      6: “o/p” means what? Output?

    7. No radix sort would not work correctly if it worked from least significant to most significant through the digits. This is true because radix sort is designed to begin with the most significant digit, and work its way through.7

      7: What is it about the design that makes left to right correct but right to left wrong?

    8. No, the most efficient way to sort8 in radix sort is to go from left-to-right. This helps because we are sorting the most significant to the least significant. The values that are least significant on the right side would be done, once the sorts from left, as it will reach the end the values would be sorted. But doing it from the right-to-left would take more time9 and thus would be less efficient way to use it.

      8: The question asked about correctness, not efficiency.

      9: Take more time than going from left to right, I'm guessing. But why would it take more time?

    9. The radix sort should work just as efficient10 if it were to work from right-to-left because as long as it is consistent11 but only in reverse it should be able to work from least significant to the most significant (following the same sequence but only in reverse).

      10: The question asked about correctness, not efficiency.

      11: Consistent in what way?


This page last modified on 19 December 2010.