Generally, knowing the subtree heights in a binary tree isn’t too useful because the data are unordered, and most non-constant worst-case operations are going to be linear in the number of nodes in the tree.
Inserting: inserting a new node into a binary tree is a constant-effort operation: create a new root with the old root, if any, as a child (see the sketch below). Exploiting subtree height while inserting data could create balanced trees, but insertion becomes more expensive (logarithmic effort) and the other operations can’t take advantage of the balance.
Searching: searching for a node in a binary tree takes linear worst-case effort, and knowing the subtree heights provides no advantage during the search. There might be some average-case advantage to searching the subtree with smaller height first, but without further information about the binary-tree structure it is unclear whether the advantage would offset the smaller probability of a successful search in the smaller subtree.
Deleting: deleting a node from a binary tree takes worst-case effort linear in the number of nodes in the tree to find the node, and again linear worst-case effort to remove the node if found. There may be some small advantage to finding a replacement node in the child subtree with the smaller height, but in the absence of a scheme to maintain overall balance it is unclear what that advantage would be.
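As a concrete illustration of the constant-effort insert, here is a minimal sketch in Java, assuming a bare-bones node class (all names here are hypothetical): the new value becomes the root and the old root becomes its only child.

// A bare-bones binary tree; class and field names are hypothetical.
class BinaryTree {
    private static class Node {
        int value;
        Node left, right;

        Node(int value, Node left) {
            this.value = value;
            this.left = left;
        }
    }

    private Node root;

    // Constant effort: the old root (possibly null) becomes the
    // new root's only child. No traversal, no height bookkeeping.
    void insert(int value) {
        root = new Node(value, root);
    }
}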
The answers, in no particular order and approximately verbatim:
1: One of the values, anyway.
2: How?
3: One what? Operation?
4: What’s being implemented? Operations?
5: Why does an unbalanced tree make it harder to traverse the tree? It may take longer in an unbalanced tree to get to one leaf than another leaf, but how is it any harder?
6: True, but the new value can also be inserted as a new root.
7: Or perhaps less depth, indicating leaves closer to the root.
8: How is that an advantage in searching? And is the advantage greater than searching the subtree with less depth first?
9: Sort how? By depth or by value stored?
10: True, but maintaining balance wasn’t part of the question.
11: The ends of what?
12: Because you’re not dealing with binary search trees, wouldn’t it be easier to insert a new value at the root? That requires no traversal at all.
13: What number? The depth?
14: How does insertion work in a binary tree without depth information?
15: The depth isn’t necessary to find a leaf. It is necessary to find the deepest leaf (furthest away from the root), but that’s forcing the insert to do the maximum amount of work.
16: How does that work? What’s the relation between values stored in the tree and the depths at which they’re stored?
17: But that also takes the maximum amount of work. Why not take the leaf at the least depth?
The depth could be exploited by these operations because you can see how far through the tree you must go while searching.18 Therefore exploiting the depth.
When inserting a node, the node will then hold the depth of which subtree it is at.19
Also, when deleting a node you will exploit the depth because if it20 is not a leaf, then a node will be bubbled up into its place. Therefore exploiting the depth.21
18: Searching goes far enough into the tree to find what it’s looking for, then it stops. What does that have to do with depth?
19: True, but is that an advantage? If it is, how is it exploited by insert()?
20: Presumably the node being deleted.
21: How does that exploit depth? It reads like the same algorithm used to delete a node in a tree without depth information.
Through searching, the values22 would not change and would not be exploited.23
With insertion and deletion, the values must be changed and if they are not changed after every move, the data inside each node would be wrong.24 If you did not do this all values above the new or deleted node, including the root would be off.
22: The depth or the key?
23: What about looking in the shallower subtree first?
24: True, but is there any way to exploit the depths when inserting and deleting?
For searching, the depth can be used to choose which subtree to search first. By choosing the subtree with less depth first, the method can increase its efficiency.25
For inserting, the depth would not help since the new value would be added at the top to become the new root.
For deleting, the depth can be used to choose which subtree to traverse to find the leaf fastest. Once the quickest path to a leaf is found, than its value can be hoisted up to the node to be deleted.
25: On average; in the worst case searching’s still O(n).
The given histogram shows the path lengths in a binary tree. Is the binary tree balanced? Justify your answer.
The easiest way to answer this question is to draw a small balanced binary tree and count the paths. It should quickly lead to the conclusion that a balanced tree has 2^i nodes with path length i, which means the histogram should look exponential: 1 node at path length 0, 2 at path length 1, 4 at path length 2, and so on.
The given histogram isn’t exponential, so the associated binary tree can’t be balanced, or at least can’t be well balanced.
Alternatively, you could observe that if a tree is balanced, then all leaves can be found at level i or i + 1 for some i. The three path lengths starting at the second from the left have decreasing count, which means there are leaves at each of those path lengths (or corresponding levels). Because leaves appear at more than two consecutive levels, the tree can’t be balanced.
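The alternative argument suggests a direct mechanical test: collect the depth of every leaf and check that at most two consecutive depths occur. Here is a minimal sketch in Java, assuming a bare-bones node class (all names here are hypothetical):

import java.util.TreeSet;

// Checks the leaf-level condition for balance: every leaf sits at
// level i or i + 1 for some i. Class and field names are hypothetical.
class BalanceCheck {
    static class Node {
        Node left, right;
    }

    static boolean isBalanced(Node root) {
        TreeSet<Integer> leafDepths = new TreeSet<>();
        collectLeafDepths(root, 0, leafDepths);
        // An empty tree is balanced; otherwise the leaf depths must
        // span at most two consecutive levels.
        return leafDepths.isEmpty()
            || leafDepths.last() - leafDepths.first() <= 1;
    }

    // Record the path length (depth) of each leaf.
    private static void collectLeafDepths(Node n, int depth, TreeSet<Integer> depths) {
        if (n == null) return;
        if (n.left == null && n.right == null) depths.add(depth);
        collectLeafDepths(n.left, depth + 1, depths);
        collectLeafDepths(n.right, depth + 1, depths);
    }
}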
The answers, in no particular order and approximately verbatim:
1: In height.
2: What’s the relation between path length and subtree height?
3: Is such a thing possible in a binary tree?
4: What? What does it mean to go through a path? Traverse it?
5: The absolute numbers aren’t important; it’s the relation between them that matters.
6: Which subtree? There may be two of them.
7: More than probable; it is a certainty, otherwise the path length wouldn’t have been counted.
8: The children’s path lengths will be one more than the parent’s path length, which isn’t much shorter. More distant descendants will have shorter path lengths. And how do you tell the parents from the children from the histogram?
9: This is a bad way to start an answer.
10: What does it mean for leaf nodes to be unequal?
11: The bars are not different enough.
12: Maybe “differences”?
13: Assuming a node has children, it’s not possible for the children’s path lengths to be different: each child has a path length one more than the parent’s path length.
14: What are roughly equal nodes?
15: Which subtree? There are a lot of them.
16: True, but one (or both) of the subtrees could be empty.
The kthSmallest() method for a binary search tree ADT accepts the integer k and returns the kth smallest value in the bst. For example, bst.kthSmallest(0) returns the smallest value in bst, and bst.kthSmallest(bst.size() - 1) returns the largest value. What traversal would provide the simplest and most efficient implementation of kthSmallest()? Justify your answer. Do not be concerned with illegal values of k.
Starting at the root, assume the left subtree contains l nodes, which hold the values at positions 0 through l - 1. If k < l, then the left subtree contains the kth smallest value and the algorithm should traverse to the left subtree and start over. If k = l, then the root contains the kth smallest value and the algorithm’s done. Otherwise k > l and the right subtree contains the kth smallest value; the algorithm should recurse down the right subtree, looking for the (k - (l + 1))th smallest value. This algorithm considers the left subtree, the root, and the right subtree in that order; that is, it implements an in-order traversal.
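One simple way to realize the in-order idea in code (a counting traversal rather than the size-guided descent described above) is to traverse until k nodes have been visited. A minimal sketch in Java, assuming a bare-bones BST node class (all names here are hypothetical):

// In-order kthSmallest; class and field names are hypothetical.
class Bst {
    private static class Node {
        int value;
        Node left, right;
    }

    private Node root;
    private int seen; // nodes visited so far in the current traversal

    // Returns the kth smallest value (k = 0 is the smallest), or null
    // for illegal k (the question says not to worry about those).
    Integer kthSmallest(int k) {
        seen = 0;
        return inOrder(root, k);
    }

    // Left subtree, then node, then right subtree; a non-null return
    // means the kth value was found and the traversal can stop early.
    private Integer inOrder(Node node, int k) {
        if (node == null) return null;
        Integer found = inOrder(node.left, k);
        if (found != null) return found;
        if (seen++ == k) return node.value;
        return inOrder(node.right, k);
    }
}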
The answers, in no particular order and approximately verbatim:
kthSmallest() due to the fact that once you traverse the left child you have an idea of all the smallest values1 then you traverse the right and find/order those values. Once you have the orders of both the left and right you have the full order except for the root which will be the middle term.2 This traversal will give you a fast access to the ordering of the children and then automatically place the root as the middle.
1: But what, in particular, do I need to know about the smallest values to find the kth smallest value?
2: But the root comes after both the left and right children in a post-order traversal, so how does it get to the middle?
kthSmallest() would be to first traverse the root node, then the left or right nodes depending on the value stored in k.3 The reason for starting at the root is that presumably k is either less than or grater than the value held by the root.4 If it is less, the ADT can traverse the left nodes in an efficient time. If it is greater, the ADT can instead traverse the right nodes. If k equals the value found at the root, the traversal is finished and the number of the traversals will be, in the best possible case, only 1.
3: Depending how? Also, what kind of traversal behaves like this?
4: k has no direct relation to the values stored in the tree; it is an ordinal position.
5: But before you check the root, don’t you have to know how many children are in the root’s left subtree? If the left subtree contains more than k children, then there’s no need to check the root.
I would say, to implement the kthSmallest() method the most efficient is traversing through using a post order traverse. This is because the smallest value in a BST is found at the left most node, the largest is at the right most node of the tree.8 You can get to the smallest first in post order and if you want the largest value, that would be the next node to check in post order traversing.
6: From the left subtree, to the root, then to the right subtree.
7: All traversals start at the root, but they process the root at different times during the traversal.
8: Right, but kthSmallest() isn’t interested in the largest or smallest value, just the first or last value.
kthSmallest() would be to go down the left most subtree and work up to the root.10 Then go down the right subtree also on the leftmost side since everything to the left is smaller than the right. So for example: where the # inside the node represents the data value stored and it would represent k in the kth smallest value. It is checking the nodes in the form of LNR. Starting from the left, then checking the node, and moving to the right.
9: What does the relative sizes of the node values have to do with anything?
10: Why work up to the root? What kind of work is needed?
11: The kthSmallest() method doesn’t change the tree; why is maintaining the BST property important?
12: By what information? k is an ordinal number, not a value stored in the tree.
kthSmallest(). We talked in class that this was the fastest traversal13 and since k could be any value within the tree, it would be best to start at the root and work our way down.14 This way it would only be long if “k” was the very least or most value in the tree.
13: For searching, but this is a different operation.
14: What useful work can be done at the root before knowing how many values each child has?
15: The smallest values, relative to the root, are only on the left; the larger (or no smaller) values are on the right.
merge() is a common operation to include in a heap ADT. h1.merge(h2) adds the values in h2 to h1. Assuming heaps are implemented using arrays, provide a worst-case asymptotic analysis of the heap merge algorithm. You should assume the heap parameter (h2 in the example call) is left unchanged by the merge.
merge() can be implemented as

void merge(heap h)
  for i = 0 to h.size() - 1
    add(h.elements[i])
Adding an element to a heap takes O(log n) worst-case effort. Up to n elements are added to the heap, for a total of O(n log n) work, where n is the sum of the sizes of h1 and h2.
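For a concrete (if informal) check of the analysis, java.util.PriorityQueue can stand in for the heap ADT; its addAll() adds h2’s values one at a time, leaves h2 unchanged, and does O(log n) worst-case work per add. A minimal sketch (class and variable names are hypothetical):

import java.util.PriorityQueue;

// Heap merge by repeated add; h2 is left unchanged.
class HeapMergeDemo {
    static void merge(PriorityQueue<Integer> h1, PriorityQueue<Integer> h2) {
        // addAll() calls add() once per element of h2: O(n log n) total
        // in the worst case, matching the analysis above.
        h1.addAll(h2);
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> h1 = new PriorityQueue<>();
        PriorityQueue<Integer> h2 = new PriorityQueue<>();
        h1.add(5); h1.add(9);
        h2.add(2); h2.add(7);
        merge(h1, h2);
        System.out.println(h1.poll()); // prints 2, the smallest merged value
    }
}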
The answers, in no particular order and approximately verbatim:
h2 to. Therefore, in a dynamic array implementation, a new larger array would have to be created. In this would generate O(n^2) because all the values from h1 must be copied from the old array and inserted into the new array, as well as adding the values from h2.1
1: But is it reasonable to assume that every insertion into h1 will resize the array?
h2 to h1 you will still go through the array and touch each element one time, just as you would with h1. h1 was already O(n)2 already so O(n) + O(n) = O(n).
2: What is it that is O(n) in h1?
h2 to h1 can possibly cause more work.3
3: More work than what? And how much more work?
merge() would add the entire h2 heap to a rear of h1. Any leaf will do, therefore only half of the h1 heap would be traversed to find a leaf.4 The entire h2 heap would then be bubbled-up to maintain the heap property. Since half of the original heap can be ignored for this merge, the worst-case asymptotic analysis would be O(log n).5
4: Why would you need to traverse the heap at all? The next available leaf is at the end of the occupied array elements.
5: Per new leaf, or for all leaves?
6: True, but how much work gets done on each traversal through the heap?
h2 to h1. Since we are doing a worst case analysis I am going to assume the array for h1 becomes full and we are going to have to make a new array and adding all the values from the old array as well as the new values from the array being added. This causes merge() to have linear behavior or O(n).7
7: How much work gets done as each value is added to the heap?
8: Per value added, or in total?
merge() is that once h2 is added to h1, the order of the heap will require resorting. This will require rotating and searching the heap in order to reobtain balance. From a worst case standpoint, the most iterations done by the algorithm to balance out the effects of the merge would be O(n), where n = h1 + h2.9
9: And how much work is done per iteration?