CS 512 Homework 2

Homework 2 - Search Trees

CS 512, Algorithm Design, Spring 2000

Devise an algorithm that accepts a binary search tree T and returns a binary search tree of smallest height equivalent to T. Two binary search trees are equivalent if the operators search(), minimum(), maximum(), predecessor(), and successor() returns the same key values for both trees when called with the same key value. Explain why your algorithm's correct.

Let L be the in-order list of keys from the original binary search tree T. Construct a new binary search tree T' by calling reconstruct(L):

reconstruct(L)

  if size(L) = 0, return nil

  m <- floor(size(L)/2)
  n <- new node
  n->key = L[m]
  m->left = reconstruct(L[0 .. m - 1])
  m->right = reconstruct(L[m + 1 .. size(L) - 1])

  return m

We need to show that T' is equivalent to T and that T' has minimum height.

To show equivalence, let L be the in-order list of keys (k₁, k₂, ..., k_n) from a binary search tree. search(k) returns true if k is in L and false otherwise; minimum() returns k₁ and maximum() returns k_n; predecessor(k_i) returns k_{i - 1} if i > 1 successor(k_i) returns k_{i + 1} if i < n. From these facts comes the observation that that two binary search trees are equivalent if they both produce the same in-order list of keys.

If size(L) < 2, reconstruct(L) returns a tree that produces L as its in-order list.

Assume that the in-order list produced by the tree returned by reconstruct(L) is identical to L when size(L) < i for some i >= 1. Consider the in-order list produced by the tree T' returned by reconstruct(L) when size(L) = i.

With m = floor(size(L)/2), the left subtree of T' was built with a recursive call to reconfigure() with a list containing less than i keys. Because reconfigure() doesn't change subtrees once created, the assumption holds and left subtree of T' produces the in-order list of keys L[1..m - 1]. A similar argument holds for the right subtree of T'. Because both subtrees of T' produce matching in-order lists, the in-order traversal of T' matches the list L, and T' is equivalent to the original tree T.

Now for minimal height. A binary tree of n nodes has minimum height when it can't be reconfigured as a binary tree with the same number of nodes and a smaller height. An almost complete binary tree of height h contains a minimum of 2^h nodes (because a complete binary tree of height h - 1 contains 2^{h - 1 + 1} - 1 nodes), so if a binary tree of height h contains at least 2^h nodes, it must be minimal.

We need to show that reconstruct(), when called with a list of length l, builds a binary tree of height h such that 2^h <= l.

If the list contains a single key, then reconstruct() returns a one-node tree of height 0, and 2⁰ = 1. Assume that, for any list of length l no more than i >= 1, reconstruct() returns a binary tree of height h such that 2^h <= l. Now consider what happens when reconstruct() is called with a list of length i + 1.

There are two cases to consider. If i + 1 is odd, then reconstruct() forms two subtrees of floor((i + 1)/2) nodes and of height h. Because floor((i + 1)/2) no greater than i, the assumption holds and 2^h <= floor((i + 1)/2). From this we get

2^h + 2^h <= floor((i + 1)/2) + floor((i + 1)/2)

2^{h + 1} <= 2floor((i + 1)/2)

< 2floor((i + 1)/2) + 1

< i + 1

That is, the assumption holds when i + 1 is odd.

When i + 1 is even, one of the two subtrees formed by reconfigure() will have one node less than the other; without loss of generality, let the left subtree be the smaller of the two subtrees. The number of nodes in the right subtree, n_r, is less than i + 1 and the assumption holds to give 2^h <= n_r where h is the height of the right subtree. The right subtree has one more node than the left subtree, which gives

2^h + 2^h <= n_r + n_r

2^{h + 1} <= i + 1

and the assumption holds when i + 1 is even.

The assumption holds when i + 1 is odd or even, and so holds always and reconfigure() produces minimal binary trees.

The RB-Insert() algorithm in Section 14.3 (page 268) uses both p[x] and p[p[x]] without checking to see if they're defined. Explain why the algorithm can safely do this. Remember to consider all possible cases for x in your explanation.
At first glance it seems that this question is easily answered by considering the levels in a red-black tree at which node x could be. If x is at level two or higher, then its grandparent - which will be at level 0 or higher - exists and p[p[x]] is defined. If x is at level 0, then x is the root and the loop terminates without accessing p[p[x]].
The case when x is at level 1 seems equally as easy as the other cases: using the property that the root of a red-black tree is always black, x has a parent - the root - and the root is black, so the test color[p[x]] = red fails and p[p[x]] is never accessed.
Unfortunately, the black-root assumption is not as obvious as it seems. There are two ways x could be at level 1: it could have been inserted as a leaf at level 1, or it could have been inserted at a higher level and pushed up the tree towards the root. If x was inserted at level 1, then the black-root assumption holds because the tree is otherwise unchanged. In pushing x up the tree, however, note that lines 9 and 15 in RB-Insert() set x's grandparent's color to red. To show that RB-Insert() is correct, we have to show that x can never be pushed to level 1 with a red root.
Because RB-Insert() reaches no higher in the tree than the grandparent, the root's color doesn't change when x is at level 3 or higher; moving x up to level 1, if it ever happens, preserves the black-root assumption, causing the loop to terminate without accessing p[p[x]]. If x is at level 2, its grandparent is the root, which may be set to red; the analysis of this case has to be more detailed.
There are three cases to consider when x is at level 2 (there are actually six cases, but the other three are symmetric with these three). Following the book's terminology, in case 1, lines 7 to 10 in RB-Insert(), the root is set to red, but x is pushed to the root, so the loop terminates without accessing p[p[x]]. In case 3, lines 14 to 16 in RB-Insert(), the root is set to red, but is right-rotated out of the root, being replaced by x's parent, which is colored black. x is at level 1 and the black-root assumption holds; the loop terminates without accessing p[p[x]]. In case 2, x is set to its parent, then left-rotated into case 3, which operates correctly by the previous argument.
What are the largest and smallest number of keys that can be stored in a B-tree of height h and degree t? Explain your answer.
The root can have at most 2t - 1 keys, which leads to 2t children, each of which can have at most 2t - 1 keys, and so on. The total number of keys in this case is

sum(i = 0 to h, (2t)ⁱ(2t - 1)) = (2t - 1)sum(i = 0 to h, (2t)ⁱ)

= (2t - 1)(((2t)^{h + 1} - 1)/(2t - 1))

= (2t)^{h + 1} - 1

Any non-root node must have at least t - 1 keys, leading to t children, each of which must have at least t - 1 keys, and so on. The total number of keys in this case is

sum(i = 0 to h - 1, tⁱ(t - 1)) = (t - 1)sum(i = 0 to h - 1, tⁱ)

= (t - 1)((t^{h - 1 + 1} - 1)/(t - 1))

= t^h - 1

The root can have a minimum of one key, leading to two subtrees, and the number of keys in this case is 2(t^h - 1) + 1 = 2t^h - 1.
Given the red-black tree T, insert x into T and then delete x from T. Under which situations is the original tree identical to the final tree?
The easy cases first: if the tree is empty or contains one node, then inserting and deleting a node leaves the tree unchanged. If RB-Insert() doesn't execute the while loop, then it has inserted a red leaf node, and RB-Delete() deletes red leaf nodes without further changes to the tree, leaving the tree in its original state.
Now it gets harder. On the first iteration through the RB-Insert() while loop, case 1 leaves the new node a red leaf, as does case three. Case 2, however, rotates the new node into the interior of the tree, and from there case 3 rotates the new node one level higher into the tree and changes its color to black. Note also that all three cases make changes to the tree in excess of adding a new node.
The second and third insertion cases terminate the while loop; only the first case leads to further iterations. Each iteration of case 1 moves up to the grandparent of the current node (node x in RB-Insert()), so all three insertion cases leave the inserted node in the state it had after the first iteration.
If RB-Insert() makes one or more iterations through the while loop and end with the inserted node being a red leaf, then RB-Delete() will just delete the inserted node without making any other changes, the the final tree will be different from the original tree because of the changes made by the three RB-Insert() cases.
The last case to consider occurs when the inserted node is black and rotated up into the tree interior by RB-Insert() cases 2 and 3. In this case, the new node will have two children, the left its parent and the right its grandparent. Because the new node has two children, it will be replaced by either the predecessor in the left child or successor in the right child when deleted. If the predecessor is used, the final tree is different because the grandparent is in a different place in the original and final trees. Replacing the inserted node by its successor has the same problem, except when the right child is a single node. Further, the right child of the inserted node is red, so it can adopt the extra 1 for black height held by the deleted node. If RB-Insert() made one iteration of the while loop when inserting the node, and so didn't change any more of the tree, the final tree will be the same as the initial tree.

This page last modified on 23 February 2000.

2^h + 2^h	<=	floor((i + 1)/2) + floor((i + 1)/2)
2^{h + 1}	<=	2floor((i + 1)/2)
	<	2floor((i + 1)/2) + 1
	<	i + 1

sum(i = 0 to h, (2t)ⁱ(2t - 1))	=	(2t - 1)sum(i = 0 to h, (2t)ⁱ)
	=	(2t - 1)(((2t)^{h + 1} - 1)/(2t - 1))
	=	(2t)^{h + 1} - 1

sum(i = 0 to h - 1, tⁱ(t - 1))	=	(t - 1)sum(i = 0 to h - 1, tⁱ)
	=	(t - 1)((t^{h - 1 + 1} - 1)/(t - 1))
	=	t^h - 1