Lecture Notes for Computer Algorithms I

9 December 2003 - Binary Search Trees


  1. An ordered linked-list is a doubly linked-list in which each element is no smaller than its predecessor (if any) and no larger than its successor (if any).

    1. This is ascending order; there's also descending order.

  2. What kind of tree do you get if you fold an ordered linked list?

    1. The predecessors are in the left subtree, and the successors are in the right subtree.

    2. The values in the left subtree (if any) are no larger than the value in the root.

    3. The values in the right subtree (if any) are no smaller than the value in the root.

    4. A binary tree with this property is known as a binary search tree (or bst).

    5. Note that a binary search tree is defined only by the relation between node values.

      1. It says nothing about the shape of the binary tree.

      2. This will cause us problems.

  3. With a meaningful relation between the values in a bst, traversals can be more efficient than exhaustive search.

    1. But, unfortunately, that's not enough to ensure efficient traversals.

    2. And, unfortunately, bst manipulations become more complicated and expensive.

    3. find()

      bool find(v)
      
        node p, c = find'(v, &psuedo-root, pseudo-root.right)
      
        return c != null
      
      
      node, node find'(value v, node p, node c)
      
        if c == null or c.v == v
          return p, c
        if c.v < v
          return find'(v, c, c.right)
        else
          return find'(v, c, c.left)
      

    4. add()

      void add(value v)
      
        node parent, child = find'(v, &pseudo-root, pseudo-root.left)
      
        if (child == null)
          if (parent == &pseudo-root) or (parent.v > v)
            parent.left = new node(v)
          else
            parent.right = new node(v)
      

    5. remove()

      void remove(value v)
      
        node parent, child = find(v, &pseudo-root, pseudo-root.left)
      
        if (child != null)
          remove_child(parent, child)
      
      
      void remove_child(node parent, node child)
      
        if (parent->left == child) 
          parent->left = remove_root(child)
        else
          parent->right = remove_root(child)
      
      
      node remove_root(node root)
      
        if root.left != null
          root.v = find_max(root, root.left)
        else if root.right != null
          root.v = find_min(root, root.right)
        else 
          delete root
          root = null
      
        return root
      
      
      value get_max(node p, node c)
      
        assert p != null and (p.left == c or p.right == c)
        assert c != null
      
        if c.right != null
          return get_max(c, c.right)
        else
          v = c.v
          remove_child(p, c)
          return v
      
      
      value get_min(node * p, node * c)
      
        assert p != null and (p.left == c or p.right == c)
        assert c != null
      
        if c.left != null
          return get_min(c, c.left)
        else
          v = c.v
          remove_child(p, c)
          return v
      

      See the complete code.

  4. The problem of the middle becomes crucial for efficient (logarithmic) traversals.

    1. A folded list is balanced - the difference in height between any two sibling subtrees is at most one.

      1. Making a left-right decision throws away half the tree.

    2. Adding and removing values from a binary search tree doesn't preserve balance.

      1. At worst, the tree could end up looking like a list for linear behavior.

    3. Maintaining balance in the presence of adding and removing values is a hard (but solved) problem.

      1. There are three general approaches (enable Java before visiting for the full effect):

        1. Always be in balance. AVL trees are an example. Complicated to implement, but performance is within a constant factor of optimal (logarithmic).

        2. Don't get too far out of balance. Red-black trees are an example. Less complicated to implement than approach 1, but performance may be suboptimal by a bounded amount.

        3. Get arbitrarily unbalanced on insert and deletes, but eventually drift back into balance on finds. Splay trees are an example. Less complicated to implement than approach 2, but performance may be arbitrarily suboptimal and unpredictable.


This page last modified on 9 December 2003.