Advanced Programming I Lecture Notes

Advanced Programming I Lecture Notes

14 March 2006 • Balancing Trees

Outline

A Question

Here are two binary search trees containing the same data.
Why are they different?

Data Insertion Order

Data-insertion order determines a tree's shape.
Tree shape influences tree operation performance.

Tree Levels

Given a tree T, a node n is at level i in T if n is i edges away from the root.
The ith level of T is the nonempty set of all nodes at level i in T.
The height h (or depth d) of T is the largest level h in T.

Tree Shape

Tree operations have good performance when each decision throws away half the data.
This occurs when each subtree has half the data.
A balanced tree has this property.
A tree T is balanced if the two child subtrees of any node in T differ in height by at most 1.

Example Tree Shapes

Each node stores the difference in its children's heights.
- The sign indicates the larger subtree, the magnitude by how much.

Getting Well-Shaped Trees

Several techniques produce well-shaped trees:
- Shuffle the data before inserting it.
  - May not be practical or always applicable.
- Ocassionally reconstruct a balanced tree.
  - Effective but possibly expensive.
- Rebalance the tree after every operation.
  - Effective, moderately expensive, complicates operations.

Rebalancing Trees

A balanced tree can be unbalanced by at most 1.
The rotation is the basic rebalancing operation

Node Rotations

A node rotation lifts nodes up in the tree.

Example Rebalancing

an example rebalancing

AVL Trees

The AVL tree is a binary search tree that's always in balance.
- Developed in 1962 by Georgii Adelson-Velsky and Yevheniy Landis.
Each unbalancing insertion or deletion restores balance as part of its operation.
Not so popular these days.
- More efficient and simpler alternatives are available.

AVL Rotations

AVL trees require four kinds of rotations.
The left-right and right-left rotations are double rotations.
- They each have three cases.

AVL Balance

At most one (possibly double) rotation is enough to restore balance after an insertion.
Restoring balance after a deletion may require O(log n) rotations.
Search is always O(log n).

Almost-Balanced Trees

Constantly maintaining balance can be expensive.
- Particularly if insertions and deletions outnumber searches.
Let the tree drift out of balance, and fix it up when it gets too out-of-balance.
- Most operations are now cheaper.
- Some operations are more expensive.
  - However, they're still O(log n).

Red-Black Trees

The red-black tree is an example of an almost-balanced tree.
- Developed in 1972 by Rudolf Bayer.
A red-black tree is a binary search tree.
- The edges of the tree are either red or black.
Red-black trees implement the STL sorted-associative containers.

The Red-Black Property

The red-black property generalizes the notion of height.
1. Every path from root to leaf has the same number of black edges.
  - All paths are roughly in balance.
2. No path contains two consecutive red edges.
  - Paths do not drift too far out of balance.
  - "Too far" in this case means "at most ×2".

Example Red-Black Tree

example red-black tree

Inserting into the left sub-tree requires rotations.
There are (potentially) two rotation-free insertions into the right subtree.

Maintaining Almost Balance

Insertion and deletion operations have to maintain the red-black propery.
- They rely on AVL-tree rotations and edge color-flipping.
The no consecutive red-edges property requires keeping track of more information.
- Parents and grandparents.
- Rotations cover larger portions of the tree.
- There are more cases to implement.

Abandoning Balance

Data access is frequently unbalanced.
- A book chapter on Italy contains a lot of "Rome" but little of "Paris".
- Seniors register for classes two days before juniors do.
Balance is unhelpful when data accesses are skewed.
- Improve performance by recognizing and exploiting unbalanced access.

Caching

Caching exploits unbalanced access.
A cache is a high-performance database.
- Both search and access are fast.
- Caches are also usually expensive and small.
Skewed accesses should be satisfied out of the cache.
- Other accesses go to the main data set (a cache miss).

A List Cache

Searching a linked list has linear cost.
- Ordering doesn't change that.
When an element is accessed, move it to the list head.
The list head is the cache.
- Skewed accesses are satisfied within a few elements.
- The cache adjusts to changes in access skew.

Caching Tree Accesses

Tree accesses can be cached by moving data to the root.
But unordered trees are just expensive lists.
- Worst case behavior rises from O(log n) to O(n).
- Extra space and complexity overhead.

Splay Trees

A splay tree is a caching binary search tree.
- Developed by Daniel Sleator and Robert Tarjan in 1985.
Splay-tree operations provide amortized O(log n) operations.
- Maintaining search-tree order has its price.

Splaying

Move-to-the-root is done by splaying operations.
- Each operation also applys splays to the element accessed.
A splaying operation is a sequence of rotations.
- Each rotation moves a node up the tree.

This page last modified on 16 March 2006.
This work is covered by a
Creative Commons License.