Computer Algorithms II Lecture Notes

27 November 2007 • Balancing Trees

Outline

Tree Shape
Rebalancing Trees
- AVL Trees
- Red-Black Trees
Caching
- Splay Trees

a balanced tree

A Question

Here are two binary search trees containing the same data.
Why are they different?

Data Insertion Order

Data-insertion order determines a tree's shape.
Tree shape influences tree operation performance.

Tree Levels

A node n is at level i in a tree T if n is i edges away from the root.
The ith level of T is the nonempty set of all nodes at level i in T.
A node's depth d is the node's level.
A tree's height h is the largest level in the tree.

Tree Shape

Tree operations have good performance when each decision throws away half the data.
- This occurs when each subtree has half the data.
This property is captured by the notion of balance.
A tree is balanced if any two sibling subtrees differ in height by at most 1.

Examples

how trees balance

Representing Balance

For efficiency, each node maintains balance information for its subtree.
Balance information can be an integer down to no overhead (pointer encoding).
Each node stores the difference in its children's heights.
- The sign indicates the larger subtree, the magnitude by how much.

Getting Balanced Trees

Several techniques produce balanced trees:
- Shuffle the data before inserting it.
  - May not be practical or always applicable.
- Ocassionally reconstruct a balanced tree.
  - Effective but possibly expensive.
- Rebalance the tree after every operation.
  - Effective, moderately expensive, complicates operations.

Tree Balance

A balanced tree can be unbalanced by at most 1.
Shifting a node from one child to the other restores balance.
- For some interpretation of “shifting”.

Node Rotations

A node rotation lifts nodes up in the tree.
The rotation is the basic rebalancing operation

Tree Rebalancing

An insertion or deletion can unbalance the whole tree.
- But only by one.
A rotation restores balance locally.
- Walk back to the root, rotating into balance as needed.
This does logarithmic work per insertion or deletion.

Example Rebalancing

an example rebalancing

AVL Trees

The AVL tree is a binary search tree that's always in balance.
- Developed in 1962 by Georgii Adelson-Velsky and Yevheniy Landis.
Each unbalancing insertion or deletion restores balance as part of its operation.
Not so popular these days.
- More efficient and simpler alternatives are available.

AVL Rotations

AVL trees require four kinds of rotations.
The left-right and right-left rotations are double rotations.
- They each have three cases.

AVL Balance

At most one (possibly double) rotation is enough to restore balance after an insertion.
Restoring balance after a deletion may require O(log n) rotations.
Search is always O(log n).

Almost-Balanced Trees

Constantly maintaining balance can be expensive.
- Particularly if insertions and deletions outnumber searches.
Let the tree drift out of balance, and fix it up when it gets too out-of-balance.
- Most operations are now cheaper.
- Some operations are more expensive.
  - However, they're still O(log n).

Red-Black Trees

The red-black tree is an example of an almost-balanced tree.
- Developed in 1972 by Rudolf Bayer.
A red-black tree is a binary search tree.
- The edges of the tree are either red or black.
Red-black trees implement the STL sorted-associative containers.

The Red-Black Property

The red-black property generalizes height.
1. Every path from root to leaf has the same number of black edges.
  - All paths are roughly in balance.
2. No path contains two consecutive red edges.
  - Paths do not drift too far out of balance.
  - "Too far" in this case means "at most ×2".

Example Red-Black Tree

example red-black tree

Inserting into the left sub-tree requires rotations.
There are (potentially) two rotation-free insertions into the right subtree.

Maintaining Almost Balance

Insertions and deletions must maintain the red-black propery.
- Using tree rotations and edge color-flipping.
The no consecutive red-edges property requires keeping track of more information.
- Parents and grandparents.
- Rotations cover larger portions of the tree.
- There are more cases to implement.

Abandoning Balance

Data access is frequently unbalanced.
- A book chapter on Italy contains a lot of “Rome” but little of “Paris”.
- Seniors register for classes two days before juniors do.
Balance is unhelpful when data accesses are skewed.
- Improve performance by recognizing and exploiting unbalanced access.

Unbalanced Trees

balanced vs unbalanced bsts

With unknown (or random) access patterns, balance is best.
With biased access patterns, unbalance the tree to match the bias.
- But what if the access bias changes?

Caching

Caching exploits unbalanced access.
A cache is a high-performance database.
- Both search and access are fast.
- Caches are also usually expensive and small.
Skewed accesses should be satisfied out of the cache.
- Other accesses go to the main data set (a cache miss).

A List Cache

Searching a linked list has linear cost.
- Ordering doesn't change that.
When an element is accessed, move it to the list head.
The list head is the cache.
- Skewed accesses are satisfied within a few elements.
- The cache adjusts to changes in access skew.

Caching Tree Accesses

Tree accesses can be cached by moving data to the root.
But unordered trees are just expensive lists.
- Worst case behavior rises from O(log n) to O(n).
- Extra space and complexity overhead.

Splay Trees

A splay tree is a caching binary search tree.
- Developed by Daniel Sleator and Robert Tarjan in 1985.
Splay-tree operations provide amortized O(log n) operations.
- Maintaining search-tree order has its price.

Splaying

Move-to-the-root is done by splaying operations.
- Each operation also appliess splays to the element accessed.
A splaying operation is a sequence of rotations.
- Each rotation moves a node up the tree.

Summary

Ordinary binary-search tree performance is data dependent.
Self balancing trees guarantee performance.
- At an implementation-complexity cost.
Sometimes “almost” is good enough.
- Particularly if it's much less expensive.
The caching heuristic is valuable; recognize and exploit it.

Credits

Under this tree, I took a nap by Marcus Derencius, AT CC licensed.

This page last modified on 26 November 2007.
This work's CC license.