Advanced Programming I Lecture Notes
14 March 2006 • Balancing Trees
Outline
A Question
Here are two binary search trees containing the same data.
Why are they different?
Data Insertion Order
Data-insertion order determines a tree's shape.
Tree shape influences tree operation performance.
Tree Levels
Given a tree
T
, a node
n
is at
level
i
in
T
if
n
is
i
edges away from the root.
The
ith
level of
T
is the nonempty set of all nodes at level
i
in
T
.
The
height
h
(or
depth
d
) of
T
is the largest level
h
in
T
.
Tree Shape
Tree operations have good performance when each decision throws away half the data.
This occurs when each subtree has half the data.
A balanced tree has this property.
A tree
T
is
balanced
if the two child subtrees of any node in
T
differ in height by at most 1.
Example Tree Shapes
Each node stores the difference in its children's heights.
The sign indicates the larger subtree, the magnitude by how much.
Getting Well-Shaped Trees
Several techniques produce well-shaped trees:
Shuffle the data before inserting it.
May not be practical or always applicable.
Ocassionally reconstruct a balanced tree.
Effective but possibly expensive.
Rebalance the tree after every operation.
Effective, moderately expensive, complicates operations.
Rebalancing Trees
A balanced tree can be unbalanced by at most 1.
The rotation is the basic rebalancing operation
Node Rotations
A
node rotation
lifts nodes up in the tree.
Example Rebalancing
AVL Trees
The
AVL tree
is a binary search tree that's always in balance.
Developed in 1962 by Georgii Adelson-Velsky and Yevheniy Landis.
Each unbalancing insertion or deletion restores balance as part of its operation.
Not so popular these days.
More efficient and simpler alternatives are available.
AVL Rotations
AVL trees require four kinds of rotations.
The left-right and right-left rotations are double rotations.
They each have three cases.
AVL Balance
At most one (possibly double) rotation is enough to restore balance after an insertion.
Restoring balance after a deletion may require
O
(log
n
) rotations.
Search is always
O
(log
n
).
Almost-Balanced Trees
Constantly maintaining balance can be expensive.
Particularly if insertions and deletions outnumber searches.
Let the tree drift out of balance, and fix it up when it gets too out-of-balance.
Most operations are now cheaper.
Some operations are more expensive.
However, they're still
O
(log
n
).
Red-Black Trees
The
red-black tree
is an example of an almost-balanced tree.
Developed in 1972 by Rudolf Bayer.
A red-black tree is a binary search tree.
The
edges of the tree
are either red or black.
Red-black trees implement the STL sorted-associative containers.
The Red-Black Property
The
red-black property
generalizes
the notion of height.
Every path from root to leaf has the same number of black edges.
All paths are roughly in balance.
No path contains two consecutive red edges.
Paths do not drift too far out of balance.
"Too far" in this case means "at most ×2".
Example Red-Black Tree
Inserting into the left sub-tree requires rotations.
There are (potentially) two rotation-free insertions into the right subtree.
Maintaining Almost Balance
Insertion and deletion operations have to maintain the red-black propery.
They rely on AVL-tree rotations and edge color-flipping.
The no consecutive red-edges property requires keeping track of more information.
Parents and grandparents.
Rotations cover larger portions of the tree.
There are more cases to implement.
Abandoning Balance
Data access is frequently unbalanced.
A book chapter on Italy contains a lot of "Rome" but little of "Paris".
Seniors register for classes two days before juniors do.
Balance is unhelpful when data accesses are skewed.
Improve performance by recognizing and exploiting unbalanced access.
Caching
Caching
exploits unbalanced access.
A
cache
is a high-performance database.
Both search and access are fast.
Caches are also usually expensive and small.
Skewed accesses should be satisfied out of the cache.
Other accesses go to the main data set (a
cache miss
).
A List Cache
Searching a linked list has linear cost.
Ordering doesn't change that.
When an element is accessed, move it to the list head.
The list head is the cache.
Skewed accesses are satisfied within a few elements.
The cache adjusts to changes in access skew.
Caching Tree Accesses
Tree accesses can be cached by moving data to the root.
But unordered trees are just expensive lists.
Worst case behavior rises from
O
(log
n
) to
O
(
n
).
Extra space and complexity overhead.
Splay Trees
A
splay tree
is a caching binary search tree.
Developed by Daniel Sleator and Robert Tarjan in 1985.
Splay-tree operations provide amortized
O
(log
n
) operations.
Maintaining search-tree order has its price.
Splaying
Move-to-the-root is done by splaying operations.
Each operation also applys splays to the element accessed.
A
splaying operation
is a sequence of rotations.
Each rotation moves a node up the tree.
This page last modified on 16 March 2006.
This work is covered by a
Creative Commons License
.