Advanced Programming I Lecture Notes
15 March 2007 •
N
-Way Trees
Outline
External Storage
Node-Page Mapping
B-Trees
Searching B Trees
B-Tree Insertions
B-Tree Deletions
B-Tree Variations
Tree Data Storage
Balanced trees are a good way to store data.
Lots of data is kept close to hand.
“Close to hand” means “within
O
(log
n
) access.”
What happens with lots and lots of data?
More data than can fit into primary store?
How do trees and external storage mix?
The answer? Databases.
External Storage
External storage is larger, slower, and cheaper than primary storage.
By orders of magnitude.
Individual external storage devices are idiosyncratic.
For any interpretation of “individual.”
Successful tree-based external-storage algorithms are complex, custom-build, and fragile.
Assumptions
A generic disk represents external storage.
A disk is an array of pages.
Minimal time is an essential characteristic.
That means minimal disk accesses.
Practically anything can be traded-off against disk accesses.
The trees should be balanced for good access.
Trees and Disks
How to map a half-billion node binary search tree to disk?
Try the obvious: one node per disk page.
Pointers turn into disk-page indices.
This works, but not too well.
Each disk page is almost empty.
Too many disk accesses.
Potentially 29 for a half-billion nodes.
Node-Page Mapping
Store many nodes per page.
This turns the binary tree into an
n
-way tree.
Reducing the height of the tree.
log
128
2
29
≈ 5.
Page Nodes
Grouping binary-tree nodes in a page is inefficient.
Treat each page node as an alternating sequence of child links and keys.
N
is the page node's
order
.
A binary tree has order 1 nodes.
Searching Page Nodes
The binary search tree property extends to page nodes.
The values in child
C
i
are at most
K
i
.
The values in child
C
i
+ 1
are at least
K
i
.
Balance
Good access performance also requires balanced trees.
Rotation-based balance is too expensive.
Too many pages are involved.
Multi-key rotations are complex.
Different balancing schemes give rise to different disk-based trees.
B-Trees
An
order n B-tree
is a page-node tree such that
Every node has at most
n
keys.
Almost every node at least
n
/2 keys.
The root may have fewer keys.
All leaf nodes appear at the same level.
B-Trees were developed by Bayer and McCreight in 1972.
Example B-Tree
An example order-4 B-tree.
Searching B Trees
If key
k
equals
k
i
for some node, stop.
Otherwise find
k
i
- 1
<
k
<
k
i
and check out the node at
C
i
.
The node search can be linear or a binary search for large order B trees.
B-Tree Insertions
Insert 10 and then 17.
Inserting into B Trees
Find the apropriate leaf node.
If the leaf has less than
n
keys, add the new key.
If the leaf is full,
Add a new leaf to the tree.
Move half the keys from the full leaf to the new leaf.
Insert the median key from the old and new leaves into their parent.
B-Tree Deletions
Delete 10, then 11.
Leaf-node deletions are easy.
Interior-node deletions follow the binary-tree pattern.
Deleting in Minimal Nodes
The minimum-key property complicates deletions.
The solution is to
shift keys from well-endowed neighbors, or
merge two impoverished pages into one.
Deleting Examples
Delete 13 from
B-Tree Variations
There are lots of B-tree variations to exploit important but niche system characteristics.
The
B+ tree
is a general-purpose extension of B trees.
All data are in the leaves.
Internal nodes contain copies of leaf keys.
The leaves form a doubly-linked list.
B* trees are an earlier variant of B+ trees.
Primary-Store B Trees
B-tree concepts can be usefully applied to trees in primary store too.
Exploit that nice, simple balancing behavior.
The order-2 B tree (a.k.a the binary tree).
Consider order-3 B trees.
Each node can contain one or two keys.
And have zero, two or three children.
These trees are known as
2-3 trees
.
2-3 Tree Nodes
A 2-3 tree node can have one or two keys.
2-3 tree nodes can be implemented with binary-tree nodes.
An extra bit marks the horizontal edge.
2-3-4 Trees
A
2-3-4 tree
is an order-4 B tree.
Each node has two, three, or four keys.
2-3-4 tree nodes can be simulated by binary tree nodes.
2-3-4 Trees are strongly related to red-black trees.
The red edges in red-black trees correspond to the horizontal edges in binary implementation of 2-3-4 trees.
Points to Remember
Adding external storage always changes things.
Grouping nodes into pages leads to B trees.
Almost all databases are based on B+ trees.
File systems tool
Primary-store B trees are interesting too.
But not that important, relatively.
References
Multiway Trees by Donald Knuth, Section 6.4.2 of
The Art of Computer Programming
, Vol. 3: Sorting and Searching, Addison Wesley, 1973.
The Ubiquitous B-Tree
by Doug Comer,
ACM Computing Surveys
, June, 1979.
B Trees by Donghui Zhang in
Handbook of Data Structures and Applications
, Dinesh Metha and Sartaj Sahni editors, Chapman & Hall/CRC, 2005.
This page last modified on 8 January 2006.
This work is covered by a
Creative Commons License
.