B-trees
Balanced-tree algorithm: packs much more keys/pointers into a single node than a binary tree to make a flat tree and avoid disk seeks as much as possible.
1.Goal:
- Access a vast amount of information
- External search in symbol tables that are kept on a disk or on the web
2.Cost model:
- Page: a contiguous block of data
- Page access: the number of time a page is accessed, for read and write.
- Probe: the first access to a page. Accessing a page involves reading its contents into local memory. A page could be a file or web page.
- Goal of the algorithm: use a small number of probes to find any given key.
3.Nodes and link-pairs:
- M: root can have [2, M - 1] link-pairs, other node can have [M/2, M-1] link-pairs, by convention it's an even number
- Root node
- Internal nodes, which associate copies of keys with pages
External nodes, which have references to the actual data
*
is a special key, known as sentinel, it is defined to be less than all other keys.
4.Search
private boolean contains(Page h, Key key) {
if (h.isExternal()) return h.contains(key);
return contains(h.next(key), key);
}
5.Insert
public void add(Key key) {
put(root, key);
if (root.isFull()) {
Page lefthalf = root;
Page righthalf = root.split();
root = new Page(false);
root.put(lefthalf);
root.put(righthalf);
}
}
public void add(Page h, Key key) {
if (h.isExternal()) {
h.put(key);
return;
}
Page next = h.next(key);
put(next, key);
if (next.isFull()) h.put(next.split());
next.close();
}
6.Overhead
time: O(log_M^N) - O(log_(M/2)^N)
, because of the link-pairs of internal nodes
example: let M = 1000, N = number of items. B-tree height = 4 if N = 62.5 billion
space: (of interest in practical applications) M*ln2 (from Algorithms p874)
key waste a lot of space if nodes have M/2 links
code
https://github.com/nyannko/leetcode-java/blob/master/src/ds/BTree/BTree.java