B-trees

Balanced-tree algorithm: packs much more keys/pointers into a single node than a binary tree to make a flat tree and avoid disk seeks as much as possible.

1.Goal:

  • Access a vast amount of information
  • External search in symbol tables that are kept on a disk or on the web

2.Cost model:

  • Page: a contiguous block of data
  • Page access: the number of time a page is accessed, for read and write.
  • Probe: the first access to a page. Accessing a page involves reading its contents into local memory. A page could be a file or web page.
  • Goal of the algorithm: use a small number of probes to find any given key.
  • M: root can have [2, M - 1] link-pairs, other node can have [M/2, M-1] link-pairs, by convention it's an even number
  • Root node
  • Internal nodes, which associate copies of keys with pages
  • External nodes, which have references to the actual data

  • * is a special key, known as sentinel, it is defined to be less than all other keys.

4.Search

private boolean contains(Page h, Key key) {
    if (h.isExternal()) return h.contains(key);
    return contains(h.next(key), key);
}

5.Insert

public void add(Key key) {
    put(root, key);
    if (root.isFull()) {
        Page lefthalf = root;
        Page righthalf = root.split();
        root = new Page(false);
        root.put(lefthalf);
        root.put(righthalf);
    }
}

public void add(Page h, Key key) {
    if (h.isExternal()) {
        h.put(key);
        return;
    }

    Page next = h.next(key);
    put(next, key);
    if (next.isFull()) h.put(next.split());
    next.close();
}

6.Overhead

time: O(log_M^N) - O(log_(M/2)^N), because of the link-pairs of internal nodes

example: let M = 1000, N = number of items. B-tree height = 4 if N = 62.5 billion

space: (of interest in practical applications) M*ln2 (from Algorithms p874)

key waste a lot of space if nodes have M/2 links

code

https://github.com/nyannko/leetcode-java/blob/master/src/ds/BTree/BTree.java

results matching ""

    No results matching ""