Ayende @ Rahien


Voron Performance, let go of that tree, dude!

B-Trees have a few really awesome properties, chief among them that they play very well with the hierarchical nature of data access that we have in our systems (L1 – L3 caches, main memory, disk, etc.). Another really nice property is that searches in a B-Tree have very low cost.

However, it became apparent to us during performance tests that actually searching the tree was very costly. Assuming that we have the following pseudo code:

   for i in range(1, 1000 * 1000):
       add(i, "val")

What is the cost of actually doing this? Well, an interesting tidbit about this is that every time we add an item to the tree, we need to search it, so we can find where in the tree that new value goes.

Cost of searching the tree, based on size:

Number of entries    Cost
1                    0.19
1,000                1.93
10,000               2.57
25,000               2.83
50,000               3.02
100,000              3.21
250,000              3.47
500,000              3.66
1,000,000            3.86
2,000,000            4.05

The cost, by the way, is O(log36(N + 1)); the base 36 comes from the number of entries that we can fit in a single page. This cost ignores the actual search inside a page, but it is a good enough approximation for our needs.

Now, the cost of actually inserting 1 million items is the sum of all of those per-insert search costs, which means that the total cost for 1 million items is 3,576,242.35. This is another example of a Schlemiel the Painter algorithm.
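
To make that arithmetic concrete, here is a minimal sketch (a hypothetical search_cost helper, assuming the same ~36 entries per page) that reproduces the per-search costs in the table and the rough total for the sequential inserts:

   import math

   def search_cost(entries):
       # Approximate tree depth with ~36 entries per page.
       return math.log(entries + 1, 36)

   # Per-search cost, matching the table above.
   for n in (1, 1000, 100000, 1000000):
       print(n, round(search_cost(n), 2))   # 0.19, 1.93, 3.21, 3.86

   # Total cost of the sequential inserts: each add pays a search against
   # the tree as it stands at that moment.
   total = sum(search_cost(i) for i in range(1, 1000 * 1000 + 1))
   print(round(total))                      # on the order of 3.58 million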

What we did was introduce a small cache, which remembers the last few pages that we inserted into. That turned the cost down from searching the tree to checking the cache, where we can usually find the right page, and gave a nice performance boost.
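
The idea, roughly, is a fast path in front of the normal top-down search. A minimal sketch, with hypothetical covers() and search_leaf() helpers (not Voron's actual code):

   class RecentPagesCache:
       def __init__(self, size=4):
           self.size = size
           self.pages = []           # most recently used leaf pages, newest first

       def try_get(self, key):
           # Fast path: does one of the recently used pages cover this key?
           for page in self.pages:
               if page.covers(key):  # key falls within the page's key boundaries
                   return page
           return None

       def remember(self, page):
           self.pages.insert(0, page)
           del self.pages[self.size:]

   def find_leaf(tree, cache, key):
       page = cache.try_get(key)
       if page is None:
           page = tree.search_leaf(key)   # the usual O(log N) top-down descent
           cache.remember(page)
       return page

For the sequential insert loop above, nearly every add hits the fast path, so the per-insert search cost drops from O(log N) to roughly constant.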

Comments

Rafal, 12/18/2013 10:05 AM

Well, it's much cheaper to push items onto a stack or add them to a linked list. But if you want that data to be indexed, you need to maintain the index. There's no way to do that in constant time. Poor Schlemiel can't carry such a huge bucket of paint everywhere on his back.

Rafal, 12/18/2013 10:10 AM

... but you gave him a nice little can of paint that he can carry everywhere and refill when it's empty. (After realizing I missed the last sentence.)

Ayende Rahien, 12/18/2013 10:18 AM

Rafal, Yes, that is pretty much it. By not having to check the full tree very often, we save a lot of back & forth that would get the same result.

Infinitas, 12/18/2013 12:10 PM

I am impressed by Mehdi Gholam's MGIndex in RaptorDB, found here: http://www.codeproject.com/Articles/316816/RaptorDB-The-Key-Value-Store-V2 Would love to hear your thoughts on this index design vs. B-Trees. An interesting high performance index in C#.

Kirill, 12/18/2013 12:20 PM

CCB+ Tree in action =).

alex, 12/18/2013 12:23 PM

@Ayende have you considered reusing "the cursor" from the last active transaction as part of a caching strategy? This appears to be especially useful in case inserts or searches are sequential.

Ayende Rahien, 12/18/2013 12:31 PM

Alex, This is effectively what I am doing.

Lex Lavnikov, 12/18/2013 06:59 PM

Why does the cache remember only the last pages we inserted to?

It should remember the most accessed pages. Even in the case of bulk inserts, the insert location is not guaranteed to be the same.

Ayende Rahien, 12/19/2013 05:49 AM

Lex, The cache remembers the last few pages because if we tried to remember the most accessed pages, we would have a lot more complexity in the cache. Instead, we choose to remember the last few accessed, which makes it a lot easier to handle. In practice, it doesn't matter. The cache is local to a transaction anyway.

Ayende Rahien, 12/19/2013 06:30 AM

Infinitas, The major problem there is that he seems to be doing several things differently than us:

* Concurrent write access to the tree - while we allow only a single writer.
* It is a lot more costly in terms of CPU, it appears.
* No support for anything like transactions or ACID.
* And while it is hard to compare, it is easy to see that we are at least one order of magnitude faster than it according to their benchmark.

So it isn't really an interesting topic.

Howard Chu, 12/21/2013 09:18 PM

In LMDB cursor_set, we check to see if the new key belongs to the page the cursor is pointing to, before doing a full top-down tree traversal. Thus sequential inserts are always fast/constant time.
