Ayende @ Rahien


Voron Performance, let go of that tree, dude!

B-Trees have a few really awesome properties, chief among them that they play very well with the hierarchical nature of data access that we have in our systems (L1 – L3 caches, main memory, disk, etc.). Another really nice property is that searches in a B-Tree have very low cost.

However, it became apparent to us during performance tests that actually searching the tree was very costly. Assuming that we have the following pseudo code:

   for i in range(1, 1000 * 1000):
       add(i, "val")

What is the cost of actually doing this? Well, an interesting tidbit about this is that every time we add an item to the tree, we need to search it, so we can find where in the tree that new value goes.

Cost of searching the tree, based on size:

Number of entries    Cost
1                    0.19
1,000                1.93
10,000               2.57
25,000               2.83
50,000               3.02
100,000              3.21
250,000              3.47
500,000              3.66
1,000,000            3.86
2,000,000            4.05

The cost, by the way, is O(log36(N + 1)); the base 36 comes from the number of entries that we can fit in a single page. This cost ignores the actual search inside a page, but it is a good enough approximation for our needs.

Now, the cost of actually inserting 1 million items is the sum of all of those per-insert search costs, which means that the total cost for 1 million items is 3,576,242.35. This is another example of a Schlemiel the Painter algorithm.
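
To make that arithmetic concrete, here is a minimal sketch (a hypothetical search_cost helper, assuming the same ~36 entries per page) that reproduces the per-search costs in the table and the rough total for the sequential inserts:

   import math

   def search_cost(entries):
       # Approximate tree depth with ~36 entries per page.
       return math.log(entries + 1, 36)

   # Per-search cost, matching the table above.
   for n in (1, 1000, 100000, 1000000):
       print(n, round(search_cost(n), 2))   # 0.19, 1.93, 3.21, 3.86

   # Total cost of the sequential inserts: each add pays a search against
   # the tree as it stands at that moment.
   total = sum(search_cost(i) for i in range(1, 1000 * 1000 + 1))
   print(round(total))                      # on the order of 3.58 million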

What we did was introduce a small cache, which remembers the last few pages that we inserted into. That turned the cost down from searching the tree to checking the cache, where we can usually find the right page, and gave a nice performance boost.
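
The idea, roughly, is a fast path in front of the normal top-down search. A minimal sketch, with hypothetical covers() and search_leaf() helpers (not Voron's actual code):

   class RecentPagesCache:
       def __init__(self, size=4):
           self.size = size
           self.pages = []           # most recently used leaf pages, newest first

       def try_get(self, key):
           # Fast path: does one of the recently used pages cover this key?
           for page in self.pages:
               if page.covers(key):  # key falls within the page's key boundaries
                   return page
           return None

       def remember(self, page):
           self.pages.insert(0, page)
           del self.pages[self.size:]

   def find_leaf(tree, cache, key):
       page = cache.try_get(key)
       if page is None:
           page = tree.search_leaf(key)   # the usual O(log N) top-down descent
           cache.remember(page)
       return page

For the sequential insert loop above, nearly every add hits the fast path, so the per-insert search cost drops from O(log N) to roughly constant.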

Comments

Rafal, 12/18/2013 10:05 AM

Well, it's much cheaper to push items onto a stack or add them to a linked list. But if you want that data to be indexed, you need to maintain the index. There's no way to do that in constant time. Poor Schlemiel can't carry such a huge bucket of paint everywhere on his back.

Rafal, 12/18/2013 10:10 AM

... but you gave him a nice little can of paint that he can carry everywhere and refill when it's empty. (After realizing I missed the last sentence.)

Ayende Rahien, 12/18/2013 10:18 AM

Rafal, Yes, that is pretty much it. By not having to check the full tree very often, we save a lot of back & forth that would get the same result.

Infinitas, 12/18/2013 12:10 PM

I am impressed by Mehdi Gholam's MGIndex in RaptorDB, found here: http://www.codeproject.com/Articles/316816/RaptorDB-The-Key-Value-Store-V2 Would love to hear your thoughts on this index design vs. B-Trees. An interesting high performance index in C#.

Kirill, 12/18/2013 12:20 PM

CCB+ Tree in action =).

alex, 12/18/2013 12:23 PM

@Ayende have you considered reusing "the cursor" from the last active transaction as part of a caching strategy? This appears to be especially useful in case inserts or searches are sequential.

Ayende Rahien, 12/18/2013 12:31 PM

Alex, This is effectively what I am doing.

Lex Lavnikov, 12/18/2013 06:59 PM

Why does the cache remember only the last pages we inserted to?

It should remember the most accessed pages. Even in the case of bulk inserts, the insert location is not guaranteed to be the same.

Ayende Rahien, 12/19/2013 05:49 AM

Lex, The cache remembers the last few pages because if we tried to remember the most accessed pages, we would have a lot more complexity in the cache. Instead, we choose to remember the last few accessed, which makes it a lot easier to handle. In practice, it doesn't matter. The cache is local to a transaction anyway.

Ayende Rahien, 12/19/2013 06:30 AM

Infinitas, The major problem there is that he seems to be doing several things differently than us:

* Concurrent write access to the tree - while we allow only a single writer.
* It is a lot more costly in terms of CPU, it appears.
* No support for anything like transactions or ACID.
* And while it is hard to compare, it is easy to see that we are at least one order of magnitude faster than it according to their benchmark.

So it isn't really an interesting topic.

Howard Chu, 12/21/2013 09:18 PM

In LMDB cursor_set, we check to see if the new key belongs to the page the cursor is pointing to, before doing a full top-down tree traversal. Thus sequential inserts are always fast/constant time.
