Ayende @ Rahien

My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.

Voron Performance, let go of that tree, dude!


B-Trees have a few really awesome properties. Chief among them is that they play very well with the hierarchical nature of data access that we have in our systems (L1 – L3 caches, main memory, disk, etc.). Another really nice property is that searches in a B-Tree have very low cost.

However, it became apparent to us during performance tests that actually searching the tree was very costly. Consider the following pseudo code:

    # insert one million sequential keys (the upper bound of range is exclusive)
    for i in range(1, 1000 * 1000 + 1):
        add(i, "val")

What is the cost of actually doing this? Well, an interesting tidbit is that every time we add an item to the tree, we need to search it first, so we can find where in the tree the new value goes.

Cost of searching the tree, based on size:

[Table: search cost by number of entries; the values were not preserved]

The cost, by the way, is O(log36(N+1)), where log36 is the logarithm in base 36. The 36 comes from the number of entries that we can fit in a single page. This cost ignores the actual search inside a page, but it is a good enough approximation for our needs.

Now, the cost of actually inserting 1 million items is the sum of all of those costs, which means that the cost for 1 million inserts is 3,576,242.35. This is another example of a Schlemiel the Painter algorithm.
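As a rough check on that number, here is a minimal Python sketch that sums the per-insert search cost using the log36(N+1) approximation from the post. The names `ENTRIES_PER_PAGE`, `search_cost` and `total_insert_cost` are made up for this illustration and are not part of Voron.

```python
import math

ENTRIES_PER_PAGE = 36  # entries per page, as quoted in the post


def search_cost(n):
    """Approximate cost of one search, using the post's log36(N + 1) formula."""
    return math.log(n + 1, ENTRIES_PER_PAGE)


def total_insert_cost(count):
    """Sum the search cost paid by each of `count` sequential inserts."""
    return math.fsum(search_cost(i) for i in range(1, count + 1))


if __name__ == "__main__":
    # Should land on the figure quoted above for one million inserts.
    print(f"{total_insert_cost(1000 * 1000):,.2f}")
```

Every insert pays for a walk from the root, so the total grows roughly as N * log36(N) even though each individual search is cheap.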

What we did was introduce a small cache, which remembers the last few pages that we inserted into. That brought the cost down from searching the tree to checking the cache, where we can usually find the right page, and gave a nice performance boost.
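The idea can be sketched with a toy model. This is not Voron's actual code: the class `RecentPageCache`, the way pages are modeled (each leaf page identified by its first key) and the cache size of 4 are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import deque


class RecentPageCache:
    """Toy model: remember the last few pages used, check them before the tree."""

    def __init__(self, page_starts, cache_size=4):
        self.page_starts = sorted(page_starts)  # first key of each leaf page
        self.recent = deque(maxlen=cache_size)  # indexes of recently used pages
        self.tree_searches = 0
        self.cache_hits = 0

    def _page_contains(self, idx, key):
        lo = self.page_starts[idx]
        is_last = idx + 1 == len(self.page_starts)
        hi = float("inf") if is_last else self.page_starts[idx + 1]
        return lo <= key < hi

    def find_page(self, key):
        # Cheap path: does the key belong to a page we touched recently?
        for idx in self.recent:
            if self._page_contains(idx, key):
                self.cache_hits += 1
                return idx
        # Expensive path: stands in for a full root-to-leaf tree search.
        self.tree_searches += 1
        idx = max(bisect_right(self.page_starts, key) - 1, 0)
        self.recent.append(idx)
        return idx


def demo():
    cache = RecentPageCache(range(0, 36 * 30, 36))  # 30 pages of 36 keys each
    for key in range(1, 1001):
        cache.find_page(key)
    return cache.tree_searches, cache.cache_hits


if __name__ == "__main__":
    print(demo())
```

With sequential keys, only the first insert into each page pays for a full search; the rest are answered from the cache.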



Well, it's much cheaper to push items on a stack or add to a linked list. But if you want that data to be indexed, you need to maintain the index. There's no way to do that in constant time. Poor Schlemiel can't carry such a huge bucket of paint everywhere on his back.


... but you gave him a nice little can of paint that he can carry everywhere and refill when it's empty. (after realizing I missed the last sentence)

Ayende Rahien

Rafal, Yes, that is pretty much it. By not having to check the full tree very often, we save a lot of back & forth that would get the same result.


I am impressed by Mehdi Gholam's MGIndex in Raptor found here: http://www.codeproject.com/Articles/316816/RaptorDB-The-Key-Value-Store-V2 Would love to hear your thoughts on this index design vs b-trees. An interesting high performance index in c#.


CCB+ Tree in action =).


@Ayende have you considered reusing "the cursor" from the last active transaction as part of a caching strategy? This appears to be especially useful in case inserts or searches are sequential.

Ayende Rahien

Alex, This is effectively what I am doing.

Lex Lavnikov

Why does the cache remember only last pages to insert to?

It has to remember last most accessed pages. Even in case of bulk inserts, the insert location is not guaranteed to be the same.

Ayende Rahien

Lex, The cache remembers the last few pages because if we tried to remember the most accessed pages, we would have a lot more complexity in the cache. Instead, we chose to remember the last few accessed, which makes it a lot easier to handle. In practice, it doesn't matter. A cache is local to a transaction anyway.

Ayende Rahien

Infinitas, The major problem there is that he seems to be doing several things differently than us:
* Concurrent write access to the tree - while we allow only a single writer.
* It is a lot more costly in terms of CPU, it appears.
* No support for anything like transactions or ACID.
* And while it is hard to compare, it is easy to see that we are at least one order of magnitude faster than it according to their benchmark.

So it isn't really an interesting topic.

Howard Chu

In LMDB cursor_set, we check to see if the new key belongs to the page the cursor is pointing to, before doing a full top-down tree traversal. Thus sequential inserts are always fast/constant time.
