Setting the baseline for performance testing for Voron
After finishing up the major change of moving Voron to a Write Ahead Journal, it was time to actually start doing some performance testing.
To make things interesting, I decided that we shouldn’t just compare this in isolation, but we should actually compare it to its peers.
These are early results, and we are going to have to do a lot more work before everything runs as fast as we want it to.
We have run those tests on the following machine:
All the tests were run on a freshly formatted 512GB SSD drive. Note that we are currently showing only the fast runs; we also have a set of tests for much larger data sets (tens of GB) and another for performance over time, but we will deal with those separately. All of the current tests write 1 million items, each consisting of a 4 byte integer key and a 128 byte value.
We have tested: SQLite, SQL CE, LMDB, Esent and Voron.
For LMDB, because it requires a fixed file size, we set the initial file size to 64 GB. All the databases were run using their default configuration options, no secondary indexes were used, and all the tests were done using a single thread.
Note that in all cases we used managed code to run the test. This may impact some of the results because some of those engines are native, and there might be some overhead there.
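To make the shape of these tests concrete, here is a minimal sketch of what such a write benchmark looks like. This is not the actual test code; the IKeyValueStore abstraction and WriteBenchmark class are hypothetical names, standing in for whatever managed wrapper each engine gets:

```csharp
using System;
using System.Diagnostics;

// Hypothetical abstraction: each engine under test (SQLite, Esent, Voron, ...)
// would get its own implementation that writes a batch in a single transaction.
public interface IKeyValueStore
{
    void WriteBatch(Tuple<int, byte[]>[] batch);
}

public static class WriteBenchmark
{
    public static TimeSpan Run(IKeyValueStore store, int totalItems,
                               int itemsPerTransaction, bool sequential)
    {
        var rnd = new Random(1337);      // fixed seed so runs are comparable
        var value = new byte[128];       // 128 byte value, as in the post
        var sw = Stopwatch.StartNew();

        for (int written = 0; written < totalItems; written += itemsPerTransaction)
        {
            var batch = new Tuple<int, byte[]>[itemsPerTransaction];
            for (int i = 0; i < itemsPerTransaction; i++)
            {
                // 4 byte integer key, either sequential or random
                int key = sequential ? written + i : rnd.Next();
                batch[i] = Tuple.Create(key, value);
            }
            store.WriteBatch(batch);     // one transaction per batch
        }

        sw.Stop();
        return sw.Elapsed;               // report totalItems / Elapsed.TotalSeconds
    }
}
```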
The first test was to see how it performs with sequential writes:
Esent really shines in this, probably because this is pretty much its sweet spot. Voron is the second best, but the reason we do these sorts of tests is to see where we have problems, and I think that we have a problem here: we are supposed to be much better. In fact, we have earlier tests that show much better performance, so we appear to have a regression. We'll work on that next.
Next, let us look at sequential reads:
Here, LMDB eclipses everyone else by far; this is its sweet spot. I am pretty happy with Voron's performance here, especially since it appears to be close to twice as fast as Esent in this scenario.
Next, we have random writes:
Surprisingly, Voron is doing pretty badly here, even though it is doing much better than LMDB (this is its weak spot) or SQLite.
For random reads, however, the situation is nicer to us:
So, we have our baseline. And I want to see how we can do better. Expect the future posts to focus on what exactly is slowing our writes down.
In the meantime, we do have some really good news, we tested Voron with and without concurrent flushing to the data file, and there isn’t any meaningful difference between the performance of the two options in our current test run.
Comments
Really interesting results. I have to be honest though, I didn't think Esent would be as good as it was. Those numbers make me question whether to just stick with Esent. I'm sure you aren't done yet, so I'll hold my judgement till you feel you are done.
Khalid, A few things to note:
* This is probably the best result possible for Esent: pure sequential writes with small values and no secondary indexes.
* It gets much worse when you start dealing with bigger values and multiple secondary indexes.
* It gets bad when we start getting to random writes.
* Note the numbers for reads, which are much worse.
How are you going about testing Voron?
A suite of unit tests?
Obviously some kind of stress tester/performance tool. It'd be great to dig into these once the Voron branch is public.
Can you publish the tests? I'd like to see how this compares against other embedded DBs, like LevelDB, Bangdb, BerkeleyDB. Those are unmanaged, but I think it's valuable to see how they compare because speed is obviously of the essence.
Marcohard, We are going to test against LMDB & LevelDB. And yes, we will publish those.
Phil, There are several levels here. We have a suite of unit tests, then we have the stress testing (the perf tests also serve that purpose). Then we have other things that are already built on top of that, which verify that it works well.
I'm assuming part of the motivation for writing Voron is to open up a cross-platform story for RavenDB. So have you done any testing on Linux or Mac? I'm not a RavenDB user but have been following this interesting series on the work you're doing on Voron! Thanks.
Are these tests using a single key/value pair per transaction, or are you batching multiple items in a transaction?
Jeremy, Running on linux is certainly a goal. We want to get it working & stable on Windows, then port it. We expect pretty much all of it to be portable, except for the low level storage stuff.
Alex, Those tests use 100 items per transaction, 10,000 transactions total.
It is quite awesome that Voron is already showing competitive performance compared to many highly optimized and established storage solutions.
If my calculations are not too far off, using the numbers you mentioned, for sequential journal writes that would be roughly 3 pages/transaction (2 B+Tree pages and 1 checkpoint page)? So a total of around 30,000 pages in a run time of 42 seconds?
In that case, I think that with the design you are using, once you start optimizing, you will find that you may be able to improve write throughput by at least a factor of 6-7 (which is roughly what I am seeing for a comparable design on a 5400 RPM spindle disk that is shared with OS, virus checker, etc. competing for the same disk).
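For reference, those figures work out to roughly the following page throughput (this just restates the numbers above): 10,000 transactions × 3 pages/transaction = 30,000 pages, and 30,000 pages / 42 s ≈ 714 pages/s.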
@Oren: maybe a silly question, but would I be right assuming that Voron will not have the same cross-OS problems related to raw files that affect Esent?
One more thing: this has just been announced by Facebook: http://rocksdb.org/
It may be interesting to you, and considering the experience you are accumulating on the subject, would you like to share any thoughts?
By the way, for LMDB Sequential Write test, did you use MDB_APPEND?
Alex, I haven't checked the page numbers, but the cost we have here is actually for fsync, not for doing writes.
Njy, Voron will be able to run on Linux, yes. And there wouldn't be an issue with moving between OSes.
njy, RocksDB seems to be built on LevelDB, but it increases complexity to gain better performance. Voron is mostly based on LMDB, and it handles things in a very different fashion.
Howard, No, we didn't do that, I'll make sure to do that for the next set of benchmarks.
@ayende. True, the cost of an fsync is the main factor in throughput. This is impacted though by the amount of pages you write per fsync even if it is not the main contributor to fsync cost (I am seeing around 2000 pg/s when batch size is 2 pg/sync and around 10000 pg/s when batch size is 32 pg/sync).
But yeah, main costs for an fsync on windows seem to be more related to the size of the file, whether the file's pages are mapped/in OS page cache, whether you are overwriting a file or allocating new sectors and whether the sectors you are writing are strictly sequential.
Alex, On a normal HD, you can do 200 - 300 fsync/sec. There are other costs associated with this, but they aren't really relevant when that is the most you can do. Note that I don't really believe your 2,000 pg/s with 2 pg/sync. That would give you 1,000 fsync/sec, which probably means you aren't really doing a real fsync, or you are using an SSD, or it is a fake fsync.
@Ayende, obviously, you are free to believe whatever you want. If you want to check on your own system, I boiled this scenario down to a self-contained minimal sample of "journal only writes" (i.e. without data syncs, error handling or anything else, just batched sequential fsynced writes through memmap).
Code and results on my system (64 bit Core I7, 1 TB 5400 RMP spindle disk) can be found here: https://gist.github.com/anonymous/7491382. When recycling journal chunk files it reaches a maximum of around 3000 fsyncs/s.
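For anyone who wants a quick ballpark of their own disk's fsync throughput without pulling down the gist, here is a rough, self-contained sketch. This is not Alex's sample, which writes through a memory map; it just issues batched sequential writes and calls FileStream.Flush(true), the managed route to FlushFileBuffers:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class FsyncBench
{
    static void Main()
    {
        const int syncs = 1000;
        const int pagesPerSync = 2;           // batch size, in 4 KB pages
        var page = new byte[4096];

        using (var fs = new FileStream("journal.tmp", FileMode.Create,
                                       FileAccess.Write, FileShare.None,
                                       4096, FileOptions.None))
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < syncs; i++)
            {
                for (int j = 0; j < pagesPerSync; j++)
                    fs.Write(page, 0, page.Length);

                fs.Flush(flushToDisk: true);  // FlushFileBuffers, i.e. the "fsync"
            }
            sw.Stop();

            Console.WriteLine("{0:F0} fsyncs/sec, {1:F0} pages/sec",
                syncs / sw.Elapsed.TotalSeconds,
                syncs * pagesPerSync / sw.Elapsed.TotalSeconds);
        }
    }
}
```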
Alex, I believe that you are calling FlushFileBuffers, sure. But I think that the disk is lying to you. See here for fsync limits: http://helpful.knobs-dials.com/index.php/Fsync_notes (90 for 5400rpm, 120 for 7200rpm, 166 for 10000rpm, 250 for 15000rpm)
Would it be too troublesome to include FoundationDB as a reference? On the one hand I am keen for you to take some time with it, and on the other I'd love to see you knock its socks off :)
Jahmai, You can write it yourself, I would love to compare it against more items. https://github.com/ayende/raven.voron/blob/voron/Performance.Comparison/Performance.Comparison/SQLServer/SqlServerTest.cs
From what I understand, the "fsync promise" is that the device can guarantee that the data reaches stable media, either because it has actually written it, or because it has cached it and is battery backed, so that a power outage cannot cause the fsync to fail. I believe that is the case with my HDD.
If I disable all disk caching, throughput will drop to about 75-85 fsyncs/s, which matches well with what would be expected for my HDD (a single rotation / fsync: 5400/60 --> 90 rotations/sec).
Alex, That is what I meant with regards to fsync lies. It doesn't actually save it to the platter.
Alex, The basic idea is that I am going to do the following:
So, during normal ops, we never actually have a thread waiting for fsync.
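To illustrate that general idea of keeping callers off the fsync path, here is a rough sketch of one way to do it; this is my own illustration, not Voron's actual implementation. Writers append to the journal and get back a task, while a single background loop performs the fsync and completes every pending write it covered:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class BackgroundSyncer
{
    private readonly FileStream _journal;
    private readonly BlockingCollection<TaskCompletionSource<object>> _pending =
        new BlockingCollection<TaskCompletionSource<object>>();

    public BackgroundSyncer(FileStream journal)
    {
        _journal = journal;
        new Thread(SyncLoop) { IsBackground = true }.Start();
    }

    // Appends a transaction's pages to the journal and returns a task that
    // completes once an fsync covering this write has run. The caller can
    // continue working (or await) instead of blocking on the fsync itself.
    public Task CommitAsync(byte[] pages)
    {
        var tcs = new TaskCompletionSource<object>();
        lock (_journal)                       // lock only for the sketch's simplicity
        {
            _journal.Write(pages, 0, pages.Length);
            _pending.Add(tcs);
        }
        return tcs.Task;
    }

    private void SyncLoop()
    {
        foreach (var first in _pending.GetConsumingEnumerable())
        {
            var batch = new List<TaskCompletionSource<object>> { first };
            TaskCompletionSource<object> more;
            while (_pending.TryTake(out more))     // group everything queued so far
                batch.Add(more);

            lock (_journal)
                _journal.Flush(flushToDisk: true); // one fsync covers the whole batch

            foreach (var tcs in batch)
                tcs.SetResult(null);
        }
    }
}
```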
Ok, so I did:
https://github.com/ayende/raven.voron/pull/7
I can't run the comparison myself though, because Voron was crashing for me...