Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Posts: 6,354 | Comments: 47,166


RavenDB Webinar - Customer Stories - uCommerce.net

time to read 1 min | 63 words

Discover how RavenDB powers uCommerce, an e-commerce platform for .NET that is tightly integrated with the Umbraco and Sitecore CMSs. Helping customers find the product they're looking for is one of the biggest challenges in e-commerce: learn how uCommerce integrates RavenDB and its faceted search capabilities into the core platform to solve this challenge.

Monday, October 14, 2013

Space is limited.
Reserve your Webinar seat now at:
https://www2.gotomeeting.com/register/591534170

Sparse files & memory mapped files

time to read 4 min | 696 words

One of the problems with memory mapped files is that you can’t actually map beyond the end of the file. So you can’t use that to extend your file. I had a thought and set out to check: what will happen if I create a sparse file, a file that only takes space when you write to it, and at the same time, map it?

As it turns out, this actually works pretty well in practice. You can do so without any issues. Here is how it works:

using (var f = File.Create(path))
{
    int bytesReturned = 0;
    var nativeOverlapped = new NativeOverlapped();
    // Mark the file as sparse, so the SetLength below doesn't allocate disk space
    if (!NativeMethod.DeviceIoControl(f.SafeFileHandle, EIoControlCode.FsctlSetSparse, IntPtr.Zero, 0,
                                      IntPtr.Zero, 0, ref bytesReturned, ref nativeOverlapped))
    {
        throw new Win32Exception();
    }
    f.SetLength(1024*1024*1024*64L); // 64 GB logical size, almost nothing actually allocated
}
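The NativeMethod.DeviceIoControl call above relies on a P/Invoke declaration that isn't shown in the post. A minimal sketch of what it could look like (the class and enum names are my guesses to match the snippet; only the Win32 entry point and the FSCTL_SET_SPARSE value are standard):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using Microsoft.Win32.SafeHandles;

internal enum EIoControlCode : uint
{
    // FSCTL_SET_SPARSE, as defined in winioctl.h
    FsctlSetSparse = 0x000900C4
}

internal static class NativeMethod
{
    // Thin wrapper over the Win32 DeviceIoControl entry point
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    internal static extern bool DeviceIoControl(
        SafeFileHandle hDevice,
        EIoControlCode ioControlCode,
        IntPtr inBuffer, uint inBufferSize,
        IntPtr outBuffer, uint outBufferSize,
        ref int bytesReturned,
        ref NativeOverlapped overlapped);
}
```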

The code above creates a sparse file that is 64 GB in size. Then we can map it normally:

var buffer = new byte[4096]; // the data being written; its definition was not shown in the original
using (var mmf = MemoryMappedFile.CreateFromFile(path))
using (var memoryMappedViewAccessor = mmf.CreateViewAccessor(0, 1024*1024*1024*64L))
{
    for (long i = 0; i < memoryMappedViewAccessor.Capacity; i += buffer.Length)
    {
        memoryMappedViewAccessor.WriteArray(i, buffer, 0, buffer.Length);
    }
}

And then we can do stuff to it, and that includes writing to yet unallocated parts of the file. This also means that you don’t have to worry about writing past the end of the file; the OS will take care of all of that for you.

Happy happy, joy joy, etc.

There is one problem with this method, however. It means that you have a 64 GB file, but you don’t have that much actually allocated. That, in turn, means that you might not have that much space available for the file. Which brings up an interesting question: what happens when you are trying to commit a new page and the disk is out of space? Using file I/O, you would get an I/O error with the right code. But using memory mapped files, the error would actually turn up during access, which can happen pretty much anywhere. It also means that it is a Structured Exception Handling (SEH) error in Windows, which requires special treatment.
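As a quick sanity check of how much is really allocated, you can compare the file's logical length with its on-disk size via GetCompressedFileSize (the wrapper class here is my own; the Win32 call itself is standard and also reports the allocated size of sparse files):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

internal static class SparseCheck
{
    // Returns the low 32 bits of the on-disk size; lpFileSizeHigh gets the high part
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern uint GetCompressedFileSize(string lpFileName, out uint lpFileSizeHigh);

    public static void Report(string path)
    {
        uint high;
        uint low = GetCompressedFileSize(path, out high);
        long allocated = ((long)high << 32) | low;
        long logical = new FileInfo(path).Length;
        // For a freshly created sparse file, allocated will be far smaller than logical
        Console.WriteLine("logical: {0:N0} bytes, allocated: {1:N0} bytes", logical, allocated);
    }
}
```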

To test this out, I wrote the following so it would write to a disk that had only about 50 GB free. I wanted to know what would happen when it ran out of space. That is actually something that happens, and we need to be able to address this issue robustly. The kicker is that this might actually happen at any time, so that would really result in some… interesting behavior with regards to robustness. In other words, I don’t think that this is a viable option. It is a really cool trick, but I don’t think it is a very well thought out option.

By the way, the result of my experiment was that we had an effectively frozen process. No errors, nothing, just a hang. Also, I am pretty sure that WriteArray() is really slow, but I’ll check that out at another point in time.

Voron, LMDB and the external APIs, oh my!

time to read 12 min | 2250 words

One of the things that I really don’t like in LMDB is the API that is exposed to the user. Well, it is C, so I guess there isn’t much that can be done about it. But let’s look at the abstractions that are actually exposed to the user by looking at how you usually work with Voron.

using (var tx = Env.NewTransaction(TransactionFlags.ReadWrite))
{
    Env.Root.Add(tx, "key/1", new MemoryStream(Encoding.UTF8.GetBytes("123")));

    tx.Commit();
}

using (var tx = Env.NewTransaction(TransactionFlags.Read))
{
    using (var stream = Env.Root.Read(tx, "key/1"))
    using (var reader = new StreamReader(stream))
    {
        var result = reader.ReadToEnd();
        Assert.Equal("123", result);
    }
    tx.Commit();
}

This is a perfectly nice API. It is quite explicit about what is going on, and it gives you a lot of options with regards to how to actually make things happen. It also gives the underlying library about zero chance to do interesting things. Worse, it means that you have to know, upfront, whether you want to do a read-only or a read/write operation. And since there can be only one write transaction at any given point in time… well, I think you get the point. If your code doesn’t respond well to explicit demarcation between read and write, you have to create a lot of write transactions, essentially serializing pretty much your entire codebase.

Now, sure, you might have good command / query separation, right? So you have queries for reads and commands for writes, problem solved. Except that the real world doesn’t operate in this manner. Let us consider the trivial case of a user logging in. When a user logs in, we need to check the credentials, and if they are wrong, we need to mark it so we can lock the account after 5 failed tries. That means either having to always do the login in a write transaction (meaning only one user can log in at any time) or we start with a read transaction, then switch to a write transaction when we need to write.
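To make the dilemma concrete, here is roughly what that login flow forces you into with the explicit API; the account helpers are hypothetical, and only Env.NewTransaction and TransactionFlags come from the API shown above:

```csharp
// Sketch only: because a failed login must be recorded, the whole check
// ends up inside the single, exclusive write transaction.
public bool Login(string user, string password)
{
    using (var tx = Env.NewTransaction(TransactionFlags.ReadWrite))
    {
        var account = LoadAccount(tx, user);          // hypothetical helper
        if (account.PasswordMatches(password))        // hypothetical helper
        {
            tx.Commit();
            return true;
        }
        account.FailedAttempts++;                     // the write that forces ReadWrite
        if (account.FailedAttempts >= 5)
            account.Locked = true;
        SaveAccount(tx, user, account);               // hypothetical helper
        tx.Commit();
        return false;
    }
}
```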

Neither option is really nice as far as I am concerned. Therefore, I came up with a different API (which is internally based on the one above). It now looks like this:

var batch = new WriteBatch();
batch.Add("key/1", new MemoryStream(Encoding.UTF8.GetBytes("123")), null);

Env.Writer.Write(batch);

using (var snapshot = Env.CreateSnapshot())
{
    using (var stream = snapshot.Read(null, "key/1"))
    using (var reader = new StreamReader(stream))
    {
        var result = reader.ReadToEnd();
        Assert.Equal("123", result);
    }
}

As you can see, we make use of snapshots & write batches. Those are actually ideas taken from LevelDB. A write batch is a set of changes that we want to apply to the database. We can add any number of changes to the write batch, and it requires no synchronization. When we want to actually write those changes, we call Writer.Write(). This will take the entire batch and apply it as a single transactional unit.

However, while it will do so as a single unit, it will also be able to merge concurrently submitted write batches into a single write transaction, increasing the actual concurrency we gain by quite a bit. The expected usage pattern is that you create a snapshot, do whatever you need to do when reading the data, including maybe adding/removing stuff via a WriteBatch, and finally you write it all out.

Problems with this approach:

  • You can’t read stuff that you just added, because it hasn’t been added to the actual storage yet. (Generally not that much of an issue in our expected use case.)
  • You need to worry about concurrently modifying the same value in different write batches. (We’re going to add an optimistic concurrency option for that purpose.)

Benefits of this approach:

  • We can optimize concurrent writes.
  • We don’t have to decide in advance whether we need read-only or read/write access.
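Under the new API, concurrent writers simply hand batches to the writer, which can coalesce them. A usage sketch built from the calls shown above (the Parallel.For driver and key names are mine):

```csharp
// Sketch: 16 concurrent writers, each building its own WriteBatch.
// Env.Writer.Write() applies each batch transactionally, and can merge
// batches that arrive together into a single write transaction.
Parallel.For(0, 16, i =>
{
    var batch = new WriteBatch();
    batch.Add("key/" + i, new MemoryStream(Encoding.UTF8.GetBytes("value-" + i)), null);
    Env.Writer.Write(batch);
});

using (var snapshot = Env.CreateSnapshot())
{
    // Readers get a consistent point-in-time view, independent of writers
    using (var stream = snapshot.Read(null, "key/7"))
    using (var reader = new StreamReader(stream))
    {
        Console.WriteLine(reader.ReadToEnd());
    }
}
```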

What is your control group?

time to read 8 min | 1489 words

One of the areas where we think Voron can be improved is the free space utilization policy. In particular, smarter free space utilization can lead to better performance, since we won’t have to seek so much.

I spent some time working on that, and I got something that, on paper at least, looks much better, performance wise. But… actual benchmarks showed little to no improvement, and in some cases, actual degradation! That was the point when I realized that I actually needed to have some sort of a control, to see what would be the absolute optimal scenario for us. So I wrote a null free space policy. With no free space reuse, Voron will always go to the end of the file, giving us the best case scenario of sequential writes.

This gives us the following behavior:

Flush      1 with   2 pages -   8 kb writes and   1 seeks (  2 leaves,   0 branches,   0 overflows)
Flush      2 with   8 pages -  32 kb writes and   1 seeks (  7 leaves,   1 branches,   0 overflows)
Flush      3 with  10 pages -  40 kb writes and   1 seeks (  9 leaves,   1 branches,   0 overflows)

Flush     27 with  74 pages - 296 kb writes and   1 seeks ( 72 leaves,   2 branches,   0 overflows)
Flush     28 with  74 pages - 296 kb writes and   1 seeks ( 72 leaves,   2 branches,   0 overflows)
Flush     29 with  72 pages - 288 kb writes and   1 seeks ( 70 leaves,   2 branches,   0 overflows)

Flush  1,153 with 155 pages - 620 kb writes and   1 seeks (102 leaves,  53 branches,   0 overflows)
Flush  1,154 with 157 pages - 628 kb writes and   1 seeks (104 leaves,  53 branches,   0 overflows)
Flush  1,155 with 165 pages - 660 kb writes and   1 seeks (108 leaves,  57 branches,   0 overflows)

Flush  4,441 with 191 pages - 764 kb writes and   1 seeks (104 leaves,  87 branches,   0 overflows)
Flush  4,442 with 196 pages - 784 kb writes and   1 seeks (107 leaves,  89 branches,   0 overflows)
Flush  4,443 with 198 pages - 792 kb writes and   1 seeks (108 leaves,  90 branches,   0 overflows)

Flush  7,707 with 200 pages - 800 kb writes and   1 seeks (106 leaves,  94 branches,   0 overflows)
Flush  7,708 with 204 pages - 816 kb writes and   1 seeks (106 leaves,  98 branches,   0 overflows)
Flush  7,709 with 211 pages - 844 kb writes and   1 seeks (113 leaves,  98 branches,   0 overflows)

Flush  9,069 with 209 pages - 836 kb writes and   1 seeks (107 leaves, 102 branches,   0 overflows)
Flush  9,070 with 205 pages - 820 kb writes and   1 seeks (106 leaves,  99 branches,   0 overflows)
Flush  9,071 with 208 pages - 832 kb writes and   1 seeks (108 leaves, 100 branches,   0 overflows)

And with this, here are the results for 10,000 transactions with 100 random values each:

fill rnd buff separate tx          :    106,383 ms      9,400 ops / sec

And that tells me that for the best case scenario, there is something else that is causing this problem, and it ain’t the cost of doing seeks. I dropped the number of transactions to 500 and ran it through a profiler, and I got the following:

[images: profiler output]

In other words, pretty much the entire time was spent just calling FlushViewOfFile. However, I think that we optimized that enough already, didn’t we? Looking at the calls, it seems that we have just one FlushViewOfFile per transaction in this scenario.

In fact, looking at the actual system behavior, we can see:

[image: system I/O behavior]

So seeks wise, we are good. What I can’t understand, however, is why we see those ReadFile calls. Looking at the data, it appears that we run into this whenever we access a new portion of the file, so this is the mmap subsystem paging the file contents into memory before we start writing to it. It is actually pretty great that it is able to page 1 MB at a time.

Next, let us see what else we can do here. I ran the 500 tx test on an HDD drive, and it gave me the following results.

fill rnd sync separate tx          :     25,540 ms      1,958 ops / sec

But note that each commit involves two writes: one at the end of the file, and one at the file beginning (which is the actual final act of the commit). What happens if we just remove that part?

This gives me a very different number:

fill rnd sync separate tx          :     21,764 ms      2,297 ops / sec

So just seeking and writing a single page costs us 17% of our performance. Here are the details from running this test:

[image: details of the test run]

Now, this is a meaningless test, added just to check what the relative costs are. We have to do the header write; otherwise we can’t do real transactions.
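The write pattern being measured can be sketched like this with plain file I/O (a simplification of mine, not Voron's actual code; the sizes are illustrative):

```csharp
// Sketch of the commit pattern under discussion: append the transaction's
// modified pages at the end of the file, then seek back to the beginning
// and write the header that makes the commit durable. That second,
// far-away write is the extra seek being measured.
using (var file = new FileStream("bench.data", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    var pages = new byte[110 * 4096];   // ~110 dirty pages, as in the random-write case
    var header = new byte[4096];        // file header pointing at the new tree root

    file.Seek(0, SeekOrigin.End);       // sequential append of the data
    file.Write(pages, 0, pages.Length);
    file.Flush(true);                   // data must be on disk before the header

    file.Seek(0, SeekOrigin.Begin);     // the seek back to the start
    file.Write(header, 0, header.Length);
    file.Flush(true);                   // the actual commit point
}
```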

For fun, I ran the same thing using sequential writes, giving me 3,619 ops / sec. Since in both cases we are actually doing sequential writes, the major difference was how much we actually wrote. This is the view of writing sequentially:

[image: I/O view of the sequential write run]

As you can see, we only have to write 8 – 10 pages per transaction, compared to 110 – 130 in the random case. And that obviously has a lot of implications.

All of this has taught me something very important. In the end, the actual free space policy matters, but not that much. So I need to select something that is good, but that is about it.

So, what have YOU been learning lately?

time to read 3 min | 457 words

One of the worst things that can happen to you professionally is stagnation. You know what you are doing, you know how it works, and you can coast along very easily. Unfortunately, as the old saying goes, it isn’t what we know that we don’t know that is going to hurt us. It is what we don’t know that we don’t know that is going to bite us in the end.

One of the reasons that I have routinely been going out and searching for difficult codebases to read has been to avoid that. I know that I don’t know a lot, I just don’t know what I don’t know. So I go into an unfamiliar codebase and try to figure out how things work over there.

I have been doing that for quite some time now. And I am not talking about looking at some sample project a poor schlump put out to show how you can do CQRS with 17 projects to create a ToDo app. I am talking about production code, and usually in areas or languages that I am not familiar with.

A short list of the stuff that I have gone over:

  • CouchDB (to learn Erlang, actually, but that got me to do DB stuff).
  • LevelDB
  • LMDB
  • NServiceBus
  • Mass Transit
  • SignalR
  • Hibernate
  • Hibernate Search

Those are codebases that do interesting things that I wanted to learn from. Indeed, I have learned from each of those.

Some people can learn by reading academic papers, I find that I learn best from having a vague idea about what is going on, then diving into the implementation details and seeing how it all fits together.

But the entire post so far was a preface to the question I wanted to ask. If you are reading this post, I am pretty sure that you are a professional developer. Doctors, lawyers and engineers (to name a few) have to recertify every so often, to make sure that they are current. But I have seen all too many developers stagnate to the point where they are very effective in their chosen field (building web apps with jQuery Mobile on ASP.Net WebForms 3.5) and nearly useless otherwise.

So, how are you keeping your skills sharp and your knowledge current? What have you been learning lately? It can be a course, or a book or a side project or just reading code. But, in my opinion, it cannot be something passive. If you were going to answer: “I read your blog” as the answer to that question, that is not sufficient, flatterer. Although, you might want to go a bit further and consider that imitation is the sincerest form of flattery, so go ahead and do something.

Hibernating Rhinos and Managed Designs announce enterprise partnership related to RavenDB

time to read 2 min | 295 words

September 23, 2013

Deal will allow Hibernating Rhinos customers to get premium level consulting and support services provided by Managed Designs across Europe

Milan, Italy and Sede Izhak, Israel – September 23, 2013. Hibernating Rhinos and Managed Designs today announced a partnership that will allow Hibernating Rhinos customers to get premium level consulting, support and training services, appointing Managed Designs as its official partner for the following European countries:

· West Europe Countries: Portugal, Spain (including Andorra), France (including Monaco);

· Central Europe Countries: Luxembourg, Belgium, Germany, Switzerland, the Netherlands, United Kingdom and Ireland, Denmark, Sweden, Norway, Finland, Austria and Italy (including San Marino and Vatican City)

· East Europe Countries: Czech Republic, Poland, Hungary, Slovakia, Slovenia, Bosnia Herzegovina, Croatia, Serbia, Albania and Greece, Romania and Bulgaria

As per this partnership:

“Hibernating Rhinos is committed to developing and marketing a first class document database and wants its customers to get the best experience out of it, so we’re glad to have Managed Designs assisting them,” said Oren Eini, CEO of Hibernating Rhinos.

“Managed Designs has been enjoying RavenDB for years now, and we’re excited to have been engaged by Hibernating Rhinos in order to help their customers get the best experience out of the product,” said Andrea Saltarello, CEO of Managed Designs.

About Hibernating Rhinos

Hibernating Rhinos LTD is an Israeli based company, focused on delivering products and services in the database infrastructure field. For more information about Hibernating Rhinos, visit http://www.hibernatingrhinos.com

About Managed Designs

Managed Designs provides consulting, education and software development services helping customers to find solutions to their business problems. For more information about Managed Designs, visit http://www.manageddesigns.it

RavenDB is a registered trademark of Hibernating Rhinos and/or its affiliates. Other names may be trademarks of their respective owners.

Raven Storage–Voron’s Performance

time to read 10 min | 1804 words

Voron is the code name for another storage engine that we are currently trying out, based on LMDB. After taking it for a spin for a while, it is now pretty complete, and I decided to give it some perf test runs. So far, there has been zero performance work. All the usual caveats apply here, with regards to short test runs, etc.

Just like before, we found that this is horribly slow on the first run. The culprit was our debug code that verified the entire structure whenever we added or removed something. Once we ran it in release mode we started getting more interesting results.

Here is the test code:

using (var env = new StorageEnvironment(new MemoryMapPager("bench.data", flushMode)))
{
    using (var tx = env.NewTransaction(TransactionFlags.ReadWrite))
    {
        var value = new byte[100];
        new Random().NextBytes(value);
        var ms = new MemoryStream(value);
        for (long i = 0; i < Count; i++)
        {
            ms.Position = 0;
            env.Root.Add(tx, i.ToString("0000000000000000"), ms);
        }

        tx.Commit();
    }
    using (var tx = env.NewTransaction(TransactionFlags.ReadWrite))
    {
        DebugStuff.RenderAndShow(tx, tx.GetTreeInformation(env.Root).Root);

        tx.Commit();
    }
}

We write 1 million entries, each with a 100 byte value and a 16 byte key (the zero-padded counter). We run it in three modes:

fill seq none                      :      9,578 ms    104,404 ops / sec
fill seq buff                      :     10,802 ms     92,575 ops / sec
fill seq sync                      :      9,387 ms    106,528 ops / sec

None means no flushing to disk at all (let the OS deal with that completely), buff means flush to the OS but not to the disk, and sync means do a full fsync.

Now, this is a pretty stupid way to go about it, I have to say. This is doing everything in a single transaction, and we are actually counting the time to close & open the db here as well, but that is okay for now. We aren’t interested in real numbers, just some rough ideas.

Now, let us see how we read it:

using (var env = new StorageEnvironment(new MemoryMapPager("bench.data")))
{
    using (var tx = env.NewTransaction(TransactionFlags.Read))
    {
        var ms = new MemoryStream(100);
        for (int i = 0; i < Count; i++)
        {
            var key = i.ToString("0000000000000000");
            using (var stream = env.Root.Read(tx, key))
            {
                ms.Position = 0;
                stream.CopyTo(ms);
            }
        }

        tx.Commit();
    }
}

And this gives us:

read seq                           :      3,289 ms    304,032 ops / sec

And again, this is with opening & closing the entire db.

We could do better with pre-allocation of space on the disk, etc. But I wanted to keep things realistic and to allow us to grow.

Next, I wanted to see how much we would gain by parallelizing everything, so I wrote the following code:

using (var env = new StorageEnvironment(new MemoryMapPager("bench.data")))
{
    var countdownEvent = new CountdownEvent(parts);
    for (int i = 0; i < parts; i++)
    {
        var currentBase = i;
        ThreadPool.QueueUserWorkItem(state =>
        {
            using (var tx = env.NewTransaction(TransactionFlags.Read))
            {
                var ms = new MemoryStream(100);
                for (int j = 0; j < Count / parts; j++)
                {
                    var current = currentBase * (Count / parts) + j; // each part reads its own contiguous key range
                    var key = current.ToString("0000000000000000");
                    using (var stream = env.Root.Read(tx, key))
                    {
                        ms.Position = 0;
                        stream.CopyTo(ms);
                    }
                }

                tx.Commit();
            }

            countdownEvent.Signal();
        });
    }
    countdownEvent.Wait();
}

I then ran it with 1 – 16 parts, to see how it behaves. Here are the details for this machine:

[image: machine details]

And the results pretty much match what I expected:

read seq                           :      3,317 ms    301,424 ops / sec
read parallel 1                    :      2,539 ms    393,834 ops / sec
read parallel 2                    :      1,950 ms    512,711 ops / sec
read parallel 4                    :      2,201 ms    454,172 ops / sec
read parallel 8                    :      2,139 ms    467,387 ops / sec
read parallel 16                   :      2,010 ms    497,408 ops / sec

We get a 2x perf improvement from running on two cores. Running on 4 threads requires some dancing around, which caused some perf drop, and beyond that we see more time spent in thread switching than anything else, pretty much. As you can see, we get a really pretty jump in performance the more cores we use, until we reach the actual machine limitations.

Note that I made sure to clear the OS buffer cache before each test. If we don't do that, we get:

read seq                           :      2,562 ms    390,291 ops / sec
read parallel 1                    :      2,608 ms    383,393 ops / sec
read parallel 2                    :      1,868 ms    535,220 ops / sec
read parallel 4                    :      1,646 ms    607,283 ops / sec
read parallel 8                    :      1,673 ms    597,557 ops / sec
read parallel 16                   :      1,581 ms    632,309 ops / sec

So far, I am pretty happy with those numbers. What I am not happy is the current API, but I’ll talk about this in a separate post.

Optimizing writes in Voron

time to read 7 min | 1357 words

As I mentioned, one of the things that I have been working on with Voron is optimizing the sad case of random writes.  I discussed some of the issues that we had already, and now I want to explain how we approach resolving them.

With LMDB, free space occurs on every write, because we don’t make modifications in place; instead, we make modifications to a copy and free the existing page to be reclaimed later. The way the free space reclamation works, a new page can be allocated anywhere in the file. That can lead to a lot of seeks. With Voron, we use a more complex policy. The file is divided into 4 MB sections, and we aggregate the free space in each section. When we need more space, we will find a section with enough free space and use that, and we will continue to use it for as long as we can. The end result is that we tend to be much more local in the way we reuse space.
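A toy version of that section policy, to show the shape of the idea (my own simplification; Voron's real free space management tracks considerably more state):

```csharp
using System;
using System.Collections.Generic;

// Sketch: hand out free pages from 4 MB sections so that reuse stays local.
// With 4 KB pages, a section covers 1,024 pages.
public class SectionedFreeSpace
{
    private const int PagesPerSection = (4 * 1024 * 1024) / 4096;
    private readonly Dictionary<long, SortedSet<long>> _freeBySection =
        new Dictionary<long, SortedSet<long>>();
    private long _currentSection = -1;

    public void Free(long pageNumber)
    {
        long section = pageNumber / PagesPerSection;
        SortedSet<long> pages;
        if (!_freeBySection.TryGetValue(section, out pages))
            _freeBySection[section] = pages = new SortedSet<long>();
        pages.Add(pageNumber);
    }

    // Returns a free page, sticking to the current section for as long as
    // possible; -1 means "no suitable section, allocate at the end of the file".
    public long Allocate(int minFreePagesInSection)
    {
        SortedSet<long> pages;
        if (_currentSection == -1 ||
            !_freeBySection.TryGetValue(_currentSection, out pages) ||
            pages.Count == 0)
        {
            _currentSection = -1;
            foreach (var kvp in _freeBySection)
            {
                // Only switch to a section with enough free space (the tunable
                // eligibility threshold mentioned at the end of the post)
                if (kvp.Value.Count >= minFreePagesInSection)
                {
                    _currentSection = kvp.Key;
                    break;
                }
            }
            if (_currentSection == -1)
                return -1;
            pages = _freeBySection[_currentSection];
        }
        long page = pages.Min;   // stay local within the section
        pages.Remove(page);
        return page;
    }
}
```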

Here are the original results:

Flush     1 with  12 pages   - 48 kb writes and 1  seeks   (11 leaves, 1 branches, 0 overflows)
Flush     2 with  13 pages   - 52 kb writes and 1  seeks   (12 leaves, 1 branches, 0 overflows)
Flush     3 with  21 pages   - 84 kb writes and 1  seeks   (20 leaves, 1 branches, 0 overflows)
 
Flush    27 with  76 pages   - 304 kb writes and 1 seeks  (75 leaves,  1 branches, 0 overflows)
Flush    28 with  73 pages   - 292 kb writes and 1 seeks  (72 leaves,  1 branches, 0 overflows)
Flush    29 with  84 pages   - 336 kb writes and 1 seeks  (80 leaves,  4 branches, 0 overflows)
 
Flush 1,153 with 158 pages - 632 kb writes and 67  seeks (107 leaves, 51 branches, 0 overflows)
Flush 1,154 with 168 pages - 672 kb writes and 65  seeks (113 leaves, 55 branches, 0 overflows)
Flush 1,155 with 165 pages - 660 kb writes and 76  seeks (113 leaves, 52 branches, 0 overflows)
 
Flush 4,441 with 199 pages - 796 kb writes and 146 seeks (111 leaves, 88 branches, 0 overflows)
Flush 4,442 with 198 pages - 792 kb writes and 133 seeks (113 leaves, 85 branches, 0 overflows)
Flush 4,443 with 196 pages - 784 kb writes and 146 seeks (109 leaves, 87 branches, 0 overflows)
 
Flush 7,707 with 209 pages - 836 kb writes and 170 seeks (111 leaves, 98 branches, 0 overflows)
Flush 7,708 with 217 pages - 868 kb writes and 169 seeks (119 leaves, 98 branches, 0 overflows)
Flush 7,709 with 197 pages - 788 kb writes and 162 seeks (108 leaves, 89 branches, 0 overflows)
 
Flush 9,069 with 204 pages - 816 kb writes and 170 seeks (108 leaves, 96 branches, 0 overflows)
Flush 9,070 with 206 pages - 824 kb writes and 166 seeks (112 leaves, 94 branches, 0 overflows)
Flush 9,071 with 203 pages - 812 kb writes and 169 seeks (105 leaves, 98 branches, 0 overflows)

And here are the improved results:

Flush      1 with   2 pages -     8 kb writes and   1 seeks (  2 leaves,   0 branches,   0 overflows)
Flush      2 with   8 pages -    32 kb writes and   1 seeks (  7 leaves,   1 branches,   0 overflows)
Flush      3 with  10 pages -    40 kb writes and   1 seeks (  9 leaves,   1 branches,   0 overflows)
  
Flush     27 with  73 pages -   292 kb writes and   1 seeks ( 72 leaves,   1 branches,   0 overflows)
Flush     28 with  72 pages -   288 kb writes and   1 seeks ( 71 leaves,   1 branches,   0 overflows)
Flush     29 with  71 pages -   284 kb writes and   1 seeks ( 70 leaves,   1 branches,   0 overflows)
  
Flush  1,153 with 157 pages -   628 kb writes and  11 seeks (105 leaves,  52 branches,   0 overflows)
Flush  1,154 with 159 pages -   636 kb writes and   2 seeks (107 leaves,  52 branches,   0 overflows)
Flush  1,155 with 167 pages -   668 kb writes and  17 seeks (111 leaves,  56 branches,   0 overflows)
  
Flush  4,441 with 210 pages -   840 kb writes and  11 seeks (121 leaves,  86 branches,   3 overflows)
Flush  4,442 with 215 pages -   860 kb writes and   1 seeks (124 leaves,  88 branches,   3 overflows)
Flush  4,443 with 217 pages -   868 kb writes and   9 seeks (126 leaves,  89 branches,   2 overflows)
  
Flush  7,707 with 231 pages -   924 kb writes and   7 seeks (136 leaves,  93 branches,   2 overflows)
Flush  7,708 with 234 pages -   936 kb writes and   9 seeks (136 leaves,  97 branches,   1 overflows)
Flush  7,709 with 241 pages -   964 kb writes and  13 seeks (140 leaves,  97 branches,   4 overflows)

Flush  9,069 with 250 pages - 1,000 kb writes and   6 seeks (144 leaves, 101 branches,   5 overflows)
Flush  9,070 with 250 pages - 1,000 kb writes and  13 seeks (145 leaves,  98 branches,   7 overflows)
Flush  9,071 with 248 pages -   992 kb writes and  12 seeks (143 leaves,  99 branches,   6 overflows)

Let us plot this in a chart, so we can get a better look at things:

[image: chart of seeks per transaction]

As you can see, this is a pretty major improvement. But it came at a cost; let us see the cost in size per transaction…

[image: chart of size per transaction]

So we improved on the seeks / tx, but got worse on the size / tx. That is probably because of the overhead of keeping the state around, but it also relates to some tunable configuration that we added (the amount of free space in a section that makes it eligible for use).

Annoyingly, after spending quite a bit of time & effort on this, we don’t see a major perf boost here. But I am confident that it’ll come.
