
The cost of random writes


After pretty much getting to where we wanted with sequential writes, it is time to try to tackle the cost of random writes. Here is Esent & Voron doing random writes for 1 million entries:

[Chart: Esent vs. Voron, random writes of 1 million entries]

Yes, it is sad that this is the case, because we can do about 40 times better for sequential writes.

However… what is the reason for the slowness? As usual, the answer is to profile, profile & profile. In fact, we want to do profiling on multiple machines, to check that we don’t have a machine bias.

At any rate, as you can see, we are actually pretty close to Esent in random writes, but why are we so much slower than we were for sequential writes?

Well, we actually covered that already. The major reason for the cost is the fact that we have to deal with very different B-Tree shapes. You can see the difference between sequential and randomly inserted trees here. The more random you are, the bigger your tree is, and the more pages you have to touch whenever you update. Our scenario calls for making 100 random modifications in every transaction. That means that on every transaction, we have to touch many different parts of the tree.
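To get a feel for why that matters, here is a toy sketch (not Voron's actual code; the leaf fan-out, tree size, and key model are made-up numbers) that counts how many distinct leaf pages a 100-item transaction has to touch when the keys are sequential versus random:

```python
# Toy model: keys live in fixed-size leaf pages, in key order, and we count
# how many distinct leaf pages a 100-item transaction has to touch.
import random

KEYS_PER_LEAF = 200          # assumed leaf fan-out
TREE_SIZE = 500_000          # keys already in the tree
TX_SIZE = 100                # modifications per transaction

def leaves_touched(keys):
    """Distinct leaf pages holding the given keys (keys map to leaves in order)."""
    return len({k // KEYS_PER_LEAF for k in keys})

# Sequential transaction: the next 100 consecutive keys.
sequential = range(TREE_SIZE, TREE_SIZE + TX_SIZE)

# Random transaction: 100 keys spread anywhere over the existing key space.
scattered = random.sample(range(TREE_SIZE), TX_SIZE)

print("sequential tx touches", leaves_touched(sequential), "leaf pages")  # 1-2 pages
print("random tx touches    ", leaves_touched(scattered), "leaf pages")   # typically ~98-100
```

The measured averages below are higher still, presumably because they also include the branch pages on the path to each leaf, plus any page splits.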

Here are the results (in pages) for how much it cost us (in this test, we ran 5,000 transactions of 100 items each):

 

           Sequential      Random
Average          7.06      111.37
Max             11.00      134.00
Total       35,320.00  556,973.00
Std Dev          0.35        7.09

In other words, this means that on average, a sequential transaction wrote 7 pages, or about 28 KB, while the average random transaction wrote about 450 KB!

The total size written for sequential was 137 MB, with 319 MB allocated. The random writes, however, came to 2.1 GB written out of 2.3 GB allocated. And that explains quite a lot about the whole thing. (Note that for writes we count only the journal writes, not the data writes, because those are the limiting factor for write performance.)
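Just to check that these numbers hang together: assuming 4 KB pages (which is what the "7 pages, or about 28 KB" figure above implies), the page counts in the table line up with the sizes quoted here:

```python
# Back-of-the-envelope check of the numbers above, assuming 4 KB pages
# (the "7 pages, or about 28 KB" figure implies that page size).
PAGE_SIZE = 4 * 1024

for name, avg_pages, total_pages in [("sequential", 7.06, 35_320),
                                     ("random", 111.37, 556_973)]:
    per_tx_kb = avg_pages * PAGE_SIZE / 1024
    total_mb = total_pages * PAGE_SIZE / (1024 * 1024)
    print(f"{name:>10}: ~{per_tx_kb:.0f} KB per transaction, "
          f"~{total_mb:,.0f} MB of journal writes in total")

# Output:
# sequential: ~28 KB per transaction, ~138 MB of journal writes in total
#     random: ~445 KB per transaction, ~2,176 MB of journal writes in total
```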

Esent, by the way, writes about 465 MB for sequential writes and 2.3 GB for random ones. Now we know why we are pretty close: the vast majority of the time is actually spent just writing to the disk, and we are stalled by the sheer amount of data we have.

Interestingly enough, I don't think that there is a good way to handle that nicely. Certainly not with a B-Tree. We'll see how we can work with that in a future post.


Comments

Fujiy

When you say sequential writes, is that about the keys? But are sequential keys written sequentially on disk?

Ayende Rahien

Fujiy, We are using a B+Tree structure. The pattern of inserts into the B+Tree is very important. See the link in the post that demonstrates the difference.

Chris Wright

For fast inserts, people these days are generally going with a log-structured merge tree, cache-oblivious lookahead array, or something of that sort. B-trees tend to fall over and die on random inserts, while LSMs and COLAs just chug away like nothing's different.

Ayende Rahien

Chris, Those become a lot harder when you are dealing with persistent data. In particular, LSMs usually need to do the merging at some point, and that causes write amplification. I haven't been able to find a persisted COLA impl to look at, though. Do you know of any?
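For readers unfamiliar with the merging cost mentioned here, a toy leveled-compaction model (a sketch only, not modeled on any particular engine) shows where the write amplification comes from: every time one level spills into the next, the data already sitting in the lower level gets rewritten along with it.

```python
# Toy leveled LSM: count bytes physically written vs. bytes of user data
# ingested. Merging a full level rewrites the next level's existing data too.
FANOUT = 10        # assumed size ratio between adjacent levels
MEMTABLE = 1       # flush unit, in arbitrary "chunks"

levels = [0, 0, 0, 0]        # chunks currently stored in each level
written = 0                  # chunks physically written to disk
ingested = 0                 # chunks of user data accepted

for _ in range(1_000):
    ingested += MEMTABLE
    levels[0] += MEMTABLE
    written += MEMTABLE                            # flushing the memtable is a write
    for i in range(len(levels) - 1):
        if levels[i] >= FANOUT ** (i + 1):         # level full: merge it down
            written += levels[i] + levels[i + 1]   # both inputs get rewritten
            levels[i + 1] += levels[i]
            levels[i] = 0

print("write amplification:", round(written / ingested, 1))  # ~13x with these settings
```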

Howard Chu

Chris: "people these days" are generally stupid trend-followers. Reality is that LSMs suck in sustained write loads. E.g. http://symas.com/mdb/hyperdex/

