Optimizing RavenDB by adding Thread.Sleep(5)
This post is here because we recently had to add this code to RavenDB:
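(The exact snippet isn't reproduced here; what follows is a minimal sketch of the shape of the change, with hypothetical names such as networkStream, not the actual RavenDB code.)

    // Minimal sketch, not the actual RavenDB code.
    // If the network has nothing buffered for us right now, pause briefly
    // before deciding that the current batch is complete.
    if (networkStream.DataAvailable == false)
    {
        Thread.Sleep(5); // give the network 5 ms to deliver more data
    }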
Yes, we added a sleep to RavenDB, and we did it to increase performance.
The story started out with a reported performance regression. On a previous version of RavenDB, the user was able to insert 32,000 documents per second. Same code, same machine, new version of RavenDB, but performance dropped to 13,000 documents per second.
That is, as we call it internally, an Issue. More specifically, issue RavenDB-14777.
Deeper investigation revealed that the problem was that we were too fast, therefore we were too slow. Adding a sleep fixed the being-too-fast part, so we were faster again.
You might need to read the previous paragraph a few times to make sense of it; I'm particularly proud of it. Here is what actually happened. Our bulk insert code reads from the network, and as soon as we have some data, we start parallelizing the write to disk and the read from the network. The idea is that we want to reduce the time the user has to wait, so we maximize the amount of work we do in parallel. This is a fairly standard optimization for us and has paid many dividends in performance. The way it works, we read from the network until there is nothing available in memory and we have to wait for I/O, at which point we start writing to the disk and wait for the network I/O to resume the operation.
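A rough sketch of that pattern, with hypothetical helper names standing in for the real network and storage code (an illustration of the idea, not RavenDB's implementation):

    using System.Collections.Generic;
    using System.Threading.Tasks;

    class BulkInsertSketch
    {
        // Overlap reading from the network with writing to disk: while a batch
        // is being written in the background, we are already waiting on, and
        // reading, the next chunk of network data.
        public async Task RunAsync()
        {
            Task pendingDiskWrite = Task.CompletedTask;

            // MoveToNextBatchAsync waits for network I/O and returns false at end of stream.
            while (await MoveToNextBatchAsync())
            {
                // Read documents for as long as data is already buffered in memory.
                var batch = new List<byte[]>();
                while (TryReadBufferedDocument(out var doc))
                    batch.Add(doc);

                // Nothing more is buffered right now, so we would have to wait for
                // the network anyway - use that time to flush what we have to disk.
                await pendingDiskWrite;                          // previous batch done?
                pendingDiskWrite = WriteBatchToDiskAsync(batch); // write in the background
            }

            await pendingDiskWrite; // make sure the final batch is on disk
        }

        // Hypothetical stand-ins for the real network / storage code.
        Task<bool> MoveToNextBatchAsync() => Task.FromResult(false);
        bool TryReadBufferedDocument(out byte[] doc) { doc = null; return false; }
        Task WriteBatchToDiskAsync(List<byte[]> batch) => Task.CompletedTask;
    }

The overlap comes from kicking off the disk write and then immediately going back to wait on the network, rather than blocking until the write completes.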
However, the versions the user was trying also included a runtime change. The old version ran on .NET Core 2.2 and the new version runs on .NET Core 3.1. There have been many optimizations as a result of this change, and it seems that the read-from-network path benefited from them.
As a result, we were able to read the data from the buffer more quickly, which meant that we hit the point of waiting for network I/O sooner. That meant we would do a lot more disk writes, each of them smaller, because we were better at reading from the network. And that, in turn, slowed the whole thing down enough to be noticeable.
Our change means that we'll only queue a new disk operation if 5 milliseconds have passed with no new network traffic (or a bunch of other conditions that you don't really care about are met). This way, we retain the parallel work without saturating the disk with small writes.
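In terms of the sketch above, the change is only in the trigger that closes a batch; something along these lines (again a hypothetical sketch, not the actual condition set):

    // Additional members for the BulkInsertSketch class above.
    // Instead of flushing the moment the in-memory buffer is empty, give the
    // network a short grace period to deliver more data first.
    bool ShouldFlushBatchToDisk()
    {
        if (AnyDataBufferedFromNetwork())
            return false;   // traffic is still flowing, keep the batch open

        System.Threading.Thread.Sleep(5); // pause for 5 ms, then check again

        // Still nothing after 5 ms of silence? Flush what we have to disk.
        return AnyDataBufferedFromNetwork() == false;
    }

    bool AnyDataBufferedFromNetwork() => false; // hypothetical stand-in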
As I said earlier, we had to pump the brakes to get to really high speed.
Comments
this doesn't make sense, isn't there a better pattern to handle this? producer-consumer / channels / dataflow?
Uri,
This is using a producer/consumer + batching mode. The only difference is the trigger for the batch. Instead of "no data available" we changed it to "no data available for 5 ms".
How many documents per second are you inserting now?
Andres,
We are seeing a single bulk insert pushing > 25K docs / sec after this change
Hi, Thanks for the interesting post.
A question: I understood that you wait until no data is available anymore "from the left hand side" (network in this case) and only then start writing this data "to the right hand side" (disk in this case). This feels more like a serial process than a parallel one... Am I missing something? Could you explain again how this is "parallelizing the write to disk and the read from the network"? Maybe it's because of the "batching mode" that I get confused.
Is it because you don't want to start writing to disk while reading from network in order to save CPU for the network reading process?
Or do you actually resume reading from the network if new data is coming, even before the current write-to-disk operation is completed?
Thanks in advance.
Sylvain,
We start writing to the disk in an async manner, and read more from the network at the same time, ready for the next write to disk.
Thanks Oren.
But then I don't see why the new delay improves things. Is it because without it you would write too small batches of data to disk, thus decreasing the useful/overhead ratio?
Sylvain,
Yes, the issue was that we read from the network too quickly, so we sent a smaller batch to disk. That ended up causing high latency because we kept having to wait for the disk.
When we wait a bit longer for the network, we send bigger disk batches, so we get more parallelism of work.
Interesting problem! I glanced at the code and wondered if you considered breaking up the process a bit more, like having a central ConcurrentQueue or BlockingCollection that one thread just dumps items into from the network and a separate thread that just dequeues as fast as it likes for disk writes.
Adam,
We did that in the past, but it turns out that the additional complexity isn't worth it. We need to make sure that we aren't reading too much to memory, that we balance network and disk speeds, etc. This ended up being the best option.
OK, thanks.
So optimally it would be like: don't start writing data to disk before we get enough data to write from the network, or, if we don't get that amount of data within a given time range, give up and write what we got so far anyway (because we don't want to wait forever). I guess the 5 ms is this "given time range".
Thanks.
Well, it does not look like an improvement...
Sylvain,
Yes, that is the case. The idea is that we ensure that we are always going to do work, on both network and disk.
Andres,
The two tests weren't run on the same machine for those numbers. The 32K was on the user's machine, the 25K was on one of our test machines. With the issue present, the test machine that now sees 25K / sec was doing 8K / sec.
Couple of quick things on this:
The turnaround time from the problem report to the fix was really good. It would be interesting to see a future post on the profiling and diagnostic info you had available to track down and test this optimization, and on how the "5ms" magic number was found to be optimal (and whether different durations work better or worse for different disk types). Wonder if self-tuning is a thing in the future, especially as you run on such diverse HW?
@Andrés - we were part of the discovery of the problem. The 4.2.8 perf on our test rig was about 32k docs/sec. 4.2.101 was about 13k docs per sec. The patch / updated version was slightly better than 4.2.8 (about 34k docs/sec). Not scientific, but the 4.2.10x versions are proving faster and less resource intensive in most regards.
TrevHunter,
What may not be apparent is that the code here adapts its behavior.
We are going to read from the network as long as:
All of these together mean that after a short while, we are going to settle on reading from the network in batches that take about as long to gather as it takes to write them to the disk. There isn't a lot of code here, but the behavior is quite sophisticated.
I'm curious, did you try other values than 5?
JustPassingBy,
Yes, we tested a whole bunch of values. See the details in the post. 5 ms was the best value.
If a malicious user sent a single byte every 4ms, that would keep one of your threads busy for a potentially very long time. If you had enough of such malicious users, you could run out of thread pool threads/sockets/memory/other resources. Does the above sound like a real problem or am I missing something?
Adrian,
Not really, that would hit the rate limits that we have set and bounce. Also note that we are assuming a non-malicious user here; this is not generally exposed to the wide world, after all. You need an authenticated certificate to run this.