Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969

, @ Q j

Posts: 6,666 | Comments: 48,512

filter by tags archive

Blind optimizations: Making MurmurHash3 faster

time to read 3 min | 512 words

I needed to use Bloom Filters, and I didn’t want to use the implementation we already have in RavenDB. That one is too tied up in our infrastructure to be easily used. So I found the Maybe.NET project. it is nice, but it doesn’t have a CoreCLR Nuget package. This meant that I had to go into the code and look at what is going on. And I started going, “nope, that isn’t the way I want it done”.

Now, to be clear, the code in the project is great. It is clear, obvious and idiomatic C#. It is also raising every red flag I have for inefficient code detection that I had built over the past few years of making RavenDB faster. Because this is such as small sample that I thought it would make a good blog post, because I can explain what the code is doing and what changes I’m doing there, and why. Before I can get to the Bloom Filter implementation, I need to use the hash function, and that was just a full stop for me. Let me show you what I mean. The key parts are below, and you can find the full code here.

image

This is a hash method, it produced several hashes for the purpose of the bloom filter. I underlined in red every time that this code allocates. As you can see, this allocates a lot.

There is also the fact that this accepts a generic object as a parameter and serialize that to a byte[]. I’m ignoring the allocations in that part of the code, but I’m assuming that they are significant. So let’s talk about how we can optimize this function?

Well, to start with, I’m going to decide that accepting an object is too high level. This is a hash function, the caller should give us bytes. Now, let’s see what impact that has on us, shall we?

Now this is much better. We don’t allocate anything in the ComputeHashes method and we give the compiler the chance to build really efficient code here. We can probably require that the maxHashValue be a power of two and avoid the mod operation in favor of bit shifting, but I’m not writing RavenDB here and worrying about every little thing. I’ll leave that part as an exercise for the reader.

Now, let’s look at the actual Hash function, shall we?

There is quite a bit going on, but essentially, I’m using the fixed to get the pointer from the span, then compute the hash in 4 bytes at once, then handle the remainder. There is not allocations and this has far fewer instructions that actually need to run. Note that this would be a great place to stop and run unit tests to verify that I didn’t break something, I’m going to assume that I got it write and close this post, I still want to talk about the optimizations that are available for the bloom filter.

Why IOPS matter for the database?

time to read 3 min | 436 words

imageYou knew that this had to come, after talking about memory and CPU so often, we need to talk about actual I/O.

Did I mention that the cluster was setup by a drunk monkey. That term was raised in the office today, and we had a bunch of people fighting over who was the monkey. So to clarify things, here is the monkey:

image

If you have any issues with this being a drunk monkey, you are likely drunk as well. How about you setup a cluster that we can test things on.

At any rate, after setting things up and push the cluster, we started seeing some odd behaviors. It looked like the cluster was… acting funny.

One of the things we build into RavenDB is the ability to inspect its state easily. You can see it in the image on the right. In particular, you can see that we have a journal write taking 12 seconds to run.

It is writing 76Kb to disk, at a rate of about 6KB per second. To compare, a 1984 modem would actually be faster. What is going on? As it turned out, the IOPS on the system was left in their default state, and we had less than 200 IOPS for the disk.

Did I mention that we are throwing production traffic and some of the biggest stuff we have on this thing? As it turns out, if you use all your IOPS burst capacity, you end up having to get your I/O through a straw.

This is excellent, since it exposed a convoy situation in some cases, and also gave us a really valuable lesson about things we should look at when we are investigating issue (the whole point of doing this “setup by a monkey” exercise).

For the record, here is what this looks like when you do things properly:

image

Why does this matter, by the way?

A journal write is how RavenDB writes to the transaction journal. This is absolutely critical to ensuring that the transaction ACID properties are kept.

It also means that when we write, we must wait for the disk to OK the write before we consider it completed. And that means that there were requests somewhere that were sitting there waiting for 12 seconds for a reply because the IOPS run out.

Handling tens of thousands of requests / sec on t2.nano with Kestrel

time to read 2 min | 332 words

This blog post was a very interesting read. It talks about the ipify service and how it grew. It is an interesting post, but what really caught my eye was the performance details.

In particular, this talks about exceeding 30 billions of requests per month. The initial implementation used Node, and couldn’t get more than 30 req/second or so and the current version seems to be in Go and can handle about 2,000 requests a second.

I’m interested in performance so I decided to see why my results would be. I very quickly wrote the simplest possible implementation using Kestrel and threw that on a t2.nano in AWS. I’m pretty sure that this is the equivalent for the dyno that he is using. I then spun another t2.small instance and used that to bench the performance of the system. All I did was just run the code with release mode on the t2.nano, and here are the results:"

image

So we get 25,000 requests a second and some pretty awesome latencies under very high load on a 1 CPU / 512 MB machine. For that matter, here are the top results from midway through on the t2.nano machine:

image

I should say that the code I wrote was quick and dirty in terms of performance. It allocates several objects per request and likely can be improved several times over, but even so, these are nice numbers. Primarily because there is actually so little that needs to be done here. To be fair, a t2.nano machine is meant for burst traffic and is not likely to be able to sustain such load over time, but even when throttled by an order of magnitude, it will still be faster than the Go implementation Smile.

RavenDB 4.0The Server Dashboard

time to read 1 min | 184 words

One of the things that we have been saving is the hooking together of all the work we have ben doing to expose how RavenDB works into the operations dashboard. This has just landed in the nightly and can give you a lot of insight into exactly what is going on inside your server.

You can see some of the screenshots below. The idea is that in addition to exposing all of these metrics over dedicated endpoints and SNMP, we will also save users the trouble of setting up monitoring and just show them what is going on directly.

Operators can just head to this page and see what is going on, and it is meant to be put as a background for users to observe this during routine operations.

image

image

image

reEntity Framework Core performance tuning–Part III

time to read 1 min | 166 words

I mentioned in the previous post that I’ll take the opportunity to show of some interesting queries. The application itself is here, and you can see how to UI look in the following screenshot:

image

I decided to see what would be the best way to come up with the information we need for this kind of query. Here is what I got.

image

This is using a select object style to get a complex projection back from the server. Here are the results:

image

As you can see, we are able to get all the data we want, in a format that is well suited to just sending directly to the UI with very little work and with tremendous speed.

reEntity Framework Core performance tuning–part I

time to read 3 min | 501 words

I run into a really interesting article about performance optimizations with EF Core and I thought that it deserve a second & third look. You might have noticed that I have been putting a lot of emphasis on performance and I had literally spent years on optimizing relational database access patterns, including building a profiler dedicated for inspecting what an OR/M is doing. I got the source and run the application.

I have a small bet with myself, saying that in any application using a relational database, I’ll be able to find a SELECT N+1 issue within one hour. So far, I think that my rate is 92% or so. In this case, I found the SELECT N+1 issue on the very first page load.

image

Matching this to the code, we have:

image

Which leads to:

image

And here we can already tell that there is a problem, we aren’t accessing the authors. This actually happens here:

image

So we have the view that is generating 10 out of 12 queries. And the more results per page you have, the more this costs.

But this is easily fixed once you know what you are looking at. Let us look at something else, the actual root query, it looks like this:

image

Yes, I too needed a minute to recover from this. We have:

  1. One JOIN
  2. Two correlated sub queries

Jon was able to optimize his code by 660ms to 80ms, which is pretty awesome. But that is all by making modifications to the access pattern in the database.

Given what I do for a living, I’m more interested in what it does inside the database, and here is what the query plan tells us:

image

There are only a few tens of thousands of records and the query is basically a bunch of index seeks and nested loop joins. But note that the way the query is structured forces the database to evaluate all possible results, then filter just the top few. That means that you have to wait until the entire result set has been processed, and as the size of your data grows, so will the cost of this query.

I don’t think that there is much that can be done here, given the relational nature of the data access ( no worries, I’m intending to write another post in this series, you guess what I’m going to write there, right?Smile ).

RavenDB 4.0 Unsung HeroesField compression

time to read 3 min | 498 words

I have been talking a lot about major features and making things visible and all sort of really cool things. What I haven’t been talking about is a lot of the work that has gone into the backend and all the stuff that isn’t sexy and bright. You probably don’t really care how the piping system in your house work, at least until the toilet doesn’t flush. A lot of the things that we did with RavenDB 4.0 is to look at all the pain points that we have run into and try to resolve them. This series of posts is meant to expose some of these hidden features. If we did our job right, you will never even know that these features exists, they are that good.

In RavenDB 3.x we had a feature called Document Compression. This allowed a user to save significant amount of space by having the documents stored in a compressed form on disk. If you had large documents, you could typically see significant space savings from enabling this feature. With RavenDB 4.0, we removed it completely. The reason is that we need to store documents in a way that allow us to load them and work with them in their raw form without any additional work. This is key for many optimizations that apply to RavenDB 4.0.

However, that doesn’t mean that we gave up on compression entirely. Instead of compressing the whole document, which would require us to decompress any time that we wanted to do something to it, we selectively compress individual fields. Typically, large documents are large because they have either a few very large fields or a collection that contain many items. The blittable format used by RavenDB handles this in two ways. First, we don’t need to repeat field names every time, we store this once per document and we can compress large field values on the fly.

Take this blog for instance, a lot of the data inside it is actually stored in large text fields (blog posts, comments, etc). That means that when stored in RavenDB 4.0, we can take advantage of the field compression and reduce the amount of space we use. At the same time, because we are only compressing selected fields, it means that we can still work with the document natively. A trivial example would be to pull the recent blog post titles. we can fetch just these values (and since they are pretty small already, they wouldn’t be compressed) directly, and not have to touch the large text field that is the actual post contents.

Here is what this looks like in RavenDB 4.0 when I’m looking at the internal storage breakdown for all documents.

image

Even though I have been writing for over a decade, I don’t have enough posts yet to make a statistically meaningful difference, the total database sizes for both are 128MB.

JS execution performance and a whole lot of effort…

time to read 2 min | 267 words

I spoke at length about our adventures with JS engine, but I didn’t talk about the actual numbers, because I wanted them to be final.

Here are the results:

Put

Set

Push

Query

3.5

63

125

-

-

Jint - Origin

7

5

12

1784

Jurassic

33

38

42

239

Jint - Optimized

2

6

8

191

I intentionally don’t provide context for those numbers, it doesn’t actually matter.

Put means that we just have a single call to Put in a patch script, Set means that we set a value in the patch, Push add an item to an array. Query test a very simple projection.

You can discard the query value for the original Jint. This was a very trivial implementation that always created a new engine (and paid full cost for it) while we were checking the feature itself.

What is interesting about this is comparing the values to 3.5, we are so much better. Another is that after we moved back to Jint, we run another set of tests, and a lot of the optimizations were directly in our code, primarily in how we send to and receive the data from the JS engine.

And this is without any of the optimizations that we could still write. Identifying common patterns and lifting them would be the obvious answer, and we keep that for later, once we have a few more common scenarios from users to explore.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. RavenDB 4.1 features (11):
    04 Jul 2018 - This document is included in your subscription
  2. Codex KV (2):
    06 Jun 2018 - Properly generating the file
  3. I WILL have order (3):
    30 May 2018 - How Bleve sorts query results
  4. Inside RavenDB 4.0 (10):
    22 May 2018 - Book update
  5. RavenDB Security Report (5):
    06 Apr 2018 - Collision in Certificate Serial Numbers
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats