Ayende @ Rahien


re: Entity Framework Core performance tuning – Part III

time to read 1 min | 166 words

I mentioned in the previous post that I’ll take the opportunity to show off some interesting queries. The application itself is here, and you can see how the UI looks in the following screenshot:

[image]

I decided to see what would be the best way to come up with the information we need for this kind of query. Here is what I got.

[image]

This is using a select object style to get a complex projection back from the server. Here are the results:

[image]

As you can see, we are able to get all the data we want, in a format that is well suited to just sending directly to the UI with very little work and with tremendous speed.
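For readers who haven’t seen this style before, a “select object” projection in the RavenDB LINQ client looks roughly like the sketch below. The Book entity, its property names and the store variable are my own illustration, not the actual application’s model.

    // Illustrative sketch only: entity and property names are hypothetical;
    // store is assumed to be an IDocumentStore.
    using (var session = store.OpenSession())
    {
        var page = session.Query<Book>()
            .OrderByDescending(b => b.PublishedOn)
            .Select(b => new      // shape the result on the server side
            {
                b.Title,
                b.Price,
                b.Authors         // nested data comes back as part of the same object
            })
            .Take(10)
            .ToList();            // single round trip, ready to hand to the UI
    }

The point is the shape: the server sends back exactly the object the UI needs, so there is no extra mapping step on the client.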

re: Entity Framework Core performance tuning – Part I

time to read 3 min | 501 words

I ran into a really interesting article about performance optimizations with EF Core and I thought that it deserves a second & third look. You might have noticed that I have been putting a lot of emphasis on performance, and I have literally spent years optimizing relational database access patterns, including building a profiler dedicated to inspecting what an OR/M is doing. I got the source and ran the application.

I have a small bet with myself: in any application using a relational database, I’ll be able to find a SELECT N+1 issue within one hour. So far, I think that my success rate is 92% or so. In this case, I found the SELECT N+1 issue on the very first page load.

[image]

Matching this to the code, we have:

[image]

Which leads to:

[image]

And here we can already tell that there is a problem: we aren’t accessing the authors. That actually happens here:

[image]

So we have the view that is generating 10 out of 12 queries. And the more results per page you have, the more this costs.
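In EF Core, this shape of SELECT N+1 usually means the view iterates the page of results and issues one more query per row for data the root query never loaded. A minimal, hypothetical sketch (the Book/Author model and names here are illustrative, not the actual application’s):

    // Hypothetical model, for illustration only.
    // Needs: using Microsoft.EntityFrameworkCore; (for Include / ThenInclude)

    // The N+1 shape: one query for the page of books...
    var books = context.Books
        .OrderByDescending(b => b.PublishedOn)
        .Take(10)
        .ToList();

    foreach (var book in books)
    {
        // ...plus one more query per book for its authors
        var authorNames = context.Set<BookAuthor>()
            .Where(ba => ba.BookId == book.BookId)
            .Select(ba => ba.Author.Name)
            .ToList();
    }

    // Eager loading collapses this into a single round trip:
    var booksWithAuthors = context.Books
        .Include(b => b.AuthorsLink).ThenInclude(ba => ba.Author)
        .OrderByDescending(b => b.PublishedOn)
        .Take(10)
        .ToList();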

But this is easily fixed once you know what you are looking at. Let us look at something else, the actual root query, which looks like this:

[image]

Yes, I too needed a minute to recover from this. We have:

  1. One JOIN
  2. Two correlated subqueries

Jon was able to optimize his code from 660ms to 80ms, which is pretty awesome. But all of that was done by modifying the access patterns against the database.

Given what I do for a living, I’m more interested in what it does inside the database, and here is what the query plan tells us:

[image]

There are only a few tens of thousands of records and the query is basically a bunch of index seeks and nested loop joins. But note that the way the query is structured forces the database to evaluate all possible results, then filter just the top few. That means that you have to wait until the entire result set has been processed, and as the size of your data grows, so will the cost of this query.
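To make that concrete, LINQ of roughly this shape (a hypothetical sketch, not the application’s actual code) is what typically produces a join plus correlated subqueries, and it is the ordering and paging at the end that force every candidate row to be evaluated before the top few are kept:

    // Hypothetical sketch: model and property names are illustrative.
    var page = context.Books
        .Select(b => new
        {
            b.Title,
            Publisher    = b.Publisher.Name,                           // reference navigation -> JOIN
            ReviewCount  = b.Reviews.Count(),                          // correlated subquery
            AverageVotes = b.Reviews.Average(r => (double?)r.NumStars) // correlated subquery
        })
        .OrderByDescending(x => x.AverageVotes)   // the subqueries must run for every row before sorting
        .Skip(0)
        .Take(10)                                 // only then are the top few returned
        .ToList();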

I don’t think that there is much that can be done here, given the relational nature of the data access (no worries, I’m intending to write another post in this series, and you can guess what I’m going to write there, right?).

RavenDB 4.0 Unsung Heroes: Field compression

time to read 3 min | 498 words

I have been talking a lot about major features, making things visible, and all sorts of really cool things. What I haven’t been talking about is a lot of the work that has gone into the backend, all the stuff that isn’t sexy and bright. You probably don’t really care how the piping system in your house works, at least until the toilet doesn’t flush. A lot of what we did with RavenDB 4.0 was to look at all the pain points that we have run into and try to resolve them. This series of posts is meant to expose some of these hidden features. If we did our job right, you will never even know that these features exist, they are that good.

In RavenDB 3.x we had a feature called Document Compression. This allowed a user to save a significant amount of space by having the documents stored in a compressed form on disk. If you had large documents, you could typically see significant space savings from enabling this feature. With RavenDB 4.0, we removed it completely. The reason is that we need to store documents in a way that allows us to load them and work with them in their raw form without any additional work. This is key for many optimizations that apply to RavenDB 4.0.

However, that doesn’t mean that we gave up on compression entirely. Instead of compressing the whole document, which would require us to decompress it any time we wanted to do something with it, we selectively compress individual fields. Typically, large documents are large because they either have a few very large fields or a collection that contains many items. The blittable format used by RavenDB handles this in two ways: field names aren’t repeated every time, since they are stored once per document, and large field values can be compressed on the fly.

Take this blog for instance: a lot of the data inside it is actually stored in large text fields (blog posts, comments, etc.). That means that when it is stored in RavenDB 4.0, we can take advantage of field compression and reduce the amount of space we use. At the same time, because we are only compressing selected fields, we can still work with the document natively. A trivial example would be to pull the recent blog post titles. We can fetch just these values (and since they are pretty small already, they wouldn’t be compressed) directly, and not have to touch the large text field that holds the actual post contents.
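For example, a projection along these lines (a sketch; the BlogPost class and property names are my assumptions, not this blog’s actual model) only materializes the small title field and never touches the compressed post body:

    // Sketch: BlogPost, Title and PublishedAt are hypothetical names;
    // store is assumed to be an IDocumentStore.
    using (var session = store.OpenSession())
    {
        var recentTitles = session.Query<BlogPost>()
            .OrderByDescending(p => p.PublishedAt)
            .Select(p => p.Title)   // only the small, uncompressed title field is read
            .Take(10)
            .ToList();
    }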

Here is what this looks like in RavenDB 4.0 when I’m looking at the internal storage breakdown for all documents.

[image]

Even though I have been writing for over a decade, I don’t have enough posts yet to make a statistically meaningful difference; the total database sizes for both are 128MB.

JS execution performance and a whole lot of effort…

time to read 2 min | 267 words

I spoke at length about our adventures with the JS engine, but I didn’t talk about the actual numbers, because I wanted them to be final.

Here are the results:

|                  | Put | Set | Push | Query |
|------------------|-----|-----|------|-------|
| 3.5              | 63  | 125 | -    | -     |
| Jint - Original  | 7   | 5   | 12   | 1,784 |
| Jurassic         | 33  | 38  | 42   | 239   |
| Jint - Optimized | 2   | 6   | 8    | 191   |

I intentionally don’t provide context for those numbers; it doesn’t actually matter.

Put means that we just have a single call to Put in a patch script, Set means that we set a value in the patch, and Push adds an item to an array. Query tests a very simple projection.

You can discard the Query value for the original Jint. That was a very trivial implementation that always created a new engine (and paid the full cost for it) while we were checking the feature itself.

What is interesting here is comparing the values to 3.5: we are so much better. Another interesting point is that after we moved back to Jint, we ran another set of tests, and a lot of the optimizations were directly in our code, primarily in how we send data to and receive it from the JS engine.

And this is without any of the optimizations that we could still write. Identifying common patterns and lifting them would be the obvious answer, and we’ll keep that for later, once we have a few more common scenarios from users to explore.

When you forget to turn off the benchmark when you leave the office…

time to read 1 min | 150 words

On our Slack channel, I just got this:

[image]

This is testing a high write load on a slow I/O device (a network drive).

We didn’t actually forget to turn off the benchmark when we left the office; this is part of our longevity / performance test suite. But it makes for a much better post title, I think you’ll agree.

By the way, this translates to over 16K writes per second, sustained for hours on end. And this is testing our behavior on a network drive, because that is the only drive we had on hand with enough capacity to run such a test for long enough.

And then we just left it under load over the weekend:

[image]

Optimizing select projections: Part IV – Understand, don’t do

time to read 2 min | 382 words

This post is what the entire series has been building toward. In the previous post we refactored our code to make it easier to add additional behaviors. Here is one such behavior, which uses the Esprima project to parse the code and see if it uses a pattern that we can optimize for. Let us see the code, then discuss what it does.

The new stuff is mostly in the TryProjectOptimized method. This method makes heavy use of the new C# features for easy parsing of ASTs, and it really shows that the C# team is using C# to build the compiler (that is one of the common things that compiler writers do, make their own lives easier). Take this code even a couple of language versions back, and it would be much harder to read and work with.

At any rate, what this code is doing is searching for an object literal that does simple assignment of properties. If it detects that, instead of calling into the JS engine, it will produce a delegate that does just that directly.
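A rough sketch of the idea (not the actual TryProjectOptimized code): once the AST walk has confirmed that the projection is just simple property assignments, it can be reduced to a list of name pairs and turned into a plain C# delegate that bypasses the JS engine. The Esprima parsing step is elided here and the names are my own.

    // Sketch only: assumes the AST walk (not shown) already produced simple
    // (targetProperty, sourceKey) pairs from the object literal.
    static Func<Dictionary<string, object>, Dictionary<string, object>>
        BuildFastProjection(List<(string Target, string SourceKey)> mappings)
    {
        return source =>
        {
            var result = new Dictionary<string, object>(mappings.Count);
            foreach (var (target, sourceKey) in mappings)
            {
                source.TryGetValue(sourceKey, out var value); // straight copy, no JS engine involved
                result[target] = value;
            }
            return result;
        };
    }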

The result is pretty much the same, but we have to do so much less. If we can’t detect a pattern that we recognize, we just fall back to the JS engine again and get the previous behavior.

This is a pretty awesome thing to have, because if we detect a particular common pattern, we can optimize it quite easily. The code really lends itself to that approach now.

What about the performance? Well…

And this runs in 0.31 seconds on the 10K, 5K, 1K run.

  • 10K in 83 ms
  • 5K in 200 ms
  • 1K in 31 ms

So that is two orders of magnitude for the 10K run.

As an exercise, try taking this code and teaching it how to recognize simple expressions like + and –. The second projection example is meant to give you some idea about that.

Oh, and one last thought: we are actually biased against the optimization here. The Jint code produces results in a format that we still need to pull the data out of, and we aren’t accounting for that. The optimized code can generate the output already in the format we need it to be in (the difference between the Dictionary and the JsValue).

Optimizing select projections: Interlude, refactoring

time to read 2 min | 256 words

So we got an order of magnitude performance boost, without really doing much, to be honest. But the code is starting to look really ugly, and future optimizations are out until we fix it.

Since we don’t really care about the internal details, we can do two things in one move. First, we’ll clean up the code by introducing a delegate that will abstract the handling; second, this delegate will also allow us to handle caching of additional values simply by capturing state in the delegate.

Here is what the code looks like:

Note that we have it broken apart into distinct steps, and the ProjectViaJint method is self contained; any caching it needs is done internally. This is much needed refactoring if we want to be able to continue optimizing this code, and it runs at about the same speed as the previous version, so we didn’t hurt anything there.
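The shape of that refactoring might look roughly like the following sketch. It is not the post’s actual code; apart from ProjectViaJint, which the post names, the delegate and helper names are my own, and the unpacking of the JS result is simplified away.

    // Sketch: each projection becomes a self-contained delegate; the Jint-backed
    // implementation captures its engine (and any future caches) in the closure.
    using System.Collections.Concurrent;
    using System.Collections.Generic;

    delegate object Projector(Dictionary<string, object> source);

    static class Projections
    {
        static readonly ConcurrentDictionary<string, Projector> Cache =
            new ConcurrentDictionary<string, Projector>();

        public static Projector GetProjector(string projection)
        {
            // later posts plug optimized projectors in here; for now everything
            // falls through to the Jint-based implementation
            return Cache.GetOrAdd(projection, ProjectViaJint);
        }

        static Projector ProjectViaJint(string projection)
        {
            // engine creation and parsing happen once, captured by the closure
            // (a single engine per projection is not thread safe; see the pooling
            // note in Part II of this series)
            var engine = new Jint.Engine()
                .Execute("function project_doc() { return " + projection + "; }");

            return source =>
            {
                foreach (var kvp in source)
                    engine.SetValue(kvp.Key, kvp.Value); // expose the source fields to the script

                // unpacking the JsValue into the exact shape we need is elided
                return engine.Invoke("project_doc").ToObject();
            };
        }
    }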

With this in place, we can see how much better we can make things.

Before we go on, I want to emphasize that this code is actually making a lot of assumptions. The passed dictionary must always have the same set of fields, for example, otherwise we may see a value from a previous iteration. There is no null handling or error handling, we aren’t showing the unpacking of the values from the JS environment to our code, and plenty of other stuff is missing. That isn’t that important at this point, because I’m showing the progression of optimizations, rather than production code.

Optimizing select projections: Part III

time to read 2 min | 229 words

After improving the projection performance so much just by caching the engine, I got to thinking: we are also creating a new JS object to pass the arguments every single time. What would happen if we cached that as well?

Here is what I ended up with. Again, all of this is pretty brute force (mainly because I’m writing these posts while the baby is asleep, and I’m trying to get as many of them out before she wakes up):

I wouldn’t have expected this to be a dramatic boost, but we got:

And this runs in 0.57 seconds on the 10K, 5K, 1K run.

  • 10K in 413 ms
  • 5K in 110 ms
  • 1K in 45 ms

That is also a third of the cost that we just saved.

This is interesting, so it is worth another check. There are two other potentially expensive operations here: the invocation of the method and the sending of the arguments.

Trawling through the Jint code shows that we can remove some abstraction by using ICallable directly, and we can cache the array of arguments. This all leads to:
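Roughly, that change might look like the following sketch. This is an assumption-heavy illustration, not the post’s actual code: it assumes Jint exposes the compiled function via an ICallable with a Call(thisObject, arguments) method and that JsValue.FromObject is available; exact member names differ between Jint versions, and the handling of the source document is simplified.

    // Sketch: resolve the compiled function once as an ICallable and reuse a
    // single JsValue[] for the arguments instead of rebuilding it per call.
    var engine = new Jint.Engine()
        .Execute("function project_doc(src) { return " + projection + "; }");

    var projectDoc = (Jint.Native.ICallable)engine.GetValue("project_doc").AsObject();
    var args = new Jint.Native.JsValue[1];            // cached argument array

    object Project(Dictionary<string, object> source)
    {
        // only the wrapped value changes per call; the array itself is reused
        args[0] = Jint.Native.JsValue.FromObject(engine, source);
        var result = projectDoc.Call(Jint.Native.JsValue.Undefined, args);
        return result.ToObject();                     // unpacking is simplified
    }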

And this runs in 0.37 seconds on the 10K, 5K, 1K run.

  • 10K in 279 ms
  • 5K in 59  ms
  • 1K in 32 ms

And that is a wow. Because right now, without really doing much at all, we are already an order of magnitude faster than our first attempt, and we can do more.

Optimizing select projections: Part II

time to read 2 min | 236 words

In the previous post, I showed how we can handle select projections and set up a perf test. The initial implementation ran in about 4.4 seconds for our test, but it didn’t give any thought to performance.

Let us see if we can do better. The first thing to do is to avoid building the Jint engine and parsing the code all the time. The way we set things up, we wrap the actual object literal in a function, and there is no state, so we can reuse the previous engine instance without any issues.

That means that we don’t need to pay the cost of creating a new engine, parsing the code, etc. Here is what this looks like:

Note that the cache key here is the raw projection, not the function we send to the engine; this allows us to avoid any string allocations in the common (cached) path.
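A minimal sketch of that kind of caching (my own illustration, not the post’s code), assuming a concurrent dictionary keyed by the raw projection string, with the function wrapping and parsing done only on a cache miss:

    // Sketch: one Engine per distinct projection, keyed by the raw projection
    // string, so the common (cached) path allocates nothing new.
    // (requires using System.Collections.Concurrent)
    static readonly ConcurrentDictionary<string, Jint.Engine> EnginesByProjection =
        new ConcurrentDictionary<string, Jint.Engine>();

    static Jint.Engine GetEngine(string projection)
    {
        return EnginesByProjection.GetOrAdd(projection, p =>
        {
            // only on a miss do we build the wrapping function and parse the code
            var engine = new Jint.Engine();
            engine.Execute("function project_doc(src) { return " + p + "; }");
            return engine;
        });
    }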

And this runs in 0.75 seconds on the 10K, 5K, 1K run.

  • 10K in 574 ms
  • 5K in 137 ms
  • 1K in 51 ms

Just this small change boosted our performance by a major margin.

Note that because the engines are not thread safe, to use this in a real application we’ll need to ensure thread safety. The way I did that is to have a pool of these engines for each projection and just use that, so an engine is always accessed in single threaded mode.
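One way such a pool might be shaped (a sketch under my own assumptions, not the post’s implementation): each projection keeps a bag of engines, a caller takes one, uses it single threaded, and puts it back, and a new engine is only built when the bag is empty.

    // Sketch: a trivial per-projection engine pool so that any given engine is
    // only ever used by one thread at a time.
    // (requires using System.Collections.Concurrent and System.Collections.Generic)
    static readonly ConcurrentDictionary<string, ConcurrentBag<Jint.Engine>> Pools =
        new ConcurrentDictionary<string, ConcurrentBag<Jint.Engine>>();

    static object ProjectPooled(string projection, Dictionary<string, object> source)
    {
        var pool = Pools.GetOrAdd(projection, _ => new ConcurrentBag<Jint.Engine>());

        if (pool.TryTake(out var engine) == false)
        {
            // pool is empty: pay the engine setup and parsing cost for this one
            engine = new Jint.Engine()
                .Execute("function project_doc(src) { return " + projection + "; }");
        }

        try
        {
            // Invoke wraps the dictionary for the script; unpacking is simplified
            return engine.Invoke("project_doc", source).ToObject();
        }
        finally
        {
            pool.Add(engine); // hand the engine back for the next caller
        }
    }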

Optimizing select projections: Part I

time to read 2 min | 240 words

I spoke about designing for performance in my previous post and I thought it would make for an interesting series of blog posts. The task at hand: we have a data source of some kind, and in our case I decided to make it as simple as possible and define the source data as a Dictionary<string, object>, without allowing nesting of data.

We want the user to be able to provide a custom way to project data from that dictionary. Here are a few ways to do that:

And here is the data source in question:

Obviously the user can make the select projections as complex as they wish, but I think that these three samples give us a good representation of the complexity. We’ll also treat them a bit differently with regards to their likelihood. So when testing, we’ll check that the first projection is run 10,000 times, the second projection is run 5,000 times and the last projection is run 1,000 times.

With the layout of the land settled, let us see how we can get this actually working. I’m going to use Jint and solve the issue in a very brute force manner.

Here is the initial implementation:
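Roughly, a brute-force version along these lines captures the idea (a sketch with illustrative sample data, projection strings and names of my own, not the post’s actual code): a new engine is created, the projection is wrapped in a function and parsed, and the source values are pushed into the engine, on every single call.

    // Sketch only: sample data and projections are illustrative.
    using System.Collections.Generic;
    using Jint;

    class BruteForceProjection
    {
        static readonly Dictionary<string, object> Source = new Dictionary<string, object>
        {
            ["Name"] = "John",
            ["Surname"] = "Doe",
        };

        static object Project(string projection, Dictionary<string, object> source)
        {
            // everything here is paid on every single call: engine creation,
            // parsing the wrapped function, pushing the source values in
            var engine = new Engine();
            engine.Execute("function project_doc() { return " + projection + "; }");

            foreach (var kvp in source)
                engine.SetValue(kvp.Key, kvp.Value);

            return engine.Invoke("project_doc").ToObject();
        }

        static void Main()
        {
            for (var i = 0; i < 10_000; i++)
                Project("{ Name: Name }", Source);
            for (var i = 0; i < 5_000; i++)
                Project("{ FullName: Name + ' ' + Surname }", Source);
            for (var i = 0; i < 1_000; i++)
                Project("{ Name: Name, Upper: Surname.toUpperCase() }", Source);
        }
    }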

And this runs in 4.4 seconds on the 10K, 5K, 1K run.

  • 10K in 2,756 ms
  • 5K in 1,310 ms
  • 1K in 314 ms

I’m pretty sure that we can do better, and we’ll look at that in the next post.
