Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 6,524 | Comments: 47,985

filter by tags archive

Optimizing select projectionsInterlude, refactoring

time to read 2 min | 256 words

So we got an order of magnitude performance boost, without really doing much, to be honest. But the code is starting to look really ugly, and future optimizations are out until we fix it.

Since we don’t really care about the internal details, we can do two things in one move. First, we’ll cleanup the code by introducing a delegate that will abstract the handling and the second is that this delegate will also allow us to handle caching of additional values simple by way of capturing the delegate state.

Here is what the code looks like:

Note that we have it broken apart into distinct steps, and the ProjectViaJint method is self contained and any caching it needs is done internally. This is much needed refactoring if we want to be able to continue to optimize this code, and it runs in about the same speed as the previous version, so we didn’t hurt anything there.

With this in place, we can see how much better we can make things.

Before we go on, I want to emphasis that this code is actually making a lot of assumptions. The passed dictionary must always have the same set of fields, for example, otherwise we may see a value from a previous iteration. No null handling or error handling is in place, we aren’t showing the unpacking of the values from the JS env to our code, and plenty of other stuff. That isn’t that important at this point, because I’m showing the progression of optimizations, rather then production code.

Optimizing select projectionsPart III

time to read 2 min | 229 words

After optimizing the projection performance by so much by just caching the engine, I got to thinking, we are also creating a new JS object to pass the arguments every single time. What would happen if we’ll cache that?

Here is what I ended up with, again, all of this is pretty brute force (mainly because I’m writing these posts while the baby is asleep, and I’m trying to get as many of them out before she wakes up):

I wouldn’t have expected this to be a dramatic boost, but we got:

And this runs in 0.57 seconds on the 10K, 5K, 1K run.

  • 10K in 413 ms
  • 5K in 110 ms
  • 1K in 45 ms

That is also a third of the cost that we just saved.

This is interesting, so it is worth another check, there are other two potentially expensive operations here, the invocation of the method and the sending of arguments.

Trawling of the Jint code shows that we can remove some abstract by using ICallable directly, and we can cache the array of arguments, this all leads to:

And this runs in 0.37 seconds on the 10K, 5K, 1K run.

  • 10K in 279 ms
  • 5K in 59  ms
  • 1K in 32 ms

And that is a wow. Because right now, without really doing much at all, we area already an order of magnitude higher then our first attempt, and we can do more.

Optimizing select projectionsPart II

time to read 2 min | 236 words

In the previous post, I showed how we can handle select projections and setup a perf test. The initial implementation we had run in about 4.4 seconds for our test. But it didn’t give any thought to performance.

Let us see if we can do better. The first thing to do is to avoid building the Jint engine and parsing the code all the time. The way we set things up, we wrap the actual object literal in a function, and there is no state, so we can reuse the previous engine instance without any issues. ;:

That means that we don’t need to pay the cost of creating a new engine, parsing the code, etc. Here is what this looks like:

Note that the cache key here is the raw projection, not the function we send to the entire, this allows us to avoid any string allocations in the common (cached) path.

And this runs in 0.75 seconds on the 10K, 5K, 1K run.

  • 10K in 574 ms
  • 5K in 137 ms
  • 1K in 51 ms

Just this small change boosted our performance by a major margin.

Note that because the engines are not thread safe, to use that in a real application we’ll need to ensure thread safety. The way I did that is to have a pool of these engines for each projection and just use that, so an engine is always access in a single threaded mode.

Optimizing select projectionsPart I

time to read 2 min | 240 words

I spoke about designing for performance in my previous post and I thought it would be in interesting series  of blog posts. The task I have is that we have a data source of some kind, in our case, I decided to make the task as simple as possible and define the source data as a Dictionary<string, object>, without allowing nesting of data.

We want the user to be able to provide a custom way to project data from that dictionary. Here are a few ways to do that:

And here is the data source in question:

Obviously the user can make the select projections as complex as they wish. But I think that these three sample give us a good representation of the complexity. We’ll also treat them a bit differently, with regards to their likelihood. So when testing, we’ll check that the first projection is run 10,000 times, then second projection is run 5,000 times and the last projection is run 1,000 times.

With the layout of the land settled, let us see how we can get this actually working. I’m going to use Jint and solve the issue in a very brute force manner.

Here is the initial implementation:

And this runs in 4.4 seconds on the 10K, 5K, 1K run.

  • 10K in 2,756 ms
  • 5K in 1,310 ms
  • 1K in 314 ms

I’m pretty sure that we can do better, and we’ll look at that in the next post.

Future proofing for crazy performance optimizations

time to read 2 min | 306 words

Image result for javascript object literal

I talked about the new query language and javascript performance quite a bit, but I wanted post about a conversation that we had at the office. It involves the following design decision:

image

We chose to use an object literal syntax as the complex projection mechanism for several reasons. First, it is easy to implement right now, since we can throw that to the JS engine and let it handle this. Second, it is very readable and given that we are a JSON document database, should be very familiar for users. But the third reason isn’t so obvious. It is using this syntax because we can quite easily optimize most common scenarios very easily.

I can take the object literal and parse that into an AST, and instead of sending that to a JS engine, I can execute that directly. Even if I’m going to implement a trivial AST traverser that can the most basic of operations, that would be enough to handle a lot of scenarios, and I don’t have to be 100%. I can implement an AST walked and run that, and if I find that there is something there that I can’t support I can just fallback to the JS engine.

Since my AST interpreter wouldn’t need any of the JS engine behaviors, it can be implemented much more efficiently and with greater focus. For that matter, I don’t have to limit myself to just interpreting this AST. I can also generate IL code on the fly to evaluate this instead, given me an even higher speed boost.

Basically, this syntax give us at least three levels of optimization opportunities to explore as needed.

With performance, test, benchmark and be ready to back out

time to read 5 min | 979 words

image

Last week I spoke about our attempt to switch our internal JS engine to Jurassic from Jint. The primary motivation was speed, Jint is an interpreter, while Jurassic is compiled to IL and eventually machine code. That is a good thing from a performance standpoint, and the benchmarks we looked at, both external nd internal, told us that we could expect anything between twice as fast and ten times as fast. That was enough to convince me to go for it. I have a lot of plans for doing more with javascript, and if it can be fast enough, that would be just gravy.

So we did that, we took all the places in our code where we were doing something with Jint and moved them to Jurrasic. Of course, that isn’t nearly as simple as it sounds. We have a lot of places like that, and a lot of already existing code. But we also took the time to do this properly, of making sure that there is a single namespace that is aware of JS execution in RavenDB and hide that functionality from the rest of the code.

Now, one of the things that we do with the scripts we execute is expose to them both functions that they can call and documents to look at and mutate. Consider the following patch script:

this.NumberOfOrders++;

This is on a customer document that may be pretty big, as in, tens of KB or higher. We don’t want to have to serialize the whole document into the JS environment and then serialize it back, that road lead to a lot of allocations and extreme performance costs. No, already with Jint we have implemented a wrapper object that we expose to the JS environment that would do lazy evaluation of just the properties that were needed by the script and track all changes so we can reserialize things more easily.

Moving to Jurassic had broken all of that, so we have to re-write it all. The good thing is that we already knew what we wanted, and how we wanted to do it, it was just a matter for us to figure out how Jurassic allows it. There was an epic coding montage (see image on the right) and we got it all back into working state.

Along the way, we paid down a lot of technical debt  around consistency and exposure of operations and made it easier all around to work with JS from inside RavenDB. hate javascript.

After a lot of time, we had everything stable enough so we could now test RavenDB with the new JS engine. The results were… abysmal. I mean, truly and horribly so.

But that wasn’t right, we specifically sought a fast JS engine, and we did everything right. We cached the generated values, we reused instances, we didn’t serialize / deserialize too much and we had done our homework. We had benchmarks that showed very clearly that Jurassic was the faster engine. Feeling very stupid, we started looking at the before and after profiling results and everything became clear and I hate javascript.

Jurassic is the faster engine, if most of your work is actually done inside the script. But most of the time, the whole point of the script is to do very little and direct the rest of the code in what it is meant to do. That is where we actually pay the most costs. And in Jurassic, this is really expensive. Also, I hate javascript.

It was really painful, but after considering this for a while, we decided to switch back to Jint. We also upgraded to the latest release of Jint and got some really nice features in the box. One of the things that I’m excited about is that it has a ES6 parser, even if it doesn’t fully support all the ES6 features. In particular, I really want to look implementing arrow functions for jint, because it would be very cool for our usecase, but this will be later, because I hate javascript.

Instead of reverting the whole of our work, we decided to take this as a practice run of the refactoring to isolate us from the JS engine. The answer is that it was much easier to switch from Jurassic to Jint then it was the other way around, but it is still not a fun process. There is too much that we depend on. But this is a really great lesson in understanding what we are paying for. We got a great deal of refactoring done, and I’m much happier about both our internal API and the exposed interfaces that we give to users. This is going to be much easier to explain now. Oh, and I hate javascript.

I had to deal with three different javascript engines (two versions of Jint and Jurassic) at a pretty deep level. For example, one of the things we expose in our scripts is the notion of null propagation. So you can write code like this:

return user.Address.City.Street.Number.StopAlready;

And even if you have missing properties along the way, the result will be null, and not an error, like normal. That requires quite deep integration with the engine in question and lead to some tough questions about the meaning of the universe and the role of keyboards in our life versus the utility of punching bugs (not a typo).

The actual code we ended up with for this feature is pretty small and quite lovely, but getting  there was a job and a half, causing me to hate javascript, and I wasn’t that fond of it in the first place.

I’ll have the performance numbers comparing all three editions sometimes next week. In the meantime, I’m going to do something that isn’t javascript.

Solving a problem in a way that doesn’t generate dozens more

time to read 2 min | 393 words

Yesterday I showed off some of the new features in RQL, and in particular, two very cool features, the ability to declare functions and the ability to project an object literal directly from the select.

Both of these features are shown here, including usage demonstration:

image

Take a minute to read the query, then consider what we are doing here.

One of the things we want to enable is to have a powerful query capabilities. We started with what SQL syntax and very quickly hit the wall. Leaving aside the issue of working with just rectangular format of SQL, there is a huge problem here.

Something like N1QL or the Noise syntax have both seems that you can deal with non rectangular formats with enough effort. But the problem is that there is really no really good way to express complex imperative stuff easily. For example, N1QL has about 30 functions just to deal with arrays, and 40 just for dates. And those are just the first I looked up that were easy to count.

Going with something with a SQL syntax, including subqueries, complex nesting of expressions, etc would have led to a really complex language. Both from the point of view of implementation and from the point of view of the users.

On the other hand, with RQL, you get the familiar query language, and if you want to just get some data to display, you can do that. When you need to do more, you move to using the object literal syntax and if you need to apply more logic, you write a small function for that and call it.

Imagine write the code above in SQL (just the fact that the Nicks array is on a separate table means that you’ll have a correlated subquery there), and this is a pretty simple query. Choosing this approach means that we actually have very little expected support burden.

That is important, because with the other option I could absolutely see a lot of requests for “can you add a function to do X”, “how do I add my own function” popping up all the time in the support channels. This way, we give it all to you already and the solution is self contained and extensible as needed very easily.

Breaking the language barrier

time to read 5 min | 916 words

In RavenDB 4.0, we decided to finally bite the bullet and write our own query language. That led to a lot of really complex decisions that we had to make. I already posted about the language and you saw some first drafts. RQL is meant to be instantly familiar to anyone who used SQL before.

At the same time, this led to issues. In particular, the major issue is that SQL as a language is meant to handle rectangular tables, and RavenDB isn’t storing data in tables.  For that matter, we also want to give our users the ability to express themselves fully in the query language, that means support for complex queries and complex projections.

For a while, we explored the option of supporting nested selects as the way to express these semantics, but that was pretty horrible, both in terms of the complexity of the implementation and in terms of the complexity of the language. Instead, we decided that take only the good parts out of SQL Smile.

What do I mean by that? Well, here is a pretty simple query:

image

And here is how we can ask for specific fields:

image

You’ll note that we moved the select clause to the end of the query. The idea being that the syntax will hint to the user about the order of operations in the pipeline.

Next we add some filtering, giving us:

image

This isn’t really interesting except for the part where implementing this was a lot of fun. Things become both more interesting and more obvious when we look at a full query, with almost everything there:

image

And again, this is pretty boring, because except for the clauses location, you might as well write SQL. So let us see what happens when we start mixing some of RavenDB’s own operations.

Here is how we can traverse the object graph on the database side:

image

This is an interesting example, because you can see that we have traversed the path of the relation and are able to fetch not just the data from the documents we are looking for, but also the related documents.

It is important to note that this happens after the where, so you can’t filter on that (but you can plug this in the index, but that is a story for another time). I like this syntax, because it is very clear about what is going on, but at the same time, it is also very limited. If I want anything that isn’t rectangular, I need to jump through hops. Instead, let us try something else…

image

The idea is that we can use a JavaScript object literal in place of the select. Instead of fighting with the language and have to fight it, we have a very natural way to express projections from RavenDB.

The cool part is that you can apply logic as well, so things like string concatenation or logic in the select clause are absolutely permitted. However, take a look at the example, we have code duplication here, in the formatting for customer and contact names. We can do something about it, though:

image


The idea is that we also have a way to define functions to hold common logic and handle the more complex details if we need to. In this manner, instead of having to define and deploy transformers, you can define that directly in your query.

Naturally, if you want to calculate the Mandelbrot set inside the query, that is… suspect, but the idea is that having JavaScript applied on the results of the query give us a lot more freedom and easy of use.

The actual client side in C# is going to remain unchanged and was already updated to talk to the new backend, but in our dynamic language clients, I think we’ll work to expose this more, since it is a much better fit in such environments.

Finally, here is the full query, with everything that we can do:

image

Don’t focus too much the actual content of the queries, they are mostly there to show off the syntax. The last query now has the notion of include directly in the query. As an aside, you can now use load and include to handle tertiary includes. I didn’t actually set out to build them, but they came naturally from the syntax, and they are probably going to stay.

I have a lot more to say about this, but I’m so excited about this feature that I just got to get it out there. And yes, as you can see, you can declare several functions, and you can also call them from one another so there is a lot of power at the table right now.

Optimizing JavaScript and solving the halting problemPart II

time to read 3 min | 544 words

In the previous post I talked about the issues we had with wanting to run untrusted code and wanting to use Jurassic to do so. The problem is that when the IL code is generated, it is then JITed and run on the machine directly, we have no control over what it is doing. And that is a pretty scary thing to have. A simple mistake causing an infinite loop is going to take down a thread, which requires us to Abort it, which means that we are now in a funny state. I don’t like funny states in production systems.

So we were stuck, and then I realized that Jurrasic is actually generating IL for the scripts. If it generates the IL, maybe we can put something in the IL? Indeed we can, here is the relevant PR for Jurrasic code. Here is the most important piece of code here:

The key here is that we can provide Jurrasic with a delegate that will be called at specific locations in the code. Basically, we do that whenever you evaluate a loop condition or just before you jump using goto. That means that our code can be called, which means that we can then check whatever limits such as time / number of iterations / memory used have been exceeded and abort.

Now, the code is written in a pretty funny way. Instead of invoking the delegate we gave to the engine, we are actually embedding the invocation directly in each loop prolog. Why do we do that? Calling a delegate is something that has a fair amount of cost when you are doing that a lot. There is a bit of indirection and pointer chasing that you must do.

The code above, however, is actually statically embedding the call to our methods (providing the instance pointer if needed). If we are careful, we can take advantage of that. For example, if we mark our method with aggressive inlining, that will be taken into account. That means that the JavaScript code will turn into IL and then into machine code and our watchdog routine will be there as well. Here is an example of such a method:

We register the watchdog method with the engine. You’ll notice that the code smells funny. Why do we have a watchdog and then a stage 2 watchdog? The reason for that is that we want to inline the watchdog method, so we want it to be small. In the case above, here are the assembly instructions that will be added to a loop prolog.

These are 6 machine instructions that are going to have great branch prediction and in general be extremely lightweight. Once in a while we’ll do a more expensive check. For example, we can check the time or use GC.GetAllocatedBytesForCurrentThread() to see that we didn’t allocate too much. The stage 2 check is still expected to be fast, but by making it only rarely, it means that we can actually afford to do this in each and every place and we don’t have to worry about the performance of that.

This means that our road is opened to now move to Jurassic fully, since the major limiting factor, trusting the scripts, is now handled.

Optimizing JavaScript and solving the halting problemPart I

time to read 3 min | 537 words

RavenDB is a JSON document database, and the natural way to process such documents is with JavaScript. Indeed, there is quite a lot of usage of JS scripts inside RavenDB. They are used for ETL, Subscription filtering, patching, resolving conflicts, managing administration tasks and probably other stuff that I’m forgetting.

The key problem that we have here is that some of the places where we need to call out to JavaScript are called a lot. A good example would be a request to patch request on a query. The result can be a single document modified of 50 millions. If this is the later case, given our current performance profile, it turns out that the cost of evaluating JavaScript is killing our performance.

We are using Jint as the environment that runs our scripts. It works, it is easy to understand and debug and it is an interpreter. That means that it is more focused on correctness then performance. Over the years, we were able to extend it a bit to do all sort of evil things to our scripts ,but the most important thing is that it isn’t actually executing machine code directly, it is always Jint code running and handling everything.

Why is that important? Well, these scripts that we are talking about can be quite evil. I have seen anything from 300 KB of script (yes, that is 300 KB to modify a document that was considerably smaller) to just plain O(N^6) algorithms (document has 6 collections iterate on each of them while iterating on each of the others). These are just the complex ones. The evil ones do things like this:

We have extended Jint to count the number of operations are abort after a certain limit has passed as well as prevent stack overflow attacks. This means that it is much easier to deal with such things. They just gonna run for a while and then abort.

Of course , there is the perf issue of running an interpreter. We turned out eyes to Jurrasic, which took a very different approach, it generate IL on the fly and then execute it. That means that as far as we are concerned, most operations are going to end up running as fast as the rest of our code. Indeed, benchmarks show that Jurrasic is significantly faster (as in, order of magnitude or more). In our own scenario, we saw anything from simply doubling the speed to order of magnitude performance improvement.

However, that doesn’t help very much. The way Jurrasic works, code like the one above is going to hang or die. In fact, Jurassic documentation calls this out explicitly as an issue and recommend dedicating a thread or a thread pool for this and calling Thread.Abort if need. That is not acceptable for us. Unfortunately, trying to fix this in a smart way take us to the halting problem, and I think we already solved that mess.

This limiting issue was the reason why we kept using Jint for a long time. Today we finally had a breakthrough an were able to fix this issue. Here is the PR, but it tells only part of the story, the rest I’ll leave for tomorrow’s post.

FUTURE POSTS

  1. Queries++ in RavenDB: I suggest you can do better - 7 hours from now
  2. Queries++ in RavenDB: Spatial searches - 3 days from now
  3. PR Review: The simple stuff will trip you - 4 days from now
  4. The married couple component design pattern - 5 days from now
  5. Distributed computing fallacies: There is one administrator - 6 days from now

There are posts all the way to Dec 21, 2017

RECENT SERIES

  1. PR Review (9):
    08 Nov 2017 - Encapsulation stops at the assembly boundary
  2. Queries++ in RavenDB (4):
    11 Dec 2017 - Gimme more like this
  3. Production postmortem (21):
    06 Dec 2017 - data corruption, a view from INSIDE the sausage
  4. API Design (9):
    04 Dec 2017 - The lack of a method was intentional forethought
  5. The best features are the ones you never knew were there (5):
    27 Nov 2017 - You can’t do everything
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats