Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


PR Review: avoid too many parameters

time to read 2 min | 228 words

During code review I ran into these two sections, which raised a flag. Can you tell why?

image

image

The problem with this type of code is twofold. First, we add optional parameters to reduce the number of breaking changes we have. The problem with that is that we already have parameters on the call, and eventually you’ll get to something like this:

image

Which is the queen of optional-parameter methods, and you can probably guess how it looks internally.

In the first case, we can add the new optional parameter to the… options variable that we are already sending to this method. This way, we don’t have to worry about breaking changes, and we already have a way to set up options, determine defaults, etc.

In the second case, we are passing two bools to the method, and there isn’t a preexisting parameters object. Instead of creating one, we can use a Flags enum, whose bits we can set to determine what exactly the behavior of this method should be. That is generally much easier to maintain in the long run.
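
For illustration, here is a minimal sketch of the flags idea, shown in Rust with hypothetical names (the reviewed code itself is C#, where this would be a [Flags] enum; treat this as a sketch of the shape, not the actual change):

#[derive(Clone, Copy)]
struct RestoreFlags(u32);

impl RestoreFlags {
    const NONE: RestoreFlags = RestoreFlags(0);
    const VERIFY: RestoreFlags = RestoreFlags(1 << 0);
    const OVERWRITE: RestoreFlags = RestoreFlags(1 << 1);

    fn contains(self, other: RestoreFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl std::ops::BitOr for RestoreFlags {
    type Output = RestoreFlags;
    fn bitor(self, rhs: RestoreFlags) -> RestoreFlags {
        RestoreFlags(self.0 | rhs.0)
    }
}

// Instead of restore(path, verify: bool, overwrite: bool), adding a new
// behavior later means adding a bit, not yet another parameter.
fn restore(path: &str, flags: RestoreFlags) {
    if flags.contains(RestoreFlags::VERIFY) { /* verify the backup */ }
    if flags.contains(RestoreFlags::OVERWRITE) { /* allow overwriting */ }
    let _ = path;
}

fn main() {
    restore("backups/latest", RestoreFlags::VERIFY | RestoreFlags::OVERWRITE);
    restore("backups/latest", RestoreFlags::NONE);
}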

PR Review: the errors should be nurtured

time to read 2 min | 202 words

I ran into the following in a PR I recently reviewed. Take a look and try to figure out why I’m pointing these bits out:

image

image

Those are bad errors, in the sense that they are hiding information. Sure, in 90% of the cases the user just put in the backup location or the restore location, so they know what the application is referring to. But the other 10% is looking at the exception in the logs, handling an “I got an error” support call, or all sorts of weird stuff.

In particular, one of the most annoying things is when the user and the application disagree on a relative path. If the user believes that the relative path is based on one location and the application another, hilarity ensues. But not in a fun way.

A better way to prevent all of those would be to just include the actual locations (fully resolved, mind) in the error. It doesn’t prevent support calls, but it does make them a lot easier to solve.
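
As a sketch of what I mean (a hypothetical function, shown in Rust rather than the C# under review), resolving the location up front makes a relative path disagreement show up directly in the error message:

use std::env;
use std::fs;
use std::path::PathBuf;

// Hypothetical example: read a backup location and, on failure, report both
// the path the user gave us and the fully resolved path we actually tried.
fn open_backup_dir(path: &str) -> Result<fs::ReadDir, String> {
    let full: PathBuf = env::current_dir()
        .map(|cwd| cwd.join(path))
        .unwrap_or_else(|_| PathBuf::from(path));
    fs::read_dir(&full).map_err(|e| {
        format!(
            "Could not read backup location '{}' (resolved to '{}'): {}",
            path,
            full.display(),
            e
        )
    })
}

fn main() {
    if let Err(msg) = open_backup_dir("backups/2017-06-22") {
        eprintln!("{}", msg);
    }
}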

Reviewing Noise Search Engine: Summary

time to read 5 min | 835 words

After the last three parts, I think that I have a good high level understanding of what is going on in the Noise codebase. As a reminder, I’m reviewing 48d808375e541eca383826ea928272976bce064d.

I love reading code, especially stuff that I’m not familiar with, such as a new platform or something novel. The Noise codebase is quite interesting, and the project is at a very early stage. That said, when looking at what it is doing and comparing it to what I’m doing in my day job, a few things popped up.

I couldn’t help but think that this is a very similar model to the default Azure DocumentDB. All the data is indexed, and the values are shredded as they go in. The model you use is very similar. The nice thing about this kind of model is that it gives you a lot of freedom and flexibility, but the downside is that if you need more, there aren’t any alternatives. Storing large JSON documents inside Noise is likely to cause problems, especially if you typically work with the full document. If you are just throwing things into it and then pulling back pieces, especially if the documents you write are relatively small, it is likely to be much better suited.

Noise is making some clever usage of the LSM nature of RocksDB to do cleanups, and I really like the elegance of the indexing engine. The query model is also interesting, in that it gives you a lot of freedom to operate. Select the right fields, define the appropriate aggregation, etc. But queries are limited (no computation), and aggregation queries are going to run at a cost of O(N) with respect to their results. Given that this is intended to be embedded, and apparently into low end devices, that might very well be the right decision.

A side effect of the shredding is that you need to reconstruct documents all the time, since they were broken into their component parts. Things like scanning results by one field while returning another look like they can generate O(N log N) queries in RocksDB. I’m not sure how much that can scale with regards to queries. On the one hand, there are very clear layers between the pieces of the code, which is great, but that also has some built-in issues with regards to how much performance you can squeeze out of the system. I don’t have a feeling for Rust code performance, nor do I think that the project is ready at this early stage for performance analysis, but it is a concern. Then again, I did just spend the past two years head down in major performance work, so that might very well be the reason. And it is entirely possible that the code will perform significantly better when the project is done.

One last thing to note: when I tried building a replacement for Lucene, one of the annoying things to pop up was that it is really hard to replace Lucene efficiently and still be better in terms of time / space / complexity. Noise in particular is storing all the data at least twice (once as terms to be searched, once as the actual document data). This makes it very easy to do certain operations (such as updating a document), but it also means that the cost of updating a document is now doubled. We first need to index the new document, then we need to re-index the old document, compare between the two and then write the changes. I guess that it is possible that reducing the amount of work to be written would help matters, but the cost of indexing is likely to dominate.

I would expect a Noise index to be significantly larger than the Lucene equivalent, because Lucene does a lot to reduce file size (in quite evil & nasty ways, to be fair).

Another thing that I don’t think is good is the reliance on a query language in an embedded solution. A much better idea, in my opinion, would be to expose the raw documents and let the client handle them. If it is running in the same memory space, there is no extra cost. As it stands, the query language opens up a huge number of features / issues. I’m talking specifically about the notion of “return” here. The idea is really cool: you can project exactly what you want from the document, and that allows the engine to only pull the relevant data.

At any rate, it is getting really late now, so I’ll close with the note that Noise is a really cool codebase to read. Even if you don’t know Rust, going ahead and reading something non-trivial is a really good exercise in learning new things. And I’m really interested in what Noise is going to do next.

Reviewing Noise Search Engine: Part III

time to read 5 min | 837 words

After the previous two parts, I think that I have a good high level understanding of what is going on in the Noise codebase. As a reminder, I’m reviewing 48d808375e541eca383826ea928272976bce064d.

Now I’m reading through the JsonValue file. The first interesting thing here is:

image

Given all the code I have read so far, I think that this is a really elegant way to handle JSON, both in terms of how you can define such a recursive structure in 7 lines of code and in how this allows you to express relatively complex scenarios quite nicely.
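
To give a sense of what such a definition looks like, here is a sketch of a recursive JSON enum (an assumption on my part; the actual Noise JsonValue may differ in variant names and details):

#[allow(dead_code)]
enum JsonValue {
    Null,
    True,
    False,
    Number(f64),
    String(String),
    Array(Vec<JsonValue>),
    Object(Vec<(String, JsonValue)>),
}

fn main() {
    let doc = JsonValue::Object(vec![
        ("name".to_string(), JsonValue::String("John".to_string())),
        ("age".to_string(), JsonValue::Number(31.0)),
    ]);
    let _ = doc;
}

If the object is stored as an ordered list of pairs, as sketched here, that also ties into the property order comparison issue further down.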

One thing that bugs me is that there are a lot of things like this:

image

This is done to have enough space for escaping. The problem is that this wastes quite a bit of space. The last few blog posts I had were about 2KB of text each. Imagine doing a batch of a few thousand of them; just these types of calls can take up a lot of extra memory. Now, I’m used to managed memory, where this kind of thing hangs around and needs to be collected. This code is in Rust, and it will clean itself up as soon as it is out of scope. But given that we hold on to it for a non-trivial amount of time, I think that it is a concern. It might actually be cheaper to go over the string multiple times to figure out exactly how many escape chars we need.
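
Roughly, the pattern in question and the alternative look like this (a sketch, not the actual Noise code; I’m assuming the usual quote / backslash escaping):

// Reserve the worst case up front: every character might need an escape.
fn escape_pessimistic(s: &str) -> String {
    let mut out = String::with_capacity(s.len() * 2);
    for c in s.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

// The alternative: one extra pass to count escapes, then allocate exactly.
fn escape_exact(s: &str) -> String {
    let extra = s.chars().filter(|&c| c == '"' || c == '\\').count();
    let mut out = String::with_capacity(s.len() + extra);
    for c in s.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

fn main() {
    let s = "a \"quoted\" value";
    assert_eq!(escape_pessimistic(s), escape_exact(s));
}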

One of the first optimizations we did when we wrote our own JSON parser was to get rid of the costly escape char handling. It took literally more time than anything else, because we kept having to scan the string and break it apart over and over again.

Another issue that popped up is here:

image

A silent assumption here is that the order of properties in the two objects is the same. Two identical objects that have different property order will compare as unequal. That is likely to be a surprise to the user.
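
A tiny illustration of the concern (not the Noise comparison code itself): if objects compare as ordered lists of key / value pairs, two logically identical documents with different property order come out unequal.

fn main() {
    // The same logical object, serialized with a different property order.
    let a = vec![("name", "John"), ("age", "31")];
    let b = vec![("age", "31"), ("name", "John")];
    assert_ne!(a, b); // ordered comparison says they differ
}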

The next file is parser.rs, which is over 1,300 lines of code (almost 20% of the entire codebase of Noise!). This is pretty typical parsing code: low level, nasty and quite tedious. I skimmed it briefly, but I don’t think that there is anything really important going on there. Its main purpose is to take a query string and produce a filter, which is up next.

The next file is query.rs, and I’m going to assume that this will lead me to also jump to filter.rs, which I previously skipped.

The code here starts using terms that I’m familiar with from Lucene. We’ll start where the money is, in Query::get_matches, which starts by building the query itself:

image

Basically, create a snapshot of the index, parse the query. We need to pass the snapshot to the parser so it can pass it on to the filters that it builds internally. The order of operations in this code also determines the required order of appearances of the clauses in the Noise query syntax.

The rest of the method goes over the query and builds it, setting up ordering, aggregation and such. There is close cooperation between query.rs, filter.rs and snapshot.rs. The query is the high level concept of everything that must happen; the filter is a much lower level one, just finding the right stuff. On top of that, the query layer handles boosting, scoring, ordering and aggregation. The filter uses the snapshot to access the stored information.

For example, the AllDocsFilter is using the snapshot.new_all_docs_iterator, which is defined as:

image

Basically, scan all the values with the prefix of S, which I think stands for sequence number.

More complex stuff can be seen in the Scorer, as a good example. We have terms prefixed with C, which I think stands for Count, which gives the number of times that a particular term appears in a particular position. And K is the number of times a particular property has shown up in general. This seems to be the basis of the tf-idf calculation. Those are merged together using the RocksDB merge command and the custom merge policy that we looked at in index.rs.
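
For reference, this is the classic shape of the calculation those counters would feed into (my reading of the intent; the exact formula Noise uses may well differ):

// term_frequency: how often the term appears in the field being scored.
// total_docs / docs_with_term: corpus-wide counts used for the idf part.
fn tf_idf(term_frequency: f64, total_docs: f64, docs_with_term: f64) -> f64 {
    let tf = term_frequency.sqrt();
    let idf = 1.0 + (total_docs / (docs_with_term + 1.0)).ln();
    tf * idf
}

fn main() {
    // A term appearing twice, in a corpus of 1,000 docs where 10 docs contain it.
    println!("{:.3}", tf_idf(2.0, 1000.0, 10.0));
}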

At this point I looked at the clock and realized that it is now after midnight and that I have gotten to the most interesting parts that I wanted to see. The rest are pretty familiar details, since a lot of this follows the Lucene construction.

I’m going to have another blog post about my thoughts about this project, which will show up tomorrow.

Reviewing Noise Search Engine: Part II

time to read 6 min | 1169 words

In my previous post, I started going over the Noise project code. We got to the indexer, but there is still a lot to do. As a reminder, I’m reviewing 48d808375e541eca383826ea928272976bce064d.

The indexer code handed off a lot of work to the shredder, so it is a good thing that this is my next target for review. At a glance, it looks like Noise is using rustc_serialize to do the actual JSON parsing. I’m guessing it makes sense at this point, but given that RavenDB 4.0 rolled its own JSON parser, and given that Noise is accepting the data as strings, I’m going to assume that JSON parsing costs are going to be high on any profiler output.

The first thing to see in the file is the shredder struct:

image

The existing_key_value_to_delete is very interesting, since it implies that this is how deletes & updates are handled.

It looks like this is working really closely with the key builder. The first method in the file is:

image

There are a whole bunch of stuff like that, with roughly the same pattern (add_bool_null_entries, add_stemmed_entries, etc). Because of the close cooperation of those two, I’m going to be reading them in parallel. The KeyBuilder struct looks like:

image

At this point, I’m guessing that what is going on is that the shredder goes over the JSON document and adds the property path to the KeyBuilder. But given add_number_entries, I don’t see how that would work. We are adding the document sequence number to the end of the key path, and storing the value of the property in the value. That is… strange. The KeyBuilder data contains the property paths and the current position in the array (possibly nested), so that much is clear; its usage, not so much at this stage.

Because this is confusing, I’ll start from the shred function and go from there, to see what exactly is going on.

image

This takes the string, passes it to the JSON parser and starts iterating over it. Okay, going over the code, it seems easier to follow now. This parses the JSON one token at a time. When it encounters a property, it adds it to the KeyBuilder. So the KeyBuilder is basically the path to the current value in the document that we are parsing.

This bugs me, though:

image

I’m following what is going on, and it makes perfect sense. But all those special key prefixes that are scattered around are not documented anywhere that I could see, and as I mentioned previously, single letter prefixes are hard on readability and I don’t think they add much, given that RocksDB is using prefix compression anyway.

There is already a PR to fix this, which is likely to be applied by the time you are reading this blog post, by the way. It changes the prefixes to constants, improving code readability.

A lot of the functionality seems to be inside maybe_add_value; see the call sites:

image

It looks like all the numbers are recorded as floats (given that JSON uses double precision IEEE floating point, I guess that makes sense). You can see the different prefixes that are given to the values.

The tests at the bottom of the file complete the picture, and if I understand this properly, given the following JSON:

{ "name":"John", "age":31, "city":"New York", “hobbies”: [“reading”, “dogs”] }

The shredder is going to output the following entries, given a document sequence of 555:

  • W.age!f31#555,
  • W.hobbies$!dogs#555,1
  • W.hobbies$!reading#555,0
  • W.name!john#555,

I think that the code calls this a key path, and the general format is (white space added):

W <property path> ! <value> # <document sequence> , <array path>

A critical observation is that an array is denoted as a $ in the property path, and there are as many indexes in the array path as there are $ signs in the property path. I’m currently ignoring the full text search aspect of things, since it is not important for our purposes. With the understanding of what is being generated, let’s move on to seeing what is actually going on.
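
To make that concrete, here is a hypothetical helper that produces keys in the shape listed above (the real work is done by Noise’s KeyBuilder, and the actual escaping rules are certainly more involved):

fn build_key(property_path: &str, value: &str, seq: u64, array_path: &[usize]) -> String {
    let array_part = array_path
        .iter()
        .map(|i| i.to_string())
        .collect::<Vec<_>>()
        .join(",");
    format!("W{}!{}#{},{}", property_path, value, seq, array_part)
}

fn main() {
    // Matches the shredder output shown above for document sequence 555.
    assert_eq!(build_key(".hobbies$", "reading", 555, &[0]), "W.hobbies$!reading#555,0");
    assert_eq!(build_key(".name", "john", 555, &[]), "W.name!john#555,");
}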

Let us go back to merge_existing_doc and see how that works. This is called after we have shredded a document, so we have done all of the interesting work with regards to parsing the document and building the tokens, splitting the data, etc. At this point, we check if we already have this document, and if so, we load all the existing values and compare them to the current version. The nice thing is that we get to optimize: if a certain value didn’t change, we don’t need to do anything at all. On the other hand, if a value was there and is no longer in the current version, we need to delete it. That is effectively how updates are handled.

With deletes, we do the same, but in reverse: we reconstruct the document, find all the indexed values, and then remove them. Finally, we have add_all_to_batch, which takes all the in memory work that was done and applies it to the RocksDB batch, pretty much finishing all the work. I don’t like that some methods hold on to state (shredded_key_values / existing_key_value_to_delete) and then flush it, while others manipulate the RocksDB batch directly; it seems like there should be an either / or kind of approach.

Coming back to the issue that I had with the compaction routine, we have this code, which adds a number to the entries:

image

Now, if I’m reading things properly, if the value is zero (not uncommon by any means), that will result in 8 bytes of zeros being written as the value. Combine this with this:

image

And now I think that this will cause that value to be removed during compaction, resulting in data corruption. No idea if this is actually the case, since I’m just going over the code, and not running / testing it thoroughly.

It turns out that I’m not reading this properly (pun intended). I missed the negation on the first line of compaction_filter. The C & K values are the only things that will be removed during compaction, not the other way around, so there is no data corruption. Just a reminder that there is a reason I use “== false” instead of “!”: I keep missing them.
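
To summarize my corrected reading of the filter as a sketch (this is a paraphrase, not the actual Noise code):

// Keys that do not carry the C / K counters always survive compaction.
// Counter entries survive only while their merged value is still non-zero.
fn keep_during_compaction(key: &[u8], value: &[u8]) -> bool {
    let is_counter = key.starts_with(b"C") || key.starts_with(b"K");
    if !is_counter {
        return true;
    }
    value.iter().any(|&b| b != 0)
}

fn main() {
    assert!(keep_during_compaction(b"HDB", &1u32.to_le_bytes()));
    assert!(!keep_during_compaction(b"Cname!john", &0u32.to_le_bytes()));
}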

Reviewing Noise Search Engine: Part I

time to read 11 min | 2097 words

I ran across this post by Damien Katz and was immediately intrigued. That has several reasons. The last time I did a deep review of Damien’s code was about a decade ago, when I went over the CouchDB source code in an attempt to learn Erlang. What ended up happening was that I wrote RavenDB.

Note: This is pretty much my thoughts as I’m going through the code, it isn’t meant to be a guide or anything except a record of what went through my head as I read the code.

Another reason is that this is obviously a subject matter that is near & dear to my heart. Searching is something that we do all the time, and I would dearly love to read some code that isn’t Lucene that does it. Finally, the code is written in Rust, which I have a passing interest in, so I consider this a win all around. You can find the project site here and the code is here. I’m reviewing 48d808375e541eca383826ea928272976bce064d.

Some things are obvious from the get go, just by skimming through the docs. This is a small project, less than 10,000 lines of code. That is great, since it means that I can review it in one sitting. I also don’t really care that much for the parser and format that they use; I’m mostly interested in how they are storing and searching the data. The storage is done via RocksDB, which is interesting, because it imposes certain limitations on the solution (in particular, a single key space), so it is interesting to see how the data is stored. Most importantly, I really want to see how this is handling updates, which is what I find to be the hardest problem for a search impl.

As usual, I’m going to start reading code in file name order, without any relation to the current structure of the code, and jump around from there.

In aggregates.rs, we find this piece of code, which allows running an aggregation. The init / action / extract functions are the set of operations that happen in the aggregation process.

image

Right now I don’t know what JsonValue is, but it is interesting to read just from the signatures. The action function can get a reference to a mutable instance, an immutable instance and a reference to an immutable instance. I’m assuming that the mutable instance is the one that is being modified to hold the aggregated value.

The aggregation itself is implemented by defining an enum that has a get_fun_impls method, which initializes the AggregateFunImpls based on the specific aggregation value required.

Sum is defined as two functions, sum_init and sum, without any final step. The actual implementation is:

image

That is fun to read, because it is pretty simple, although the syntax in the if let line is really confusing to me (first time in the code, non Rust dev). Basically, it says that we’ll take the JsonValue instance, convert it from a JsonValue to a number instance, and then bind it to the same variable name as the original one.

If I was writing it in C it would probably look like:

JsonValue* existing_value = …;
int* existing = extract_number(existing_value);
*existing += new;

That is leaving aside the fact that the “new” that is used here and the “new” in the function arguments are not the same (well, they are conceptually, but not the same type, etc). I find this confusing, but I know that this is fairly common in Rust code.
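
To make the shadowing concrete, here is the idiom in isolation (a minimal sketch with hypothetical types, not the actual Noise code):

#[allow(dead_code)]
enum Json {
    Num(f64),
    Text(String),
}

// `if let` both checks that we have a number and rebinds the name `existing`
// to a mutable reference to the f64 inside it, shadowing the outer binding.
fn sum(existing: &mut Json, new: f64) {
    if let Json::Num(existing) = existing {
        *existing += new;
    }
}

fn main() {
    let mut total = Json::Num(1.0);
    sum(&mut total, 2.0);
    if let Json::Num(n) = total {
        assert_eq!(n, 3.0);
    }
}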

The rest of the file is pretty similar constructs for the other aggregation operations, and the next file is error.rs, which is just boilerplate code that Rust requires from anything. The next file is filters.rs, which seems really interesting, but it doesn’t make sense at the current stage of the code, because it refers to a lot of other things that I haven’t checked yet, so I’m moving on to the next one, index.rs. Hopefully that will start the juicy bits.

Indeed, the interesting bits start when we call open, where we open up a rocksdb instance and start playing around. There is some interesting error handling here:

 

image

In particular, note that if the open failed because the database didn’t exist, it is created, and a header version is added. I wholeheartedly approve, because such early planning will pay dividends down the road. It is important to note that assert statements in Rust are going to be there in release builds as well; only debug_assert will be removed in release builds.

The index has the ability to create a snapshot, based on the RocksDB Snapshots. I wrote a blog post on how that works on LevelDB (which RocksDB is based on).

image

I do wonder about the unwrap call here. If I understand correctly, this would cause a panic if I access an instance that isn’t open. That might be bad if you have a query hitting the server during shutdown, I’m guessing. I assume that this is an oversight, done because it makes the code easier to consume.

The add method is where stuff really starts to happen.

image

This is quite interesting, because you pass it a json string, and a batch (a Noise concept that basically wraps a RocksDB batch).

Actually, thinking about this, there is something that might be trippy here. RocksDB allows concurrent batches, but while it has support for optimistic concurrency, I’m guessing (at this point it is a guess) that this isn’t used by Noise, and that can lead to severe problems if you are trying to add to the index from concurrent threads. This is important because parallel indexing is pretty much the first optimization step that anyone will try, and Lucene calls it out explicitly as a best practice.

The JSON is parsed by a class called shredder; I’m assuming that this is the JSON parser used by Noise.

The actual behavior is quite interesting.

image

We create a new shredder, then set it on the JSON string. The “let … if let” syntax gave me pause for a bit. It is elegant, in a crazy sort of way.

But the really interesting bit here is that this code says quite a lot about the way Noise works. First, we now know that Noise has stable document ids, since we can see them being used when we check if the document is already in the index. That means that there is update support, which is pretty cool. Although I do wonder why the values are merged back in. Is it for deletion, or are they just creating a large piece?

If there is no id, Noise uses a guid. I would usually call out that this might be a problem (because down the line, given enough work, it will create a lot of compaction work around the files), but I’m suspecting that not specifying an id is going to be a pretty rare occurrence, so that isn’t important.

Finally, we add it to the batch:

image

That means that the shredder is probably a very important piece in the system. I’ll continue to read the index code for now and see what I can figure out, before jumping to that code.

Next, we have delete, and the key part of that method is:

image

So we have delete support, and comparing that to the update process is very interesting. They are different, which means that updates aren’t just delete/insert. That is interesting.

Next we have gather_doc_fields, which gets a string document id. What happens from there is interesting. Let us assume that our document id is “users/1”.

Noise searches RocksDB for a value named “Iusers/1” <- note the I prefix, which I assume stands for id. That happens in fetch_seq. If there is such a key, its value will be a string representation of an int64, which is parsed and returned. That seq is the internal sequence number, I’m assuming, and I’m guessing it is used for most operations. Let us assume that the seq number returned is 1234.

gather_doc_fields then does another RocksDB search, for “V1234#”. This is done via a KeyBuilder, which Noise is using to generate keys based on the depth of the search, I’m guessing. I haven’t really looked into that yet.
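
My reading of that flow, as a sketch over an ordered key / value store (hypothetical code, with a BTreeMap standing in for the RocksDB calls):

use std::collections::BTreeMap;

// 1. "I<doc id>" maps to the stringified internal sequence number.
// 2. Everything under the "V<seq>#" prefix holds the stored fields of that document.
fn gather_doc_fields(store: &BTreeMap<String, String>, doc_id: &str) -> Vec<(String, String)> {
    let seq: u64 = match store.get(&format!("I{}", doc_id)) {
        Some(s) => s.parse().expect("sequence numbers are stored as stringified int64"),
        None => return Vec::new(),
    };
    let prefix = format!("V{}#", seq);
    store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(prefix.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    store.insert("Iusers/1".to_string(), "1234".to_string());
    store.insert("V1234#name".to_string(), "John".to_string());
    store.insert("V1234#age".to_string(), "31".to_string());
    assert_eq!(gather_doc_fields(&store, "users/1").len(), 2);
}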

The code to collect the keys for a document sent me on a bit of a wild goose chase, in the sense that I couldn’t figure out what was going on here:

image

RocksDB is a key/value database; it doesn’t support multiple values per key. It took me a while to figure out that this is a (I’m guessing inefficient) way to clone the value from database owned memory to memory controlled by Noise.

The next method, flush, just updates the internal data (version and last sequence number) and submits the write batch. That concludes it as far as I’m concerned: unless something really strange is going on, indexing in Noise must be done in a single threaded manner. Note that there isn’t anything wrong with that (this is what we ended up doing in RavenDB 4.0), but it is something that needs to be stated out loud, because it is contrary to the expected practice. Then again, I haven’t seen anything on indexing speed or any sign of benchmarking yet; maybe the team didn’t get there at this point.

The last things of interest in this file are the RocksDB management functions. The compaction_filter will keep any key that starts with K or C; for any other key, the value is interpreted as an int32, and if it is 0, it is discarded. This is interesting, because the HDB key (the header for the db) is not one of the protected values, and the reason it isn’t thrown out is that Noise is using a little endian value here, and accidentally the int64 value (1 in this case) is interpreted as a non-zero int32. I’m assuming that the same holds true for the rest of the values, but it seems more like accidental behavior. There is also the fact that we already know that there are some values (sequence numbers) whose value is a stringified int64. Those will also pass the “not zero int32” check, but even if it works, it looks wrong.

image

So basically, we have a few more key prefixes, which have a certain meaning, as yet unknown. But if the keys are of those types, they should be compared using the KeyBuilder comparison; otherwise, just compare them as is. I don’t like the single letter prefixes. RocksDB uses prefix compression, so I don’t think that using more readable prefixes would cost anything.

Finally, we have sum_merge, which is called by RocksDB when you make a Merge call. This is an interesting approach to handling updates, and what this basically does in the case of Noise is allow merging multiple calls to the same value and returning the sum total of them. I’m assuming that the value used here is the ref count or some such, given the interplay between sum_merge and compaction_filter.

That is it for the index.rs file. There are actually tests there for the index, which is interesting. I don’t like having tests in the same file, but I think it is common in Rust.

The only thing of note there is that we find some interesting information about the format of the keys:

The document:

image

Is translated into the following:

image

That is the job of the KeyBuilder, I think, but I’ll leave digging into that to another post.

What does this tidbit DO?

time to read 1 min | 176 words

I ran into the following inside a C codebase.

image

That caused some head scratching, until I realized that this is probably just a funny way to write a static assert.

Basically, sizeof() is a compile time constant that is evaluated by the compiler. I assume that the cast to void is to make it clear to the compiler that we are throwing away the value intentionally and that it shouldn’t actually generate any code here.

The rest is pretty basic: the compiler needs to compute the size of the array, and to do that, it needs to compute the value, but if the above statement is false, the value is -1, and that is not a valid size for an array, so a compiler error will be raised.

I think that there are better ways to do it, but I’m guessing this is a more portable version?

Reviewing pull requests with large number of commits

time to read 2 min | 343 words

I have to go over a fairly complex pull request, adding a pretty big feature. The feature, if you care, is compressing data in the leaf pages of a B+Tree for use in Map/Reduce entries storage. The current and non-final results are reducing storage costs from about 650MB to 32MB, so we are pretty happy with it, but this isn’t the topic of this post.

At this stage, the feature is not complete, but it is ready for initial review, so we created a temporary PR to review it. Here is a partial view:

image

There are over 30 commits over a period of three weeks there, and they represent a lot of experimentation as we go along. This is fine & proper as far as we are concerned, since this represents quite a bit of work, and we certainly want to be able to track it. But since this isn’t completed, the commit history is long, which means that reviewing it one commit at a time is not trivial. In particular, there are a lot of back and forth changes in the same places, and I don’t want to review code that has since been changed.

Luckily, git makes it easy to change that view.

I pulled the changes to my local branch, reset the log to the last commit before the merge, then squashed all those commits into a single one. You can check it out here.
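
Roughly, the commands for that look like this (sketched from memory; the branch name and the target commit are placeholders):

git checkout review-leaf-page-compression
git reset --soft <last commit before the feature work>   # keeps all the changes staged
git commit -m "Compress data in B+Tree leaf pages (squashed for review)"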

The final result is:

image

Which is a much nicer way to go about reviewing this, since I can look at all the changes in one shot, and more to the point, I can make comments on the changes using our normal workflow.

Digging into the CoreCLR: Exceptional costs, Part II

time to read 3 min | 429 words

Note, this post was written by Federico.

In my last post we talked about the cost of throwing and catching exceptions at the caller site. Some costs are straightforward and easily seen, like the code complexity, but others are a bit deeper, like, for instance, how the code ends up like that (we will talk about that, but not just yet). Today we will focus on what to do when we control the call-site and are in a performance sensitive hot-spot.

There are 2 important assumptions here:

  • You own the code
  • You are in a hot-spot

If either one of these two is not true, then this is of zero importance or you are screwed. :)

So let's modify our code a bit (check yesterday's post if you don’t remember the details). In order to achieve the same result we will resort to a very well-known pattern that I like to call TryXXX. Many instances of such optimizations are visible in the .NET Framework, like the famous int.TryParse method. Apparently someone during the course of using v1.0 (or v1.1) of the Framework figured out that the cost of exception handling for certain scenarios was a bit too much. We probably won’t know who it was, but we can all be glad they fixed it; even though we have to live with an exception based implementation (borrowed from Java style?) as obsolete code since then.
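
The shape of the pattern, sketched here in Rust for brevity (the post’s actual code is C#): instead of an API that throws on failure, expose one that reports failure through the return value, so the caller can check it on the happy path.

// The "exceptional" style: failure escapes as a panic / exception.
fn parse_or_panic(s: &str) -> i32 {
    s.parse().expect("not a valid number")
}

// The TryXXX style: failure is just another value, no unwinding involved.
fn try_parse(s: &str) -> Option<i32> {
    s.parse().ok()
}

fn main() {
    assert_eq!(try_parse("42"), Some(42));
    assert_eq!(try_parse("oops"), None);
    assert_eq!(parse_or_panic("7"), 7);
}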

So let's see how the code would look. 

image

Pretty straightforward I might say. Now the interesting thing is what happens at the assembler level:

image

Even under shallow review, we can conclude that this code is definitely faster than the alternative. Now what did we win against the try-catch version? Essentially, we don't have a prolog and an epilog in case of choosing the exceptional path, which is faster than having to execute such code. The exception case also does not have to deal with non-local effects caused by unwinding the stack; but we are forced to have a hierarchy of TryXXX methods if the call chain goes deep (the alternative of using exceptions for readability is not great either).

Now in this code we have the first glimpse of evidence of a few JIT design choices (and some current restrictions too) that are important performance-wise, and we will discuss them in future posts.

Digging into the CoreCLR: Exceptional costs, Part I

time to read 4 min | 624 words

Note, this post was written by Federico.

One guideline which is commonly known is: "Do not use exceptions for flow control." You can read more about it in many places, but this is a good compendium of the most common arguments. If you are not acquainted with the reasons, give them a read first; I’ll wait.

Many of the reasons focus on the readability of the code, but remember, my work (usually) revolves around writing pretty disgusting albeit efficient code. So even though I care about readability, it is mostly achieved through very lengthy comments in the code on why you shouldn't touch something if you cannot prove your change will be faster.

Digression aside, the question is still open: what is the impact of using exceptions for control flow (or having to deal with someone else throwing exceptions) in your performance sensitive code? Let's examine that in detail.

For that we will use a very simple piece of code to understand what can happen.

image

This code is simple enough that the assembler won’t get too convoluted, but at the same time it sports at least some logic we can use as markers.

Let's first inspect the method CanThrow; in there what we can see is how the throwing of exceptions happens:

image

As you can see, there are a lot of things to be done just to throw the exception. In the last call we jump to the proper place in the stack and continue in the catch statement that we hit.

image

So here is the code of our simple method. At the assembler level, our try statement has a very important implication. Each try-catch forces the method to deal with a few control flow issues. First it has to store the exception handler in case anything inside would throw, then it has to do the actual work. If there is no exception (the happy path) we move forward and end. But what happens if we have an exception? We first need to remove the handler (we don't want to recheck this handler if we end up throwing inside the catch, right?), then execute the catch and be done.

But now let’s contrast that to the generated code if no try-catch statement happens. The avid reader will realize that the happy path will never be executed because we are throwing, but don’t worry, the code is the same if there is no inlining happening.

image

We will talk about why the code ends up like this in a follow up post, but suffice to say that all this trouble cannot beat a check for a Boolean if you needed to fail (and could do something about it). 

It is also important to remember that this kind of work is only relevant if you are in the hot path. If you are not calling a function at least a few tens of thousands of times a second, don’t even bother; your performance costs are elsewhere. This is micro optimization land.
