GoTo based optimizations
One of the things that we did recently was go over our internal data structures in RavenDB and see if we can optimize them. Some of those changes are pretty strange if you aren’t following what is actually going on. Here is an example:
[Before / After code comparison]
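The original side-by-side snippets are not reproduced here, but judging from the `ThrowForEmptyStack` one-liner quoted in the comments below (which uses `_array` and `_size`), the shape of the change is roughly the following. This is a sketch under those assumptions, not the actual RavenDB code; the member names are taken from that quoted snippet.

```csharp
using System;

// Sketch of the before/after transformation discussed in the post.
// The member names (_array, _size, ThrowForEmptyStack) come from the
// snippet quoted in the comments; the real RavenDB code is more complex.
public class MiniStack<T>
{
    private T[] _array = new T[16];
    private int _size;

    public void Push(T item) => _array[_size++] = item;

    // Before: the throw sits inline, so the JIT emits the exception
    // path in the middle of the method body.
    public T PeekBefore()
    {
        if (_size == 0)
            throw new InvalidOperationException("Stack is empty");
        return _array[_size - 1];
    }

    // After: the cold path is pushed behind a goto into a throw helper
    // that *returns* T, so the call site needs no cleanup code and the
    // hot path falls straight through with no taken branch.
    public T PeekAfter()
    {
        if (_size == 0)
            goto Error;
        return _array[_size - 1];

    Error:
        return ThrowForEmptyStack();
    }

    private T ThrowForEmptyStack()
    {
        throw new InvalidOperationException("Stack is empty");
    }
}
```

The trick is that `ThrowForEmptyStack` is declared to return `T` even though it always throws; that lets the cold block end in a plain `return`, which is what keeps the caller's epilogue so small.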
What is the point in this kind of change? Well, let us look at the actual assembly generated by this, shall we?
[Before / After assembly comparison]
As you can see, the second option is much shorter, and in the common case it involves no actual jumping. This ends up being extremely efficient. Note that because we return a value from ThrowForEmptyStack, the generated assembly is extremely short, since we can rely on the caller to clean up after us.
This was run in release mode, CoreCLR, x64. I got the assembly from the debugger, so it is possible that there are some optimizations that haven't been applied because the debugger is attached, but it is fairly close to what should happen for real, I think. Note that ThrowForEmptyStack is inlined, even though it is an exception-only method. Using [MethodImpl(MethodImplOptions.NoInlining)] will prevent that, but the goto version will still generate better code.
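For reference, the attribute mentioned above goes directly on the throw helper; this is the standard `System.Runtime.CompilerServices` API. The helper shown here is a generic illustration, not the actual RavenDB method:

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ThrowHelper
{
    // NoInlining forces the JIT to keep this as a real call, so the
    // exception-construction code is never copied into the caller's body.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static T ThrowForEmptyStack<T>()
    {
        throw new InvalidOperationException("Stack is empty");
    }
}
```

Because the helper is declared to return `T`, a call site can simply write `return ThrowHelper.ThrowForEmptyStack<T>();`, keeping the cold path down to a single call instruction.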
The end result is that we are running much less code, and that makes me happy. In general, a good guide for assembly reading is that shorter == faster, and if you are reading assembly, you are very likely in optimization mode, or debugging the compiler.
I’m pretty sure that the 2.0 release of CoreCLR has already fixed this kind of issue, by the way, and it should allow us to write more idiomatic code that generates very tight machine code.
Comments
In your after.asm file, the `je` instruction jumps to the address 00007FFB4DC92015, which is not present in your file. Did you try to disassemble from that address to the next ret?
Would you get similar short hot-path assembly with the following:

I think this would be cleaner to read than `goto` where your if statement is above and your main path is so short. I do favor goto when you have many exit paths and common cleanup for all of them, though.

Also, I think Guillaume is correct: the code for `ThrowForEmptyStack()` in the after.asm is at address 00007FFB4DC92015, which is not included in your disassembly. However, the critical path is short, which is the important thing, so I don't think it's super relevant what happens after that address. It does mean that the claim that "because we return a value from the ThrowForEmptyStack, the assembly generated is extremely short" is incomplete, since you are not examining the code for the return from that point.

@Guillaume the code must live somewhere, it doesn't disappear. The point is that the CPU decoder won't have to decode it (which is the actual trick). This is exactly what the CPU sees and works on; the extra 88 bytes are never seen by the CPU, and even if it can eventually predict the jump, the decoder has no way to know there is a jump until the instruction is decoded.
Also, if it is a small method it can be brought in with a single L1 cache miss (instead of multiple ones). We are talking about sub-20-nanosecond operations here; the code layout is probably 30% to 40% of the total execution cost (if not more). Think about it: an L1 cache line is 64 bytes. Now you have polluted 2 cache lines with useless code, taking 4 cache lines instead of half a cache line (essentially 1).
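The cache-line arithmetic in the comment above can be sanity-checked with simple ceiling division, assuming the 64-byte L1 line size it states; the byte counts below are illustrative, not measured from the post's assembly:

```csharp
using System;

// Footprint of a method body in 64-byte L1 cache lines: ceil(bytes / 64).
static int CacheLines(int bytes) => (bytes + 63) / 64;

Console.WriteLine(CacheLines(30));       // a ~30-byte hot path fits in 1 line
Console.WriteLine(CacheLines(30 + 88));  // plus 88 bytes of cold code: 2 lines
```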
@Stuart, you are right, but this is a single tree we planted in our backyard; the real code is far more complex than the demonstration we cooked up to show how the optimization operates. Think about the impact on code readability when instead of 1 condition (the precondition) you now have 3. Or what if those operations happen inside a loop? You still have to stop to throw, and avoiding polluting your loop is essential at this level.
This is an actual example of real database code: https://github.com/Corvalius/ravendb/commit/f502c192c49d3939b8fb9ca1a21553ac4fbc941b#diff-448d65505ebe234daaa8954fdca2bf1aR283
Now you will also see that we even put the `return true` and `return false` at the extremes here. This code is so hot that with just that change we were able to shave 1% (1200ms) off processing 1GB of data. That is an entire second.

@Stuart also something I forgot to mention. We are fairly positive that this is solved in CoreCLR 2.0; the JIT now has the ability to mark code as cold, and will do so when you only throw in your method. I know because I reported the issue myself. Changing the layout rather than the logic means that when CoreCLR 2.0 is released and our code is migrated to run there, we can just grep for Error* and put the calls back into their original positions. The revert is then pretty easy and mostly error free, which would not be the case if we had changed the logic.
The return case will still be necessary though.
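The pattern described in the linked commit - precondition checks inside a hot loop, with the throw and the returns pushed to the extremes of the method - looks roughly like this. This is a sketch under those assumptions; the method and names are hypothetical, not the RavenDB code:

```csharp
using System;

public static class LoopSketch
{
    // Sketch: the loop body keeps only the compare-and-branch; the cold
    // throw lives behind a goto at the very end of the method, so the
    // loop's instruction stream stays compact in the cache.
    public static long SumNonNegative(int[] values)
    {
        long sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] < 0)
                goto Error;
            sum += values[i];
        }
        return sum;

    Error:
        return ThrowNegative();
    }

    // Returns long so the cold block above can end in a plain return.
    private static long ThrowNegative()
    {
        throw new ArgumentOutOfRangeException("values",
            "Negative value in input");
    }
}
```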
@Federico - wow! I've certainly seen cases where a common exit point improves performance and/or readability, especially with a complex return. I would never have expected that a common exit for `return true;` would affect performance. And I completely agree: when you have multiple pre-conditions and common cleanup, `goto` is a very effective way of improving readability. I was simply commenting on this example, which I did not realize was so heavily simplified.

This is why GCC has the `__builtin_expect` macro. Unfortunately, that doesn't exist for C#.

@dhasenan I have been pushing for that (among other JIT improvements) pretty hard, to the point that I believe we will see it sooner rather than later. :)
Isn't this some sort of tail call optimization?

Onur, no, this isn't a tail call. It just avoids the need for the CPU to go over that code.
This particular case might even be written as a one-liner: `return _size > 0 ? _array[_size - 1] : ThrowForEmptyStack();`. Though I'm not sure whether the asm would be any better.

Hi, can you please elaborate on how you view this assembly? I've created a console application and put your code there, then I looked at the exe file via ildasm, and I believe that's not what you mean, because it only shows the intermediate language... So, where is the assembly file created?
hagai luger, run it in the VS debugger, then go to Debug > Windows > Disassembly.