Writing high performance code despite C#

Consider the following C code snippet:

This code cannot be written as-is in C#. Why? Because you can’t use ‘+’ on bool values, and you can’t cast a bool to an integer. So I wrote this code instead:

And then I changed it to be this code:

Can you tell why I did that? And what is the original code trying to do?

For that matter (and I’m honestly asking here), how would you write this code in C# to get the best performance?

Hint:

Tags:

programming

Comments

13 Sep 2019
15:24 PM

ocoanet

I suppose I would use InlineIL.Fody.

public void M(int dw)
{
    uint symbol = AsUInt32(dw > 0x000000FF) + AsUInt32(dw > 0x0000FFFF) + AsUInt32(dw > 0x00FFFFFF);
    // ...
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static uint AsUInt32(bool b)
{
    IL.Emit.Ldarg_0();
    IL.Emit.Conv_U4();
    return IL.Return<uint>();
}

14 Sep 2019
14:51 PM

svick

What is the expected distribution of dw?

15 Sep 2019
08:27 AM

Simon

To make it look like the C version (and, like the C code, it always runs all three tests):

var symbol = (dw > 0x000000FF ? 1 : 0) + (dw > 0x0000FFFF ? 1 : 0) + (dw > 0x00FFFFFF ? 1 : 0);

15 Sep 2019
19:52 PM

Oren Eini

ocoanet, interesting solution. I wouldn't have thought of going that route.

15 Sep 2019
19:52 PM

Oren Eini

svick, most of the time, pretty small.

15 Sep 2019
20:00 PM

Oren Eini

Simon, the issue isn't how it looks; the problem is how it _works_. In the C version there are no branches; in the C# version there are a bunch of them. In general, for high-performance code you want branch-free code, since that gives the CPU a lot more freedom in how the code is executed.
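One way to get the comparison result as a number in safe C#, without a conditional in the source, is to take the sign bit of a subtraction. The sketch below is not from the post; the names `SymbolBranchFree` and `GreaterThan` are made up for illustration, and `dw` is assumed to be a `uint`. Whether this is actually faster than the ternary version depends on the JIT and the input distribution, so it should be measured rather than assumed.

```csharp
public static class SymbolBranchFree
{
    // Returns 1 when value > limit, otherwise 0, using only arithmetic.
    // limit - value is computed as a signed 64-bit number, so it is negative
    // exactly when value > limit; shifting right by 63 moves the sign bit to bit 0.
    private static uint GreaterThan(uint value, uint limit) =>
        unchecked((uint)((ulong)((long)limit - value) >> 63));

    // Sums the three comparisons, mirroring the shape of the C expression.
    public static uint Symbol(uint dw) =>
        GreaterThan(dw, 0x000000FF) + GreaterThan(dw, 0x0000FFFF) + GreaterThan(dw, 0x00FFFFFF);
}
```

For example, `SymbolBranchFree.Symbol(0x100)` evaluates the first comparison as `0xFF - 0x100 = -1`, whose sign bit is set, so the result is 1.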

16 Sep 2019
07:10 AM

ocoanet

At the IL level, bool is an integer type; you can even use bool as the underlying type of an enum. The standard does not specify that true is 1; it only states that false is 0. Still, I suppose that the machine code generated from my sample is similar to the machine code generated from your C snippet.
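The same reinterpretation can be sketched without IL weaving by using `System.Runtime.CompilerServices.Unsafe.As`, which is a real API (in-box on recent .NET, a NuGet package on older targets). The class name `BoolBits` is hypothetical, `dw` is assumed to be a `uint`, and the code relies on the assumption discussed above: that a bool produced by a comparison is stored as the byte 1.

```csharp
using System.Runtime.CompilerServices;

public static class BoolBits
{
    // Reads the single byte that backs the bool and widens it to uint.
    // Assumption: the bool holds 0 or 1, which is what C# comparison
    // operators produce; a bool forged from other byte values would leak through.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static uint AsUInt32(bool b) => Unsafe.As<bool, byte>(ref b);

    // Sums the three comparison results, as in the InlineIL sample.
    public static uint Symbol(uint dw) =>
        AsUInt32(dw > 0x000000FF) + AsUInt32(dw > 0x0000FFFF) + AsUInt32(dw > 0x00FFFFFF);
}
```

Whether the JIT turns this into branch-free machine code is not guaranteed, so the disassembly is worth checking on the runtime you target.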

16 Sep 2019
10:31 AM

Patrick Huizinga

You can't cast a bool to a uint directly, but you can use the FieldOffsetAttribute (and StructLayoutAttribute) to force a bool field to overlap with a uint field. Now you can set the bool fields and then add up the uint fields.

[StructLayout(LayoutKind.Explicit)]
public struct BoolUint
{
    [FieldOffset(0)] public bool Bool;
    [FieldOffset(0)] public uint Uint;
}

var bu = new BoolUint { Bool = (23 > 16) };
Assert.AreEqual(1u, bu.Uint);

Zero branches, but I don't know the performance characteristics.

Also keep in mind that using this struct the other way around will not allow you to convert a random number into a 1 or 0:

var bu = new BoolUint { Uint = 12 };
var bu2 = new BoolUint { Bool = bu.Bool };
Assert.AreEqual(12u, bu2.Uint);

18 Sep 2019
08:16 AM

Hagen

First of all, your altered version is interesting - if counter-intuitive. I looked at the JITed code and it is really the most compact one can possibly achieve.

00007FF99AF31994  xor         edi,edi  
00007FF99AF31996  cmp         esi,0FFh  
00007FF99AF3199C  jbe         00007FF99AF319BD  
00007FF99AF3199E  mov         edi,1  
00007FF99AF319A3  cmp         esi,0FFFFh  
00007FF99AF319A9  jbe         00007FF99AF319BD  
00007FF99AF319AB  mov         edi,2  
00007FF99AF319B0  cmp         esi,0FFFFFFh  
00007FF99AF319B6  jbe         00007FF99AF319BD  
00007FF99AF319B8  mov         edi,3  
00007FF99AF319BD

So you speculatively assign the result each time, and with probability 1/2 that's already it and we're done. Any other way of writing it seems to produce more jumps.
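Read back into C#, the listing above corresponds to source shaped roughly like the following. This is a reconstruction from the assembly, not the code from the post: the class and method names are made up, and `dw` is taken to be a `uint` because the jumps are `jbe`, an unsigned comparison.

```csharp
public static class SymbolNestedIfs
{
    public static uint Symbol(uint dw)
    {
        uint symbol = 0;              // xor edi,edi
        if (dw > 0x000000FF)          // cmp esi,0FFh / jbe end
        {
            symbol = 1;
            if (dw > 0x0000FFFF)      // cmp esi,0FFFFh / jbe end
            {
                symbol = 2;
                if (dw > 0x00FFFFFF)  // cmp esi,0FFFFFFh / jbe end
                    symbol = 3;
            }
        }
        return symbol;
    }
}
```

Small values leave after the first comparison, which matches the earlier remark in this thread that `dw` is pretty small most of the time.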

For your interest, just to have a proof of concept, here is an algorithm without branching. I doubt it will be faster than the branching one because it is quite a bit heavier, but in some funny scenarios it might help by not disturbing the branch predictor and cache lines. The example is unit-tested and should really be correct.

public static int NumAdditionalOctets(uint value)
{
    // fill non-empty octets, ~6 OPS
    var filled = Fill(Fill(Fill(value, 1), 2), 4);

    // from truth table for bits 24, 16 and 8 => coverage, ~12 OPS
    return (int) (
        (filled >> 24 | ~filled >> 16 & filled >> 8) & 1u
        | (filled >> 23 | filled >> 15) & 2u);
}

private static uint Fill(uint value, int shift)
{
    return value | (value >> shift);
}

And this yields

00007FF9A1A93200  mov         eax,ecx  
00007FF9A1A93202  shr         eax,1  
00007FF9A1A93204  or          eax,ecx  
00007FF9A1A93206  mov         edx,eax  
00007FF9A1A93208  shr         edx,2  
00007FF9A1A9320B  or          eax,edx  
00007FF9A1A9320D  mov         edx,eax  
00007FF9A1A9320F  shr         edx,4  
00007FF9A1A93212  or          eax,edx  
00007FF9A1A93214  mov         edx,eax  
00007FF9A1A93216  shr         edx,17h  
00007FF9A1A93219  mov         ecx,eax  
00007FF9A1A9321B  shr         ecx,0Fh  
00007FF9A1A9321E  or          edx,ecx  
00007FF9A1A93220  and         edx,2  
00007FF9A1A93223  mov         ecx,eax  
00007FF9A1A93225  not         ecx  
00007FF9A1A93227  shr         ecx,10h  
00007FF9A1A9322A  mov         r8d,eax  
00007FF9A1A9322D  shr         r8d,8  
00007FF9A1A93231  and         ecx,r8d  
00007FF9A1A93234  shr         eax,18h  
00007FF9A1A93237  or          ecx,eax  
00007FF9A1A93239  and         ecx,1  
00007FF9A1A9323C  mov         eax,edx  
00007FF9A1A9323E  or          eax,ecx

19 Sep 2019
10:02 AM

Lucas Trzesniewski

There's a new shiny intrinsic for this in .NET Core 3: Lzcnt.LeadingZeroCount along with a wrapper: BitOperations.LeadingZeroCount.

Here's an implementation I came up with:

uint symbol = (31 - System.Runtime.Intrinsics.X86.Lzcnt.LeadingZeroCount(dw | 1)) >> 3;

This is very efficient compared to other solutions so far.
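To spell out why the one-liner works, here is a sketch using the portable `BitOperations.LeadingZeroCount` wrapper (in `System.Numerics`, .NET Core 3.0 and later) next to the comparison-based version it replaces. The class and method names are made up for illustration.

```csharp
using System.Numerics;

public static class SymbolLzcnt
{
    public static uint Symbol(uint dw)
    {
        // dw | 1 keeps the input non-zero without changing the answer for any
        // other value, so the index of the highest set bit is always in 0..31.
        int highestBit = 31 - BitOperations.LeadingZeroCount(dw | 1);

        // Dividing the bit index by 8 gives the index of the highest non-zero byte.
        return (uint)(highestBit >> 3);
    }

    // Comparison-based reference, in the style of the ternary version above.
    public static uint Reference(uint dw) =>
        (dw > 0x000000FFu ? 1u : 0u) + (dw > 0x0000FFFFu ? 1u : 0u) + (dw > 0x00FFFFFFu ? 1u : 0u);
}
```

For the three benchmark inputs below: 66 has its highest set bit at index 6, giving 0; 17151 (0x42FF) at index 14, giving 1; and 4390911 (0x42FFFF) at index 22, giving 2. When calling `Lzcnt.LeadingZeroCount` directly rather than the wrapper, `Lzcnt.IsSupported` should be checked first, since the instruction is not available on every CPU.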

I wrote benchmarks for all solutions suggested here, and here are the results:

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-rc1-014190
  [Host]     : .NET Core 3.0.0-rc1-19456-20 (CoreCLR 4.700.19.45506, CoreFX 4.700.19.45604), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-rc1-19456-20 (CoreCLR 4.700.19.45506, CoreFX 4.700.19.45604), 64bit RyuJIT


|           Method |   Value |      Mean |     Error |    StdDev |    Median |
|----------------- |-------- |----------:|----------:|----------:|----------:|
|        NestedIfs |   17151 | 0.3260 ns | 0.0067 ns | 0.0059 ns | 0.3228 ns |
|         InlineIL |   17151 | 0.2856 ns | 0.0101 ns | 0.0095 ns | 0.2826 ns |
| LeadingZeroCount |   17151 | 0.0156 ns | 0.0135 ns | 0.0127 ns | 0.0088 ns |
|            Lzcnt |   17151 | 0.0189 ns | 0.0084 ns | 0.0075 ns | 0.0161 ns |
|            Hagen |   17151 | 0.9252 ns | 0.0081 ns | 0.0068 ns | 0.9234 ns |
|            Union |   17151 | 5.9519 ns | 0.0092 ns | 0.0082 ns | 5.9530 ns |
|       UnsafeCast |   17151 | 1.3950 ns | 0.0124 ns | 0.0116 ns | 1.3932 ns |
|        NestedIfs | 4390911 | 0.3495 ns | 0.0127 ns | 0.0119 ns | 0.3431 ns |
|         InlineIL | 4390911 | 0.2798 ns | 0.0061 ns | 0.0057 ns | 0.2792 ns |
| LeadingZeroCount | 4390911 | 0.0522 ns | 0.0041 ns | 0.0034 ns | 0.0520 ns |
|            Lzcnt | 4390911 | 0.0464 ns | 0.0051 ns | 0.0048 ns | 0.0449 ns |
|            Hagen | 4390911 | 0.8711 ns | 0.0070 ns | 0.0062 ns | 0.8710 ns |
|            Union | 4390911 | 5.6731 ns | 0.0124 ns | 0.0104 ns | 5.6706 ns |
|       UnsafeCast | 4390911 | 1.4757 ns | 0.0150 ns | 0.0140 ns | 1.4759 ns |
|        NestedIfs |      66 | 0.2787 ns | 0.0084 ns | 0.0070 ns | 0.2786 ns |
|         InlineIL |      66 | 0.2428 ns | 0.0096 ns | 0.0085 ns | 0.2424 ns |
| LeadingZeroCount |      66 | 0.0133 ns | 0.0093 ns | 0.0082 ns | 0.0100 ns |
|            Lzcnt |      66 | 0.0118 ns | 0.0068 ns | 0.0063 ns | 0.0102 ns |
|            Hagen |      66 | 0.8718 ns | 0.0097 ns | 0.0086 ns | 0.8705 ns |
|            Union |      66 | 5.6232 ns | 0.0092 ns | 0.0082 ns | 5.6205 ns |
|       UnsafeCast |      66 | 1.3873 ns | 0.0079 ns | 0.0070 ns | 1.3858 ns |

The disassembly of the Lzcnt benchmark method is:

00007ffe`c3269b40 Bench.AyendeSymbolSwitch.Lzcnt()
00007ffe`c3268543 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffe`c3268547 e8fc53ddff      call    00007ffe`c303d948
00007ffe`c326854c 8945fc          mov     dword ptr [rbp-4],eax
00007ffe`c326854f 8b45fc          mov     eax,dword ptr [rbp-4]
00007ffe`c3268552 83c801          or      eax,1
00007ffe`c3268555 f30fbdc0        lzcnt   eax,eax
00007ffe`c3268559 f7d8            neg     eax
00007ffe`c326855b 83c01f          add     eax,1Fh
00007ffe`c326855e c1e803          shr     eax,3

00007ffe`c3269b70 Bench.AyendeSymbolSwitch.get_Value()
00007ffe`c3268588 488b4510        mov     rax,qword ptr [rbp+10h]
00007ffe`c326858c 8b4008          mov     eax,dword ptr [rax+8]

Method got most probably inlined
System.Runtime.Intrinsics.X86.Lzcnt.LeadingZeroCount(UInt32)

24 Sep 2019
05:27 AM

Oren Eini

Lucas, I'm really impressed by this. That is an awesome approach to solving this cleanly and efficiently.

Oren Eini

CEO of RavenDB