Ayende @ Rahien

configurator commented on When mini benchmarks are important

Fri, 01 Jan 2010 00:48:38 GMT

Ayende, May we know which hash function you're using now that you changed it?

Ayende Rahien commented on When mini benchmarks are important

Thu, 31 Dec 2009 14:08:46 GMT

Jose, There are reasons to violate this rule, in this case, ByteBuffer is a Fly Weight, which I use to reduce memory consumption

Patrick Smacchia commented on When mini benchmarks are important

Thu, 31 Dec 2009 13:42:29 GMT

On a side note Dennis, to avoid dead code, instead of case 6: YYY case 7: XXX default: return false; //dead code I usually do case 6: YYY default: Debug.Assert(length == 7); XXX break;

José Romaniello commented on When mini benchmarks are important

Thu, 31 Dec 2009 11:28:30 GMT

Sorry, but the hashcode should be constant for the live of the object. This is a very basic guideline. Am I missing something?

Dennis commented on When mini benchmarks are important

Thu, 31 Dec 2009 07:37:17 GMT

I'm not sure how smart the JIT is... If it is really intelligent, it can use the repl cmpb instruction in this case, which would make it pretty much as fast as your memory is. But even without that, the 32 byte version will allow the CPU to fill all the ALUs (assuming 4 ALUs with 2op/clock) with useful stuff to do. The 8-version has some overhead in loop accounting. For my benchmarking the 32byte one is about 20% faster in the TRUE case for 38 values, your luck may vary depending on your input. But I would say that in general for longer strings the 32byte will win. And the x64 version is alot faster than the x32 version (for obvious reasons, but may also be since the x64 JIT is optimized for performance rather than quick compilation). Using int*s instead of long*s on the 32bit is not any faster.

Dennis commented on When mini benchmarks are important

Thu, 31 Dec 2009 07:29:34 GMT

Another quick optimization due to your getHashcode optimization (and since they use the same implementation) is to compare those two if they have a current valid precomputed value. But it is only useful if the common case is that they are NOT equal.

Ayende Rahien commented on When mini benchmarks are important

Thu, 31 Dec 2009 07:25:52 GMT

Dennis, That is a nice impl, although I 'll admit it took me a moment to understand it. Wouldn't it be just as fast to do? [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)] private static unsafe bool UnSafeArrayEquals(byte[] strA, byte[] strB) { int length = strA.Length; if (length != strB.Length) return false; fixed (byte* str = strA, str2 = strB) { long* chPtr3 = (long*)str; long* chPtr4 = (long*)str2; while (length >= 8) { if (*chPtr3++ != *chPtr4++) { return false; } length -= 8; } switch (length) { case 0: return true; case 1: return *(byte*)chPtr3 != *(byte*)chPtr4; case 2: return *(short*)chPtr3 != *(short*)chPtr4; case 3: return *(short*)chPtr3 != *(short*)chPtr4 && *(byte*)(((short*)chPtr3) + 1) != *(byte*)(((short*)chPtr4) + 1); case 4: return *(int*)chPtr3 != *(int*)chPtr4; case 5: return *(int*)chPtr3 != *(int*)chPtr4 && *(byte*)(((int*)chPtr3) + 1) != *(byte*)(((int*)chPtr4) + 1); case 6: return *(int*)chPtr3 != *(int*)chPtr4 && *(short*)(((int*)chPtr3) + 1) != *(short*)(((int*)chPtr4) + 1); case 7: return *(int*)chPtr3 != *(int*)chPtr4 && *(short*)(((int*)chPtr3) + 1) != *(short*)(((int*)chPtr4) + 1) && *(((byte*)chPtr3) + 6) != *(((byte*)chPtr4) + 6); default: return false; //dead code } } }

Dennis commented on When mini benchmarks are important

Thu, 31 Dec 2009 07:19:36 GMT

In that case you can get it about double as fast with unsafe code (or 5% of total runtime according to your profile): [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)] private static unsafe bool UnSafeArrayEquals(byte[] strA, byte[] strB) { int length = strA.Length; if (length != strB.Length) return false; fixed (byte* str = strA, str2 = strB) { long* chPtr3 = (long*)str; long* chPtr4 = (long*)str2; while (length >= 32) { if ((*chPtr3 != *chPtr4) || (*(chPtr3 + 1) != *(chPtr4 + 1)) || (*(chPtr3 + 2) != *(chPtr4 + 2)) || (*(chPtr3 + 3) != *(chPtr4 + 3))) { return false; } chPtr3 += 4; chPtr4 += 4; length -= 32; } while (length >= 8) { if (*chPtr3 != *chPtr4) { return false; } chPtr3 += 1; chPtr4 += 1; length -= 8; } switch (length) { case 0: return true; case 1: return *(byte*)chPtr3 != *(byte*)chPtr4; case 2: return *(short*)chPtr3 != *(short*)chPtr4; case 3: return *(short*)chPtr3 != *(short*)chPtr4 && *(byte*)(((short*)chPtr3) + 1) != *(byte*)(((short*)chPtr4) + 1); case 4: return *(int*)chPtr3 != *(int*)chPtr4; case 5: return *(int*)chPtr3 != *(int*)chPtr4 && *(byte*)(((int*)chPtr3) + 1) != *(byte*)(((int*)chPtr4) + 1); case 6: return *(int*)chPtr3 != *(int*)chPtr4 && *(short*)(((int*)chPtr3) + 1) != *(short*)(((int*)chPtr4) + 1); case 7: return *(int*)chPtr3 != *(int*)chPtr4 && *(short*)(((int*)chPtr3) + 1) != *(short*)(((int*)chPtr4) + 1) && *(((byte*)chPtr3) + 6) != *(((byte*)chPtr4) + 6); default: return false; //dead code } } }

Ayende Rahien commented on When mini benchmarks are important

Thu, 31 Dec 2009 07:06:08 GMT

Andrew, Agreed, and it was changed.

Ayende Rahien commented on When mini benchmarks are important

Thu, 31 Dec 2009 07:05:58 GMT

Dennis I ended up doing something very much like yours, and byteBuffer.Buffer is an array of bytes.

Ayende Rahien commented on When mini benchmarks are important

Thu, 31 Dec 2009 06:57:00 GMT

Configurator, I implemented something similar, but note that I can't do it in StringEqaulsToBuffer, for the simple reason that the byte buffer has an offset that would kill that optimization

Ayende Rahien commented on When mini benchmarks are important

Thu, 31 Dec 2009 06:50:44 GMT

Greg, Rafal & Daniel, I was able to fix the hash function on both locations, thanks.

Dennis commented on When mini benchmarks are important

Thu, 31 Dec 2009 03:44:48 GMT

private static bool StringEqaulsToBuffer(ByteString byteString, ByteBuffer byteBuffer) { var strLen = byteString.Length; if(strLen != byteBuffer.Length) return false; var bytestr = byteString.bytes; var bufferstr = byteBuffer.Buffer; var offset = byteBuffer.Offset; for (int i = 0; i < strLen; offset++, i++) { if(bytestr[i] != bufferstr[offset]) return false; } return true; } But how is byteBuffer.Buffer[] implemented? That might allow for further optimization

Andrew commented on When mini benchmarks are important

Thu, 31 Dec 2009 03:18:41 GMT

RE: Custom has function, its probably also worth checking how even the distribution is between buckets in hashtables/dictionaries etc. Perhaps there's a performance problem somewhere because the hashcode is too similar for many strings etc.

configurator commented on When mini benchmarks are important

Wed, 30 Dec 2009 22:22:13 GMT

@Alex: This example is the actual implementation of string.GetHashCode(), so I'm not the person to ask questions about it. I will note though that the enumeration is not reverse; only 'i' is reversed, and that is to prevent accessing the Length property multiple times. The pointer is incremented in each step though, so the enumeration is actually forward. Note that this is an implementation for a char array. A similar one for bytes (This should be checked before using it, it's notepad code): public override unsafe int GetHashCode() { fixed (byte* str = ((byte*) Buffer)) { byte* bytePtr = str; int num = 0x15051505; int num2 = num; int* numPtr = (int*) bytePtr; for (int i = Buffer.Length; i > 0; i -= 8) { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0]; if (i <= 4) { break; } num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; numPtr += 2; } return (num + (num2 * 0x5d588b65)); } What's changed? 'char' changed to 'byte', 'i -= 4' changed to 'i -= 8' (because a char is two bytes), and '(i <= 2)' changed to '(i <= 4)'. I think that's pretty much it and this should work for byte arrays.

tobi commented on When mini benchmarks are important

Wed, 30 Dec 2009 20:20:30 GMT

Your hash function is really bad because it only includes the last 4 bytes. If that is enough for you you can optimize this method to be a lot faster^^

Alex Yakunin commented on When mini benchmarks are important

Wed, 30 Dec 2009 20:18:31 GMT

Hmm... Just noticed he wrote GetHashCode example - so it's not clear why he uses reverse enumeration in this case. But what I wrote above is still intact for comparison.

Alex Yakunin commented on When mini benchmarks are important

Wed, 30 Dec 2009 20:15:05 GMT

Nice you started to care about such benchmarks ;) For exactly this case the optimization suggested by configurator must be ideal: - Strings are more frequently differ in the end, so better to compare them in reverse order. If you'd take a look at built-in string comparers in .NET, you'll find they use this assumption. - It's really better to use pointers accessing integer values, especially if you're going to use above assumption (JIT compiler does not recognize backward array enumeration + ~ it is anyway a bit faster then byte-by-byte comparison).

firefly commented on When mini benchmarks are important

Wed, 30 Dec 2009 19:27:39 GMT

Greg is right, most of the time the JIT compiler is really good at inlining property accessor for a loop. I also agree with the commenters, C and Assembly programmers are extremely good at bit manipulation so I would study their code to see how they do it. Perhaps do a search on bit masking, bit shifting and bitwise operation will get you to the right place. Something similar to what configurator is doing. Though his is specific so you might want to come up with your own.

configurator commented on When mini benchmarks are important

Wed, 30 Dec 2009 16:29:30 GMT

I agree with Daniel and Rafal. This hash function is horrible and will cost you. Here's a good, fast and tested hash function - it will give different results on 32-bit and 64-bit machines but that doesn't matter. Hope you don't mind it's unsafe. public override unsafe int GetHashCode() { fixed (char* str = ((char*) Buffer)) { char* chPtr = str; int num = 0x15051505; int num2 = num; int* numPtr = (int*) chPtr; for (int i = Buffer.Length; i > 0; i -= 4) { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0]; if (i <= 2) { break; } num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; numPtr += 2; } return (num + (num2 * 0x5d588b65)); } }

configurator commented on When mini benchmarks are important

Wed, 30 Dec 2009 16:20:11 GMT

The JIT has optimizations for array access in a loop when the loop variable is bound to the array size (no bounds checking). If ByteBuffer.Buffer is actually an array, and if the entire buffer is used commonly (Offset = 0, Length = buffer.Length), you could try: private static bool StringEqaulsToBuffer(ByteString byteString, ByteBuffer byteBuffer) { var buffer = byteBuffer.Buffer; if(byteString.Length != buffer.Length) return false; for (int i = 0; i < buffer.Length; i++) { if(byteString.bytes[i] != buffer[i]) return false; } return true; } You'd have to make the Offset-using version a special case though, so if you use it often this will be no optimization at all...

Rafal commented on When mini benchmarks are important

Wed, 30 Dec 2009 15:51:44 GMT

Bad hash function will make hashtable operations more costly, so probably its better to fix the function than try to run the bad version faster

Daniel commented on When mini benchmarks are important

Wed, 30 Dec 2009 14:29:29 GMT

If you have to use that stupid hash function, surely you can implement it by looking only at the last 4 characters? That would save that loop which can get expensive for long strings. In fact, if we're talking x86 assembly, GetHashCode() for Length>=4 can be implemented in a few instructions // given EBX=Buffer+Offset and ECX=Length MOV EAX, EBX[ECX-4] BSWAP EAX // convert little endian to big endian RET I don't know if there's a way to make the .NET JIT output something like that, I would try something like public override int GetHashCode() { if (Length >= 4) { return IPAddress.HostToNetworkOrder(BitConverter.ToInt32(Buffer, Offset + Length - 4)); } else { // your code } }

Ayende Rahien commented on When mini benchmarks are important

Wed, 30 Dec 2009 12:35:55 GMT

Greg I haven't considered the effect of running under the profiler, you are right. But the perf boost was noticable even without the profiler you are correct about the hash func as well, in my defence, I am matching the behaviour of a third party

Ayende Rahien commented on When mini benchmarks are important

Wed, 30 Dec 2009 12:27:59 GMT

Andrews yes I fond that ot after I did the profiling and dht want to redo it and I recalc immediately

Patrick Smacchia commented on When mini benchmarks are important

Wed, 30 Dec 2009 11:51:20 GMT

Some others tips on Micro optimization I found btw: [codebetter.com/.../...to-increase-performance.aspx](http://codebetter.com/blogs/patricksmacchia/archive/2009/04/19/micro-optimization-tips-to-increase-performance.aspx)

Patrick Smacchia commented on When mini benchmarks are important

Wed, 30 Dec 2009 11:50:00 GMT

Micro-optimizations are needed I think: [codebetter.com/.../...need-micro-optimization.aspx](http://codebetter.com/blogs/patricksmacchia/archive/2009/04/19/do-we-need-micro-optimization.aspx) I wrote also concerning optimization on for loop: [codebetter.com/.../...e-net-code-performances.aspx](http://codebetter.com/blogs/patricksmacchia/archive/2008/11/19/an-easy-and-efficient-way-to-improve-net-code-performances.aspx) Btw, you can still have a win on the first example by creating a temporary reference to byteBuffer.Buffer (if it is a reference type): for (int i = 0; i < strLen; i++) { if(byteString.bytes[i] != ********byteBuffer.Buffer*********[byteBuffer.Offset+i]) return false; }

Greg Beech commented on When mini benchmarks are important

Wed, 30 Dec 2009 11:32:26 GMT

Are you sure the first problem with get_Length/get_Item isn't caused by running the code under the profiler (a Heisenbug, in effect)? In optimised code running normally I'd expect to see the property accessor methods inlined by the CLR. Often profilers disable inlining to get more 'accurate' results which can lead to false positives like this when the method call overhead becomes significant; you may want to check that your profiler settings and ensure that it was not disabling method inlining. In addition, your GetHashCode function doesn't appear to provide a true hash. As you're simply left-shifting all the previous data by 8, you're actually just discarding all but the last four bytes of data. I'd consider replacing the aggregation term with a multiply/xor combination, e.g. "(ret * 23) ^ Buffer[i]", which will give you a true hash.

Andrew commented on When mini benchmarks are important

Wed, 30 Dec 2009 10:09:40 GMT

(Nitpick: Equals is spelled wrong in StringEqualsToBuffer) With ResetHash, are you recalculating the hash immediately, or simply clearing the hash value, allowing it to be calculated again when needed, lazily? These posts are interesting, thanks for making them!