What is wrong with this code?
Calling the Compare method will (sometimes) crash your entire application in a very rude fashion, why?
static bool Compare(byte[] b1, byte[] b2) { IntPtr retval = memcmp(b1, b2, new IntPtr(b1.Length)); return retval == IntPtr.Zero; } [DllImport("msvcrt.dll")] static extern IntPtr memcmp(byte[] b1, byte[] b2, IntPtr count);
In fact, there are actually two different problems that I can see here. The easy one would consistently crash, the hard one would usually pass, but sometime crash you in an apparently random ways.
Comments
Well, the memory for the byte arrays is not pinned, so if the GC moves the byte arrays during a garbage collection you will get some really strange results, which could include a segfault.
Furthermore, even if the GC doesn't move the arrays, you may get a segfault if the length of b2 is less than that of b1. If the length of b2 is less than that of b1 then you may just get a wrong result.
Lastly, I'm not sure if the representation of a byte[] is guaranteed to be the same as char* but I guess it is not - in particular it must include at least the length. So the check may return a bogus result even if the GC doesn't move things AND the byte arrays are the same length.
Well, simple one is the assumption that b2 is at least as long as b1 (otherwise, you're wandering into other memory, possibly crossing a page boundary into memory that isn't there)
Can anyone give an example of when this might be used in production .NET code.
Thanks
@Max - objects passed to P/Invoke don't need to be pinned, unless the unmanaged code is going to be keeping a reference - the GC won't move them during the P/Invoke call.
Null reference exception - if b1 is null b1.Length will boom. If b2 is null well, I'm not sure, but it will probably crash. Returning IntPtr makes no sense whatsoever - it should be plain int - I'm not sure now if this will return 0 as IntPtr.Zero (probably will)
There's actually usage example on pinvoke.net: http://www.pinvoke.net/default.aspx/msvcrt.memcmp
Could be a problem that IntPtr don't have the same size on a 32 or 64-bit process. Since a.NET dll may be compiled to run in processes of any address size this could potentially drive you silly when you e.g. coolly run shit on a big server and do tests with MSTest.
When I look at the memcmp docs, it actually wants pointers? I am wondering if that is not a source of mishap...
The obvious problem is that b2 can contain lesser bytes than b1. And not so obvious is that using IntPtr instead of UintPtr may produce stack imbalance on x86 platforms.
memcmp returns an int, not an IntPtr. Will probably cause some stack corruption. The GC is not a problem though.
I am seeing the failure due to the byte stream potentially being longer than what an int can hold.
[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)] static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);
You are likely to get a PInvoke stack imbalance in one of x86 or x64.
b2.Length could be more than b1.Length; if b2 is equal to the first b2.Length bytes of b1, memcmp would continue after the end of the array.
Both b1 and b2 could be null; b1 would cause a NullReferenceException, b2 would cause a segfault.
If the b1 is over 2GB in size, b1.Length would break. Use LongLength instead; it is safe to cast it to an IntPtr because an array couldn't be larger than int.MaxValue on a 32-bit system.
That's exactly my problem. This seems an overly hypothetical question.
Chris. For example, when I need to compare data that comes from unmanaged code. I have an index that output data that I need to sort based on their memcmp values
That's a valid case then. Thanks.
Can't you just do it in managed code then you don't have to worry about pinning, pinvoke and all that? It's probably not much slower (I can't claim to have tried it or worked out what the CLR assembles it to though).
bool memcmp(byte[] a, byte[]b, int len) { for (int i = 0; i >= 0; i++) if (a[i] != b[i]) return false; return true; }
That's rougly what a platform independent C implementation does albeit with less pointer magic and minus some optimisations (that glibc does at least) which do it in aligned 4/8 byte blocks for CPU efficiency.
@Chris Smith: I suspect a naive loop comparing a single byte at a time be much slower than memcmp. One of the reasons is that memory operations on bytes take the same time as memory operations on a whole memory word (and depending on cache considerations, even more than that). Besides, memcmp is probably optimized to take advantage of SSE/SSE2 instructions.
Comment preview