Ayende @ Rahien

It's a girl

What is wrong with this code?

Calling the Compare method will (sometimes) crash your entire application in a very rude fashion, why?

static bool Compare(byte[] b1, byte[] b2)
  {
    IntPtr retval = memcmp(b1, b2, new IntPtr(b1.Length));
    return retval == IntPtr.Zero;
  }
 
  [DllImport("msvcrt.dll")]
  static extern IntPtr memcmp(byte[] b1, byte[] b2, IntPtr count);

In fact, there are actually two different problems that I can see here. The easy one would consistently crash, the hard one would usually pass, but sometime crash you in an apparently random ways.

Tags:

Posted By: Ayende Rahien

Published at

Originally posted at

Comments

Max
10/13/2011 10:16 AM by
Max

Well, the memory for the byte arrays is not pinned, so if the GC moves the byte arrays during a garbage collection you will get some really strange results, which could include a segfault.

Furthermore, even if the GC doesn't move the arrays, you may get a segfault if the length of b2 is less than that of b1. If the length of b2 is less than that of b1 then you may just get a wrong result.

Lastly, I'm not sure if the representation of a byte[] is guaranteed to be the same as char* but I guess it is not - in particular it must include at least the length. So the check may return a bogus result even if the GC doesn't move things AND the byte arrays are the same length.

Damien
10/13/2011 10:18 AM by
Damien

Well, simple one is the assumption that b2 is at least as long as b1 (otherwise, you're wandering into other memory, possibly crossing a page boundary into memory that isn't there)

Greg B
10/13/2011 10:25 AM by
Greg B

Can anyone give an example of when this might be used in production .NET code.

Thanks

Damien
10/13/2011 10:26 AM by
Damien

@Max - objects passed to P/Invoke don't need to be pinned, unless the unmanaged code is going to be keeping a reference - the GC won't move them during the P/Invoke call.

Mix
10/13/2011 10:38 AM by
Mix

Null reference exception - if b1 is null b1.Length will boom. If b2 is null well, I'm not sure, but it will probably crash. Returning IntPtr makes no sense whatsoever - it should be plain int - I'm not sure now if this will return 0 as IntPtr.Zero (probably will)

There's actually usage example on pinvoke.net: http://www.pinvoke.net/default.aspx/msvcrt.memcmp

Frank Quednau
10/13/2011 11:11 AM by
Frank Quednau

Could be a problem that IntPtr don't have the same size on a 32 or 64-bit process. Since a.NET dll may be compiled to run in processes of any address size this could potentially drive you silly when you e.g. coolly run shit on a big server and do tests with MSTest.

When I look at the memcmp docs, it actually wants pointers? I am wondering if that is not a source of mishap...

Sergey M
10/13/2011 11:52 AM by
Sergey M

The obvious problem is that b2 can contain lesser bytes than b1. And not so obvious is that using IntPtr instead of UintPtr may produce stack imbalance on x86 platforms.

tobi
10/13/2011 01:45 PM by
tobi

memcmp returns an int, not an IntPtr. Will probably cause some stack corruption. The GC is not a problem though.

Steven Proctor
10/13/2011 02:14 PM by
Steven Proctor

I am seeing the failure due to the byte stream potentially being longer than what an int can hold.

configurator
10/13/2011 02:54 PM by
configurator
  1. Your declaration of memcmp doesn't match the one in pinvoke.net, which is usually correct It has the wrong calling convention, parameter type (although it is the same size so should be safe), and return type (IntPtr instead of int)

[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)] static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);

You are likely to get a PInvoke stack imbalance in one of x86 or x64.

  1. b2.Length could be more than b1.Length; if b2 is equal to the first b2.Length bytes of b1, memcmp would continue after the end of the array.

  2. Both b1 and b2 could be null; b1 would cause a NullReferenceException, b2 would cause a segfault.

  3. If the b1 is over 2GB in size, b1.Length would break. Use LongLength instead; it is safe to cast it to an IntPtr because an array couldn't be larger than int.MaxValue on a 32-bit system.

Chris Smith
10/14/2011 07:27 AM by
Chris Smith

Can anyone give an example of when this might be used in production .NET code.

That's exactly my problem. This seems an overly hypothetical question.

Ayende Rahien
10/14/2011 07:29 AM by
Ayende Rahien

Chris. For example, when I need to compare data that comes from unmanaged code. I have an index that output data that I need to sort based on their memcmp values

Chris Smith
10/14/2011 06:10 PM by
Chris Smith

That's a valid case then. Thanks.

Can't you just do it in managed code then you don't have to worry about pinning, pinvoke and all that? It's probably not much slower (I can't claim to have tried it or worked out what the CLR assembles it to though).

bool memcmp(byte[] a, byte[]b, int len) { for (int i = 0; i >= 0; i++) if (a[i] != b[i]) return false; return true; }

That's rougly what a platform independent C implementation does albeit with less pointer magic and minus some optimisations (that glibc does at least) which do it in aligned 4/8 byte blocks for CPU efficiency.

Sasha Goldshtein
10/17/2011 08:47 AM by
Sasha Goldshtein

@Chris Smith: I suspect a naive loop comparing a single byte at a time be much slower than memcmp. One of the reasons is that memory operations on bytes take the same time as memory operations on a whole memory word (and depending on cache considerations, even more than that). Besides, memcmp is probably optimized to take advantage of SSE/SSE2 instructions.

Comments have been closed on this topic.