Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,949 | Comments: 44,548

filter by tags archive

What is wrong with this code?


Calling the Compare method will (sometimes) crash your entire application in a very rude fashion, why?

static bool Compare(byte[] b1, byte[] b2)
  {
    IntPtr retval = memcmp(b1, b2, new IntPtr(b1.Length));
    return retval == IntPtr.Zero;
  }
 
  [DllImport("msvcrt.dll")]
  static extern IntPtr memcmp(byte[] b1, byte[] b2, IntPtr count);

In fact, there are actually two different problems that I can see here. The easy one would consistently crash, the hard one would usually pass, but sometime crash you in an apparently random ways.


Comments

Max
Max

Well, the memory for the byte arrays is not pinned, so if the GC moves the byte arrays during a garbage collection you will get some really strange results, which could include a segfault.

Furthermore, even if the GC doesn't move the arrays, you may get a segfault if the length of b2 is less than that of b1. If the length of b2 is less than that of b1 then you may just get a wrong result.

Lastly, I'm not sure if the representation of a byte[] is guaranteed to be the same as char* but I guess it is not - in particular it must include at least the length. So the check may return a bogus result even if the GC doesn't move things AND the byte arrays are the same length.

Damien

Well, simple one is the assumption that b2 is at least as long as b1 (otherwise, you're wandering into other memory, possibly crossing a page boundary into memory that isn't there)

Greg B

Can anyone give an example of when this might be used in production .NET code.

Thanks

Damien

@Max - objects passed to P/Invoke don't need to be pinned, unless the unmanaged code is going to be keeping a reference - the GC won't move them during the P/Invoke call.

Mix
Mix

Null reference exception - if b1 is null b1.Length will boom. If b2 is null well, I'm not sure, but it will probably crash. Returning IntPtr makes no sense whatsoever - it should be plain int - I'm not sure now if this will return 0 as IntPtr.Zero (probably will)

There's actually usage example on pinvoke.net: http://www.pinvoke.net/default.aspx/msvcrt.memcmp

Frank Quednau

Could be a problem that IntPtr don't have the same size on a 32 or 64-bit process. Since a.NET dll may be compiled to run in processes of any address size this could potentially drive you silly when you e.g. coolly run shit on a big server and do tests with MSTest.

When I look at the memcmp docs, it actually wants pointers? I am wondering if that is not a source of mishap...

Sergey M

The obvious problem is that b2 can contain lesser bytes than b1. And not so obvious is that using IntPtr instead of UintPtr may produce stack imbalance on x86 platforms.

tobi

memcmp returns an int, not an IntPtr. Will probably cause some stack corruption. The GC is not a problem though.

Steven Proctor

I am seeing the failure due to the byte stream potentially being longer than what an int can hold.

configurator
  1. Your declaration of memcmp doesn't match the one in pinvoke.net, which is usually correct It has the wrong calling convention, parameter type (although it is the same size so should be safe), and return type (IntPtr instead of int)

[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)] static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);

You are likely to get a PInvoke stack imbalance in one of x86 or x64.

  1. b2.Length could be more than b1.Length; if b2 is equal to the first b2.Length bytes of b1, memcmp would continue after the end of the array.

  2. Both b1 and b2 could be null; b1 would cause a NullReferenceException, b2 would cause a segfault.

  3. If the b1 is over 2GB in size, b1.Length would break. Use LongLength instead; it is safe to cast it to an IntPtr because an array couldn't be larger than int.MaxValue on a 32-bit system.

Chris Smith

Can anyone give an example of when this might be used in production .NET code.

That's exactly my problem. This seems an overly hypothetical question.

Ayende Rahien

Chris. For example, when I need to compare data that comes from unmanaged code. I have an index that output data that I need to sort based on their memcmp values

Chris Smith

That's a valid case then. Thanks.

Can't you just do it in managed code then you don't have to worry about pinning, pinvoke and all that? It's probably not much slower (I can't claim to have tried it or worked out what the CLR assembles it to though).

bool memcmp(byte[] a, byte[]b, int len) { for (int i = 0; i >= 0; i++) if (a[i] != b[i]) return false; return true; }

That's rougly what a platform independent C implementation does albeit with less pointer magic and minus some optimisations (that glibc does at least) which do it in aligned 4/8 byte blocks for CPU efficiency.

Sasha Goldshtein

@Chris Smith: I suspect a naive loop comparing a single byte at a time be much slower than memcmp. One of the reasons is that memory operations on bytes take the same time as memory operations on a whole memory word (and depending on cache considerations, even more than that). Besides, memcmp is probably optimized to take advantage of SSE/SSE2 instructions.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats