What is wrong with this code?

time to read 1 min | 168 words

Calling the Compare method will (sometimes) crash your entire application in a very rude fashion, why?

static bool Compare(byte[] b1, byte[] b2)
  {
    IntPtr retval = memcmp(b1, b2, new IntPtr(b1.Length));
    return retval == IntPtr.Zero;
  }
 
  [DllImport("msvcrt.dll")]
  static extern IntPtr memcmp(byte[] b1, byte[] b2, IntPtr count);

In fact, there are actually two different problems that I can see here. The easy one would consistently crash, the hard one would usually pass, but sometime crash you in an apparently random ways.

Tweet Share Share 14 comments

Tags:

Bugs

Comments

13 Oct 2011
10:16 AM

Max

Well, the memory for the byte arrays is not pinned, so if the GC moves the byte arrays during a garbage collection you will get some really strange results, which could include a segfault.

Furthermore, even if the GC doesn't move the arrays, you may get a segfault if the length of b2 is less than that of b1. If the length of b2 is less than that of b1 then you may just get a wrong result.

Lastly, I'm not sure if the representation of a byte[] is guaranteed to be the same as char* but I guess it is not - in particular it must include at least the length. So the check may return a bogus result even if the GC doesn't move things AND the byte arrays are the same length.

13 Oct 2011
10:18 AM

Damien

Well, simple one is the assumption that b2 is at least as long as b1 (otherwise, you're wandering into other memory, possibly crossing a page boundary into memory that isn't there)

13 Oct 2011
10:25 AM

Greg B

Can anyone give an example of when this might be used in production .NET code.

Thanks

13 Oct 2011
10:26 AM

Damien

@Max - objects passed to P/Invoke don't need to be pinned, unless the unmanaged code is going to be keeping a reference - the GC won't move them during the P/Invoke call.

13 Oct 2011
10:38 AM

Mix

Null reference exception - if b1 is null b1.Length will boom. If b2 is null well, I'm not sure, but it will probably crash. Returning IntPtr makes no sense whatsoever - it should be plain int - I'm not sure now if this will return 0 as IntPtr.Zero (probably will)

There's actually usage example on pinvoke.net: http://www.pinvoke.net/default.aspx/msvcrt.memcmp

13 Oct 2011
11:11 AM

Frank Quednau

Could be a problem that IntPtr don't have the same size on a 32 or 64-bit process. Since a.NET dll may be compiled to run in processes of any address size this could potentially drive you silly when you e.g. coolly run shit on a big server and do tests with MSTest.

When I look at the memcmp docs, it actually wants pointers? I am wondering if that is not a source of mishap...

13 Oct 2011
11:52 AM

Sergey M

The obvious problem is that b2 can contain lesser bytes than b1. And not so obvious is that using IntPtr instead of UintPtr may produce stack imbalance on x86 platforms.

13 Oct 2011
13:45 PM

tobi

memcmp returns an int, not an IntPtr. Will probably cause some stack corruption. The GC is not a problem though.

13 Oct 2011
14:14 PM

Steven Proctor

I am seeing the failure due to the byte stream potentially being longer than what an int can hold.

13 Oct 2011
14:54 PM

configurator

Your declaration of memcmp doesn't match the one in pinvoke.net, which is usually correct It has the wrong calling convention, parameter type (although it is the same size so should be safe), and return type (IntPtr instead of int)

[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)] static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);

You are likely to get a PInvoke stack imbalance in one of x86 or x64.

b2.Length could be more than b1.Length; if b2 is equal to the first b2.Length bytes of b1, memcmp would continue after the end of the array.
Both b1 and b2 could be null; b1 would cause a NullReferenceException, b2 would cause a segfault.
If the b1 is over 2GB in size, b1.Length would break. Use LongLength instead; it is safe to cast it to an IntPtr because an array couldn't be larger than int.MaxValue on a 32-bit system.

14 Oct 2011
07:27 AM

Chris Smith

Can anyone give an example of when this might be used in production .NET code.

That's exactly my problem. This seems an overly hypothetical question.

14 Oct 2011
07:29 AM

Ayende Rahien

Chris. For example, when I need to compare data that comes from unmanaged code. I have an index that output data that I need to sort based on their memcmp values

14 Oct 2011
18:10 PM

Chris Smith

That's a valid case then. Thanks.

Can't you just do it in managed code then you don't have to worry about pinning, pinvoke and all that? It's probably not much slower (I can't claim to have tried it or worked out what the CLR assembles it to though).

bool memcmp(byte[] a, byte[]b, int len) { for (int i = 0; i >= 0; i++) if (a[i] != b[i]) return false; return true; }

That's rougly what a platform independent C implementation does albeit with less pointer magic and minus some optimisations (that glibc does at least) which do it in aligned 4/8 byte blocks for CPU efficiency.

17 Oct 2011
08:47 AM

Sasha Goldshtein

@Chris Smith: I suspect a naive loop comparing a single byte at a time be much slower than memcmp. One of the reasons is that memory operations on bytes take the same time as memory operations on a whole memory word (and depending on cache considerations, even more than that). Besides, memcmp is probably optimized to take advantage of SSE/SSE2 instructions.

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB