Ayende @ Rahien

Challenge: Can you spot the bug?

I have this piece of code:

public static string FirstCharacters(this string self, int numOfChars)
{
    if (self == null)
        return "";
    if (self.Length < numOfChars)
        return self;
    return self
        .Replace(Environment.NewLine, " ")
        .Substring(0, numOfChars - 3) + "...";
}

And the following exception was sent to me:

System.ArgumentOutOfRangeException: Index and length must refer to a location within the string.
Parameter name: length
  at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
  at StringExtensions.FirstCharacters(String self, Int32 numOfChars)

It actually took me a while to figure out what was going on. Partly that is because we have an “impossible” stack trace (the JIT seems to have inlined the Substring call). When I did figure out what the bug was, it was a head-slapping moment.

Comments

Rafal
10/21/2009 06:18 PM by
Rafal

Environment.NewLine is a two-character string so Replace shortens the original...

Frank
10/21/2009 06:25 PM by
Frank

It checks if the length of the string is the minimum length necessary to return the version ending in "...". However, after that you replace the newlines with spaces, which means the string you work on will be shorter (depending on the platform possibly). So, kaboom.

Correct?
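A minimal sketch of the failure described above. To make it reproducible on any platform, the newline sequence is passed in as a parameter; that parameter and the names `NewLineRepro` and `FirstCharactersWith` are assumptions for illustration, not part of the original code, which reads `Environment.NewLine`.

```csharp
using System;

public static class NewLineRepro
{
    // Same logic as FirstCharacters in the post, except that the newline
    // sequence is a parameter (hypothetical, for illustration only).
    public static string FirstCharactersWith(string self, int numOfChars, string newLine)
    {
        if (self == null)
            return "";
        if (self.Length < numOfChars)
            return self;
        return self
            .Replace(newLine, " ")
            .Substring(0, numOfChars - 3) + "...";
    }

    public static void Main()
    {
        // Five CRLF pairs: 10 characters before the Replace.
        string text = "\r\n\r\n\r\n\r\n\r\n";

        // One-character newline: Replace keeps the length at 10,
        // so Substring(0, 7) succeeds and the result is 10 characters long.
        Console.WriteLine(FirstCharactersWith(text, 10, "\n").Length);

        try
        {
            // Two-character newline: Replace shrinks the string to 5 characters,
            // and Substring(0, 7) no longer fits.
            FirstCharactersWith(text, 10, "\r\n");
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("ArgumentOutOfRangeException");
        }
    }
}
```

The length check runs against the original string, while `Substring` runs against the shortened one, which is why the check passes and the call still throws.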

Dmitry
10/21/2009 06:29 PM by
Dmitry

I wonder if the bug would have happened in Mono (on Unix).

Kyle Szklenski
10/21/2009 06:34 PM by
Kyle Szklenski

There's another simple one in there too, yes? Your "numOfChars - 3" isn't checking if numOfChars is greater than 3, which it must be or a diff. exception is thrown. The error you got would not indicate that this was the bug, but the previous two got it, I think, so I thought I'd point out this extra one.

Remco
10/21/2009 07:27 PM by
Remco

Why do you replace newlines when the string is cut, but not when it's within max characters? Isn't that an unexpected side effect?

ie.

max. characters is 144,

if I write 5 lines with fewer than 144 chars, the newlines remain.

if I write 5 lines with more than 144 chars, all newlines are replaced.

huge impact when shown on screen.

chrissie1
10/21/2009 07:35 PM by
chrissie1

what Rafal says.

David Thibault
10/21/2009 07:46 PM by
David Thibault

Something like myString.ToSingleLine().FirstCharacters(100) would probably be more explicit and less error prone.
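One way the split suggested above could look, as a sketch only. `ToSingleLine` is the hypothetical name from the comment; its handling of both "\r\n" and "\n", the guard on negative values, and the plain cut when there is no room for an ellipsis are all assumptions about the desired behavior, not the author's code.

```csharp
using System;

public static class SingleLineExtensions
{
    // Hypothetical helper: flattens line breaks and does nothing else.
    public static string ToSingleLine(this string self)
    {
        if (self == null)
            return "";
        return self.Replace("\r\n", " ").Replace("\n", " ");
    }

    // Truncation only. Every length is measured on the string actually
    // passed in, so an earlier Replace cannot invalidate the check.
    public static string FirstCharacters(this string self, int numOfChars)
    {
        const string ellipsis = "...";
        if (self == null)
            return "";
        if (numOfChars < 0)
            throw new ArgumentOutOfRangeException(nameof(numOfChars));
        if (self.Length <= numOfChars)
            return self;
        // No room for an ellipsis: plain cut (assumed behavior).
        if (numOfChars <= ellipsis.Length)
            return self.Substring(0, numOfChars);
        return self.Substring(0, numOfChars - ellipsis.Length) + ellipsis;
    }
}

public static class SingleLineDemo
{
    public static void Main()
    {
        string text = "line one\r\nline two\r\nline three";
        // Prints "line one ..."
        Console.WriteLine(text.ToSingleLine().FirstCharacters(12));
    }
}
```

With the two steps separated, the caller decides whether newlines are flattened, and the truncation step never sees a string whose length changed after it was checked.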

Roberto
10/21/2009 07:47 PM by
Roberto

I know it... I get it every time :D

you didn't check if the actual string length is greater than (or equal to) 3

substring can't manage a negative range (0 to -1/-2/-3, in that example)

in the length check, you also have to add another expression: whether the length is greater than 3 (and maybe use a constant for this number)

devdimi
10/21/2009 07:49 PM by
devdimi

" ". FirstCharacters(1);

Kyle Szklenski
10/21/2009 08:29 PM by
Kyle Szklenski

Roberto, Substring accepts 0 as its length; it simply takes no characters.

tobi
10/21/2009 08:39 PM by
tobi

the solutions mentioned so far were pretty easy to spot so i guess it was more complex than that.

a) when there are many new lines the second parameter is larger than the length of the string.

b) when num of chars was less than 3 the length was negative

here is a proof that no other explanation can exist. from the error message we know that the substring call is the culprit. as the first argument is an always valid constant the second argument must be the reason. it can either be < 0 or > string.length. so a) and b) are exhaustive.

Stijn Guillemyn
10/21/2009 08:41 PM by
Stijn Guillemyn

Although most bugs were already found:

  • Replacing newline after checking string length

  • The offset of 3 for the ellipsis isn't taken into account when checking the length

I think I've got another one. Nothing prevents the method from being called with a negative int, does it? This would also blow up, I guess.

Safer to use an unsigned int for the number of characters?

Stijn Guillemyn
10/21/2009 08:58 PM by
Stijn Guillemyn

On a side note, I also don't like some things about the implementation in general.

For instance that a null reference returns an empty string. I'd prefer to see it return null instead.

Also, if the length == numOfChars, why don't you just return the string? It's not too long in this case.

@Roberto: I think it's not so much the length of the string that should be greater than 3, but rather that the length of the ellipsis should depend on numOfChars. If someone requests numOfChars == 1, you cannot add an ellipsis of 3 (and take it into account for your substring). The requested numOfChars should be > 3 for this implementation to work.

Harry
10/21/2009 09:01 PM by
Harry

I think Roberto is right for negative substring ...

I wrote a test program and to my surprise the following code

        Console.WriteLine(@"

        asdf

asdf

        asdf

        asdf

        asdf

        ".FirstCharacters(10));

returns " ..."

don't know why

Rik Hemsley
10/21/2009 09:01 PM by
Rik Hemsley

This is exactly the sort of code which is easy to cover with unit tests. The fact it obviously isn't is a big WTF.

Ayende Rahien
10/21/2009 09:05 PM by
Ayende Rahien

Rik,

Why do you assume that it isn't covered in tests?

Rik Hemsley
10/21/2009 09:12 PM by
Rik Hemsley

Because it's very easy to make it fail. Or are the tests useless?

Ayende Rahien
10/21/2009 09:16 PM by
Ayende Rahien

Rik,

I am watching the thread going on and growing more & more amused by the second.

People are trying to make this into a very generic routine, when in fact it is used only in very specific circumstances.

The tests are there for those circumstances, not for generic behavior.

Rik Hemsley
10/21/2009 09:18 PM by
Rik Hemsley

In which case the code isn't covered by unit tests... which is the WTF.

Ayende Rahien
10/21/2009 09:20 PM by
Ayende Rahien

I am not quite sure how to respond to that, but I think I'll refer you to my recent posts about unit testing.

Mahendra Mavani
10/21/2009 09:34 PM by
Mahendra Mavani

Substring(0, numOfChars - 3) will fail for numOfChars = 0, 1, 2, hence FirstCharacters(0), FirstCharacters(1), and FirstCharacters(2) will give you an exception, provided your original string is long enough to bypass the first two if conditions.

Steve Py
10/21/2009 09:54 PM by
Steve Py

@ Ayende,

Well, in all honesty it should be treated as a very generic routine. You are adding it as an extension method to String so it's as public and available as it could possibly be.

You've tripped one bug, but as mentioned by Roberto, there are at least 2 issues in there. (newline replacement reduces the length by 1 for each replacement so it will crash after 4 or more CRLFs, and numOfChars cannot be less than 3.)

Leaner
10/22/2009 12:52 AM by
Leaner

For your specific error, this will occur when numOfChars-3 > self.Length after the replacement of the NewLine with " ".

This requires that there be 4+ NewLines in the string (with a two-character NewLine), and that numOfChars is between (string.Length - numOfNewLine) + 4 and string.Length.
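The condition above can be worked out step by step. The symbols are introduced here for illustration: $L$ is the original length, $k$ the number of two-character newlines, and $n$ is numOfChars, assumed to be at least 3 so that the negative-length case is excluded.

```latex
\begin{align*}
  &\text{length check passes:}                 && L \ge n \\
  &\text{length after Replace:}                && L - k \\
  &\text{Substring}(0,\, n - 3)\text{ throws:} && n - 3 > L - k \\
  &\text{combined:}                            && L - k + 3 < n \le L \\
  &\text{a solution for } n \text{ exists only if:} && k > 3 \iff k \ge 4
\end{align*}
```

For example, $L = 10$, $k = 5$, $n = 10$ satisfies $8 < 10 \le 10$, so a string made of five two-character newlines fails with numOfChars set to 10.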

Johannes Rudolph
10/22/2009 05:19 AM by
Johannes Rudolph

I didn't take time to read the comments, but here are my two cents:

It's pretty easy to spot: you don't check numOfChars > 3

Tim Van Wassenhove
10/22/2009 06:05 AM by
Tim Van Wassenhove

Despite the Length & Offset issues, i find something a lot more troublesome:

The fact that a method "FirstCharacters" replaces newlines and appends "..." in some situations is something that I would consider a very nasty side effect of a method that returns the first characters of a string.

Eyan
10/22/2009 06:30 AM by
Eyan

The string length could be less than 3 characters

Jochen Jonckheere
10/22/2009 07:17 AM by
Jochen Jonckheere

Well I had Pex running over the code and it has found the ArgumentOutOfRangeException 2 times. Once for numOfChars 0 and once for numOfChars as a negative number.

Did I cheat, or did I find a use for Pex? ;-)

Koen
10/22/2009 08:06 AM by
Koen

The second parameter of Substring() would be negative if numOfChars == 1 or 2...

Billy Stack
10/22/2009 09:26 AM by
Billy Stack

You are missing another pre-condition:

if (numOfChars < 0) { throw new ArgumentException(); } or

if (numOfChars < 0) { return ""; }

depending on desired solution...

GarlandGreene
10/22/2009 10:11 AM by
GarlandGreene

You check for a shorter string length than numOfChars but not for one with the same length (which you could immediately return without change). Another one is that numOfChars could be smaller than 3, which would also cause the same exception, as the length parameter for the substring would be negative.

Jamie
10/22/2009 11:11 AM by
Jamie

if (self == null)

    return "";

if(self.Length == 0)

    return "";

Stephen Lacy
10/22/2009 12:04 PM by
Stephen Lacy

the first problem I would find with the method is that it doesn't describe what it does in its name.

Harry
10/22/2009 01:45 PM by
Harry

@ Ayende,

So, are you saying the possible negative parameter for Substring is not the problem, since this routine is not designed for that purpose anyway? If so, I have no clue why it throws that error. Can you shed some light?

meo
10/22/2009 02:08 PM by
meo

Yes, the bug is that on Windows Environment.NewLine is two characters, and you replace it with a one-character space, so, for instance, this code (5 line breaks in it):

@"

".FirstCharacters(10);

gives you an exception you've listed above.

M?

Peter
10/23/2009 05:29 AM by
Peter

Get creative people! I say Ayende needs to replace what he has with code that calls Regex.Replace successively, each time building on the last Regex.Replace. Incremental design at its best!

Neil Mosafi
11/14/2009 03:48 PM by
Neil Mosafi

So what's the answer? I'm curious now!

Ayende Rahien
11/15/2009 02:43 AM by
Ayende Rahien

A string of size 256 with a "\r\n" somewhere in it

Comments have been closed on this topic.