Challenge: Can you spot the bug?
I have this piece of code:
public static string FirstCharacters(this string self, int numOfChars) { if (self == null) return ""; if (self.Length < numOfChars) return self; return self .Replace(Environment.NewLine, " ") .Substring(0, numOfChars - 3) + "..."; }
And the following exception was sent to me:
System.ArgumentOutOfRangeException: Index and length must refer to a location within the string. Parameter name: length at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy) at StringExtensions.FirstCharacters(String self, Int32 numOfChars)
It actually took me a while to figure out what is going on. Partly that is because we have an “impossible” stack trace (the JIT seems to have inlined the Substring call). When I did figure what the bug was, i is a head slapping moment.
Comments
Environment.NewLine is a two-character string so Replace shortens the original...
It checks if the length of the string is the minimum length necessary to return the version ending in "...". However, after that you replace the newlines with spaces, which means the string you work on will be shorter (depending on the platform possibly). So, kaboom.
Correct?
I wonder if the bug would have happened in Mono (on Unix).
There's another simple one in there too, yes? Your "numOfChars - 3" isn't checking if numOfChars is greater than 3, which it must be or a diff. exception is thrown. The error you got would not indicate that this was the bug, but the previous two got it, I think, so I thought I'd point out this extra one.
Why do you replace newlines when the string is cut, but not when it's whithin max characters? isn't that an unexpected side effect?
ie.
max. characters is 144,
if I write 5 lines with -144 chars, newlines are remained.
if i write 5 lines with +144 chars all newlines are replaced.
huge impact when showed on screen.
what Rafal says.
Something like myString.ToSingleLine().FirstCharacters(100) would probably be more explicit and less error prone.
I know it... i get it everytime :D
you didn't check if the actual string length is greater than (or egual to) 3
substring can't manage a negative range (0 to -1/-2/-3, in that example)
in the length check, you have to put too another expression, if lenght is greater than 3 (and maybe use a constant for this number)
" ". FirstCharacters(1);
Roberto, SubString accepts 0 as its range; it simply takes no characters.
the solutions mentioned so far were pretty easy to spot so i guess it was more complex than that.
a) when there are many new lines the second parameter is larger than the length of the string.
b) when num of chars was less than 3 the length was negative
here is a proof that no other explanation can exist. from the error message we know that the substring call is the culprit. as the first argument is an always valid constant the second argument must be the reason. it can either be < 0 or > string.length. so a) and b) are exhaustive.
Although most bugs were already found:
Replacing newline after checking string length
The offset of 3 for the ellipsis isn't taken into account when checking the length
I think I've got another one. Nothing prevents the method to be called with a negative int, isn't it? This would also blow up, I guess.
Safer to use an unsigned int for the number of characters?
On a side note, I also don't like some things about the implementation in general.
For instance that a null reference returns an empty string. I'd prefer to see it return null instead.
Also, if the length == numOfChars, why don't you just return the string? It's not too long in this case.
@Roberto: I think it's not that much the length of the string that should be greater than 3. Rather than the length of the ellipsis that should depend on numOfChars. If someone requests numOfChars == 1, you cannot add an ellipsis of 3 (and take it in account for your substring). The requested numOfChars should be > 3 for this implementation to work.
I think Roberto is right for negative substring ...
I wrote a test program and to my surprise the following code
asdf
returns " ..."
don't know why
This is exactly the sort of code which is easy to cover with unit tests. The fact it obviously isn't is a big WTF.
Rik,
Why do you assume that it isn't covered in tests?
Because it's very easy to make it fail. Or are the tests useless?
Rik,
I am watching the thread going on and growing more & more amused by the second.
People are trying to make this into a very generic routine, when in fact it is used only in very specific circumstances.
The tests are there for those circumstances, not for generic behavior.
In which case the code isn't covered by unit tests... which is the WTF.
I am not quite sure how to respond to that, but I think I'll refer you to my recent posts about unit testing.
Substring(0, numOfChars - 3) will fail for numOfChars=0,1,,2 hence FirstCharacters(0), FirstCharacters(1),FirstCharacters(2) will give you exception , provided your original string is long enough to bypass first two if conditions
@ Ayende,
Well, in all honesty it should be treated as a very generic routine. You are adding it as an extension method to String so it's as public and available as it could possibly be.
You've tripped one bug, but as mentioned by Roberto, there are at least 2 issues in there. (newline replacement reduces the length by 1 for each replacement so it will crash after 4 or more CRLFs, and numOfChars cannot be less than 3.)
For your specific error, this will occur when numOfChars-3 > self.Length after the replacement of the NewLine with " ".
This requires that there be 4+ NewLines in the string (and using multibyte NewLine), and numOfChars is between string.Length and (string.Length - numOfNewLine) + 4
I didn't take time to read the comments, but here are my two cents:
It's preety easy to Spot: You don't check numOfChars > 3
Despite the Length & Offset issues, i find something a lot more troublesome:
The fact that a method "FirstCharacters" removes spaces and that it appends "..." in some situations is something that i would consider a very naste side-effect of a method that returns the first characters of a string.
The string length could be less than 3 charachters
Well I had Pex running over the code and it has found the ArgumentOutOfRangeException 2 times. Once for numOfChars 0 and once for numOfChars as a negative number.
Did I cheat, or did I find a use for Pex? ;-)
The second parameter of SubString() would be negative if numOfChars == 1 or 2...
You are missing another pre-condition:
If(numOfChars <0) { throw ArgumentException(); } or
If(numOfChars <0) { return "" }
depending on desired solution...
You check for a shorter string length than numOfChars but not for one with the same length (which you could immediately return without change). Another one is that numOfChars could be smaller than 3, which would also cause the same exception as the length parameter for the substring would negative.
if (self == null)
if(self.Length == 0)
the first problem I would find with the method is that it doesn't describe what it does in it's name.
@ Ayende,
So, are you saying the possible negative parameters for SubString is not the problem since this routine is not designed for that purpose anyway? If so, I am out of clue why it throws that error. Can you shed some light?
Yes, the bug is that on Win Environment.NewLine is 2-bytes, and you replace it with one-byte space, so, for instance, this code (5 line breaks in it):
@"
".FirstCharacters(10);
gives you an exception you've listed above.
M?
Get creative people! I say Ayende needs to replace what he has with code that calls Regex.Replace successively, each time building on the last Regex.Replace. Incremental design at its best!
So what's the answer I'm curious now!?
A string of size 256 with a "\r\n" somewhere in it