PR Review: Code has cost, justify it
There is a reason why people talk about idiomatic code. Code that is idiomatic to the language matches what the language expects, and it is generally faster and easier to work with for both developers and the compiler / runtime.
During a PR review, I ran into this code:
The idiomatic manner for writing this code would have been any of:
- "@id" == property
- Constants.Documents.Metadata.Id == property
- property.Equals(Constants.Documents.Metadata.Id)
- Constants.Documents.Metadata.Id.Equals(property)
I can argue that the second option is the most idiomatic, and that the third option can fail with a NullReferenceException if the property is null, but all of them are pretty clear.
Now, RavenDB has a lot of non-idiomatic code, usually when we need to get more performance. For example:
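To make the null-safety difference concrete, here is a minimal sketch. The nested `Constants` class is a stand-in that mirrors only the constant named in the post (with the value "@id" given in the post), and the `IdiomaticComparisons` helper and its method names are hypothetical.

```csharp
using System;

// Stand-in for the real constants class; only the value "@id" comes from the post.
static class Constants
{
    public static class Documents
    {
        public static class Metadata
        {
            public const string Id = "@id";
        }
    }
}

// Hypothetical helper that names each comparison style so they can be compared.
static class IdiomaticComparisons
{
    // The == operator handles null on either side without throwing.
    public static bool ViaOperator(string property) =>
        Constants.Documents.Metadata.Id == property;

    // Calling Equals on the constant is safe: the constant is never null.
    public static bool ViaConstantEquals(string property) =>
        Constants.Documents.Metadata.Id.Equals(property);

    // Calling Equals on the property throws NullReferenceException when property is null.
    public static bool ViaPropertyEquals(string property) =>
        property.Equals(Constants.Documents.Metadata.Id);
}
```

All three agree for any non-null input; they only differ when `property` is null, where the last form throws instead of returning false.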
This code does much the same thing as the code above, but it works on the raw byte buffer, and it knows that it is reading UTF-8 characters, so we can apply some nice optimizations and do the comparison in just two instructions.
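As a rough illustration of this kind of raw-buffer comparison (a sketch only, not RavenDB's actual code), the bytes of a short UTF-8 token can be packed into an integer and compared against a precomputed value. The `Utf8Match` class and `IsIdProperty` method are hypothetical names, and the packing is written out by hand for clarity rather than for speed.

```csharp
using System;
using System.Text;

// Hypothetical helper: matches the UTF-8 bytes of "@id" with one integer comparison.
static class Utf8Match
{
    // '@' = 0x40, 'i' = 0x69, 'd' = 0x64, packed with the first byte in the low bits.
    private const int IdPacked = 0x40 | (0x69 << 8) | (0x64 << 16);

    public static bool IsIdProperty(ReadOnlySpan<byte> buffer)
    {
        // Anything that is not exactly three bytes cannot be "@id".
        if (buffer.Length != 3)
            return false;

        // Pack the three bytes into one int and compare it once.
        int packed = buffer[0] | (buffer[1] << 8) | (buffer[2] << 16);
        return packed == IdPacked;
    }
}
```

For example, `Utf8Match.IsIdProperty(Encoding.UTF8.GetBytes("@id"))` returns true. The price is visible right away: the magic numbers are opaque without the comment, which is why code like this needs to earn its place.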
Indeed, when queried, the developer answered:
Most of the time it's going to be false and comparing ints is cheaper than strings
There are several problems with this. First, this particular piece of code isn’t in a part of the codebase that is extremely performance sensitive. The byte buffer work above is for processing requests from the network, code that can be called tens or hundreds of thousands of times per second. Performance there matters, a lot. This code is meant to be called as part of streaming results to the user, so it is likely to handle a very large volume of data. Performance there matters, for sure, but we need to consider how much it matters.
Second, let us peek into what will actually happen if we drop the property.Length check. The call will end up in the native string routines in the CLR, and the relevant portion is:
In other words, this check is already going to happen, so we didn’t really save anything by making it.
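The general shape of an ordinal string equality check looks roughly like the sketch below. This is a simplified illustration, not the CLR source, and `SketchEquals` is a hypothetical name. The point is that the lengths are compared before any character data is touched.

```csharp
using System;

static class StringEqualitySketch
{
    // Simplified illustration of ordinal string equality; not the runtime's implementation.
    public static bool SketchEquals(string a, string b)
    {
        // Same instance (or both null): equal without looking at the contents.
        if (ReferenceEquals(a, b))
            return true;

        // Exactly one side is null: not equal.
        if (a is null || b is null)
            return false;

        // Cheap rejection: strings of different lengths can never be equal.
        if (a.Length != b.Length)
            return false;

        // Only now compare the characters one by one.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}
```

A hand-written length check in front of such a call repeats work that the comparison already does on its own.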
Third, and the most subtle of them all. This check compares against a constant, whose value is “@id”. It also checks that property.Length is equal to 3. The whole point of using a constant is that we need to replace it in just one location. But in this case, we will likely change the constant value, not realize that there is a hardcoded length elsewhere in the code, and fail miserably with hard to explain behavior.
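The hazard can be sketched as follows. This is a hypothetical reconstruction based only on the description above (a length check against 3 next to a comparison with the constant), not the code from the actual PR. The `Constants` stand-in and the `MetadataChecks` names are assumptions.

```csharp
using System;

// Stand-in for the real constants class; only the value "@id" comes from the post.
static class Constants
{
    public static class Documents
    {
        public static class Metadata
        {
            public const string Id = "@id";
        }
    }
}

static class MetadataChecks
{
    // Fragile: the literal 3 silently duplicates the length of the constant's value.
    // If Id is ever changed to a longer or shorter string, this always returns false.
    public static bool IsIdFragile(string property) =>
        property.Length == 3 && property == Constants.Documents.Metadata.Id;

    // Less fragile: the length is derived from the constant, so the two cannot drift apart.
    public static bool IsIdDerivedLength(string property) =>
        property.Length == Constants.Documents.Metadata.Id.Length
        && property == Constants.Documents.Metadata.Id;

    // Simplest: let the string comparison do its own length check.
    public static bool IsId(string property) =>
        property == Constants.Documents.Metadata.Id;
}
```

With the current value all three methods agree. Only the first one breaks, silently, when someone edits the constant without knowing about the hardcoded 3.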
More posts in "PR Review" series:
- (19 Dec 2017) The simple stuff will trip you
- (08 Nov 2017) Encapsulation stops at the assembly boundary
- (25 Oct 2017) It’s the error handling, again
- (23 Oct 2017) Beware the things you can’t see
- (20 Oct 2017) Code has cost, justify it
- (10 Aug 2017) Errors, errors and more errors
- (21 Jul 2017) Is your error handling required?
- (23 Jun 2017) avoid too many parameters
- (21 Jun 2017) the errors should be nurtured
Comments
I hope that sample of non-idiomatic code is the result of a decompilation, because there is no valid reason not to use some constants there instead of magic numbers.
Calling Equals is a code smell: the equality operator is better recognized than the Equals method by both compilers and humans.
Which is another way of saying the most idiomatic way is: property == Constants.Documents.Metadata.Id
Note that changing the order of comparison (constant equals property) is a regression in readability and maintainability, with no upside.
The length check saves the call into native code and the other stuff that is before the length check there. Seems like a valid optimization to me in the spirit of your own optimization.
Maybe prop.Length == MyStringConst.Length would be optimized by the compiler or JIT to the same code, not sure. That would eliminate the redundancy.
@tobi there isn't a call into native code:
System.String.Equals (both virtual and strong-typed)
Indeed this complication feels like an optimisation 'in the spirit of' one or another trick. That is why every such contraption should come with a safety check: does it help? is it justified? And by default the answer is NO, don't do it, write simpler naive code.
@tobi What @Oleg said is the correct approach for that. That type of code really needs to be measured, and there is a very strict guideline on when you can use it. In general the response will always be: don't. Because of how the compiler treats constants, the code emitted by the JIT will be:
cmp [reg], 3
So it will be treated like a constant assembler wise. But given all of the preconditions are met, such an optimization is worthless.