Ayende @ Rahien

It's a girl

Unprofessional code, take II–Answer

Yesterday, I called this code unprofessional, and it is.

The reason that I call this code unprofessional is that it is a black box. Oh, sure, this is a great code in terms of readability and understand what is going on. But it is also a very complex piece of code, with a lot of things that are going on and it is in a critical location.

This sort of code cannot be a black box. Case in point, take a look at this email thread. We have a customer who is trying to understand why dynamic indexes are being created, when he has the proper indexes to answer query.

And I had to ask a lot of questions to figure out what the problem was. Turned out, it is by design issue, but we could fix that Smile.

That is not a good place to be. I want my customers to be able to figure those things on their own.

That is why the code now looks like this:

image

And you can ask RavenDB to explain how he got to a specific decision.

For example, when we want to figure out why a specific index was selected (or, more likely, was not selected), we can issue the following request:

http://localhost:8080/databases/HibernatingRhinos.Orders/indexes/dynamic/Orders?query=OrderNumber:1293182&explain=true

Which will give us the following output:

[
   {
      "Index":"Auto/Orders/ByContractIdAndPayments_PaymentIdentifier",
      "Reason":"Can't choose an index with a different number of from clauses / SelectMany, will affect queries like Count()."
   },
{ "Index":"Freebies/Search", "Reason":"Index does not apply to entity name: Orders" }, { "Index":"Orders/Search", "Reason":"The following fields are missing: " }, { "Index":"Orders/Stats", "Reason":"Can't choose a map/reduce index for dynamic queries." }, { "Index":"Products/Stats", "Reason":"Can't choose a map/reduce index for dynamic queries." }, { "Index":"Raven/DocumentsByEntityName", "Reason":"Query is specific for entity name, but the index searches across all of them, may result in a different type being returned." }, { "Index":"Test", "Reason":"Index does not apply to entity name: Orders" }, { "Index":"Trials/Search", "Reason":"Index does not apply to entity name: Orders" }, { "Index":"Auto/Orders/ByOrderNumber", "Reason":"Selected as best match" } ]

As you can see, that is a pretty good way to debug things, in production, without any complex steps or dedicated tools.

And, like in the previous example, reducing support costs is important.

Comments

Hans
07/16/2012 01:37 PM by
Hans

Ayende, although the solution is nice indeed as it can really help the user I'm having a bit of a problem with the following statement:

Oh, sure, this is a great code in terms of readability and understand what is going on.

While I think it's certainly not bad, I don't think you can say such a thing based on your own code in an article like this. Of course your own code looks very readable to yourself that is not the point of making something readable, the real challenge is to make something readable to someone new to the system.

As it goes on for 120 lines, why not extract submethods with really clear names. This can really help if you're not familiar with the system and can do a lot of the inline documenting you have inside the code now.

No, I'm not saying it is hard to read but be open to other people's opinions before making a statement about your own code!

Pete
07/16/2012 03:27 PM by
Pete

I've seen these kind of extensive "if..return" blocks in many pieces of code that I've worked with and I've always thought that the recommended way to do it was as a list of strategies that need to be applied over the list of inputs.

Each strategy encapsulate the individual IF and return (and therefore the typeof(Strategy).Name can translated to a particular error string). Each strategy is easier to test as a unit and the other code is also easier to test as a unit.

Am I overcomplicating things?

Slava
07/16/2012 03:41 PM by
Slava

If this is not a high level glue code binding some workflow logic to underlying infrastructure parts, then i kinda lost :) Im sure there are more to an eye here, but in general when I see big IFs my reasoning goes as follow:

Big IF statement stinks of coupling and total lack of composability. Not verifiable in any way except tests, and even then it depends not on the static types but on equality of variables, error here would be runtime. (goes with big try/catch usually) Not documentable, need to write down whole logic in some way simmilar to the comments in the code you odne above (types and composition on other hand allows you to see the flow of data and state of execution)

Well, at the end if you feel comfortable with maintaining it in future and customers are happy - thats all what matters.

Trystan
07/16/2012 07:49 PM by
Trystan

I think it's interesting that your definition of professional is not about SOLID code, infrastructure, or any other technical issues. Professional means that you, or the support staff, can easily see what the system is doing in production and why.

Ariel
07/17/2012 03:27 AM by
Ariel

Well, this is interesting. I think the key here is what the code does and how it could help you to troubleshoot. Having an strategy sounds good, but having the strategy without the explanation has the same issue. Just my two cents.

Martin D
07/17/2012 05:03 AM by
Martin D

If anyone is interested you can find the full code of that expression here: http://pastebin.com/naXmKFv5

In my opinion this code is unprofessional because it's extremely complex - I haven't done the numbers but the cyclomatic complexity of this piece of code would be way over what many would consider acceptable in a production system. I highly doubt it has been thoroughly tested.

Brad
07/17/2012 05:12 AM by
Brad

Which part of that code fits this sentence? "Oh, sure, this is a great code in terms of readability and understand what is going on."

Ayende Rahien
07/17/2012 08:39 AM by
Ayende Rahien

Hans, A lot of the comments in the previous post also talked about this, but they are missing something quite important. We need to be able to see this. Breaking it apart to multiple methods is going to make it harder to see the global picture and will scatter logic in many methods. Sure, you can use naming to help, but in here, it is much easier to scan and see the entire behavior.

Ayende Rahien
07/17/2012 08:41 AM by
Ayende Rahien

Pete, That really depend. In this case, this is perf critical code path (it runs, sometimes multiple times) for every query that we do. Abstracting this could cause this to run slower, which is not acceptable.

Paul
07/17/2012 01:40 PM by
Paul

This code is more useful, but smell remains or is even bigger. Calling explain (which obviously has side effects) inside "where", points out that this piece of code is really not a query, but standard imperative logic. Morever if the code is performance critical, as you say, then switching LINQ to for loop will buy you a little more speed.

Slava
07/17/2012 04:29 PM by
Slava

Paul, IMO If code performance would be critical you would use async reactive chain to react on each index queried. Right now this code is similar to a lock (won’t return until all indexes queried), hopefully it’s not global lock.

Claim that putting all query logic into one method makes its more readable is biased on assumptions such as: user to know context of execution, knowing what all variables and their values can be and so on.

I highly appreciate Ayende blog posts, I think misunderstanding here steams from Ayende tending to write his code as an API for a customer to understand inner workings of the system and to debug it in live mode, and emphasis of "professional" goes on this points.

Ayende Rahien
07/18/2012 06:10 AM by
Ayende Rahien

Slava, This is a purely sync function, we need to value before we can do anything else. This is highly internal piece of code, if you got there, you have to be aware of the context of execution.

And there is a difference between external API / visible code and internal implementation details. For client visible API, there are different considerations than internal workings. The first is all about ease of use and discovery. The second is all about making sure that it works right, works fast and doesn't cause us issues in production.

Comments have been closed on this topic.