Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,592
|
Comments: 51,223
Privacy Policy · Terms
filter by tags archive
time to read 8 min | 1418 words

Both Analtoly and Oren Ellenbogen  had commented on my commenting post (pun intended). The example we are using right now is this piece of code:

outputs = new EntitySet<IConnection>(
      delegate(IConnection connection) { Raise(OutputAdded, connection); },
      delegate(IConnection connection) { Raise(OutputRemoved, connection); }
      );

I argue that this piece of code should stand on its own, without any comments to explain what is going here. I think at least part of the issue here is that I am focusing entirely on the comments on the code and not XML Comments. Here is the code for the EntitySet, the class start with a big XML comment that start with general explanation on the class and then goes on to some detail with regard to some detailed implementation that has to do with NHibernate.

To quote the first part of the comments:

/// This class handle strong typing collections that you get from
/// NHibernate.
/// It also handles adding/removing items to the other end (by allowing
/// you to specify delegates to do it).

And the constructor that I'm calling about is defined as:

public EntitySet(Action<T> add, Action<T> remove)

Putting this two together, I'm comportable with not commenting what is going on in the output declaration. It should be obvious by reading the xml comment on EntitySet<T> and seeing the constructor overload I'm using.

With regard to commenting on a library code, Anatoly said:

When you call the library function, the comment what the function does should be in the header of the function. When you change implementation you change the comment. When you call the library function (even in several places) need to say what the line of code does (calling x expecting y) here it may be very light. But still need to comment.

If I need to comment on the callsite for a method, that is a sign for bad design on my code. The possible exceptions is I'm doing something non standard there (like expecting spesific result because the conditions are just right) and if I'm doing it more than once, I need to refactor to remove the comments.

The case above is a classic one, I think. If you don't understand what is going on there, you really don't understand what is going here. It's not a question of not understanding some side affects. In this case it's whatever you know what the library does or not.

Ayende: You seem to say that the code should be optimized for someone completely ignorant about the project, just seating down and understanding what is going on.

Anayoly: Something like this. My point is that clear and self-descriptive code is a must but it does not mean that you don't need comments.

This is the part I disagree with most strongly, I think. Clear code doesn't need call site comments. They are a chore to write when you write the code, they are noise when you read the code. On heavily commented code, my eyes gets into 'ads-skipping' mode, and I literally don't see them more than green blurs on the screen.

I tend to think that a big enough project require some rumping up time before you understand what is going on. Anatoly says that it is possible to have a new guy on a project and starting to contribute useful features in a very short times (minutes to hours).

I've seen accessible projects. You feel the difference right away.

Could you explain what made it accessible? Was it that you were familiar with the technology and the development methods that were used? Was it good documentation? How much time did you spend reading the comments in the code in order to learn what is going on there. I said in the comment that I believe that it is possible only on "common ground" projects, where the architecture is mostly the same.

Moving to Oren's comment about this now. The first suggestion that you had (refactor creating the outputs into a method that would have a comment there) is something that I would generally prefer to do if the code needed explaining. In this case, do you think that the comments on the class and the naming of the parameters are enough?

It seems to me (I'm guilty at this as well) that you fall in love with complex code but you forgot that someone else would have to maintain your code somewhere in the future.

Yes, and no. I sometimes does complex code for the fun of it, but I don't intend to take this code to production. There is some elegance for complex code that works. But if I look at it afterward and need more than a minute to understand what i did there, it's useless, in my eyes. This is part of the reason I like peer programming. You can't do stuff too complicated without sounding it out first, and then the other guy will likely offer a better / simpler way to do this.

I just had a case where i wrote a method that passed a test, but when I went back to the method and looked at it, I had no idea what was going there. I ended up breaking the method to several smaller methods, so I got something like:

AssertNoCycles();
List<Node> visited = new List<Node>();
foreach(Node node in graph.Nodes)
{
    DoWork(node, visited);
}
AssertAllNodeWereVisited(visited)

Imagine that each of those methods in inlined, and you'll start to see the problem, I guess. My first instinct is to break the code to manageble pieces rather than explain what I did there.

I liked Oren's second suggestion, which is to have a certral repository of "Required Reading" before you can work on the project. I'm not talking about 1,600 design guides. I'm talking about much simpler stuff like:

This project uses the following libraries, and here are the documentation. You probably need to check the following three links to get up to speed with what they do, and we are using them to do this and that (links to everything).

This allows you to assume a common base for the people that would work on the project.

Election Day

time to read 1 min | 97 words

I thought that I would do the world a service and inform them who I'm going to be voting for:

 

 

Overall, I find that they gave this company has the highest standards and the most consistent plans for the future. I also liked the fact that they bribed me to vote for them and that their campaign remained aloof from childish attacks on the competitors.

time to read 8 min | 1404 words

Anatoly Lubarsky has commented on my previous post about commenting:

Sorry for this but you are totally wrong.
I don't want to search or read documentation or debug to understand your code.
It is totally useless work until you write for yourself only.

I also don't like this code and don't think it is descriptive.

I started to reply, but it turned to be pretty long and pretty important, so I'm posting it here.

Codebases do not exist in isolation. Explaining what is happening in the code because it's not doing the common thing get repetitive and annoying very quickly. To take few examples:

EntitySet<T> and its associated actions are not simple to grasp, but the moment you grasp them, there isn't any point to comments, it is just as useless as the "disable the button" one. Take a look at the code here. This is not commented, and is not likely to be. The moment you understand that EntitySet will call the delegate methods on add and removal, it should be clear what is going here. In this case, the collection will raise events in the parent class with very little coding on our part. Could this use a comment for somebody other than me?

Well, this did deserve its own post when I did it, so I guess that this means that I admit that it is not something trivial (not that I don't post about trivialities), but the other approach means that the code will look like:

// This will raise the OutputAdded event when a connection is added to the outputs
// and OutputRemoved event when a connection is removed from the outputs.
outputs = new EntitySet<IConnection>(
      delegate(IConnection connection) { Raise(OutputAdded, connection); },
      delegate(IConnection connection) { Raise(OutputRemoved, connection); }
      );


// This will raise the InputAdded event when a connection is added to the outputs
// and InputRemoved event when a connection is removed from the inputs.
inputs = new EntitySet<IConnection>(
      delegate(IConnection connection) { Raise(InputAdded, connection); },
      delegate(IConnection connection) { Raise(InputRemoved, connection); }
      );

Take a look at this code, it explains what is going on here, but is it useful to you? The first time, maybe. But the second time? And the third? And the 100th? This code has a comment bug in it, by the way. Can you find it?

Here is another example, this time it is from NHibernate Query Analyzer and it has to do with executing a database query in a background thread. This is complex stuff, I am sure that you would agree:

ICommand execQuery = new ExecuteQueryCommand(view, mainPresenter.CurrentProject, view.HqlQueryText, typedParameters);
mainPresenter.EnqueueCommand(execQuery);

Here is the commented version of this code:

// This command will be executed in a background thread. ExecuteQuery will take care of syncronizing the updates to UI and notifing it 
// it when it was done or it failed.
ICommand execQuery = new ExecuteQueryCommand(view,  mainPresenter.CurrentProject, view.HqlQueryText, typedParameters)

// Enqueues the command on the main presenter work items queue, which will be picked by the background thread 
// as soon as it is done with other stuff. This allows to keep a responsive UI in the face of long-running queries.
mainPresenter.EnqueueCommand(execQuery);                  

Now it also explain what is happening, and you know that the ExecuteQuery class will take care of handling returning the query to the UI. But I'm using code similar to this one for all my multi-threaded communication, do I really need to give basicaly the same comment over and over again? Do I need to do it just once? What if the other guy that is maintaning the code is looking at another location where I did this, but not where I left the comment?

Here is another example. Here Blog is an Active Record class with a lazy loading for Posts:

Blog blog;
using(new SessionScope())
{
   blog = Blog.Find(id);
}
foreach(Post p in blog.Posts)
{
   // do something
}

This is wrong. It will throw an exception when you try to access the Posts collection, but unless you are familiar with the way Active Record works (or you already go this exception), it will make not sense. Do I need to put the foreach inside the using and explain why? What about all the dozens of other places where I access lazy loaded collections (something that is certainly not obvious)?

Like I said, I don't believe that comments should be used to communicate the common stuff (even if it is not obvious). If you try to work with any code base of meaningful size, you either has to avoid doing anything of any complexity (which mean a lot of repetitiveness) or you are going to develop common idions (for this spesific codebase) for dealing with it. Not reading the documentation {best scenario} (or talking to other team members, or reading the tests and sweaping the code) when you are new to a project makes absolutely no sense unless you are confident that you can manage as you go along.

A project has common idioms for dealing with problems. In the context of the project, explaining what they do amounts to commenting "adding 1 to x" when you use: x++. When I work, I assume that the other people working on the project are familiar with the same idioms. Yes, it make it harder to understand the code in the start, but show me one significant project when someone can just come in, have a few minutes to look at the code and can start real work on it.

time to read 3 min | 499 words

Continuing on the TDD list, this message from Robert Hanson has the best description of the goals of XP that I've ever read.

In an excellent programming environment there is no fear.

  • The customer has told us exactly what he values most. I have no fear that I might be working on something he does not want.
  • The code I'm working on is test driven. I have no fear (or at least, very little fear) that the code might do something I don't want it to do.
  • All other code has unit tests; and I have continuous integration in place. I have no fear that my code will break something without me knowing it; and I have every confidence that if it does break, I can quickly find the problem and fix it, and verify that the fix works and does not break anything else.
  • I have source code control in place; if a change is really catastrophic, I just go back to yesterday's version and start over.
  • I have short iterations, so that I have no fear that if priorities or business needs change, I can adapt to the new requirements without fear.
  • I have a pair, which I change every few hours or at the start of each day. I don't worry that some key piece of knowledge is known by just one person.

And boy, do I wish I actually worked at a place like this.

I wish I worked there too. Check out the replies, they contain some great discussion.

time to read 5 min | 900 words

There is an interesting discussion about commenting code in the TDD mailing list. I think that we can agree that this comment is uselss:

//Disable the button
btnAdd.Enabled = false;

I don't like commenting in general, and I've been told that it makes my code harder to read and understand. In general I really hate to explain what is happening, even if the code itself is fairly complicated. To take an example, Rhino Mocks has some fairly complex code in it to handle nested method orderring. This is a piece that was very hard to get right (both from design perspective and from coding perspective). It has a single comment explaining the implication of the else case of an if statement.

In all of Rhino Mocks, there are 8 files (out of 56) that contains non (XML / documentation) comments. The comments I have are stuff like:

//I don't bother to check for RepeatableOption.Never because
//this is handled the method recorder
if (repeatableOption == RepeatableOption.Any)

(Where RepeatableOption is an enum with several states, out of which I'm checking only one.)

This one was tricky, though:

//because the expectation doesn't exist, an exception will be thrown
return repository.Replayer.GetRecordedExpectation(proxy, method, args);

Took me a moment to understand what was going there (I'm passing it to the replayer so it would format the appropriate exception message, instead of handling this myself) and it would need debugging to understand why this is needed.

One last one (this is in the part of the code that adds special handling for methods):

//Just to get an error if I make a mistake and try to add a normal
//method to the speical repeat method
Debug.Assert(expectation.RepeatableOption != RepeatableOption.Normal);

The rest of the comments are more in the same style, they are intent revealing and they are far and few between. I was actually surprised to find that I needed the comment to understand what was going on.

Those comments (and the overall thinking behind them) are not meant to explain what the code does, or even how / why it does it. They were written with the explicit assumstion that if you read them you know what you are doing in the codebase. They are there to explain implementation decisions, not how things work. If you need to know, it's not hard to figure out in most cases, since I try to use decriptive class and method names.

I would say that I take the same approach elsewhere in my code. Take the example I posted earlier today:

outputs = new EntitySet<IConnection>(
      delegate(IConnection connection) { Raise(OutputAdded, connection); },
      delegate(IConnection connection) { Raise(OutputRemoved, connection); }
      );

This code is using EntitySet<T> from NHibernate.Generics and anonymous delegates to raise event notifications in the class that create the outputs field. You can't understand what this code does unless you are aware of the things that EntitySet will do for you, or if you don't understand what anonymous delegates are.

But this code doesn't warrant a comment explaining its use, in my opinion. If you need to understand what anonymous delegates are, get a C# 2.0 book. If you need to know what is so special in EntitySet you can check the documentation (assuming you know to search for those :-)).

  The difference in my opinion is that the purpose of comments is to explain the uniqueness of a situation, nor how the code works, even if the code is complex / unusal.

time to read 3 min | 415 words

The part that I hate the most about a project of a type that I never done before is the start. The issue is that I produce a design that mostly works, but when new stuff need to be implemented, I often need to explore several approaches before I know what will work for this project, my last post in an example of the approaches I thought about for persisting object graphs.

The thing that really bothers me is that changing those approaches can lead to major changes in the code. This means that there are extended periods of time that I can't even build the code successfully (30 minutes to an hour, at times). This makes me very nervous, but I don't see how I can avoid doing this as I study the various ways to implement this functionality. In this case, the changes involved changing an interface that was common to many classes in the project, so that caused quite a bit of pain until I settle on the correct approach.

I was at the Israel User Group meeting today and talked with Justin who termed this as a "Design Bug". I like this term. The only encouraging part is that after the initial pain I get a stable base to work from. The bad part is that the changes are big enough that I can't really test drive them beyond very big tests (save the object graph and then load it, for example). Those type of tests are more integration oriented than developer oriented. They help me define the big goal, but not how I get there.

I guess that this is why XP has Spiking for unknown functionality. I'm pretty sure that I'm commiting horrendous crime for doing this with code that will go to production, but in this case, I don't have much implemented yet (only some interesting graph output functionality), so I feel that it is safe to do this.

One thing that I noticed happening is that I lose abstraction along the way. For instnace, when I tried XML Serialization, it couldn't handle interfaces, so I changed several interfaces to abstract classes. When I realized that XML Serialization would probably not work, I revert the changes. For now, I'm seeing this as removing premature abstraction on my part, and if I need this functionality later, it wouldn't be a problem.

time to read 3 min | 585 words

I need to save and load an object graph to a file.  A pretty routine task, but this time I thought I would dedicate a little thought to this. I need to support a flexible structure that allows saving objects that I don’t know about right now. I’m mostly free to develop the persistence layer any way I want it. My first thought was to try to use Active Record to persist it all to a SQLite database. But I need to support objects with inheritance that I don’t know about in advance. I suppose that I can solve this issue with Table Per Class approach, but I want to reserve this approach for later, mainly because this would mean generating the schema dynamically. I thought that XML Serialization would be easiest, but I run into problems with generics and serializations of interfaces, which I tend to use heavily.

Another issue that I run into is object identity. Here is the simplest example:

Before Serialization:

After Serialization (bad – disconnect in the graph):

  

U0 and U2 points to the same instance of U1

U0 and U2 points to different instances of U1

I couldn’t find a way to persist the first graph and get it back without doing some work that I would rather avoid. I have done some XML Serialization in the past, but I never done anything that cared about object identity. Using Active Record would solve this problem, but this brings me to the dynamic schema problem that I want to avoid if I can.

 

Right now I’m thinking about saving it in an executable format. The idea is to generate (using Code DOM) the actions that are needed to create the graph on save, and on load to compile and run the code to get the live object graph. For instance, take this graph:

 

 

This can be saved as this code:

 

Node u0 = new Node(),  u1 = new Node(), 2 = new Node();
u0.To(u1);
u0.To(u2);
u2.To(u1);
u1.To(u0);

 

There is code generation in this project anyway, so I’m not introducing something completely foreign. This is an intriguing concept for me, since it means that saving the file is quite complex, but the richness of the file format is unparallel.

time to read 2 min | 252 words

I needed to create a class with a collectiont that would raise events when items where added or removed to it. It was the work of a few minutes to add the events and then write this code:

outputs = new EntitySet<IConnection>(

      delegate(IConnection connection) { Raise(OutputAdded, connection); },

      delegate(IConnection connection) { Raise(OutputRemoved, connection); }

      );

Where Raise() looks like this:

private void Raise(EventHandler<ConnectionEventArgs> eventToRaise, IConnection connection)

{

      ConnectionEventArgs args = new ConnectionEventArgs(connection);

      if (eventToRaise != null)

            eventToRaise(this, args);

}

It starting too look like O/RM and frieds are the first tools that I pool out of the toolbox.

 

FUTURE POSTS

  1. Semantic image search in RavenDB - 3 days from now

There are posts all the way to Jul 28, 2025

RECENT SERIES

  1. RavenDB 7.1 (7):
    11 Jul 2025 - The Gen AI release
  2. Production postmorterm (2):
    11 Jun 2025 - The rookie server's untimely promotion
  3. Webinar (7):
    05 Jun 2025 - Think inside the database
  4. Recording (16):
    29 May 2025 - RavenDB's Upcoming Optimizations Deep Dive
  5. RavenDB News (2):
    02 May 2025 - May 2025
View all series

Syndication

Main feed ... ...
Comments feed   ... ...
}