Ayende @ Rahien

It's a girl

The importance of comments

A while ago I got a comment to a post of mine “if you [ in here the commenter is talking to another commenter ] think good code does need comments you are obviously unqualified to be in this conversation”. And I wanted to demonstrate that this is a very false statement.

Let us look at the following piece of code.

   1: private long GetNewLength(long current)
   2: {
   3:     DateTime now = DateTime.UtcNow;
   4:     TimeSpan timeSinceLastIncrease = (now - _lastIncrease);
   5:     if (timeSinceLastIncrease.TotalSeconds < 30)
   6:     {
   7:         _increaseSize = Math.Min(_increaseSize * 2, MaxIncreaseSize);
   8:     }
   9:     else if (timeSinceLastIncrease.TotalMinutes > 2)
  10:     {
  11:         _increaseSize = Math.Max(MinIncreaseSize, _increaseSize / 2);
  12:     }
  13:     _lastIncrease = DateTime.UtcNow;
  14:     current = Math.Max(current, 256 * PageSize);
  15:     var actualIncrease = Math.Min(_increaseSize, current / 4);
  16:     return current + actualIncrease;
  17: }

This piece of code is responsible for decided the new file size when we run out of room in the current space we use.

That is 10 lines of code. And what happens is quite clear, even if I say so myself. But the why for this code is lost. In particular, there is a lot of reasoning behind the actual logic here. You can see that we play with the actual increase size, to make sure that if we increase often, we will reserve more space from the OS. That seems clear enough, but what about lines 14 – 15?

In this case, just before those two lines we have:

   1: // At any rate, we won't do an increase by over 25% of current size, to prevent huge empty spaces
   2: // and the first size we allocate is 256 pages (1MB)
   3: // 
   4: // The reasoning behind this is that we want to make sure that we increase in size very slowly at first
   5: // because users tend to be sensitive to a lot of "wasted" space. 
   6: // We also consider the fact that small increases in small files would probably result in cheaper costs, and as
   7: // the file size increases, we will reserve more & more from the OS.
   8: // This also plays avoids "I added 300 records and the file size is 64MB" problems that occur when we are too
   9: // eager to reserve space
  10:  

And yes, this isn’t about comments such as // increment by 1. I try writing comments like that when explaining policy or heuristics. Or when I am doing something pretty funky all around for some (hopefully) good reason.

Another example might be this:

   1: // here we try to optimize the amount of work we do, we will only 
   2: // flush the actual dirty pages, and we will do so in sequential order
   3: // ideally, this will save the OS the trouble of actually having to flush the 
   4: // entire range
   5: long start = sortedPagesToFlush[0];
   6: long count = 1;
   7: for (int i = 1; i < sortedPagesToFlush.Count; i++)
   8: {
   9:     var difference = sortedPagesToFlush[i] - sortedPagesToFlush[i - 1];
  10:     // if the difference between them is not _too_ big, we will just merge it into a single call
  11:     // we are trying to minimize both the size of the range that we flush AND the number of times
  12:     // we call flush, so we need to balance those needs.
  13:     if (difference < 64)
  14:     {
  15:         count += difference;
  16:         continue;
  17:     }
  18:     FlushPages(start, count);
  19:     start = sortedPagesToFlush[i];
  20:     count = 1;
  21: }
  22: FlushPages(start, count);

Again, relatively short code. And really easy to follow. But good luck trying to figure out that this code is responsible for a 100% perf boost, or that you need to balance multiple aspects to get an optimal solution without getting comments.

Now, I think that when people talk about “good code doesn’t need comments”, they think about code like this (all samples taken from a popular OSS project):

   1: var query = _backInStockSubscriptionRepository.Table;
   2: //customer
   3: query = query.Where(biss => biss.CustomerId == customerId);
   4: //store
   5: if (storeId > 0)
   6:     query = query.Where(biss => biss.StoreId == storeId);
   7: //product
   8: query = query.Where(biss => !biss.Product.Deleted);
   9: query = query.OrderByDescending(biss => biss.CreatedOnUtc);

Yes, I can read, thank you. And here is an example of two comments, the first one good, and the second bad:

   1: if (_commonSettings.UseStoredProceduresIfSupported && _dataProvider.StoredProceduredSupported)
   2: {
   3:     //stored procedures are enabled and supported by the database. 
   4:     //It's much faster than the LINQ implementation below 
   5:  
   6:     #region Use stored procedure
   7:     
   8:     //pass category identifiers as comma-delimited string
   9:     string commaSeparatedCategoryIds = "";
  10:     if (categoryIds != null)
  11:     {
  12:         for (int i = 0; i < categoryIds.Count; i++)

The first comment explain why. And that is crucial. You should only need to explain the how very rarely, and then, you need a comment to justify why you need the comment. (Performance, maybe?)

Comments

tobi
08/29/2013 09:40 AM by
tobi

I wish this blog was read by my work colleagues. They feel compelled to add the most useless XML docs to all members, but document nothing else. A property FirstName does not need to be documented as "The person's first name.".

Xing Yang
08/29/2013 09:44 AM by
Xing Yang

In the first example, parts of the comment can be replaced by two extra variables:

long firstAllocateSize = 256 * PageSize long maxAllowedIncrease = current / 4

And other parts can be expressed in test case name:

[Test] public void Sizeincrementshouldbelimitedsomycustomerwont_scream()

Comment can be good sometimes, but better when it's organic part of the code.

Xing Yang
08/29/2013 09:55 AM by
Xing Yang

For the second example, maybe a batchSize variable will do the same.

Ayende Rahien
08/29/2013 09:56 AM by
Ayende Rahien

Xing, Tests aren't located near the code, they cannot explain reasoninig.

Arnis Lapsa
08/29/2013 10:48 AM by
Arnis Lapsa

This applies when code is in more simple form without 'why' than it would be when 'why' inclusion were forced with abstractions that do just that.

Single responsibility principle plays important role in this.

Rob Lyndon
08/29/2013 11:37 AM by
Rob Lyndon

I think the days of

// Make a cup of tea x.MakeACupOfTea();

are numbered, if not entirely gone. Few developers who are committed to learning would advocate that kind of verbosity. In a similar vein, I'm seeing less and less of this sort of thing.

// Some lame smart-alec remark x.MakeACupOfTea();

Top tip: You're not as funny as you think you are.

However, I have had at least one manager who maintained that ALL comments were evil, and should be expunged from the codebase. In some ways, I can understand where he's coming from. In this day and age, it is perfectly possible to make a career out of writing bog-standard LOB e-commerce apps to sell cheese graters online. Fire up your .NET MVC4 template; write a few views, controllers and API calls; steer clear of innovation; use standard, time-tested patterns everywhere. If you need to explain yourself, you're doing too much. Ultimately, that's a bit of a soul-destroying way of working, and a waste of your own talent. But it pays the bills, and for some people, that is the only consideration worth your time.

Carlos Ble
08/29/2013 11:44 AM by
Carlos Ble

Although the comments for your first snippet make sense, as they explain the rationale behind the algorithm, I think the code can be rewritten to be more readable:

1: private long NewLength(long currentSize) 2: { 3: increaseSize = calculateIncreaseSizeBasedOnLengthChangeFrequency(); 4: currentSize = atLeastTheMininumNumberOfPagesConfigured(currentSize); 5: var actualIncrease = recalculateIncreaseAlwaysKeepingItBelow25percent(increaseSize, current); 6: return currentSize + actualIncrease; 7: }

Or:

1: private long NewLength(long currentSize) 2: { 3: return currentSize + recalculateIncreaseAlwaysKeepingItBelow25percent( 4: increaseBasedOnLengthChangeFrequency(), 5: atLeastTheMininumNumberOfPagesConfigured(currentSize)); 6: }

I have removed the underscore from the instance members on purpose. I don't think it helps at all. I also have removed the "Get" from the method, it does not bring any value to me.

In general, comments are necessary when they add context to the code, when that context can't be inferred from the code itself. Usually referring to where the code is plugged, how is it consumed. Complex algorithms certainly deserve comments.

Thanks :-)

Rafal
08/29/2013 11:59 AM by
Rafal

what about:

var increase = lastUpdateTime < 2 minutes ? 20% : 10% this doesn't require comments and probably works just as well

Joshka
08/29/2013 12:29 PM by
Joshka

The comment about the first increase being 1MB seems to be a lie. The first allocated size by this code is at least 1MB + MinIncreaseSize.

Gene Hughson
08/29/2013 12:56 PM by
Gene Hughson

Absolutely agree.

Both the 'throw in a bunch of inane comments' and 'all comments are teh evil' camp seem to come down to preferring absolutes over thought. The right measure, IMHO, is value, not slogan conformance.

Rasmus Schultz
08/29/2013 01:15 PM by
Rasmus Schultz

@Carlos: I'm not sure sure if you're being facetious or not :-)

Using names like "recalculateIncreaseAlwaysKeepingItBelow25percent" instead of "recalculateIncrease" communicates both what you're doing and how you're doing it - but it still doesn't communicate why. When there's a longer explanation behind the "why", I don't think you could cram all of that into a method-name even if you wanted to. And when a method-name communicates how it's doing something, you add a new burden of renaming the method every time the "how" changes.

One way to make code more self-documenting, is to avoid using literal values - I use lots of constants to make methods more self-documenting, for example, EXPIRATION_TIME when seen in an expression computing an expiration date, is more semantic than for example the literal expression 606024.

Most people use constants for values they need to use more than once - try using them simply to give name and meaning to certain values or expressions. The code shown in this post could use some of that.

joshka
08/29/2013 01:15 PM by
joshka

Second example (pseudo-ish code)

foreach (var pageRange in pages.Sort().PagesRangesOptimalToFlushInASingleCall())
{
    pageRange.Flush();
}

...

Fourth example:

IWhateverStrategy FastestWhateverStrategyAvailable()
{
    if (CanUseStoredProcedures())
        return new StoredProcedureWhateverStrategy();
    else
        return new LinqWhateverStrategy();
}

...

Jeff
08/29/2013 01:23 PM by
Jeff

I tend to lean towards the "you really shouldn't need comments" camp (I'm not saying it's an absolute, but I don't know that I've seen a really good example of 'need'). This blog expresses a common argument in support of comments. As Xing and Carlos have alluded, when you find yourself agreeing that there should be a comment to explain the 'why', it's typically a case where the code violates a SOLID principle or exercise poor naming choices.

I would aver that the examples above suffer from both of those mishaps.

Rob Lyndon
08/29/2013 01:47 PM by
Rob Lyndon

@Jeff, I really don't understand why you'd say that when the comment explains precisely what user experience the strategy is addressing, and the attitudes of the users, based on their feedback over several years. Do you really think it's better to try to embed that information within variable names and method names?

You could say that such information belongs in external documentation, but I'd say it's at its most useful when it's right there next to the place it's being applied.

I don't see how the code snippet violates any kind of SOLID Principle: certainly not the Single Responsibilty Principle. The method is called GetNewLength(); it does what it says it does. You can inject an external strategy if you want, but unless other strategies are used, or that strategy is used elsewhere, that's just code bloat for the sake of dogma.

Carlos Ble
08/29/2013 02:48 PM by
Carlos Ble

@Rasmus, not am not trying to be facetious. I just wanted to leave the code better than I found it. Obviously it is not the perfect code, it can be improved. Code can always be improved. One improvement could be this:

recalculateIncrease(increaseSize, current).alwaysKeepingItBelow(MAX_INCREASE).percent()

Usually method names should not reveal how they are implemented, IF that "how" is not significant. IF that "how" can be extended. In the case of concrete algorithms though, the "how" can't be changed, it is significant and it can be part of the method name if necessary. SRP and OCP always depend on the nature of the problem, the domain. Given that I don't really know the context of this particular problem and don't care about the algorithm, I can't really tell if there are several reasons to change or not, or whether it is open to extension enough.

The "why" is usually part of the context, which is what I sometimes need to explain with comments.

Cheers :-)

Jiggaboo
08/29/2013 03:13 PM by
Jiggaboo

That comment was from funny guy named Howard Chu to me. I wonder what he says now. He will probably tell you to get any computer programming course Ayende. Just like he did last time.

Stacey
08/29/2013 05:12 PM by
Stacey

I comment a lot, and much of it is very verbose and unnecessary - but I am still in that rookie "fresh out of school solo developer" phase of my career, and I think I could do a lot worse than just having foolish comment pattern.

Much of it is just my OCD demanding things look nice and even on the page. It seems silly, but ... it is what it is. I haven't found it to hurt my development time, at least.

Piotr Perak
08/29/2013 05:14 PM by
Piotr Perak

My thoughts exactly.

Jiggaboo
08/29/2013 05:49 PM by
Jiggaboo

I think I don't understand first 2 sentences. It' either my english or yours.

“if you think good code does need comments you are obviously unqualified to be in this conversation”. And I wanted to demonstrate that this is a very false statement.

What is very false statement? The only statement I see is from Howard Chu. But now I think that you think about "good code doesn't need comments". Please someone with better english skills correct me but I think it isn't clear. That's why I posted my first comment above.

So I wanted to make it clear. I said exactly "Good code doesn't need comments most of the time.". Notice "most of the time". The time comments are needed is what you show in your examples. That's exactly what I meant. In everyday I see things like // increment x, // call webservice, or my favourite // default constructor :)

Howard Chu
08/29/2013 05:50 PM by
Howard Chu

Actually Ayende was quoting Daniel Moreira Yokoyama. This entire post completely agrees with what I said.

Ayende Rahien
08/29/2013 05:55 PM by
Ayende Rahien

Carlos, How do you do the reasoning behind it? Size vs responsiveness vs angry emails in the mailing list?

Carlos Ble
08/29/2013 06:14 PM by
Carlos Ble

Sorry @Ayende, I am not sure what are you asking. I am not angry at all :-) I am not worrying about size or responsiveness, just code clarity. I like code that can be read like prose, that I can understand without much thinking. Just reading like a book. Sorry if I am looking angry, might be my English :-s

peter
08/29/2013 06:57 PM by
peter

many times I pseudo-code up some logic, and then fill in the code between the comments. They don't come out until the base is stable.

Perhaps the oss comments you found are similar.

joshka
08/30/2013 12:49 AM by
joshka

Another counterpoint. Comments support developers reading the code in future times. Code naming etc. support every developer that uses that code from a point external to that method. For me, I think the latter is the more useful optimisation.

David Gudeman
08/30/2013 02:23 AM by
David Gudeman

In any complex program, comments should be used to help the reader understand the model, if nothing else. Sure, you can use meaningful names like "getPageSize", but that doesn't help if the reader doesn't know what a page is in your program. "setUpdateFlag" is a nice name, but if the reader doesn't know which of several possible kinds of changes you consider an "update", it isn't very helpful. As a general rule, describing the underlying model and concepts of the program takes more words than you can reasonably fit into a variable name. Once the reader understands your model, only then do the names start being really helpful.

It would be like starting a work of fiction with ambiguous references: "The Regulator dropped his cloak and stepped forward into the breach." What's a "Regulator"? A title for some kind of court officer? A robot? What's a "cloak"? A piece of clothing? Some form of magical invisibility? What's the "breach"? A breach in a battle line? A breach in a ship's hull? And is he stepping into the breach to seal the breach or to exploit the breach?

In fiction, that kind of uncertainty is fine because the point is to hook the reader and reveal things later. In programming, your responsibility is to make things as clear as possible with as little effort as possible on the part of the reader. So early explanations of the overall model are important.

Tudor
08/30/2013 09:35 AM by
Tudor

There is another good reason for code comments - until your code is less-than-perfect and not yet self-explanatory, do add comments to compensate for this - do not avoid comments in the hope that someday you will refactor the code to be perfect.

We live in an imperfect world, and not all developers write self-explanatory code that "reads like prose" from start - do not use this ideal as an excuse to deliver code without any comments that only you will understand.

First make sure that your source control system is backed-up regularly and that you know how to restore it and upgrade it to a new version, before deciding that the commit comments can replace comments in the source code telling why a change was made.

Phil
08/31/2013 04:46 PM by
Phil

I advocate commenting what the intent of the method/class is. That is what it supposed to do. I can read the code to see what it does, but is that what it is supposed to do? Obviously, there should be unit tests to validate what the code is supposed to do.

I am not saying there should a huge prose of comments added to you code. In .NET, adding xml doc comments to indicate what exceptions are thrown in what cases is important. Of course the developer can read the code to determine which exceptions are thrown, but they should'nt have to. This be comes more of a problem when using 3rd party libraries via NuGet. Is the developer expected to read through the Entity Framework/Json.NET/Unity/StructureMap/... source code to determine what errors they should avoid? Of course not.

As usual, great post Ayende.

Comments have been closed on this topic.