Documentation can be ambiguous in the most insidious ways

time to read 8 min | 1474 words

Frans had a long comment to my last post, I started to reply in a comment, but it grew too big for that.

Let's use ObjectBuilder from the entlib as an example. Anyone who hasn't read the code or its non-existing dev docs, go read the code and the unittests, then come back here and explain in detail how it works inside and proof you're right.

xJust to answer that, I have read the OB source, and I have never bothered to look at whatever documentation that exists for it.
After reading the code and the tests, I was able to extend OB to support generic inferencing capabilities. (Given IRepository<T> whose implementor is NHibernateRepository<T>, when asked for IRepository<Customer>, return NHibernateRepository<Customer>). That is an advanced feature of IoC, added to a container that I wasn't familiar with, without affecting the rest of the functionality.

Oh, and while I can probably give a short description of how OB works, I am by no means an expert, nor can I really explain how OB works, but I was able to go in, understand the area that I wanted to modify, make a change that greatly benefit me, without breaking anything else. That is the value in maintainable code.

And this is for the people that thinks that I bash the P&P code. I made similar changes to ObjectBuilder and to Windsor, at around the same time frame. I had harder time dealing with the ObjectBuilder, but that is probably due to unfamilariarity with the project.

The simple fact that I was able to do a significant change without having to grok the entire ObjectBuilder says quite a but the quality of the code.

If you can reverse engineer that from unit-test code, well, good for you and I'm sure your boss will be very happy to hear that you won't create a single bug ever again

This is something that you have repeated several times, and I want to explicitly disagree with this statement: understanding the code doesn't mean no bugs, it means less bugs, for sure, but no zero bugs. Most bugs occur not because you misunderstand what the code does, but because of simple mistake (if instead of if not, for instance) or not considering this particular scenario (who would ever try to paste 2Mb of text here?). Understanding helps reduce it, but it can't eliminate it. I doubt you can make the claim that LLBLGen has no bugs.

What I find a little funny is that you apparently forget which kind of comments were placed inside the nhibernate sourcecode before it went v1.0: things like "// I have no idea what this does" or "// why is this done here?" or similar comments. Apparently, the people who ported the hibernate code over to .NET didn't understand how it worked by simply looking at the code AND with all the unittests in mind.

There are similar comments there right now, and they exists in the original Hibernate source code as well. I will freely admit that there are parts of NHibernate that I have no idea how they work. What I do know is that this lack of knowledge about the way some parts work has not hinder my ability to work with NHibernate or extend it.

By digging into sourcecode and understanding what it precisely does already takes a lot of time as you have to parse and interpret every line in the code and REMEMBER the state of the variables it touches! Can you do that in your head? I can't.

That is not something that I can do, due to this limitation, I am working in a way that ensures that I do not need to understand the entire system and the implications of each and every decision at any given one point. I consider this approach a best practice, because this means that I can work on a piece of code without having to deal with the implications of dozens other components being affected. Documentation wouldn't help here, unless I would have a paragraph per line of code, and keep them in sync at all times, and remember to read it at all times.

Add to that the wide range of decisions one has to make to build a system like that and with just the end-result in your hand it's a hell of a job to come to the level where you understand why things are done that way.

I disagree, I can determain intent from code and from tests, and I can change the code and see tests breaks if necessary. That is much faster way then trying to analyze the flow of each instruction in the application.

Typical example: saving an entity graph recursively in the right order. That's a heck of a complex pipeline, with several algorithms processing the data after eachother, and the code branches out to a lot of different subsystems.

If one can determine by JUST LOOKING AT THE CODE why it is designed the way it is, more power to him/her, but I definitely won't be able to do so. What I find surprising is that some people apparently think they can, with no more info than the end result of the code.

You wouldn't be able to understand that from the end result of the code, but having a test in place will allow you to walk through that in isolation, if needed, and understand what is going on. Here is a secret, I can usually understand what NHibernate is doing in such scenarios without looking at the code. Because the logic is fairly straight-forward to understand (but not to implement). I taught myself NHibernate by building the NHibernate Query Analyzer, no documentation, very little help from other people at the time, but a lot of going through the code and grokking the way it works.

What I find even more surprising is that it apparently is a GOOD thing that there's no documentation.

No, what I am saying is that I would rather have a good code with unit tests than code that has extensive documentation. Again, I am not documentation at the right level, high level architecture, broad implementation notes, build scripts documentation. To go further from that seems to be to get to the point of diminishing returns.

It apparently is EASIER to read code, interpret every line, remember every variable's state it touches, follow call graphs all over the place, write down pre-/post- conditions along the way

That is the part that I think we disagree about, I don't need to do that.

Perhaps they're payed by the hour

Around 90% of the time that I spent on NHibernate is time that goes out of my own free time, not paid for anyone. You can rest assure that I care a lot about not wasting my own time. This is the real world, and good, unit tested, code is maintainable, proved by the fact that people goes in and make safe changes to it, without having to load all the premutations of the system to your head.

Tweet Share Share 19 comments

Comments

19 Jun 2007
12:11 PM

Michael Hawksworth

I love this argument.

It reflects two distinct camps of what you anticipate your developers to be capable of and what their needs are (PHD anyone?).

This need to document everything to the nth degree is a feature of large software houses and the management that they create where part of the documentation is designed to allow the lowest denominator plug-n-play developer team to be used to work on a project and also to give plenty of paperwork for the covering of backsides. While there can be something said for strong design work for large projects (requirements planning, better utilisation of resourses etc.) is this taken too far?

With modern(?) tools inline documentation should take card of the low level stuff in a far more understandable context than a 200 page description of procedures ever will.

As you (Oren) appear to have a different world model to the big software houses it is unlikely you will ever agree with the Frans of this world. Perhaps Frans is trying to secure all those 'technical writer' jobs?

19 Jun 2007
16:49 PM

Mats Helander

Frans,

Your example with saving an entity graph is excellent. During the first iterations of NPersist, I put the statements InsertNewObjects(); SaveDirtyObjects(); RemoveDeletedObjects(); in every possible combination, always in response to some new bug having generated a deeper understanding but also always introducing ever more subtle bugs. I contend one can not deduce exactly why those statements have to be in the order above just by looking at them. Without some doc (there is, of course, none :-P), it might be tempting for someone (as it was for me) to try and solve some bug by changing the order. Now they would run into the nunit tests, but chances are those rather obscure tests (catching weird bugs that only happen if someone fiddles with the order of those statements) had not been written if I had not fiddled with that order myself. Case in point: Had I been able to forsee all those bugs and designed the code right in the first place, those tests would probably not have been there and so without some doc there would have been nothing to save the dev who changed that order.

Ayende,

There is a difference between extending a system/framework and actually changing the code in it. Frans has, as far as I can see, been talking about the case where existing code may have to be changed.

If a framework is well written, extension should be easy and safe, even without docs. In fact, if the API is lucid, reading it should often suffice. Rewriting parts of the framework, now that is a different matter. As in Frans' example how would you know not to touch the order of those statements without any docs? As I described, the tests to save you probably wouldn't be there. You certainly can't write tests for every corner case supported by an O/R Mapper, and the bugs that show up when you change that order take some decidedly cornercasey nunit tests to catch.

If the framework is nicely written, changing stuff in its code is easier and safer. But for a complex framework like an O/R Mapper, while a talented dev (rather more talanted than me, for sure) can figure out all the relevant aspects from just looking at (a lot of) the code, in all likelyhood they could have saved a fair bit of time from some docs!

...I know that's what people who have tried to fiddle around with the NPersist code have certainly told me! ;-) (that is, they wish I'd written some docs, or at least a code comment here or there (like //don't touch the order of these statements in the UnitOfWork.Commit() :-P))

/Mats

19 Jun 2007
17:16 PM

Christopher Bennage

This documentation argument seem to center around the style of code you are used to working with, and the level of bad experiences that accompanied it.

I am struggling to find the right labels, but if a developer is most familiar/comfortable with a "procedural" style of code then Ayende's arguments might seem non-nonsensical. If a developer has an OO perspective influenced by DDD then it is likely to resonate.

I have not read all of Frans posts, and I do not want to make assumptions about his experiences, but I think it is safe to say that his approach has work for him (as his passion indicates).

My experience, tempered by my knowledge of human nature, is that Ayende's approach (and let's not limit is to just documentation) will have a higher success in general.

19 Jun 2007
18:16 PM

jdn

Since I have a philosophy background, a lot of this is familiar in a certain respect.

Suppose you could succinctly describe the differences that Ayende and Frans are talking about.

How would you go about proving which approach was better? Could you?

It's hard to put a percentage on it, but how much of 'best practices' are truly best practices as opposed to personal preference?

On a slighly separate tangent: As someone put it to me in an email (paraphrasing): "TDD/BDD/etc. all look really good because the people talking about it are generally really good programmers. Put those things in the hands of lousy programmers, and I bet they still produce lousy code."

19 Jun 2007
18:52 PM

Mike G

Here's my short comment (pun intended)...When you are changing existing code, low level comments explaining the 'Why' of complex code blocks is very helpful and cannot be captured in unit tests, or at least I don't know how it could be. If you don't know the 'Why' you may change the code in a way that will cause a break because of an unforseen side effect that there was no unit test for. Of course, if you have unit tests that cover every possible side effect of changing the existing code, then you don't need the documentation, but this doesn't sound realistic!

19 Jun 2007
20:34 PM

Jimmy Bogard

Mike,

The problem with low level 'Why' comments is that they lie. They become stale as there is no way to automatically and consistently prove that they are correct, unlike with unit tests. Unit tests don't lie. That's why I keep Michael Feathers' legacy code book and Fowler's refactoring book close by as they can help me introduce tests to non-ideal (real-world) code. If I need comments, I might put them in the test code to explain results, if the test name itself can't be made clear enough. I'd rather focus on my efforts making the 'Why' clear through executable code and tests than providing comments that may be correct now, but have a good chance of lying to me in the future.

19 Jun 2007
20:52 PM

Ayende Rahien

Mike,

My ideas about commenting hasn't significantly changed in the last year or so. You can see them here:

http://www.ayende.com/Blog/archive/2006/12/24/7007.aspx

19 Jun 2007
21:34 PM

Jimmy Bogard

http://grabbagoft.blogspot.com/2007/06/problem-with-code-comments.html

19 Jun 2007
21:46 PM

Adam Dymitruk

@ Mats

.. As I described, the tests to save you probably wouldn't be there. You certainly can't write tests for every corner case supported by an O/R Mapper, and the bugs that show up when you change that order take some decidedly cornercasey nunit tests to catch.

By the same token, you can't rely on docs to cover all corner cases or unintended usage or extension of a framework.

If order (or something else) becomes crucial to a new way of using the framework or extending it or modifying it, I expect to write a test that exemplifies that. When looking at the the source change log, you see what was involved to make that change. No need to dig through documentation.

This also drives the design to clearer instead of "the easy way out:" putting in an asterisk in the docs saying "order is important" where that part of the system is described. I'd prefer the refactored code to the asterisk hiding in the docs somewhere.

Adam

19 Jun 2007
22:20 PM

Mats Helander

@ Andy

"By the same token, you can't rely on docs to cover all corner cases or unintended usage or extension of a framework."

Oh I couldn't agree more! And if I had to pick between taking just tests or just docs, as Ayende asks about, I would take the tests in a heart-beat. I am just saying having both is even nicer as they really can complement each other nicely. I certainly don't believe in extensive documentation.

What I would advocate is that: if there's a design desicion the intent of which you for some reason feel isn't best captured by some, perhaps unweildly, tests - consider using the complementing option of using a bit of documentation for that instead...sometimes, just /sometimes/, that option makes sense :-)

About the rest of your comment:

I agree with you again, but I would still argue that Frans has chosen an excellent example of the type of function where unit tests just won't express all the intent.

A simple method: UnitOfWork.Commit() - how hard could it be? Well, the basic assumption sounds simple: It should take a unit of work and commit it, simple as that. The problem is that there's really no end to how complex object graphs, in bazillions of possible states, that the Commit() method múst be able to accept. You really very quickly run inte some severe combinatorial explosion when you start to take into account the different types of graphs the method must be able to handle correctly.

So, a heckuva lot of graphs we need to deal with? Well I guess we'll need a heckuva lotta tests then boys, eh? Stop whining and get to work? Well, an admirable constitution, but in this case almost Don Quijotesque because it simply isn't possible to cover even a small part of the types of weird graphs you'll run into, not even with thousands and thousands of tests for just the commit method.

Of course there are lots and lots of submethods and subsystems, all unit tested separately, but none of those tests will mean you will have to write any fewer tests now if you really want to make a dent in the combinatorial monster that only a small slice of just the most commonly found object graphs from the wild represents - not to even start talking in terms of covering all possible graphs, since that would be an infinite amount....

The truth of the matter is that you'll have to think through really really hard what it is the commit method is actually doing, and just try to find the /right way to do it/...it may take a while to find, but when you find it, you understand in detail why it has to be exactly the way it is - basically lots of little reasons like "that has to go before that to respect that type of FK relationship in the DB". Those little reasons are really hard to express in code - even in test code.

You could write lots of tests with graphs in the right order that pass and other tests with graphs that break in expected ways, but without any comments anywhere describing why the heck things are the way they are in terms of FK relationships in the DB that have to be respected, many of those reasons are bound to be lost on the maintainer and not actually covered in tests.

So that means that .nunit + .doc > .nunit ;-)

19 Jun 2007
22:25 PM

Ayende Rahien

I am just saying having both is even nicer as they really can complement each other nicely

No argument here. I like overview documents, but rarely implementation ones.

And Frans' sample is indeed very good for why documenting intent is good.

My disagreement is the level he seems to take it.

19 Jun 2007
22:27 PM

Mats Helander

Just to make clear: Just because you can't cover even a small amount of all possible graphs with your tests doesn't mean you give up and don't use any. One will, indeed, be Don Quijote and have lots and lots of tests, many of those capturing reported bugs. It is just that it is a prime example of a function where we can't even pretend to approach decent test coverage and where many little nuggets of relevant wisdom may actually be better expressed in docs than in tests.

19 Jun 2007
22:29 PM

Mats Helander

@ Ayende

"No argument here. I like overview documents, but rarely implementation ones."

Exactly. And I'd say that if some intent is already nicely expressed in some test - why bother documenting it "again"?

19 Jun 2007
22:50 PM

Mats Helander

@ Adam

Doh! Sorry Adam! "@ Andy"...

20 Jun 2007
08:09 AM

Frans Bouma

Mats says it best:

"What I would advocate is that: if there's a design desicion the intent of which you for some reason feel isn't best captured by some, perhaps unweildly, tests - consider using the complementing option of using a bit of documentation for that instead...sometimes, just /sometimes/, that option makes sense :-)"

I couldn't agree more.

I picked the save pipeline example for a reason: I also did write the pipeline for llblgen pro first with homegrown algo's without much in-depth design decision documentation and with a set of tests which succeeded, however the stack of tests grew and grew every time a customer submitted a cornercase example of a weird graph which failed.

After a while I had enough, ripped it all out, designed it up front again with docs, proper hardened algos and properly documented design decisions. Didn't have an issue since. Sure the tests are still there, but as Mats said: if I wouldn't have had all these corner cases submitted with the previous code, these tests weren't be there, just the ones to proof that some graphs worked.

The thing is: can you proof with code alone that ALL graphs work? I'm not convinced. However if you use a description where you proof that topology sorted graphs result in a working order queue which you can split into 2 based on the type of work, and you add to that proper fk/pk syncing mechanisms, it will work in all possible scenario's. I don't need code to proof that, in fact, code will only pollute the proof description.

With these kind of docs, you can always check if you have written the right code, i.e.: does the code match the algo you designed?

Yes, I design my algo's up front. Not before I start coding the complete application, but when I add a new feature. Every addition is a migration project from one version to the next, with live code all over the planet so breaking stuff is not something you can do. This thus requires proper knowledge of where what can be changed and simply digging in and changing stuff all over the place isn't what's going to bring you a lot of progress.

What I also still find odd is that people favor a unittest with code you have to parse and interpret in your head over a doc which describes what is done in the code called by the test (!). But I disgress... no time to write yet another follow up :). I've tried to make clear what I think is needed, if some people don't want to use readable docs, who am I to deny these people that 'pleasure' ? ;)

20 Jun 2007
09:13 AM

Adam Dymitruk

@Franz

One more follow up ;0)

".. people favor a unit test with code you have to parse and interpret in your head.."

This screams for either a better unit test or a more explicit design or both if its hard to grok. There is such a thing as an unclear unit test just like there is unclear documentation. I think the 2, when both done right (or wrong - but both), are quite interchangeable - but the unit tests are more likely to be up to date at any point in time among other advantages.

@Mats

NP. ;0)

Adam

20 Jun 2007
09:14 AM

Adam Dymitruk

oops! I meant FRANS.. :0)

now I did it too.. it's getting late..

Adam

20 Jun 2007
20:13 PM

Frans Bouma

"There is such a thing as an unclear unit test just like there is unclear documentation. I think the 2, when both done right (or wrong - but both), are quite interchangeable - but the unit tests are more likely to be up to date at any point in time among other advantages."

Oh? Interchangeable?

So, tell me, your unittests reflect the options you DIDN'T take when deciding which option to implement for a given feature? I doubt it. The test only describes the option you DID take, and only from the outside.

So, let's take the save pipeline of an o/r mapper. You have to take a lot of decisions along the way. If you don't document them, if you just have unittests, you don't know which decisions you've made. If someone, after a year or 2, tells you and proofs you that your code isn't bugfree and you need to rewrite parts of it, so in effect you have to re-consider the options you have and make a new decision: you can't use the info you already gathered the first time, as you don't have that documented!

So you are doomed to repeat it all over again and if you're unlucky you take a wrong decision because you can't gather all the info for example you had the first time.

POOF

There goes the framework.

There are tools for documenting design decisions and the options you rejected and the info you used to proof your theory behind the code is correct and how the theory behind the code works. These tools are called word processors, with the lack of better tooling.

If you still are convinced you can document your design decisions in a unittest or set of unittests, please reconsider your ideas about this. It doesn't make you a bad developer to use unittests for tests and documentation for documenting your decisions. In fact, the people who are stepping in after you left for another project will be grateful because they can leverage the information you already gathered, can re-examine the decisions you made and based on what info you made them (!).

21 Jun 2007
15:43 PM

James Crickett

Is it just me, Frans, or have you moved from "document everything" to "well, you need some documents!" ?

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB