Working software over comprehensive documentation
Frans has a long post about how important documentation is for the maintainability of a project. I disagree.
Update: I have another post on this subject here.
Before we go on, I want to make sure that we have a clear understanding of what we are talking about. I am not thinking about documentation as end-user documentation, or API documentation (in the case of a reusable library), but implementation documentation about the project itself. I think that some documentation (high-level architecture, coding approach, etc.) is important, but the level to which Frans is taking it seems excessive to me.
The thing is though: a team of good software engineers that works like a well-oiled machine will very likely create proper code which is easy to understand, regardless of the methodology used.
So, good people, good team, good interactions. That sounds like the ideal scenario to me. I can think of at least five different ways in which a methodology can break apart and poison such a team (assigning blame, a stiff hierarchy, overtime, lack of recognition, isolated responsibilities that create bottlenecks). Not really a good scenario.
The why is of utmost importance. The reason is that when you have to make a change to a piece of code, you might be tempted to refactor it into a form which was rejected earlier because, for example, of bad side effects on other parts.
And then I would run the tests and they would show that this causes a failure in another part, or it would be caught in QA, or I would do the responsible thing and actually get a sense of the system before I start messing with it.
If you don't know the why of a given routine, class or structure, you will sooner or later make the mistake of refactoring the code into a form that was already rejected, and you'll find that out the hard way, losing precious time you could have saved.
This really has nothing to do with the subject at hand. I can do the same with brand new code; going off on a tangent somewhere is something that you have to deal with in any profession.
That's why the why documentation is so important: the documented design decisions: "What were the alternatives? Why were they rejected?" This is essential information for maintainability, as a maintainer needs that info to refactor the code properly without falling back into a form which was rejected.
Documentation is important, yes, but I like it for the high-level overview. "We use MVC with the following characteristics; points of interest in the architecture include X, Y, Z; points of interest in the code include D, B, C," etc. But I stop there, and rarely update the documents beyond that. We have the overall spec, we have the architectural overview and the code tourist guide, but not much more. Beyond that, you are supposed to go and read the code. The build script usually gets special treatment, by the way.
This also assumes that the original builders of the system were omniscient. Why shouldn't I follow a form that was rejected? Just because the original author of an application thought that Xyz was the end-all-be-all of software doesn't mean that Brg isn't a valid approach that should be considered. It should not surprise you that I reject the idea out of hand.
Code isn't documentation, it's code. Code is the purest form of the executable functionality you have to provide, as it is the form of the functionality that actually gets executed; however, it's not the best form to illustrate why the functionality is constructed the way it is constructed.
Code can be cumbersome for expressing true intent; that is why I invest a lot of time in coming up with intent-revealing names and pushing all the infrastructure concerns down. The best way to illustrate why certain functionality exists in such a way is to cover it with tests; that way you can see the intended usage and follow the train of thought of the previous guy. I routinely head off to the unit tests of various projects to get insight into how such things work.
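To illustrate the point (a hypothetical example of mine, not code from either post), intent-revealing test names can carry a business rule that the production code alone leaves implicit:

```python
import unittest


def eligible_for_discount(order_total, is_returning_customer):
    """A discount applies above 100, but only for returning customers."""
    return is_returning_customer and order_total > 100


class DiscountPolicyTests(unittest.TestCase):
    # The test names spell out the rule: a maintainer can learn the
    # intended behavior here without reverse engineering the code.
    def test_returning_customer_over_threshold_gets_discount(self):
        self.assertTrue(eligible_for_discount(150, is_returning_customer=True))

    def test_new_customer_never_gets_discount_regardless_of_total(self):
        self.assertFalse(eligible_for_discount(500, is_returning_customer=False))

    def test_reaching_the_threshold_exactly_is_not_enough(self):
        self.assertFalse(eligible_for_discount(100, is_returning_customer=True))

# run with: python -m unittest <this file>
```

The discount rule and its edge cases are invented for the sake of the example; the point is that the test names read as a small specification of intent.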
I've seen technical documents which did make a lot of sense and were essential to understanding what was going on at such a level that making changes was easy.
Frans, at what level were they? Earlier you were talking about the routine level, but I want to know what you think is the appropriate documentation coverage for a system.
If your project consists of, say, 400,000 lines of code, it's not a walk in the park to get even the slightest overview of where things are located without reading all of those lines, if there's no documentation of any value.
The problem here is that you make no assumption about the state of the code. I would take an undocumented 400,000 LOC code base that has (passing) unit tests over one that has extensive documentation but little to no tests, any time. The reasoning is simple: if it is testable, it is maintainable, period. Yes, it would take time to wrap my head around a system of this size, but I can most certainly do it, and unit tests allow me to do a lot of things safely. Assume that you have extensive documentation: what happens when the code diverges from the documentation?
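A minimal sketch of what "testable is maintainable" buys you (an invented example of mine): with a test pinning the behavior down, the implementation can be swapped wholesale and the divergence question answers itself:

```python
from collections import Counter


def word_frequencies(text):
    """Original, hand-rolled implementation."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def word_frequencies_refactored(text):
    """Refactored to lean on the standard library; behavior must not change."""
    return dict(Counter(text.lower().split()))


# The test that guards the refactoring: both implementations must agree
# on the same expected result, or the change is rejected immediately.
sample = "the quick the lazy THE"
expected = {"the": 3, "quick": 1, "lazy": 1}
assert word_frequencies(sample) == expected
assert word_frequencies_refactored(sample) == expected
```

Documentation describing the old loop would now be stale; the test is still green because it pinned the behavior, not the implementation.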
You see, documentation isn't a separate entity from the code written: it describes in a certain DSL (i.e. a human-readable and understandable language) what the functionality is all about; the code will do so in another DSL (e.g. C#). That's the essential part: you have to provide functionality in an executable form. Code is such a form, but it's arcane for a human to read and understand (or is your code always 100% bug-free when you've written a routine? I seriously doubt it; no one is that good); proper documentation which describes what the code realizes is another.
Documentation is not a DSL, and it is most certainly not understandable in many cases. Documentation can be ambiguous in the most insidious ways. The code is not just another DSL, either; that assumes the code and the documentation are somehow equivalent, but the code is what actually runs, so it is the authoritative source on any system. Documentation can help understanding, but it doesn't replace code, and I seriously doubt that you can call it a DSL. The part that bothers me here is that the documentation is viewed as an executable form; unless you are talking about something like FIT, that is not the case. I can't do documentation.VerifyMatch(code);
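For what it's worth, the nearest real thing to documentation.VerifyMatch(code) is executable documentation. Python's doctest, for example, runs the examples embedded in the prose and fails when the code drifts away from them (an illustrative sketch; the function here is invented):

```python
import doctest


def add_tax(amount, rate):
    """Return the amount with tax applied.

    >>> add_tax(100, 0.5)
    150.0
    >>> add_tax(0, 0.25)
    0.0
    """
    return amount * (1 + rate)


# Checks the documented examples against what the code actually does,
# instead of silently trusting the prose.
failures, _attempted = doctest.testmod()
assert failures == 0
```

Like FIT, this only verifies the examples you bothered to write down; the surrounding explanation can still rot.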
When I need to make a change and need to know why a routine is the way it is, I look up the design document element for that part and check why it is the way it is, and which alternatives were rejected and why. After 5 years, your own code also becomes legacy code. Do you still maintain code you've written 2-3 years ago? If so, do you still know why you designed it the way it is designed, and will you always avoid reconsidering alternatives you rejected back then because they wouldn't lead to the right solution?
I still maintain code that I wrote two years ago (Rhino Mocks comes to mind), and I kept very little documentation about why I did some things. But I have near 100% test coverage, and the ability to verify that I still have working software. Speaking from the paid side of the fence, a system that I started writing two years ago has gone through two major revisions in the meantime, and is currently being maintained by another team. I am confident in my ability to go there, sit with the code, and understand what is going on. And of course, "I have no idea why I did this bit" moments are fairly common, but checking the flow of the code, it is usually clear that either (a) I was an idiot, or (b) I had this or that good reason. Sometimes it is both at the same time.
It needs pointing out again: what was true five years ago is something that you really need to reconsider today.
What's missing is that a unit test isn't documenting anything.
And here I flat out disagree. Unit tests are a great way to document a system in a form that keeps it current. Reading the tests for a class can give you a lot of insight about how it is supposed to be used, and what the original author thought about when he built it.
It describes the same functionality, but in such a different DSL that a human isn't helped by wading through thousands and thousands of unit tests to understand what the API does and why.
Not really any different from wading through thousands of pages of documentation, which you can't even be sure are valid.
Using unit tests for learning purposes or documentation is similar to learning how databases work, what relational theory is, what set theory is etc. by looking at a lot of SQL queries.
The only comment that I have for this: That is how I did it.
Wouldn't you agree that learning how databases work is better done by reading a book about the theory behind databases, relational theory, set theory and why SQL is a set-oriented language?
Maybe, but I believe that the best way to learn something is to use it in anger and there is absolutely nothing that can beat having something to play with and explore.
Comments
All well and good, but I think using your own experience as proof for something is like Michael Jordan using his own experience as a basketball player as proof of how all players should play and act. Which is why he hasn't really done all that well as a GM (and the same is true of almost all 'star' players who try to coach or GM).
You aren't a representative sample. As I've mentioned previously, I've watched your first Hibernating Rhinos video (and need to remember to watch the others). You really aren't a representative sample.
If it isn't clear, that's a compliment.
jdn,
I'm nowhere near as good as Oren. But I do use tests as documentation and have had the same experience as he has. I'm by no means as smart as he is.
WatiN is a great example of a project where the tests were more informative than the documentation.
Adam
Let's use ObjectBuilder from the entlib as an example.
Anyone who hasn't read the code or its nonexistent dev docs: go read the code and the unit tests, then come back here, explain in detail how it works inside, and prove you're right.
You probably can't. Is it bad code? No, it's probably splendid code. Is it lacking unit tests? Not at all.
One thing is missing: you can't reverse engineer the process which led to the code by simply looking at the code. Unit tests aren't helping at all: they just test and show how to use the code from the outside; they don't show WHY it is set up the way it is on the INSIDE. If you can reverse engineer that from unit-test code, well, good for you, and I'm sure your boss will be very happy to hear that you won't create a single bug ever again, simply because you can read code and reverse engineer every step in the process :).
For the record, I can't. To some extent I can follow my own code without going back to the docs, but after a couple of years, and with hundreds of thousands of lines of code and a lot of classes, algorithms, etc., it's impossible.
What I find a little funny is that you apparently forget which kind of comments were placed inside the NHibernate source code before it went v1.0: things like "// I have no idea what this does" or "// why is this done here?" or similar. Apparently, the people who ported the Hibernate code over to .NET didn't understand how it worked simply by looking at the code AND with all the unit tests in hand.
This is understandable, simply because the system is very complex internally; an O/R mapper IS very complex internally no matter what you do. Digging into source code and understanding what it precisely does already takes a lot of time, as you have to parse and interpret every line of the code and REMEMBER the state of the variables it touches! Can you do that in your head? I can't.
Add to that the wide range of decisions one has to make to build a system like that, and with just the end result in your hands it's a hell of a job to reach the level where you understand why things are done that way. If that doesn't become clear, you have to assume why, and if that assumption is wrong (and why shouldn't it be? assumptions often are), you WILL have to wade through the same process again and again whenever you want to alter that piece of code, as you don't have that info at hand. You can't determine it from the code.
Typical example: saving an entity graph recursively in the right order. That's a heck of a complex pipeline, with several algorithms processing the data one after another, and the code branches out to a lot of different subsystems.
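To make the shape of that problem concrete (a toy sketch added here for illustration; the real pipeline in an O/R mapper is far more involved), the ordering part alone is a topological sort over foreign-key dependencies, and nothing in code like this records why this design won over the alternatives:

```python
def save_order(dependencies):
    """Order entities so each is saved after the entities it depends on.

    dependencies: dict mapping an entity name to the entities it
    references via foreign keys (e.g. Order references Customer).
    """
    ordered, visiting, done = [], set(), set()

    def visit(entity):
        if entity in done:
            return
        if entity in visiting:
            # A cycle needs special handling (e.g. a deferred FK update);
            # exactly the kind of decision a design doc would record.
            raise ValueError("cyclic reference at " + entity)
        visiting.add(entity)
        for dep in dependencies.get(entity, []):
            visit(dep)
        visiting.discard(entity)
        done.add(entity)
        ordered.append(entity)

    for entity in dependencies:
        visit(entity)
    return ordered


# Order depends on Customer; OrderLine depends on Order and Product.
graph = {"Customer": [], "Product": [],
         "Order": ["Customer"], "OrderLine": ["Order", "Product"]}
plan = save_order(graph)
assert plan.index("Customer") < plan.index("Order") < plan.index("OrderLine")
assert plan.index("Product") < plan.index("OrderLine")
```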
If one can determine, by JUST LOOKING AT THE CODE, why it is designed the way it is, more power to him/her, but I definitely won't be able to do so. What I find surprising is that some people apparently think they can, with no more info than the end result of the code.
What I find even more surprising is that it is apparently a GOOD thing that there's no documentation. It is apparently EASIER to read code, interpret every line, remember every variable's state, follow call graphs all over the place, and write down pre-/post-conditions along the way THAN it is to read a design doc which describes why it is done that way, how it works, and why B or C were rejected, so you don't have to consider them again if the situation hasn't changed. Perhaps they're paid by the hour, I don't know. :)
Frans, the debate isn't about NO documentation, it is about which is more important.
Given the choice - would you rather have a working application that had:
a) Full unit test coverage but no documentation
or
b) Full documentation, but no unit tests
Personally, I would go for (A) every time - because documentation is a leaky abstraction of the highest order.
On top of the full unit tests, I would love lots of (GOOD) documentation. But it is all about where priorities lie.
That choice isn't available. You also have to spend time designing the proper class graph, or coming up with proper names for your methods, classes, variables and unit tests. This is also part of the whole software-writing experience.
Your choice is a bit odd. Unit tests don't replace docs; they're not a substitute.
It's not that hard, as we all already write these docs :) Most software engineers, and I'm definitely sure you do this too, think before they write code. :) What does that thinking process look like? Don't you write things down, perhaps on a whiteboard or a piece of paper? Draw a graph or a picture of the situation, think things through, etc.?
Why not document these things along the way? My experience is that it doesn't take much longer than doing it the way you otherwise would. The advantage is that it's recorded: the info is available for later reference. Add to that some discipline about class names, variable names and comments (!), and keeping these things in sync isn't that hard either; you just have to force yourself to do so.
IMHO the motivation to do so has to be there, and often developers simply aren't motivated enough, because they won't consume the docs anyway: by the time the docs are needed, they've already moved on to other projects.
IF I have to choose, I'd go for (b). I'll get slaughtered for this choice, perhaps, but it's motivated by the fact that the documentation reflects the intention of the code and why it is the way it is. With that, I can make changes to the code. Sure, testability is a problem, but make no mistake: before unit tests were widely used, people already wrote large pieces of software, including OSes.
The motivation NOT to have docs seems to come from the assumption that the documentation WILL be bad. Though, where is it proven that the given set of unit tests reflects 100% of the complete set of use cases in all possible situations? (if these things are even comparable ;)). That also assumes something, namely that the set of unit tests is correct. I won't deny that, but why should a doc which is kept in sync be assumed to be bad?
I've been down this road before: you see something, forget why it was that way, and start to refactor, only to then remember you already tried that, or find out it doesn't work. But I don't think throwing some work away is always a waste of time.
It's entirely dependent on your system and your developers. If you spend a ton of your time on your project reverse engineering old decisions, that's an indicator that you could use a little more design docs. If it doesn't happen all that often, you probably have the right balance.
When I do want to document such things, I use an internal team blog I've created. When someone investigates/runs across/implements something, they can create an entry describing what they did and why, with the appropriate tags. That's pretty quick, people find it easier to do in blog format, and it's time-stamped. Our Google Mini crawls our blogs, so if you run across something strange, you can google for the class or the subsystem and likely quickly come across any relevant design notes.
That works, as well as capturing the details of decisions in the defect tracking system and referencing particular defects in comments (and checkin comments).
From what I've seen, it's at least as hard to write good documentation as it is to write good code, just as time consuming (if not more), and much less likely to be useful than a well-tested and well-coded feature that the business needs to have.
I have enjoyed reading this discussion about documentation. However, I have to agree that formal documentation always ends up out of date. I think design and testing are equally important. If you design good methods, properties and class names, you can get a sense of what the intent of a component is. Encapsulation is very important to understanding intent and use. Unit tests are also extremely important for validating and refactoring. However, if your components are not designed well, no amount of documentation, unit tests or developer talent will help you understand the code. Most of the time the code will be rewritten (reinvented), not refactored.
Frans,
There is a significant difference between the way I think and the way I write documentation. I don't need to write stuff for myself that makes sense on its own.
Take a look at the diagram here:
http://worsethanfailure.com/Articles/Drive-By_Architecture.aspx
Does it make any sense?
In the context of the discussion, it certainly did. But outside of it, it is a set of random lines, rarely useful.
Because if the documentation is meant to be useful, it should include a lot of things that are implicit in the way you think already.
The Apache foundation has a policy that all decisions must be made on the mailing list, because that keeps a log of what happened. I like this approach.
I think the reasoning behind the design and implementation of a system is more important (and more interesting) than its current state at some fixed point in time.
/johan/