Legacy code with really good tests is still legacy code

time to read 4 min | 728 words

I got into an interesting discussion on LinkedIn about my previous post, talking about Code Rot. I was asked about Legacy Code defined as code without tests and how I reconcile code rot with having tests.

I started to reply there, but it really got out of hand and became its own post.

“To me, legacy code is simply code without tests.” Michael Feathers, Working Effectively with Legacy Code

I read Working Effectively with Legacy Code for the first time in 2005 or thereabout, I think. It left a massive impression on me and on the industry at large. The book is one of the reasons I started rigorously writing tests for my code, it got me interested in mocking and eventually led me to writing Rhino Mocks.

It is ironic that the point of this post is that I disagree with this statement by Michael because of Rhino Mocks. Let’s start with numbers, last commit to the Rhino Mocks repository was about a decade ago. It has just under 1,000 tests and code coverage that ranges between 95% - 100%.

I can modify this codebase with confidence, knowing that I will not break stuff unintentionally. The design of the code is very explicitly meant to aid in testing and the entire project was developed with a Test First mindset.

I haven’t touched the codebase in a decade (and it has been close to 15 years since I really delved into it). The code itself was written in .NET 1.1 around the 2006 timeframe. It literally predates generics in .NET.

It compiles and runs all tests when I try to run it, which is great. But it is still very much a legacy codebase.

It is a legacy codebase because changing this code is a big undertaking. This code will not run on modern systems. We need to address issues related to dynamic code generation between .NET Framework and .NET.

That in turn requires a high level of expertise and knowledge. I’m fairly certain that given enough time and effort, it is possible to do so. The problem is that this will now require me to reconstitute my understanding of the code.

The tests are going to be invaluable for actually making those changes, but the core issue is that a lot of knowledge has been lost. It will be a Project just to get it back to a normative state.

This scenario is pretty interesting because I am actually looking back at my own project. Thinking about having to do the same to a similar project from someone else’s code is an even bigger challenge.

Legacy code, in this context, means that there is a huge amount of effort required to start moving the project along. Note that if we had kept the knowledge and information within the same codebase, the same process would be far cheaper and easier.

Legacy code isn’t about the state of the codebase in my eyes, it is about the state of the team maintaining it. The team, their knowledge, and expertise, are far more important than the code itself.

An orphaned codebase, one that has no one to take care of, is a legacy project even if it has tests. Conversely, a project with no tests but with an actively knowledgeable team operating on it is not.

Note that I absolutely agree that tests are crucial regardless. The distinction that I make between legacy projects and non-legacy projects is whether we can deliver a change to the system.

Reminder: A codebase that isn’t being actively maintained and has no tests is the worst thing of all. If you are in that situation, go read Working Effectively with Legacy Code, it will be a lifesaver.

I need a feature with an ideal cost of X (time, materials, effort, cost, etc). A project with no tests but people familiar with it will be able to deliver it at a cost of 2-3X. A legacy project will need 10X or more. The second feature may still require 2X from the maintained project, but only 5X from the legacy system. However, that initial cost to get things started is the killer.

In other words, what matters here is the inertia, the ability to actually deliver updates to the system.