Ayende @ Rahien


Natural Event Syntax for Rhino Mocks

 

I asked before, but didn't get any conclusive answers: what do you think about this syntax for raising events in Rhino Mocks? I spiked the implementation, and the code below works. As I said, I don't like the awkward syntax of GetLastEventRaiser(), nor EventRaiser.Create(mock, "Load"), because it relies on strings.

Does it make sense? Readable? Maintainable?

[Test]
public void Raise_FromEventRaiser_RaiseTheEvent()
{
    MockRepository mocks = new MockRepository();
    IWithCustomEvents withCustomEvents = mocks.DynamicMock<IWithCustomEvents>();
    mocks.ReplayAll();
    
    bool myEventCalled = false;
    withCustomEvents.MyEvent += delegate { myEventCalled = true; };

    withCustomEvents.MyEvent += EventRaiser.Raise(this, EventArgs.Empty);

    Assert.IsTrue(myEventCalled);
}

I wanted to say that the implementation was simple, but it relies on emitting code at runtime, so is it simple?

Anyway, I am waiting for some additional responses.

Wish: Distributable Windows-based Virtual Machines

Sahil Malik points out that Microsoft has found a way to distribute Virtual Machines for Windows by time bombing them. In fact, they now have quite a number of them available for download.

Sahil has another request, to be able to do the same himself:

Extend that time bomb mechanism, so parties other than Microsoft can play. I should be able to create a solution based on MSFT technologies, and hand over a VHD for the world to play.

That is something that I would like to see very much. My company has a lot of virtualization stuff going on, and I would be very interested in being able to legally distribute a VM that can be used to boot-and-go in various scenarios.

I would like to add another request to that: not only do we need to be able to distribute time-bombed VMs, we also need a way to white-list them if we want to use the VM for more than the allowed period. My company has done several projects where deployment consisted of xcopying the VM file to the VM server and then booting it.

I spoke about it in the past, here

Syntax: Multi Something

As I have already explained, I am doing a lot of work with NHibernate's MultiCriteria and MultiQuery. They are very powerful, but they also mean that I am working at a level that has a lot of power and a somewhat complex syntax. I want to improve that, but I am not sure what the best way to do it is. Anything here is blog-code, meaning that I didn't even verify that it has valid syntax. It is just some ideas about how this can go; I am looking for feedback.

The idea here is to have a better way to use NHQG expressions, and to remove the need to manually correlate between the index of the added query and the index in the result set. It should also give you better syntax for queries that return a unique result.

new CriteriaQueryBatch()
 .Add(Where.Post.User.Name == "Ayende", OrderBy.Post.PublishedDate.Desc)
   .Paging(0, 10)
   .OnRead(delegate(ICollection<Post> posts) { PropertyBag["posts"] = posts; })
 .Add(Where.Post.User.Name == "Ayende")
   .Count()
   .OnRead(delegate(int count) { PropertyBag["countOfPosts"] = count; })
 .Execute();
 

Waiting for your thoughts...

You really want to rethink your localization

This is from an advertising brochure that my company has distributed, for a virtualization conference that we recently held.

I have included the sponsorships part only, it says "Gold Sponsor: IBM", "Silver Sponsors: VMWare, FilesX".

The problem is that in Hebrew, the word for Silver is the same word for Money.

This has the effect of me reading it as: "Money Sponsors: VMWare, FilesX".

That sounds... crass.

The problem is actually not limited to my company's conferences; it is actually widespread in Israel. I really wish they would think about a different term.

Query Building In The Domain / Service Layers

Here is an interesting topic. My ideal data access pattern means that there is a single query per controller per request (a request may involve several controllers, though). That is for reading data, obviously; for writing, I will batch the calls if needed. I am making heavy use of Multi Criteria/Query in order to make this happen.

I have run into a snag with this approach, however. The problem is that some of the services also do data access*. So I may call the authorization service to supply me with the current user and its customers, and the service will call the DB to find out about this information. I would rather it did not do that, since that means extra round trips. Now, NHibernate has the idea of a DetachedCriteria, a query that has a semi-independent life and can be constructed at will and massaged to the right shape anywhere in the code.

Now, I have the following options. Use the normal method:

ICollection<Customer> customers = AuthorizationService.GetAssociatedCustomers();
ICollection<Policy> policies = Repository<Policy>.FindAll(
	Where.Policy.Customer.In(customers)
  );

PropertyBag["customers"] = customers;
PropertyBag["policies"] = policies;

Use DetachedCriteria as a first level building block:

DetachedCriteria customersCriteria = AuthorizationService.GetAssociatedCustomersQuery();
IList results = session.CreateMultiCriteria()
	.Add(customersCriteria)
	.Add(DetachedCriteria.For<Policy>()
		.Add(Subqueries.PropertyIn("id",
			CriteriaTransformer.Clone(customersCriteria)
				.SetProjection(Projections.Id()))))
	.List();

ICollection<Customer> customers = Collection.ToArray<Customer>(results[0]);
ICollection<Policy> policies = Collection.ToArray<Policy>(results[1]);

PropertyBag["customers"] = customers;
PropertyBag["policies"] = policies;

Remember, I consider Querying a Business Concern, so I like the ability to move the query itself around.
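To make the second option concrete, here is a minimal sketch of what GetAssociatedCustomersQuery() might look like on the service side (the "Users" association path and the CurrentUser property are illustrative assumptions, not the real model): the service hands back a query instead of loaded entities, and the caller composes it into its own MultiCriteria batch.

using NHibernate.Expression;

public class AuthorizationService
{
	public User CurrentUser;

	public DetachedCriteria GetAssociatedCustomersQuery()
	{
		// return the query itself, not its results, so the caller can batch it
		DetachedCriteria query = DetachedCriteria.For<Customer>();
		query.CreateCriteria("Users")
			.Add(Expression.Eq("Id", CurrentUser.Id));
		return query;
	}
}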

Thoughts?

*  By data access, I mean, they call something that eventually resolves in a DB call.

NHibernate's Xml In

I wrote it because of a particular problem that I have run into, which is not something that I have heard much discussion about. In my application, I need to query the database about a certain data, and the best way to do that would be using an in query. For the purpose of discussion, I want to find all the customers associated with the current user.

Unfortunately, I can't do something as simple as "Where.Customer.User == CurrentUser", because a customer may be associated with a user in many complex and interesting ways (the end result is a 3-page query, btw). Therefore, I calculate who the relevant customers are for a user when they log in, and cache that.

So, I need to ask the database for all the customers associated with the user, and since I already know the user's customers, In() was a natural way to go. The problem is that I actually began to run into SQL Server's 2,100 parameters-per-query limit when important users (who have a lot of associations) started to use that.

Can you say, major stumbling block? There are several solutions for that, and the one I chose is to extend NHibernate to perform an IN query on an XML datatype, as described here. You can see the implementation here.

Why use an IN on XML instead of a join against the data? I can send it to the database using BulkCopy and then join against that very easily, no? (I describe one such way here)

Using the Bulk Copy & Join approach would probably turn out to be faster than an IN on an XPath (haven't tested, though). But as it turns out, I had several reasons for choosing this route:

  • Using the BulkCopy & Join approach would mean that I need to perform two DB Queries, instead of one.
  • Someone has to be responsible for clearing the joined table at some point; that is another thing to deal with.
  • It requires a two-step process, with no easy way to back out of it if there is only a small number of items that we want to check.
  • Caching

Using BulkCopy & Join basically means that I have no real way to avoid hitting the database altogether. Using this approach, I am merely adding a (potentially large) parameter to the query, and letting NHibernate deal with everything else.

The way you use this is simple:

session.CreateCriteria(typeof(Customer))
    .Add(XmlIn.Create("id", potentiallyLargeAmount))
    .List();

It will automatically fall back to normal IN behavior if you are not running on SQL Server 2005 or if the number of items that you are checking is smaller than 100.
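The selection logic boils down to something like this minimal sketch (assumed names, not the actual XmlIn code; the linked implementation has the real criterion):

using System.Collections;
using NHibernate.Expression;

public static class XmlInSketch
{
	public static AbstractCriterion Create(string property, ICollection values, bool isSqlServer2005)
	{
		// small value sets, or databases without the xml datatype, get a plain IN
		if (!isSqlServer2005 || values.Count < 100)
			return Expression.In(property, new ArrayList(values).ToArray());

		// otherwise send the values as a single XML-typed parameter, which renders roughly as:
		//   id IN (SELECT n.v.value('.', 'int') FROM @values.nodes('/items/int') n(v))
		return CreateXmlCriterion(property, values);
	}

	private static AbstractCriterion CreateXmlCriterion(string property, ICollection values)
	{
		// omitted here; see the linked implementation for the real thing
		throw new System.NotImplementedException();
	}
}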

Hierarchical Containers

This is somewhat of a specific scenario, but let us assume that you have an application where you want to specialize the services of the application by the current user. If the user belongs to the Northwind customer, you want one behavior, and if they belong to the Southsand customer, you want a different behavior. All users from all other customers get the default behavior.

To make it simple, let us talk about NHibernate configuration. You have a default schema that you use for most customers, and you specialize that for those customers that want extra. This means that you need to keep a session factory per customer, because you have a different schema than the default one (changing the connection string is not enough).

To be clear, this is not about entity inheritance, this is about specialization of the entire application, which I just happened to demonstrate via NH configuration.

Now, Windsor supports this ability by having a parent container and child containers, but Binsor didn't expose this functionality easily. I did some work on it today, and the end result is that you can configure it like this:

We have the global container (implicit to Binsor, since we created it before we ran the Binsor script), then we go over the configuration files and create a container for each file. We register them in the ContainerSelector, which is an application-level service (shown below).

import HierarchicalContainers
import System.IO

Component("nhibernate_unit_of_work", IUnitOfWorkFactory, NHibernateUnitOfWorkFactory,
	configurationFileName: """..\..\hibernate.cfg.xml""")
	
Component("nhibernate_repository", IRepository, NHRepository)
Component("container_selector", ContainerSelector)

for configFile in Directory.GetFiles("""..\..\Configurations""", "*.cfg.xml"):
	continue if Path.GetFileName(configFile) == "hibernate.cfg.xml"
	print "Build child configuration for ${configFile}"
	child = RhinoContainer(IoC.Container)
	using IoC.UseLocalContainer(child):
		Component("nhibernate_unit_of_work", IUnitOfWorkFactory, NHibernateUnitOfWorkFactory,
			configurationFileName: configFile)
		Component("nhibernate_repository", IRepository, NHRepository)
	#need to remove both .cfg and .xml
	containerName = Path.GetFileNameWithoutExtension(Path.GetFileNameWithoutExtension(configFile))
	IoC.Container.Resolve(ContainerSelector).Register(containerName, child)

You can use it like this, and enter/leave the context of a client at will:

RhinoContainer container = new RhinoContainer("Windsor.boo");
IoC.Initialize(container);
ContainerSelector containerSelector = IoC.Resolve<ContainerSelector>();
containerSelector.PrintChildContainers();
using(UnitOfWork.Start())
{
    Console.WriteLine(
        NHibernateUnitOfWorkFactory.CurrentNHibernateSession
            .Connection.ConnectionString
        );
}
using(containerSelector.Enter("Northwind"))
{
    using (UnitOfWork.Start())
    {
        Console.WriteLine(
            NHibernateUnitOfWorkFactory.CurrentNHibernateSession
                .Connection.ConnectionString
            );
    }
}
using (containerSelector.Enter("Southsand"))
{
    using (UnitOfWork.Start())
    {
        Console.WriteLine(
            NHibernateUnitOfWorkFactory.CurrentNHibernateSession
                .Connection.ConnectionString
            );
    }
}
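The ContainerSelector itself can be a very small class. Here is a minimal sketch (an assumed implementation on top of Rhino Commons' IoC.UseLocalContainer; the real version is in the reference implementation linked below):

using System;
using System.Collections.Generic;
using Castle.Windsor;
using Rhino.Commons;

public class ContainerSelector
{
    private readonly IDictionary<string, IWindsorContainer> children =
        new Dictionary<string, IWindsorContainer>();

    public void Register(string name, IWindsorContainer child)
    {
        children[name] = child;
    }

    public IDisposable Enter(string name)
    {
        // UseLocalContainer is assumed to return an IDisposable that restores the
        // previous (global) container when the using block ends
        return IoC.UseLocalContainer(children[name]);
    }

    public void PrintChildContainers()
    {
        foreach (string name in children.Keys)
            Console.WriteLine(name);
    }
}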

Because this is a fairly complex topic, I have created a simple reference implementation that you can get here:

https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/trunk/SampleApplications/HierarchicalContainers

Exceptions Usability

I just made a small change to the EnsureMaxNumberOfQueriesPerRequestModule: when it detects that the number of queries performed goes beyond the specified value, it now also includes the queries that it detected in the exception message. Very minor change, but the effect is that I can just scroll the page and say: "Oh, I have a SELECT N+1 here", directly off the exception page.
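To give a sense of what the module does, here is a minimal sketch (assumed names and limit, not the actual Rhino Commons code): collect the SQL issued during the request and, once the count passes the limit, throw with the collected queries in the message.

using System;
using System.Collections.Generic;
using System.Web;

public class MaxQueriesPerRequestModuleSketch : IHttpModule
{
    [ThreadStatic] private static List<string> queries;
    private const int MaxQueries = 30; // assumed development-level limit

    public void Init(HttpApplication app)
    {
        app.BeginRequest += delegate { queries = new List<string>(); };
    }

    // whatever logs the SQL (an NHibernate log appender, for example) calls this
    public static void QueryExecuted(string sql)
    {
        if (queries == null)
            return;
        queries.Add(sql);
        if (queries.Count > MaxQueries)
            throw new InvalidOperationException(
                "Too many queries in a single request (" + queries.Count + "):" +
                Environment.NewLine + string.Join(Environment.NewLine, queries.ToArray()));
    }

    public void Dispose()
    {
    }
}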

On a side note, I am getting better at optimizing NHibernate based applications, and I strongly suggest that anyone using NHibernate look at Multi Query in 1.2 (and Multi Criteria on the trunk) for those kinds of things. It gives you quite a bit of power.

I ran into several places today where we would read & write large amounts of data on a single request, and that triggered the max query limit. It took a while, but some interesting use of both batching & multi queries dropped the database roundtrips by an order of magnitude. Nice.

Efficiently loading deep object graphs

Here is an interesting approach to loading deep object graphs efficiently. This will ensure that you get all the relevant collections without having to lazy load them and without a huge cartesian product. It is especially useful if you want to load a collection of items with the associated deep object graph.

public Policy GetPolicyEagerly(int policyId)
{
	IList list = ActiveRecordUnitOfWorkFactory.CurrentSession.CreateMultiQuery()
		.Add(@"from Policy policy left join fetch policy.PolicyLeadAssociations
			where policy.Id = :policyId")
		.Add(@"from Policy policy left join fetch policy.PolicyEmployeeAssociations
			where policy.Id = :policyId")
		.Add(@"from Policy policy left join fetch policy.PolicyManagerAssociations
			where policy.Id = :policyId")
		.Add(@"from Policy policy left join fetch policy.PolicyDepartmentAssociations
			where policy.Id = :policyId")
		.Add(@"from Policy policy left join fetch policy.PolicyCustomerAssociations
			where policy.Id = :policyId")
		// bind the :policyId parameter used by all the queries in the batch
		.SetParameter("policyId", policyId)
		.List();
	IList firstResultList = (IList) list[0];
	if (firstResultList.Count == 0)
		return null;
	return (Policy) firstResultList[0];
}

The domain above is a fake one, by the way, don't try to make any sense of it.

Shocking Rob

I am posting this mainly because I want to see how far I can shock Rob Conery

[image: the exception screenshot]

The exception is raised by the EnsureMaxNumberOfQueriesPerRequestModule, and the limit is currently set at the development level; for QA/Staging, I would probably reduce it further, although I have some pages where I...

Oh, and to Rob: that was a classic error of doing a query per node instead of a single query (I added an eager load instead of a query and was done). I am doing some performance tuning right now, and all in all, it is very boring. Find a hot spot, consolidate data access, use MultiCriteria or MultiQuery, move on.

Imprisoning Mort

Nick Malik responded to the discussion around his Tools for Mort post. He has a unique point of view.

If you cannot make sure that Mort will write maintainable code, make him write less code.    Then when it comes time for you (not Mort) to maintain it (he can't), you don't.  You write it again.

Okay, so you have a tool that makes sure that Mort doesn't write a lot of code with it. Now Mort has left and I need to maintain the code. How do I do it? I can't do it with the tools that Mort used, because they are intentionally crippled. You guessed it: time to rewrite, and it is not just a rewrite of Mort's code, it is a rewrite that would need to add functionality that existed in Mort's framework but was crippled so Mort wouldn't damage himself with it.

Sorry, that is the wrong approach to take.

Someone 'smart' has written the 'hard' stuff for Mort, and made it available as cross cutting concerns and framework code that he doesn't have to spend any time worrying about.  Mort's code is completely discardable.

I thought that Pie-In-The-Sky frameworks were already widely acknowledged as a Bad Thing.

Does Mort put process first or people first?  He puts people first, of course.  He writes the code that a customer wants and gets it to the customer right away.  The customer changes the requirements and Mort responds.  If it sounds like a quick iteration, that is because it is.

Agile doesn't mean iterations; agile means working software and enabling change. You say that Mort can respond quickly to changes in the application, but that is only during the first few iterations; after that, Mort is too busy fighting with the code to be able to add any significant value to the application.

Possible Answer: We can have Mort consume a service.  He can't change it.  He can't screw it up.  But he can still deliver value

I really don't have your faith in Mort's inability to screw things up. What do you do when Mort decides to "improve" performance by making the service calls in an endless loop on another thread?

After getting those points across, I would like to protest most strongly against the general tone of Nick's post. It is extremely derogatory toward Mort, and it precludes in advance the ability to improve. I am a Mort on quite a few levels (the entire WinFX stack comes to mind); does this mean that I am bound to write unmaintainable code and should be locked down to a very small set of "safe" choices, chosen for me by those Above Me?

Sorry, I really can't accept this approach, and while it explains some of the stuff that Microsoft puts out, the only thing it accomplishes is to stifle innovation and development on the platform. If this is Nick's idea about how things should be, it is very sad. I seriously hope that this isn't the accepted position at Microsoft.

Did you know: Find out if an exception was thrown from a finally block!

This is a biggie for me, because it enables a much nicer syntax for a lot of stuff. But first, let us look at this:

using(new ExceptionDetector())
{
	if(new Random().Next(1,10)%2 == 0)
          throw new Exception();
}

How can you tell, from the ExceptionDetector, if an exception was thrown or not? Well, conventional wisdom, and what I thought about until 15 minutes ago, says that you can't. I want to thank Daniel Fortunov, for teaching me this trick:

using System;
using System.Runtime.InteropServices;

public class ExceptionDetector : IDisposable
{
    public void Dispose()
    {
        if (Marshal.GetExceptionCode() == 0)
            Console.WriteLine("Completed Successfully!");
        else
            Console.WriteLine("Exception!");
    }
}
Amazing!

Having a good foundation

I talked about the tree problem that I had to deal with; we needed to create something very similar (but for another object graph) today. It took me roughly three days to get it all working correctly; it took Rinat about two and a half hours to get it to work on the next set.

Documentation can be ambiguous in the most insidious ways

Frans had a long comment on my last post. I started to reply in a comment, but it grew too big for that.

 Let's use ObjectBuilder from the entlib as an example. Anyone who hasn't read the code or its non-existing dev docs, go read the code and the unittests, then come back here and explain in detail how it works inside and proof you're right.

Just to answer that, I have read the OB source, and I have never bothered to look at whatever documentation exists for it.
After reading the code and the tests, I was able to extend OB to support generic inferencing capabilities. (Given IRepository<T> whose implementor is NHibernateRepository<T>, when asked for IRepository<Customer>, return NHibernateRepository<Customer>). That is an advanced feature of IoC, added to a container that I wasn't familiar with, without affecting the rest of the functionality.

Oh, and while I can probably give a short description of how OB works, I am by no means an expert, nor can I really explain its internals, but I was able to go in, understand the area that I wanted to modify, and make a change that greatly benefited me, without breaking anything else. That is the value in maintainable code.

And this is for the people who think that I bash the P&P code: I made similar changes to ObjectBuilder and to Windsor, in around the same time frame. I had a harder time dealing with ObjectBuilder, but that is probably due to unfamiliarity with the project.
The simple fact that I was able to make a significant change without having to grok the entire ObjectBuilder says quite a bit about the quality of the code.

 

If you can reverse engineer that from unit-test code, well, good for you and I'm sure your boss will be very happy to hear that you won't create a single bug ever again

This is something that you have repeated several times, and I want to explicitly disagree with this statement: understanding the code doesn't mean no bugs; it means fewer bugs, for sure, but not zero bugs. Most bugs occur not because you misunderstand what the code does, but because of a simple mistake (if instead of if not, for instance) or not considering a particular scenario (who would ever try to paste 2MB of text here?). Understanding helps reduce them, but it can't eliminate them. I doubt you can make the claim that LLBLGen has no bugs.

What I find a little funny is that you apparently forget which kind of comments were placed inside the nhibernate sourcecode before it went v1.0: things like "// I have no idea what this does" or "// why is this done here?" or similar comments. Apparently, the people who ported the hibernate code over to .NET didn't understand how it worked by simply looking at the code AND with all the unittests in mind.

There are similar comments there right now, and they exist in the original Hibernate source code as well. I will freely admit that there are parts of NHibernate that I have no idea how they work. What I do know is that this lack of knowledge about the way some parts work has not hindered my ability to work with NHibernate or extend it.

By digging into sourcecode and understanding what it precisely does already takes a lot of time as you have to parse and interpret every line in the code and REMEMBER the state of the variables it touches! Can you do that in your head? I can't.

That is not something that I can do. Due to this limitation, I work in a way that ensures I do not need to understand the entire system and the implications of each and every decision at any given point. I consider this approach a best practice, because it means that I can work on a piece of code without having to deal with the implications of dozens of other components being affected. Documentation wouldn't help here, unless I had a paragraph per line of code, kept them in sync at all times, and remembered to read it at all times.

Add to that the wide range of decisions one has to make to build a system like that and with just the end-result in your hand it's a hell of a job to come to the level where you understand why things are done that way.

I disagree. I can determine intent from code and from tests, and I can change the code and see tests break if necessary. That is a much faster way than trying to analyze the flow of each instruction in the application.

Typical example: saving an entity graph recursively in the right order. That's a heck of a complex pipeline, with several algorithms processing the data after eachother, and the code branches out to a lot of different subsystems.

If one can determine by JUST LOOKING AT THE CODE why it is designed the way it is, more power to him/her, but I definitely won't be able to do so. What I find surprising is that some people apparently think they can, with no more info than the end result of the code.

You wouldn't be able to understand that from the end result of the code, but having a test in place will allow you to walk through it in isolation, if needed, and understand what is going on. Here is a secret: I can usually understand what NHibernate is doing in such scenarios without looking at the code, because the logic is fairly straightforward to understand (but not to implement). I taught myself NHibernate by building the NHibernate Query Analyzer: no documentation, very little help from other people at the time, but a lot of going through the code and grokking the way it works.

What I find even more surprising is that it apparently is a GOOD thing that there's no documentation.

No, what I am saying is that I would rather have good code with unit tests than code that has extensive documentation. Again, I am not against documentation at the right level: high-level architecture, broad implementation notes, build script documentation. To go further than that seems to reach the point of diminishing returns.

It apparently is EASIER to read code, interpret every line, remember every variable's state it touches, follow call graphs all over the place, write down pre-/post- conditions along the way

That is the part that I think we disagree about: I don't need to do that.

Perhaps they're payed by the hour

Around 90% of the time that I spend on NHibernate comes out of my own free time, not paid for by anyone. You can rest assured that I care a lot about not wasting my own time. This is the real world, and good, unit tested code is maintainable, as proven by the fact that people go in and make safe changes to it, without having to load all the permutations of the system into their heads.

Navigating large code bases

I just needed to find an answer to a question in MonoRail. The MonoRail code base is over 75,000 lines of code, including all the supporting projects and tests. Castle.MonoRail.Framework has about 37,700 lines of code. The question was something that I had never really thought about, and I had no idea where to start looking. I opened the solution and started hunting. It took about five or six minutes to find the correct spot, another two to verify my assumption and be shocked that there is a private method there :-), and then I was done.

Under ten minutes to answer a question that I had never thought about, in a significant code base. ReSharper helps, of course, but nothing beats well-structured code for maintainability.

Oh, and MonoRail has very little implementation documentation.

Working software over comprehensive documentation

Frans has a long post about how important documentation is for the maintainability of a project. I disagree.

Update: I have another post in this subject here.

Before we go on, I want to make sure that we have a clear understanding of what we are talking about: I am not thinking about documentation as end-user documentation, or API documentation (in the case of a reusable library), but implementation documentation about the project itself. I think that some documentation (high-level architecture, coding approach, etc.) is important, but the level to which Frans is taking it seems excessive to me.

The thing is though: a team of good software engineers which works like a nicely oiled machinery will very likely create proper code which is easy to understand, despite the methodology used.

So, good people, good team, good interactions. That sounds like the ideal scenario to me. I can think of at least five different ways in which a methodology can break apart and poison such a team (assigning blame, stiff hierarchy, overtime, lack of recognition, isolating responsibilities and creating bottlenecks). Not really a good scenario.

The why is of up-most importancy. The reason is that because you have to make a change to a piece code, you might be tempted to refactor the code a bit to a form which was rejected earlier because for example of bad side-effects for other parts.

And then I would run the tests and they would show that this causes failure in another part, or it would be caught in QA, or I would do the responsible thing and actually get a sense of the system before I start messing with it.

If you don't know the why of a given routine, class or structure, you will sooner or later make the mistake to refactor the code so it reflects what wasn't the best option and you'll find that out the hard way, losing precious time you could have avoided.

This really has nothing to do with the subject at hand. I can do the same with brand new code, going off on a tangent somewhere; it is something that you have to deal with in any profession.

That's why the why documentation is so important: the documented design decisions: "what were the alternatives? why were these rejected?" This is essential information for maintainability as a maintainer needs that info to properly refactor the code to a form which doesn't fall into a form which was rejected.

Documentation is important, yes, but I like it for the high-level overview: "We use MVC with the following characteristics, points of interest in the architecture include X, Y, Z, points of interest in the code include D, B, C, etc." But I stop there, and rarely update the documents beyond that. We have the overall spec, we have the architectural overview and the code tourist guide, but not much more. Beyond that, you are supposed to go and read the code. The build script usually gets special treatment, by the way.

This also assumes that the original builders of the system were omniscient. Why shouldn't I follow a form that was rejected? Just because the original author of an application thought that Xyz was the end-all-be-all of software doesn't mean that Brg isn't a valid approach that should be considered. It should not surprise you that I reject the idea out of hand.

Code isn't documentation, it's code. Code is the purest form of the executable functionality you have to provide as it is the form of the functionality that actually gets executed, however it's not the best form to illustrate why the functionality is constructed in the way it is constructed.

Code can be cumbersome to express the true intent with; that is why I am investing a lot of time coming up with intent-revealing names and pushing all the infrastructure concerns down. The best way to illustrate why certain functionality exists in such a way is to cover it with tests; that way you can see the intended usage and can follow the train of thought of the previous guy. I routinely head off to the unit tests of various projects to get an insight into how such things can work.

I've seen technical documents which did make a lot of sense and were essential to understanding what was going on at such a level that making changes was easy.

Frans, at what level were they? Earlier you were talking about the routine level, but I want to know what you think the appropriate documentation coverage for a system is.

If your project consists of say 400,000 lines of code, it's not a walk in the park to even get a slightest overview where what is located without reading all of those lines if there's no documentation which is of any value.

The problem here is that you make no assumption about the state of the code. I would take an undocumented 400,000 LOC code base that has (passing) unit tests over one that has extensive documentation but little to no tests, any time. The reasoning is simple: if it is testable, it is maintainable, period. Yes, it would take time to wrap my head around a system this size, but I can most certainly do it, and unit tests allow me to do a lot of things safely. Assume that you have extensive documentation: what happens when the code diverges from the documentation?

You see, documentation isn't a separate entity of the code written: it describes in a certain DSL (i.e human readable and understandable language) what the functionality is all about; the code will do so in another DSL (e.g. C#). hats the essential part: you have to provide functionality in an executable form. Code is such a form, but it's arcane to read and understand for a human (or is your code always 100% bugfree when you've written a routine? I seriously doubt it, no-one is that good), however proper documentation which describes what the code realizes is another.

Documentation is not a DSL, and it is most certainly not understandable in many cases. Documentation can be ambiguous in the most insidious ways. The code is not another DSL; this assumes that the code and the documentation are somehow related, but the code is what actually runs, so it is the authoritative source for any system. Documentation can help understanding, but it doesn't replace code, and I seriously doubt that you can call it a DSL. The part that bothers me here is that the documentation is viewed as an executable form; unless you are talking about something like FIT, that is not the case. I can't do documentation.VerifyMatch(code);

When I need to make a change and need to know why a routine is the way it is, I look up the design document element for that part and check why it is the way it is and which alternatives are rejected and why. After 5 years, your own code also becomes legacy code. Do you still maintain code you've written 2-3 years ago? If so, do you still know why you designed it the way it is designed and also will always avoid to re-consider alternatives you rejected back then because they wouldn't lead to the right solution?

I still maintain code that I wrote two years ago (Rhino Mocks comes to mind), and I kept very little documentation about why I did some things. But I have near 100% test coverage, and the ability to verify that I still have working software. Speaking on the paid side of the fence, a system that I started writing two years ago has gone through two major revisions in the meantime, and is currently being maintained by another team. I am confident in my ability to go there, sit with the code, and understand what is going on. And of course, "I have no idea why I did this bit" moments are fairly common, but checking the flow of the code, it is usually clear that A) I was an idiot, or B) I had this or that good reason for it. Sometimes it is both at the same time.

It needs pointing out again: what was true 5 years ago is something that you really need to reconsider today.

What's missing is that a unit test isn't documenting anything

And here I flat out disagree. Unit tests are a great way to document a system in a form that keeps it current. Reading the tests for a class can give you a lot of insight about how it is supposed to be used, and what the original author thought about when he built it.

It describes the same functionality but in such a different DSL that a human isn't helped by wading through thousands and thousands of unit tests to understand what the api does and why.

Not really any different from wading through thousands of pages of documentation, which you can't even be sure is valid.

Using unit tests for learning purposes or documentation is similar to learning how databases work, what relational theory is, what set theory is etc. by looking at a lot of SQL queries.

The only comment that I have for this: That is how I did it.

Wouldn't you agree that learning how databases work is better done by reading a book about the theory behind databases, relational theory, set theory and why SQL is a set-oriented language?

Maybe, but I believe that the best way to learn something is to use it in anger and there is absolutely nothing that can beat having something to play with and explore.

Understanding Bad Code

Frans' contribution to the conversation about maintainable code deserves its own post, but I would like to mention this part in particular:

you will not [understand the code]. Not now, not ever. And not only you, but everyone out there who writes code, thus that includes me as well, will not be able to read code and understand it immediately.

You know what? That is not limited to bad code. I have had a hard time grokking good code bases, simply because of their size and complexity (NHibernate and Windsor come to mind). Other code bases are as large, but they are easier to approach, probably because they deal with a less complex domain and tend to have wide coverage rather than deep (MonoRail comes to mind).

A while ago I was involved in an effort to migrate an ancient system to SQL Server 2005. The system comprised over 100,000 lines of code, spread over some thousands of files scattered randomly in a case-sensitive file system (you can guess why this is significant). The code base was about 85% SQL and 15% bash shell scripts. The database in question was a core system and contained slightly over 4,000 tables. One of the core tables was called tmp1_PlcyDma and was used to do business-critical processing. That code base took data-driven code generation to a level I have never seen before. I gave up trying to track down 7(!) levels of code->generating code->executing code->generating code->rinse->repeat.

To say that the code base was bad is quite an understatement. To mention that the only place where I could run the code was a telnet console into a test environment that was not identical to production is only the start. I could mention no debugging, a runtime of ~5 hours, a test time of ~3 hours, etc. The code grew organically over a ten-year period, and you could track the developer's progress from merely annoying to criminally insane (he invented his own group by construct, using triple-nested cursors and syntax so obscure that even the DBA who had worked with the system for the last 5 years had no idea what was going on).

Perhaps the thing that I remember most from this project is that we had a bug that kept two people hunting for three weeks. The issue was a missing ';'. Oh, and the criterion for success in this project was a successful migration, with bug-for-bug compatibility, and no one really knew what the system did, including the authors(!).

But, you know what, after a month or so of looking at the code, it got to the point where I could look at something like pc_cl_mn.sql and know that it would contain the monthly policy calculation, that this piece of code was doing joins manually via cursors again, that plcy_tr_tmp.sql was the "indexes priming" script, etc., etc.

The code was still a horror, but once you understood that the authors of this code had a... "special" way of looking at databases, you got to the point where you could get the point of the code in an hour instead of a day, and then move that to a saner approach.

So, what does this horrifying story have to do with Frans' point above?

The premise that you can read and understand code immediately is highly dependent on what you are familiar with. I know of no one who can just sit in front of an unfamiliar code base and start producing value within the first ten minutes. But on a good code base, you should be able to start producing value very quickly.

People over Code

While there is value in the item on the right, I value the item on the left more.

This is in response to a comment by Jdn. I started to reply in a comment, and then I reconsidered; this is much more important. A bit of background: Karthik commented that "Unfortunately too often many software managers fall into the trap of thinking that developers are "plug and play" in a project and assume they can be added/removed as needed." and proceeded with some discussion on why this is and how it can be avoided.

I responded to that by saying that I wouldn't really wish to work with or for such a place, to be precise, here is what I said:

I would assert that any place that treats their employee in such a fashion is not a place that I would like to work for or with.
When I was in the army, the _ultimate_ place for plug & play mentality, there was a significant emphasis on making soldiers happy, and a true understanding of what it was to have a good soldier serving with you. Those are rare, and people fight over them.
To suggest that you can replace one person with another, even given they have the same training is ludicrous

From personal experience, when I was the Executive Officer of the prison, the prison Commander shamelessly stole my best man while I was away at a course, causing quite a problem for me (unfortunately not something that you can just plug & play). That hurt, and it took about six months to get someone to do the job right, and even then, the guy wasn't on the same level. (And yes, this had nothing to do with computers, programming, or the like.)

Now, to Jdn's comment:

In a perverse way, I can see, from the perspective of a business, why having good/great developers, who bring in advanced programming techniques, can be a business risk.
[...snip...] you have to view all employees as being replaceable, because the good/great ones will always have better opportunities (even if they are not actively looking), and turnover for whatever reason is the norm not the exception.
Suppose you are a business with an established software 'inventory', and suppose it isn't the greatest in the world. But it gets the job done, more or less. Suppose an Ayende-level developer comes in and wants to change things.  We already know he is a risk because he says things like:
"not a place that I would like to work for or with."

If you view me as replaceable, I will certainly have an incentive to move somewhere where I wouldn't be just another code monkey. Bad code bothers me, and I try to fix it, but that is rarely a reason to change a workplace. I like challenges. And there are few things more interesting than a colleague's face after a masterfully done shift+delete combination.
What I meant by that is that I wouldn't want to work for a place that thought of me and my co-workers as cogs in a machine, to be purchased by the dozen and treated as expendable.

You know what the most effective way to get good people is? Treating them well, appreciating their work, and making them happy. If a person likes what they are doing, and they like where they are doing it, there would need to be a serious incentive for them to move away. A good manager will ensure that they get good people, and they will ensure that they keep them. That is their job.

Mediocre code that can be maintained by a wider pool of developers is in a certain respect more valuable to a business than having great code that can only be maintained by a significantly smaller subset of developers.

At a greater cost over the lifetime of the project. If you want to speak in numbers the MBAs will understand: you are going to have a far higher TCO because you refuse to make the initial investment.

To quote Mark Miller, you can get more done, faster, if you have a good initial architecture and an overall better approach to software.

Jdn concludes with a good approach:

I'm offering services for clients.  I can't disrupt their business because I don't think their code is pretty enough.
What I can do better, going forward, is learn to make the incremental changes that gets them on their way to prettier code.  My attitude is *not* "well, I can't do anything so I won't even try."
But at the end of the day, I have to do what is best for the *client*.  If that means typed datasets (picking on them, but include anything you personally cringe over), then I can partial class and override to make them better, but typed datasets it will be.

I would probably be more radical about the way that I would go about it, but the general approach is very similar, especially when you have an existing code base or architecture in place.

Working Effectively with Legacy Code

Time and time again, Working Effectively with Legacy Code comes up in conversations that I have with like-minded fellows. It is a very good guide to working with code, not necessarily legacy code. I read it a few years ago and was vastly impressed; to quote myself:

Working Effectively with Legacy Code is a book that should be a mandatory reading for anyone who is interested in coding for a living.

I consider this book the #1 reason for the existence of Rhino Mocks, and I can't really recommend it heartily enough.

If you haven't read it yet, go and get it.

That and Evans' DDD are on my list of books to re-read, but I am saving that for when I need a serious productivity boost. That is one hell of a book for setting me off writing good code.


Maintainable, but for whom?

Jdn is making an excellent point in this post:

Okay, so, TDD-like design, ORM solution, using MVP.  Oh, and talk to the users, preferably before you being coding.

One problem (well, it's really more than one).  I know for a fact that I am going to be handing this application off to other people.  I will not be maintaining it.  I know the people who I will be handing it off to, so I know their skill sets, I know generally how they like to code.

None of them have ever used ORM.

None of them do unit testing.  One knows what they are and for whatever reason hates them.  The others just don't know.

None of them have ever used MVP/MVC, and I doubt any but one has even heard of it.

All of them are intelligent, so could grasp all the concepts readily, and become proficient with them over time.  If they are given time by their bosses, or do the work overtime, or whatever.

There is a 'standard' architecture in place that they have worked with for quite some time.  I personally think it blows, and frankly, so do most of them, but it is familiar, and applications can be passed between developers as they use a common style.

There are several things going on in this situation. The two most important ones are that the currently used practice of bad code is (luckily) widely recognized as such, and that the people who work there are open-minded and intelligent.

Before I get to the main point, I want to relate something about my current project. If you wish to maintain it, you need to have a good understanding of OR/M, IoC and MVC. Without those, you can't really do much with the application. That said, good use of IoC means that it is mostly transparent, abusing the language gives you natural syntax like FindAll(Where.User.Name == "Ayende") for the (simple) OR/M, and MVC isn't hard to learn.
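For the curious, that natural query syntax is not magic. Here is a minimal sketch of the trick (hypothetical types, not the actual NHibernate Query Generator output): the static property object overloads the == operator so the comparison produces an NHibernate criterion rather than a bool.

using NHibernate.Expression;

public class PropertyQueryBuilder
{
	private readonly string propertyName;

	public PropertyQueryBuilder(string propertyName)
	{
		this.propertyName = propertyName;
	}

	// comparing against a value yields a criterion, so FindAll(Where.User.Name == "Ayende")
	// simply passes Expression.Eq("Name", "Ayende") to the repository
	// (Equals/GetHashCode overrides omitted for brevity)
	public static AbstractCriterion operator ==(PropertyQueryBuilder property, object value)
	{
		return Expression.Eq(property.propertyName, value);
	}

	public static AbstractCriterion operator !=(PropertyQueryBuilder property, object value)
	{
		return Expression.Not(Expression.Eq(property.propertyName, value));
	}
}

public static class Where
{
	public static class User
	{
		public static readonly PropertyQueryBuilder Name = new PropertyQueryBuilder("Name");
	}
}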

Back to Jdn's post, let us consider his point for a moment. Building the application using TDD, IoC, OR/M, etc. would create a maintainable application, but it wouldn't be maintainable by someone who doesn't know all that. Building an application using proven bad practices will ensure that anyone can hack at it, but also that it has a much higher cost to maintain and extend.

I am okay with that, because my view is that having the developers learn a better way to build software is much less costly than continuing to produce software that is hard to maintain. In simple terms, if you need to invest a week in your developers, you will get your investment back several times over when they produce better code that is easier to maintain and extend and has fewer bugs.

Doing it the old way seems defeatist to me (although, in Jdn's case, he seems to be leaving his current employer, which is something that I am ignoring in this analysis). It is the old "we have always done it this way" approach. Sure, you can use a mule to plow a field; it works. But a tractor would do a much better job, even though it requires knowing how to drive first.

Redefining reality

image The "Tools For Mort" post from Nick Malik had me check outside to verify that the skies are still blue.

Nick seems to define a Mort as:

Mort works in a small to medium sized company, as a guy who uses the tools at hand to solve problems.  If the business needs some data managed, he whips up an Access database with a few reports and hands it to the three users who need the data.  He can write Excel macros and he's probably found in the late afternoons on Friday updating the company's one web site using Frontpage.

Mort is a practical guy.  He doesn't believe in really heavy processes.  He gets a request, and he does the work. 

So far, he is following the same well known path of describing Mort. The problem is that he then seems to decide that Mort is a super agile guy. Take a look at Sam Gentile's comment:

MSFT is making tools for Morts (the priority) at the expense of every other user (especially Enterprise Developers and Architects). They have nothing for TDD. And I would further contend that making these tools "dumbed down" has significantly contributed to why Morts are Morts in the first place and why they are kept there.

And Nick's response:

Wow, Sam.  I didn't know you had so much animosity for the Agile community!  Are you sure that's what you intended to say? 

Do you really mean that Microsoft should make a priority of serving top-down project managers who believe in BDUF by including big modeling tools in Visual Studio, because the MDD people are more uber-geeky than most of us will ever be?  I hate to point this out, Sam, but Alpha Geeks are not the ones using TDD.  It's the Morts of the programming world.  Alpha geeks are using Domain Specific Languages.

I really have no idea how to respond to such a claim. It certainly doesn't match my experience.

The right UI metaphor

After some discussion about whether a tree is the correct UI to show the user for my permissions issue, I decided to see if there is another way to handle it.

This is not the way the UI looks, obviously, but it should give you an indication of how it works. A gray check mark means that something below this level is checked; a check mark means that there is a permission on this node. Note that I can have a permission on a node and a permission on sub-nodes, and they have different meanings. If I have a permission on a node, I have cascading permission to all its children, but a child may be associated with multiple parents (and not always at the same level of the tree, sigh).

In this case, we have Baron, who has permission to schedule work for the Tel Aviv help desk staff. He also has permission to schedule work for John & Barbara, no matter in what capacity.

In other words, even though Baron can assign work to Jet, he can only do so when Jet works for the Tel Aviv Help Desk; he cannot assign Jet to work in the Pesky Programmers role. He can do that to John & Barbara (assuming he has the rights to assign work in the Pesky Programmers department, of course, which is another tree).

The idea is that you can assign detailed permissions to any part of the tree that you are interested in. There is another screen that allows you to find the hierarchy of objects if you are really interested (not shown here).

Naturally, permissions are many-to-many, the tree is many-to-many, and I have a headache just trying to figure it out. Just to point out, this is done in a web application, and the complexity is that the real tree has about two thousand entries at the lowest level (and ~7 at the topmost level), so you need to get data lazily from the server, but you also need to display the grayed check box so the user will know that a child node is marked; that was the main difficulty, actually.
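Stripped of the lazy-loading plumbing, the rendering rule for each node is roughly the following sketch (hypothetical types; the hard part is getting the "any descendant has permission" flag from the server without loading the children):

public enum CheckState
{
    Unchecked,
    Checked,
    GrayChecked
}

public static class PermissionTreeView
{
    // a node is fully checked when it has a permission of its own (which cascades to
    // its children), and gray-checked when only something beneath it has a permission
    public static CheckState GetCheckState(bool hasPermissionOnNode, bool anyDescendantHasPermission)
    {
        if (hasPermissionOnNode)
            return CheckState.Checked;
        if (anyDescendantHasPermission)
            return CheckState.GrayChecked;
        return CheckState.Unchecked;
    }
}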

So, I am open to ideas about how to design this better.

Using partials in Web Forms

Partial is a MonoRail term for a piece of UI that you extract out, so you can call it again, often with Ajax. This is something that is harder to do in WebForms. Yesterday I found an elegant solution to the problem.

ASPX code:

<div id="UserDetailsDiv">
   <ayende:UserDetails runat="server" ID="TheUserDetails"/>
</div>

User Control:

<div>
Name: <asp:label ID="Name" runat="server"/> <br/>
Email: <asp:label ID="Email" runat="server"/> 
</div>

Client Side Code:

function changeUser(newUserId, div)
{
	var srv = new MyApp.Services.UserDetails();
	srv.GetUserDetailsView(newUserId, onSuccessGetUserDetailsView, null, div);
}

function onSuccessGetUserDetailsView(response, userContext)
{
	var div = $(userContext);
	div.innerHTML = response;
	new Effect.Highlight(div);
}

Web Service Code:

[WebMethod(EnableSession = true)]
public string GetUserDetailsView(int userId)
{
	User user = Controller.GetUser(userId);
	//there may be a better way to do this, I haven't bothered looking
	UserDetails userDetails = (UserDetails)new Page().LoadControl("~/Users/UserControls/UserDetails.ascx");
	userDetails.User = user;
	userDetails.DataBind();
	using(StringWriter sw = new StringWriter())
	using(HtmlTextWriter ht = new HtmlTextWriter(sw))
	{
		userDetails.RenderControl(ht);
		return sw.GetStringBuilder().ToString();
	}
}

The myth of the all inclusive meta-entity

Jeff Brown has a good post about information in software:

There is no technical reason preventing software applications from adopting common standards in the representation of their information.

[...snip...]

Would software interoperability improve if we could just agree on common meta-classes for data structures?

[...snip...]

In any case, it bothers me profoundly that software is so vertical. There is too little common ground. Each application contains a wealth of information but remains steadfastly inaccessible.

It is worth pointing out that most organizations can't even agree on what something as fundamental as a Customer is within the organization. This is because different parts of the organization are responsible for different aspects of the customer, and they have radically different needs.

As Jeff points out, software that is open & extensible usually carries a six-figure price tag as well as a hefty customization fee. That is just the nature of the beast: being a generalist costs, and the business doesn't care if you can handle fifty different ideas of a customer; they want you to fit their idea of a customer, do it well, and fit with the different views of a customer within the organization. That doesn't come easily.