Why I hate implementing Linq

Linq has two sides. The pretty side is the one that most users see, where they can just write queries in C# and a magic fairy comes and makes it work on everything.

The other side is the one shown only to the few brave souls who dare contemplate the task of actually writing a Linq provider. The real problem is that the data structure a Linq query generates has very little to do with the code that was actually written. That means there are multiple steps that need to be taken before a real-world Linq provider can do anything useful.
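
To make that gap concrete, here is a small sketch that inspects the tree the compiler builds for a simple query. It uses plain LINQ to Objects, not NHibernate, and `Order` and `QueryShape` are hypothetical names introduced only for this illustration. Two things stand out. The `select order` clause vanishes from the tree entirely. The captured `currentUser` variable becomes a field read on a compiler-generated closure object:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity, used only for this illustration.
public class Order
{
    public string User { get; set; } = "";
}

public static class QueryShape
{
    // The same shape as the query used later in the post, over an in-memory source.
    public static IQueryable<Order> BuildQuery(string currentUser)
    {
        IQueryable<Order> orders = new List<Order>().AsQueryable();
        return from order in orders
               where order.User == currentUser
               select order;
    }

    // Lists the query operator calls from the outermost one inward.
    public static List<string> MethodChain(IQueryable query)
    {
        var names = new List<string>();
        Expression current = query.Expression;
        while (current is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            current = call.Arguments[0]; // argument 0 is the source sequence
        }
        return names;
    }

    // Digs into the Where predicate to see what "currentUser" turned into.
    public static (ExpressionType Comparison, ExpressionType? CapturedFrom) DescribeWhere(IQueryable query)
    {
        var call = (MethodCallExpression)query.Expression;
        var quote = (UnaryExpression)call.Arguments[1]; // the lambda arrives wrapped in a Quote node
        var lambda = (LambdaExpression)quote.Operand;
        var body = (BinaryExpression)lambda.Body;       // order.User == currentUser
        var captured = (MemberExpression)body.Right;    // a field read on a closure object
        return (body.NodeType, captured.Expression?.NodeType);
    }
}
```

For this query, `MethodChain` yields a single `Where` call. `DescribeWhere` reports an `Equal` node whose right side reads a field off a `Constant`, which is the closure object. A provider has to recognize and evaluate that closure access before it can emit anything resembling SQL.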

As a case in point, let us take NHibernate’s Linq implementation. NHibernate gains a lot by being able to use the re-linq project, which takes care of many of the details of Linq parsing. But even so, it is still too damn complex. Let us take a simple example: how the Cacheable operation is implemented.

(
  from order in session.Query<Order>()
  where order.User == currentUser
  select order
).Cacheable()

Cacheable tells NHibernate that it should use the 2nd level cache for this query. Implementing it involves:

  1. Defining an extension method called Cacheable:

       public static IQueryable<T> Cacheable<T>(this IQueryable<T> query)

  2. Registering a node to be inserted into the parsed query in place of that specific method:

       MethodCallRegistry.Register(new[] { typeof(LinqExtensionMethods).GetMethod("Cacheable"), }, typeof(CacheableExpressionNode));

  3. Implementing the CacheableExpressionNode, which is what goes into the parse tree instead of the Cacheable call:

       public class CacheableExpressionNode : ResultOperatorExpressionNodeBase

  4. Actually, that last step was a lie, because the real action happens in the CacheableResultOperator, which the node generates:

       protected override ResultOperatorBase CreateResultOperator(ClauseGenerationContext clauseGenerationContext)
       {
           return new CacheableResultOperator(_parseInfo, _data);
       }

  5. Now we have to process that operator, which we do by registering an operator processor:

       ResultOperatorMap.Add<CacheableResultOperator, ProcessCacheable>();

  6. That processor is yet another class, in which we finally get to the actual work we wanted to do:

       public void Process(CacheableResultOperator resultOperator, QueryModelVisitor queryModelVisitor, IntermediateHqlTree tree)
       {
           NamedParameter parameterName;

           switch (resultOperator.ParseInfo.ParsedExpression.Method.Name)
           {
               case "Cacheable":
                   tree.AddAdditionalCriteria((q, p) => q.SetCacheable(true));
                   break;

               // ... remaining cases omitted
           }
       }
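
The steps above are, in essence, one idea spread across several types: recognize a marker method in the tree, take it out, and record what it asked for. Below is a heavily simplified sketch of that idea, using only System.Linq.Expressions. It uses neither re-linq nor NHibernate. `QueryOptions`, `Markers` and `MarkerStrippingVisitor` are hypothetical names, and none of this is NHibernate's actual source:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical options bag that marker handlers write their intent into.
public class QueryOptions
{
    public bool Cacheable { get; set; }
}

// Hypothetical marker method: it does no caching itself, it only appends a call
// to itself onto the expression tree so the provider can find it later.
public static class Markers
{
    public static IQueryable<T> Cacheable<T>(this IQueryable<T> query) =>
        query.Provider.CreateQuery<T>(Expression.Call(
            typeof(Markers).GetMethod(nameof(Cacheable))!.MakeGenericMethod(typeof(T)),
            query.Expression));
}

// Collapses the node / result operator / processor chain into a single lookup:
// (declaring type, method name) -> handler. Matched calls are removed from the tree.
public class MarkerStrippingVisitor : ExpressionVisitor
{
    private readonly Dictionary<(Type, string), Action<QueryOptions>> _handlers =
        new Dictionary<(Type, string), Action<QueryOptions>>();

    public QueryOptions Options { get; } = new QueryOptions();

    public void Register(Type declaringType, string methodName, Action<QueryOptions> handler) =>
        _handlers[(declaringType, methodName)] = handler;

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType is Type owner
            && _handlers.TryGetValue((owner, node.Method.Name), out var handler))
        {
            handler(Options);                // record the intent...
            return Visit(node.Arguments[0]); // ...and keep only the source it was applied to
        }
        return base.VisitMethodCall(node);
    }
}
```

This sketch deliberately ignores everything that makes the real version complicated. It only handles markers sitting directly on a query. It performs none of the normalization that re-linq does for joins, multiple from clauses or subqueries.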

 

Actually, I lied. This is just the point where we register our intent; the code that acts on it will be executed at a much later point in time.
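
That deferral can be sketched as follows. `IntermediateQuery`, `ExecutableQuery` and their members are hypothetical. They are loosely modeled on the `tree.AddAdditionalCriteria((q, p) => ...)` call in the processor, with the parameters argument dropped. The point is only that translation stores a callback and execution runs it:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the query object that is eventually executed.
public class ExecutableQuery
{
    public bool Cacheable { get; private set; }

    public ExecutableQuery SetCacheable(bool cacheable)
    {
        Cacheable = cacheable;
        return this;
    }
}

// Hypothetical translation result: the query text plus deferred callbacks.
public class IntermediateQuery
{
    private readonly List<Action<ExecutableQuery>> _additionalCriteria = new List<Action<ExecutableQuery>>();

    public IntermediateQuery(string text) => Text = text;

    public string Text { get; }

    // Translation time: nothing is applied yet, the intent is only stored.
    public void AddAdditionalCriteria(Action<ExecutableQuery> criteria) =>
        _additionalCriteria.Add(criteria);

    // Execution time: the stored callbacks finally run against the real query object.
    public ExecutableQuery CreateExecutable()
    {
        var query = new ExecutableQuery();
        foreach (var criteria in _additionalCriteria)
            criteria(query);
        return query;
    }
}
```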

To forestall the people who know that this is an overly complicated mess that could be written in a much simpler fashion…

No, it can’t be.

Sure, it is very easy to write a trivial Linq provider, assuming that all you do is from / where / select and nothing else. But drop into the mix multiple from clauses, group bys, joins, into clauses, lets and… well, I could probably go on for a while there. The point is that industrial-strength Linq providers (i.e. non-toy ones) are incredibly complicated to write. And that is a pity, because it shouldn’t be that hard!
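
For contrast, here is roughly what the trivial end looks like. This is a hypothetical translator in which `ToyTranslator`, `Customer` and the SQL-like output are all made up for illustration. It handles a bare source and a single where clause comparing a property to a value, and rejects everything else:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity for the example.
public class Customer
{
    public string City { get; set; } = "";
}

// A deliberately toy translator. It understands exactly two shapes:
//   source                                  -> SELECT *
//   source.Where(x => x.Member == <value>)  -> SELECT * WHERE Member = '<value>'
// Values are inlined, not parameterized: an illustration, not SQL you should run.
public static class ToyTranslator
{
    public static string Translate(Expression expression)
    {
        if (expression is ConstantExpression)
            return "SELECT *";

        if (expression is MethodCallExpression { Method: { Name: "Where" } } call
            && call.Arguments[0] is ConstantExpression
            && call.Arguments[1] is UnaryExpression { Operand: LambdaExpression lambda }
            && lambda.Body is BinaryExpression { NodeType: ExpressionType.Equal } equal
            && equal.Left is MemberExpression column)
        {
            // Evaluate the right-hand side locally (it may be a captured variable).
            var value = Expression.Lambda(equal.Right).Compile().DynamicInvoke();
            return $"SELECT * WHERE {column.Member.Name} = '{value}'";
        }

        // Select projections, GroupBy, Join, SelectMany (multiple from clauses), let, into...
        throw new NotSupportedException($"Cannot translate {expression}");
    }
}
```

Every operator listed in that final comment needs its own handling, and those operators also combine with one another. That is where the real work of an industrial-strength provider starts.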