Why I hate implementing Linq
Linq has two sides. The pretty side is the one that most users see, where they can just write queries in C# and a magic fairy comes and makes it work on everything.
The other side is the one that is shown only to the few brave souls who dare contemplate the task of actually writing a Linq provider. The real problem is that the data structure a Linq query generates has very little to do with the actual code that was written. That means that multiple steps need to be taken in order to do anything useful in a real world Linq provider.
As a case in point, let us take NHibernate’s Linq implementation. NHibernate gains a lot by being able to use the re-linq project, which takes care of a lot of the details of Linq parsing. But even so, it is still too damn complex. Let us take a simple case as an example: how the Cacheable operation is implemented.
( from order in session.Query<Order>() where order.User == currentUser select order ).Cacheable()
Cacheable is used to tell NHibernate that it should use the 2nd level cache for this query. Implementing this is done by:
- Defining an extension method called Cacheable:
    public static IQueryable<T> Cacheable<T>(this IQueryable<T> query)
- Registering a node to be inserted into the parsed query in place of that specific method:
    MethodCallRegistry.Register(new[] { typeof(LinqExtensionMethods).GetMethod("Cacheable") }, typeof(CacheableExpressionNode));
- Implementing the CacheableExpressionNode, which is what goes into the parse tree instead of the Cacheable call:
    public class CacheableExpressionNode : ResultOperatorExpressionNodeBase
- Actually, the last thing was a lie, because the action really happens in the CacheableResultOperator, which is generated by the node:
    protected override ResultOperatorBase CreateResultOperator(ClauseGenerationContext clauseGenerationContext)
    {
        return new CacheableResultOperator(_parseInfo, _data);
    }
- Now we have to process that, which we do by registering an operator processor:
    ResultOperatorMap.Add<CacheableResultOperator, ProcessCacheable>();
- That processor is another class, in which we finally get to the actual work that we wanted to do:
    public void Process(CacheableResultOperator resultOperator, QueryModelVisitor queryModelVisitor, IntermediateHqlTree tree)
    {
        NamedParameter parameterName;
        switch (resultOperator.ParseInfo.ParsedExpression.Method.Name)
        {
            case "Cacheable":
                tree.AddAdditionalCriteria((q, p) => q.SetCacheable(true));
                break;
            // remaining cases not shown in this excerpt
        }
    }
Actually, I lied. This is just the point where we register our intent; the actual code will be executed at a much later point in time.
To forestall the people who know that this is an overly complicated mess that could be written in a much simpler fashion…
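Stripped of NHibernate and re-linq, the whole dance boils down to three moves, which can be sketched with nothing but System.Linq.Expressions: the extension method only records a call to itself in the expression tree, a provider-side pass later finds that marker, removes it and queues the intent as a delegate, and the delegate is applied only once the real query object exists. This is a minimal sketch under those assumptions, not NHibernate's implementation: `QueryHints`, `HintCollector`, `PendingQuery`, `Order` and `MarkerDemo` are hypothetical names, and the in-memory `AsQueryable()` provider stands in for a real one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used only for this sketch.
public class Order
{
    public string User { get; set; } = "";
}

// Hypothetical stand-in for NHibernate's LinqExtensionMethods.
public static class QueryHints
{
    // No caching happens here: the method only appends a call to itself
    // to the expression tree, so the provider can discover it later.
    public static IQueryable<T> Cacheable<T>(this IQueryable<T> query)
    {
        var method = typeof(QueryHints).GetMethod(nameof(Cacheable))!.MakeGenericMethod(typeof(T));
        return query.Provider.CreateQuery<T>(Expression.Call(method, query.Expression));
    }
}

// Hypothetical query object that only exists late in the pipeline.
public class PendingQuery
{
    public bool Cacheable { get; private set; }

    public PendingQuery SetCacheable(bool value)
    {
        Cacheable = value;
        return this;
    }
}

// Hypothetical provider-side pass: strips the marker and records the intent
// as a delegate instead of acting on it immediately.
public class HintCollector : ExpressionVisitor
{
    public List<Action<PendingQuery>> Deferred { get; } = new();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType == typeof(QueryHints) &&
            node.Method.Name == nameof(QueryHints.Cacheable))
        {
            Deferred.Add(q => q.SetCacheable(true));
            return Visit(node.Arguments[0]); // keep the source, drop the marker
        }
        return base.VisitMethodCall(node);
    }
}

public static class MarkerDemo
{
    public static (string before, string after, int queued, bool cacheable) Run()
    {
        IQueryable<Order> orders = new[] { new Order { User = "alice" } }.AsQueryable();
        string currentUser = "alice";

        IQueryable<Order> query =
            (from order in orders where order.User == currentUser select order).Cacheable();

        // "Translation" time: the marker is removed and the intent is queued.
        var collector = new HintCollector();
        Expression stripped = collector.Visit(query.Expression);
        int queued = collector.Deferred.Count;

        // "Execution" time: only now is the query object created and the intent applied.
        var pending = new PendingQuery();
        foreach (Action<PendingQuery> apply in collector.Deferred)
            apply(pending);

        return (query.Expression.ToString(), stripped.ToString(), queued, pending.Cacheable);
    }
}
```

`MarkerDemo.Run()` returns the tree before and after the pass (only the first one mentions Cacheable), the number of queued actions (one) and whether the query object ended up cacheable once they ran.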
No, it can’t be.
Sure, it is very easy to write a trivial Linq provider, assuming that all you do is a from / where / select and nothing else. But drop into the mix multiple from clauses, group bys, joins, into clauses, lets and… well, I could probably go on for a while there. The point is that industrial-strength Linq providers (i.e. non-toy ones) are incredibly complicated to write. And that is a pity, because it shouldn’t be that hard!
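To see why multiple from clauses and lets hurt, it helps to look at what the compiler actually hands the provider. The sketch below builds a query with two from clauses, a let and a where against the in-memory `AsQueryable()` provider (so it runs without a database), then walks the expression tree and lists the Queryable operators it finds. `OperatorCollector` and `DesugarDemo` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Records every Queryable operator in the tree, outermost call first.
public class OperatorCollector : ExpressionVisitor
{
    public List<string> Operators { get; } = new();

    protected override Expression VisitMethodCall(MethodCallExpression node)
    {
        if (node.Method.DeclaringType == typeof(Queryable))
            Operators.Add(node.Method.Name);
        return base.VisitMethodCall(node);
    }
}

public static class DesugarDemo
{
    public static List<string> Run()
    {
        IQueryable<int> left = new[] { 1, 2, 3 }.AsQueryable();
        int[] right = { 10, 20 };

        // Two from clauses, a let and a where, as the user writes them.
        IQueryable<int> query =
            from x in left
            from y in right
            let sum = x + y
            where sum > 15
            select sum;

        var collector = new OperatorCollector();
        collector.Visit(query.Expression);
        return collector.Operators;
    }
}
```

`DesugarDemo.Run()` returns Select, Where, Select, SelectMany. The second from became a SelectMany whose result selector packs x and y into a compiler-generated anonymous type (a "transparent identifier"), and the let became its own Select that wraps that anonymous type in another one. A provider has to see through those generated types to figure out that `sum` really means `x + y`.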
Comments
Glad you like re-linq!
We tried to remove every kind of complexity that is not inherent in the differences between LINQ and the target language. But there's always something more you can do, and we're open to suggestions or contributions here. Adding provider-specific hints like caching or eager fetching might be an area where we can improve the experience even more.
But at the end of the day, a fully featured transformation from LINQ to anything that is not LINQ will always be a lot of work. We're just glad we reduced it from insanely difficult to very difficult ;-)
I thought that remotion did a lot of things, but as I see it, there are still some nasty implementation details behind a LINQ provider.
@scooletz here's a list of what re-linq does: http://relinq.codeplex.com/
if you have to do all of that too, the problem goes from hard to impossible fast for anybody who operates under normal time constraints... but it's not a piece of cake with re-linq either. it's hard enough to figure out what kind of translation exactly you want, and you need to implement it too ;-)
What Ayende describes here is a minor inconvenience compared to the kinds of things you have to solve when translating from a powerful, orthogonal language like LINQ to the mess that SQL sometimes is. (I guess HQL is closer to LINQ that way, which makes it easier for NH.)
LINQ by itself doesn't magically remove the impedance mismatch, it just moves it to the infrastructure layer. and solving problems generically is always more difficult.
I've often complained about NHibernate 3.0's LINQ provider being lacking compared to LINQ 2 SQL or EF, but reading about the complexity of implementing a LINQ provider, I can see why. All I can say is keep up the good work guys.
PS: Just noticed NHibernate 3.0 Beta 1 is out, when did that happen?
@Stefan Wenig,
I didn't make myself clear: "I thought that remotion is a silver bullet" describes more precisely how I imagined remotion :) I did like the 'who operates under normal time constraints' part :P
I really like the picture you uploaded. It shows the two-faced nature of linq. Well done!
It actually is a silver bullet. you just need to adapt your requirements ;-)
re-linq (part of the re-motion project, but stripped of any dependencies by now) solves the problems that the inner workings of LINQ (namely, the transformed IQueryable expression tree) introduce. It cannot solve the impedance mismatch between the LINQ language and the target language, which leaves a lot to be done. (Unless the target language is SQL, for which we built a back-end that's going to be released soon. It's in SVN already.)
Let me put it this way: re-linq makes it as easy as it can be, as it should have been from the start, and as you should be able to expect. You get to a working solution quickly, and can then add more powerful transformations one by one. But some things just don't translate easily, such as group-by in the case of SQL.
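A small in-memory illustration of why group-by is awkward for SQL: LINQ's GroupBy yields a sequence of groups, each of which still holds its member elements, while SQL's GROUP BY returns one flat row per group made of grouping keys and aggregates only. A SQL-targeting provider therefore has to recognize the aggregate-only shape, or do something more elaborate (for example, fetching the member rows as well) when the groups' contents are actually used. `GroupByShapes` and its sample data are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByShapes
{
    // Hypothetical sample data: (customer, amount) pairs.
    private static readonly (string Customer, decimal Amount)[] Orders =
    {
        ("alice", 10m), ("bob", 5m), ("alice", 7m),
    };

    // LINQ shape: one group per key, each group still holding its orders.
    public static List<IGrouping<string, (string Customer, decimal Amount)>> Nested() =>
        Orders.GroupBy(o => o.Customer).ToList();

    // SQL shape: one flat row per key, keys and aggregates only, roughly
    // SELECT Customer, SUM(Amount) FROM Orders GROUP BY Customer
    public static List<(string Customer, decimal Total)> Flat() =>
        Orders.GroupBy(o => o.Customer)
              .Select(g => (Customer: g.Key, Total: g.Sum(o => o.Amount)))
              .ToList();
}
```

`Nested()` gives two groups, with alice's group still containing both of her orders; `Flat()` gives ("alice", 17) and ("bob", 5), which is the only shape a plain GROUP BY can hand back.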
I guess you really have to dive deep into the atrocities that IQueryable offers to really appreciate what re-linq does for you ;-)
@scooletz
As someone who tried to do a linq provider without re-linq, I can tell you that the difference is between giving up entirely and being able to provide a production ready, fully functional linq provider.
Linq is simply complex, incredibly complex.
Writing a linq provider is what Kathleen Dollard (I think it was her) describes as "Survival Programming", and I really like that term: you constantly struggle to stay alive and keep things working, but there's always a query just around the corner which will prove your work and time weren't enough, and it all falls apart. Even when you have that knowledge.
I have no illusions that MS will ever release a layer that makes things easier for linq provider writers, not when I look at the command tree stuff they pass to an EF provider: it's the same hell, and every provider writer has to redo it all over again instead of filling in blanks (as SQL is, you know, actually pretty similar-looking across all databases).
re-linq helps, but it's far from a turn-key framework where you just fill in the blanks and you're on your way: the hard stuff is still on your plate. However, things like keeping track of which sources were referenced are taken care of (as well as other preprocessing stuff), which is extremely handy and a big time saver.
@Ayende: can I use this as a testimonial? ;-)
Also, I might decide to steal the two-face metaphor on occasion if you don't object!
Typo above: "[not LINQ, but] re-linq by itself doesn't magically remove the impedance mismatch, it just moves it to the infrastructure layer."
@Frans: it depends. The re-linq front-end is target-language agnostic. The remaining problem is the huge difference between LINQ and SQL, and re-linq's front-end can't possibly take this into account.
OTOH, the new back-end specifically transforms to SQL, and solves exactly these problems. We're always considering moving stuff from back- to front-end, so that non-SQL providers would be able to use it too. But most of the time, it's really just SQL-specific.
A provider that translates to XQuery would have to solve very different problems. Even a SQL-inspired language like HQL is too different to actually share a significant amount of code between SQL and HQL providers. (If someone does have good ideas: always welcome at http://groups.google.com/group/re-motion-users!)
You say that MS probably never will release a smooth layer for provider makers. What would you expect from such a layer that re-linq is missing?
Check out "LINQ, Take Two – Realizing the LINQ to Everything Dream", from 30 mins in:
player.microsoftpdc.com/.../bfa72307-6534-41ad-...
Doesn't implementing a LINQ provider essentially amount to writing a compiler of sorts? So no wonder it's difficult.
@contextfree not quite. in the case of SQL and similar target systems, it needs to translate from one high-level declarative language to another. so whenever things are different you have to step back and ask yourself, how would I tell the target system to solve this problem in its own language? (a compiler would rather have to emit the code that actually solves the problem)
All I can say: keep up the good work.
lazy bastards
just use sql
linq causes more problems than it solves