The cost of inappropriate linqing

time to read 1 min | 100 words

I read this post with interest, apparently Linq for SQL is doing something odd, because I can't quite believe the results that this guy is getting. From the post:

So I dug into the call graph a bit and found out the code causing by far the most damage was the creation of the LINQ query object for every call! The actual round trip to the database paled in comparison

I can't imagine what they are doing there to cause this performance characteristics.

From my own experiments with Linq, it should definitely not produce this amount of efforts.

Tweet Share Share 8 comments

Tags:

Comments

04 Dec 2007
20:09 PM

Alex Henderson

I've just experienced this myself with a client while developing a solution - not using Linq to SQL - but using something home-brewed to work with their in-house ORM... caused by a naive implementation of the Linq expression parser, and a costly metadata-querying layer we ended up with very simple queries having the same characteristic - the Linq cost was dwarfing the sql execution cost. for one-off compilation.. Easily fixed with the aid of a profiler, but it took me by surprise - I'm so used to sql being the bottle neck.

05 Dec 2007
09:08 AM

Frans Bouma

The main problem is the expression tree. As it can have any shape or form, you end up with 'visitors' (not really visitors, the visitor pattern isn't the same) which traverse the tree, transform a couple of nodes and move to the next phase.

The downside of this is that a node can be visited a lot of times. In simple Linq providers, which only do where and select for example, it's not that much of a problem. The things start getting serious when you want to handle group joins, nested aggregates, selects on one or both sides of a join etc. You then have to do some serious expression tree transformations which aren't doable in 1 pass: you don't know what's up ahead at a given location in the code, as you for example just see a single node, you don't have a lookahead.

So if you need info about the tree up-ahead, you use a visitor to check what's up ahead. For example if you need an object from the subtree you can use a visitor to get that for you.

I'm now at 70% or so of implementing a Linq provider (groupby/orderby are still left) and I have just a fraction of the amount of visitors MS has in their code (according to reflector). I think they went a little overboard in the visitor stuff. For example: if you pass a tree once in full, you can collect a lot of data from it, e.g. assign aliases, see scopes etc. You can store all that data in easy access dictionaries and other datastructures. This frees you from traversing the tree again and again just to get some element out of it.

They do have a problem others like you and I don't have: they have to transform the expression tree COMPLETELY to sql. We don't have to do that: we just transform it to our own query api and from there it's no longer transformed by visitors. This is one area where they lose a lot of time I think, because they have to transform the tree a lot of times. (I have currently 5 passes, they a truckload more).

05 Dec 2007
09:52 AM

Ayende Rahien

Frans,

But the cost of the visitors is offset by query API cost of generating itself.

Frankly, they should do their caching for this.

05 Dec 2007
16:30 PM

Stefan Wenig

I've just posted a suggestion on connect. I was thinking about doing this for quite some time, but I never expected this to be too relevant for SQL queries, merely for stuff like index4objects.

here's the link: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=315491

This would allow caching of transformation results, so the LINQ provider only has to do it once. In fact, if you take this out of context, it is absolutely ridiculous that such a massive computation should be done on every call, just because some argument values change.

Vote for it!

Stefan

05 Dec 2007
17:50 PM

Stefan,

I agree. I don't see any reason why there shouldn't be some sort of a simple switch on the IQueryable or Expression to enable caching of all but the parameters. I would prefer, opt-out as well. The memory overhead would be, I think (not that I've done any memory profiling on the expressions), pretty minimal for most apps.

-JD

05 Dec 2007
19:36 PM

Stefan Wenig

JD, if you think of all the metadata overhead every single projection (anonymous type) or closure creates, I don't think that optimizing the compiler output for memory usage is very high on the C# team's priority list anyway.

05 Dec 2007
20:58 PM

Alex Henderson

The almost recursive visitation of some nodes is what was costing me, in my case I've implemented a blackboard for collecting all the information about the expression as it's visited by different visitors (well agents I guess they should be called), and function-level memoization to avoid repeated visits and metadata lookup, which definitely made it a lot faster.

For many ORM's I think you could probably write a single parser, that produced a much easier to parse query AST, which could then be easily visited to map it into the underlying ORM's query API - I definitely get the feeling that a large amount of developers are all working hard solving the same problems in different ways, with varying levels of success when they could be adding new features to their products and leveraging some common implementation provided by a 3rd party to do the heavy lifting.

06 Dec 2007
11:39 AM

Stefan Wenig

Alex, that's exactly what we're doing here right now. We've just started to develop it, and it's not a full-time task right now, so it's probably gonna take us a few weeks before we're ready for the first release. We're planning to release it under the LGPL.

Anybody who's interested can email me at stefan dot wenig at rubicon-it dot com, so I can notify you as soon as anything useful is online. (Ayende, sorry that I'm using your blog for announcements, but mine's not set up yet.)

Comment preview

Comments have been closed on this topic.

Markdown turns plain text formatting into fancy HTML formatting.

Phrase Emphasis

*italic*   **bold**
_italic_   __bold__

Links

Inline:

An [example](http://url.com/ "Title")

Reference-style labels (titles are optional):

An [example][id]. Then, anywhere
else in the doc, define the link:
  [id]: http://example.com/  "Title"

Images

Inline (titles are optional):

![alt text](/path/img.jpg "Title")

Reference-style:

![alt text][id]
[id]: /url/to/img.jpg "Title"

Headers

Setext-style:

Header 1
========
Header 2
--------

atx-style (closing #'s are optional):

# Header 1 #
## Header 2 ##
###### Header 6

Lists

Ordered, without paragraphs:

1.  Foo
2.  Bar

Unordered, with paragraphs:

*   A list item.
    With multiple paragraphs.
*   Bar

You can nest them:

*   Abacus
    * answer
*   Bubbles
    1.  bunk
    2.  bupkis
        * BELITTLER
    3. burper
*   Cunning

Blockquotes

> Email-style angle brackets
> are used for blockquotes.
> > And, they can be nested.
> #### Headers in blockquotes
> 
> * You can quote a list.
> * Etc.

Horizontal Rules

Three or more dashes or asterisks:

---
* * *
- - - -

Manual Line Breaks

End a line with two or more spaces:

Roses are red,   
Violets are blue.

Fenced Code Blocks

Code blocks delimited by 3 or more backticks or tildas:

```
This is a preformatted
code block
```

Header IDs

Set the id of headings with {#<id>} at end of heading line:

## My Heading {#myheading}

Tables

Fruit    |Color
---------|----------
Apples   |Red
Pears	 |Green
Bananas  |Yellow

Definition Lists

Term 1
: Definition 1
Term 2
: Definition 2

Footnotes

Body text with a footnote [^1]
[^1]: Footnote text here

Abbreviations

MDD <- will have title
*[MDD]: MarkdownDeep

Oren Eini

Oren Eini

CEO of RavenDB