Runtime code compilation & collectible assemblies are a no-go
The problem is quite simple: I want to be able to support certain operations in Raven. To support those operations, the user needs to be able to submit a Linq query to the server. To allow this, we need to accept a string, compile it, and run it.
So far, so simple. The problem begins when you consider that assemblies can't be unloaded. I was very hopeful when I learned about collectible assemblies in .NET 4.0, but they focus exclusively on assemblies generated via System.Reflection.Emit, while my scenario is compiling code on the fly (I invoke the C# compiler to generate an assembly, then use that).
Collectible assemblies don't help in this case. Maybe in C# 5.0 the compiler will use SRE, which would help, but I don't hold out much hope there. I also checked out the Mono.CSharp assembly, hoping that maybe it could do what I wanted, but it suffers from the same memory leak.
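To make the leak concrete, here is a minimal sketch of the compile-a-string step, assuming the CodeDOM route (class and method names here are illustrative, not Raven's actual code). Note that even with GenerateInMemory set, the resulting assembly is loaded into the current AppDomain and can never be unloaded from it:

```csharp
using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

public static class QueryCompiler
{
    // Compiles C# source submitted as a string into an in-memory assembly.
    // The assembly stays loaded until the hosting AppDomain is unloaded,
    // which is exactly the leak described above.
    public static System.Reflection.Assembly Compile(string source)
    {
        using (var provider = new CSharpCodeProvider())
        {
            var options = new CompilerParameters
            {
                GenerateInMemory = true,
                GenerateExecutable = false
            };
            options.ReferencedAssemblies.Add("System.Core.dll");

            CompilerResults results = provider.CompileAssemblyFromSource(options, source);
            if (results.Errors.HasErrors)
                throw new InvalidOperationException(results.Errors[0].ErrorText);

            return results.CompiledAssembly; // loaded into this AppDomain for good
        }
    }
}
```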
So I turned to the one solution that I knew would work: generating those assemblies in another app domain, and unloading that domain when it becomes too full. I kept thinking that I couldn't do that because of the slowdown of cross-app-domain communication, but then I realized I was violating one of the first rules of performance: you don't know until you measure it. So I set out to test it.
I am only interested in testing the speed of cross app domain communication, not anything else, so here is my test case:
public class RemoteTransformer : MarshalByRefObject
{
    private readonly Transformer transformer = new Transformer();

    public JObject Transform(JObject o)
    {
        return transformer.Transform(o);
    }
}

public class Transformer
{
    public JObject Transform(JObject o)
    {
        o["Modified"] = new JValue(true);
        return o;
    }
}
Running things in the same app domain (base line):
static void Main(string[] args)
{
    var t = new RemoteTransformer();
    var startNew = Stopwatch.StartNew();
    for (int i = 0; i < 100000; i++)
    {
        var jobj = new JObject(new JProperty("Hello", "There"));
        t.Transform(jobj);
    }
    Console.WriteLine(startNew.ElapsedMilliseconds);
}
This consistently gives results under 200 ms (185 ms, 196 ms, etc.). In other words, we are talking about over 500 operations per millisecond.
What happens when we do this over an AppDomain boundary? The first problem I ran into was that the Json objects were not serializable, but that was easy to fix. Here is the code:
static void Main(string[] args)
{
    var appDomain = AppDomain.CreateDomain("remote");
    var t = (RemoteTransformer)appDomain.CreateInstanceAndUnwrap(
        typeof(RemoteTransformer).Assembly.FullName,
        typeof(RemoteTransformer).FullName);
    var startNew = Stopwatch.StartNew();
    for (int i = 0; i < 100000; i++)
    {
        var jobj = new JObject(new JProperty("Hello", "There"));
        t.Transform(jobj);
    }
    Console.WriteLine(startNew.ElapsedMilliseconds);
}
And that runs in close to 8 seconds (7,871 ms). That is over 40 times slower, or just about 12 operations per millisecond.
To give you some indication of the timing: at that rate, an operation over 1 million documents would spend about 1.3 minutes just serializing data across app domains.
That is… long, but it might be acceptable. I need to think about this more.
Comments
In my experience, WCF using a NetNamedPipeBinding is much faster than remoting. The new default configuration feature in .NET 4.0 makes it pretty painless to use.
Can you do that with expressions?
like weblogs.asp.net/.../...-dynamic-query-library.aspx ...
Hi Ayende
Have you already considered the dynamic query sample that ships with VS2008? ( http://msdn.microsoft.com/en-us/library/bb397982(VS.90).aspx)
I had used it some time back; there was some restriction on it only supporting method calls on some basic types, but it was very easy to work around that :-)
It does compile expressions into a dynamic method which I think is ideal for the Raven scenario.
I also noticed a reference to NRefactory (LINQPad uses it, I think) somewhere. I haven't used it, but I assume you could walk its AST and transform it 1:1 into an expression tree & compile to a lambda?
Ajai
Looks like you might end up using (or likely writing, given your history ;)) a C# parser to generate an AST you can translate into an expression tree.
Alternatively, could you avoid repeated calls to the other app domain by making it get the documents to process, rather than sending each one over?
What happens if you try the same with two different assemblies for the Transformer and RemoteTransformer classes? I'm thinking there may be some smart optimization going on in the simple case since your loop is pretty simple.
Oh, scratch that. They're both on the same (remote) appdomain...
One obvious thing that doesn't work is batching the calls, i.e. passing a list of values to be transformed so that the interface is not so 'chatty' - a quick experiment showed that it only improved things by <10%
Here's what I had to do in a (not very) similar case:
I took the code that was static and would normally be communicating across the appdomain boundary and injected it into each appdomain when built. Then I had a slim appdomain manager that took requests and routed them to the appdomains to be worked on entirely there.
the downside to this is extra code in every appdomain, and writing the code to inject my base compiled code into them.
The tricky thing was just finding the right spot to put the boundary - making all of the work into a single cross-boundary call was the best-case scenario, but not always possible.
That's nontrivial as you need to implement the C# type system, overload resolution, etc. Even just the parsing part is extremely complex for C# (think of all the ambiguous syntax like "M(a <b,> d(7))" or near-ambiguous like "bool b = a is B?;" vs. "bool b = a is B ? c:d;")
But if you are interested, I wrote some hacky prototype of this a year ago, based on the SharpDevelop C# code-completion system. It's not that hard if you use the right components: SharpDevelop solves many of the nasty issues understanding C# code, and Linq.Expressions solved many of the nasty issues generating IL.
Of course there will be subtle differences to the actual C# semantics, but it might work OK for Ayende's usecase.
For our own use case, we settled on using csc.exe and living with the resulting memory leak.
I gave the NamedPipe suggestion a try. I'm not a WCF expert so I'm probably doing something wrong / funky with the serializer stuff. But it looks quite slow.
265 ms vs. 25,434 ms on my machine. So around 100 times slower. But then again, it could be I'm doing something stupid.
code: http://pastebin.com/mxuydvDf
Hi Ayende,
I've been using the Microsoft's Dynamic sample mentioned above for years in production. It is based on DynamicMethod and so does not generate new assemblies.
However, it does not support C# 4.0 "dynamic".
LL
I have created a Pattern Matching library that parses DSLs or a general-purpose programming language (of my creation) that looks similar to C#, for doing dynamic expressions that get built into collectible assemblies.
Theoretically you could do this with boo also once it's ported to .net 4.
http://metasharp.codeplex.com
example:
metasharp.codeplex.com/.../6635d57d84f1
Actually, there are two things going on in that sample. A DSL is parsed into an AST, then that AST is passed into this template; the template produces code that gets compiled into Linq expressions, which get compiled into a lambda expression and executed, collected, etc.
@Justin, the problem is that Raven uses C# 'dynamic' objects. LINQ expressions do not support them directly.
When I started looking at dealing with enums in the linq queries for RavenDB, I came to appreciate the difficulty of what you're trying to do. One problem, as soon as people can use arbitrary linq expressions, they want to mix in arbitrary code from some odd DLL that isn't necessarily on the server. I wonder if after you solve the one problem if you'll still hit another wall.
I wanted to mention some approaches that came to mind, in case you haven't considered them:
1) RavenDB, via MEF, supports adding extensions via DLL copy. I wonder if it would be easier to just require the client to send a DLL to the server that has their queries precompiled. This way they could include arbitrary code. The user has more work to do when they change their queries, but the raven usage model is already such that you're supposed to think about your queries early.
2) If this app domain business is to support the Map/Reduce linq expressions, and not the query expressions, I wonder if the whole indexing business could live in a separate app domain, so you're not crossing app domain boundaries so much.
@fschwiet, MEF directory catalog also leaves assemblies in memory, even if you remove them from the directory and refresh the catalog.
I believe that it does support it actually.
Expression.Dynamic(...);
Should allow you to do the equivalent of the dynamic keyword inside of a linq expression.
msdn.microsoft.com/en-us/library/dd324059.aspx
However the Dynamic sample stuff was created in .net 3.5 time frame and does not include support for a lot of the new linq expressions such as Dynamic and Block.
I tried WCF with NetNamedPipes and got 500ms vs 4000ms. Here is the code http://pastebin.com/MMg5SCup.
Try to find out what is causing the slowdown. Maybe it is serialization (write a custom serializer that is 100 times faster), or maybe it is infrastructure that you cannot control (use batching to transfer multiple objects).
LambdaExpressions can be compiled to dynamic methods (LExpression.Compile(ILGenerator)). That might help too.
You could also run the whole server in a second app domain so you do not have to do any marshalling. Then you recycle that domain periodically from the main domain, which acts as a coordinator.
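A minimal sketch of the recycling idea the commenter describes, under the assumption that the leaked assemblies all live in the secondary domain (the Worker type and the threshold here are illustrative; unloading the domain frees everything loaded into it):

```csharp
using System;

// Illustrative worker type; it must be MarshalByRefObject so calls
// proxy across the AppDomain boundary instead of copying the object.
public class Worker : MarshalByRefObject
{
    public string Ping()
    {
        return "pong from " + AppDomain.CurrentDomain.FriendlyName;
    }
}

public static class DomainRecycler
{
    private static AppDomain workerDomain;
    private static int useCount;
    private const int RecycleThreshold = 128; // assumption: tune per workload

    public static Worker GetWorker()
    {
        if (workerDomain == null || useCount >= RecycleThreshold)
        {
            if (workerDomain != null)
                AppDomain.Unload(workerDomain); // frees every assembly loaded there
            workerDomain = AppDomain.CreateDomain("worker");
            useCount = 0;
        }
        useCount++;
        return (Worker)workerDomain.CreateInstanceAndUnwrap(
            typeof(Worker).Assembly.FullName,
            typeof(Worker).FullName);
    }
}
```

The trade-off is the one measured in the post: every call to the worker now pays the cross-domain marshalling cost, so the boundary should sit around a large unit of work, not a per-document call.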
It may sound a bit hackish, but with Mono.Cecil it could be possible to extract the IL code you create by doing the compile you do today. You could then feed that IL into a method of your dynamic assembly, which makes it eligible for collection.
@fschwiet - I think the general idea for Raven is that an index created as a string should be simple. I believe this was a design decision to make people use simple indexes.
There is the option of compiled indexes; see the Event-Sourcing sample at github.com/.../Raven.Bundles.Sample.EventSourci....
Although the enum case you're seeing is somewhere in between, you shouldn't have to write a compiled index just to get enums to work.
Assuming dynamic method is what we are after, and speaking of hackish, here is one crazy link: blogs.msdn.com/.../...odinfo-to-dynamicmethod.aspx
Probably a silly question, but why do you kick off a compiler instance instead of using SRE?
Nice use of the 2nd app domain. It's similar to some of the impressive stuff that Second Life is doing with Mono to speed up their scripts:
http://www.youtube.com/watch?v=QGneU76KuSY
Since you're spending so much time serializing, you might want to consider a different serializer. @marcgravell's protobuf-net is about 8-9x faster than Microsoft's BinaryFormatter. Otherwise, if you still want to use JSON, you should be able to get a perf boost with my JsonSerializer, which is around 3x quicker than the other JSON serializers out there at the moment: http://www.servicestack.net/mythz_blog/?p=344
Could you serialize the expression trees that you generated in another appdomain? Should be good enough if you don't need to run them directly.
Now you are solving a problem that would not exist if you chose a different technology for your product. An interpreted query / data manipulation language would be much better.
The binary serializer in .NET is very slow. You should implement your own serializer (if you can), which is actually pretty simple: implement ISerializable, then do the work in GetObjectData and in the deserialization constructor. Your own serialization code should not branch out to the .NET one; it should simply create a byte[] that you add to the info object. This is very fast (and also compact, most of the time).
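A sketch of the pattern this comment describes, with a hypothetical document type (the field layout and names are illustrative): the whole object is packed by hand into a single byte[], so the formatter never walks the object graph.

```csharp
using System;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical example type: custom-serializes itself into one byte[]
// instead of letting BinaryFormatter reflect over each field.
[Serializable]
public class FastDocument : ISerializable
{
    public int Id;
    public string Body;

    public FastDocument() { }

    // Pack all fields into a single buffer and hand it to the formatter.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        byte[] bodyBytes = Encoding.UTF8.GetBytes(Body ?? "");
        var buffer = new byte[4 + bodyBytes.Length];
        BitConverter.GetBytes(Id).CopyTo(buffer, 0);
        bodyBytes.CopyTo(buffer, 4);
        info.AddValue("d", buffer);
    }

    // Matching deserialization constructor: unpack the same buffer.
    protected FastDocument(SerializationInfo info, StreamingContext context)
    {
        var buffer = (byte[])info.GetValue("d", typeof(byte[]));
        Id = BitConverter.ToInt32(buffer, 0);
        Body = Encoding.UTF8.GetString(buffer, 4, buffer.Length - 4);
    }
}
```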
See Simon Hewitt's work:
www.codeproject.com/kb/dotnet/FastSerializer.aspx
This might shave off milliseconds per call, so in the end it will help greatly.
I've never looked at Raven code, but what are JObject and JProperty? Are they MarshalByRef objects? If they are not, changing them may provide the speed you need.
John,
Making them MBRO would actually increase the timing, since you would have a lot more cross app domain chatter.
How about using the IronPython interpreter from C# and passing in the Linq query as a Python script?