Ayende @ Rahien

Refunds available at head office

Design patterns in the test of time: Interpreter

In computer programming, the interpreter pattern is a design pattern that specifies how to evaluate sentences in a language. The basic idea is to have a class for each symbol (terminal or nonterminal) in a specialized computer language. The syntax tree of a sentence in the language is an instance of the composite pattern and is used to evaluate (interpret) the sentence.

More about this pattern.

Along with the visitor pattern, this is still a very useful pattern, but in a very small but important context. Parsing and executing code. We see those quite often. In particular, JavaScript is probably the most common place where we see interpreters.

That said, unless you are actually dealing with executing code, there is very little reason for you to want to apply this pattern. In fact, I have seem people go for that several times for purposes that I really can’t explain.

Interpreter is for code execution. It has some interesting differences from compiling the code. For one, it is a lot easier to write, and for the most part, performance is great. This is especially true because the hard parts (the text parsing) are usually done up front and then you are just executing the AST.

From my perspective, we use Jint, a JS interpreter in RavenDB because compiling to IL and running that was complex. Any bugs there was complex to figure out, and most important from our point of view, we really needed to be able to place limits on what you could do. The number of steps that can be taken, the recursion depth, and so on. Doing so with compiled code requires you to have kernel level access, or doing things like Thread Abort.

So Interpreters are good, but watch out when you use it, if it ain’t code that you are going to run, why are you writing this in the first place?

Comments

James McKay
01/17/2013 10:46 AM by
James McKay

I've never really understood why exactly this was listed as a design pattern by the Gang of Four in the first place. Is it supposed to be merely an expression tree, or is it supposed to be a full-blown programming language, complete with lexer, parser, and so on? In the former case, it's confusingly named and should really only be called the Expression Tree pattern; in the latter case, no design patterns book can do it justice: you need a complete treatise on compiler theory to understand what you're doing.

Catalin Pop
01/17/2013 11:01 AM by
Catalin Pop

I've used the interpreter pattern successfully in a business app. There was this complex tariff definition mechanism with subdivisions for days of the week, time intervals, product categories, client categories, and various rules, everything organized in a tree. It was a really robust implementation, a complex system that didn't had a single functional bug (only some UI bugs) for for 8 years in production.

Therefore I must disagree that interpreters (and the pattern) are only good for code. There are good use cases for them in other contexts.

erik
01/17/2013 11:56 AM by
erik

I need this pattern to parsing and import PDF files (about e-billing), can you show me an rela world example of this pattern?

thank

erik

Ayende Rahien
01/17/2013 01:23 PM by
Ayende Rahien

Erik, You generally don't need to execute pdf files, you need their DOM, which is something quite different.

Colin Bull
01/17/2013 02:03 PM by
Colin Bull

Catalin, I actually think you are agreeing with ayende here. Your tariff definition is your language... And the interpreter is executing your defined language. Exactly as ayende states in the post.

Catalin Pop
01/17/2013 04:12 PM by
Catalin Pop

@Colin, it might be a suitable pattern whenever you have hierarchical data that you need to interpret in some way, conditionally (to differentiate from visitor pattern)

This situation is not generated by languages or code alone. And regarding the your statement that my tariff definition is my language, I'd say yes and not, because then even a list of business rules can be called a language, everything in CS could potentially be called that. IE: The UI is the visual language of the computer ;)

Justin
01/17/2013 04:37 PM by
Justin

To bad Jint is so far behind in the ECMA standard and has bugs. Jurassic is fully ECMA 5 compliant and compiles to IL so it's much much faster than Jint:

http://jurassic.codeplex.com/

Then there is Javascript.net which uses the V8 engine from Chrome which compiles to machine code underneath and is by far the fastest of the 3:

http://javascriptdotnet.codeplex.com/

Very disappointing MS won't make a .Net interface to IE's javascript engine Chakra and make it well supported rather than COM interop.

Chris Wright
01/17/2013 05:17 PM by
Chris Wright

Ayende, these days you can apparently put javascript in your PDFs.

http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/jsapireference.pdf

Something of an abhomination, not to mention a source of exploits, but it exists.

Justin
01/17/2013 05:18 PM by
Justin

Here are benchmarks between the previously mentioned javascript engines just to get an idea how big the performance difference is between them:

http://jurassic.codeplex.com/wikipage?title=Benchmarks

Ayende Rahien
01/17/2013 06:18 PM by
Ayende Rahien

Justin, I do NOT want it to compile to IL. I want to be able to control very closely what can happen there, and that is very hard with IL.

Justin
01/17/2013 07:34 PM by
Justin

Ayende, I heard you the first time, which is why I mentioned how it's a shame the current state of Jint, there are definitely uses for a interpreter like that so long as you know the kind of performance you are giving up ;). I do hope Jint gets fixed up and updated at some point, I fear it's a dead project.

However with Jurassic you can always run it in a separate app domain and kill it with .Unload or if you have a bad script or if you really want to get down to it modify Jurassic's code generator to do what you want. You can also debug Jurassic code through visual studio: http://jurassic.codeplex.com/wikipage?title=Debugging&referringTitle=Documentation

The V8 engine used Javascript.Net you can TerminateExecution for a bad script which has no issues like Thread.Abort in .Net. Debugging should be supported in Chrome developer tools through V8's remoting debugging support, this is how you debug NodeJS.

So you really have to have specific reasons to use an interpreter typically thread.abort and debugging is not one of them, usually it's for environments that don't support code generation for security reasons like Windows Phone and iOS, or as you mentioned controlling program complexity, however I believe Jurassic's AST is pretty accessible so some of that could be handled statically pre IL conversion.

Ayende Rahien
01/17/2013 07:40 PM by
Ayende Rahien

Justin, I really care for limit the amount of time that a script can run. I need to do this without costly things like threads and app domain and possible state corruption due to complex IL abort procedures.

Justin
01/17/2013 08:45 PM by
Justin

IMO it's pretty silly to call threads and app domains costly when Jint is 3 orders of magnitude slower than V8. For that kind of cost you could shell out to another process and run the script and still be an order of magnitude ahead cost wise.

I don't even see a clean way to limit script time in Jint don't you have to hook into the step event (slowing down execution even more) and throw an exception to stop the script?

Heres one way do it using Javascript.net using a thread an no corruption worries and 1000x faster execution than Jint:

        var timeout = 5000;
        JavascriptContext context = new JavascriptContext();

        var runThread = new Thread(() => {
            try
            {
                context.Run("while(true) {}"); //run script;
            }
            catch{}
        });

        runThread.Start();

        if (!runThread.Join(timeout)) //wait for timeout
        {
            context.TerminateExecution();
            runThread.Join(); //wait for termination to complete
        }
Martin Doms
01/17/2013 09:16 PM by
Martin Doms

I don't agree that the interpreter pattern should be used solely for code execution. One interesting example is Martin Fowler's recurring date schedule example available here: http://martinfowler.com/apsupp/recurring.pdf (PDF warning)

He uses interpreter to build "temporal expressions" and figure out whether a particular time intersects with the schedule.

Matt Warren
01/17/2013 11:08 PM by
Matt Warren

@Justin

Re: debugging in Jurassic, from that page (http://jurassic.codeplex.com/wikipage?title=Debugging&referringTitle=Documentation)

"To enable debugging, set the EnableDebugging property of the ScriptEngine to true. This will allow debugging within Visual Studio, but it also has the following negative consequences: Code generated with EnableDebugging turned on can not be garbage collected. The more scripts you run, the more memory your program will use. Generated code will be somewhat slower. Therefore, do not enable this option for production code."

So I think it would be hard for RavenDB to turn on debugging in Jurassic

Matt Warren
01/17/2013 11:12 PM by
Matt Warren

@Justin

Not that I don't think Jurassic is a really nice project, but I'm not sure if the benefits of it fit RavenDB's use case.

In RavenDB Jint is used to patch single docs, 1 at a time. The patching scripts are single-functions that are meant for settings/modifying properties or arrays. So the speed increase that Jurassic gives might not help here.

Matt Warren
01/17/2013 11:20 PM by
Matt Warren

@Jesús López

IronJS didn't give the level of control that Jint did, particularly around making sure a script doesn't run for too long.

Gareth
01/17/2013 11:55 PM by
Gareth

@Justin, how does Jurassic compare with Javascript.net in terms of performance? Also, neither of them seems to have been updated for a while. Do you know if they're abandoned?

Justin
01/18/2013 02:31 AM by
Justin

@Matt

What kind of debugging are you guys trying to do? Custom interactive in the Raven silverlight studio? Jurassic is going to be like running any .Net plug-in code in Raven, that means run in a separate app domain if you want to be sure its collected even in debug. A custom interactive debugger would probably be easier in Jint no doubt.

@Gareth

Check out the benchmark page I linked the Chrome column would be very close to Javascript.Net since they are both V8, so about 10-20x faster. Yes V8 is that fast, almost as fast as .Net itself! And yes Jint is that slow...

As far as Jurassic development the author has basically said "it's done" since it passes all ECMA 5 test's and is basically bug free, which is more than can be said for Jint at this point.

Ayende Rahien
01/18/2013 02:58 AM by
Ayende Rahien

Justin, Now do that when you don't have 5 seconds, but you need to limit to just 10,000 ops per doc, and you need to do that times 5,000 docs. You can't do that using threads.

Justin
01/18/2013 03:32 PM by
Justin

Yes you can by using System.Diagnostics.ProcessThread and checking UserProcessTime or SystemProcessTime in the outer thread, and if it exceeds a threshold TerminateExecution.

This will be more precise than counting Jint steps because one Jint step could be a few CPU cycles while another could be a million.

What happens when someone puts a Regex in a Jint javascript with an infinite loop, how are you going to control that?

How many other Jint ops call into the system that could potentially hang or consume excessive resources?

Threads are the only way to reliably preempt it why they exist.

Sebastien Ros
01/18/2013 11:41 PM by
Sebastien Ros

Maybe it's stupid or just a knee jerk reaction, but as I know that you can do whatever you want with a performance test, and because each usage is specific, I decided to do my own test on a very simple script which my look like Ravendb's usage of JS.

It uses the latest available beta of Jint (just fixing some reported bugs) and Jurassic. The executed script is:

var o = {}; o.Foo = 'bar'; o.Baz = 42; o.Blah = o.Foo + o.Baz;

I ran it 1000 times, one time by using a single engine instance for each iteration, a second time by instantiating a new engine for each iteration.

The results don't really show that Jint "is that slow" as I might have read. BUT, I might have made a huge mistake in the code, so feel free to fix it and I will apologize publicly ! I also want to repeat that I am sure anyone can find some benchmarks where Jint is awfully slow. It's just a matter of what to test.

Using a new engine for each iteration

Jint: 1000 iterations in 2134 ms Jurassic: 1000 iterations in 6957 ms

Reusing the same engine accross iterations

Jint: 1000 iterations in 247 ms Jurassic: 1000 iterations in 2418 ms

PS: Yes, I have created Jint, with the help by a few awesome devs. A reason to double check the results.

Justin
01/19/2013 05:43 PM by
Justin

With those times you are obviously not reusing compiled scripts so most of the time spent is the compile, which Jint will beat Jurassic as it's only parsing where as Jurassic must parse and compile to IL then Jit the IL.

Of course you should try the V8 engine in Javascript.Net it compiles to machine code yet blows away Jint and Jurassic in your test taking only 14ms! It's hard to be Google's V8 team no?

Now put your test loop in inside the script to actually exercise the execution engine rather than the compiler what are your results:

var o = {}; for (i = 0; i < 100000; i++) {o.Foo = 'bar'; o.Baz = 42; o.Blah = o.Foo + o.Baz; }

Here are mine in compiled release outside the debugger:

Jint: 360ms, Jurassic: 51ms, Js.Net: 8ms

Yes it's that much faster.

What about Sunspider, the industry standard javascript performance benchmark?

Given that, now that the use case has been explained for Raven (which it wasn't in the original post) perhaps Jint is the better engine than Jurassic since many trivial dynamic scripts will be submitted rather than actually reusing a script over and over, most of the time will be spent parsing rather than actually running javascript. Raven could however take the approach most SQL databases do and cache compiled statements and reuse them if it sees a duplicate precisely because of the compilation overhead, if it had some way to parameterize these patch statements.

It's is good to some activity from a Jint Dev I see you posted the first update to Jint in over a year yesterday, nothing to do with this post? ;) .Net needs a good fully managed javascript interpreter and Jint is great work!. Of course in an ideal world Jint and Jurassic would merge allowing one to choose between interpreted or IL mode based on the situation sort of like the .Net Regex engine can.

Ayende Rahien
01/21/2013 06:51 AM by
Ayende Rahien

Justin, We actually do keep compiled scripts around (parsed ones, at least).

Comments have been closed on this topic.