Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 10 | Comments: 37

filter by tags archive

Working with ANTLR: HQL Grammar

time to read 2 min | 338 words

I am currently trying to build an ANTLR grammar for HQL. There is already an existing one for Hibernate 3, but that one is based on ANLTR 2.x and supports quite a bit more than NHibernate does at the moment (DML statements, for instance). After several failed attempts to port the grammar to ANTLR 3 and generate C# code out of it, I gave up and started building my own.

I have read the ANTLR book not that long ago, so I ought to have a known what was in store for me. I didn't. I found out that this require a totally different mode of thinking. My recursion muscle is very tired at the moment, but I managed to create a simple grammar for:

select x.y, z.b from Entity as e join e.Children as c

I kept having false starts with the thing, until I went and read Boo's source, and figure out how Boo's parser works. Basically, instead of letting the tool to generate the parser and work of the generated tree, this approach calls for constructing our own tree while we parse the source.

image

As you can see, we constantly pass the Query instance down to lower rules, so we can operate on it and build our tree. This is much easier than trying to handle the CommonTreeAdaptor [sic] and derivatives.

I want to make the parser and the resulting AST as smart as possible, before trying to plug it into NHibernate's itself. That is going to be a significant undertaking, and I would like to have help, so feel free to contribute. 

Now that I have the initial stuff going, I am going to refactor it a bit to match this BNF (http://www.hibernate.org/89.html), and yes, I know it is outdated.

There are about a dozen tests for the syntax yet, so it is possible to just grab it and start working on it.

You can grab the source from: https://rhino-tools.svn.sourceforge.net/svnroot/rhino-tools/experiments/Hql


Comments

Frans Bouma

All LR(n) parsers are the same: they are basicly an engine which performs actions on the input stream of tokens by using a shift/reduce table and an action table.

All you do is create these 2 tables and that's most of the work. That's basicly what antlr does. You then simply have to write handler routines for each EBNF non-terminal reduction and you're done. This is straight forward.

These handlers are the places where you place your emit calls, e.g. emit SQL, build object trees for the sql engines etc. etc.

The biggest mistake made with LR(n) parsers is that people try to squeeze the handlers and the parser engine into 1 big pile of code. Don't do that. In the end, your head will hurt and the code will be unmaintainable, because what if a non-terminal changes?

The one book you should read is the one from Aho-Ullman, (the 'dragon' book). It contains the algo for an LR(n) parser and it's straight forward to implement.

Another mistake often made is that people try to combine lexical analyzer (the tokenizer) with the token parser engine.

If you want my implementation in C# of an LR(n) parser engine (commented!), shift/reduce table generator and lexical analyzer, feel free to drop me a line.

hammett

Ayende, if I can offer any advice it would be: try come up with a great error reporting infrastructure before getting the grammar right or completed. I think several projects made the mistake of not focusing much attention on it (including boo). Than it's a pain to fix later.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. Production postmortem: The case of the memory eater and high load - 17 hours from now
  2. Production postmortem: The case of the lying configuration file - about one day from now
  3. Production postmortem: The industry at large - 3 days from now
  4. The insidious cost of allocations - 4 days from now
  5. Find the bug: The concurrent memory buster - 5 days from now

And 4 more posts are pending...

There are posts all the way to Sep 10, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    14 Aug 2015 - The case of the man in the middle
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats