Observations on Learning
Here is an observation on learning. When I was in high school, I was taught Pascal, and I couldn't for the life of me understand dynamic memory allocation. I had little problem with everything else, but dynamic memory allocation (better known as pointers) was a mystery wrapped in an enigma stashed inside a headache.
About a year later, I was learning C++, and was one of the first in the class who grasped pointers and their uses. I remember trying to explain ***pppHead (sparse matrix) to another student, and he drew a blank after the first level of indirection. I don't think that the quality of the teachers was that different, and the material is basically the same, but I grokked the second and couldn't figure out the first.
I have run into this many times since. Usually a piece of technology just doesn't make sense to me, and then at some point it clicks together, and it is "Oh, that is simple!"
For a while now, I have been feeling my lack of knowledge in the area of parsers, and I kept trying to learn ANTLR on my own. I got to the point where I could read EBNF fairly well, and actually make sense of it, but taking the next step to actually building a language has been beyond me. Yesterday I picked up The Definitive ANTLR Reference, and I have been going through it at a fairly rapid pace. I don't think that, at my level, the book is offering something that isn't already available online, but I have been able to understand how things mesh together much better now.
I feel that while I am not yet competent at parser building, it is certainly something that I can become competent at with a reasonable amount of real-world practice. In other words, I think that I am going to be able to add parsers and parser building to my toolbox.
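To make the step from reading EBNF to building something that parses a little more concrete, here is a minimal sketch of a hand-written recursive-descent parser. Everything in it is an assumption for illustration: the tiny arithmetic grammar, the names `tokenize`, `Parser`, and `evaluate`, and the choice of Python. It is not taken from the ANTLR book and is not what ANTLR generates; it only shows how each EBNF rule can map onto one function.

```python
import re

# Hypothetical grammar (EBNF), chosen only for illustration:
#   expr   = term { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the input into number, operator, and parenthesis tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns the evaluated value."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr = term { ("+" | "-") term }
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            right = self.term()
            value = value + right if op == "+" else value - right
        return value

    def term(self):
        # term = factor { ("*" | "/") factor }
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            right = self.factor()
            value = value * right if op == "*" else value / right
        return value

    def factor(self):
        # factor = NUMBER | "(" expr ")"
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Calling `evaluate("1 + 2 * 3")` walks `expr`, `term`, and `factor` in the same order the grammar nests them, which is why multiplication binds tighter than addition without any explicit precedence table.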
Comments
(maybe a little long for a comment, but...)
I had a couple of similar experiences going through college myself. One of them was pointers, but I managed to overcome that obstacle without having to retake the introductory class that covered them.
Similarly, I struggled tremendously with the topic you are referring to; the university I attended had a mandatory course in "regular and context free grammars and languages, computational logic, finite state machines, and parsing", which loosely translates into the inner workings of compilers and languages. Basically, a 50,000-foot view of ANTLR.
Aside from the professor being HORRIBLE, I ended up having to retake the class and only received a high enough grade -- the second time around -- to allow me to graduate.
It sounds, ironically, like you and I are close to the same level of understanding on this material; EBNF is becoming more familiar to me - granted, 4 years later - and I'm completely amazed and surprised by the fact that I'm: 1.) actually attempting to apply something from my undergraduate studies that I thought I would never see again; 2.) really, really interested in the technology and how it can be used; and 3.) feeling pretty good about what's been accomplished thus far...
ANTLR is about LR(n) parsers. These are the hardest part of compiler writing. What I've read of ANTLR's docs doesn't really convince me that they teach the background and basics properly.
I therefore suggest you pick up a copy of 'Compilers' by Aho, Sethi and Ullman:
http://en.wikipedia.org/wiki/Compilers:_Principles,_Techniques,_and_Tools
It's THE book on compilers. It's kind of deep, but IMHO you won't get the core basics without this deep knowledge.
If you want the source code of my own implementation of their LR(n) parser generator (it's in C# but uses Hungarian coding style hehe, it's from 2002), drop me a line.
I need to dive into parsing sometime in the near future. I would like to create a fairly complete parser for TSQL (2000 and 2005), for a few reasons. One, renaming without fear of breaking stuff would be possible. Two, this would allow me to make a neat little editor with intellisense. Three, I would be able to ensure that fields read/used as criteria are populated first, which would be useful for a particularly complicated import/analysis process.
The above, especially the editor part, requires two extra features: 1) the ability to parse real-time, not-necessarily-correct code and 2) the ability to get better error messages than SQL Server typically reports. Ayende, I guess this isn't really your thing, as you live in the hygienic world of NHibernate, but if you or any others reading this have any tips for me, I'd appreciate it.
You learn by example. I learned ANTLR by studying (and in some cases modifying) the parsers for Boo and Groovy, for example. Those, along with the ANTLR grammar for Ruby someone made, are probably the most complex examples out there though.
Plus it helps to have a good understanding of how regex works, too, before diving into EBNF.
P.S. There's a big argument over on the ANTLR list now about the lack of documentation :)
I wrote a couple of examples using ANTLR a few months back. You might find them interesting: http://aaronfeng.com/articles/2007/04/14/csharp-to-pl-sql