Ayende @ Rahien

Ayende @ Rahienhttp://ayende.comAyende @ RahienCopyright (C) Ayende Rahien 2004 - 2021 (c) 202660Chris Wright commented on Rant: Compilers != ParsersI'm working on a compiler that has such things as compile-time function execution, templates, type inference, and mixins. Parsing's straightforward, and it takes up 1/4th the amount of code that semantic analysis currently does. Codegen is something that I find difficult. That's because it's verbose, strict, and using an API that I don't know very well (LLVM's C api, with some wrappers for some things) and that I can't easily extract information from for debugging (like the size of each instance of a type). http://ayende.com/4256/rant-compilers-parsers#comment18http://ayende.com/4256/rant-compilers-parsers#comment18Tue, 20 Oct 2009 13:40:38 GMTEugene Burmako commented on Rant: Compilers != Parsers@Frans: Well, I'm aware of that theory after all we had it back then in the university. In my project I used ANTLR, so there was a pita with circular dependencies. So I did have a fight with the grammar and yeah at times it was SO frustrating and I was progressing soooo slowly that I wanted cry. However I managed to tame ANTLR within several days. Tho all the remaining things - specifying type- and binding-related rules and implementing the spec took considerably much time and mental effort. Maybe it's C#-like syntax that made writing a parser to be relatively simple, but that's what I experienced. http://ayende.com/4256/rant-compilers-parsers#comment17http://ayende.com/4256/rant-compilers-parsers#comment17Tue, 20 Oct 2009 12:35:24 GMTRafal commented on Rant: Compilers != ParsersAyende, I'm not sure I get what you are trying to say. First you say that code generation is the easy part. Then you say that writing a parser was the easiest piece. So are you trying to convince us that compilers are easy, if not trivial? http://ayende.com/4256/rant-compilers-parsers#comment16http://ayende.com/4256/rant-compilers-parsers#comment16Tue, 20 Oct 2009 11:30:36 GMTAyende Rahien commented on Rant: Compilers != ParsersRoger, Sorry, I wasn't precise. From my point of view, parser == grammer. http://ayende.com/4256/rant-compilers-parsers#comment15http://ayende.com/4256/rant-compilers-parsers#comment15Tue, 20 Oct 2009 10:19:43 GMTRoger Alsing commented on Rant: Compilers != Parsers>>I wrote a few languages... >>The parser was the easiest piece of them all I'm not talking about the parser, i'm talking about the grammar.. So many times that the grammar tool have told me that my grammar is ambigious in some mysterious way that I'm unable to imagine. And trying to correct it only results in more ambiguity. Frustrating to the point where I want to give up and go into nerd rage. Once you are past the grammar, it's only code ;-) http://ayende.com/4256/rant-compilers-parsers#comment14http://ayende.com/4256/rant-compilers-parsers#comment14Tue, 20 Oct 2009 10:15:44 GMTAyende Rahien commented on Rant: Compilers != ParsersRoger, Please try it on anything that is non trivial, you'll find that it is _decidedly_ not just tedious. I wrote a few languages, the parser was the easiest piece of them all http://ayende.com/4256/rant-compilers-parsers#comment13http://ayende.com/4256/rant-compilers-parsers#comment13Tue, 20 Oct 2009 09:56:21 GMTRoger Alsing commented on Rant: Compilers != Parsers>>The usual problem is what Jeff is talking about, extracting enough semantic information from the AST to do something about it. Just run a couple of different visitors over your AST to add the relevant meta data to it. e.g. pass1, find all declarations and add type and scope to a lookup pass2, ensure that all identifiers are present in the lookup. ... passX, resolve and reduce constant expressions. Not that hard, but that doesnt mean there is alot of work involved. Tedious != Hard. (There are ofcourse tricky problems, but in general not as hard as some think it is) http://ayende.com/4256/rant-compilers-parsers#comment12http://ayende.com/4256/rant-compilers-parsers#comment12Tue, 20 Oct 2009 09:52:51 GMTAyende Rahien commented on Rant: Compilers != ParsersRoger & Frans, Emitting the code is actually pretty easy, if a bit obscure. The usual problem is what Jeff is talking about, extracting enough semantic information from the AST to do something about it. In comparison to the parser work, that is nothing. Frans, Here is a simple test to show this. Would you rather get a LINQ query as text and parse it yourself, or would you rather get the processed expression tree? http://ayende.com/4256/rant-compilers-parsers#comment11http://ayende.com/4256/rant-compilers-parsers#comment11Tue, 20 Oct 2009 09:15:07 GMTFrans Bouma commented on Rant: Compilers != ParsersI agree with Roger, your post doesn't make sense. The hardest part is either to create an (LA)LR(n) valid grammar so an LR(n) grammar parser / lexer combo can understand all input (as parsers are universal for LR(n)) or an LL(n) variant grammar. Both have pretty straightforward parse systems defined. For example LR(n) parsers reduce to non-terminal rules, and you know exactly what rule is reduced and which terminals/nonterminals the rule is made of, so all you have to do is interpret the rule handler input. LL(n) grammar parsers are also straightforward, as you move along the non-terminal rule's terminal/nonterminal input in a handler to recognize a rule and act accordingly when you have understood it completely. People often focus on AST's as the output of a parser, but frankly, parsers don't output ASTs you need to handle with visitors. Parsers do that for you, and they notify you along the way. @Eugene: these kind of problems have been solved decades ago already. Have a look at symbol tables and multi-pass parsing. Seriously, the hardest part is getting rid of ambiguity in the grammar, so automated parser tools can be used (so shift-reduce/reduce-reduce etc. conflicts are not present in LR(n), or the necessity of long lookaheads for LL(n) grammars.) have a look at the Dragon book (Aho/Sehti/Ullman: 'Compilers'), it contains the complete algo of an LR(n) parser. implementing it is fairly easy, and you then just have to fill in the rule handlers for every non-terminal which is really a simple job. @Jeff, you don't use the right approach. Parsing is actually one of the most researched topics in computer science and many tools are created to make things very easy, next to the mechanics to approach these problems. if you want, please take a look at the 'Gold parser'. It's an LR(n) parser generator with a fairly easy to use editor and it generates C# as well. With the code, you can just write code which is called when a non-terminal is reduced, really easy to do. http://ayende.com/4256/rant-compilers-parsers#comment10http://ayende.com/4256/rant-compilers-parsers#comment10Tue, 20 Oct 2009 08:22:04 GMTEugene Burmako commented on Rant: Compilers != Parsers@Roger. You overlook such stuff as type inference and various sorts of binding. If we talk about C#, the latter would include member lookup, generic args inference, overload resolution. Once I wrote a subset of C# compiler and that was the most difficult part of all the pipeline. http://ayende.com/4256/rant-compilers-parsers#comment9http://ayende.com/4256/rant-compilers-parsers#comment9Tue, 20 Oct 2009 07:37:38 GMTRoger Alsing commented on Rant: Compilers != ParsersI don't quite agree. Emitting executing code from an AST is pretty simple. Optimizations can be hard though. IMO, the hardest part of it all is to create a valid grammar for your language. Once you have the grammar, parsing and emitting code is fairly easy. http://ayende.com/4256/rant-compilers-parsers#comment8http://ayende.com/4256/rant-compilers-parsers#comment8Tue, 20 Oct 2009 06:56:59 GMTJeff LeBert commented on Rant: Compilers != ParsersYep. I've looked at that a couple times. I believe the Mono compiler ditches most of the AST as it is moves through the different phases of compiling. I don't think there is a place to stop where I have a usable AST and all the type information. My hopes are on C# 5.0 Compiler as a Service. I wish they would say more about that. I also looked at Phoenix project from Microsoft Research but it is hugely overcomplicated. It looks like FxCop for VS 2010 is migrating towards using it. I'm just installing VS 2010 Beta 2 so will know more soon. My reason for commenting in the first place was to agree that parsers are not compilers. I think we've made that point. :-) Thanks! http://ayende.com/4256/rant-compilers-parsers#comment7http://ayende.com/4256/rant-compilers-parsers#comment7Tue, 20 Oct 2009 03:30:51 GMTAyende Rahien commented on Rant: Compilers != ParsersJeff, Then what you want isn't the parser, you want the output from somewhere in the middle of the compiler pipeline. For that, I suggest taking a look at the mono C# compiler. That will give you that information http://ayende.com/4256/rant-compilers-parsers#comment6http://ayende.com/4256/rant-compilers-parsers#comment6Tue, 20 Oct 2009 02:23:45 GMTJeff LeBert commented on Rant: Compilers != ParsersI have been using NRefactory but haven't seen it do anything beyond parsing. In the example: private void Method() { foo = 5; } The foo would be represented in the AST as an IdentifierExpression if it was method parameter, local variable, field, event or property. What I want is to have foo be represented by, for example, something like a VariableIdentifier, and know that the type of foo is int. All IdentifierExpression tells you is that the identifier's name is "foo". I did look at the code completion that is also part of SharpDevelop but it was much too hairy for me to make sense out of it. Maybe that is what you are talking about? I am currently using NRefactory to parse the code, modify it and then spit it back out through "pretty printing". To get the method information, etc., I'm using source code line information from the .pdb and pulling MethodInfo, etc. from the IL. This could lead to ambiguous names in a single "line" of code but I haven't seen that yet. http://ayende.com/4256/rant-compilers-parsers#comment5http://ayende.com/4256/rant-compilers-parsers#comment5Tue, 20 Oct 2009 00:54:29 GMTAyende Rahien commented on Rant: Compilers != ParsersJeff, Take a look at NRefactory, it does GREAT job in giving you valid AST, and it handles things like local vs. field vs. argument http://ayende.com/4256/rant-compilers-parsers#comment4http://ayende.com/4256/rant-compilers-parsers#comment4Mon, 19 Oct 2009 23:50:33 GMTJeff LeBert commented on Rant: Compilers != ParsersYou are so right. It's even worse than that. A parser will produce an AST but without any Type/metadata information in it. For example this line of code from a complete .cs file: foo = 5; will produce an AST of an assignment expression with a target of an Identifier and a constant. The Identifier may be a local variable, parameter, field, event or property. The parser can't narrow it down completely because it would have to have all the other source files and go through all the referenced assemblies to get that information. It also can't tell in this situation where foo is instance or static. All that, of course, is the compiler's job. I've been looking for something that is between a parser and compiler. I want the AST where everything is resolved. I had hoped Mono C# compiler would give me this without too many code changes but I gave up on that. I am now looking forwards to the "compiler as a service" we've heard a little bit about in C# 5.0. I've written a number of FxCop rules and felt an AST would make a number of them easier to create. FxCop only uses the generaed IL. You can see some of the FxCop rules I've written at [http://jslfxcop.codeplex.com](http://jslfxcop.codeplex.com). I've written a bunch more but they are specific to the application I'm working on. I've also had projects where I wanted to migrate source code but the simple AST didn't give me enough information. For example the Click event handler in WinForms doesn't have the same parameter types as the Click event handler in WPF. Sorry for the rant. I would just really like any professional language to make available ALL the parts of the compiler: code to AST, AST + resolution of Types/methods/etc, AST to assembly. http://ayende.com/4256/rant-compilers-parsers#comment3http://ayende.com/4256/rant-compilers-parsers#comment3Mon, 19 Oct 2009 23:29:34 GMT[ICR] commented on Rant: Compilers != Parsers"E.g. if you are writing a syntax highlighter for C++ (which spits out HTML), I guess it *would* be mostly about the parser" So if it's not a compiler, then it's not a compiler? http://ayende.com/4256/rant-compilers-parsers#comment2http://ayende.com/4256/rant-compilers-parsers#comment2Mon, 19 Oct 2009 21:02:33 GMTIvan commented on Rant: Compilers != ParsersI guess, it depends on many things, such as how unruly your grammar is and how deep (wide?) is the semantic gap between the input and the output. E.g. if you are writing a syntax highlighter for C++ (which spits out HTML), I guess it *would* be mostly about the parser. If you are writing a highly optimizing Pascal compiler (for lack of better example), then it probably wouldn't. http://ayende.com/4256/rant-compilers-parsers#comment1http://ayende.com/4256/rant-compilers-parsers#comment1Mon, 19 Oct 2009 20:59:09 GMT