Abstract Syntax Trees
Consider this a very high level explanation, I just wrote this in an email, and I felt that it is a shame to waste the info.
AST is the most common way for computers to work with structured text. They turn it into a graph, and then they can do operation on the graph, instead of manual text parsing. Again, this is a rough description of how it works, but it is an essential part of any text parsing routine of any complexity. Most often, compiler, interpreter and the like are based around AST and operations on AST.
Let us take this example:
if (user == null) Console.WriteLine("User is null!");
The code above translate to the above graph:
- If statement:
- Conditional Expression:
- Binary Expression (operation equality):
- Left: Reference Expression ("user")
- Right: Null Literal Expression
- True Block:
- Method Invocation Expression:
- Target:
- Member Reference Expression:
- Target:
- Reference Expression ("Console")
- Name: "WriteLine"
- Arguments:
- 0: String Literal Expression ("User is null!")
- False Block:
- Nothing
The compiler then takes this graph and run though it. Let us say that we can ask the IF statement to generate the relevant code. Assuming that this is a .Net compiler, what will happen is:
- Ask the conditional to generate the code for it.
- The conditional figures out if it is a boolean expression. (error otherwise)
- It asks the binary expression to generate the code for it.
- The binary expression ask the left expression to generate its code.
- The reference expression now looks up in the symbol table for a reference (can be a local variable, member variable, etc) and emit the code to load that
- Errors if not found
- The binary expression asks the right expression to generate its code
- The right expression emit the code to load a null to the stack
- The binary expression emit an equality operator to the stack.
- The IF statement emits a conditional jump after the true block if the previous expression result in negative
- The IF statement asks the true block to generate its code.
This is a highly abstracted explanation, but it is a useful lie :-)
Comments
AST is a way to structure expressions, probably the easiest approach, but not the only one. I wonder how people wrote those single pass, no AST compilers in the past, and yet offer languages like simula. Those were true programmers! :-)
Hammett,
That is some pretty amazing programming to do that with a single pass, it does present some constraints.
You can see some of the remains of that in C/C++ declare-before-use mentality.
My guess is that the compiler was just a big state machine that responded to the current output.
Not something that I would approach to without a bit of trepidation, that is tricky coding at best.
Comment preview