Abstract Syntax Trees

time to read 2 min | 326 words

Consider this a very high level explanation, I just wrote this in an email, and I felt that it is a shame to waste the info.

AST is the most common way for computers to work with structured text. They turn it into a graph, and then they can do operation on the graph, instead of manual text parsing. Again, this is a rough description of how it works, but it is an essential part of any text parsing routine of any complexity. Most often, compiler, interpreter and the like are based around AST and operations on AST.

Let us take this example:

if (user == null) 
	Console.WriteLine("User is null!");


The code above translate to the above graph:

  • If statement:
    • Conditional Expression:
      • Binary Expression (operation equality):
        • Left: Reference Expression ("user")
        • Right: Null Literal Expression
    • True Block:
      • Method Invocation Expression:
        • Target:
          • Member Reference Expression:
            • Target:
              • Reference Expression ("Console")
              • Name: "WriteLine"
        • Arguments:
          • 0: String Literal Expression ("User is null!")
    • False Block:
      • Nothing

The compiler then takes this graph and run though it. Let us say that we can ask the IF statement to generate the relevant code. Assuming that this is a .Net compiler, what will happen is:

  • Ask the conditional to generate the code for it.
    • The conditional figures out if it is a boolean expression. (error otherwise)
    • It asks the binary expression to generate the code for it.
      • The binary expression ask the left expression to generate its code.
        • The reference expression now looks up in the symbol table for a reference (can be a local variable, member variable, etc) and emit the code to load that
        • Errors if not found
      • The binary expression asks the right expression to generate its code
        • The right expression emit the code to load a null to the stack
      • The binary expression emit an equality operator to the stack.
  • The IF statement emits a conditional jump after the true block if the previous expression result in negative
  • The IF statement asks the true block to generate its code.

This is a highly abstracted explanation, but it is a useful lie :-)