Hi all --
in the spirit of diving right into the deep end, I'm learning Go by
writing an interpreter for a simple language. A simplified subset of
the grammar:
statement:
assignment EOL
assignment:
NAME '=' expr
expr:
STRING
| expr + expr
(In reality, I broke expr up into multiple productions to avoid
ambiguities. I said this was a *simplified* subset!)
This parser would accept
a = "foo"
b = "bar"
c = "foo" + "bar"
Simple enough. I'm pretty sure the output of the parser should be an
AST (abstract syntax tree), but I'm unsure if I've got the details
right. Here's what it looks like:
type ASTNode interface {
Dump(writer io.Writer, indent string)
Equal(other ASTNode) bool
}
type ASTExpression interface {
// this mainly exists because I feel I need a way to distinguish
// expression nodes... and implementing String() is easy and useful
String() string
}
// implements ASTNode
type ASTAssignment struct {
target string // variable name
expr ASTExpression
}
// implements ASTNode and ASTExpression
type ASTString struct {
value string
}
// implements ASTNode and ASTExpression
type ASTAdd struct {
op1 ASTExpression
op2 ASTExpression
}
The implementations of Dump(), Equal(), and String() for those types
is straightforward and a bit repetitive, so I'll spare you. I'm sure
I'll have to add ASTNode.Walk() eventually.
This seemed like the most obvious, straightforward design to me. It's
also quite similar to a prototype I threw together in Python. But now
that I'm getting into the real world of AST walking, I sense a lot of
type assertions in my future, which seems to miss the point of having
interfaces like ASTNode and ASTExpression.
The only other design that I can see is one big concrete mudball:
const (
nt_root = iota
nt_stmt
nt_assign
nt_string
nt_add
)
type ASTNode struct {
nodetype int
children []ASTNode
data string
}
If nodetype == nt_root, then children is a list of statement nodes. If
nodetype == nt_add, then children is {op1, op2}. Etc. AST walking is
then one recursive function with a big switch on nodetype. This seems
pretty vile to me, and it only occurred to me after reading about AST
walking articles by a couple of ANTLR hackers (specifically
http://www.antlr.org/article/1170602723163/treewalkers.html).
Thoughts? Is my approximately-one-type-per-grammar-rule the usual
approach? Is there a nice middle ground that I'm missing?
Thanks --
Greg