Using a parse tree listener to build an AST

edgar hoover

unread,

Sep 22, 2020, 5:03:01 PM9/22/20

to antlr-discussion

Hi all,
I'm trying to make an AST for a large and complex SQL grammar.

I've only done this before for simpler stuff as a learning experience, and that's not helping the with grammar I'm working with now.

The ANTLR book suggests three ways, of using an expression stack for evaluation (which I used instead to build an AST rather than directly evaluation), and using a listener to walk to parse tree and build the AST from that.

Using a listener with an expression stack method is fine for simple stuff but for the complexity of SQL, it gets pretty bad rather rapidly.

Using a listener with the annotation method is also not much fun because it divorces the code from the grammar and it becomes very easy to make mistakes (see <https://groups.google.com/forum/?oldui=1#!topic/antlr-discussion/-jVVxzyVul4>).

The alternative, which I haven't tried, is to embed code into the grammar. It may not be elegant but it's certainly starting to look attractive. Does anyone have any experience with this, and any suggestions for or against it, for large and complex grammars?

Tangentially related question, I have a grammar that looks like this:

constant :
…
    |
        integer_const
    |
        decimal_const
…
    ;

But to get the correct node in the alternatives I have to do this:

public override void ExitConstant(LDBParser.ConstantContext cxt) {
    if (cxt.integer_const() != null) {
        var e = cxt.integer_const().eGet();
        cxt.eSet(e);
    } else if (cxt.decimal_const() != null) {
        var e = cxt.decimal_const().eGet();
        cxt.eSet(e);
    } else ... 
        ....
        ....
    } else { assert(false, "failing in ExitConstant!!"); }
}

Perhaps I'm missing something, I can't see a way round this but it really looks awful, am I doing this the wrong way? (I can perhaps clean this up with lambdas and whatnot but I’m still fundamentally having to check each alt for nullness then extract from the non-null one, it’s just mucky)..

thanks

jan

Susan Jolly

unread,

Sep 22, 2020, 6:07:46 PM9/22/20

to antlr-di...@googlegroups.com

I don't know if this is a useful comment or not but it is something that I
didn't realize at first but that helped me have success with when using the
annotation method for translation. Sorry but I'll have to use Java
terminology. The example on page 123 of the ANTLR 4 Ref. happens to use
the Java Integer class as the generic type of the ParseTreeProperty class
but this is just an example. You can use any type (Java class) which means
you can annotate each node with an instance of your own custom class and
thus use as many different pieces of data as you need.

HTH, SusanJ

edgar hoover

unread,

Sep 23, 2020, 4:35:18 AM9/23/20

to antlr-discussion

If I understand you correctly, I have done that. The 'integer' class in my code is replaced with a Node class which is something like:

enum IsOfType {iotStatement, iotExpression...};

class Node {
    IsOfType iot;

    Statement stmt;
    Expr expr;
    ....
}

because parts of SQL are exeprssions and other parts are statements, and there's actually a few other bits.

but if you mean something more, please tell because I'm always interested!

thanks

jan

Susan Jolly

unread,

Sep 23, 2020, 1:32:33 PM9/23/20

to antlr-di...@googlegroups.com

You understood me just fine!

It's been a while since I did this but IIRC sometimes it was useful to store
certain information when a node was entered on the way down and use or
modify it on the way back up.

In my project annotating made it possible do things at the simplest level.
You had an example where you were at a Constant node and had to test for
the type of its single child node in order to do a conversion. If I'm
understanding the example you could have set the required value as an
annotation to the child node when you were at the child node and knew its
particular type and then later accessed its val via its parent context by
something like

val = getNode( ctx.getChild(0) )

HTH, SusanJ

edgar hoover

unread,

Sep 30, 2020, 7:41:46 AM9/30/20

to antlr-discussion

Just for other people's info, I gave up on both stacks and listeners and am embedding actions directly.
It may not be clean but I'm making far better progress than before.

At some future point it would be interesting to discuss a high-level ANTLR wrapper/DSL that spits out the desired AST.

regards

jan

Martin Mirchev

unread,

Sep 30, 2020, 10:19:54 AM9/30/20

to antlr-discussion

Hello, what stopped you from using Visitors?

edgar hoover

unread,

Oct 5, 2020, 8:54:20 AM10/5/20

to antlr-discussion

I didn't try using visitors but they seem to have the same problem as listeners, which is that the visitor/listener code (c# here) has to track the parse tree reasonably closely, but it is not physically associated with the grammar (.g4 file) and this makes it something of a pig to do.

If the grammar was simpler I'm sure that the easier, but it's a large chunk of modern SQL. In addition I lack the experience in writing the grammar in such a way that deriving the AST from it is 'easiest'. So I had to keep rewriting the grammar code, which meant I had to keep rewriting the listener code -- hardly the fault of ANTLR but a real hindrance nonetheless.

Just dumping the action code into the grammar has its problems, but it was easiest way in the end.

regards

jan

Susan Jolly

unread,

Oct 5, 2020, 2:00:15 PM10/5/20

to antlr-di...@googlegroups.com

Reading your comments and questions made me aware that I didn't know
anything about creating or using ASTs since my translation application works
fine using an ANTLR 4 parse tree (Concrete Syntax Tree.) with a listener..

In doing a Google search I found a highly-praised example of using a visitor
to build an AST from an ANTLR 4 parse tree. I don't have the background
myself to evaluate it but thought it might be useful if you hadn't already
seen it.

https://stackoverflow.com/questions/29971097/how-to-create-ast-with-antlr4

Scroll down to the reply that starts "Ok, let's build a simple math example"

HTH, SusanJ

Martin Mirchev

unread,

Oct 5, 2020, 3:19:01 PM10/5/20

to antlr-discussion

Yeah, that is the perfect example and really neatly organized. Issue with his grammar would be the size but it just takes time.

edgar hoover

unread,

Oct 5, 2020, 3:48:53 PM10/5/20

to antlr-di...@googlegroups.com

I started using a stack to build the AST, something in principle like this inside a listener:

fucn ExitSubtract()
{
lftExpr = exprStack.pop;
rgtExpr = exprStack.pop;
var subtractNode = nw Subtract(lftExpr, rgtExpr);
exprStack.push(subtractNode);
}

For small grammars like the one in the link, fine, for SQL, it got scary quickly. One problem was ensuring I put the listener func call at every node, in the right place. I started going down a rabbit hole of adding special 'canaries' to the stack to make sure I stayed in sync - if I popped a canary value, I'd definitely made an error. Well it worked and once I got the hang of it it was quite clean but I totally lost faith that it would not collapse into am unworking, undebuggable tangle as I extended it.
The problem I feel was at least 50% simple inexperience on my part, but I had to factor that in. I gave up.

Then I tried annotating the parse nodes (see my original post at the top. It looked bad, was verbose and messy - and at least 50% simple inexperience on my part, but I had to factor that in. I gave up.

Embed the actions and while I've had to rewrite the grammar in places to reduce it to a few simple patterns I can repeatedly apply, I started making immediate progress and am doing OK.

That example should be added to the antlr FAQ, which I'm supposed to be looking into.. if I can find the time. We need a decent FAQ because I've realised there's a heck of a lot about antlr that I don't know and couldn't find written down.

but thanks for that link!

jan

(SQL is a big, hairy pig)
(Edit: and I am not an ANTLR expert. It's not a good combination)

Susan Jolly

unread,

Oct 5, 2020, 4:52:39 PM10/5/20

to antlr-di...@googlegroups.com

Well I had earlier found another link that deals with "big hairy" problems
which are definitely beyond my expertise. My impression reading between the
lines is that you are dealing with a difficult CS problem and learning about
tools/approaches others have used in situations like this might be useful.
(Actually sounds almost like something that needs a team effort.)

Here's the link:

https://stackoverflow.com/questions/1888854/what-is-the-difference-between-a
n-abstract-syntax-tree-and-a-concrete-syntax-tre

Scroll to the long answer that starts with sentence:
"A concrete syntax tree matches what the grammar rules say is the syntax."
This answer includes some links that look promising.

Best wishes,
SusanJ

"

edgar hoover

unread,

Oct 5, 2020, 5:38:59 PM10/5/20

to antlr-discussion

The parsing isn't a hard problem, antlr does that and eternal thanks to TParr!

The difficulty is as you suggest, lack of sufficient knowledge. I've rammed the car into the wall a few times before finally managing to motor off. It's cost a fair bit of time. Jagged little pill and all that.
I'm also using it to re-learn c# so that's another obstacle to fight.

The link is welcome and should certainly go into an antlr faq but I'm comfortable at that level.

Rather amusingly there's a reply from the guy who did the DMS software reengineering kit. I did contact him to get a price as it seemed to have some nice features. Didn't go very well...

It'd be interesting to abstract the antlr rules in M4 then use macros to generate the necessary code to produce the AST (maybe embedded, maybe as listener code, it doesn't matter) but that's the kind of rabbit hole that stops me getting things actually done. But something antlr-wrapping DSL to make the AST would be lovely.

cheers

jan

Reply all

Reply to author

Forward

Using a parse tree listener to build an AST - problems, suggestions?

edgar hoover

Susan Jolly

edgar hoover

Susan Jolly

edgar hoover

Martin Mirchev

edgar hoover

Susan Jolly

Martin Mirchev

edgar hoover

Susan Jolly

edgar hoover