Tree transformations and listeners/visitors versus tree parsers

6,068 views
Skip to first unread message

Steve Ebersole

unread,
Dec 6, 2012, 1:45:16 PM12/6/12
to antlr-di...@googlegroups.com
Looking to upgrade our translators from Antlr 2 to Antlr 4 (yep coming out of the dark ages lol). As background our translator translates between one query language (HQL / JPQL) into another query language (SQL) through a number of intermediate representations.

So I went ahead and bought the Antlr 4 book to help work through the migration.

One thing in particular, however, is still causing me some concerns and uncertainty. The book says that "writing tree parsers is unnecessary" because Antlr4 tooling now generates listeners and visitors. Sure I can see that when all passes simply walk the same tree. But what about the case like I mentioned above where each pass needs to consume a different tree?

Take an example from the book (specifically The "Starter Project" from Chapter 3) where we look at a translator that takes a Java source file that defines a byte[] through an array initializer; the translation reads that and transforms it into a String:

<quote>
For example, we could transform:

static short[] data = {1,2,3};

into the following equivalent string with Unicode constants:

static String data = "\u0001\u0002\u0003"; // Java char are unsigned short
</quote>

This works in the book example because the transformation is the end result (it simply writes the "transformed result" to System.out). But imagine that this translation parse is part of a larger chain of parses. So now, rather than just writing the translation result out to stdout we instead need to "write out" a mutated tree. And lets further imagine that instead of a simple one-kind-of-assignment to another-kind-of-assignment transformation we instead want the resulting tree to be structurally different and that the subsequent passes need to walk that structurally different tree. I just don't see how do do that without writing a tree parser. Am I missing something?

And assuming I am not missing something...
(1) Does Antlr4 still have the capability of authoring tree parsers via grammar?
(2) Any pointers on writing such "tree transforming" parsers in Antlr4? I have not yet read through the whole book, so I might very well just not gotten to it yet.

Thanks,
Steve

P.S. I originally posted this to the (apparently old/defunct) antlr-interest mailing list, but Sam thankfully pointed me here.

Gerald Rosenberg

unread,
Dec 7, 2012, 3:38:54 AM12/7/12
to antlr-di...@googlegroups.com
Yes, Antlr 4 represents a rather fundamental design change.  It does not generate tree parsers, at least in the sense of your question.  Rather than direct tree modification, the evident intent is to progressively decorate an invariant tree with analysis products until a final generation phase can be executed. The ParseTreeProperty class can serve as the base for node annotations. The tradeoff for getting away from relatively brittle tree mutation is a proliferation of decorator objects. 

I have been documenting my understanding of Antlr 4 at https://github.com/GRosenberg/GenPackage . It is a customizable project generator that produces a fairly complete recognizer.  One interesting feature is that it can be rerun at any time on an existing project to generate missing project assets - it parses the Antlr generated listener class to identify new decorator objects to create.

Steve Ebersole

unread,
Dec 7, 2012, 12:22:44 PM12/7/12
to antlr-di...@googlegroups.com
Excuse my confusion as I am extremely uncertain how Antlr 4 could be used in the "mutating tree" scenarios like I mentioned.

Correct me if I am wrong (very well possible), but even the decorators and your approach are subject to the same "same parse tree structure" limitation, no?

My case really is that trees differ in *structure*.  Again, still no idea how I can do that in Antlr 4.  Apparently it does not offer "tree parsers" as an option any more so that obviously will not work.  And as stated the listeners and visitors only seem to work over the same parse tree (same structure). 

The only way I can see this working would be to have the decorator somehow represent the new tree structure.  The reason this is a very ugly option is that I need to then walk that new tree.  Which it seems, again, Antlr 4 does not offer support for so that would need to be a manual process.

Again, unless I am missing something..  Which is why I am asking here.

Sam Harwell

unread,
Dec 7, 2012, 12:28:39 PM12/7/12
to antlr-di...@googlegroups.com

The general approach in ANTLR 4 is different from earlier versions. Rather than use rewrite rules and/or AST operators to explicitly create an AST of arbitrary shape, the basic idea is grammars for ANTLR 4 should be structured so the parse tree automatically derived from the grammar rules is already in the form you want your tree to have.

 

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com

--
 
 

Steve Ebersole

unread,
Dec 7, 2012, 12:38:46 PM12/7/12
to antlr-di...@googlegroups.com
Sam, my concern is that sometimes the structure is derived from semantic analysis which needs the parse to finish to have access to the tree.  Chicken-egg.

Gerald Rosenberg

unread,
Dec 8, 2012, 2:00:09 AM12/8/12
to antlr-di...@googlegroups.com
No, you are not missing anything.  There is no direct support for mutating the Antlr4 parse tree.  Go with Antlr 3 if you really need AST rewriting. 

OTOH, do realize that mutating the description of a tree is the logical equivalent of mutating the tree itself.  The mechanics are just a bit different.  ;)  Besides, the jump from Antlr2 to Antlr3 is only superficially smaller than the jump from Antlr2 to Antlr4.

Perhaps it would be better if you could rephrase your question.  Can you identify a some particular transform from your project that will be difficult to do without -- or best done with -- tree mutation.  Antlr4 is still quite new, so you are likely to receive a lot of support in identifying a best Antlr4 solution.

Jim Idle

unread,
Dec 8, 2012, 4:42:18 AM12/8/12
to antlr-di...@googlegroups.com
I think an interesting thing to ponder here (I am somewhat aligned with Steve's worries here) is whether you can always annotate some arbitrary structure such that it properly represents the original translation unit. 

So, why so we need an AST? And when we find that we do, is an approximate representation with annotations good enough, or must we restructure it in light of new semantic information?

Also bear in mind that Ter has stated more than once that the main target use of ANTLR 4 is translation/transformation because not very many of us are writing compilers. I think he is correct in that regard, but I wonder if that changes things. 

Steve, perhaps you could elaborate on why you need to change the structure of the initial AST? We can either help, of see the use case that ANTLR4 does not cover. 

Perhaps considering C++ might be a good idea. OK - so it isn't likely that we want to write a C++ compiler but analysis tools are quite likely. Anything can be done, dues tree transformation make more sense, or are we saying that this is all manual operation now? Which traditionally it has been. 

Just some thoughts while drinking Old Speckled Hen on a Saturday night on Taipei :)

Jim
--
 
 

Gerald Rosenberg

unread,
Dec 8, 2012, 4:32:51 PM12/8/12
to antlr-di...@googlegroups.com
I had the same concerns when I first picked up Antlr4.  I still have concerns regarding hetero rule nodes being represented as uncorrelated lists and the absence of true syntactic predicates...

The essential tree mutation operators are just insert and delete, with copy and move being common composite operations.  For an immutable parse tree with annotations (per-node instance decorator objects), delete is just a decorator object instance flag.  Insert of a simple (single) node is equivalent to adding a new decorator object instance to the head of the affected subtree. For multiple node insertions, the equivalent is maintaining state over multiple tree walks - any copy or move can be realized in no more than three passes or no more than one if you rewire the visitor walk to match the new ordering.

Drawbacks: lots of decorator object instances, lots of walks, and needing a state table in possible addition to a symbol table.  Benefits: not much more than the parse tree remaining an immutable reference.  Actually, I think that is a big gain.  Direct manipulation of the AST was always perilous, not to mention a PITA if the parser grammar had to be changed.

Anyway, that is my reasoning (rationalization).  Time -- and difficult problems -- will tell . . .

Terence Parr

unread,
Dec 8, 2012, 7:23:02 PM12/8/12
to antlr-di...@googlegroups.com
Gentle-ANTLR-users,

I started an email, which turned into a blog entry:

http://www.antlr.org/wiki/display/~admin/2012/12/08/Tree+rewriting+in+ANTLR+v4

I hope this helps. unfortunately, I have no good support for Steve at the moment for tree rewriting. My hope is that what he actually wants is not tree rewriting ;) please see the part where I described how ANTLR itself works.

Ter


Steve Ebersole

unread,
Dec 10, 2012, 12:07:28 PM12/10/12
to antlr-di...@googlegroups.com
Mmmm, Old Speckled Hen... nice choice!

Here are a couple of examples:

1) The initial query language (HQL) , just like SQL, puts SELECT before FROM even though elements in the SELECT refer to elements from the FROM.  For example:

select a.z from A as a

The 'a' in 'a.z' refers to the 'a' declared in the FROM clause.  In order to properly process 'a.z', 'a' I have to know what 'a' refers to (aka, FROM already has to have been processed).

I understand this one in particular could be worked around by 2 passes over the tree: the first processes the FROM clause and then second relies on that parse information from the first pass.  But...

2) HQL has multiple ways to say the same thing in many cases.  One such case is specifying "joins".  There is an implicit and an explicit form.  For example, both of these are really saying the same thing:

a) select p.address from Person p
b) select a from Person p join p.address a

I like to "normalize" these alternate forms to a single form in the tree.  Here, that means processing the SELECT sub-tree would influence (as in add to) the FROM sub-tree; essentially I like to treat the (a) form as if the user had supplied the (b) form.

Terence Parr

unread,
Dec 10, 2012, 3:57:31 PM12/10/12
to antlr-di...@googlegroups.com
hi Steve, yep, normalization is the perfect use of tree rewriting. No direct support in v4, yet, though you could manually. as long as the translation is closed under HQL or whatever language, it's "proper" :)

Ter
> --
>
>

Steve Ebersole

unread,
Dec 10, 2012, 4:20:29 PM12/10/12
to antlr-di...@googlegroups.com
Terrance,

Curious what you mean by "as long as the translation is closed under ...".

Thinking more about the normalization case, I can see that the listeners/visitors in later phases would continue to work since the result of the normalization is still a "valid tree".  I assume manually mutating the tree is available (remove nodes from the tree; add nodes to the tree; move nodes to new positions in the tree)?

Terence Parr

unread,
Dec 10, 2012, 5:16:05 PM12/10/12
to antlr-di...@googlegroups.com

On Dec 10, 2012, at 1:20 PM, Steve Ebersole wrote:

> Terrance,
>
> Curious what you mean by "as long as the translation is closed under …".

hiya. I just meant that when you translate X to Y, Y is in the same language as X. I would not translate Java to C by doing tree rewriting whereas normalizations like you require are perfect for tree rewriting.

>
> Thinking more about the normalization case, I can see that the listeners/visitors in later phases would continue to work since the result of the normalization is still a "valid tree".

correct; being closed under translation means that the same language and hence tree structure would apply

> I assume manually mutating the tree is available (remove nodes from the tree; add nodes to the tree; move nodes to new positions in the tree)?

yup. you can modify the tree all you want, though it's all standard coding by hand. Later, Sam and I hope to come up with some nice rewriting tools.

Ter


Rick M

unread,
Dec 24, 2012, 4:24:54 AM12/24/12
to antlr-di...@googlegroups.com
On Saturday, December 8, 2012 4:23:02 PM UTC-8, the_antlr_guy wrote:
http://www.antlr.org/wiki/display/~admin/2012/12/08/Tree+rewriting+in+ANTLR+v4

Having just started writing my first interpreter, I'm wondering if I will be hampered by this current restriction. I started in v3 but quickly discovered v4 and moved to that. I think I'm progressing correctly, and hope I can figure out all the right things to do, but I do see what I think are extraneous tokens in the parse tree. I'm still experimenting with when to use Listeners vs. Visitors (currently I have listeners to define all the symbols and a visitor to actually execute the code).

tonyp...@gmail.com

unread,
Apr 25, 2013, 4:54:55 PM4/25/13
to antlr-di...@googlegroups.com
On Monday, December 10, 2012 5:16:05 PM UTC-5, the_antlr_guy wrote:
...

>
> yup. you can modify the tree all you want, though it's all standard coding by hand. Later, Sam and I hope to come up with some nice rewriting tools.
>
>
>
> Ter

Any idea when these rewriting tools might be available? We've got a parser written in Antlr3, which uses rewrites similar to what Steve Ebersole described. As we extend the grammar, we'd like to consider switching to Antlr4. But without rewrites, switching will be a lot of extra work.
Thanks,
Tony Passera

Terence Parr

unread,
Apr 25, 2013, 5:19:27 PM4/25/13
to antlr-di...@googlegroups.com
Don't hold your breath ;)  It's not on my schedule yet.
T



--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.





--
Dictation in use. Please excuse homophones, malapropisms, and nonsense. 

Jim Idle

unread,
Apr 25, 2013, 10:20:00 PM4/25/13
to antlr-di...@googlegroups.com
Just my two cents, but I didn't find switching over to antlr4 and forgetting about re-writes to be any hassle at all. While it is certainly editing work, it is easy and you can get rid of a lot of difficult to maintain code along the way. I would not wait for rewrites, and I certainly won't be pushing for them to come back. 

Take one of your projects and just have a go at it - I think that you will be surprised to be honest. I am very happy with how v4 is working. I have about 7 or 8 huge v3 grammars/parser/analyzers to convert and about 4 more to write and no real hassle in doing so. Everyone's projects have their own foibles and requirements of course, but generally, I would not wait for re-writes as the work you do now to avoid them pays off a lot in the longer term in terms of maintenance/cleanliness of the code and so on :) 

Jim

Jim Idle

unread,
Apr 25, 2013, 10:24:36 PM4/25/13
to antlr-di...@googlegroups.com
I should add that Steve's point about tree mutation is still valid - it may be best to stick to v3 for big projects where the rewrite concept is intrinsic to the whole system, unless you are willing to step back and 'walk' away from the tree transformation model. I suspect that thought and effort would eliminate the transformation needs in favor of iterative decoration of an initial tree, but I also see that that could be a lot of work, potentially for little gain in the short term.

Jim

PS: Weak pun, I know ;)

tonyp...@gmail.com

unread,
Apr 26, 2013, 11:34:18 AM4/26/13
to antlr-di...@googlegroups.com
Terence,

Thanks for letting me know. By the way, I've been using Antlr 4 a bit and find it pretty amazing, even compared with Antlr 3. It handles direct left recursions flawlessly. I've been continually surprized by the way it handles erroneous input. I've never seen anything recover so well.
Thanks again,

Tony

tonyp...@gmail.com

unread,
Apr 26, 2013, 11:41:49 AM4/26/13
to antlr-di...@googlegroups.com
Jim,
Thanks for the comments. Turns out that sticking to v3 is probably the best choice for us, because there would be so much work in changing the model. For new projects, though, I'll use Antlr 4.
Tony

Terence Parr

unread,
Apr 26, 2013, 12:31:40 PM4/26/13
to antlr-di...@googlegroups.com
Hi Tony, that's great to hear! Can i put that on site as  a testimonial?
Ter

Jim Idle

unread,
Apr 27, 2013, 6:44:31 AM4/27/13
to antlr-di...@googlegroups.com
That's a great recovery mechanism ;)

Jim

On Apr 26, 2013, at 23:34, "tonyp...@gmail.com"

Terence Parr

unread,
Apr 27, 2013, 12:11:51 PM4/27/13
to antlr-di...@googlegroups.com
Part of the new mechanism is what I call "Jim Idle's magic sync" :) ;)
Ter

tonyp...@gmail.com

unread,
Apr 29, 2013, 10:00:51 AM4/29/13
to antlr-di...@googlegroups.com
Yes, please do!

Terence Parr

unread,
Apr 29, 2013, 4:43:42 PM4/29/13
to antlr-di...@googlegroups.com
cool! thanks
T

Oliver Zeigermann

unread,
Apr 29, 2013, 4:51:09 PM4/29/13
to antlr-di...@googlegroups.com
In case anyone is interested in my 2 cents: If you need an ASTs and rewrites in ANTLR4, this would be *my* approach:
- create Java classes that serve as the nodes of the AST
- create the AST from the parse tree by writing Java code that does that in either a listener or visitor
- walk that AST using a non-ANTLR, hand written visitor
- create a completely new AST or modify the existing one while walking the AST
- repeat when necessary

Cheers

- Oliver

zhongchu...@gmail.com

unread,
Jun 5, 2013, 8:34:54 AM6/5/13
to antlr-di...@googlegroups.com
在 2013年4月30日星期二UTC+8上午4时51分09秒,Oliver Zeigermann写道:
Did you had ever done some work about generating a AST or a Parse tree of language C or Cpp using Antlr4 ? I want a good *.g4 file that can parse a complete block instead of only expressions. I really appreciate it if you can help me find such a good *.g4 file.I can get a AST if I have a good *.g4 file.

yoges...@gmail.com

unread,
Sep 17, 2013, 11:25:13 AM9/17/13
to antlr-di...@googlegroups.com


Or use the AST tree of Eclipse JDT if the target is a Java source instead of writing that target AST.

Stephen Riley

unread,
Nov 11, 2013, 1:52:29 AM11/11/13
to antlr-di...@googlegroups.com
Hi Ter,

I know I'm quite late to this game, but please consider this a +1 on getting this on your schedule.  :-)  I need exactly the example you gave at the end of your blog post.

Besides that little thing, I'm *loving* ANTLR 4.  Great stuff!

Thanks,
--S

Terence Parr

unread,
Nov 11, 2013, 10:20:35 AM11/11/13
to antlr-di...@googlegroups.com
Thanks. I hope to have xpath for trees out soon then parser interpreter (no code gen needed). then tree pattern matching.
Ter

Stephen Schleimer

unread,
Jun 18, 2014, 4:40:21 PM6/18/14
to antlr-di...@googlegroups.com
Folk,

I am new to Antlr but not new to compilers.  I am practicing rewriting using Language Implementation Patterns example of VecMath and Simplify.  I have augmented Simplify.g with additional constant folders.  In particular, I added the following two bottomup reductions:

addR returns[Integer r]    : ^('+' a=INT b=INT) -> { $r = $a.int + $b.int};

mulR returns[Integer r]    : ^('*' a=INT b=INT) -> { $r = $a.int * $b.int};

This does generate Java and the Java compiles.  However, when the reduction is attempted, I get a cast exception:

java.lang.ClassCastException: java.lang.Integer cannot be cast to org.antlr.runtime.tree.Tree

the failure occurs in the generated code:

  adaptor.addChild(root_0,  retval.r(a!=null?Integer.valueOf(a.getText()):0) * (b!=null?Integer.valueOf(b.getText()):0)); 

which is in the code implementing the "*" rewrite.  When I debug the code, each of the clauses [(a!=null?Integer.valueOf(a.getText()):0) and (b!=null?Integer.valueOf(b.getText()):0)] and return Integers of the appropriate value.  When I attempt to evaluate both clauses and the multiply, I get a cast exception.

Any help appreciated.

Thanks

-steve

Terence Parr

unread,
Jun 18, 2014, 4:42:05 PM6/18/14
to antlr-di...@googlegroups.com
Hi. get rid of the -> as you are not trying to create or rewrite. You are merely trying to execute an action during a tree walk.
Ter
> --
> You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Stephen Schleimer

unread,
Jun 18, 2014, 5:01:26 PM6/18/14
to antlr-di...@googlegroups.com
Ter,

Thanks for the fast response.

I am attempting to rewrite the tree (* 5 4) to (20). I have no doubt that I am not expressing my requirement well.

Indeed, the generated Java looks like what I would expect to see (a new node is created to replace the matched subtree).

So, I am truly not clear as to how the rewrite should proceed.

-steve

Terence Parr

unread,
Jun 18, 2014, 5:06:03 PM6/18/14
to antlr-di...@googlegroups.com
OH! if you are trying to create a new treenode then you do want the ->, but it can be an assignment because it’s an expression that must evaluate to a tree node :) Maybe what you mean is something like

-> INT[$a.int+$b.int]

something like that. I can remember

Ter

Stephen Schleimer

unread,
Jun 18, 2014, 5:20:31 PM6/18/14
to antlr-di...@googlegroups.com
Ter,

Looking in the generated code I see that -> INT[$a.int+$b.int] includes roughly the following:

adaptor.create(INT, ....)

While create's signature includes an int as plus one of the following: String | Token | Token and a string. So, I transformed your suggestion to:

-> INT[String.valueOf($a.int+$b.int)]

and everything worked.

Thanks for the guidance.

-steve

Stephen Schleimer

unread,
Jun 18, 2014, 5:58:02 PM6/18/14
to antlr-di...@googlegroups.com
Folk,

Yet more.

Most everything is now working.  I now have the following rewriting rules:

zeroX : ^('*' a=INT b=INT {$a.int==0}?) -> $a ;                      // 0*x -> 0
xZero : ^('*' a=INT b=INT {$b.int==0}?) -> $b ;                      // x*0 -> 0
addR  : ^('+' a=INT b=INT) -> INT[ String.valueOf($a.int + $b.int)]; // constant fold
mulR  : ^('*' a=INT b=INT) -> INT[ String.valueOf($a.int * $b.int)]; // constant fold
com1  : ^('+' a=INT b=ID ) -> ^('+' $b $a);      // canonilize moving vars to the left using commutativity
com2  : ^('*' a=INT b=ID)  -> ^('*' $b $a);      // canonilize moving vars to the left using commutativity
flt1  : ^('*' ^('*' c=ID a=INT) b=INT)  -> ^('*' $c ^('*' $a $b));   // float up a nested var out of a constant context
flt2  : ^('+' ^('+' c=ID a=INT) b=INT)  -> ^('+' $c ^('+' $a $b));   // float up a nested var out of a constant context

The transformations which occur are as follows:

Original tree: (= x (+ (+ 3 b) (* 5 4)))
(+ 3 b) -> (+ b 3)
(* 5 4) -> 20
(+ (+ b 3) 20) -> (+ b (+ 3 20))
Simplified tree: (= x (+ b (+ 3 20)))

First, com1 is applied; then mulR is applied; then flt2 is applied and rewriting completes.

However, one more rule, addR, should be applied, but it is not.

Any obvious reason for this?

-steve


Terence Parr

unread,
Jun 18, 2014, 6:05:31 PM6/18/14
to antlr-di...@googlegroups.com
yes, you were actually trying to build something more complicated than a simple subtree rewriter. You need to attempt multiple rewrites as previous rewrites expose new opportunities for optimization. Note that (+ (+ b 3) 20) -> (+ b (+ 3 20)) exposes (+3 20) but the tree Walker has probably already moved on. I can’t remember what I recommend here but you have more work to do than a simple depth first search of the tree.

Ter

Stephen Schleimer

unread,
Jun 18, 2014, 7:01:24 PM6/18/14
to antlr-di...@googlegroups.com
Ter,

I looks like instead of using downup (for the TreeRewriter) one should use applyRepeatedly. However, there is a second parameter, TreeRewriter.fptr which indicates the kind of strategy to use. I am not able to find what to use for that parameter.

Thoughts?

-steve

Terence Parr

unread,
Jun 18, 2014, 7:38:06 PM6/18/14
to antlr-di...@googlegroups.com

On Jun 18, 2014, at 4:01 PM, Stephen Schleimer <ssc...@gmail.com> wrote:

> Ter,
>
> I looks like instead of using downup (for the TreeRewriter) one should use applyRepeatedly. However, there is a second parameter, TreeRewriter.fptr which indicates the kind of strategy to use. I am not able to find what to use for that parameter.

that sounds familiar. Ah. that is the rule to apply. In your case, you need a rule that references all of those optimizations.

Ter

Stephen Schleimer

unread,
Jun 19, 2014, 12:55:12 PM6/19/14
to antlr-di...@googlegroups.com
Folk,

I am seeing an infinite loop in the parser parsing the following:

    create type ethaddr_t as INT ;

I have debugged the execution of the parser and I notice that it loops once it has recognized the semicolon (";").

The relevant fragment of the grammar is as follows:

user_defined_type_definition   :  CREATE TYPE user_defined_type_body ; /* TOKEN */

user_defined_type_body  :
        fully_qualified_identifier  ( subtype_clause  )?
        ( AS representation  )? ( user_defined_type_option_list  )?; /* TOKEN */

user_defined_type_option_list   :  user_defined_type_option  ( user_defined_type_option  )*;

user_defined_type_option  : (not relevant)

subtype_clause  : (not relevant)

representation   :  predefined_type  | member_list ;

member_list   :  (not relevant)

predefined_type
    options{k=1;}
    :   character_string_type | numeric_type | (not relevant) ;
exact_numeric_type
    options{k=1;}
    :   (not relevant) | INT | (not relevant)

If helpful, I can include the relevant portions of the generated code.

By the way, the grammar is about 1800 lines and it is extracted from the SQL2003 grammar from the antlr 3 site.

Steve Ebersole

unread,
Oct 29, 2014, 6:42:39 AM10/29/14
to antlr-di...@googlegroups.com
I am curious if any solution for tree re-writing has been added yet in the intervening period of time?  Or am I still in the same boat?


On Saturday, December 8, 2012 6:23:02 PM UTC-6, the_antlr_guy wrote:
Gentle-ANTLR-users,

I started an email, which turned into a blog entry:

http://www.antlr.org/wiki/display/~admin/2012/12/08/Tree+rewriting+in+ANTLR+v4

I hope this helps. unfortunately, I have no good support for Steve at the moment for tree rewriting. My hope is that what he actually wants is not tree rewriting ;) please see the part where I described how ANTLR itself works.

Ter


David Whitten

unread,
Oct 29, 2014, 10:38:03 AM10/29/14
to antlr-di...@googlegroups.com
I think ANTLR 4 is not going to include tree grammars until later.
Did you read through the information at

http://www.meta-environment.org/doc/books//extraction-transformation/asfsdf-by-example/asfsdf-by-example.pdf

to see if it could support what you are trying to do?

Alternately, you could start to work on the specs for what you want, so you and this group will understand what you are trying to accomplish.

Dave Whitten
713-870-3834

Jim Idle

unread,
Oct 29, 2014, 10:40:01 PM10/29/14
to antlr-di...@googlegroups.com
I have created a couple of compilers and quite a few transformers with ANTLR 4 now and to be honest, I did not miss tree re-writing at all. 

On the one occasion that I needed to do things at this level of abstraction I just went straight to my internal model and gave that model some transformation heuristics so I could traverse it and re-arrange it if necessary. Not really different from walking a tree and transforming it; while there is no abstract language to help you with the mechanics, I am not sure that it would really yield that much anyway.

For instance I find that expression evaluation/simplification is easy enough with an evaluation stack and no need to walk a tree per se. From the internal model it is easy enough to create a "standard" double dispatch tree model and as that is a well know model, there are many known ways to apply transformations. However, I have found a lot more value in examining the internal model for certain traits and appending information useful for later code generation (such as being able to be more specific about variable usage with LLVM IR, where the hard optimization work gets done). 

I can see how tree re-writing is more useful if your task is some kind of machine translation, but I also feel that generating a "standard" double dispatch tree from the listener is quite trivial and after that most, if not all, transformations boil down to well know algorithms.

On other matters, I find myself very possibly in need of a C++11 target, so I may well have to knuckle down and produce it. Still contemplating whether I can get away without it, and without knowing the status of the current valiant attempt at a translation to C++ from the Java target, then I think it might be best for my purpose to start from scratch and see what happens.

Jim



Steve Ebersole

unread,
Oct 30, 2014, 5:40:24 AM10/30/14
to antlr-di...@googlegroups.com
Here is my concern, Jim.. maybe you have thoughts.  Consider a source query for me:

select c.address.city from com.acme.crm.Customer c where c.status = com.acme.crm.Status.ACTIVE

Notice all those dots?  Specifically all those dots with totally different semantics...

At least based on my understanding from Antlr 2/3, we need to essentially syntactically recognize the "dot sequences" and then semantically "resolve" them.  Ideally, as an example, the "semantic tree" here is something like:

[QUERY]
    [FROM]
        [ENTITY, "com.acme.crm.Customer"]
            [ALIAS, "c"]
    [SELECT]
        [PROPERTY-REF, "c.address.city"]
    [WHERE]
        [EQ]
            [PROPERTY-REF, "c.status"]
            [ENUM-LITERAL, "com.acme.crm.Status.ACTIVE"]

even though the parse tree *needs to be* (again, according to my understanding of Antlr) more like:

[QUERY]
    [SELECT]
        [DOT]
            [DOT]
                [IDENT, "c"]
                [IDENT, "address"]
            [IDENT, "city"]
    [FROM]
        [DOT]
            [DOT]
                [DOT]
                    [IDENT, "com"]
                    [IDENT, "acme"]
                [IDENT, "crm"]
            [IDENT, "Customer"]
        [IDENT, "c"]
    [WHERE]
        [DOT]
            [IDENT, "c"]
            [IDENT, "status"]
        [EQ]
        [DOT]
            [DOT]
                [DOT]
                    [DOT]
                        [IDENT, "com"]
                        [IDENT, "acme"]
                    [IDENT, "crm"]
                [IDENT, "Status"]
            [IDENT, "ACTIVE"]

I have to go back and refresh my memory of Antlr 4, but are the listeners/visitors defined in terms of the parse tree?  To me, its completely unreasonable in my case to work with the model based on the parse tree as defined above during every pass; its just completely unwieldy; its much better to drive later passes on the resolved model.  Driving the processing based on the "semantic tree" is much more reasonable and I can see that being doable even without re-writing.

Mike Lischke

unread,
Oct 30, 2014, 6:08:43 AM10/30/14
to antlr-di...@googlegroups.com
Steve,

as the author of a full new MySQL grammar (even though for ANTLR3) I had similar issues, but I wonder why you think there must be a (sub)tree for identifiers.


At least based on my understanding from Antlr 2/3, we need to essentially syntactically recognize the "dot sequences" and then semantically "resolve" them.  Ideally, as an example, the "semantic tree" here is something like:

[QUERY]
    [FROM]
        [ENTITY, "com.acme.crm.Customer"]

That's a typical qualified identifier. You can give it a context by defining an own rule (e.g. table_reference, using the a qualified_identifier rule as actual implementation).

            [ALIAS, "c"]
    [SELECT]
        [PROPERTY-REF, "c.address.city"]
    [WHERE]
        [EQ]
            [PROPERTY-REF, "c.status"]
            [ENUM-LITERAL, "com.acme.crm.Status.ACTIVE"]

even though the parse tree *needs to be* (again, according to my understanding of Antlr) more like:

[QUERY]
    [SELECT]
        [DOT]
            [DOT]
                [IDENT, "c"]
                [IDENT, "address"]
            [IDENT, "city"]

What makes you think you need a tree here? The normal sequence (without rewrite) is:

ID
DOT
ID
DOT
ID

Easy to process in your following steps. Regardless of the representation you cannot say from the parse step if an ID DOT ID sequence is a schema.table or table.column reference. You can tell from a containing rule however (like the mentioned table_reference rule).



Steve Ebersole

unread,
Oct 30, 2014, 6:46:12 AM10/30/14
to antlr-di...@googlegroups.com

On Thursday, October 30, 2014 5:08:43 AM UTC-5, Mike Lischke wrote:
Steve,

as the author of a full new MySQL grammar (even though for ANTLR3) I had similar issues, but I wonder why you think there must be a (sub)tree for identifiers.

Not sure what you mean by "(sub)tree for identifiers" here.

SQL, in many ways, is much easier to parse.  (One of) the problem in the source language I deal with is that the syntax quite often cannot be inferred from the syntax.  For example, both the propertyRef and enumLiteral rules for me essentially resolve to a dot-ident sequence, so there is an ambiguity.  So for me, there needs to be semantic analysis/predicate.
Aside from "niceness" and structure? ;)  I have multiple things that will need to produce this tree.  Antlr parsing a String is just one producer.  So which is better from your perspective for an api across producers?  (1) add( SubTree ) or (2) add( List<Tree> ) ?  I know which I prefer.

David Whitten

unread,
Oct 30, 2014, 9:57:16 AM10/30/14
to antlr-di...@googlegroups.com
Jim,
could you point to a tutorial or description of how to do this?
I don't think I know what a double dispatch tree is.

Thanks,
David

Steve Ebersole

unread,
Oct 31, 2014, 3:28:03 AM10/31/14
to antlr-di...@googlegroups.com
Here is another rewrite situation from our current grammars that I have no idea how to map to Antlr 4.  We call it "soft keywords".  Essentially our grammar has no hard keywords.  We recognize them (in the lexer) as IDENT.  Its easiest to explain this via an example.  Take our "select clause" rule:

selectClause
    : selectKeyword^ distinctKeyword? rootSelectExpression
    ;

selectKeyword
: {(validateSoftKeyword("select"))}?=>  id=IDENTIFIER
-> SELECT[$id]
;

If I understand correctly (which I may not), in Antlr 4 I'd di essentially the same, except that I would not be able to do the "rewrite part".  So would I instead get listener/visitor calls here related to the selectKeyword rule when it matches (and not when it doesnt)?  And then I would leverage the selectKeyword rule listener/visitor call itself to apply the semantic.  Am I grokking that properly?


Also, what about validating semantic predicates?  I could find no reference to them in the Antlr 4 book, so I am assuming that they are also no longer supported?  Do we instead just define a semantic action on match which does the check and throws an exception ourselves?

Thanks for all the help so far


On Thursday, December 6, 2012 12:45:16 PM UTC-6, Steve Ebersole wrote:
Looking to upgrade our translators from Antlr 2 to Antlr 4 (yep coming out of the dark ages lol).  As background our translator translates between one query language (HQL / JPQL) into another query language (SQL) through a number of intermediate representations.

So I went ahead and bought the Antlr 4 book to help work through the migration.

One thing in particular, however, is still causing me some concerns and uncertainty.  The book says that "writing tree parsers is unnecessary" because Antlr4 tooling now generates listeners and visitors.  Sure I can see that when all passes simply walk the same tree.  But what about the case like I mentioned above where each pass needs to consume a different tree?

Take an example from the book (specifically The "Starter Project" from Chapter 3) where we look at a translator that takes a Java source file that defines a byte[] through an array initializer; the translation reads that and transforms it into a String:

<quote>
For example, we could transform:

static short[] data = {1,2,3};

into the following equivalent string with Unicode constants:

static String data = "\u0001\u0002\u0003"; // Java char are unsigned short
</quote>

This works in the book example because the transformation is the end result (it simply writes the "transformed result" to System.out). But imagine that this translation parse is part of a larger chain of parses.  So now, rather than just writing the translation result out to stdout we instead need to "write out" a mutated tree.  And lets further imagine that instead of a simple one-kind-of-assignment to another-kind-of-assignment transformation we instead want the resulting tree to be structurally different and that the subsequent passes need to walk that structurally different tree.  I just don't see how do do that without writing a tree parser.  Am I missing something?

And assuming I am not missing something...
(1) Does Antlr4 still have the capability of authoring tree parsers via grammar?
(2) Any pointers on writing such "tree transforming" parsers in Antlr4?  I have not yet read through the whole book, so I might very well just not gotten to it yet.

Thanks,
Steve

P.S.  I originally posted this to the (apparently old/defunct) antlr-interest mailing list, but Sam thankfully pointed me here.

Steve Ebersole

unread,
Oct 31, 2014, 3:45:31 AM10/31/14
to antlr-di...@googlegroups.com
Hey Gerald, I took a brief look again at your project and think I understand the basic gist.  I am having difficulty understanding the details of the 2 supported approaches though.  If I understand correctly, I think I want the "Converter pattern" approach.  It would be similar to designing my own "query model" separate from the Antlr parse model, and building/mutating that query model from my Antlr listener(s) as I walked the parse tree one or more times.  Am I understanding your "Converter pattern" approach?

From practicality/usability stand-point, would you be interested in a contribution exposing your project as a gradle plugin?  I can see making this part of the build, run just after the step of running the grammar through the Antlr tool (via https://github.com/melix/antlr4-gradle-plugin or similar)


On Friday, December 7, 2012 2:38:54 AM UTC-6, Gerald Rosenberg wrote:
Yes, Antlr 4 represents a rather fundamental design change.  It does not generate tree parsers, at least in the sense of your question.  Rather than direct tree modification, the evident intent is to progressively decorate an invariant tree with analysis products until a final generation phase can be executed. The ParseTreeProperty class can serve as the base for node annotations. The tradeoff for getting away from relatively brittle tree mutation is a proliferation of decorator objects. 

I have been documenting my understanding of Antlr 4 at https://github.com/GRosenberg/GenPackage . It is a customizable project generator that produces a fairly complete recognizer.  One interesting feature is that it can be rerun at any time on an existing project to generate missing project assets - it parses the Antlr generated listener class to identify new decorator objects to create.

Gerald Rosenberg

unread,
Oct 31, 2014, 7:19:17 PM10/31/14
to antlr-di...@googlegroups.com


On Friday, October 31, 2014 12:45:31 AM UTC-7, Steve Ebersole wrote:
Hey Gerald, I took a brief look again at your project and think I understand the basic gist.  I am having difficulty understanding the details of the 2 supported approaches though.  If I understand correctly, I think I want the "Converter pattern" approach.  It would be similar to designing my own "query model" separate from the Antlr parse model, and building/mutating that query model from my Antlr listener(s) as I walked the parse tree one or more times.  Am I understanding your "Converter pattern" approach?

The two patterns form one basic approach: procedurally, first walk the parse tree & decorate with node-typed 'descriptor' objects (standard decorator pattern); second walk the parse tree as many times (phases) as necessary to analyze the tree and update corresponding descriptor instances (progressive reification pattern) using phase carry data (symbol table, etc.) as appropriate to maintain continuity through and between phases. 

So, rather than doing a structural modification of an AST (in effect implementing the deductions of an AST walk), the intent is to collect and resolve deductions with results stored in the corresponding descriptor instances and any accumulated carry data.  So, for your soft keyword example, one (part of one) phase might work to check for possible keywords, set characterizing flags in the corresponding instance descriptors, and update a keyword index in the carry data.  Parse tree walks/phases are cheap/fast. Makes it easy to break down a complex analysis into manageable parts.

Hope that answers your first question.

GenProject automates generation of the descriptor objects based on the ParserBaseListener.java generated from the compilation of your g4 parser. It can be rerun to generate missing descriptors, reflecting any new rules in the parser.  Does not do anything to update/change existing descriptors or delete obsoletes.  So, running it after every g4 change is likely to build up a fair amount of cruft.  Other than that, sure.

Reply all
Reply to author
Forward
0 new messages