The general approach in ANTLR 4 is different from earlier versions. Rather than using rewrite rules and/or AST operators to explicitly create an AST of arbitrary shape, the basic idea is that grammars for ANTLR 4 should be structured so that the parse tree automatically derived from the grammar rules is already in the form you want your tree to have.
--
Sam Harwell
Owner, Lead Developer
http://www.antlr.org/wiki/display/~admin/2012/12/08/Tree+rewriting+in+ANTLR+v4
Any idea when these rewriting tools might be available? We've got a parser written in Antlr3, which uses rewrites similar to what Steve Ebersole described. As we extend the grammar, we'd like to consider switching to Antlr4. But without rewrites, switching will be a lot of extra work.
Thanks,
Tony Passera
--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Cheers
- Oliver
Or use the AST tree of Eclipse JDT if the target is a Java source instead of writing that target AST.
Folk,
addR returns[Integer r] : ^('+' a=INT b=INT) -> { $r = $a.int + $b.int};
user_defined_type_definition : CREATE TYPE user_defined_type_body ; /* TOKEN */
user_defined_type_body : fully_qualified_identifier ( subtype_clause )? ( AS representation )? ( user_defined_type_option_list )? ; /* TOKEN */
user_defined_type_option_list : user_defined_type_option ( user_defined_type_option )* ;
user_defined_type_option : (not relevant)
subtype_clause : (not relevant)
representation : predefined_type | member_list ;
member_list : (not relevant)
predefined_type
options{k=1;}
: character_string_type | numeric_type | (not relevant) ;
exact_numeric_type
options{k=1;}
: (not relevant) | INT | (not relevant)
Gentle-ANTLR-users,
I started an email, which turned into a blog entry:
http://www.antlr.org/wiki/display/~admin/2012/12/08/Tree+rewriting+in+ANTLR+v4
I hope this helps. Unfortunately, I have no good support for Steve at the moment for tree rewriting. My hope is that what he actually wants is not tree rewriting ;) Please see the part where I describe how ANTLR itself works.
Ter
At least based on my understanding from Antlr 2/3, we need to essentially syntactically recognize the "dot sequences" and then semantically "resolve" them. Ideally, as an example, the "semantic tree" here is something like:
[QUERY]
  [FROM]
    [ENTITY, "com.acme.crm.Customer"]
    [ALIAS, "c"]
  [SELECT]
    [PROPERTY-REF, "c.address.city"]
  [WHERE]
    [EQ]
      [PROPERTY-REF, "c.status"]
      [ENUM-LITERAL, "com.acme.crm.Status.ACTIVE"]
even though the parse tree *needs to be* (again, according to my understanding of Antlr) more like:
[QUERY]
  [SELECT]
    [DOT]
      [DOT]
        [IDENT, "c"]
        [IDENT, "address"]
      [IDENT, "city"]
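The "recognize syntactically, then resolve semantically" step described above can be sketched in plain Java, independent of any ANTLR API. This is only an illustration under assumptions: the class name DotPathResolver, the alias and enum registries, and the [UNRESOLVED] fallback label are all hypothetical, and a real resolver would consult entity metadata rather than two small lookup tables.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: classify a dot sequence that the parser has already
// recognized as a plain list of identifiers.
final class DotPathResolver {
    private final Map<String, String> aliasToEntity = new HashMap<>();
    private final Set<String> knownEnums = new HashSet<>();

    void registerAlias(String alias, String entityName) {
        aliasToEntity.put(alias, entityName);
    }

    void registerEnum(String enumClassName) {
        knownEnums.add(enumClassName);
    }

    // Returns a node in the bracket notation used in this thread.
    // Assumes idents is non-empty.
    String resolve(List<String> idents) {
        String path = String.join(".", idents);
        // A path rooted at a known alias is a property reference.
        if (idents.size() > 1 && aliasToEntity.containsKey(idents.get(0))) {
            return "[PROPERTY-REF, \"" + path + "\"]";
        }
        // Otherwise, everything before the last dot may name a known enum class.
        int lastDot = path.lastIndexOf('.');
        if (lastDot > 0 && knownEnums.contains(path.substring(0, lastDot))) {
            return "[ENUM-LITERAL, \"" + path + "\"]";
        }
        return "[UNRESOLVED, \"" + path + "\"]"; // hypothetical fallback label
    }

    public static void main(String[] args) {
        DotPathResolver resolver = new DotPathResolver();
        resolver.registerAlias("c", "com.acme.crm.Customer");
        resolver.registerEnum("com.acme.crm.Status");
        System.out.println(resolver.resolve(Arrays.asList("c", "address", "city")));
        System.out.println(resolver.resolve(Arrays.asList("com", "acme", "crm", "Status", "ACTIVE")));
    }
}
```

The point of the sketch is that the parse tree can keep its nested DOT shape while the resolved form lives in whatever structure the later passes consume.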
Steve, as the author of a full new MySQL grammar (even though for ANTLR3), I had similar issues, but I wonder why you think there must be a (sub)tree for identifiers.
Looking to upgrade our translators from Antlr 2 to Antlr 4 (yep, coming out of the dark ages lol). As background, our translator translates from one query language (HQL / JPQL) into another query language (SQL) through a number of intermediate representations.
So I went ahead and bought the Antlr 4 book to help work through the migration.
One thing in particular, however, is still causing me some concern and uncertainty. The book says that "writing tree parsers is unnecessary" because Antlr4 tooling now generates listeners and visitors. Sure, I can see that when all passes simply walk the same tree. But what about the case like I mentioned above where each pass needs to consume a different tree?
Take an example from the book (specifically the "Starter Project" from Chapter 3) where we look at a translator that takes a Java source file that defines a short[] through an array initializer; the translation reads that and transforms it into a String:
<quote>
For example, we could transform:
static short[] data = {1,2,3};
into the following equivalent string with Unicode constants:
static String data = "\u0001\u0002\u0003"; // Java char are unsigned short
</quote>
This works in the book example because the transformation is the end result (it simply writes the "transformed result" to System.out). But imagine that this translation parse is part of a larger chain of parses. So now, rather than just writing the translation result out to stdout, we instead need to "write out" a mutated tree. And let's further imagine that instead of a simple one-kind-of-assignment to another-kind-of-assignment transformation, we want the resulting tree to be structurally different, and that the subsequent passes need to walk that structurally different tree. I just don't see how to do that without writing a tree parser. Am I missing something?
And assuming I am not missing something...
(1) Does Antlr4 still have the capability of authoring tree parsers via grammar?
(2) Any pointers on writing such "tree transforming" parsers in Antlr4? I have not yet read through the whole book, so I might very well just not have gotten to it yet.
Thanks,
Steve
P.S. I originally posted this to the (apparently old/defunct) antlr-interest mailing list, but Sam thankfully pointed me here.
Yes, Antlr 4 represents a rather fundamental design change. It does not generate tree parsers, at least in the sense of your question. Rather than direct tree modification, the evident intent is to progressively decorate an invariant tree with analysis products until a final generation phase can be executed. The ParseTreeProperty class can serve as the base for node annotations. The tradeoff for getting away from relatively brittle tree mutation is a proliferation of decorator objects.
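To make the "decorate an invariant tree" idea concrete, here is a minimal sketch that runs without the ANTLR runtime. The assumptions: TreeNode is a hypothetical stand-in for a parse-tree node, and an IdentityHashMap plays the role that ParseTreeProperty plays in ANTLR 4 (a side table keyed by node identity). The names TreeAnnotator, evaluate, and valueOf are made up for illustration.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Annotates an unmodified tree through a side table instead of rewriting it.
final class TreeAnnotator {
    // Keyed by node identity, standing in for a ParseTreeProperty<Integer>.
    private final Map<TreeNode, Integer> values = new IdentityHashMap<>();

    // Post-order pass: leaves hold integers, inner nodes sum their children.
    int evaluate(TreeNode n) {
        int v = 0;
        if (n.children.isEmpty()) {
            v = Integer.parseInt(n.text);
        } else {
            for (TreeNode c : n.children) {
                v += evaluate(c);
            }
        }
        values.put(n, v); // decorate the node; the tree itself is untouched
        return v;
    }

    // A later pass reads the annotation; null means the node was never visited.
    Integer valueOf(TreeNode n) {
        return values.get(n);
    }

    public static void main(String[] args) {
        TreeNode inner = new TreeNode("+", new TreeNode("2"), new TreeNode("3"));
        TreeNode root = new TreeNode("+", new TreeNode("1"), inner);
        TreeAnnotator pass = new TreeAnnotator();
        pass.evaluate(root);
        System.out.println(pass.valueOf(inner) + " " + pass.valueOf(root));
    }
}

// Hypothetical stand-in for a parse-tree node (not an ANTLR class).
final class TreeNode {
    final String text;
    final List<TreeNode> children;

    TreeNode(String text, TreeNode... children) {
        this.text = text;
        this.children = Arrays.asList(children);
    }
}
```

Each analysis phase would own one such table, which is where the proliferation of decorator objects mentioned above comes from.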
I have been documenting my understanding of Antlr 4 at https://github.com/GRosenberg/GenPackage . It is a customizable project generator that produces a fairly complete recognizer. One interesting feature is that it can be rerun at any time on an existing project to generate missing project assets - it parses the Antlr generated listener class to identify new decorator objects to create.
Hey Gerald, I took a brief look again at your project and think I understand the basic gist. I am having difficulty understanding the details of the 2 supported approaches though. If I understand correctly, I think I want the "Converter pattern" approach. It would be similar to designing my own "query model" separate from the Antlr parse model, and building/mutating that query model from my Antlr listener(s) as I walked the parse tree one or more times. Am I understanding your "Converter pattern" approach?
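The idea described here (a query model kept separate from the parse tree and populated from a listener during a walk) might look roughly like the sketch below. Whether this matches GenPackage's "Converter pattern" is not confirmed anywhere in this thread. Everything in the code is hypothetical: ParseNode, NodeListener, and Walker only imitate the shape of ANTLR's parse tree, generated listener, and tree walker so that the example runs without the ANTLR runtime, and the rule names are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConverterDemo {
    // Builds a tiny hand-made "parse tree" for: select c.address.city from Customer c
    static ParseNode sampleTree() {
        ParseNode from = new ParseNode("fromClause", "",
                new ParseNode("entityName", "com.acme.crm.Customer"),
                new ParseNode("alias", "c"));
        ParseNode select = new ParseNode("selectClause", "",
                new ParseNode("selectPath", "c.address.city"));
        return new ParseNode("query", "", from, select);
    }

    public static void main(String[] args) {
        QueryModelBuilder builder = new QueryModelBuilder();
        Walker.walk(builder, sampleTree());
        System.out.println(builder.model);
    }
}

// Hypothetical stand-in for a parse-tree node.
final class ParseNode {
    final String rule;
    final String text;
    final List<ParseNode> children;

    ParseNode(String rule, String text, ParseNode... children) {
        this.rule = rule;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical listener, shaped like a generated enter/exit listener.
interface NodeListener {
    void enter(ParseNode n);
    void exit(ParseNode n);
}

// Depth-first walk that fires enter before the children and exit after them.
final class Walker {
    static void walk(NodeListener listener, ParseNode n) {
        listener.enter(n);
        for (ParseNode c : n.children) {
            walk(listener, c);
        }
        listener.exit(n);
    }
}

// The separate query model that later passes would consume.
final class QueryModel {
    String entity;
    String alias;
    final List<String> selections = new ArrayList<>();

    @Override
    public String toString() {
        return "SELECT " + selections + " FROM " + entity + " AS " + alias;
    }
}

// Listener that fills the query model and leaves the parse tree alone.
final class QueryModelBuilder implements NodeListener {
    final QueryModel model = new QueryModel();

    @Override
    public void enter(ParseNode n) {
        switch (n.rule) {
            case "entityName": model.entity = n.text; break;
            case "alias": model.alias = n.text; break;
            case "selectPath": model.selections.add(n.text); break;
            default: break; // structural nodes carry no data of their own
        }
    }

    @Override
    public void exit(ParseNode n) {
        // nothing to do on exit in this sketch
    }
}
```

Subsequent passes would then operate on QueryModel, which can be as structurally different from the parse tree as needed.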