--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Following up on my initial discussion, I am still having problems understanding the practical steps to performing transformations with Antlr 4. Antlr 4 does not support tree re-writing, so I get that that will not be an option. The 2 approaches that get mentioned for use with Antlr 4 include:
- "decoration" - I understand the decorator pattern in principle. However, I have no idea how to apply a decoration to the parse tree nodes in Antlr 4. I have seen mention (here, for example) that decorations are covered in the Antlr 4 book, but I could find no reference to "decorat" in a search of my copy.
- "symbol tables" - I understand the concept of building up symbol tables in theory. But I just don't see how this helps in my cases.
Remember, walking the parse tree is very cheap, and you can strategically act on the visitor enter and/or exit of just the nodes of current interest.
First walk, decorate the tree with your node-type specific objects and do basic decorator object init stuff (whatever you need).
Second walk, gather your alias definitions from the clauses where aliases are defined.
Third walk, on visitor exit from each node, carry up any meaningful data to a decorator object of your choice. When you get to a field that matches the alias, resolve against the symbol table to pull in the association.
Fourth walk, whatever your specific requirements are.
At this point you will likely have filled out decorator objects for the primary statement nodes, query, select, from, etc., each with fields with the appropriate characterizing data.
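To make the "decorate, then carry up" walks a bit more concrete: in Antlr 4 a decoration is normally kept in a side table keyed by node identity rather than stored on the node itself, and the runtime's ParseTreeProperty class is a thin wrapper over an IdentityHashMap for exactly that purpose. The sketch below mimics the idea without the Antlr runtime so that it runs standalone. Node, Decor, and DecorationSketch are hypothetical stand-ins (for generated context classes and for your own decorator types), not anything Antlr generates.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parse tree node (a generated *Context class in real Antlr 4 code).
class Node {
    final String rule;                       // rule name, e.g. "selectClause"
    final String text;                       // token text for leaves, null otherwise
    final List<Node> children = new ArrayList<>();

    Node(String rule, String text, Node... kids) {
        this.rule = rule;
        this.text = text;
        for (Node k : kids) children.add(k);
    }
}

// Hypothetical decorator object: whatever data you want to associate with a node.
class Decor {
    final List<String> identifiers = new ArrayList<>();
}

class DecorationSketch {
    // Side table keyed by node identity; the tree itself is never modified.
    static final Map<Node, Decor> decor = new IdentityHashMap<>();

    // First walk: attach an empty decorator to every node.
    static void decorate(Node n) {
        decor.put(n, new Decor());
        for (Node c : n.children) decorate(c);
    }

    // Later walk: after visiting each child, copy its identifiers into the parent's decorator.
    static void carryUp(Node n) {
        for (Node c : n.children) {
            carryUp(c);
            decor.get(n).identifiers.addAll(decor.get(c).identifiers);
        }
        if (n.text != null) decor.get(n).identifiers.add(n.text);
    }

    static Node leaf(String text) { return new Node("IDENT", text); }

    // Roughly: select c.headquarters.state.code from Customer c
    static Node sampleTree() {
        Node path = new Node("path", null,
                leaf("c"), leaf("headquarters"), leaf("state"), leaf("code"));
        Node select = new Node("selectClause", null, path);
        Node from = new Node("fromClause", null, leaf("Customer"), leaf("c"));
        return new Node("query", null, select, from);
    }

    public static void main(String[] args) {
        Node tree = sampleTree();
        decorate(tree);
        carryUp(tree);
        System.out.println(decor.get(tree).identifiers);
    }
}
```

With the real runtime, the map would presumably be a ParseTreeProperty keyed by the context objects, and the two methods would be separate listener or visitor passes.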
For analysis, rather than walking an AST and looking at the type of node you are on, walk the parse tree and look at what the decorator then says about the node. How semantic the parse tree is depends solely on how semantic you make the decorators. The content of the parse tree and an AST derived from the same source is always going to be the same.
[QUERY][SELECT][DOT][DOT][DOT][IDENT, "c"][IDENT, "headquarters"][IDENT, "state"][IDENT, "code"][FROM][SPACE][SPACE_ROOT][IDENT, "Customer"][IDENT, "c"]
Don't mean to be snarky, but I am sure you can build a richer parse tree than this to start from.
I can understand this in principle. Of course devil's in the details. And having never built a parser this way, it's daunting looking up that hill.
If you are looking for assurances, then quite clearly, yes, translation from HQL/JPA to SQL is exactly the kind of problem Antlr4 is designed for. Given the HQL/JPA/SQL specs, there is no real ambiguity in the translation -- conceptually it is not too far off from being a 1-to-1 translation, but a pain because the differences are at such a low-level.
Your question then is really just about the mechanics. The project is no doubt large, but you are looking at just a handful of techniques to be mastered and then applied many times over. Ter has written a number of blog posts on Antlr.org about the preference of using a parse tree over an AST. Read them, or read them again - the simple takeaway is that ASTs become brittle under progressive transformations - they are in all meaningful ways self-modifying code. In a project as large as this, the stability that an immutable parse tree gives far outweighs any perceived benefit of re-arranging the AST.
FWIW, in hindsight, I am not particularly fond of parse trees. Hopefully, Antlr5 will allow the parser grammar to directly define key aspects of the form and content of the tree produced. But still an immutable tree once produced, with generated listeners & visitors as now provided. Having just the one level of transformation could significantly simplify analysis. More than one, in purely practical terms, is an anti-pattern.
So, for now, use the tool as designed. The structured parse-tree driven approach it most directly supports will be, if not enthusiastically, welcomed in the end.
The individual parse tree nodes are actually as 'semantic' as any AST nodes that Antlr3 would produce. In the generated parser, each parser rule is represented by its own 'context' class. The queryExpression rule will be represented by a QueryExpressionContext class containing lists of and references to QuerySpecContext, Union_keyContext, Intersect_keyContext, Except_keyContext, and All_keyContext objects.
Take another look at the generated parser in this light -- you should recognize that each instance of IDENTIFIER will occur in a sub-branch of the parse tree well-characterized by the series of parent contexts that connect it to the tree. An IDENTIFIER instance referenced from a parent instance of type AliasReferenceContext must represent an alias.
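As a rough illustration of reading an identifier's role off its parent context, here is a standalone sketch. Ctx, AliasReferenceCtx, AttributeReferenceCtx, and IdentLeaf are hypothetical stand-ins for what the generated parser would provide; the real class names depend on the rule names in your grammar, and in real code the check would be made against the node's actual parent context.

```java
// Hypothetical stand-ins for generated context classes; real ones extend ParserRuleContext.
abstract class Ctx {
    Ctx parent;

    // Attach a child to this context and return the child for convenience.
    <T extends Ctx> T adopt(T child) {
        child.parent = this;
        return child;
    }
}

class AliasReferenceCtx extends Ctx {}

class AttributeReferenceCtx extends Ctx {}

class IdentLeaf extends Ctx {
    final String text;

    IdentLeaf(String text) { this.text = text; }
}

class ParentContextSketch {
    // The token type is the same in both cases; only the parent context differs.
    static String roleOf(IdentLeaf id) {
        if (id.parent instanceof AliasReferenceCtx) return "alias";
        if (id.parent instanceof AttributeReferenceCtx) return "attribute";
        return "unknown";
    }

    public static void main(String[] args) {
        IdentLeaf c = new AliasReferenceCtx().adopt(new IdentLeaf("c"));
        IdentLeaf state = new AttributeReferenceCtx().adopt(new IdentLeaf("state"));
        System.out.println(c.text + " is an " + roleOf(c));
        System.out.println(state.text + " is an " + roleOf(state));
    }
}
```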
Now, the last aspect of your question -- where to begin -- is simply with the grammar. Looks like it is a conversion from the Antlr2 grammar. It is a good start, but there are better ways of handling things, such as key words, that will make the grammar produce a simpler and more helpful parse tree for analysis.
For example, the @members methods could be removed, with the *_key rules being simplified to explicit elements:
all_key: ALL IDENTIFIER ;
You can add labels that will become fields in the context class, which can make it a bit easier to access the discrete elements of a context class:
all_key: k=ALL id=IDENTIFIER ;
In the lexer, the tokens block could be removed with corresponding token rules being defined. The Antlr4 lexer offers 'modes' that allow well-defined subsets of tokens to be recognized. For example, many of the tokens are only valid between a 'select' and 'from'. Not sure it is appropriate for a mode, but worth considering.
Tree transformations *are* in fact super useful however in some circumstances, whether it’s an AST or parse tree. When you have things that go from language X to language X, there are lots of simplifications and identity transformations that make sense such as “<expr> + 0” -> “<expr>”.
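A minimal sketch of that kind of X-to-X simplification, assuming a tiny hand-written expression tree rather than anything Antlr generates (Expr, Num, Var, Add, and SimplifySketch are all hypothetical names). It only handles the one identity mentioned, and it returns a new tree instead of editing the old one in place.

```java
// Hypothetical immutable expression nodes, used only for illustration.
abstract class Expr {}

final class Num extends Expr {
    final int value;

    Num(int value) { this.value = value; }

    @Override public String toString() { return Integer.toString(value); }
}

final class Var extends Expr {
    final String name;

    Var(String name) { this.name = name; }

    @Override public String toString() { return name; }
}

final class Add extends Expr {
    final Expr left;
    final Expr right;

    Add(Expr left, Expr right) { this.left = left; this.right = right; }

    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}

class SimplifySketch {
    static boolean isZero(Expr e) {
        return e instanceof Num && ((Num) e).value == 0;
    }

    // Rewrites "<expr> + 0" to "<expr>", bottom-up. The input tree is never modified.
    static Expr simplify(Expr e) {
        if (!(e instanceof Add)) return e;
        Add a = (Add) e;
        Expr l = simplify(a.left);
        Expr r = simplify(a.right);
        if (isZero(r)) return l;
        return new Add(l, r);
    }

    public static void main(String[] args) {
        Expr e = new Add(new Add(new Var("x"), new Num(0)), new Num(0));
        System.out.println(e + " simplifies to " + simplify(e));
    }
}
```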
On Wed, Nov 19, 2014 at 10:31 PM, Gerald Rosenberg <gbrose...@gmail.com> wrote:
When I said semantic before, I meant in terms of control over the produced output (with gated semantic predicates, etc). Specifically in terms of producing differing sub-trees or nodes in one type of token (IDENT, e.g.) based on where/how it occurs (aliasReference vs. attributeReference ...). Maybe I am not using parsing vocab correctly; that is definitely possible :)
'c' is an "alias reference". But using the tools I know from my work with Antlr 2/3, I cannot know that during parse because it also happens to be a forward reference. In Antlr 4 I can do this in the visitors by subsequently walking the parse tree multiple times, starting with the fromClause. After that I know all the possible alias references.
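The forward-reference problem reduces to two passes: collect the aliases from the from clause first, then classify the select paths against that table. Below is a standalone sketch of just that ordering, assuming the from clause has already been boiled down to (entity, alias) pairs and the select paths to dotted strings. AliasResolutionSketch and its methods are hypothetical; in real Antlr 4 code each method would be a listener or visitor pass over the parse tree.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AliasResolutionSketch {
    // First walk: build the symbol table from from-clause entries such as {"Customer", "c"}.
    static Map<String, String> collectAliases(List<String[]> fromEntries) {
        Map<String, String> table = new HashMap<>();
        for (String[] entry : fromEntries) {
            table.put(entry[1], entry[0]);   // alias -> entity name
        }
        return table;
    }

    // Second walk: classify the root of a dotted select path against the symbol table.
    static String resolveRoot(String path, Map<String, String> aliases) {
        String root = path.split("\\.")[0];
        String entity = aliases.get(root);
        return entity != null ? "alias of " + entity : "attribute";
    }

    public static void main(String[] args) {
        // Roughly: select c.headquarters.state.code from Customer c
        Map<String, String> aliases =
                collectAliases(Collections.singletonList(new String[] {"Customer", "c"}));
        System.out.println(resolveRoot("c.headquarters.state.code", aliases));
    }
}
```

Because the table is complete before any select path is examined, it does not matter that 'c' is used before the point in the source where it is declared.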
But to be honest, this is exactly the kind of thing I am talking about with regard to the learning curve and it being daunting: designing the process from the ground up, because that design assumes knowledge of the end result.
You misunderstand the intent of these *_key rules. They are meant to handle keywords used as not-keywords.
Hi. I don’t grok your notation / what your goal is below.
> A++ ==> expr op -> op expr+