Tagging generated parse tree classes as implementing interface


Code Query

Feb 18, 2020, 2:58:19 PM
to antlr-discussion
Dear ANTLRs,

This must have been asked before, but I failed to find it :(

I use a parse tree visitor to "interpret" an ANTLR generated parse tree. I very often find myself wanting to do something like "get the Identifier from all children that have an Identifier" (or something similar; type annotation, expression, etc). Consider the following example grammar fragment:

program : '{' (ss+=statement (';' ss+=statement)*)? '}' ;

statement
  : 'fun' Identifier '(' argList ')' funBody #funStmt
  | 'var' Identifier '=' expr                #varStmt
  | 'prn' expr                               #prnStmt
  ;

I typically have to write code like this in my visitor:

@Override
public List<String> visitProgram(InterfaceDemoParser.ProgramContext ctx) {
  List<String> result = new ArrayList<>();
  for (InterfaceDemoParser.StatementContext stmt : ctx.ss) {
    if (stmt instanceof InterfaceDemoParser.FunStmtContext) {
      result.add(((InterfaceDemoParser.FunStmtContext) stmt).Identifier().getText());
    } else if (stmt instanceof InterfaceDemoParser.VarStmtContext) {
      result.add(((InterfaceDemoParser.VarStmtContext) stmt).Identifier().getText());
    }
  }
  return result;
}

Of course, there often are many more than two cases to distinguish; the point is that some statements have an Identifier that I'm interested in and some don't.

I would like to define an interface, say HasIdentifier, and let the generated parse tree classes FunStmtContext and VarStmtContext both explicitly say "implements HasIdentifier" so that I only have to do one instance check.
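To make the wish concrete, here is a rough sketch of what the visitor loop could look like if the context classes did implement such an interface. Everything in it is hand-written for illustration: HasIdentifier and its identifierText() method are hypothetical, and the *Context classes are simplified stand-ins, not real ANTLR output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface; ANTLR does not generate anything like this by itself.
interface HasIdentifier {
    String identifierText();
}

// Simplified stand-ins for the generated context classes (assumption, not ANTLR output).
abstract class StatementContext {}

class FunStmtContext extends StatementContext implements HasIdentifier {
    private final String name;
    FunStmtContext(String name) { this.name = name; }
    @Override public String identifierText() { return name; }
}

class VarStmtContext extends StatementContext implements HasIdentifier {
    private final String name;
    VarStmtContext(String name) { this.name = name; }
    @Override public String identifierText() { return name; }
}

// A statement without an Identifier, like the 'prn' alternative.
class PrnStmtContext extends StatementContext {}

class HasIdentifierSketch {
    // One instance check replaces the per-alternative instanceof chain.
    static List<String> identifiers(List<? extends StatementContext> statements) {
        List<String> result = new ArrayList<>();
        for (StatementContext stmt : statements) {
            if (stmt instanceof HasIdentifier) {
                result.add(((HasIdentifier) stmt).identifierText());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<StatementContext> ss = Arrays.asList(
                new FunStmtContext("f"), new PrnStmtContext(), new VarStmtContext("x"));
        System.out.println(identifiers(ss)); // prints [f, x]
    }
}
```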

Is there a way to create such interfaces and let the generated parse tree classes implement them?

Is the "default" way to do this, to write another visitor that only gets the interface of a parse tree node (and null for all those that don't have one)? If so, is there a straightforward way of generating a BaseVisitor for that which returns null by default instead of calling visitChildren?
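For what it's worth, the "returns null by default" part can probably be had without generating anything: the generated BaseVisitor methods all delegate to visitChildren, and AbstractParseTreeVisitor exposes both visitChildren and defaultResult, so a subclass can override visitChildren to return defaultResult() without descending. The sketch below shows that shape using small stand-in classes (Ctx, BaseVisitor and the statement classes are simplified assumptions that only mimic the generated API, so that the example runs without the ANTLR runtime).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for a parse tree node (assumption: mimics ParserRuleContext very loosely).
abstract class Ctx {
    final List<Ctx> children = new ArrayList<>();
    abstract <T> T accept(BaseVisitor<T> visitor);
}

class FunStmt extends Ctx {
    final String id;
    FunStmt(String id) { this.id = id; }
    @Override <T> T accept(BaseVisitor<T> visitor) { return visitor.visitFunStmt(this); }
}

class VarStmt extends Ctx {
    final String id;
    VarStmt(String id) { this.id = id; }
    @Override <T> T accept(BaseVisitor<T> visitor) { return visitor.visitVarStmt(this); }
}

class PrnStmt extends Ctx {
    @Override <T> T accept(BaseVisitor<T> visitor) { return visitor.visitPrnStmt(this); }
}

// Stand-in for the generated base visitor: every visitX falls back to visitChildren.
class BaseVisitor<T> {
    T defaultResult() { return null; }
    T visitChildren(Ctx node) {
        T result = defaultResult();
        for (Ctx child : node.children) { result = child.accept(this); }
        return result;
    }
    T visitFunStmt(FunStmt ctx) { return visitChildren(ctx); }
    T visitVarStmt(VarStmt ctx) { return visitChildren(ctx); }
    T visitPrnStmt(PrnStmt ctx) { return visitChildren(ctx); }
}

// Returns the identifier for nodes that have one and null for everything else.
class IdentifierVisitor extends BaseVisitor<String> {
    @Override String visitChildren(Ctx node) { return defaultResult(); } // do not descend
    @Override String visitFunStmt(FunStmt ctx) { return ctx.id; }
    @Override String visitVarStmt(VarStmt ctx) { return ctx.id; }
}

class NullDefaultVisitorSketch {
    static List<String> identifiers(List<? extends Ctx> statements) {
        IdentifierVisitor visitor = new IdentifierVisitor();
        List<String> result = new ArrayList<>();
        for (Ctx stmt : statements) {
            String id = stmt.accept(visitor);
            if (id != null) { result.add(id); }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Ctx> ss = Arrays.asList(new FunStmt("f"), new PrnStmt(), new VarStmt("x"));
        System.out.println(identifiers(ss)); // prints [f, x]
    }
}
```

Note that such a visitor never walks the tree on its own, so it has to be applied to each node of interest directly, as the loop in identifiers does.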

Regards,
Joe




Duane Griffin

Feb 19, 2020, 6:03:26 PM
to antlr-di...@googlegroups.com
Hi Joe,

The ParserRuleContext::getRuleContexts method is probably what you are
looking for. Assuming you're doing this more than once, and hence want
a helper to do it in a generic fashion, something like this:

public List<String> getChildTextFromRules(
    Collection<? extends ParserRuleContext> rules,
    Class<? extends ParserRuleContext> childType) {
  return rules
      .stream()
      .flatMap(stmt -> stmt.getRuleContexts(childType).stream())
      .map(ParserRuleContext::getText)
      .collect(Collectors.toList());
}

(Or write it using for loops instead of streams, if you prefer.)

Then your example visitor method body just becomes:

return getChildTextFromRules(ctx.ss, InterfaceDemoParser.IdentifierContext.class);

Cheers,
Duane.
> --
> You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/b9c88750-51cc-4435-9d99-ca891a59ba21%40googlegroups.com.



--
"I never could learn to drink that blood and call it wine" - Bob Dylan

Code Query

Feb 20, 2020, 4:54:12 AM
to antlr-discussion
Dear Duane,

Thanks for that tip; it looked promising. I had glossed over that function because it didn't appear to be considered worthy of documentation (either in the API docs or in The Definitive ANTLR 4 Reference). I didn't know, for example, whether it was shallow or recursive. Looking up the source, it turns out to be shallow (it only inspects direct children), so it's a "generalisation" of getChild; getAllChildren, if you will. Not quite, actually, because its type parameter has an upper bound of ParserRuleContext (whereas getChild "only" bounds it at ParseTree).
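To spell out the shallow versus recursive distinction, here is a small sketch on a stand-in tree. Node and IdNode are made-up classes, not ANTLR types. childrenOfType mirrors the direct-children-only behaviour described above, and descendantsOfType is a hypothetical recursive counterpart that, as far as I can tell, ParserRuleContext does not offer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in tree node (assumption: not an ANTLR class).
class Node {
    final String text;
    final List<Node> children;
    Node(String text, Node... children) {
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Stand-in for "a child of the type we are looking for".
class IdNode extends Node {
    IdNode(String text) { super(text); }
}

class ShallowVsDeep {
    // Shallow: only direct children are inspected.
    static <T extends Node> List<T> childrenOfType(Node parent, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Node child : parent.children) {
            if (type.isInstance(child)) { out.add(type.cast(child)); }
        }
        return out;
    }

    // Recursive: the whole subtree is inspected, in pre-order.
    static <T extends Node> List<T> descendantsOfType(Node parent, Class<T> type) {
        List<T> out = new ArrayList<>();
        for (Node child : parent.children) {
            if (type.isInstance(child)) { out.add(type.cast(child)); }
            out.addAll(descendantsOfType(child, type));
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("root", new IdNode("a"), new Node("inner", new IdNode("b")));
        System.out.println(childrenOfType(root, IdNode.class).size());    // prints 1
        System.out.println(descendantsOfType(root, IdNode.class).size()); // prints 2
    }
}
```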

My example was poorly chosen. It left out two things:
  1. When a rule has multiple Identifiers (or multiple occurrences of any other rule or terminal), I still can't grab the one I want without knowing its index in that (sub)clause.
  2. Sometimes contexts share the same purpose but have different types. This is typically solved by labelling things, but then I still can't "group" the ParserRuleContexts representing the subclauses by letting them implement an interface (as mentioned before; the problem with labels is that they translate to member fields, not methods, so interfaces alone wouldn't solve my issue yet). Consider this - yet again woefully synthetic - example:

constantExpression
  : Identifier '[' subscript=integerConstantExpression ']' # constantArrayDereference
  | Identifier '{' subscript=stringLiteral '}'             # constantStringMapDereference
  | stringLiteral
  | integerConstantExpression
  ;

I would love to have a way of saying that both ConstantArrayDereferenceContext and ConstantStringMapDereferenceContext implement HasSubscript. Using getRuleContexts would now require multiple passes to get both the IntegerConstantExpressionContexts and the StringLiteralContexts. Not only that, but it would also fail to distinguish the subscript cases from the "naked" expression cases.

I guess both of these problems could be solved by introducing intermediate nodes, i.e.

statement
  : 'fun' stmtName '(' argList ')' funBody #funStmt
  | 'var' stmtName '=' expr                #varStmt
  | 'prn' expr                             #prnStmt
  ;

stmtName
  : Identifier
  ;

constantExpression
  : Identifier '[' constantSubscriptExpression ']' # constantArrayDereference
  | Identifier '{' constantSubscriptExpression '}' # constantStringMapDereference
  | stringLiteral
  | integerConstantExpression
  ;

constantSubscriptExpression
  : integerConstantExpression
  | stringLiteral
  ;


Unfortunately, I'm not the sole owner of the grammar and other users are (reasonably) concerned about overhead of parse tree node count growth.

Any further thoughts would be greatly appreciated!

Regards,
Joe

Duane Griffin

Feb 20, 2020, 8:27:02 PM
to antlr-di...@googlegroups.com
Hmm, I see! That does complicate matters.

I'm afraid I don't have any very good suggestions for you: your
thought to add intermediate nodes would be what I would try first, but
of course if the overhead is unacceptable then c'est la vie. Perhaps
it would be worth trying and measuring?

The only other option I can think of is rather nasty: you could always
use reflection to look for the relevant member fields. The gory
details could all be hidden in helpers, keeping your application logic
nice and clean. However, you would of course have all the usual
downsides of using reflection. In particular, the performance hit might
be quite severe, depending on what your visitor is for.
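A minimal sketch of what such a helper might look like, assuming the label ends up as a public field named subscript on the generated context class. The classes below are hand-written stand-ins (the real field types would be the generated context types rather than String), and subscriptOf is a made-up helper name.

```java
import java.lang.reflect.Field;
import java.util.Optional;

// Stand-ins for generated contexts that carry a labelled 'subscript' field (assumption).
class ConstantArrayDereferenceContext {
    public String subscript;
    ConstantArrayDereferenceContext(String subscript) { this.subscript = subscript; }
}

class ConstantStringMapDereferenceContext {
    public String subscript;
    ConstantStringMapDereferenceContext(String subscript) { this.subscript = subscript; }
}

// Stand-in for a "naked" alternative without a subscript label.
class StringLiteralContext {}

class SubscriptLookup {
    // Returns the value of a public field called "subscript", if the object has one.
    static Optional<Object> subscriptOf(Object ctx) {
        try {
            Field field = ctx.getClass().getField("subscript");
            return Optional.ofNullable(field.get(ctx));
        } catch (NoSuchFieldException e) {
            return Optional.empty(); // this context type has no such label
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(subscriptOf(new ConstantArrayDereferenceContext("1")).isPresent()); // prints true
        System.out.println(subscriptOf(new StringLiteralContext()).isPresent());               // prints false
    }
}
```

If the lookup is on a hot path, caching the Field per class (for example in a map keyed by Class) would likely take some of the sting out of the reflection cost, though that is worth measuring rather than assuming.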

Good luck!

Cheers,
Duane.