Decoupling grammars in conflict with the need for semantic predicates


Greg D

Oct 23, 2013, 2:39:58 PM
to antlr-di...@googlegroups.com
There seems to be a natural tension between objectives discussed in Chapter 7, "Decoupling Grammars from Application-Specific Code" and Chapter 11 "Altering the Parse with Semantic Predicates".

In the parser, most of the objectives of decoupling grammars may be achieved by using the visitor class. The exception seems to be ambiguities at parse time, such as those described in 11.3's sub-chapter "Properly Recognizing T(i) in C++".

In the lexer, however, there seems to be no equivalent to the power of the visitor class. Resolving ambiguities may be done with modes for simple problems. There must be enough context to switch modes, and the scope must be narrow enough to avoid recasting too many rules. For tougher problems, the only alternative I can see is semantic predicates, which couples the grammar to the target language.

Have I missed a good ANTLR idiom that provides the equivalent of lexer semantic predicates, but decoupled from the target language?

If not, I found myself wishing for some additional constructs for the lexer rules. My take at a small set of powerful constructs would be:
  • zero(ID)
  • incr(ID)
  • decr(ID)
  • isZero(ID)?
  • isNonZero(ID)?
  • isNegative(ID)?
  • isPositive(ID)?
The default for any unassigned ID would be 0. The target language would be able to manipulate the IDs behind the predicates, e.g. in Java.
In GrammarNameLexer.java, the declaration provided by the enhanced ANTLR:
public static final Map<String, Integer> lexerIds = new HashMap<String, Integer>();
could be used in the application TestGrammarName.java:
GrammarNameLexer.lexerIds.put("java5", 1);
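
To make the intended semantics of the seven constructs concrete, here is a minimal sketch in plain Java. The class name LexerCounters and its put/get methods are hypothetical, invented for illustration; nothing like this exists in the ANTLR runtime. It only assumes what is stated above: IDs are integer counters and an unassigned ID reads as 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the ANTLR runtime. */
final class LexerCounters {
    private final Map<String, Integer> ids = new HashMap<>();

    /** An unassigned ID reads as 0, matching the proposed default. */
    int get(String id) { return ids.getOrDefault(id, 0); }

    /** Lets the application preset an ID, e.g. put("java5", 1). */
    void put(String id, int value) { ids.put(id, value); }

    // Proposed actions.
    void zero(String id) { ids.put(id, 0); }
    void incr(String id) { ids.merge(id, 1, Integer::sum); }   // absent key becomes 1
    void decr(String id) { ids.merge(id, -1, Integer::sum); }  // absent key becomes -1

    // Proposed predicates.
    boolean isZero(String id)     { return get(id) == 0; }
    boolean isNonZero(String id)  { return get(id) != 0; }
    boolean isNegative(String id) { return get(id) < 0; }
    boolean isPositive(String id) { return get(id) > 0; }
}
```

Under this sketch, an application would call put("java5", 1) before lexing, and a rule guarded by isNonZero(java5)? would then be allowed to match.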

With these constructs, the examples from the book could be recast as:
  • predicates/Enum2.g4
ENUM: 'enum' -> isNonZero(java5)? ; // must be before ID
ID
: [a-zA-Z]+ ;
  • lexmagic/SimplePy.g4
/** Nested newlines within (..) or [..] are ignored. */
IGNORE_NEWLINE
: '\r'? '\n' -> isNonZero(nesting)?, skip
;
LPAREN
: '(' -> incr(nesting) ;
RPAREN
: ')' -> decr(nesting) ;
LBRACK
: '[' -> incr(nesting) ;
RBRACK
: ']' -> decr(nesting) ;
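
The behavior these rules are meant to produce can be sketched as a small hand-written scan in plain Java. This is only an illustration of the counter logic, not generated ANTLR code: the class NestingSketch, the method tokens, and the NEWLINE token name for an unnested newline are assumptions, and all characters other than brackets and '\n' are simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the nesting counter; not ANTLR output. */
final class NestingSketch {
    /** Returns the names of the tokens this simplified scan would emit. */
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        int nesting = 0; // plays the role of the proposed `nesting` ID, default 0
        for (int i = 0; i < input.length(); i++) {
            switch (input.charAt(i)) {
                case '(': nesting++; out.add("LPAREN"); break; // incr(nesting)
                case ')': nesting--; out.add("RPAREN"); break; // decr(nesting)
                case '[': nesting++; out.add("LBRACK"); break; // incr(nesting)
                case ']': nesting--; out.add("RBRACK"); break; // decr(nesting)
                case '\n':
                    // isNonZero(nesting)? holds inside brackets: skip the newline.
                    // Otherwise emit it (NEWLINE is an assumed token name).
                    if (nesting == 0) out.add("NEWLINE");
                    break;
                default:
                    break; // everything else is out of scope for this sketch
            }
        }
        return out;
    }
}
```

For example, tokens("f(1,\n2)\n") drops the newline inside the parentheses and keeps only the one after the closing parenthesis.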

Perhaps this is larding up the language too much to support the objectives of Chapter 7. I am currently debugging my project with Java, but hope to use a different target language.

The question remains, however, did I miss a simpler mechanism to be decoupled from the target language?

Terence Parr

Oct 24, 2013, 1:10:39 PM
to antlr-di...@googlegroups.com
Interesting. We did in fact consider making a very simple imperative language that would be language neutral. These are good suggestions. Maybe add it to the GitHub issues? We have to be careful about adding such things, however, because it's complicated machinery that is rarely used, particularly when actions in C#, Java, and C++ will be identical for such simple tasks.

Ter

Greg D

Oct 24, 2013, 9:47:26 PM
I will add it to the GitHub issues, but recast it differently based on your comments.

Reflecting on your observation that C-derived languages may share the declarations and actions for the simpler predicates, I think it would be better to require that every target language support a set of simple forms that are native to the C family.

There should be relatively simple mechanisms to do this for less closely related languages; that is true of the desired target for my current project.

Edit: Issue opened at https://github.com/antlr/antlr4/issues/344
