There seems to be a natural tension between the objectives discussed in Chapter 7, "Decoupling Grammars from Application-Specific Code," and Chapter 11, "Altering the Parse with Semantic Predicates."
In the parser, most of the decoupling objectives can be achieved with the visitor class. The exception seems to be ambiguities that must be resolved at parse time, such as the one described in 11.3's sub-chapter "Properly Recognizing T(i) in C++".
In the lexer, however, there seems to be no equivalent to the power of the visitor class. Simple ambiguities can be resolved with modes, provided there is enough context to switch modes and the scope is narrow enough to avoid recasting too many rules. For tougher problems, the only alternative I can see is semantic predicates, which couple the grammar to the target language.
Have I missed a good ANTLR idiom that provides the equivalent of lexer semantic predicates, but decoupled from the target language?
If not, I find myself wishing for some additional constructs for lexer rules. My take on a small set of powerful constructs would be:
- zero(ID)
- incr(ID)
- decr(ID)
- isZero(ID)?
- isNonZero(ID)?
- isNegative(ID)?
- isPositive(ID)?
The default for any unassigned ID would be 0. The target language would be able to manipulate the values that the predicates test, e.g., in Java:
In GrammarNameLexer.java, the declaration provided by the enhanced ANTLR:
public static final Map<String, Integer> lexerIds = new HashMap<String, Integer>();
could be used in application TestGrammarName.java:
GrammarNameLexer.lexerIds.put("java5", 1);
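To make the intended semantics concrete, here is a minimal Java sketch of the counter store behind the proposed constructs. The class LexerCounters and its set method are hypothetical names of my own, not anything ANTLR provides; the other method names simply mirror the constructs listed above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counter store mirroring the proposed lexer constructs; not part of ANTLR. */
class LexerCounters {
    private final Map<String, Integer> ids = new HashMap<>();

    /** Unassigned IDs read as 0, as proposed above. */
    int get(String id) { return ids.getOrDefault(id, 0); }

    /** What the application would call, e.g. set("java5", 1). */
    void set(String id, int value) { ids.put(id, value); }

    // Lexer commands
    void zero(String id) { ids.put(id, 0); }
    void incr(String id) { ids.merge(id, 1, Integer::sum); }
    void decr(String id) { ids.merge(id, -1, Integer::sum); }

    // Lexer predicates
    boolean isZero(String id)     { return get(id) == 0; }
    boolean isNonZero(String id)  { return get(id) != 0; }
    boolean isNegative(String id) { return get(id) < 0; }
    boolean isPositive(String id) { return get(id) > 0; }
}
```

Because every operation is a fixed integer update or comparison, each target language could implement the same handful of methods without any target-specific code appearing in the grammar.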
With these constructs, the examples from the book could be recast as:
ENUM : 'enum' -> isNonZero(java5)? ; // must be before ID
ID : [a-zA-Z]+ ;
/** Newlines nested within (..) or [..] are ignored. */
IGNORE_NEWLINE
: '\r'? '\n' -> isNonZero(nesting)?, skip
;
LPAREN : '(' -> incr(nesting) ;
RPAREN : ')' -> decr(nesting) ;
LBRACK : '[' -> incr(nesting) ;
RBRACK : ']' -> decr(nesting) ;
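The behavior I would expect from the nesting rules can be simulated in plain Java. This is only a sketch of the intended effect on a character stream, with a hypothetical class and method name; it is not how ANTLR would generate or run the lexer.

```java
/** Hypothetical simulation of the recast nesting rules; not generated by ANTLR. */
class NestingDemo {
    /** Returns the input with newlines dropped wherever the nesting counter is non-zero. */
    static String dropNestedNewlines(String input) {
        StringBuilder out = new StringBuilder();
        int nesting = 0; // starts at 0, like any unassigned ID
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            // Matches '\n', or the '\r' of a "\r\n" pair, as in '\r'? '\n'
            boolean newline = ch == '\n'
                    || (ch == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n');
            if (ch == '(' || ch == '[') {
                nesting++;                 // incr(nesting)
            } else if (ch == ')' || ch == ']') {
                nesting--;                 // decr(nesting)
            } else if (newline && nesting != 0) {
                continue;                  // isNonZero(nesting)?, skip
            }
            out.append(ch);
        }
        return out.toString();
    }
}
```

For example, dropNestedNewlines("f(a,\nb)\n") keeps only the final newline. One design point this exposes: an unbalanced ')' drives the counter negative, and isNonZero(nesting)? would then keep skipping newlines, so isPositive(nesting)? may be the safer test for this rule.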
Perhaps this is larding up the language too much to support the objectives of Chapter 7. I am currently debugging my project with Java, but hope to use a different target language.
The question remains, however, did I miss a simpler mechanism to be decoupled from the target language?