Hi,
The problem of allowing non-reserved keywords as identifiers in ANTLR (3.4) has been discussed a lot already, and the usual solution is something like this:
    identifier
        : IDENTIFIER
        | keyword
        ;

    keyword
        : kw1
        | kw2
        | kw3
        ...
        ;
This works fine but has a big drawback: with a large number of keywords, the generated parser becomes huge. Mine is currently about 24 MB (C target).
To get this size down (it causes a lot of trouble in various tools, including Xcode), I wrote an is_keyword() function and used it like this:
    identifier
        : {is_keyword(LA(1))}? => .
        | IDENTIFIER
        ;
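For reference, here is a minimal sketch of what such an is_keyword() helper might look like in C. It assumes that LA(1) in the C target yields the token type of the lookahead token, and the token names and numeric values (IDENTIFIER, KW1 to KW3, NUMBER, MAX_TOKEN_TYPE) are hypothetical stand-ins for the constants in the generated parser header.

```c
#include <stdbool.h>

/* Hypothetical token type values. In a real ANTLR 3 C target these
   constants come from the generated parser header. */
enum {
    IDENTIFIER = 4,
    KW1 = 5,
    KW2 = 6,
    KW3 = 7,
    NUMBER = 8,
    MAX_TOKEN_TYPE = NUMBER
};

/* Lookup table indexed by token type: true for every keyword that may
   also be used as an identifier. Unlisted entries default to false. */
static const bool keyword_table[MAX_TOKEN_TYPE + 1] = {
    [KW1] = true,
    [KW2] = true,
    [KW3] = true,
};

/* Returns true if token_type is one of the non-reserved keywords.
   Out-of-range values (e.g. EOF or unknown types) yield false. */
bool is_keyword(unsigned int token_type)
{
    return token_type <= MAX_TOKEN_TYPE && keyword_table[token_type];
}
```

The table makes each predicate evaluation a single bounds check plus an array access; a switch over the keyword token types would work equally well.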
This brought the parser size down to ~7 MB (great!), but it produces a large number of warnings ("Input such as ... is insufficiently covered with predicates"), probably because of the dot and the predicate hoisting done in the parser. In addition, I get errors saying that certain alternatives can never be matched, so this construct doesn't work at all.
My idea now is that I could avoid this by replacing the dot with the actual token. But how can I refer to the current lexer token there instead? Or is there another approach I could use?