I have a top-level grammar, call it mylang.g4. It imports a grammar of common lexer patterns (call it base_patterns.g4) covering whitespace, identifiers and all the usual primitive types, and it also has its own lexer rules to recognise keywords.
The problem is that the rules imported from base_patterns.g4, such as IDENTIFIER, seem to be matching keywords in input text processed by mylang.g4; at least, that is what I think is happening.
The file structure is:
~~~~~~ base_patterns.g4 ~~~~~~~~~~
lexer grammar base_patterns;
IDENTIFIER : .... ; // etc
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~ mylang.g4 ~~~~~~~~~~
grammar mylang;
import base_patterns;
// lexer patterns
SYM_DESCRIPTION : [Dd][Ee][Ss][Cc][Rr][Ii][Pp][Tt][Ii][Oo][Nn] ; // 'description' keyword
SYM_CARDINALITY : [Cc][Aa][Rr][Dd][Ii][Nn][Aa][Ll][Ii][Tt][Yy] ; // 'cardinality' keyword
// etc
~~~~~~~~~~~~~~~~~~~~~~~~~
But when input text is parsed, 'cardinality' is matched as an IDENTIFIER rather than as SYM_CARDINALITY.
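To illustrate what I suspect is going on, here is a rough Python sketch of the usual lexer tie-break (longest match wins, and on a tie the rule listed first wins). It is not ANTLR itself: the function first_match, the two rule lists and the IDENTIFIER regex are stand-ins I made up for illustration, since the real IDENTIFIER pattern lives in base_patterns.g4. It only shows what would happen if the imported rules effectively ended up ahead of my keyword rules.

```python
import re

def first_match(rules, text):
    """Return the name of the rule that wins at the start of `text`.

    Tie-break: the longest match wins; among equal-length matches,
    the rule that appears earliest in `rules` wins.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strict '>' keeps the earlier rule when lengths are equal.
        if m and len(m.group(0)) > best_len:
            best_name, best_len = name, len(m.group(0))
    return best_name

# Hypothetical stand-in for the elided IDENTIFIER rule.
IDENTIFIER = ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*")
# Case-insensitive keyword, like the [Cc][Aa]... rule in mylang.g4.
SYM_CARDINALITY = ("SYM_CARDINALITY", r"(?i:cardinality)")

keywords_first = [SYM_CARDINALITY, IDENTIFIER]
identifier_first = [IDENTIFIER, SYM_CARDINALITY]

print(first_match(keywords_first, "cardinality"))    # SYM_CARDINALITY
print(first_match(identifier_first, "cardinality"))  # IDENTIFIER
print(first_match(keywords_first, "cardinality2"))   # IDENTIFIER (longer match)
```

With the keyword rule listed first, 'cardinality' comes out as a keyword; with IDENTIFIER listed first, the same text comes out as an IDENTIFIER, which is the symptom I am seeing.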
To avoid this, I could presumably move the keyword rules into their own file, mylang_keywords.g4, and then do
import mylang_keywords, base_patterns;
but this seems an annoying proliferation of files. Is there any other solution?
- thomas