I want to achieve the following behavior: User:class should be parsed to Object - User; Type - class, and Us:er:class should result in Object - Us:er; Type - class. I can't make the second part work: as soon as I add : as a legal symbol for WORD, the whole input is parsed as a single object, Object - Us:er:class. My grammar:
grammar Sketch;
/*
* Parser Rules
*/
input : (object)+ EOF ;
object : objectName objectType? NEWLINE ;
objectType : ':' TYPE ;
objectName : WORD ;
/*
* Lexer Rules
*/
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
fragment NUMBER : [0-9] ;
fragment WHITESPACE : (' ') ;
fragment SYMBOLS : [!-/:-@[-`] ;
fragment C : [cC] ;
fragment L : [lL] ;
fragment A : [aA] ;
fragment S : [sS] ;
fragment T : [tT] ;
fragment U : [uU] ;
fragment R : [rR] ;
TYPE : ((C L A S S) | (S T R U C T));
NEWLINE : ('\r'? '\n' | '\r')+ ;
WORD : (LOWERCASE | UPPERCASE | NUMBER | WHITESPACE | SYMBOLS)+ ;
I wrote a simple example just to explain what kind of behavior I want to get; in fact, my parser is much more complicated. As I understand it, when multiple lexer rules can match at the same position, ANTLR chooses the rule that produces the longest token, and only if the candidates are the same length does the order of rule declaration matter. What I want is to make declaration order take precedence over token length. I found something related to that in "The Definitive ANTLR 4 Reference" (15.6 Wildcard Operator and Nongreedy Subrules, page 283). Unfortunately, I still can't make it work with my example. I assume that's because the examples in the book apply only to subrules. Any suggestions are appreciated.
input: object+ EOF;
object: objectName (COLON objectType)? NEWLINE ;
objectType: TYPE ;
objectName: WORD (COLON WORD)* ;
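// Lexer rules as above, plus a COLON token.
// ':' has to be left out of SYMBOLS (e.g. [!-/;-@[-`]), otherwise WORD still swallows it.
COLON: ':';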
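To see why moving ':' out of WORD and into its own token makes the difference, here is a rough Python simulation of the longest-match rule described in the question. It is only a sketch, not ANTLR itself: the `tokenize` helper, the two rule lists, and the regex translations of TYPE and WORD are hypothetical names and approximations made up for illustration.

```python
import re

def tokenize(text, rules):
    """Toy maximal-munch lexer: at each position every rule is tried, the
    longest match wins, and on a tie the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' means an earlier rule keeps priority on equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Approximate regex translations of the lexer rules (assumptions, not ANTLR syntax).
TYPE = r"(?i:class|struct)"
WORD_WITH_COLON = r"[A-Za-z0-9 !-/:-@\[-`]+"   # ':' allowed inside WORD
WORD_NO_COLON = r"[A-Za-z0-9 !-/;-@\[-`]+"     # ':' excluded from WORD

RULES_QUESTION = [("TYPE", TYPE), ("WORD", WORD_WITH_COLON)]
RULES_FIXED = [("TYPE", TYPE), ("COLON", r":"), ("WORD", WORD_NO_COLON)]

print(tokenize("Us:er:class", RULES_QUESTION))
# [('WORD', 'Us:er:class')]
print(tokenize("Us:er:class", RULES_FIXED))
# [('WORD', 'Us'), ('COLON', ':'), ('WORD', 'er'), ('COLON', ':'), ('TYPE', 'class')]
```

With ':' inside WORD, the longest match consumes the entire line, so TYPE never gets a chance. Once ':' is a separate token, the trailing class is matched by both TYPE and WORD at the same length, and TYPE wins because it is declared first. The parser rule objectName: WORD (COLON WORD)* then glues Us:er back together.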