Let's say that I have the following:
lexer grammar Hmm;
fragment INTEGER : [0-9];
NUMBER : INTEGER+;
COMMA : ',';
MANY_NUMBERS : NUMBER (COMMA NUMBER)*;
If I give this the input:
1,2,3
then it will lex it as a single MANY_NUMBERS token. I thought that the lexer tried rules in top-down order of preference and was greedy. That is, I expected it to be tokenized as:
NUMBER COMMA NUMBER COMMA NUMBER
How should I reason about this sort of thing? I realize that using parser rules I can disambiguate this, but I'd like to better understand what's going on at the lexer level.
Is it that the lexer goes for the longest contiguous match possible, preferring the longest match first and only then the rule listed first?
I.e., it's matching NUMBER, then it sees a COMMA, and instead of preferring two tokens, it prefers MANY_NUMBERS because that will locally minimize the number of tokens?
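To make that guess concrete, here is a minimal sketch in Python of the rule I have in mind: at each position, take the longest match, and break ties by picking the rule listed first. The regexes are hand-written stand-ins for the lexer rules above, and `RULES` and `tokenize` are names I made up for illustration. This is not ANTLR's API or its actual implementation.

```python
import re

# Hand-written regex stand-ins for the lexer rules, in grammar order.
# (Hypothetical illustration only; ANTLR does not work with Python regexes.)
RULES = [
    ("NUMBER", re.compile(r"[0-9]+")),
    ("COMMA", re.compile(r",")),
    ("MANY_NUMBERS", re.compile(r"[0-9]+(?:,[0-9]+)*")),
]


def tokenize(text, rules=RULES):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = pattern.match(text, pos)
            # Strictly greater: an earlier rule keeps the win on equal length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("1,2,3"))             # all three rules available
print(tokenize("12"))                # NUMBER and MANY_NUMBERS tie on length
print(tokenize("1,2,3", RULES[:2]))  # MANY_NUMBERS left out
```

Under this model, "1,2,3" comes out as one MANY_NUMBERS token because that rule can consume all five characters, while "12" on its own comes out as NUMBER, since both rules match two characters and NUMBER is listed first. Leaving MANY_NUMBERS out gives the NUMBER COMMA NUMBER COMMA NUMBER sequence I originally expected.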
Thanks
Jon