grammar lexissue;
example : CONTAINER*;
dimensions : INT ('x'|'X') INT # setWindowDimensions;
CONTAINER: [a-z] [a-zA-Z]*;
INT: [0-9]+;
and listing:
acontainer
a
x
I get a token list of:
[@0,0:9='acontainer',<3>,1:0]
[@1,11:11='a',<3>,2:0]
[@2,13:13='x',<1>,3:0]
[@3,15:14='<EOF>',<-1>,4:0]
Notice that token @2 has type <1>. I would expect it to follow the lexer rules and have type <3> (CONTAINER). It seems unexpected that the parser rule dimensions would affect tokenization, because parser rules should not affect lexing at all.
However, I was told this is by design, because I added 'x' as a token identifier in the grammar. This breaks the way I am trying to solve one of my grammar's use cases.
In my full-blown grammar I am using a predicate which temporarily allows something like [124x256] to be parsed without triggering the CONTAINER token. This should greatly simplify my parser, as I can then use a grammar rule like the one in my example and still match an x container later. This ANTLR 4 behavior does the opposite of what I need: it forces 'x' to be a token.
In the traditional sense, a grammar doesn't affect tokenization, because lexing happens before anything is known about parsing. ANTLR 4 bends the rules a bit. So how do I solve my use case without overcomplicating my parser?
> I get a token list of:
>
> [@0,0:9='acontainer',<3>,1:0]
> [@1,11:11='a',<3>,2:0]
> [@2,13:13='x',<1>,3:0]
> [@3,15:14='<EOF>',<-1>,4:0]
>
> Notice that token @2 has type <1>. I would expect it to follow the lexer rules and have type <3> (CONTAINER). It seems unexpected that the parser rule dimensions would affect tokenization, because parser rules should not affect lexing at all.
You used 'x' in a parser rule. This creates an unnamed TOKEN for 'x'.
I'm not sure from your snippet why 'a' is a token. Did you forget
something in your example?
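As a rough sketch of what that means: in a combined grammar, ANTLR 4 turns each literal used in a parser rule into an implicit lexer rule that sits ahead of the named ones, so the lexer behaves roughly as if it had been written like the grammar below. The T__0 and T__1 names follow the pattern ANTLR uses in its generated code, and the lexer grammar name is only an assumption for illustration.

```antlr
// Approximate equivalent of the lexer ANTLR 4 derives from the posted
// combined grammar (illustrative; the grammar name is hypothetical).
lexer grammar lexissueLexer;

T__0      : 'x' ;              // implicit token for the literal 'x', type <1>
T__1      : 'X' ;              // implicit token for the literal 'X', type <2>
CONTAINER : [a-z] [a-zA-Z]* ;  // type <3>
INT       : [0-9]+ ;           // type <4>
```

Both T__0 and CONTAINER match a lone x with the same length, and on a tie the rule listed first wins. That is why token @2 comes out as type <1>, while a longer input such as xa would still be a CONTAINER.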
> However, I was told this is by design, because I added 'x' as a token identifier in the grammar. This breaks the way I am trying to solve one of my grammar's use cases.
You can try the following changed rules:
dimension: INT X INT;
X: 'x' | 'X' ;
CONTAINER: 'a'..'z' ('a'..'z' | 'A'..'Z')*
         | X;
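One caveat with rules like these: since X is listed before CONTAINER, a lone x will still be tokenized as X rather than CONTAINER. A possible way around that is to accept the x token at the parser level. The following is a minimal sketch under that assumption. The container rule and the X_LOWER / X_UPPER token names are hypothetical and do not appear in the original grammar. The x token is split in two because the original CONTAINER rule only allows a lowercase first letter.

```antlr
grammar lexissue;

example    : container* ;
dimensions : INT (X_LOWER | X_UPPER) INT # setWindowDimensions ;

// Hypothetical helper rule: a container name is either an ordinary
// CONTAINER token or the single lowercase letter x.
container  : CONTAINER | X_LOWER ;

// The x tokens are listed before CONTAINER so they win the tie on a lone x.
// Longer names such as "xa" still become CONTAINER (longest match wins).
X_LOWER   : 'x' ;
X_UPPER   : 'X' ;
CONTAINER : [a-z] [a-zA-Z]* ;
INT       : [0-9]+ ;
```

With this arrangement the lexer stays context-free: x is always its own token, and the parser decides whether it is a dimension separator or a container name.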