lexer grammar LLexer ;

COLON2
    : '::' -> more ;

DIGITS
    : [0-9]+
      { System.out.println("at DIGITS: getText() is '"
            + getText()
            + "'.");
        System.out.println("at DIGITS: alt method is '"
            + _tokenFactorySourcePair.b.getText(Interval.of(_tokenStartCharIndex, getCharIndex()-1))
            + "'.");
      } ;
123::456
$ antlr4 LLexer.g4
$ javac *.java
$ grun L tokens -tokens sample.txt
at DIGITS: getText() is '123'.
at DIGITS: alt method is '123'.
at DIGITS: getText() is '456'.
at DIGITS: alt method is '::456'.
line 1:8 token recognition error at: '\n'
[@0,0:2='123',<1>,1:0]
[@1,3:7='::456',<1>,1:3]
[@2,9:8='<EOF>',<-1>,2:0]
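One way to picture the output above is a toy model with two start offsets: one that is reset for every rule match, and one that is set only when a new token begins and therefore survives a 'more'. The sketch below is plain Java and does not use ANTLR at all; the class name MoreTextModel and the labels "local" and "whole" are made up for illustration, and it assumes that getText() reads from the per-match offset while the Interval-based call reads from the token start.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not ANTLR code) of a lexer that tracks two start offsets. */
final class MoreTextModel {

    /** Returns one trace line per DIGITS match found in the input. */
    static List<String> trace(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int tokenStart = pos;            // set once per emitted token
            boolean more;
            do {
                int matchStart = pos;        // reset for every rule match
                more = false;
                if (input.startsWith("::", pos)) {
                    pos += 2;
                    more = true;             // like '-> more': keep tokenStart, match again
                } else if (Character.isDigit(input.charAt(pos))) {
                    while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
                        pos++;
                    }
                    out.add("local='" + input.substring(matchStart, pos)
                            + "' whole='" + input.substring(tokenStart, pos) + "'");
                } else {
                    pos++;                   // skip a character no rule recognises
                }
            } while (more && pos < input.length());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : trace("123::456")) {
            System.out.println(line);
        }
    }
}
```

For the second token, "local" is the text of the DIGITS match alone and "whole" also includes the '::' that was carried over, which mirrors the two different lines printed for 456 in the run above.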
--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
My guess is the latter, since that's where I am in my learning of ANTLR v4 and parsing in general. I'm afraid I don't know what LA() means, and so can't phrase this reply in those terms.
@lexer::members {
    // Method to get the text for a token, even if aggregated by 'more'.
    public String getTokenText() {
        if (_tokenFactorySourcePair.b != null) {
            // .b is the CharStream; read from the token's start up to the current position.
            return _tokenFactorySourcePair.b.getText(Interval.of(_tokenStartCharIndex, getCharIndex()-1));
        } else {
            return getText();
        }
    }
}
The code above is crafted just to show the issue.
The purpose of the underlying code is to process parametrised macros in the source. Since existing macro definitions may be redefined in the source, I believe it is simpler to manage them at lexer time. As a result, I am using token actions to maintain the pool of macros and to inject new tokens when a macro is referenced. This creates the need to access the token text in lexer rules. Some of the tokens are aggregated with '-> more' or '{ more(); }', which leads to the question in the subject of this topic.
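To make the idea of a macro pool concrete, here is a minimal sketch in plain Java. It is not the actual code from my grammar: the class MacroPool, its methods define and expand, and the representation of a macro body as a list of token texts are all assumptions made for illustration. A lexer action could call define when it sees a definition and expand when it sees a reference, then queue the returned texts as new tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical macro pool; names and layout are illustrative only. */
final class MacroPool {

    /** A macro is a parameter list plus a body of token texts. */
    private static final class Macro {
        final List<String> params;
        final List<String> body;

        Macro(List<String> params, List<String> body) {
            this.params = new ArrayList<>(params);
            this.body = new ArrayList<>(body);
        }
    }

    private final Map<String, Macro> macros = new HashMap<>();

    /** Defines or redefines a macro; a later definition replaces the earlier one. */
    void define(String name, List<String> params, List<String> body) {
        macros.put(name, new Macro(params, body));
    }

    /** Returns the token texts to inject for a reference, or null if it cannot be expanded. */
    List<String> expand(String name, List<String> args) {
        Macro m = macros.get(name);
        if (m == null || m.params.size() != args.size()) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (String piece : m.body) {
            int i = m.params.indexOf(piece);
            // A body piece that names a parameter is replaced by the matching argument.
            result.add(i >= 0 ? args.get(i) : piece);
        }
        return result;
    }

    public static void main(String[] args) {
        MacroPool pool = new MacroPool();
        pool.define("ADD", Arrays.asList("a", "b"), Arrays.asList("a", "+", "b"));
        System.out.println(pool.expand("ADD", Arrays.asList("1", "2")));
        // Redefinition in the source simply overwrites the pool entry.
        pool.define("ADD", Arrays.asList("a", "b"), Arrays.asList("b", "+", "a"));
        System.out.println(pool.expand("ADD", Arrays.asList("1", "2")));
    }
}
```

Keeping the pool in the lexer means a redefinition takes effect for every later reference in the input, which is the reason given above for handling macros at lexer time.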
The book's examples use getText(), but only on self-contained lexer rules. So, I don't know if the behaviour shown above is:
- intentional, because it makes more sense for getText() to yield the local text (i.e. my problem to live with)
- inadvertent, and an issue should be created
- wrong-headed, because there is already a robust mechanism to get the aggregated text
My guess is the last of these, since that's where I am in my learning of ANTLR v4.