Hi all,
I have an interesting problem right now with ANTLR4. I have defined the following rules and tokens:
grammar xpc;

file: preprocessor ;

preprocessor
    : DEFINE ID (ID | INT)   #Macro
    ;

// Lexer section
DEFINE: '+define' ;
ID: ID_LETTER (ID_LETTER | DIGIT)* ;
INT: DIGIT+ ;

fragment ID_LETTER: 'a'..'z' | 'A'..'Z' ;
fragment DIGIT: '0'..'9' ;
I'm using a listener to catch the points where the parse tree walk enters and exits the Macro alternative.
The overridden listener methods are as follows:
@Override
public void enterMacro(xpcParser.MacroContext ctx)
{
    // Look for all instances of this MACRO being used in the currently parsed file
    // and replace it with the Identifier
    System.out.printf("ctx.DEFINE() = %s\n", ctx.DEFINE().getText());
    System.out.printf("ctx.ID(0) = %s\n", ctx.ID(0).getText());
    if (ctx.ID(1) != null)
    {
        System.out.printf("ctx.ID(1) = %s\n", ctx.ID(1).getText());
    }
    else
    {
        System.out.printf("ctx.INT() = %s\n", ctx.INT().getText());
    }
}

@Override
public void exitMacro(xpcParser.MacroContext ctx)
{
}
Now, I have a source file that's very simple:
test.xpc:
+define SOMETHING spare
In my Java application, the parse tree walk produces the following output when given test.xpc as input:
ctx.DEFINE() = +define
ctx.ID(0) = SOMETHING
ctx.INT() = 10100
This is a bit odd: I would expect the rule to match the second ID token, not INT.
If I run the TestRig on this input with this grammar and dump the tokens, I see the following:
[@0,0:6='+define',<20>,1:0]
[@1,8:16='SOMETHING',<52>,1:8]
[@2,18:22='spare',<52>,1:18]
Looking at the xpc.tokens file generated by ANTLR4, token <52> corresponds to ID.
[-bash-4.2:XPC (master)]$ grep 52 xpc.tokens
ID=52
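Since a plain grep for 52 would also match an entry such as 152, here is a minimal sketch of an exact lookup by token type. It assumes every line of the .tokens file has the form name=number, like the ID=52 line above. The class name TokenVocabulary and its parse method are made up for illustration and are not part of ANTLR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of ANTLR): exact lookup of token names by type number.
public class TokenVocabulary {

    // Maps each token type number to every name or literal that shares it.
    // Assumes lines of the form name=number; anything else is skipped.
    static Map<Integer, List<String>> parse(List<String> lines) {
        Map<Integer, List<String>> byType = new TreeMap<>();
        for (String line : lines) {
            // Split on the LAST '=' because a quoted literal may itself contain '='.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                continue; // blank or malformed line
            }
            String name = line.substring(0, eq);
            int type;
            try {
                type = Integer.parseInt(line.substring(eq + 1).trim());
            } catch (NumberFormatException e) {
                continue; // right-hand side is not a number
            }
            byType.computeIfAbsent(type, k -> new ArrayList<>()).add(name);
        }
        return byType;
    }

    public static void main(String[] args) throws IOException {
        // Defaults match the file and type number discussed in this post.
        String path = args.length > 0 ? args[0] : "xpc.tokens";
        int wanted = args.length > 1 ? Integer.parseInt(args[1]) : 52;
        Map<Integer, List<String>> byType = parse(Files.readAllLines(Paths.get(path)));
        System.out.printf("type %d = %s%n", wanted, byType.get(wanted));
    }
}
```

Running it as "java TokenVocabulary xpc.tokens 52" prints every name mapped to exactly type 52, which rules out a false match from the grep.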
But why does the parser in my code believe that the string spare is an INT and not an ID?
I'm quite stumped.
-Kishore