Extraneous input error when using “lexer rule actions” and “lexer commands”

Jorge Troncoso

Jun 5, 2019, 3:33:25 PM
to antlr-discussion

I'm seeing an "extraneous input" error with input "\aa a" and the following grammar:

Cool.g4

grammar Cool;
import Lex;

expr
   : STR_CONST # str_const
   ;

Lex.g4

lexer grammar Lex;

@lexer::members {
  public static boolean initial = true;
  public static boolean inString = false;
  public static boolean inStringEscape = false;
}

BEGINSTRING: '"' {initial}? {
  inString = true;
  initial = false;
  System.out.println("Entering string");
} -> more;

INSTRINGSTARTESCAPE: '\\' {inString && !inStringEscape}? {
  inStringEscape = true;
  System.out.println("The next character will be escaped!");
} -> more;

INSTRINGAFTERESCAPE: ~[\n] {inString && inStringEscape}? {
  inStringEscape = false;
  System.out.println("Escaped a character.");
} -> more;

INSTRINGOTHER: (~[\n\\"])+ {inString && !inStringEscape}? {
  System.out.println("Consumed some other characters in the string!");
} -> more;

STR_CONST: '"' {inString && !inStringEscape}? {
  inString = false;
  initial = true;
  System.out.println("Exiting string");
};

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

ID:  [a-z][_A-Za-z0-9]*;

Here's the output:

$ grun Cool expr -tree
"\aa a"
Entering string
The next character will be escaped!
Escaped a character.
Consumed some other characters in the string!
Exiting string
line 1:0 extraneous input '"\aa' expecting STR_CONST
(expr "\aa  a")

Interestingly, if I remove the ID rule, ANTLR parses the input fine. Here's the output without the ID rule:

$ grun Cool expr -tree
"\aa a"
Entering string
The next character will be escaped!
Escaped a character.
Consumed some other characters in the string!
Exiting string
(expr "\aa a")

Any idea what might be going on? Why does ANTLR report an error when ID is one of the lexer rules? I realize that this may not be the best way to parse strings with escape sequences, but I'm still curious why the error occurs. Is this expected behavior in ANTLR?

Thanks,
Jorge

Jorge Troncoso

Oct 17, 2019, 4:00:29 PM
to antlr-discussion
Copying the answer I received from @sharwell on GitHub.

"Your ID rule is unpredicated, so it matches aa following the \ (aa is longer than the a matched by INSTRINGAFTERESCAPE, so it's preferred even though it's later in the grammar). If you add a println to WS and ID you'll see the strange behavior in the output."
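To make the quoted explanation concrete, here is a toy Java model of the selection rule it describes: the longest match wins, and the rule listed first wins only on a tie. This is not how ANTLR is implemented internally; it is a sketch that uses java.util.regex stand-ins for the three rules that are still enabled right after the backslash (INSTRINGAFTERESCAPE through its predicate, WS and ID because they have no predicate). The class name LongestMatchDemo, the method pickRule, and the guardIdAndWs flag are all hypothetical. The flag stands in for adding a predicate such as {!inString}? to ID and WS, which the thread itself does not confirm as the right fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: mimics "longest match wins, earliest rule breaks ties".
class LongestMatchDemo {

    // Rules that are enabled right after the backslash, in Lex.g4 order.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("INSTRINGAFTERESCAPE", Pattern.compile("[^\\n]"));
        RULES.put("WS", Pattern.compile("[ \\t\\r\\n]+"));
        RULES.put("ID", Pattern.compile("[a-z][_A-Za-z0-9]*"));
    }

    // Returns the name of the rule that would win at position pos.
    // guardIdAndWs is a hypothetical switch that disables ID and WS,
    // as if they carried a predicate that fails inside a string.
    static String pickRule(String input, int pos, boolean guardIdAndWs) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            String name = rule.getKey();
            if (guardIdAndWs && (name.equals("ID") || name.equals("WS"))) {
                continue;
            }
            Matcher m = rule.getValue().matcher(input).region(pos, input.length());
            // Only a strictly longer match replaces an earlier rule.
            if (m.lookingAt() && m.end() - pos > bestLen) {
                best = name;
                bestLen = m.end() - pos;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String input = "\"\\aa a\"";   // the characters "\aa a" including the quotes
        int afterBackslash = 2;        // index of the first 'a'
        System.out.println(pickRule(input, afterBackslash, false)); // ID
        System.out.println(pickRule(input, afterBackslash, true));  // INSTRINGAFTERESCAPE
    }
}
```

In this model ID wins at the first 'a' because it matches two characters ("aa") against one for INSTRINGAFTERESCAPE. When both rules match the same length, as with a single 'a' followed by a space, the earlier rule is kept.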