Extraneous input error when using “lexer rule actions” and “lexer commands”

Jorge Troncoso

Jun 5, 2019, 3:33:25 PM

to antlr-discussion

I'm seeing an "extraneous input" error with the input "\aa a" (the double quotes are part of the input) and the following grammar:

Cool.g4

grammar Cool;

import Lex;

expr

: STR_CONST # str_const

;

Lex.g4

lexer grammar Lex;

@lexer::members {

public static boolean initial = true;

public static boolean inString = false;

public static boolean inStringEscape = false;

}

BEGINSTRING: '"' {initial}? {

inString = true;

initial = false;

System.out.println("Entering string");

} -> more;

INSTRINGSTARTESCAPE: '\\' {inString && !inStringEscape}? {

inStringEscape = true;

System.out.println("The next character will be escaped!");

} -> more;

INSTRINGAFTERESCAPE: ~[\n] {inString && inStringEscape}? {

inStringEscape = false;

System.out.println("Escaped a character.");

} -> more;

INSTRINGOTHER: (~[\n\\"])+ {inString && !inStringEscape}? {

System.out.println("Consumed some other characters in the string!");

} -> more;

STR_CONST: '"' {inString && !inStringEscape}? {

inString = false;

initial = true;

System.out.println("Exiting string");

};

WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

ID: [a-z][_A-Za-z0-9]*;

Here's the output:

$ grun Cool expr -tree

"\aa a"

Entering string

The next character will be escaped!

Escaped a character.

Consumed some other characters in the string!

Exiting string

line 1:0 extraneous input '"\aa' expecting STR_CONST

(expr "\aa a")

Interestingly, if I remove the ID rule, ANTLR parses the input fine. Here's the output when I remove the ID rule:

$ grun Cool expr -tree

"\aa a"

Entering string

The next character will be escaped!

Escaped a character.

Consumed some other characters in the string!

Exiting string

(expr "\aa a")

Any idea what might be going on? Why does ANTLR report an error when ID is one of the lexer rules? I realize that this may not be the best way to lex strings with escape sequences, but I'm still curious why ANTLR reports an error. Is this expected behavior in ANTLR?

Thanks,

Jorge

(Also asked about this on Stack Overflow: https://stackoverflow.com/questions/56420700/extraneous-input-error-when-using-lexer-rule-actions-and-lexer-commands)

Jorge Troncoso

Oct 17, 2019, 4:00:29 PM

to antlr-discussion

Copying the answer I received from @sharwell on GitHub.

"Your ID rule is unpredicated, so it matches aa following the \ (aa is longer than the a matched by INSTRINGAFTERESCAPE, so it's preferred even though it's later in the grammar). If you add a println to WS and ID you'll see the strange behavior in the output."
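To make the quoted explanation concrete, here is a minimal Java sketch of the "longest match wins, ties go to the earlier rule" selection at the character right after the backslash. It is a toy model, not ANTLR's actual lexer. The class LongestMatchDemo, the Rule helper and the method names are hypothetical. Each predicate is reduced to a boolean computed from the flag values I assume at that point (inString = true, inStringEscape = true), and only the rules relevant at that position are listed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {

    /** Toy stand-in for a lexer rule; "enabled" models whether its predicate passes. */
    static final class Rule {
        final String name;
        final Pattern pattern;
        final boolean enabled;

        Rule(String name, String regex, boolean enabled) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.enabled = enabled;
        }
    }

    /** Returns the enabled rule with the longest match at pos; ties go to the earlier rule. */
    static String pickRule(List<Rule> rules, String input, int pos) {
        String best = null;
        int bestLen = 0;
        for (Rule r : rules) {
            if (!r.enabled) continue;
            Matcher m = r.pattern.matcher(input).region(pos, input.length());
            // Strictly greater, so an earlier rule keeps the win on equal length.
            if (m.lookingAt() && m.end() - pos > bestLen) {
                bestLen = m.end() - pos;
                best = r.name;
            }
        }
        return best;
    }

    /** Which rule wins at the first 'a' of "\aa a", with or without the ID rule. */
    static String winnerAfterEscape(boolean includeId) {
        String input = "\"\\aa a\"";   // the seven characters: " \ a a space a "
        int pos = 2;                    // index of the first 'a', right after the backslash
        boolean inString = true;        // assumed state after BEGINSTRING
        boolean inStringEscape = true;  // assumed state after INSTRINGSTARTESCAPE

        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("INSTRINGAFTERESCAPE", "[^\\n]", inString && inStringEscape));
        rules.add(new Rule("INSTRINGOTHER", "[^\\n\\\\\"]+", inString && !inStringEscape));
        rules.add(new Rule("WS", "[ \\t\\r\\n]+", true));
        if (includeId) {
            rules.add(new Rule("ID", "[a-z][_A-Za-z0-9]*", true)); // unpredicated
        }
        return pickRule(rules, input, pos);
    }

    public static void main(String[] args) {
        System.out.println(winnerAfterEscape(true));   // prints ID
        System.out.println(winnerAfterEscape(false));  // prints INSTRINGAFTERESCAPE
    }
}
```

With ID present, its two-character match "aa" beats the one-character match of INSTRINGAFTERESCAPE, so the text accumulated by the earlier "more" rules is emitted as an ID token. That is consistent with the '"\aa' reported in the error message. Without ID, INSTRINGAFTERESCAPE wins, which matches what I saw after removing the rule.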
