how to match until end of line?

Cristian Vasile Mocanu

Feb 19, 2018, 6:47:41 PM
to antlr-discussion

Hello,

I just started to learn ANTLR (I just finished reading chapter 4 of "The Definitive ANTLR 4 Reference").

I am trying to match something like this:
print: anything (at least one character), followed by a newline

My grammar is the following:
grammar NotSoSimple;

print: PRINT text=NOT_NEWLINE NEWLINE;

PRINT : 'print: ' ;
NOT_NEWLINE : ~[\r\n]+ ;
NEWLINE : '\r'? '\n' | '\r' ;

Of course, this doesn't work because NOT_NEWLINE eats up all the characters, including 'print: '.

I have 4 questions:
1) how can I solve this?
2) why does NOT_NEWLINE take precedence when I listed PRINT before it?
3) is there a simple solution to this? Something that does not look like a workaround and that is not completely counterintuitive.

Thinking out loud:
It looks to me that lexing just introduces a lot of complexity.
Obviously what token is formed by NOT_NEWLINE depends on the context.
That suggests skipping lexing completely (parsing is the phase that deals with context). But (in case this is possible) I'm afraid it would lead to a big performance hit.

4) Is there any case where the lexer makes things easier, given that it is so inflexible (not context dependent)? Why do we even have a lexing phase?

P.S.: Obviously, the above grammar will grow to be much larger. Otherwise I would just do a substring and call it a day.

Thanks for your help,
Cristian

Cristian Vasile Mocanu

Feb 20, 2018, 9:07:33 AM
to antlr-di...@googlegroups.com
The book says this on page 76:
"ANTLR lexers resolve ambiguities between lexical rules by favoring the rule specified first."

Is the behavior that I see (favoring NOT_NEWLINE even though PRINT is specified first) a bug?

For the record, I am using version 4.7.1 with Maven (the files are generated with "antlr4-maven-plugin" and the production code depends on "antlr4-runtime").

Mike Lischke

Feb 20, 2018, 11:37:11 AM
to antlr-di...@googlegroups.com
> The book says this on page 76:
> "ANTLR lexers resolve ambiguities between lexical rules by favoring the rule specified first."
>
> Is what I see a bug?


That's not the whole truth. The actual rule is: the longest match wins. Only if two rules match the same length of input is the one listed first in the grammar used.
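
As a minimal sketch of that rule (the grammar name and the IF/ID rules are made up for illustration and are not from the grammar in this thread):

```antlr
lexer grammar KeywordDemo;   // hypothetical name, for illustration only

IF : 'if' ;                  // listed first, so it wins ties
ID : [a-z]+ ;                // can also match the text "if"
WS : [ \t\r\n]+ -> skip ;

// Input "if":   IF and ID both match 2 characters; the tie goes to IF (listed first).
// Input "iffy": ID matches 4 characters, IF only 2; the longer match, ID, wins.
```

The same reasoning applies to the grammar in question: on the input "print: hello", PRINT matches 7 characters while NOT_NEWLINE matches all 12, so NOT_NEWLINE wins regardless of rule order.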

Mike
--
www.soft-gems.net

Cristian Vasile Mocanu

Feb 20, 2018, 12:02:38 PM
to antlr-di...@googlegroups.com
Thanks Mike.

Then how can I match the grammar above? It looks like it should be very simple.

Does ANTLR have something like negative lookbehind from regex?
Is there another way?
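
One common approach is lexer modes, which make the lexer context dependent: after 'print: ' the lexer switches to a mode in which only the rest of the line can match. The following is a minimal sketch, assuming the grammar is split into separate lexer and parser grammars (modes are only allowed in lexer grammars). The names NotSoSimpleLexer, NotSoSimpleParser, LINE and TEXT are chosen here for illustration.

```antlr
// File NotSoSimpleLexer.g4 (hypothetical file name)
lexer grammar NotSoSimpleLexer;

PRINT   : 'print: ' -> pushMode(LINE) ;   // switch to LINE mode after the keyword
NEWLINE : '\r'? '\n' | '\r' ;

mode LINE;
// Only this rule is active in LINE mode, so PRINT cannot compete with it here.
TEXT    : ~[\r\n]+ -> popMode ;           // rest of the line, then back to the default mode

// File NotSoSimpleParser.g4 (hypothetical file name)
parser grammar NotSoSimpleParser;
options { tokenVocab = NotSoSimpleLexer; }

print : PRINT text=TEXT NEWLINE ;
```

With this sketch, the line "print: print: x" should lex as PRINT, then TEXT containing "print: x", then NEWLINE. A newline directly after 'print: ' is reported as a token recognition error in LINE mode, which fits the requirement of at least one character.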

Cristian Vasile Mocanu

Feb 22, 2018, 5:07:40 AM
to antlr-discussion
I don't think that the length has anything to do with it.
If length were the deciding factor, it would be impossible to match any keyword in any language.

Actually, it works as expected when the not operator ~ is not used.

I have reported a bug: https://github.com/antlr/antlr4/issues/2229



Mike Lischke

Feb 22, 2018, 9:39:04 AM
to antlr-di...@googlegroups.com
> I don't think that the length has anything to do with it.

How wrong.


Mike
--
www.soft-gems.net
