How to sort out proper comment lexer rules

Michael Powell

Mar 24, 2019, 4:26:14 PM
to antlr-discussion

Hello,

I have lexer rules for comments something like this:

SINGLELINE: '//' ~[\r\n]* NEWLINE -> channel(HIDDEN);

MULTILINE: '/*' .+? '*/' -> channel(HIDDEN);

WS: [ \r\n\t]+ -> channel(HIDDEN);

NEWLINE: '\r'? '\n';

However, my parser listener is rejecting the parse of the following:

syntax  =  'proto2'  ;  // cf7679b2-dd90-421b-95e4-9f5f9b74c29d

With the following message:

System.InvalidOperationException : line 1, column 25, symbol '/': mismatched input '/' expecting {<EOF>, 'enum', ';', 'extend', 'import', 'message', 'option', 'package'}

With the root Proto rule:

proto: syntaxDecl ( importDecl | packageDecl | optionDecl | topLevelDef | emptyDecl )* EOF;

It seems the comment just before EOF is not being recognized. How do I get the lexer/parser rules to skip comments that come right before EOF?

Cheers,

Michael Powell

Michael Powell

Mar 24, 2019, 4:32:23 PM
to antlr-discussion


On Sunday, March 24, 2019 at 4:26:14 PM UTC-4, Michael Powell wrote:

Hello,

I have lexer rules for comments something like this:

SINGLELINE: '//' ~[\r\n]* NEWLINE -> channel(HIDDEN);

Turns out it was a minor tweak:

SINGLELINE: '//' ~[\r\n]* NEWLINE? -> channel(HIDDEN); 

For single line comments occurring at the very end just prior to EOF.

Mike Lischke

Mar 25, 2019, 3:53:42 AM
to antlr-discussion
I have lexer rules for comments something like this:

SINGLELINE: '//' ~[\r\n]* NEWLINE -> channel(HIDDEN);

Turns out it was a minor tweak:

SINGLELINE: '//' ~[\r\n]* NEWLINE? -> channel(HIDDEN); 

For single line comments occurring at the very end just prior to EOF.

That extra NEWLINE makes no sense. You are already looping until a line break. You can just remove that entirely.


Michael Powell

Mar 25, 2019, 9:00:58 AM
to antlr-discussion


On Monday, March 25, 2019 at 3:53:42 AM UTC-4, Mike Lischke wrote:
I have lexer rules for comments something like this:

SINGLELINE: '//' ~[\r\n]* NEWLINE -> channel(HIDDEN);

Turns out it was a minor tweak:

SINGLELINE: '//' ~[\r\n]* NEWLINE? -> channel(HIDDEN); 

For single line comments occurring at the very end just prior to EOF.

That extra NEWLINE makes no sense.

I'm not sure what you mean. What "extra" NEWLINE? It's not extra. Read what it says. The comment consumes every non-(CR or NL) character until it reaches a NEWLINE (or in my case, was failing on EOF).

That's because there was no terminal NEWLINE, but rather EOF. I suppose to be proper, I could say ( NEWLINE | EOF ), correct?
 
You are already looping until a line break. You can just remove that entirely.

Actually, you cannot, because you want the single line comment to be consumed through (and including) the NEWLINE. Otherwise, when does it end?
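For reference, the ( NEWLINE | EOF ) variant asked about above could be sketched as follows. This is a minimal sketch, assuming ANTLR 4, which permits EOF inside a lexer rule; the grammar name is hypothetical, and NEWLINE is declared as a fragment here only to keep the snippet self-contained (the original grammar declares it as a token rule).

```antlr
lexer grammar CommentEofSketch; // hypothetical grammar name

// The comment ends at a line break or, failing that, at end of input.
SINGLELINE: '//' ~[\r\n]* ( NEWLINE | EOF ) -> channel(HIDDEN);

// Assumption: a fragment, so it can only be used inside other lexer rules.
fragment NEWLINE: '\r'? '\n';
```

With this form the line break, when present, is still part of the comment token, which is the behavior the `NEWLINE?` tweak also gives.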

Mike Lischke

Mar 25, 2019, 12:11:31 PM
to antlr-discussion

I'm not sure what you mean. What "extra" NEWLINE? It's not extra. Read what it says. The comment consumes every non-(CR or NL) character until it reaches a NEWLINE (or in my case, was failing on EOF).

That's because there was no terminal NEWLINE, but rather EOF. I suppose to be proper, I could say ( NEWLINE | EOF ), correct?
 
You are already looping until a line break. You can just remove that entirely.

Actually, you cannot, because you want the single line comment to be consumed through (and including) the NEWLINE. Otherwise, when does it end?

You want the newline to be part of the single line comment? That's pretty unusual. Normally line breaks and comments are handled separately. If you do it that way, it doesn't matter whether there's a final newline or not (the `~[\r\n]*` part also stops without error when no more input is available).
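The separation described above might look like the following. This is a minimal sketch, not the poster's actual grammar: the grammar name is hypothetical, the NEWLINE rule is dropped on the assumption that nothing else in the grammar refers to it, and MULTILINE uses `.*?` instead of the original `.+?` so that an empty `/**/` comment also matches.

```antlr
lexer grammar CommentSketch; // hypothetical grammar name

// Stops at the first line break or at end of input, whichever comes first.
// The line break itself is not consumed; it is left for WS.
SINGLELINE: '//' ~[\r\n]* -> channel(HIDDEN);

// Non-greedy: ends at the first '*/'.
MULTILINE: '/*' .*? '*/' -> channel(HIDDEN);

// Line breaks are ordinary whitespace, so a trailing newline after a
// comment becomes a separate hidden WS token.
WS: [ \t\r\n]+ -> channel(HIDDEN);
```

With these rules, the input `syntax = 'proto2' ; // comment` should lex the same way whether or not a newline follows the comment, since SINGLELINE no longer requires one.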


Michael Powell

Mar 25, 2019, 12:33:13 PM
to antlr-discussion
I see, then the grammar is naturally concise and extensible. 
