Comments in source

Bill Dickenson

Nov 18, 2020, 3:05:00 PM
to antlr-discussion
The ANTLR grammar sends all of the comments off to never-never land (the HIDDEN channel):

COMMENTENTRYLINE : COMMENTENTRYTAG WS? ~('\n' | '\r')*;
COMMENTLINE : (COMMENTTAG | ASTERISKCHAR | SLASHCHAR) WS? ~('\n' | '\r')* -> channel(HIDDEN);
WS : [ \t\f;]+ -> channel(HIDDEN);
SEPARATOR : ', ' -> channel(HIDDEN);

While that works in general, I happen to need the comments in the location where they are found. More precisely, I need the contents of the comments without the comment indicator; I don't care to see COMMENTTAG, ASTERISKCHAR, etc.

My first attempt was to define the text in the middle:

COMMENTENTRYLINE : COMMENTENTRYTAG WS? CommentText? WS? ~('\n' | '\r')*;
COMMENTLINE : (COMMENTTAG | ASTERISKCHAR | SLASHCHAR) WS? CommentText? WS? ~('\n' | '\r')*;
CommentText
    : '123' Textforeval?
    | Textfordisplay?
    ;
Textforeval : Identifier;
Textfordisplay : Identifier;

But nothing shows up on the default channel. I could just route the comments to another channel instead of HIDDEN and come back for them later, but that seems like a double waste of time.

Mike Lischke

Nov 19, 2020, 2:39:12 AM
to antlr-discussion

> The ANTLR Grammar sends all of the comments out to never never land (Hidden channel)

The hidden channel is just another channel (in fact just a different channel id) assigned to a token. The parser only considers tokens that have the main channel id set (0), but the tokens with other channel ids are still present in the token stream. So, all you have to do is to iterate over the tokens from the token stream, check their channel value and act on the tokens on the HIDDEN channel as you need it.
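
A minimal sketch of that loop, in Python. To stay self-contained it does not use the ANTLR runtime: `Tok`, `DEFAULT_CHANNEL`, `HIDDEN_CHANNEL` and `hidden_comments` are hypothetical stand-ins, and the comment tags are only assumed examples. With the real runtime, the same loop would walk the token list held by the token stream and compare each token's channel against the hidden channel id.

```python
from dataclasses import dataclass

DEFAULT_CHANNEL = 0  # the channel the parser reads
HIDDEN_CHANNEL = 1   # assumed id for the hidden channel in this sketch


@dataclass
class Tok:
    """Hypothetical stand-in for a lexer token."""
    text: str
    channel: int
    line: int


def hidden_comments(tokens, tags=("*>", "*", "/")):
    """Return (line, text) pairs for hidden tokens that start with a comment tag.

    The tag itself is stripped, so only the comment contents remain.
    Longer tags must come first so that "*>" wins over "*".
    """
    found = []
    for tok in tokens:
        if tok.channel != HIDDEN_CHANNEL:
            continue  # the parser already sees these tokens
        for tag in tags:
            if tok.text.startswith(tag):
                found.append((tok.line, tok.text[len(tag):].strip()))
                break
    return found


tokens = [
    Tok("*> fingerprint 1234", HIDDEN_CHANNEL, 1),
    Tok("MOVE", DEFAULT_CHANNEL, 2),
    Tok(" ", HIDDEN_CHANNEL, 2),   # whitespace is hidden too, but has no tag
    Tok("A", DEFAULT_CHANNEL, 2),
]
print(hidden_comments(tokens))
```

Because each token keeps its line number, the comment contents can be placed back at the location where they were found.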


Bill Dickenson

Nov 20, 2020, 12:17:46 PM
to antlr-discussion
Yeah, that's what we ended up with. It just seems counterintuitive. Thanks for your help!

rtm...@googlemail.com

Nov 27, 2020, 5:38:11 PM
to antlr-discussion

I may be missing something, but why not just recognise comments the same as any other lexer token and return them to the parser? I admit I haven't tried it yet.

jan

Bill Dickenson

Nov 27, 2020, 6:09:53 PM
to antlr-di...@googlegroups.com
We did try that and it didn't work. It seems like it should.

Tim Spriggs

Nov 27, 2020, 6:16:09 PM
to antlr-di...@googlegroups.com
The fun thing about comments is that they can happen just about anywhere. To accept that in a parser, the parser rules can get fairly hairy. Just throwing them out is good enough for most purposes. Also, skipping them can yield modest performance improvements if the comment-to-code ratio is high.

Bill Dickenson

Nov 28, 2020, 10:29:35 AM
to antlr-discussion
I get that, I do. I think there should be an option in ANTLR to include them or not. Comments do matter, even overdone ones. We are using ANTLR as a preprocessor for our analysis tools and it's done pretty well. It's just that as we see more and more code generators and other tools, we seem to be seeing a lot more actual information in comments. So, as I mentioned above, we "backfit" comments into the stream. It's not ideal, but it does work.

Inline comments are particularly pesky, but they rarely have additional information, or even code-readable information.
Full-line comments occasionally contain SBoM information, and sometimes the fingerprint for the module.

If it were a perfect world, I guess I have talked myself into having two sets of comment processes that dovetail. On my "to do" list is to modify the language comments so that they operate rationally.

//
// Whitespace and comments
//
WS  :  [ \t\r\n\u000C]+ -> skip
    ;
COMMENT
    :   '/*' .*? '*/' -> skip
    ;
LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;

Leave LINE_COMMENT alone and simply grab COMMENT:

COMMENT
    :   '/*' COMMENT_DATA*? '*/'
    ;
COMMENT_DATA
  : '!VSC' STRUCTURED_COMMENT
  | UNSTRUCTURED_COMMENT
  ;
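
One way to picture the split, sketched in Python rather than in the grammar: take the text of a `/* ... */` token, drop the delimiters, and check for the `!VSC` marker. `split_comment` and the `key=value` layout of the structured part are assumptions made for illustration; the thread does not say what a structured comment contains.

```python
def split_comment(token_text, marker="!VSC"):
    """Classify the body of a /* ... */ comment.

    Returns ("structured", fields) when the body starts with the marker,
    otherwise ("unstructured", body). The key=value field layout is an
    assumption made for this sketch.
    """
    is_block = token_text.startswith("/*") and token_text.endswith("*/")
    if not is_block or len(token_text) < 4:
        raise ValueError("not a block comment")
    body = token_text[2:-2].strip()
    if not body.startswith(marker):
        return ("unstructured", body)
    fields = {}
    for part in body[len(marker):].split():
        key, sep, value = part.partition("=")
        if sep:  # ignore words that are not key=value pairs
            fields[key] = value
    return ("structured", fields)


print(split_comment("/* !VSC sbom=acme-1.2 hash=ab12 */"))
print(split_comment("/* just a note */"))
```

Doing the split after lexing keeps the grammar simple: the lexer only has to deliver the whole comment as one token.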

I think this discussion is haunted by the perfect being the enemy of the good enough. Having now analyzed several thousand programs (literally), I have found fewer than 5 situations where an inline comment was important.

Just a stray thought.

rtm...@googlemail.com

Nov 29, 2020, 8:17:19 AM
to antlr-discussion
You're right, they can appear anywhere, and yep, I can see it can get complicated: where in the AST should they go? I was about to do what I suggested in my own work, and now I'm thinking I should have put some thought in before replying.
I can probably capture the list of them at the end of each parse rule, when I create the AST object, and that's likely sufficient for me, but not for you.
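
A rough Python sketch of that idea, using plain tuples instead of the ANTLR runtime: given the token indexes where a rule starts and stops, gather the hidden-channel comment tokens in between. `comments_in_rule`, the `(text, channel)` tuple layout, the `//` tag and the channel number are all assumptions made for illustration.

```python
HIDDEN_CHANNEL = 1  # assumed channel id for this sketch


def comments_in_rule(tokens, start, stop):
    """Return texts of hidden comment tokens with start <= index <= stop.

    tokens is a list of (text, channel) tuples in source order.
    """
    return [
        text
        for text, channel in tokens[start:stop + 1]
        if channel == HIDDEN_CHANNEL and text.startswith("//")
    ]


tokens = [
    ("// header", 1),
    ("def", 0),
    ("f", 0),
    ("// inside f", 1),
    ("return", 0),
]
# Suppose the rule for the function spans token indexes 1..4.
print(comments_in_rule(tokens, 1, 4))
```

The comment at index 0 sits before the rule's first token, so it is not picked up here; deciding which rule should own it is exactly the placement question.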
I guess this bit "seeing a lot more actual information in  comments" means what amounts to make-do pragmas.

sorry!

jan

Bill Dickenson

Nov 29, 2020, 10:13:12 AM
to antlr-di...@googlegroups.com
That's actually a good point. In our case, we created a "node" and attached it to the non-comment statement immediately below it. Running through the GOBS data, we found that whole-line comments, more often than not, preceded the code block.

// so if we had this set of comments
// we went ahead and attached it to def node
DEF my_dog_has_fleas():
// and if it was below it just gets attached to the next line (for)
    for I is lost in space

In any event, when we count the comments, we get 3, since we count 2 attached to the def and 1 attached to the for.
But again, it's a judgement call.
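
The attachment rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `attach_comments` is a hypothetical helper, it works on raw source lines rather than on an ANTLR token stream, and it treats any line starting with `//` as a whole-line comment.

```python
def attach_comments(lines, tag="//"):
    """Attach each run of whole-line comments to the next non-comment line.

    Returns a list of (statement, comments) pairs. Comments that trail the
    last statement are returned with statement None.
    """
    nodes = []
    pending = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines neither break nor extend a comment run
        if line.startswith(tag):
            pending.append(line[len(tag):].strip())
        else:
            nodes.append((line, pending))
            pending = []
    if pending:
        nodes.append((None, pending))
    return nodes


source = [
    "// so if we had this set of comments",
    "// we went ahead and attached it to def node",
    "DEF my_dog_has_fleas():",
    "// and if it was below it just gets attached to the next line (for)",
    "    for I is lost in space",
]
nodes = attach_comments(source)
for statement, comments in nodes:
    print(len(comments), statement)
print(sum(len(comments) for _, comments in nodes))
```

On the example above this attaches two comments to the DEF line and one to the for line, three in total.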
Thanks



