Nested Comments In ANTLR


Dennis Ashley

Mar 31, 2016, 5:08:22 PM
to antlr-discussion
Is it possible to have ANTLR itself recognize nested comments?

I assume that would require changing ANTLR's own grammar and rebuilding the tool, which would make the resulting build non-standard / non-conforming.

Is it possible to have the standard ANTLR grammar modified accordingly?  Below is what I have in mind.

What are the negative impacts of such a modification?
1) EOF not handled in the proposal
2) May break existing grammars whose comments already contain comment delimiters
3) ??? 
4) ...

Standard ANTLR Lexer Grammar:
    // -------------------------
    // Comments
    
    DOC_COMMENT
        :   DocComment
        ;
    
    BLOCK_COMMENT
        :   BlockComment    -> channel(OFF_CHANNEL)
        ;
    
    LINE_COMMENT
        :   LineComment     -> channel(OFF_CHANNEL)
        ;
    
    fragment DocComment 
        : '/**' .*? ('*/' | EOF)
        ;
    fragment BlockComment
        : '/*'  .*? ('*/' | EOF)
        ;
    fragment LineComment
        : '//' ~[\r\n]*
        ;

Potential Replacement Lexer Rules:
    /** Documentation COMMENTs will nest.  As a Multi-line Comment
        all other Comment types are allowed to nest in a
        Documentation Comment.
     */
    COMMENT_DOC
        : '/**'
          (
            ( /* Must never match an '/' in position 4 here,
                 otherwise there is a conflict with the
                 definition of COMMENT_BLK
               - '/ * * /' is an empty Block Comment
               - Nesting not allowed at position 4 due to conflict
               */
              [*]* ~[*/]    // No '/' nor repeating '*' followed by '/'
            )
            ( COMMENT_DOC
            | COMMENT_BLK
            | COMMENT_INL
            | .
            )*?
          )?
          '*'+ '/'
        ;
    /** Block COMMENTs will nest.  As a Multi-line Comment all other
        Comment types are allowed to nest in a Block Comment.
     */
    COMMENT_BLK
        : '/*'
          (
            ( /* Must never match an '*' in position 3 here,
                 otherwise there is a conflict with the
                 definition of COMMENT_DOC
               - '/ * *' Starts a Documentation Comment
               - Nesting allowed, No conflict at position 3
               */
              [/]? ~[*/]    // No '//', '/*' nor '*'
            | COMMENT_DOC
            | COMMENT_BLK
            | COMMENT_INL
            )
            ( COMMENT_DOC
            | COMMENT_BLK
            | COMMENT_INL
            | .
            )*?
          )?
          '*/'
          -> channel( OFF_CHANNEL )
        ;
    /** Inline COMMENTs will nest, however all nesting will be on
        the same line.  The NEWLINE character is never consumed and
        terminates all nesting levels.  Multiline Comments do not
        nest inside Single Line Comments, therefore you cannot start
        nor end a Multiline Comment in a Single Line Comment.
     */
    COMMENT_INL
        : '//'
          ( ~[\n\r]     // No NEW_LINE
          | COMMENT_INL // COMMENT_INL, min 2 char, preferred vs ~[\n\r], max 1 char
          )*
          -> channel( OFF_CHANNEL )
        ;


the_antlr_guy

Apr 7, 2016, 1:48:44 PM
to antlr-discussion
Hi.  I'm not sure the need for nested comments is great enough to overcome the inertia of changing the metalanguage.
Ter

Dennis Ashley

Apr 8, 2016, 11:49:48 AM
to antlr-discussion

Thanks for the reply.


I agree there is no strict need.  This is definitely more of a preference.


I’ve just never liked the way comments are handled in most contexts.  If I want to comment out a large block of code or grammar that already contains comments, I should not have to find each existing closing ‘*/’ and change it somehow.  If comments nested, this would not be an issue: I could put my ‘/*’ and ‘*/’ in and not think about comment issues while coding.  Also, if I want to use ‘/*’ or ‘*/’ inside a comment itself, I could precede it with ‘//’ and not worry about beginning or ending a multiline comment.


I know existing comments are implemented the way they are for historical reasons.  Several programming languages including the C family do not use nested comments.  I imagine that was to simplify the lexer.  However, developer preference/convenience should also be considered when the opportunity is available.
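
To illustrate the lexer-simplicity point: a non-nesting block comment ends at the first ‘*/’, so a scanner needs no state, whereas nesting requires at least a depth counter. The following is a minimal Python sketch, not ANTLR code and not the rules proposed in this thread. The function name `block_comment_end` and its parameters are hypothetical, and it deliberately ignores ‘//’ comments, string literals, and the ‘/**’ special cases discussed above.

```python
def block_comment_end(text, start=0, nested=True):
    """Return the index just past the block comment that opens at text[start].

    Returns -1 if the comment is never closed.
    With nested=False the first '*/' closes the comment (C-style behavior).
    With nested=True every inner '/*' must be closed by its own '*/'.
    """
    if not text.startswith("/*", start):
        raise ValueError("no block comment starts at this offset")
    depth = 1              # the opening '/*' has been seen
    i = start + 2
    while i < len(text):
        if text.startswith("*/", i):
            depth -= 1
            i += 2
            if depth == 0:
                return i
        elif nested and text.startswith("/*", i):
            depth += 1     # only the nesting variant needs this state
            i += 2
        else:
            i += 1
    return -1


# Wrapping a region that already contains a comment:
src = "/* outer /* inner */ tail */ code"
print(repr(src[block_comment_end(src, nested=False):]))  # ' tail */ code'
print(repr(src[block_comment_end(src, nested=True):]))   # ' code'
```

Without nesting, the outer comment ends at the inner ‘*/’ and the leftover ‘tail */’ is lexed as code, which is exactly the annoyance described above.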

Ian Utley

Apr 8, 2016, 12:50:41 PM
to antlr-di...@googlegroups.com
Have you thought about introducing a preprocessing stage into your workflow? It sounds like you want to mark a section of the grammar to be ignored, for example by surrounding it with a #if false ... #endif type construct, and then use a preprocessing step to produce the cut-down grammar for ANTLR, rather than relying on ANTLR's own comments.

There are probably some general purpose preprocessing tools that could do this, or use the ANTLR4 grammar to write an ANTLR preprocessor! ;-)
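
One possible shape for such a preprocessing step, as a rough sketch only: the `#if false` / `#endif` markers follow the example above, but they are not ANTLR syntax, and the function name `strip_disabled` is made up for illustration. It works line by line and counts nesting depth so that a disabled region can contain other disabled regions.

```python
def strip_disabled(source):
    """Drop every line between '#if false' and its matching '#endif'.

    The markers must stand alone on a line (surrounding whitespace is
    ignored). Regions may nest. Raises ValueError on unbalanced markers.
    """
    kept = []
    depth = 0  # number of currently open '#if false' regions
    for line in source.splitlines():
        directive = line.strip()
        if directive == "#if false":
            depth += 1
        elif directive == "#endif":
            if depth == 0:
                raise ValueError("#endif without matching #if false")
            depth -= 1
        elif depth == 0:
            kept.append(line)  # only lines outside all regions survive
    if depth:
        raise ValueError("unterminated #if false")
    return "\n".join(kept)


grammar = "\n".join([
    "A : 'a' ;",
    "#if false",
    "B : 'b' ; /* existing comment */",
    "#if false",
    "C : 'c' ;",
    "#endif",
    "#endif",
    "D : 'd' ;",
])
print(strip_disabled(grammar))
# A : 'a' ;
# D : 'd' ;
```

Because the disabled lines are dropped before ANTLR sees the grammar, any ‘/* ... */’ comments inside them no longer matter. Emitting an empty line for each dropped line instead would keep ANTLR's reported line numbers aligned with the original file.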

Dennis Ashley

Oct 16, 2017, 11:05:22 AM
to antlr-discussion
I think I finally came up with rules that perform the way I want them to:

    /** Documentation COMMENTs will nest.  As a Multi-line Comment
         all other Comment types are allowed to nest in a
         Documentation Comment.
     *- Position 4 can never be a '/' in a Documentation Comment,
         '/ * * /' is an empty Block Comment.   Nesting not allowed at
         position 4 due to conflict
     *- I want to recognize '/ * * ... / * /', therefore '/ * /' should 
         be allowed to end a comment.  However, the first '* /' always
         ends the comment
     *- The comment prefix and suffix should never be recognized in the
         body of a comment 
     */
    COMMENT_DOC
        : '/**'
          ( '*'+?
          | '*'* ~[*/]
            ( ('/'|'*'+)? ~[*/] | COMMENT_DOC | COMMENT_BLK | COMMENT_INL )*?
            ('/'|'*'+?)?
          )??
          '*/'
          -> channel( OFF_CHANNEL )
        ;
    /** Block COMMENTs will nest.  As a Multi-line Comment all other
         Comment types are allowed to nest in a Block Comment.
     *- Position 3 can only be an '*' in an empty Block Comment, 
         otherwise there is a conflict with Documentation Comment.
     *- I want to recognize '/ * ... / * /', therefore '/ * /' should
         be allowed to end a comment.  However, the first '* /' always
         ends the comment
     *- The comment prefix and suffix should never be recognized in
         the body of a comment 
     */
    COMMENT_BLK
        : '/*'
          ( '/'
          | ( '/'? ~[*/]        | COMMENT_DOC | COMMENT_BLK | COMMENT_INL )
            ( ('/'|'*'+)? ~[*/] | COMMENT_DOC | COMMENT_BLK | COMMENT_INL )*?
            ('/'|'*'+?)?
          )??
          '*/'
          -> channel( OFF_CHANNEL )
        ;
    /** Inline COMMENTs will nest, however all nesting will be on
        the same line.  The NEWLINE character is never consumed and
        terminates all nesting levels.  Multiline Comments do not
        nest inside Single Line Comments, therefore you cannot start
        nor end a Multiline Comment in a Single Line Comment.
     */
    COMMENT_INL
        : '//'  ( COMMENT_INL | ~[\n\r] )* -> channel( OFF_CHANNEL )
        ;