Checkstyle Project: Avoid changes in ANTLR generated token and rule numbers with updates in grammar

ps

Jan 6, 2018, 3:19:19 PM
to antlr-discussion
Checkstyle uses ANTLR to generate its parser for Javadoc comments (JavadocParser.g4, JavadocLexer.g4). An issue on GitHub requires new rules and tokens to be added to the grammars to support a few HTML5 tags. The lexer specifically recognizes a fixed set of widely used HTML tags (such as <p>, <li>, and <br>); all other tags are treated generically as HTML_TAG_NAME.

Checkstyle needs to preserve the token and rule values (constants) that are exposed to users via JavadocTokenTypes.java, which currently map to the values used internally by the ANTLR parser. Most of the new rules and tokens can be added at the end of the grammar, which does not change any of the existing values.

Because the ANTLR lexer breaks ties between rules that match the same input in favor of the rule defined first, all the new tokens for the HTML5 tags must be added before the HTML_TAG_NAME token. Correct me if I am wrong, but ANTLR does not seem to provide a way to manually control the numbers of tokens and rules; it assigns them automatically, in the order in which it encounters the tokens and rules in the grammar. This forces a change in the token value of HTML_TAG_NAME.
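
To make the problem concrete, here is a toy Java model of the numbering behaviour as I understand it. It is only a sketch and not ANTLR's actual code: the class and method names are made up, and the token names other than HTML_TAG_NAME are purely illustrative. It assumes token types are handed out sequentially, starting at 1, in declaration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenNumbering {

    /** Toy model: number tokens 1, 2, 3, ... in the order they are declared. */
    public static Map<String, Integer> number(List<String> tokensInGrammarOrder) {
        Map<String, Integer> types = new LinkedHashMap<>();
        int next = 1;
        for (String name : tokensInGrammarOrder) {
            types.put(name, next++);
        }
        return types;
    }

    /** Returns a copy of the list with newToken placed directly before anchor. */
    public static List<String> insertBefore(List<String> tokens, String anchor, String newToken) {
        int index = tokens.indexOf(anchor);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown token: " + anchor);
        }
        List<String> copy = new ArrayList<>(tokens);
        copy.add(index, newToken);
        return copy;
    }

    public static void main(String[] args) {
        // Illustrative token names; only HTML_TAG_NAME is taken from the post.
        List<String> before = Arrays.asList("P_HTML_TAG_NAME", "LI_HTML_TAG_NAME", "HTML_TAG_NAME");
        // Hypothetical new HTML5 token that must precede the generic fallback.
        List<String> after = insertBefore(before, "HTML_TAG_NAME", "NEW_HTML5_TAG_NAME");

        System.out.println("before: " + number(before));
        System.out.println("after:  " + number(after));
    }
}
```

In this model HTML_TAG_NAME moves from 3 to 4 as soon as a token is inserted in front of it, while the tokens declared earlier keep their numbers. That shift is exactly what would leak through JavadocTokenTypes.java.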

Hard-coding the values in JavadocTokenTypes.java is something we want to avoid. Is there any other way to overcome this problem?
