Checkstyle Project: Avoid changes in ANTLR generated token and rule numbers with updates in grammar

ps

Jan 6, 2018, 3:19:19 PM

to antlr-discussion

Checkstyle uses ANTLR to generate its Javadoc parser (JavadocParser.g4, JavadocLexer.g4). An issue on GitHub requires adding new rules and tokens to these grammars to support a few HTML 5 tags. The lexer recognizes a fixed set of widely used HTML tags individually (such as <p>, <li>, <br>), and every other tag is treated generically as HTML_TAG_NAME.

Checkstyle needs to preserve the token and rule constants that are exposed to users through JavadocTokenTypes.java, which currently maps to the values used internally by the ANTLR parser. Most of the new rules and tokens can be added at the end of the grammar, which does not change any existing values.
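
To make that coupling concrete, here is a minimal Java sketch. GeneratedParserStub is a hypothetical stand-in for the ANTLR-generated parser class, PublicTokenTypes is a hypothetical stand-in for JavadocTokenTypes.java, and the numeric value is invented for illustration.

```java
// Hypothetical stand-in for the ANTLR-generated parser class.
// The value 4 is invented; in practice the generator decides it.
final class GeneratedParserStub {
    static final int HTML_TAG_NAME = 4;

    private GeneratedParserStub() { }
}

// Hypothetical stand-in for the public constants class.
// It does not hard-code a number; it forwards the generated one.
final class PublicTokenTypes {
    public static final int HTML_TAG_NAME = GeneratedParserStub.HTML_TAG_NAME;

    private PublicTokenTypes() { }
}

final class MappingDemo {
    public static void main(String[] args) {
        // The public value is whatever the generated class currently holds.
        System.out.println(PublicTokenTypes.HTML_TAG_NAME);
    }
}
```

With this arrangement, any renumbering in the generated class shows up in the public constant without a single edit to the public class, which is why the generated values need to stay stable.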

Since ANTLR resolves ambiguity by using the rule or token that is listed first in the grammar, all new tokens for the HTML 5 tags must be added before the HTML_TAG_NAME token. Correct me if I am wrong, but ANTLR does not seem to provide any way to control token and rule numbers manually; it generates the values in the order in which it encounters the rules and tokens. This forces a change in the token value of HTML_TAG_NAME.
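
The effect can be illustrated with a toy model in Java. This is not ANTLR's implementation; it only assumes, as described above, that numbers are handed out in declaration order, starting at 1. All token names except HTML_TAG_NAME are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TokenNumberingSketch {
    // Hypothetical declaration order of an existing lexer grammar.
    static final List<String> ORIGINAL = Arrays.asList(
            "P_TAG", "LI_TAG", "BR_TAG", "HTML_TAG_NAME", "TRAILING_TOKEN");

    private TokenNumberingSketch() { }

    // Toy model: assign 1, 2, 3, ... in declaration order.
    static Map<String, Integer> number(List<String> declared) {
        Map<String, Integer> types = new LinkedHashMap<>();
        int next = 1;
        for (String name : declared) {
            types.put(name, next++);
        }
        return types;
    }

    // Returns a copy of the list with newName placed directly before anchor.
    static List<String> insertBefore(List<String> declared, String anchor, String newName) {
        List<String> copy = new ArrayList<>(declared);
        copy.add(copy.indexOf(anchor), newName);
        return copy;
    }

    // Returns a copy of the list with newName placed at the end.
    static List<String> append(List<String> declared, String newName) {
        List<String> copy = new ArrayList<>(declared);
        copy.add(newName);
        return copy;
    }

    public static void main(String[] args) {
        // ARTICLE_TAG is a hypothetical token for an HTML 5 tag.
        List<String> before = insertBefore(ORIGINAL, "HTML_TAG_NAME", "ARTICLE_TAG");
        List<String> atEnd = append(ORIGINAL, "ARTICLE_TAG");

        System.out.println(number(ORIGINAL).get("HTML_TAG_NAME")); // prints 4
        System.out.println(number(before).get("HTML_TAG_NAME"));   // prints 5
        System.out.println(number(atEnd).get("HTML_TAG_NAME"));    // prints 4
    }
}
```

Appending leaves every existing number untouched, while inserting ahead of HTML_TAG_NAME shifts it and everything declared after it by one.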

Hard-coding values in JavadocTokenTypes.java is something we want to avoid. Is there any other way to overcome this problem?

NOTE:

Not all of the ANTLR-generated token values are exposed to the user in JavadocTokenTypes.java, only the ones that users would need.
The xmlTagDefinition mode has been split (L267, L399) to preserve the values of the tokens in the modes that follow it.