Checkstyle uses ANTLR to generate its parser for Javadoc comments (JavadocParser.g4, JavadocLexer.g4). An issue on GitHub requires new rules and tokens to be added to the grammars to support a few HTML 5 tags. The lexer specifically recognizes a number of widely used HTML tags (like <p>, <li>, <br> and so on), and all other tags are treated generically as HTML_TAG_NAME.
Checkstyle needs to preserve the token and rule values/constants that are exposed to users via JavadocTokenTypes.java, which currently maps to the values used internally by the ANTLR parser. Most of the new rules and tokens can be added at the end of the grammar, which doesn't change any of the existing values.
When several lexer rules match the same input, ANTLR resolves the ambiguity by using the rule that is defined first, so all the new tokens for the HTML 5 tags must be added before the token HTML_TAG_NAME. Correct me if I am wrong, but ANTLR doesn't seem to provide any way to manually control the numbers of tokens and rules; instead it generates the values automatically, in the order in which it encounters the rules or tokens. This forces a change in the token value of HTML_TAG_NAME.
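To make the problem concrete, here is a toy model of the numbering as I understand it. This is not ANTLR code; it only assumes that token types are numbered 1, 2, 3, ... in declaration order. The names P_TAG, LI_TAG and NEW_HTML5_TAG are hypothetical placeholders, not the real tokens from the grammar.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TokenNumberingSketch {

    // Toy model (an assumption, not ANTLR's implementation):
    // number the token names 1, 2, 3, ... in the order they are declared.
    static Map<String, Integer> assignTypes(List<String> declaredTokens) {
        Map<String, Integer> types = new LinkedHashMap<>();
        int next = 1;
        for (String name : declaredTokens) {
            types.put(name, next++);
        }
        return types;
    }

    public static void main(String[] args) {
        // Hypothetical lexer before the change: specific tags first, generic rule last.
        List<String> before = List.of("P_TAG", "LI_TAG", "HTML_TAG_NAME");
        // The new tag has to be declared before HTML_TAG_NAME so that it wins the match.
        List<String> after = List.of("P_TAG", "LI_TAG", "NEW_HTML5_TAG", "HTML_TAG_NAME");

        System.out.println(assignTypes(before).get("HTML_TAG_NAME")); // prints 3
        System.out.println(assignTypes(after).get("HTML_TAG_NAME"));  // prints 4
    }
}
```

Under this model, every token declared after the insertion point moves up by one, which is exactly the shift that would leak into the public constants.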
Hard-coding values in JavadocTokenTypes.java is something we want to avoid. Is there any other way to overcome this problem?