If by “my output contains escape characters” you’re referring to \r, \n, \t, etc., that’s just an artifact of how the string is displayed when it contains carriage returns, newlines, and tabs. Your string doesn’t really contain “\r”; it contains a character with a value of 13 (a carriage return), which gets shown as “\r” since it has no other visual representation.
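To see the difference for yourself, here is a minimal sketch in plain Java. The class name EscapeDemo and the helper visible are made up for illustration; the helper just imitates the kind of escaping a display routine might apply.

```java
public class EscapeDemo {
    // Hypothetical helper: renders control characters as visible escapes, for display only.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) {
        String text = "line one\r\n\tline two";

        // The character after "line one" is a single carriage return, code 13.
        System.out.println((int) text.charAt(8));

        // The same string, with control characters made visible as two-character escapes.
        System.out.println(visible(text));
    }
}
```

The string itself never holds a backslash followed by an r; only the displayed form does.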
Your grammar says that a DocComment is everything from ‘/**’ to ‘*/’ inclusive, so that includes all whitespace as well as the embedded *’s. If that’s undesirable, it would be easy enough to remove the leading /** and trailing */. You could also use a regex to identify and remove any instances of \r\n\s*\* from your string. This will remove the carriage return, the linefeed, any whitespace before an *, and the * itself, so you’ll lose some information regarding the line breaks. If those are important to you, you can get more sophisticated and remove only the whitespace and *’s that follow a carriage return and linefeed, leaving the line break itself in place.
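A rough sketch of both regex variants in Java follows. The class and method names are mine, not anything from your code, and I've assumed you'd want \r?\n rather than \r\n so that Unix line endings are handled too. The delimiters are stripped first so the closing */ isn't half-eaten by the pattern.

```java
public class DocCommentRegex {
    // Removes the leading "/**" and trailing "*/" if present.
    static String stripDelimiters(String comment) {
        String body = comment;
        if (body.startsWith("/**")) body = body.substring(3);
        if (body.endsWith("*/")) body = body.substring(0, body.length() - 2);
        return body;
    }

    // Removes each line break together with the whitespace and '*' that follow it.
    static String flatten(String comment) {
        return stripDelimiters(comment).replaceAll("\\r?\\n\\s*\\*", "").trim();
    }

    // Keeps each line break, but removes the whitespace, '*', and one optional space after it.
    static String keepLineBreaks(String comment) {
        return stripDelimiters(comment).replaceAll("(\\r?\\n)[ \\t]*\\* ?", "$1").trim();
    }

    public static void main(String[] args) {
        String comment = "/**\r\n * First line\r\n * second line\r\n */";
        System.out.println(flatten(comment));
        System.out.println(keepLineBreaks(comment));
    }
}
```

The first variant collapses the comment onto one line; the second keeps the original line structure, which matters if you care about paragraphs or tag lines.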
You *could* try a parser rule for docComments, where you get more specific and assign labels to the parts you’re interested in keeping. As it is, you’ve defined DOC_COMMENT as a token, so it’s all or nothing.
Using a parser rule would look something more like the following (not tested, so expect to fix a few typos):
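// ANTLR string literals are not regexes, so the line break pattern is spelled out.
DOC_COMMENT_START : '/**' ;
// Also absorbs a preceding line break, so a closing " */" line is not lexed as a BREAK.
DOC_COMMENT_END : ('\r'? '\n' [ \t]*)? '*/' ;
DOC_COMMENT_BREAK : '\r'? '\n' [ \t]* '*' ;

// cmt+= collects every matched token; a plain cmt= would keep only the last one.
docComment
    : DOC_COMMENT_START
      cmt+=.*?
      (DOC_COMMENT_BREAK cmt+=.*?)*
      DOC_COMMENT_END
    ;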
This is still going to leave carriage returns and line feeds that aren’t followed by whitespace and an *, so modify accordingly if you want to exclude them as well. There are a number of ways to handle this, depending upon how much of the comment formatting you want to preserve.
In general, I advise folks to avoid getting too clever in the grammar and just expect it to properly identify and classify the input stream. Then you can write code to handle the details. So there’s something to be said for just sticking with a simple token that you know brings in the whole DOC_COMMENT, and then pulling the information you want out in your own code.
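As a sketch of what "your own code" might look like: the helper below takes the raw text of the whole DOC_COMMENT token (what you'd get from the token's getText() in a listener or visitor) and returns the content lines. The class name, the method name, and the choice to drop blank lines are all my assumptions, so adjust to taste.

```java
import java.util.ArrayList;
import java.util.List;

public class DocCommentLines {
    // Hypothetical helper: splits the raw DOC_COMMENT text into cleaned content lines.
    // Blank lines are dropped here, which loses paragraph breaks; keep them if you need them.
    static List<String> contentLines(String raw) {
        String body = raw.trim();
        if (body.startsWith("/**")) body = body.substring(3);
        if (body.endsWith("*/")) body = body.substring(0, body.length() - 2);

        List<String> lines = new ArrayList<>();
        for (String line : body.split("\\r?\\n")) {
            // Drop leading whitespace and one optional leading '*' on each line.
            String cleaned = line.replaceFirst("^\\s*\\*?", "").trim();
            if (!cleaned.isEmpty()) lines.add(cleaned);
        }
        return lines;
    }

    public static void main(String[] args) {
        String raw = "/**\r\n * Hello\r\n *\r\n * @param x the value\r\n */";
        for (String line : contentLines(raw)) {
            System.out.println(line);
        }
    }
}
```

Working line by line like this keeps the grammar simple and makes it easy to add special handling later, for example treating lines that start with @ differently.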
Hope that helps more than it confuses… it’s a bit early in the morning and I may not have done the best job explaining.