--
--
Jim,
I think I would prefer to have a "clean" string literal as early in the tool chain as possible, so that none of the later stages has to know that the string ever needed scanning. That lets me distinguish between strings that need escape processing because the user intended them that way (e.g. text read in from a file that has no actual newlines but marks their location with backslash-n) and string literals in the code. Instead of every parser rule that takes STRING_LITERAL as a token having to de-escape the string, each rule is handed a neatly pre-processed token, which I think is what the lexer ought to be doing.
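To make that concrete, here is a minimal sketch in Python (not my actual grammar code) of the kind of one-time de-escaping I mean. The function name and the set of supported escapes are assumptions made up for illustration; the idea is only that the lexer calls something like this once when it emits the token, so no parser rule ever sees a backslash sequence.

```python
# Hypothetical escape table: the escapes a real grammar supports may differ.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}


def unescape_double_quoted(raw: str) -> str:
    """Convert the raw text of a double-quoted literal (quotes included)
    into its clean value, resolving backslash escapes exactly once."""
    if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        raise ValueError("not a double-quoted literal")
    body = raw[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must be followed by a known escape character.
            if i + 1 >= len(body):
                raise ValueError("dangling backslash at end of literal")
            nxt = body[i + 1]
            if nxt not in _ESCAPES:
                raise ValueError("unknown escape: \\" + nxt)
            out.append(_ESCAPES[nxt])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The lexer would set the token's text to the returned value, and every rule that consumes STRING_LITERAL then works with the clean string.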
I see your point, though. There ARE circumstances where you want to leave the decision to de-escape or not to a later point in the tool chain. So I added single-quoted string literals to my grammar, and those have no de-escaping performed on them. As in the SQL standard or VB string literals, a single-quoted string literal begins and ends with the "tick" symbol (') and uses two ticks in a row to indicate a tick symbol WITHIN the literal. Unix shells do something similar: a single-quoted string does not get $ expansion or escape processing, whereas a double-quoted string does. My grammar calls RegEx functions a lot, so not having to use four backslashes to match a single backslash character is a real help.
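As a rough sketch of the single-quote rule (again in Python, with a made-up function name, and assuming the lexer has already matched a well-formed literal), the only processing is to drop the outer ticks and collapse each doubled tick. Backslashes pass through untouched, which is why a regex that matches one backslash needs only two of them in the source text.

```python
import re


def unquote_single_quoted(raw: str) -> str:
    """Convert the raw text of a single-quoted literal (ticks included)
    into its value. Only '' -> ' is rewritten; backslashes are left alone."""
    if len(raw) < 2 or raw[0] != "'" or raw[-1] != "'":
        raise ValueError("not a single-quoted literal")
    # Assumes the lexer only accepts ticks inside the body as doubled pairs.
    return raw[1:-1].replace("''", "'")


# Source text 'it''s' yields the five characters: it's
value = unquote_single_quoted("'it''s'")

# Source text '\\' (two backslashes between ticks) stays two backslashes,
# which is the regex pattern for one literal backslash.
pattern = unquote_single_quoted(r"'\\'")
matches_one_backslash = re.fullmatch(pattern, "\\") is not None
```

With a double-quoted literal the same pattern would have to be written with four backslashes, because the de-escaping pass halves them before the regex engine sees the string.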
The one takeaway I get from these discussion boards is that if you ask ten ANTLR developers a question, you'll get ten different answers, and none of them will agree with the answer you would have given them if they asked you the same question. Whoever said "great minds think alike" was an optimist!
--