Hi all,
I'm trying to parse multi-line string literals that start and end with triple double quotes `"""` (similar to those in languages like Kotlin and Scala). The string literals should allow up to two consecutive (unescaped) double quotes anywhere in the string body, as in:
"""
Hello ""World""!
"""
I've pasted a simplified grammar below that comes pretty close to this requirement; however, it doesn't handle additional double quotes directly before the closing token `"""`. Here are some examples:
"""""
Additional double quotes in the beginning SUCCEED.
"""
"""
Additional double quotes in the body "" or at the end of a line SUCCEED.""
"""
But:
"""
Additional double quotes before the closing token FAIL!
"""""
Note that multi-line strings are not handled purely lexically, since the full grammar also supports string interpolation.
Therefore, I would appreciate suggestions on how to support up to two consecutive trailing double quotes within the grammar structure that is already in place (i.e., where multi-line strings are defined partly in the lexer and partly in the parser).
I appreciate your help.
Best regards,
Michael
---- ExampleLexer.g4 ----
lexer grammar ExampleLexer;

channels {
    WHITESPACE,
    COMMENTS
}

WS
    : [ \t\r\n\f]+ -> channel(WHITESPACE)
    ;

TRIPLE_QUOTE
    : '"""' -> pushMode(MultiLineString)
    ;

mode MultiLineString;

END_TRIPLE_QUOTE
    : '"""' -> popMode
    ;

MLStringChars
    : (MLUnescapedDoubleQuotes? ~["\\$\r\n])+
    ;

MLNewline
    : '\r' '\n'? | '\n'
    ;

MLUnescapedDoubleQuotes
    : '"' '"'?
    ;
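To see where the trailing quotes go wrong, here is a small Python model of how ANTLR picks tokens in the MultiLineString mode above. It is only a sketch: the rule tables and the `tokenize` helper are hypothetical, and the regexes are hand transcriptions of the .g4 rules, not ANTLR API. ANTLR's lexer takes the longest match among the current mode's rules and breaks ties by rule order. On a closing `"""""`, END_TRIPLE_QUOTE matches three characters, MLUnescapedDoubleQuotes only two, and MLStringChars nothing (it needs a non-quote character after at most two quotes). So the mode is popped after the first three quotes and the leftover `""` ends up in the default mode, where no rule in this simplified grammar matches it. The second call models one possible change, assuming it is acceptable for the closing token to carry the content quotes: `END_TRIPLE_QUOTE : '"'? '"'? '"""' -> popMode ;`, which lets the longest match take all five quotes.

```python
import re

# Hypothetical hand transcription of ExampleLexer.g4: each mode is an ordered
# list of (token type, compiled regex, mode action).
DEFAULT_MODE = [
    ("WS", re.compile(r"[ \t\r\n\f]+"), None),
    ("TRIPLE_QUOTE", re.compile(r'"""'), "push"),
]


def multiline_mode(end_pattern):
    """Rules of mode MultiLineString, with a configurable END_TRIPLE_QUOTE."""
    return [
        ("END_TRIPLE_QUOTE", re.compile(end_pattern), "pop"),
        ("MLStringChars", re.compile(r'(?:"{0,2}[^"\\$\r\n])+'), None),
        ("MLNewline", re.compile(r"\r\n?|\n"), None),
        ("MLUnescapedDoubleQuotes", re.compile(r'"{1,2}'), None),
    ]


def tokenize(text, end_pattern):
    """Longest match wins; on a tie the rule listed first wins.

    For these particular patterns Python's greedy match is also the
    longest possible match of each single rule.
    """
    modes = {"default": DEFAULT_MODE, "ml": multiline_mode(end_pattern)}
    stack, pos, tokens = ["default"], 0, []
    while pos < len(text):
        best = None
        for name, regex, action in modes[stack[-1]]:
            m = regex.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (name, m, action)
        if best is None:
            # Model of a token recognition error: report and skip one char.
            tokens.append(("ERROR", text[pos]))
            pos += 1
            continue
        name, m, action = best
        tokens.append((name, m.group()))
        if action == "push":
            stack.append("ml")
        elif action == "pop":
            stack.pop()
        pos = m.end()
    return tokens


sample = '"""\nX\n"""""'
print(tokenize(sample, r'"""'))        # current rule: '"""'
print(tokenize(sample, r'"{0,2}"""'))  # candidate rule: '"'? '"'? '"""'
```

With the current rule the sample ends in END_TRIPLE_QUOTE `"""` followed by two ERROR tokens for the leftover quotes; with the candidate rule the last token is END_TRIPLE_QUOTE `"""""`. Leading extra quotes still come out as MLUnescapedDoubleQuotes, and a run of three or more quotes in the body still closes the string, as before.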
---- ExampleParser.g4 ----
parser grammar ExampleParser;

options {
    tokenVocab = ExampleLexer;
}

stringLiteral
    : TRIPLE_QUOTE multiLinePart* END_TRIPLE_QUOTE
    ;

multiLinePart
    : (ts+=MLStringChars | ts+=MLNewline | ts+=MLUnescapedDoubleQuotes)+
    ;