Multiline single quote string detection not working in [JavaScript target for ANTLR 4]


Nayan Choudhary

Jan 10, 2021, 11:57:34 PM
to antlr-discussion
Can anyone comment on whether this is a bug in the JavaScript target, or am I missing something?

Problem
I created a grammar and tested it with the ANTLR 4 plugin in IntelliJ, and the sample test passed. However, the same grammar and the same test produce errors with the JavaScript target.

What's special?
I want both single-quoted and double-quoted strings to be multiline by default.


Grammar:
---------------------------
grammar scratch_1;

options {
    language = JavaScript;
}

// Parser

document : definition+;

definition        : table_declaration;
table_declaration : TABLE IDENTIFIER CURLY_OPEN tableBodyLine+ CURLY_CLOSE;
tableBodyLine     : fieldDefinition;
fieldDefinition   : IDENTIFIER IDENTIFIER annotations?;
annotations       : SQUARE_BRACKET_OPEN note SQUARE_BRACKET_CLOSE;
note              : NOTE COLON REGULAR_STRING;

// Lexer

SINGLE_LINE_COMMENT : '//' InputCharacter*    -> channel(HIDDEN);
DELIMITED_COMMENT   : '/*' .*? '*/'           -> channel(HIDDEN);
WHITESPACES         : (Whitespace | NewLine)+ -> channel(HIDDEN);

TABLE : 'table';
NOTE  : 'note';

IDENTIFIER : IdentifierOrKeyword;

REGULAR_STRING : ('"' (SimpleEscapeSequence | ~'"')* '"') | ('\'' (SimpleEscapeSequence | ~'\'')* '\'');

SQUARE_BRACKET_OPEN  : '[';
SQUARE_BRACKET_CLOSE : ']';
CURLY_OPEN           : '{';
CURLY_CLOSE          : '}';
COLON                : ':';

// Fragments

fragment InputCharacter           : ~[\r\n\u0085\u2028\u2029];
fragment IdentifierOrKeyword      : IdentifierStartCharacter IdentifierPartCharacter*;
fragment IdentifierStartCharacter : [a-zA-Z_];
fragment IdentifierPartCharacter  : [0-9a-zA-Z_];
fragment SimpleEscapeSequence     : '\\\'' | '\\"' | '\\\\' | '\\0' | '\\a' | '\\b' | '\\f' | '\\n' | '\\r' | '\\t' | '\\v';

fragment NewLine:
    '\r\n'
    | '\r'
    | '\n'
    | '\u0085' // Next Line character (U+0085)
    | '\u2028' // Line Separator character (U+2028)
    | '\u2029'
    ;

fragment Whitespace:
    UnicodeClassZS // characters with Unicode class Zs
    | '\u0009' // Horizontal Tab character (U+0009)
    | '\u000B' // Vertical Tab character (U+000B)
    | '\u000C'
    ;

fragment UnicodeClassZS:
    '\u0020' // SPACE
    | '\u00A0' // NO_BREAK SPACE
    | '\u1680' // OGHAM SPACE MARK
    | '\u180E' // MONGOLIAN VOWEL SEPARATOR
    | '\u2000' // EN QUAD
    | '\u2001' // EM QUAD
    | '\u2002' // EN SPACE
    | '\u2003' // EM SPACE
    | '\u2004' // THREE_PER_EM SPACE
    | '\u2005' // FOUR_PER_EM SPACE
    | '\u2006' // SIX_PER_EM SPACE
    | '\u2008' // PUNCTUATION SPACE
    | '\u2009' // THIN SPACE
    | '\u200A' // HAIR SPACE
    | '\u202F' // NARROW NO_BREAK SPACE
    | '\u3000' // IDEOGRAPHIC SPACE
    | '\u205F'
    ;
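As a reference point outside ANTLR, the REGULAR_STRING rule can be approximated with a JavaScript regular expression. This is only a sketch, not generated code: `ESCAPE`, `REGULAR_STRING_RE` and `matchRegularString` are hypothetical names, and the escape list is copied from SimpleEscapeSequence. It illustrates why the grammar needs nothing extra for multiline strings: a negated set such as `~'\''` (here `[^']`) already matches line breaks.

```javascript
// Hypothetical regex approximation of the REGULAR_STRING lexer rule.
// Each alternative mirrors the grammar: an escape from SimpleEscapeSequence,
// or any character other than the closing quote (line breaks included).
const ESCAPE = String.raw`\\['"\\0abfnrtv]`;
const REGULAR_STRING_RE = new RegExp(
  `"(?:${ESCAPE}|[^"])*"|'(?:${ESCAPE}|[^'])*'`,
  "y" // sticky: match only at lastIndex, like a lexer rule at the current position
);

// Returns the string token starting at `pos`, or null if none matches there.
function matchRegularString(input, pos) {
  REGULAR_STRING_RE.lastIndex = pos;
  const m = REGULAR_STRING_RE.exec(input);
  return m ? m[0] : null;
}

console.log(matchRegularString("[note: 'check ']", 7));          // 'check '
console.log(matchRegularString("'line 1\nline 2'", 0) !== null); // true
console.log(matchRegularString('"say \\"hi\\""', 0));            // "say \"hi\""
```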


Test input:
---------------------------
table a {
int id [note: 'check ']
}
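To have something to compare the generated lexer against, here is a hypothetical hand-written tokenizer that mirrors the lexer rules (comment rules left out, whitespace approximated with `\s`). It is not ANTLR and not the generated scratch_1Lexer; `RULES`, `KEYWORDS` and `tokenize` are illustrative names. It assumes that trying rules in order is enough, which holds here because no two of these rules can start with the same character, and it emulates the priority ANTLR gives TABLE and NOTE over IDENTIFIER because they are declared first.

```javascript
// Hypothetical reference tokenizer mirroring the scratch_1 lexer rules
// (comment rules omitted). It only shows the expected token stream.
const ESCAPE = String.raw`\\['"\\0abfnrtv]`;

// [token type, sticky regex, hidden?], tried in order at each position.
const RULES = [
  ["WHITESPACES", /\s+/y, true], // approximation of (Whitespace | NewLine)+
  ["IDENTIFIER", /[a-zA-Z_][0-9a-zA-Z_]*/y, false],
  ["REGULAR_STRING", new RegExp(`"(?:${ESCAPE}|[^"])*"|'(?:${ESCAPE}|[^'])*'`, "y"), false],
  ["SQUARE_BRACKET_OPEN", /\[/y, false],
  ["SQUARE_BRACKET_CLOSE", /\]/y, false],
  ["CURLY_OPEN", /\{/y, false],
  ["CURLY_CLOSE", /\}/y, false],
  ["COLON", /:/y, false],
];

// TABLE and NOTE are declared before IDENTIFIER, so on an exact keyword
// match ANTLR picks the keyword rule; emulate that after matching IDENTIFIER.
const KEYWORDS = new Map([["table", "TABLE"], ["note", "NOTE"]]);

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re, hidden] of RULES) {
      re.lastIndex = pos;
      const m = re.exec(input);
      if (m === null) continue;
      const text = m[0];
      if (!hidden) {
        const kind = type === "IDENTIFIER" && KEYWORDS.has(text) ? KEYWORDS.get(text) : type;
        tokens.push({ type: kind, text });
      }
      pos += text.length;
      matched = true;
      break;
    }
    if (!matched) throw new Error(`no rule matches at offset ${pos}`);
  }
  return tokens;
}

const sample = "table a {\nint id [note: 'check ']\n}";
for (const t of tokenize(sample)) console.log(t.type, JSON.stringify(t.text));
```

For the test input this yields TABLE, IDENTIFIER, CURLY_OPEN, IDENTIFIER, IDENTIFIER, SQUARE_BRACKET_OPEN, NOTE, COLON, REGULAR_STRING, SQUARE_BRACKET_CLOSE, CURLY_CLOSE, with `'check '` as a single REGULAR_STRING. Swapping the single quotes for double quotes gives the same sequence.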


IntelliJ ANTLR 4 plugin graph: works fine (screenshot not preserved)

My Language Service code
---------------------------
const scratch_1Lexer = require('../grammar/antlr/scratch_1Lexer').default;
const scratch_1Parser = require('../grammar/antlr/scratch_1Parser').default;

const antlr4 = require("antlr4");
const CommonTokenStream = antlr4.CommonTokenStream;
const error = antlr4.error;
const CharStreams = antlr4.CharStreams;

class MyErrorListener extends error.ErrorListener {
    syntaxError(recognizer, offendingSymbol, line, column, msg, e) {
        console.log(`ERROR [${line}:${column}] - ${msg}`, offendingSymbol);
    }
}

module.exports = class MyLanguageService {
    parse(text) {
        if (text.length === 0) return;

        const stream = CharStreams.fromString(text);
        const lexer = new scratch_1Lexer(stream);
        lexer.removeErrorListeners();
        lexer.addErrorListener(new MyErrorListener());

        const tokens = new CommonTokenStream(lexer);
        const parser = new scratch_1Parser(tokens);
        parser.removeErrorListeners();
        parser.addErrorListener(new MyErrorListener());

        const tree = parser.document();
    }
}


Errors reported by the JavaScript target
---------------------------
ERROR [2:14] - token recognition error at: ''ch' null
ERROR [2:21] - token recognition error at: '']
' null
ERROR [2:17] - mismatched input 'eck' expecting REGULAR_STRING CommonToken {source: Array(2), type: 15, channel: 0, start: 27, stop: 29, ...}
ERROR [3:0] - missing IDENTIFIER at '}' CommonToken {source: Array(2), type: 26, channel: 0, start: 34, stop: 34, ...}
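The positions in this log can be lined up with the test input to see exactly where the JavaScript lexer gave up. The helpers `charAt` and `pointAt` below are hypothetical, written only for this check; they assume ANTLR's conventions of 1-based lines and 0-based columns, with the line counter advancing on '\n', and BMP-only input. For the test input, 2:14 is the opening quote of `'check '`, 2:17 is the `e` where the stray `eck` token starts, 2:21 is the closing quote, and 3:0 is the final `}`. Both "token recognition error" messages come from the lexer, and both begin at a single quote.

```javascript
// Map ANTLR (line, column) error positions back onto the input text.
// Lines are 1-based and split on '\n'; columns are 0-based.
function charAt(input, line, column) {
  const text = input.split("\n")[line - 1];
  return text === undefined ? undefined : text[column];
}

// Return the reported line with a caret under the reported column.
function pointAt(input, line, column) {
  const text = input.split("\n")[line - 1] || "";
  return `${text}\n${" ".repeat(column)}^`;
}

// Positions taken from the error log above.
const sample = "table a {\nint id [note: 'check ']\n}";
for (const [line, column] of [[2, 14], [2, 17], [2, 21], [3, 0]]) {
  console.log(`${line}:${column} -> ${JSON.stringify(charAt(sample, line, column))}`);
  console.log(pointAt(sample, line, column));
}
```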


Strange fact
---------------------------
With everything above unchanged, this test sample works if I replace the single quotes with double quotes:

table a {
int id [note: "check "]
}


package.json dependencies
---------------------------
  "dependencies": {
...
    "antlr4": "^4.9.1",
...
}

Mike Cargal

Jan 11, 2021, 10:25:08 AM
to antlr-discussion
It works fine using the Java target.

The project I worked on just before retirement allowed both 'strings' and "strings". We targeted Java on the server side but also used the JavaScript target in the browser, so this can definitely work in the JavaScript target. (I no longer have access to the grammar we used.)

Do you have a repo that would make it easy to reproduce this?

Nayan Choudhary

Feb 7, 2021, 7:04:24 AM
to antlr-discussion
No, but the code above can be used to reproduce the problem.

Mike Cargal

Feb 7, 2021, 9:29:35 AM
to ANTLR List
I'm sure that's true, but anyone attempting to help you is going to have to put together the scaffolding code to execute your test.

That may not be a LOT of work (and not necessarily difficult), but it's work you're asking someone to do to help you out.

A repo that has everything in place to just build, run, and reproduce the problem increases the likelihood that someone will look further into why it doesn't work.

Sometimes I'm interested enough to put together some calling code. In your case, it looks like I'd have to set up an Express server (maybe?). I'm sure that's not difficult, but it's work I'd have to do before I could even look at your problem.

I'm not saying you've done anything "wrong". You've provided a good bit of info, but it appears that most people reading it haven't wanted to invest the time to set it up and dive into it (it's been out here nearly a month without a response).

If you had a support contract with a company, this would probably be enough to expect them to respond. You're looking for volunteers here; the easier you make it, the more likely they are to "bite".

Hope that helps... not trying to be critical, just trying to make a suggestion that improves your chances of getting assistance.

Another option is to post to Stack Overflow. There you may find users who are building reputation points and will do extra work for that "reward".

