Can a modern JavaScript grammar be written in ANTLR4 without ending up with target-language specific grammar extensions? Looking at the grammar of JavaScript, both ECMAScript 2015 as standardized and the ANTLR grammar at https://github.com/antlr/grammars-v4/tree/master/ECMAScript (based on ECMAScript 5, I think), I was a bit disappointed that the ANTLR4 example has multiple different grammar files depending on the target language. That doesn't seem to meet the expectation, stated at the top of the antlr/grammars-v4 page, that the grammars are free of action statements. But in the case of ECMAScript, I'm not sure: is that a reasonable goal?
...
I was also surprised to see the various target variants of this grammar, and I believe the ECMA grammar is a perfect example of how bad it is to have so much target-specific code in a grammar at all. However, it's actually relatively easy to circumvent all that. The better approach is to derive the generated parser and lexer classes from intermediate classes that implement all the logic you see in the grammar actions.
Most languages support function syntax where you have an identifier with parentheses and parameters, which returns a value. Something like `isThisValid(ctx)` can be used in C/C++, Java, JS, C#, etc. So, by replacing all the action code with calls to functions in a base class, you can easily create a target-independent grammar.
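As a sketch of what that could look like in a grammar (the class and rule names here are illustrative, loosely modeled on the grammars-v4 ECMAScript lexer, not the actual published grammar):

```
lexer grammar ECMAScriptLexer;

// Hand-written intermediate class carrying all the predicate logic;
// the tool generates the lexer as a subclass of it.
options { superClass = ECMAScriptLexerBase; }

// The action body shrinks to a single function call, which most targets
// can parse as-is (RegularExpressionChar is elided here for brevity).
RegularExpressionLiteral
    : {isRegexPossible()}? '/' RegularExpressionChar+ '/' [a-z]*
    ;
```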
I think ANTLR might benefit from a target-language independent syntax to both declare and invoke abstract semantic predicates. The code generation templates could then emit abstract classes with pure-virtual functions and allow the predicate semantics to be implemented in target-language classes inheriting the generated code.
> I think ANTLR might benefit from a target-language independent syntax to both declare and invoke abstract semantic predicates. The code generation templates could then emit abstract classes with pure-virtual functions and allow the predicate semantics to be implemented in target-language classes inheriting the generated code.

I like this idea, and I suggested the same thing earlier: Unified Actions Language. It would give us not only universal semantic predicates across the different runtimes, but also checking of those predicates at code-generation time rather than at code-compilation time. Grammar and semantic predicates would be merged into a single unit, with the ability to parse context-sensitive languages.
Yes, I experimented earlier with a base class for the ECMAScript lexer and ran into a few complications. For example, `isRegexPossible()` was tricky to define in a base class because its implementation references token constants generated by the tool. I'm not saying I need help to resolve that; it's just part of the bigger picture.
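The ordering problem can be modeled without the ANTLR runtime (a hedged, self-contained sketch; `LexerBase`, `FakeGeneratedLexer`, and the token names are all illustrative, and the regex heuristic is heavily simplified). The "generated" class owns the token constants, so the hand-written base class cannot name them directly; one workaround is to have the subclass translate token types to symbolic names:

```java
// Hand-written base class: knows the predicate logic, but not the
// generated token constants.
abstract class LexerBase {
    protected int lastTokenType = -1;

    // Supplied by the generated subclass, which knows its own token table.
    protected abstract String symbolicName(int tokenType);

    // Simplified stand-in for the real heuristic: a '/' can start a regex
    // unless the previous token could end an expression.
    public boolean isRegexPossible() {
        if (lastTokenType == -1) {
            return true; // start of input
        }
        String name = symbolicName(lastTokenType);
        return !name.equals("Identifier") && !name.equals("CloseBracket");
    }

    public void setLastTokenType(int type) {
        lastTokenType = type;
    }
}

// Stands in for the tool-generated lexer, which defines the constants.
class FakeGeneratedLexer extends LexerBase {
    static final int Identifier = 1;
    static final int ReturnKeyword = 2;
    static final int CloseBracket = 3;

    @Override
    protected String symbolicName(int tokenType) {
        switch (tokenType) {
            case Identifier:    return "Identifier";
            case ReturnKeyword: return "Return";
            case CloseBracket:  return "CloseBracket";
            default:            return "<unknown>";
        }
    }
}
```

With this split, `return /x/` would be seen as a regex (a regex may follow the `return` keyword), while `a /x/` would not (after an identifier, `/` is division).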
> Most languages support function syntax where you have an identifier with parentheses and parameters, which returns a value. Something like `isThisValid(ctx)` can be used in C/C++, Java, JS, C#, etc. So, by replacing all the action code with calls to functions in a base class, you can easily create a target-independent grammar.

Yes, I'm with you, and agree the syntaxes are close, but note that in some languages (JavaScript in particular) the method (or property) name **has** to be prefixed with `this.` to get invoked correctly, and in Python the required syntax seems to be `self.`. Close only counts in horseshoes and hand grenades.
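Concretely, that means the very same predicate still has to be spelled differently in each target's copy of the grammar (illustrative spellings, following the one-grammar-per-target convention in grammars-v4):

```
// Java target:        {isRegexPossible()}?
// JavaScript target:  {this.isRegexPossible()}?
// Python target:      {self.isRegexPossible()}?
```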