Not sure how helpful this will be… is this “jq” as in
https://stedolan.github.io/jq/ ?? (A quick glance at the grammar leads me to think that’s possibly just a naming coincidence)
If it’s not, then I can’t help but comment that right-to-left evaluation must be about the most counter-intuitive and confusing way to evaluate expressions I’ve ever seen. If this is a grammar you have discretion over (rather than implementing something that’s already established), I’d suggest changing that (and I wouldn’t be surprised if that’s a factor in your performance issues).
Leaving that aside, I do see a few other things that seem “unusual” (at least to me):
-------------------
I see this pattern in a few places:
multiAssign:(assign* (';' assign)*);
It appears that you’re trying to say that a multiAssign is zero or more assigns separated by semicolons, but this definition makes the semicolon effectively optional, since assign* matches zero or more assigns without any semicolons between them. I suspect you mean:
multiAssign:(assign? (';' assign)*);
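To make the difference concrete, here is a rough regex analogue in Python. It is only an illustration under simplifying assumptions: each `assign` is shrunk to the single character `a`, and ANTLR itself is not involved. It suggests that the original rule body accepts two assigns with no separator, while the `?` version does not.

```python
import re

# Regex analogues of the two rule bodies; 'a' stands in for one `assign`.
ORIGINAL = re.compile(r"a*(?:;a)*")   # assign* (';' assign)*
SUGGESTED = re.compile(r"a?(?:;a)*")  # assign? (';' assign)*


def accepts(rule, text):
    """Return True if the whole input matches the rule."""
    return rule.fullmatch(text) is not None


# Compare both rules on a few inputs, including ones with a missing ';'.
for sample in ["", "a", "a;a", "aa", "aa;a"]:
    print(f"{sample!r:8} original={accepts(ORIGINAL, sample)} "
          f"suggested={accepts(SUGGESTED, sample)}")
```

The inputs `aa` and `aa;a` are the interesting ones: only the original form lets them through.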
-------------------
csvq: (expr?|(expr (',' expr)*));
Pretty sure this could just be:
csvq: (expr? (',' expr)*);
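One caveat worth checking before adopting the shorter form: a quick regex analogue in Python (a sketch only; `e` stands in for one `expr`, and nothing here is ANTLR) suggests the two forms are not strictly identical. The shorter one also accepts a leading comma, which the original did not. If an omitted first item is legal in your language, that may be what you want anyway.

```python
import re

# 'e' stands in for one `expr`; these are regex analogues, not ANTLR rules.
ORIGINAL = re.compile(r"(?:e?|e(?:,e)*)")  # expr? | (expr (',' expr)*)
SHORTER = re.compile(r"e?(?:,e)*")         # expr? (',' expr)*


def accepts(rule, text):
    """Return True if the whole input matches the rule."""
    return rule.fullmatch(text) is not None


# Both forms agree on ordinary lists and differ on a leading comma.
for sample in ["", "e", "e,e", ",e", ",e,e"]:
    print(f"{sample!r:8} original={accepts(ORIGINAL, sample)} "
          f"shorter={accepts(SHORTER, sample)}")
```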
-------------------
MONOP:('til'|'enlist'|'first'|'last'|'distinct'|'count'|'key'|'where'|'reverse'|'null'|'not'
  |'raze'|'rotate'|'show'|'system'
  |'desc'|'asc'|'++'|'--'|'til'|'get'|'abs'|'all'|'any'|'avg'|'avgs'|'exp'|'floor'|'ceiling'
  |'cos'|'sin'|'tan'|'acos'|'asin'|'atan'|'exp'|'log'|'fills'|'flip'
  |'mcount'|'type'|'attr'|'reciprocal'|'sqrt'
  |'svar'|'sdev'|'var'|'dev'|'differ'|'getenv'|'group'|'iasc'
  |'idesc'|'::'
  |'max'|'maxs'|'min'|'mins'|'med'|'mmu'|'read0'|'read1'|'prd'|'prds'|'exit'|'neg'|'inv'
  |'rand'|'ratios'|'ratios'|'signum'|'value'
  |'trim'|'rtrim'|'ltrim'|'upper'|'lower'|'string'
  |'hcount'|'hdel'|'hsym'|'hopen'|'hclose'
  |'gtime'|'ltime'|'parse'|'views'|'tables'); // exotic
I suspect you’ll find it easier going later on if this is a parser rule rather than a lexer rule, and each of these keywords is defined as a separate token:
monop: TIL
| ENLIST
| FIRST
…
;
TIL: 'til';
ENLIST: 'enlist';
FIRST: 'first';
-------------------
Overall, I see quite a few places where I’d suggest that you’re not using parser rules, lexer rules, and fragments as designed. (For example, I’m even a bit (just a bit) surprised that you’re able to define lexer rules that include other lexer rules; generally that’s the point of fragments.)
You might be well served to make sure you have a solid grasp of the ANTLR pipeline: tokenizing a stream of characters, and then applying parser rules to the resulting stream of tokens. (Or, to word it differently: if you’re not solid on that, you definitely want to understand how that process works.)
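For what it’s worth, here is a toy illustration of that two-stage idea in plain Python. It is not ANTLR and not your grammar: the token names, the patterns, and the tiny `assign: ID ':' TIL? NUM` rule are all invented for this sketch. The point is only that stage 1 sees characters, and stage 2 sees nothing but token types.

```python
import re

# Stage 1: a toy lexer. Token names and patterns are invented for this sketch.
TOKEN_SPEC = [
    ("TIL", r"til\b"),   # keyword listed before ID so it wins on 'til'
    ("ID", r"[a-z]+"),
    ("NUM", r"[0-9]+"),
    ("COLON", r":"),
    ("SEMI", r";"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Turn characters into (type, text) tokens, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# Stage 2: a toy parser that decides everything from token types alone.
def parse_multi_assign(tokens):
    """Parse  assign? (';' assign)*  where  assign: ID ':' TIL? NUM."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    def assign():
        name = expect("ID")
        expect("COLON")
        monop = expect("TIL") if peek() == "TIL" else None
        return (name, monop, int(expect("NUM")))

    assigns = []
    if peek() == "ID":
        assigns.append(assign())
    while peek() == "SEMI":
        expect("SEMI")
        assigns.append(assign())
    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()} at token {pos}")
    return assigns


print(parse_multi_assign(tokenize("a:til 5;b:2")))
```

Note how the keyword decision is made once, in the lexer, and the parser never looks at the characters again.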
Re:
TIMELIST:TIM (' ' TIM)+ ('n'|'u'|'v'|'t')?;
TIME:TIM ('n'|'u'|'v'|'t')?;
TIM:Specials | DD | DD? TI | (DD? TI ':' [0-9] [0-9]) | (DD? TI ':' [0-9] [0-9] '.' Digits?);
DD:([-])* Digits 'D';
TI:([-])* [0-9] [0-9] ':' [0-9] [0-9];
This also hints at a poor choice that many people make early in grammar development: attempting to encode as much validation as possible into the grammar. I’ve found that it works much better to think of the grammar as a way of saying “this is the only way to interpret this input”, and then do your own validation of whether something is a valid time, etc. Generally, you’ll wind up with much better error messages. The idea is to have enough information in the grammar to distinguish between different inputs, and not much more. ANTLR gives pretty good error messages, but they’ll necessarily be a bit more generic and technical than what you’d be able to produce in validation.
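As a sketch of that split, in Python: the loose pattern plays the role of a permissive lexer rule, and a separate pass checks the ranges and reports something specific. Everything here is an assumption for illustration: the function name and messages are made up, and it only handles a plain `HH:MM` or `HH:MM:SS` under ordinary 24-hour clock rules, which may well not match the actual semantics of your time literals.

```python
import re

# Loose shape, roughly what a permissive lexer rule would accept.
TIME_SHAPE = re.compile(r"([0-9]{2}):([0-9]{2})(?::([0-9]{2}))?")


def validate_time(text):
    """Return a list of human-readable problems; an empty list means valid."""
    match = TIME_SHAPE.fullmatch(text)
    if match is None:
        return [f"{text!r} does not look like HH:MM or HH:MM:SS"]
    hours, minutes = int(match.group(1)), int(match.group(2))
    seconds = int(match.group(3)) if match.group(3) is not None else 0
    problems = []
    if hours > 23:
        problems.append(f"hour {hours} is out of range 0-23")
    if minutes > 59:
        problems.append(f"minute {minutes} is out of range 0-59")
    if seconds > 59:
        problems.append(f"second {seconds} is out of range 0-59")
    return problems


print(validate_time("12:34:56"))
print(validate_time("25:61"))
```

A message like “hour 25 is out of range” is far more useful to the user than a generic token-recognition error from the lexer.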
=================
Re:
OK, clearly not the “jq” that I’m familiar with :)
I’m a bit confused about what “output” you expect to match.
I just retired, and may play around a bit with this just for kicks (briefly), so I’m curious what I’d do to validate changes.