grammar Day9;

expr
    : '{}'             # EmptyGroup
    | '<>'             # EmptyGarbage
    | '<' garb '>'     # Garbage
    | '{' expr '}'     # Group
    | expr ',' expr    # Multi
    ;

garb
    : '!!'         # Ignore
    | '!' CHAR     # CanceledGarbage
    | CHAR         # CharGarbage
    | garb garb    # MultiGarbage
    ;

CHAR : [0-9a-zA-Z] | ',' | '_' | '<' | '>' | '{' | '}' | '"' | '\'' ;
{<!>>}
Here the curly braces should be parsed as a Group, and then the first and last pointy braces should be Garbage, and inside the garb rule it should become CanceledGarbage. I mean, by hand I can parse the string this way...
But ANTLR does it differently:
line 1:3 missing CHAR at '>'
line 1:4 extraneous input '>' expecting '}'
So for some reason the parser says it is missing a CHAR at the token '>'. But that doesn't make any sense to me: CHAR can be '>', it's right there in the token rule! Indeed, removing the '<' from CHAR doesn't change the output in the slightest. I can also put all of the characters inside the [] set, and that doesn't change a thing either.
And okay, this could be because for some reason, ANTLR is treating the first '>' as the closing pointy brace of the Garbage rule. But then it doesn't make sense that it afterwards complains about an extra '>' before the closing curly brace '}'. ANTLR is supposed to match as long as it can, isn't it?
Can anyone tell me what I am doing wrong and how I can fix my grammar so that '>' is indeed part of CHAR and parsed that way? There is really only one unambiguous way to parse the example string correctly, but ANTLR apparently isn't finding it and I don't understand why.
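For anyone following along, here is a rough Python sketch of how a context-free, longest-match lexer would split the example string. It is only a simplified model and not ANTLR's actual implementation. It assumes that the literals written inside the parser rules ('{', '<', '>', '!!' and so on) behave as tokens of their own and win ties against CHAR; the names LITERALS, CHAR_SET and tokenize are made up for the illustration.

```python
# Simplified model of a longest-match lexer; NOT ANTLR's real implementation.
# Assumption: literals used in parser rules act as tokens that beat CHAR on ties.
LITERALS = ["{}", "<>", "!!", "<", ">", "{", "}", ",", "!"]
CHAR_SET = set(
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ",_<>{}\"'"
)


def tokenize(text):
    """Split text into (token_type, lexeme) pairs without any parser context."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Pick the longest literal that matches at the current position.
        best = max(
            (lit for lit in LITERALS if text.startswith(lit, pos)),
            key=len,
            default=None,
        )
        if best is not None:
            tokens.append((f"'{best}'", best))
            pos += len(best)
        elif text[pos] in CHAR_SET:
            # CHAR is only reached when no literal matches here.
            tokens.append(("CHAR", text[pos]))
            pos += 1
        else:
            raise ValueError(f"no token matches at position {pos}")
    return tokens


for token in tokenize("{<!>>}"):
    print(token)
```

Under these assumptions neither '>' ever comes out as CHAR, because the decision is made before the parser gets to say which token it would like to see. That would be consistent with "missing CHAR at '>'" followed by an extraneous '>'.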
The grammar is supposed to parse everything between < and > as "garbage" which can later be discarded. Additionally, every character after an exclamation mark should be ignored, meaning that even a closing > should be ignored if an exclamation mark is right in front of it. As a tiny example, consider the string:
{<!>>}
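To make the intended rules concrete, here is a minimal hand-written scanner in Python. It is only a sketch of the behaviour described above, not of what the grammar does. The function name strip_garbage is made up, and it assumes that '!' has no special meaning outside of garbage.

```python
def strip_garbage(text):
    """Return text with every <...> garbage span removed.

    Rules taken from the description above: inside garbage, '!' cancels the
    character that follows it, and garbage ends at the first '>' that is not
    canceled. Assumption: outside garbage, '!' is an ordinary character.
    """
    kept = []
    in_garbage = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_garbage:
            if ch == "!":
                i += 2  # skip '!' and the character it cancels
                continue
            if ch == ">":
                in_garbage = False  # first uncanceled '>' closes the garbage
        elif ch == "<":
            in_garbage = True
        else:
            kept.append(ch)
        i += 1
    return "".join(kept)


print(strip_garbage("{<!>>}"))  # prints {}
```

The first '>' is skipped because the '!' cancels it, so only the second '>' closes the garbage, and the surrounding group is all that remains.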
Thank you, your explanation about explicitly defining tokens does make sense, I was not aware of the implicit definitions causing problems. However, defining them explicitly and making sure each token is unambiguous/non-overlapping doesn't seem to work either. It outputs the (very peculiar) error:
line 1:0 mismatched input '{' expecting {'{', '<'}
Why can't it match the leading '{' (which should now be the same token everywhere, right?) when it expects '{' or '<'? That seems self-contradictory, I don't understand what's wrong.
Here is the modified grammar:
grammar Garbage;

expr
    : CURLY_OPEN CURLY_CLOSE           # EmptyGroup
    | LESS_THAN GREATER_THAN           # EmptyGarbage
    | LESS_THAN garb GREATER_THAN      # Garbage
    | CURLY_OPEN expr CURLY_CLOSE      # Group
    | expr COMMA expr                  # Multi
    ;

garb
    : EXCLAMATION EXCLAMATION    # Ignore
    | EXCLAMATION char           # CanceledGarbage
    | char                       # CharGarbage
    | garb garb                  # MultiGarbage
    ;

char : CHAR | COMMA | LESS_THAN | GREATER_THAN | CURLY_OPEN | CURLY_CLOSE ;

CHAR : [0-9a-zA-Z_"\'];
CURLY_OPEN: '{';
CURLY_CLOSE: '}';
LESS_THAN: '<';
GREATER_THAN: '>';
EXCLAMATION: '!';
COMMA: ',';
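As an aside on why such a message can look self-contradictory: a parser compares token types, while the error message prints the token's text. The Python sketch below is a purely hypothetical illustration of that mismatch and makes no claim about what actually went wrong here. The type numbers, the Token class and the describe_mismatch function are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: int   # numeric token type assigned by the lexer
    text: str   # the characters the token was built from


# Hypothetical display names for the token types the parser would accept.
EXPECTED_NAMES = {1: "'{'", 2: "'<'"}


def describe_mismatch(token, expected_types):
    """Compare by type, but report the offending token by its text."""
    if token.type in expected_types:
        return "ok"
    names = ", ".join(EXPECTED_NAMES[t] for t in sorted(expected_types))
    return f"mismatched input '{token.text}' expecting {{{names}}}"


# A token whose text is '{' but whose type (7) is not the one the parser
# associates with '{' (1). The number 7 is an arbitrary made-up value.
odd_token = Token(type=7, text="{")
print(describe_mismatch(odd_token, {1, 2}))
```

In this toy setup the message reads "mismatched input '{' expecting {'{', '<'}" even though nothing is contradictory internally: the text matches, the type does not.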
On 10.12.2017 at 18:05, Natanji <nat...@gmail.com> wrote:
line 1:0 mismatched input '{' expecting {'{', '<'}
But look at the modified grammar I provided: the only token in which '{' appears is CURLY_OPEN. I can literally Ctrl+F search the grammar file for all occurrences of '{' and there is only a single one. There is also no implicit token defined anywhere in the grammar, from what I can tell. So I don't understand your explanation.
-Natanji
Okay, you're right - I figured it out now. Thank you for your time. And phew, this turned out much more complicated than I would have thought. Writing correct grammars is hard!
-Natanji