I'm using ANTLR4 to generate a parser. I am new to parser grammars. I've read the very helpful ANTLR Mega Tutorial but I am still stuck on how to properly order (and/or write) my lexer and parser rules.
I want the parser to be able to handle something like this:
Hello <<name>>, how are you?
At runtime, after parsing, I will replace "<<name>>" with the user's name, for instance, so that it should come out as "Hello Mark, how are you?"
So mostly I am parsing text words (and punctuation, symbols, etc), except with the occasional "<<something>>" tag, which I am calling a "func" in my lexer rules.
Here is my grammar:
doc: item* EOF ;
item: (func | WORD) PUNCT? ;
func: '<<' ID '>>' ;
WS : [ \t\n\r] -> skip ;
fragment LETTER : [a-zA-Z] ;
fragment DIGIT : [0-9] ;
fragment CHAR : (LETTER | DIGIT | SYMB ) ;
WORD : CHAR+ ;
ID: LETTER ( LETTER | DIGIT)* ;
PUNCT : [.,?!] ;
fragment SYMB : ~[a-zA-Z0-9.,?! |{}<>] ;
Side note: I added "PUNCT?" at the end of the "item" rule because it is possible, such as in the example sentence I gave above, to have a comma appear right after a "func". But since you can also have a comma after a "WORD" then I decided to put the punctuation in "item" instead of in both of "func" and "WORD".
If I run this parser on the above sentence, I get a parse tree that looks like this:
So it is not recognizing the "ID" inside the double angle brackets as an "ID". Presumably this is because "WORD" comes first in my list of lexer rules. However, I have no rule that uses double angle brackets around a word (so no "<< WORD >>") and only a rule that says "<< ID >>", so I'm not clear on why that is happening.
If I swap the order of "ID" and "WORD" in my grammar (so now ID comes before WORD) and run the parser, I get a parse tree like this:
So now the "func" and "ID" rules are being handled appropriately, but none of the "WORD"s are being recognized.
So if WORD comes first, then ID's are not handled how I want, and if ID comes first, then WORD's aren't handled how I want. How do I get past this conundrum?
I suppose one option might be to change the "func" rule to "<< WORD >>" and just treat everything as words, doing away with "ID". But I wanted to differentiate a text word from a variable identifier (for instance, no special characters are allowed in a variable identifier).
Thanks for any help!
Mark
Hi Norm, thanks for your reply.
--
You received this message because you are subscribed to a topic in the Google Groups "antlr-discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/antlr-discussion/9Snht-8KCAg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to antlr-discussion+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.