Hi,
Context: I'm trying to create a parser that can parse user agent strings (and I'm learning ANTLR4 while doing this).
I ran into a scenario where I want to recognize 'key=value' patterns,
things like foo=bar and foo=123.
I also want to match the case where the value is empty ( foo= ).
Unfortunately I ran into a situation where I simply do not understand why the parser selects the second alternative in my rule instead of the first one.
The input I'm testing with is:
One Key=Value(foo) Two Key=Value2 (foo) Three Key=(foo)
The problem I have is that the second pair (Key=Value2) gets matched incorrectly (see screenshot).
What seems to happen is that 'Key=' is matched by the second alternative of 'keyValue' (the one without a value) and 'Value2' is then matched as a separate 'version'.
I understood that in ambiguous situations the precedence is determined by the order of the alternatives in the grammar,
so I expected the parser to pick the first alternative of 'keyValue' (a key with a value).
I've rechecked the ANTLR4 book on this matter, yet I still do not understand why it matches the input this way.
Can someone please explain (or point me to documentation) so I can understand why it does this?
Thanks.
Niels Basjes
grammar Test;
//=============================================
// One Key=Value (foo) Two Key=Value2 (foo) Three Key= (foo)
//=============================================
SPACE : (' '|'\t'|'+')+;
BRACEOPEN : '(' ;
BRACECLOSE : ')' ;
COLON : ':' ;
SEMICOLON : ';' ;
SLASH : '/' ;
EQUALS : '=' ;
MINUS : '-' ;
fragment NORMAL_LETTERS_NO_DIGITS : ~[\+\;\{\}\(\)\/\ \t\:\=\[\]\"0-9] ;
fragment NORMAL_LETTERS_AND_DIGITS: ~[\+\;\{\}\(\)\/\ \t\:\=\[\]\"] ;
// A WORD with at least 1 number in it (and that can contain a '-').
VERSION : (NORMAL_LETTERS_NO_DIGITS)*[0-9]+(NORMAL_LETTERS_AND_DIGITS)*;
// A WORD without any numbers
WORD : NORMAL_LETTERS_NO_DIGITS+ ;
//=============================================
input
    : product (SPACE* product)*
    ;

product
    : productName
      ( SPACE* (keyValue|version) )*
      ( SPACE* SLASH? SPACE* (keyValue|version) )
      ( SPACE* SLASH SPACE* (keyValue|version) )*
      SPACE* comment
    ;

productName
    : WORD (SPACE WORD)*
    ;

productVersion
    : keyValue
    | VERSION
    ;

comment
    : BRACEOPEN SPACE* SEMICOLON* words (SPACE* SEMICOLON+ SPACE* words)* SPACE* SEMICOLON* SPACE* BRACECLOSE
    ;

keyValue
    : key=keyName ((COLON|EQUALS)+ ( words | version ) )+
    | key=keyName (COLON|EQUALS)+
    ;

keyName
    : WORD
    | VERSION
    ;

version
    : VERSION
    ;

words
    : WORD (SPACE WORD)*
    ;
//=============================================
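
To double-check how the failing part of the input is tokenized, here is a rough stand-alone sketch in Python. It is not ANTLR itself: the `tokenize` helper and the regular expressions are my own approximation of the lexer rules above, assuming the usual "longest match wins, earlier rule wins on a tie" behaviour. It suggests that 'Value2' comes out as a single VERSION token (because it contains a digit), while 'Value' in the first pair is a WORD, which is presumably why a stand-alone 'version' is able to match it.

```python
import re

# Character classes copied from the two fragment rules in the grammar.
NO_DIGITS = r'[^+;{}()/ \t:=\[\]"0-9]'
AND_DIGITS = r'[^+;{}()/ \t:=\[\]"]'

# Same order as in the grammar; on a tie in length the earlier rule wins.
TOKEN_RULES = [
    ("SPACE", r"[ \t+]+"),
    ("BRACEOPEN", r"\("),
    ("BRACECLOSE", r"\)"),
    ("COLON", r":"),
    ("SEMICOLON", r";"),
    ("SLASH", r"/"),
    ("EQUALS", r"="),
    ("MINUS", r"-"),
    ("VERSION", NO_DIGITS + r"*[0-9]+" + AND_DIGITS + r"*"),
    ("WORD", NO_DIGITS + r"+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_RULES]


def tokenize(text):
    """Return (token_type, text) pairs using longest-match, first-rule-wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Strictly longer only, so an earlier rule keeps the win on a tie.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Print the token stream for the product that is parsed unexpectedly.
for token in tokenize("Two Key=Value2 (foo)"):
    print(token)
```

The real token stream can be checked with the ANTLR tooling; this sketch is only meant to make the WORD versus VERSION distinction visible without generating a parser.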