Unexpected parsing rule precedence.

Niels Basjes

Jan 14, 2016, 11:02:11 AM
to antlr-discussion
Hi,

Context: I'm trying to create a parser for user agent strings (and I'm learning ANTLR4 while doing this).
I ran into a scenario where I want to recognize 'key value' patterns, such as foo=bar and foo=123.
I also want to match the case where the value is empty ( foo= ).

Unfortunately, I do not understand why the parser selects the second alternative of my 'keyValue' rule instead of the first one.

The input I'm testing with is:

One Key=Value(foo) Two Key=Value2 (foo) Three Key=(foo)

The problem is that the second pair (Key=Value2) gets matched incorrectly (see the attached screenshot).

It seems to be matched as the second alternative of 'keyValue' (the key without a value), followed by a separate 'version'.
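As a sanity check on the lexer side, here is a rough Python approximation of how the lexer rules from the grammar in this post should tokenize the test input. It is only a sketch and not ANTLR itself: it assumes the usual lexer behaviour (the longest match wins, and on a tie the rule defined first wins), and the names `RULES` and `tokenize` are made up for this illustration.

```python
import re

# Character classes copied from the two fragment rules in the grammar.
NO_DIGITS = r'[^+;{}()/ \t:=\[\]"0-9]'
WITH_DIGITS = r'[^+;{}()/ \t:=\[\]"]'

# Lexer rules in the same order as they appear in the grammar.
RULES = [
    ("SPACE", r"[ \t+]+"),
    ("BRACEOPEN", r"\("),
    ("BRACECLOSE", r"\)"),
    ("COLON", r":"),
    ("SEMICOLON", r";"),
    ("SLASH", r"/"),
    ("EQUALS", r"="),
    ("MINUS", r"-"),
    ("VERSION", NO_DIGITS + r"*[0-9]+" + WITH_DIGITS + r"*"),
    ("WORD", NO_DIGITS + r"+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Approximate the lexer: longest match wins, ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on equal length the earlier rule is kept.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


tokens = tokenize("One Key=Value(foo) Two Key=Value2 (foo) Three Key=(foo)")
for name, value in tokens:
    print(f"{name:10} {value!r}")
```

If this approximation is right, Value comes out as a WORD token while Value2 comes out as a VERSION token (it contains a digit), so the text after the '=' in the second pair could also be read as a standalone 'version'. That may be relevant to why only the second pair is affected.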

My understanding is that in ambiguous situations the precedence is determined by the order of the alternatives in the grammar.
So I expected the parser to pick the first alternative of 'keyValue' ("key with a value").

I've rechecked the ANTLR4 book on this matter, yet I still do not understand why the input is matched this way.
Can someone please explain (or point me to documentation) so I can understand why it does this?

Thanks.

Niels Basjes



grammar Test;

//=============================================
// One Key=Value (foo) Two Key=Value2 (foo) Three Key= (foo)

//=============================================
SPACE : (' '|'\t'|'+')+;
BRACEOPEN : '(' ;
BRACECLOSE : ')' ;
COLON : ':' ;
SEMICOLON : ';' ;
SLASH : '/' ;
EQUALS : '=' ;
MINUS : '-' ;

fragment NORMAL_LETTERS_NO_DIGITS : ~[\+\;\{\}\(\)\/\ \t\:\=\[\]\"0-9] ;
fragment NORMAL_LETTERS_AND_DIGITS: ~[\+\;\{\}\(\)\/\ \t\:\=\[\]\"] ;

// A WORD with at least 1 number in it (and that can contain a '-').
VERSION : (NORMAL_LETTERS_NO_DIGITS)*[0-9]+(NORMAL_LETTERS_AND_DIGITS)*;

// A WORD without any numbers
WORD : NORMAL_LETTERS_NO_DIGITS+ ;

//=============================================

input
    : product (SPACE* product)*
    ;

product
    : productName
      ( SPACE* (keyValue|version) )*
      ( SPACE* SLASH? SPACE* (keyValue|version) )
      ( SPACE* SLASH SPACE* (keyValue|version) )*
      SPACE* comment
    ;

productName
    : WORD (SPACE WORD)*
    ;

productVersion
    : keyValue
    | VERSION
    ;

comment
    : BRACEOPEN SPACE* SEMICOLON* words (SPACE* SEMICOLON+ SPACE* words)* SPACE* SEMICOLON* SPACE* BRACECLOSE
    ;

keyValue
    : key=keyName ((COLON|EQUALS)+ ( words | version ) )+
    | key=keyName (COLON|EQUALS)+
    ;

keyName
    : WORD
    | VERSION
    ;

version
    : VERSION
    ;

words
    : WORD (SPACE WORD)*
    ;

//=============================================
Parsing-problem.png