Unexpected parsing rule precedence.

Niels Basjes

Jan 14, 2016, 11:02:11 AM

to antlr-discussion

Hi,

Context: I'm trying to create a parser for user agent strings (and I'm learning ANTLR4 while doing this).

I want to recognize 'key=value' patterns, such as foo=bar and foo=123.

I also want to match the case where the value is empty (foo=).

Unfortunately, I do not understand why the parser selects the second alternative of my 'keyValue' rule instead of the first one.

The input I'm testing with is:

One Key=Value(foo) Two Key=Value2 (foo) Three Key=(foo)

The problem is that the second pair (Key=Value2) is not matched the way I expect (see screenshot).

What seems to happen is that 'Key=' is matched by the second alternative of 'keyValue' (a key without a value), and 'Value2' is then matched as a separate 'version'.

My understanding is that in ambiguous situations, precedence is determined by the order of the alternatives within a rule.

So I expected the parser to pick the first alternative, "key with a value".

I've rechecked the ANTLR4 book on this matter, yet I still do not understand why the input is matched this way.

Can someone please explain (or point me to documentation) so I can understand why it does this?

Thanks.

Niels Basjes

grammar Test;

//=============================================
// One Key=Value (foo) Two Key=Value2 (foo) Three Key= (foo)

//=============================================
SPACE       :        (' '|'\t'|'+')+;
BRACEOPEN   :        '('                 ;
BRACECLOSE  :        ')'                 ;
COLON       :        ':'                 ;
SEMICOLON   :        ';'                 ;
SLASH       :        '/'                 ;
EQUALS      :        '='                 ;
MINUS       :        '-'                 ;

fragment NORMAL_LETTERS_NO_DIGITS : ~[\+\;\{\}\(\)\/\ \t\:\=\[\]\"0-9] ;
fragment NORMAL_LETTERS_AND_DIGITS: ~[\+\;\{\}\(\)\/\ \t\:\=\[\]\"] ;

// A WORD with at least 1 number in it (and that can contain a '-').
VERSION     : (NORMAL_LETTERS_NO_DIGITS)*[0-9]+(NORMAL_LETTERS_AND_DIGITS)*;

// A WORD without any numbers
WORD        : NORMAL_LETTERS_NO_DIGITS+  ;

//=============================================

input   : product (SPACE* product)*;

product     : productName
                ( SPACE*               (keyValue|version) )*
                ( SPACE* SLASH? SPACE* (keyValue|version) )
                ( SPACE* SLASH  SPACE* (keyValue|version) )*
                  SPACE* comment
            ;

productName :
            WORD (SPACE WORD)*;

productVersion:
            keyValue|VERSION;

comment     : BRACEOPEN SPACE* SEMICOLON* words (SPACE* SEMICOLON+ SPACE* words)* SPACE* SEMICOLON* SPACE* BRACECLOSE
            ;

keyValue    : key=keyName ((COLON|EQUALS)+ ( words | version ) )+
            | key=keyName  (COLON|EQUALS)+
            ;

keyName     :
            WORD|VERSION;

version:
            VERSION;

words:
            WORD (SPACE WORD)*;

//=============================================
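
For reference, here is a rough sketch of how the lexer rules above split the test input into tokens. It is plain Python that emulates the lexer, not ANTLR itself: the character classes are transcribed by hand from the two fragment rules, and the `tokenize` helper is a hypothetical name. It assumes ANTLR's usual lexer behaviour of taking the longest match and, on a tie, the rule listed first. It says nothing about which parser alternative is chosen; it only shows which token types the parser gets to see.

```python
import re

# Character classes transcribed by hand from the two fragment rules.
NO_DIGITS = r'[^+;{}()/ \t:=\[\]"0-9]'
AND_DIGITS = r'[^+;{}()/ \t:=\[\]"]'

# Lexer rules in the same order as in the grammar.
RULES = [
    ("SPACE", r"[ \t+]+"),
    ("BRACEOPEN", r"\("),
    ("BRACECLOSE", r"\)"),
    ("COLON", r":"),
    ("SEMICOLON", r";"),
    ("SLASH", r"/"),
    ("EQUALS", r"="),
    ("MINUS", r"-"),
    ("VERSION", NO_DIGITS + "*[0-9]+" + AND_DIGITS + "*"),
    ("WORD", NO_DIGITS + "+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Split text into (token_type, token_text) pairs.

    At each position the longest match wins; on equal length the rule
    that appears first in RULES wins (strict '>' keeps the earlier one).
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            if match and match.end() > pos:
                if best is None or match.end() > best[1]:
                    best = (name, match.end())
        if best is None:
            raise ValueError("no lexer rule matches at position %d" % pos)
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


if __name__ == "__main__":
    line = "One Key=Value(foo) Two Key=Value2 (foo) Three Key=(foo)"
    for token_type, token_text in tokenize(line):
        if token_type != "SPACE":
            print(token_type, repr(token_text))
```

In this emulation 'Value' comes out as a WORD token while 'Value2' comes out as a single VERSION token, because it contains a digit. So in the second pair the text after '=' can only be consumed through a 'version', either inside the first 'keyValue' alternative or as a separate 'version' element of 'product'.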