Peculiar ANTLR bug? (serious blocker for me)


edgar hoover

Oct 7, 2020, 3:58:20 PM
to antlr-discussion
Hi all,
I had some very strange behaviour where ANTLR seemed unable to recognise an identifier properly.
The following replicates this, sort of. It's certainly peculiar to me, but maybe it's my mistake somehow.

Grammar:

grammar LDB
;

import LDBGeneratedLex;

start_parse returns
[SelectListItem sli] :
        select_list_item EOF
   
;


select_list_item returns
[SelectListItem sli] :
        ASTERISK
   
|
        regular_ident
   
;

ASTERISK              
: '*' ;


//RI : [a-zA-Z]+ ;  // it's this - uncomment to break


fragment HWS
: [ \t] ; // horizontal whitespaces
fragment
ALLWSes   : [ \t\r\n]+ ;

SKIPWS
: ALLWSes -> skip ;



Lexer:

lexer grammar MSSQLLexer;

regular_ident
:
        REGULAR_IDENT
   
;

REGULAR_IDENT
:
       
'ddd'  
   
;




Input: (literally just this)

ddd


Expected output, and you actually get it (I've also asked to dump the parse tree):


parse completed.
( start_parse
  ( select_list_item
    ( regular_ident
      ( DEFAULT_TOKEN_CHANNEL i=0 txt=ddd tt=3
  ) ) )
  ( DEFAULT_TOKEN_CHANNEL i=1 txt=<EOF> tt=-1
) )



But if you go back to the grammar and uncomment the rule RI and re-run, you get this:


line 1:0 mismatched input 'ddd' expecting {'*', 'ddd'}
Parse error line/col 1/0, expecting ASTERISK, REGULAR_IDENT
error in parse.
( start_parse
  ( select_list_item
    ( DEFAULT_TOKEN_CHANNEL i=0 txt=ddd tt=2
  ) )
  ( DEFAULT_TOKEN_CHANNEL i=1 txt=<EOF> tt=-1
) )



I totally don't get this at all, and as I am pretty sure I had the grammar working when I wrote it a few years ago, I wonder if something has changed.

(I used "ddd" as an identifier instead of the usual [a-zA-Z]+ just to strip things down as far as possible).

Thoughts welcome.  As before, it may be my mistake somehow.

thanks

jan

John B Brodie

Oct 7, 2020, 4:06:40 PM
to antlr-di...@googlegroups.com

I don't think parser rules contained within a Lexer grammar are included when you import the Lexer grammar.

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antlr-discussion/91fa6196-7ecb-4c78-8fb3-8b48817dca8eo%40googlegroups.com.

edgar hoover

Oct 7, 2020, 5:05:29 PM
to antlr-discussion
I can't argue with the good sense of that, I should separate them properly and I will first thing tomorrow, but that doesn't seem to explain what's happening here.

thanks

jan

John B Brodie

Oct 7, 2020, 5:40:00 PM
to antlr-di...@googlegroups.com

Greetings!

Sorry for not fully reading your earlier question and just jumping to a conclusion...

I have very little experience with imported Lexers, so I am probably off-base here, anyway...

when you un-comment RI, you have an ambiguous Lexer.

The lexer always prefers the longest match, but whenever 2 lexer rules match EXACTLY the same input sequence, ANTLR resolves the ambiguity by using the lexer rule that appears first in the grammar definition. RI and REGULAR_IDENT both match "ddd" perfectly and (apparently) RI wins, so the parser receives an RI token where it expects REGULAR_IDENT. Generally it is best to put lexer rules with repetition last.
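
To make the tie-break concrete, here is a minimal Python sketch that imitates (not reproduces) the lexer's choice: longest match first, then earliest rule on a tie. The function name `next_token` and the two rule orderings are my own illustration, and the regexes are only stand-ins for the grammar's rules.

```python
import re

def next_token(rules, text, pos=0):
    """Pick the token at pos from an ordered list of (name, regex) rules.

    Imitates the usual lexer policy: longest match wins, and on a tie
    the rule listed first is kept. Assumes simple patterns where the
    regex match is also the longest possible one.
    """
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        if m is None or m.end() == pos:
            continue  # rule does not match here (or matches nothing)
        # Only a strictly longer match replaces the current best,
        # so an earlier rule survives an equal-length tie.
        if best is None or len(m.group()) > len(best[1]):
            best = (name, m.group())
    return best

# Hypothetical orderings: "broken" puts RI ahead of REGULAR_IDENT,
# as seems to happen when RI is uncommented in the importing grammar.
broken = [("ASTERISK", r"\*"), ("RI", r"[a-zA-Z]+"), ("REGULAR_IDENT", r"ddd")]
fixed = [("ASTERISK", r"\*"), ("REGULAR_IDENT", r"ddd"), ("RI", r"[a-zA-Z]+")]

print(next_token(broken, "ddd"))  # ('RI', 'ddd')
print(next_token(fixed, "ddd"))   # ('REGULAR_IDENT', 'ddd')
print(next_token(fixed, "dddd"))  # ('RI', 'dddd')
```

With the specific rule listed first, "ddd" comes out as REGULAR_IDENT, while a longer identifier such as "dddd" still falls through to the general rule.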

Dump the type of the input tokens, not just their text, in order to check me on this. There should be a *.tokens file in the same directory as the generated lexer that gives the type numbers, and I think grun -tokens dumps the token stream with text and type, afaicr... I don't know what the tt field is in your dump, maybe Token Type? If so, the *.tokens file will help.
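
If it helps, a small Python sketch for turning a *.tokens file into a number-to-name lookup. The file is plain `NAME=number` lines (quoted literals get their own lines). The sample content below is hypothetical: it is my guess at what the numbering might look like with RI uncommented, not output from your build, and `parse_tokens_file` is just a helper name I made up.

```python
def parse_tokens_file(content):
    """Map token type number -> list of names/literals from .tokens text."""
    by_type = {}
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST '=' because a quoted literal may contain '='
        name, _, number = line.rpartition("=")
        by_type.setdefault(int(number), []).append(name)
    return by_type

# Hypothetical .tokens content (assumed numbering, for illustration only)
sample = """ASTERISK=1
RI=2
SKIPWS=3
REGULAR_IDENT=4
'*'=1
'ddd'=4
"""

types = parse_tokens_file(sample)
print(types[2])  # ['RI']
print(types[4])  # ['REGULAR_IDENT', "'ddd'"]
```

Looking up the tt value from the failing dump (2) in the real file should show which rule actually produced the 'ddd' token.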

One intuition might be that since the import statement appears first in the grammar then its rules would be "first" but evidently that is not the case...

Hope this helps

   -jbb
