Getting mismatched input errors and can't work out why

Dominic Finch

May 19, 2017, 10:47:41 AM

to antlr-di...@googlegroups.com

Hello everyone,

I'm very new to using ANTLR and I've encountered some problems which I can't seem to resolve. I was hoping that I could ask for some help from those with more knowledge and experience than myself. As far as I can tell, my grammar file is fine, but I am getting errors during execution. There were no errors or warnings when passing my grammar file into ANTLR. I agree that my grammar file is probably poorly written right now. That is fine; I'll improve how it is written once I've solved my immediate problems.

These are the errors I am getting:

line 1:0 mismatched input 'datadef' expecting DATADEF
line 1:5 token recognition error at: '%'
line 1:0 mismatched input 'type' expecting DATADEF
line 1:8 token recognition error at: '1'
line 1:0 mismatched input 'version' expecting DATADEF
line 1:11 token recognition error at: '3'
line 1:0 mismatched input 'subversion' expecting DATADEF
line 1:0 mismatched input 'enddatadef' expecting DATADEF
line 1:0 mismatched input 'finish' expecting DATADEF
line 1:0 mismatched input '$.' expecting DATADEF

This is the contents of the file I am attempting to load into my program:

datadef
type %desi
version 1
subversion 3
enddatadef
finish
$.

And this is the current contents of my grammar file:

/*
 * Parser Rules
 */


schema: defSchema;


defSchema : datadef typedef verdef? subverdef? enddatadef finishdef eofdef EOF;


datadef: DATADEF NEWLINE;
typedef: TYPE NEWLINE;
verdef: VERSION type_int NEWLINE;
subverdef: SUBVERSION type_int NEWLINE;
enddatadef: ENDDATADEF NEWLINE;
finishdef: FINISH NEWLINE;
eofdef: FILE_TERMINATOR NEWLINE;


type: INTEGER |
      TEXT |
      INTEGERARRAY;


type_int: INTEGER;


/*
 * Lexer Rules
 */


//fragment INT : '0' | [1-9] [0-9]* ;
fragment EXP : [Ee] [+\-]? INTEGER ;
fragment ALPHA : ([a-z] | [A-Z]);


fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;


fragment TRUE : ('T' | 't') ( ('R' | 'r') ( ('U' | 'u') ('E' | 'e')?)?)?;
fragment FALSE : ('F' | 'f') (('A' | 'a') (('L' | 'l') (('S' | 's') ('E' | 'e')?)?)?)?;


NUMBER : '-'? INTEGER '.' [0-9] + EXP? | '-'? INTEGER EXP | '-'? INTEGER ;
TEXT: ('T' | 't') ('E' | 'e') ('X' | 'x') ('T' | 't');
//TEXT_LITERAL : '\'' ( '\'\'' | ~[\'\r\n] )* '\'';
INTEGER : ('I' | 'i') ('N' | 'n') ('T'|'t') ( ('E' | 'e') ( ('G' | 'g') (('E' | 'e') ('R' | 'r')?)?)?)?;
REFARRAY: ('R' | 'r') ('E' | 'e') ('F'|'f') ('A'|'a') ('R'|'r') ('R'|'r') ('A'|'a') ('Y'|'y');
INTEGERARRAY : ('I' | 'i') ('N' | 'n') ('T' | 't') ('A' | 'a') ('R' | 'r') ('R' | 'r') ('A' | 'a') ('Y' | 'y');
//INTEGER_LITERAL : (('-')? [1-9] ([0-9]*)) | '0';


LOGICAL_LITERAL : TRUE | FALSE;
WORD                : ALPHA+ ;
WHITESPACE          : (' ' | '\t')+ -> skip;
NEWLINE             : ('\r'? '\n' | '\r')+ ;


MULTILINECOMMENT : (('/*' .*? '*/') | ('$(' .*? '$)')) -> skip ;


LINECOMMENT : ('$*' ~[\r\n]* | '//' ~[\r\n]*) -> skip;


CTABLE : 'ctable' ;
DATADEF : 'datadef' ;
DYNAMIC : 'dynamic' ;
ENDDATADEF : 'enddatadef' ;
FINISH : 'finish' ;
ITABLE : 'itable' ;
REAL : 'real' ;
SUBVERSION : 'subversion' ; // INT;
TYPE : 'type' '%' WORD;
VERSION : 'version' ; // INT;
//PCT_NAME : '%'WORD;


FILE_TERMINATOR : '$.' ;

Thank you very much for any help you are able to give me.

John B. Brodie

May 19, 2017, 11:16:43 AM

to antlr-di...@googlegroups.com, Dominic Finch

Greetings!

A quick scan of your lexer shows that your WORD rule should be moved to the bottom of the lexer section. ANTLR lexers match the longest sequence of characters possible, and when two rules tie by matching the same input, the rule appearing first in the grammar file wins. So WORD and DATADEF both match the first line of the input, and WORD wins.

Again, this is just from inspecting your grammar, untested.
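
The tie-break rule described above can be sketched outside ANTLR. The following is a minimal C# illustration, not ANTLR's actual implementation: it uses System.Text.RegularExpressions to mimic "longest match wins, earliest rule wins on a tie". The class name TieBreakDemo and the helper Winner are hypothetical, and the two patterns are simplified stand-ins for the WORD and DATADEF rules.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TieBreakDemo
{
    // Returns the name of the rule that wins at the start of `input`:
    // the longest match wins, and on equal length the rule listed earliest wins.
    // This imitates the lexer's rule selection; it is not ANTLR code.
    public static string Winner((string Name, string Pattern)[] rules, string input)
    {
        string best = "<no match>";
        int bestLength = 0;
        foreach (var (name, pattern) in rules)
        {
            // \G anchors the match at the start of the search (index 0 here).
            Match m = Regex.Match(input, @"\G(?:" + pattern + ")");
            // Strict '>' means a later rule with an equally long match never displaces an earlier one.
            if (m.Success && m.Length > bestLength)
            {
                best = name;
                bestLength = m.Length;
            }
        }
        return best;
    }

    public static void Main()
    {
        // Same two rules, listed in opposite orders.
        var wordFirst = new (string Name, string Pattern)[]
        {
            ("WORD", "[a-zA-Z]+"),
            ("DATADEF", "datadef"),
        };
        var wordLast = new (string Name, string Pattern)[]
        {
            ("DATADEF", "datadef"),
            ("WORD", "[a-zA-Z]+"),
        };

        Console.WriteLine(Winner(wordFirst, "datadef"));  // WORD: both match 7 characters, first rule wins
        Console.WriteLine(Winner(wordLast, "datadef"));   // DATADEF: same tie, but the keyword is listed first
        Console.WriteLine(Winner(wordLast, "datadefs"));  // WORD: 8 characters beats 7, order is irrelevant
    }
}
```

The third case shows why moving WORD to the bottom is safe: a longer identifier still lexes as WORD, because rule order only matters when the match lengths are equal.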

Also, you are doing too much in your TYPE lexer rule. I think you should have this instead:

typedef : type NEWLINE ;

type : TYPE '%' WORD;

TYPE : 'type';

again, untested.

Lastly, you should dump out the token stream produced by your lexer before passing it to the parser, so you can see that your lexer is producing the tokens you expect.

Hope this helps....

-jbb

defSchema : datadef EOF; //typedef verdef? subverdef? enddatadef finishdef eofdef EOF;

--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dominic Finch

May 22, 2017, 5:49:45 AM

to antlr-discussion

Hello John,

Thank you for your reply. It was very helpful.

I've got some follow up questions based on what you said.

1) I've tried moving some of the lexer rules around so that they are applied in the correct order, and I've tried redefining some of the parser rules (based on what you said about the definition of 'type'), but there is still a problem, although the errors are slightly different:

line 1:7 mismatched input '<EOF>' expecting NEWLINE
line 1:0 mismatched input 'type' expecting 'datadef'
line 1:10 mismatched input '<EOF>' expecting NEWLINE
line 1:8 token recognition error at: '1'
line 1:0 mismatched input 'version' expecting 'datadef'
line 1:11 token recognition error at: '3'
line 1:0 mismatched input 'subversion' expecting 'datadef'
line 1:0 mismatched input 'enddatadef' expecting 'datadef'
line 1:0 mismatched input 'finish' expecting 'datadef'
line 1:0 mismatched input '$.' expecting 'datadef'

/*
 * Parser Rules
 */


schema: defSchema;




defSchema : datadef typedef verdef? subverdef? enddatadef finishdef eofdef EOF;


datadef: DATADEF NEWLINE;
typedef: type NEWLINE;
verdef: version NEWLINE;
subverdef: subversion NEWLINE;


enddatadef: ENDDATADEF NEWLINE;
finishdef: FINISH NEWLINE;
eofdef: FILE_TERMINATOR NEWLINE;

/*
type: INTEGER |
      TEXT |
      INTEGERARRAY;
*/
type_int: INTEGER;


version : VERSION type_int;
subversion: SUBVERSION type_int;
type : TYPE '%' WORD;




/*
 * Lexer Rules
 */


//fragment INT : '0' | [1-9] [0-9]* ;
fragment EXP : [Ee] [+\-]? INTEGER ;
fragment ALPHA : ([a-z] | [A-Z]);


fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;


fragment TRUE : ('T' | 't') ( ('R' | 'r') ( ('U' | 'u') ('E' | 'e')?)?)?;
fragment FALSE : ('F' | 'f') (('A' | 'a') (('L' | 'l') (('S' | 's') ('E' | 'e')?)?)?)?;


NUMBER : '-'? INTEGER '.' [0-9] + EXP? | '-'? INTEGER EXP | '-'? INTEGER ;
TEXT: ('T' | 't') ('E' | 'e') ('X' | 'x') ('T' | 't');
//TEXT_LITERAL : '\'' ( '\'\'' | ~[\'\r\n] )* '\'';
INTEGER : ('I' | 'i') ('N' | 'n') ('T'|'t') ( ('E' | 'e') ( ('G' | 'g') (('E' | 'e') ('R' | 'r')?)?)?)?;
REFARRAY: ('R' | 'r') ('E' | 'e') ('F'|'f') ('A'|'a') ('R'|'r') ('R'|'r') ('A'|'a') ('Y'|'y');
INTEGERARRAY : ('I' | 'i') ('N' | 'n') ('T' | 't') ('A' | 'a') ('R' | 'r') ('R' | 'r') ('A' | 'a') ('Y' | 'y');
//INTEGER_LITERAL : (('-')? [1-9] ([0-9]*)) | '0';


LOGICAL_LITERAL : TRUE | FALSE;


WHITESPACE          : (' ' | '\t')+ -> skip;
NEWLINE             : ('\r'? '\n' | '\r')+ ;


MULTILINECOMMENT : (('/*' .*? '*/') | ('$(' .*? '$)')) -> skip ;


LINECOMMENT : ('$*' ~[\r\n]* | '//' ~[\r\n]*) -> skip;


CTABLE : 'ctable' ;
DATADEF : 'datadef' ;
DYNAMIC : 'dynamic' ;
ENDDATADEF : 'enddatadef' ;
FINISH : 'finish' ;
ITABLE : 'itable' ;
REAL : 'real' ;
SUBVERSION : 'subversion' ; // INT;


TYPE : 'type';
VERSION : 'version' ; // INT;
FILE_TERMINATOR : '$.' ;


WORD : ALPHA+ ;

2) When you said "you should dump the token stream produced by your lexer before passing it to the parser", I'm not very sure how to do that. I'm working in C# and I have tried calling CommonTokenStream.GetTokens(), but that returns an empty list. This is not what I would expect, and it makes me think something isn't working properly.

I have also tried moving the definitions of CTABLE etc. above the fragment definitions earlier in the lexer rules, but this made no visible difference.

Regards

YC Chan

Mar 21, 2020, 7:09:55 AM

to antlr-discussion

On Monday, May 22, 2017 at 11:49:45 UTC+2, Dominic Finch wrote:

Hello John,
...

2) When you said "you should dump the token stream produced by your lexer before passing it to the parser" - I'm not very sure how to do that.. I'm working in C# and I have tried calling CommonTokenStream.GetTokens() but that returns an empty list - this is not what I would expect and that makes me think something isn't working properly.

This is an old post (almost 3 years), but a very valid question. We C-sharpers are a forgotten lot in the ANTLR4 world.

Yes indeed: to understand this error message, you need the list of recognized tokens.

So here's my code to list the tokens (e.g. to the Console).

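// Requires: using System; using Antlr4.Runtime;
// Prints the name and text of every token currently buffered in the stream.
static void Tokens(calculatorLexer lex, CommonTokenStream cts)
{
    foreach (var token in cts.GetTokens())
    {
        var typeInt = token.Type;
        var symbolicName = lex.Vocabulary.GetSymbolicName(typeInt);
        var literalName = lex.Vocabulary.GetLiteralName(typeInt);
        // Prefer the symbolic name (e.g. NEWLINE); fall back to the literal for unnamed tokens.
        var name = string.IsNullOrEmpty(symbolicName) ? literalName : symbolicName;
        Console.WriteLine($"Token {name} --> {token.Text}");
    }
}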

It can be used as follows:

var inputStream = new AntlrInputStream(YourInputString);
var lexer = new calculatorLexer(inputStream);
var commonTokenStream = new CommonTokenStream(lexer);
var parser = new calculatorParser(commonTokenStream);
Console.WriteLine($"\r\nTest : {YourInputString}");

// Parsing pulls the tokens through the stream, so they are buffered by the time Tokens() runs.
var ectx = parser.equation();
Console.WriteLine($"Parse tree : {ectx.ToStringTree(parser)}");

Tokens(lexer, commonTokenStream);
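
One detail that may explain the empty list reported earlier in the thread: CommonTokenStream reads tokens from the lexer lazily, so GetTokens() returns only what has been buffered so far, which is nothing before the parser has run. Below is a minimal sketch of dumping the tokens without parsing first. It assumes the Antlr4.Runtime.Standard package and an ANTLR-generated lexer (any Lexer subclass, such as the calculatorLexer used above); the names TokenDump and Dump are hypothetical.

```csharp
using System;
using Antlr4.Runtime;

public static class TokenDump
{
    // Reads every token from the lexer up to EOF and prints position, name and text.
    // Intended for debugging the lexer on its own, before a parser is involved.
    public static void Dump(Lexer lexer)
    {
        var tokens = new CommonTokenStream(lexer);

        // Force the stream to pull all tokens from the lexer.
        // Without this call GetTokens() stays empty until something consumes the stream.
        tokens.Fill();

        foreach (IToken token in tokens.GetTokens())
        {
            // Same naming approach as the Tokens() helper in this thread:
            // symbolic name if the token has one, otherwise its literal.
            string symbolic = lexer.Vocabulary.GetSymbolicName(token.Type);
            string name = string.IsNullOrEmpty(symbolic)
                ? lexer.Vocabulary.GetLiteralName(token.Type)
                : symbolic;
            Console.WriteLine($"{token.Line}:{token.Column} {name} '{token.Text}'");
        }
    }
}
```

It could be called as TokenDump.Dump(new calculatorLexer(new AntlrInputStream(YourInputString))). Since the dump consumes the lexer's input, the simplest approach is to create a fresh lexer and token stream for the parser afterwards.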

Enjoy !

Eric Vergnaud

Mar 21, 2020, 8:11:54 AM

to antlr-di...@googlegroups.com

Hi,

Can you highlight how the C# error message is not as good as the Java one?

In our perception both runtimes are on par.

Eric

Sent from my iPhone

On March 21, 2020 at 19:10, YC Chan <peter....@gmail.com> wrote:


YC Chan

Mar 23, 2020, 4:09:40 AM

to antlr-di...@googlegroups.com

Hello, I am not comparing languages and their error messages.

I want to share with my fellow C sharpers some working code they can use in their environment.
