Token Recognition Error


Andi

Aug 26, 2014, 9:00:46 AM
to antlr-di...@googlegroups.com
Hello,

In a project I need to speak a proprietary protocol, and since there was no documentation for it, I started to write an EBNF for it. I then discovered that ANTLR can generate a lexer and parser from such a grammar. Awesome! Now, playing around with it (version 4.3), I got stuck on recognizing strings. Here's my grammar:

grammar V3;


message
: response | request;


request
: command '(' ( param (',' param)* ) ')';
param
: astring | number (numberwithzero*);
command
: 'command_1' | 'command_2' | 'command_3';


response
: returncode (responsedata);
returncode
: numberwithzero;
number
: ('-') '1' | '2' | '3' | '4' | '5'| '6' | '7' | '8' | '9';
numberwithzero
: number | '0';
realnumber
: ('-') numberwithzero* '.' numberwithzero*;
realnumberenotation
: realnumber (('e' | 'E') realnumber );
responsedata
: responseparam (delimiter responseparam)*;
delimiter
: (' '*)','(' '*);
responseparam
: astring | numberwithzero*;
astring
:  '"' ('""' | ~'"')* '"';

I then used a small program to test it against some strings (MyVisitor just prints out the message and its params):

String msg1 = "0, \"Hola\", \"!~^?%&ç*\"";
String msg2 = "-1, \"745\"   ,\"133\"";

ANTLRInputStream input = new ANTLRInputStream(msg1);
V3Lexer lexer = new V3Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
V3Parser parser = new V3Parser(tokens);
parser.message().accept(new MyVisitor());



String msg2 works fine, and I get the params:

Working on message: -1, "745"   ,"133"
92   [main] DEBUG client.protocol.TestApp  - Visit message: -1, "745"   ,"133"
93   [main] DEBUG client.protocol.TestApp  - Visit response param: 
93   [main] DEBUG client.protocol.TestApp  - Visit response param: "745"
93   [main] DEBUG client.protocol.TestApp  - Visit response param: "133"


Running it with msg1, I get token recognition errors:
Working on message: 0, "Hola", "!~^?%&ç*"
line 1:4 token recognition error at: 'H'
line 1:5 token recognition error at: 'o'
line 1:6 token recognition error at: 'l'
...


Does anyone have some ideas or pointers?

Regards,
Andi

John B. Brodie

Aug 26, 2014, 8:51:28 PM
to antlr-di...@googlegroups.com, andi.b...@gmail.com
Greetings!

I believe your problem is the failure to specify the role of letters in your grammar.

You have not specified that the letters (e.g. H o l a etc) are valid tokens. The letters do not appear ANYWHERE in your grammar. Thus any letter is rejected as invalid input, as you have observed.

I am also worried that your rules for astring and the various numeric forms are all Parser rules. I suggest you change them to Lexer rules (by capitalizing them - ASTRING or Astring or AString or whatever case pleases you). Looking at some of the examples from Dr. Parr's book might help here.

Hope this helps....
   -jbb
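
To illustrate the point about letters: in the grammar above, astring is a parser rule, so ~'"' there means "any token other than the quote token", not "any character", and no lexer token ever covers 'H', 'o', 'l' or 'a'. Written as a lexer rule, for example ASTRING : '"' ( '""' | ~'"' )* '"' ; the same pattern would be matched character by character and delivered as one token. The sketch below is only a plain-Java approximation of that pattern using java.util.regex; it is not ANTLR-generated code, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is not ANTLR-generated code.
final class AStringSketch {
    // Same shape as the astring rule: '"' ('""' | ~'"')* '"'
    // A doubled quote inside the string is consumed as a pair.
    private static final Pattern ASTRING =
            Pattern.compile("\"(?:\"\"|[^\"])*\"");

    // Returns every quoted string found in the input, each as one "token".
    static List<String> quotedStrings(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = ASTRING.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same text as msg1 from the question; the non-ASCII character
        // is written as a Unicode escape to avoid source-encoding issues.
        String msg1 = "0, \"Hola\", \"!~^?%&\u00e7*\"";
        for (String token : quotedStrings(msg1)) {
            System.out.println(token);
        }
    }
}
```

With the pattern applied at the character level, letters and punctuation inside the quotes no longer need tokens of their own, which is the effect the capitalized lexer rule is meant to have.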

Jim Idle

Aug 26, 2014, 10:40:27 PM
to antlr-di...@googlegroups.com, andi.b...@gmail.com
As John points out, your issue is that you need lexer tokens and not parser rules for astring. I would also advise you to use lexer tokens for keywords like 'command_1' rather than placing them in the parser. All your rules dealing with numbers should also be lexer rules, and you should not try to separate out numberwithzero; just check the text when you get a generic NUMBER token.

There are other issues too, and I suspect that unless this is the entire grammar, the thing you are trying to parse will eventually exhibit ambiguities when you want the same pattern to be two different things to the parser. In that case, remember to share the token. In some cases you cannot, though, and need lexer modes, and in some cases ANTLR isn't really the tool to use. Basically, you should buy the ANTLR 4 book, as it will save you lots of time for a relatively small outlay.

Jim
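
The advice to check the text of a generic NUMBER token, instead of encoding the zero/non-zero distinction in separate rules, might look roughly like the sketch below. It assumes a hypothetical lexer rule along the lines of NUMBER : '-'? [0-9]+ ; and that the text comes from Token.getText() or a context's getText(). The class and method names are invented for illustration, and the "no leading zero" check only mirrors the apparent intent of number (numberwithzero*) in the original grammar.

```java
// Plain-Java sketch; the helper names are hypothetical and not part of ANTLR.
final class NumberTextCheck {
    // True when the text is an optional '-' followed by one or more digits,
    // i.e. what a generic NUMBER rule such as '-'? [0-9]+ would produce.
    static boolean isInteger(String text) {
        return text.matches("-?[0-9]+");
    }

    // True when the text is an integer whose first digit is not '0',
    // which is what number (numberwithzero*) appears to be aiming for.
    static boolean hasNoLeadingZero(String text) {
        return text.matches("-?[1-9][0-9]*");
    }

    public static void main(String[] args) {
        // In a visitor or listener, each string would be the token's text.
        for (String text : new String[] {"0", "-1", "745", "007"}) {
            System.out.println(text
                    + " integer=" + isInteger(text)
                    + " noLeadingZero=" + hasNoLeadingZero(text));
        }
    }
}
```

Keeping a single NUMBER token and validating its text in the visitor keeps the lexer simple and moves the protocol-specific restriction into ordinary code, where it is easier to report a meaningful error.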


Andi

Aug 27, 2014, 2:09:46 AM
to antlr-di...@googlegroups.com, andi.b...@gmail.com
Dear John and Jim,

Thanks a lot for your detailed explanations and for responding so quickly! This helped a lot! Time to read some books...

Regards,
Andi
