Resolve ambiguity in a badly formatted file

GMG

Oct 29, 2013, 12:54:38 PM
to antlr-di...@googlegroups.com
I have a TXT file that I can't modify. Here is a brief two-line example of a conflict that I don't know how to resolve. I'm using ANTLR 4.

STRINGS ABC DEF GHI LMN OPQ
DEF A B C D

STRINGS is the name of my array of strings; its elements follow, and the array ends at \r\n.
DEF is the name of my array of def_strings; its elements follow, and the array ends at \r\n.
The first word on a line is always the array name; the remaining words are the array elements.

The following grammar seems to be correct, but with this input there is an ambiguity:

grammar xxx;

rule : (rule1 | rule2)+;

rule1 : ID1 (WS+ STRING)* CRLF;
rule2 : ID2 (WS+ STRING)* CRLF;

ID1 : 'STRINGS';
STRING : [A-Za-z0-9]+;
ID2 : 'DEF';
WS : [ ];
CRLF : '\r\n';

The output of TestRig

Arguments: [xxx, rule, -tokens, -tree, -gui, C:\Users\Gian\Desktop\sample.txt]
[@0,0:6='STRINGS',<1>,1:0]
[@1,7:7=' ',<4>,1:7]
[@2,8:10='ABC',<3>,1:8]
[@3,11:11=' ',<4>,1:11]
[@4,12:14='DEF',<2>,1:12]
[@5,15:15=' ',<4>,1:15]
[@6,16:18='GHI',<3>,1:16]
[@7,19:19=' ',<4>,1:19]
[@8,20:22='LMN',<3>,1:20]
[@9,23:23=' ',<4>,1:23]
[@10,24:26='OPQ',<3>,1:24]
[@11,27:28='\r\n',<5>,1:27]
[@12,29:31='DEF',<2>,2:0]
[@13,32:32=' ',<4>,2:3]
[@14,33:33='A',<3>,2:4]
[@15,34:34=' ',<4>,2:5]
[@16,35:35='B',<3>,2:6]
[@17,36:36=' ',<4>,2:7]
[@18,37:37='C',<3>,2:8]
[@19,38:38=' ',<4>,2:9]
[@20,39:39='D',<3>,2:10]
[@21,40:41='\r\n',<5>,2:11]
[@22,42:41='<EOF>',<-1>,3:13]
line 1:12 extraneous input 'DEF' expecting {STRING, WS}
(rule (rule1 STRINGS   ABC  ) (rule2 DEF   GHI   LMN   OPQ \r\n) (rule2 DEF   A   B   C   D \r\n))

If I swap the order of STRING and ID2, the new error is:

line 2:0 extraneous input 'DEF' expecting {<EOF>, 'STRINGS', 'DEF'}
(rule (rule1 STRINGS   ABC   DEF   GHI   LMN   OPQ \r\n) DEF   A   B   C   D \r\n)
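Both outcomes follow from how an ANTLR 4 lexer chooses between rules: the longest match wins, and when two rules match the same text, the rule listed first in the grammar wins. The Python sketch below is only a small model of that behavior, not ANTLR itself; the function `tokenize` and the two rule tables are hypothetical names made up for illustration.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins; ties go to the rule listed first."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' means a later rule never replaces an earlier one on a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Same rules, two different orders (illustrative tables, not generated by ANTLR).
KEYWORDS_FIRST = [("ID1", r"STRINGS"), ("ID2", r"DEF"),
                  ("STRING", r"[A-Za-z0-9]+"), ("WS", r" "), ("CRLF", r"\r\n")]
STRING_FIRST = [("ID1", r"STRINGS"), ("STRING", r"[A-Za-z0-9]+"),
                ("ID2", r"DEF"), ("WS", r" "), ("CRLF", r"\r\n")]

line = "STRINGS ABC DEF\r\n"
print(tokenize(line, KEYWORDS_FIRST))  # DEF is lexed as ID2, even inside a list
print(tokenize(line, STRING_FIRST))    # DEF is lexed as STRING, even at line start
```

Either way the lexer decides without knowing where the word sits on the line, so one of the two sample lines always fails with this grammar.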

The example files are attached.
Best regards
GMG
sample.txt
xxx.g4

Greg D

Oct 29, 2013, 5:28:03 PM
Your grammar xxx.g4 threw exceptions when I tried to process it with ANTLR 4.1. Are you sure you're getting the latest version?

To your question:
The book by Terence Parr, "The Definitive ANTLR 4 Reference", chapter 12.2 pp. 209-211 discusses the solution to your specific problem in a subchapter titled "Treating Keywords As Identifiers".

While the tokens do not come out the way you would wish, the changes below get you past your current problem:

grammar Xxx;

rulen
: (rule1 | rule2 | CRLF)+ EOF;

rule1
: ID1 (WS+ (STRING | ID1 | ID2 ) )* CRLF;
rule2
: ID2 (WS+ (STRING | ID1 | ID2 ) )* CRLF;

ID1
: 'STRINGS';
ID2
: 'DEF';
STRING
: [A-Za-z0-9]+;
WS
: [ ];
CRLF
: '\r'? '\n';



GMG

Oct 29, 2013, 7:40:46 PM
Thank you very much for your answer. I noticed that you changed "rule" to "rulen"; I had tested my grammar with ANTLRWorks 2, so I didn't notice that "rule" conflicts with Java. I had thought of the same solution, but I don't think it is elegant (and maybe it is not even the correct way to solve the problem). With 1000 IDs it would be crazy to write the following:

rule1 : ID1 (WS+ (STRING | ID1 | ID2 ... ID1000) )* CRLF;
...
...
...
rule1000 : ID1000 (WS+ (STRING | ID1 | ID2 ... ID1000) )* CRLF;

I appreciate your solution, but I would like to think that there is a better way to solve the problem.
The corrected files are attached.
sample.txt
xxx.g4

Greg D

Oct 29, 2013, 8:38:17 PM
to antlr-di...@googlegroups.com

 
The important part of the answer was:
The book by Terence Parr, "The Definitive ANTLR 4 Reference", chapter 12.2 pp. 209-211 discusses the solution to your specific problem in a subchapter titled "Treating Keywords As Identifiers".

Also, the latest version of ANTLRWorks 2 is using ANTLR 4.0, not ANTLR 4.1. That may account for some of the changes I needed.

Jim Idle

Oct 29, 2013, 9:41:22 PM
to antlr-di...@googlegroups.com
Well, your token definitions are ambiguous, and the parser does not tell the lexer to pick one or the other. Also, I think this format is too simple to need an ANTLR parser; you can just tokenize it directly. But if you feel you must use ANTLR, then:

leadin: NL? arrays EOF;
arrays: array* ;
array: ID ID* NL;

ID: [A-Za-z0-9]+;
WS : [ \t]+ -> skip;
CR : '\r'+ -> skip;
NL : '\n'+;

Jim
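For the "just tokenize it directly" option, a minimal sketch in Python could look like the following. It assumes only what the thread states about the format (one array per line, first word is the array name, remaining words are elements); the function name `parse_arrays` is hypothetical.

```python
def parse_arrays(text):
    """Return a list of (array_name, elements) pairs, one per non-blank line."""
    arrays = []
    for line in text.splitlines():   # splitlines() handles both \r\n and \n
        words = line.split()         # split on runs of spaces or tabs
        if not words:
            continue                 # skip blank lines
        name, *elements = words      # first word is the name, the rest are elements
        arrays.append((name, elements))
    return arrays

sample = "STRINGS ABC DEF GHI LMN OPQ\r\nDEF A B C D\r\n"
for name, elements in parse_arrays(sample):
    print(name, elements)
```

Because position alone decides what a word means, "DEF" inside the STRINGS line is an ordinary element here and no keyword conflict arises.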

Onur

Oct 30, 2013, 3:46:43 AM
to antlr-di...@googlegroups.com
grammar xxx;

rule : (rule1 | rule2)+;

rule1 : ID1 (WS+ identifier )* CRLF;
rule2 : ID2 (WS+ identifier )* CRLF;

identifier : STRING | ID1 | ID2; // so "DEF" is also matched as an element of a list

ID1 : 'STRINGS';
STRING : [A-Za-z0-9]+;
ID2 : 'DEF';
WS : [ ];
CRLF : '\r\n';

Jim Idle

Oct 30, 2013, 4:39:57 AM
to antlr-di...@googlegroups.com
Did you not receive my previous reply? You only need an ID lexer rule. 

Onur

Oct 30, 2013, 4:50:28 AM
to antlr-di...@googlegroups.com
I like my grammars to be "strongly typed", i.e. I don't have to split the "array" rule into the "STRINGS" and "DEF" cases myself.
Also, I dislike inspecting a token to see whether it has a special value. When a word may be either a reserved word or an identifier depending on its position, I would rather have the lexer emit the reserved-word token (here STRINGS or DEF) and allow identifiers to be either ID or a reserved word.
But if the number of "types" is really large (as stated in a later post), your approach is preferable.
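To make the trade-off concrete: with a single ID token, the array type has to be recovered afterwards by inspecting the first word of each line. A lookup table keeps that to one entry per array name, however many names there are. The sketch below is an assumption-laden illustration; `HANDLERS`, `handle_strings`, `handle_def` and `classify` are hypothetical names, not part of any grammar in this thread.

```python
def handle_strings(elements):
    # Hypothetical handler for the STRINGS array.
    return {"kind": "strings", "values": elements}

def handle_def(elements):
    # Hypothetical handler for the DEF array.
    return {"kind": "def_strings", "values": elements}

# One table entry per array name; adding a new name does not touch the parsing code.
HANDLERS = {"STRINGS": handle_strings, "DEF": handle_def}

def classify(line):
    """Dispatch on the first word of a line; the remaining words are plain elements."""
    words = line.split()
    if not words:
        raise ValueError("empty line")
    name, *elements = words
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"unknown array name: {name}")
    return handler(elements)

print(classify("STRINGS ABC DEF GHI LMN OPQ"))
print(classify("DEF A B C D"))
```

The cost is the one named above: the check on the first word happens in code, outside the grammar, instead of being expressed as a separate parser rule per array type.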

Terence Parr

Oct 30, 2013, 10:04:02 AM
to antlr-di...@googlegroups.com, Sam Harwell
I thought AW2 had 4.1 in it. Try downloading again.
T

Greg D

Oct 30, 2013, 10:47:27 AM
My mistake about AW2.
  • the NetBeans plugin uses 4.0
  • the stand-alone app uses 4.1
    • the one time I used the stand-alone app on CentOS, it worked for less than a day before it thrashed and crashed the JVM

As a result, I debug from the bash shell command line.
