Trying to parse text that can be quoted or not quoted.


david radley

Aug 20, 2015, 1:30:46 PM
to antlr-discussion
I have created an ANTLR 4 grammar with:

fragment SINGLE_QUOTE_ :    '\'' ;
fragment DOUBLE_QUOTE_ :    '"' ;   // single quote, double quote, single quote; I have also tried escaping the double quote

TEXT_STRING     : ('a'..'z' | 'A'..'Z' |  '0'..'9')+  ;
VAL_STRING : TEXT_STRING | SINGLE_QUOTE_ TEXT_STRING SINGLE_QUOTE_ | DOUBLE_QUOTE_ TEXT_STRING DOUBLE_QUOTE_ ;

value_name         : name=VAL_STRING ;

The ANTLR parser fails to parse the input, whether it is quoted or unquoted.

I am using v4.4 under Eclipse.

The parser generates :
{
            setState(72); ((Value_nameContext)_localctx).name = match(VAL_STRING);
            }
        }
        catch (RecognitionException re) {
            _localctx.exception = re;

and throws the invalid input exception. Any thoughts?
David
 



Jim Idle

Aug 20, 2015, 9:37:13 PM
to antlr-di...@googlegroups.com
Your VAL_STRING duplicates the TEXT_STRING token. Take it out of the alts for VAL_STRING and allow VAL_STRING and TEXT_STRING in your parser rule instead.
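
A minimal sketch of that first suggestion, reusing the rule names from the original post. The grammar name QuotedValue is made up for illustration, and this has not been tried against the rest of the poster's grammar:

```antlr
grammar QuotedValue;   // hypothetical name, not from the original post

// The lexer emits exactly one token type per match, so the parser rule
// has to list every token type it is willing to accept.
value_name : name=(VAL_STRING | TEXT_STRING) ;

// Quoted forms only. The unquoted alternative is gone, so VAL_STRING
// no longer competes with TEXT_STRING for the same input.
VAL_STRING  : SINGLE_QUOTE_ TEXT_STRING SINGLE_QUOTE_
            | DOUBLE_QUOTE_ TEXT_STRING DOUBLE_QUOTE_
            ;

TEXT_STRING : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;

fragment SINGLE_QUOTE_ : '\'' ;
fragment DOUBLE_QUOTE_ : '"' ;
```

The underlying behaviour: when two lexer rules match the same text at the same length, ANTLR picks the rule defined first. In the original grammar TEXT_STRING comes before VAL_STRING, so unquoted input such as abc is always tokenized as TEXT_STRING and a parser rule that only accepts VAL_STRING cannot match it.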

However, you are probably overcomplicating it: you don't need fragments for the quote marks. I would just do this:

DSTRING: '"' ('a'..'z' | 'A'..'Z' |  '0'..'9')+  '"' ;
SSTRING: '\'' ('a'..'z' | 'A'..'Z' |  '0'..'9')+  '\'';
TSTRING: ('a'..'z' | 'A'..'Z' |  '0'..'9')+ ;

value: name=(DSTRING | TSTRING | SSTRING) EOF ;

This assumes that your value strings can only contain that range of characters. It would be better to allow all characters and then verify the value semantically later down the line; otherwise your lexer will just complain about bad characters, and such errors are not easy for users to read.
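
As a rough sketch of the more permissive approach, the quoted rules could accept anything up to the closing quote and leave the character check to later code. The choices here (no escape sequences, no line breaks inside a string, empty strings allowed) are assumptions for illustration only:

```antlr
// Accept any character except the closing quote or a line break.
// Escape sequences are deliberately not handled in this sketch.
DSTRING : '"'  ~["\r\n]*  '"'  ;
SSTRING : '\'' ~['\r\n]* '\'' ;

// The unquoted form keeps the restricted character range.
TSTRING : [a-zA-Z0-9]+ ;
```

With rules like these, a value such as "a-b" reaches the parser as a DSTRING token, and the application can reject it with its own error message instead of the lexer reporting a token recognition error.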

Also, don't you need to ignore whitespace?
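
If whitespace should be ignored, the usual ANTLR 4 idiom is a lexer rule with the skip command. The rule name WS is only a convention:

```antlr
// Discard spaces, tabs and line breaks so they never reach the parser.
WS : [ \t\r\n]+ -> skip ;
```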

Jim


david radley

Aug 21, 2015, 6:47:11 AM
to antlr-discussion
Hi Jim,
Thanks for your quick reply :-)

I tried what you suggested, but I get "no viable alternative" for unquoted text and for either of the quoted forms I pass in.

The only thing that seems to work is a hack like this:
DOUBLE : '"' .*? '"' -> type(TEXT_STRING) ;
SINGLE : '\'' .*? '\'' -> type(TEXT_STRING) ;
VAL_STRING : TEXT_STRING | SINGLE | DOUBLE ;

varname : name=TEXT_STRING;

TEXT_STRING     : ('a'..'z' | 'A'..'Z' |  '0'..'9')+  ;

But I think all my strings now accept the quoted form, which I would need to police out in the parser code.

I suspect this is an ANTLR bug in the version I am using (4.4), as this should work as you say.

Jim Idle

Aug 21, 2015, 7:04:26 AM
to antlr-discussion
You have not typed in what I sent in my email. You just have 

name=TEXT_STRING

and none of the other alts. There is no bug - it is just that your grammar is incorrect. 

You may find it useful to buy a copy of the book and work through the examples. 


david radley

Aug 21, 2015, 9:46:07 AM
to antlr-discussion
Hi Jim,
Got it. Your way was failing for me because I already had another rule with the same body as TSTRING: ('a'..'z' | 'A'..'Z' |  '0'..'9')+ ; I deleted the duplicate and it now works based on your grammar.
Thank you very much. All the best, David.
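
For anyone hitting the same symptom, here is a sketch of the situation described above. OTHER_NAME is a hypothetical stand-in, since the name of the actual duplicate rule was not posted:

```antlr
// Two lexer rules with identical bodies. Both match abc at the same
// length, and ANTLR breaks the tie in favour of the rule defined first.
OTHER_NAME : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;   // defined first, wins every tie
TSTRING    : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;   // same body, so never emitted
```

With both rules present, unquoted input arrives at the parser as OTHER_NAME, so a parser rule that expects TSTRING cannot match it. Deleting one of the two rules, or referring to the surviving one in the parser rule, removes the conflict.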