Allow whitespace in my token and use that token in my parsing rule


shootingfrog

Feb 24, 2014, 8:45:10 AM
to antlr-di...@googlegroups.com
I have run into a serious problem, and I am fairly new to ANTLR. I have a token SECATTR and two parser rules, singlerule and expr. My final parser rule, expr, has to handle many cases, such as:

X.Y> 10
A.X>X.Y
COUNT(A.B)>10
COUNT(A.B)>  AVG(X.Y)
(x.y like foo) AND (a.b notlike bar)
((x.y like foo) AND (p.q notlike bar))corresponding
((i.j like foo) AND (k.p notlike bar)) OR (q.p like bar)
((e.f like foo) AND (t.s notlike bar)) OR (a.b like something) AND (f.g notlike foobar)

and so on. In place of a.b / x.y / p.q, which I need as the token SECATTR, the grammar should accept forms like:
a.b
a a.b c
a a_a.b b_c
aa aa_aa aa.aa aa_aa aa

I have sent whitespace to the hidden channel:
 WS       :           ('\t'|'\f'|'\n'|'\r'|' ')+{ $channel=HIDDEN; };

The problem is that the spaces I need to allow inside the SECATTR token interfere with the parser rules singlerule and expr. For example, a singlerule input like
COUNT(aa aa_aa aa.aa aa_aa aa)>10
works only if there is no space between COUNT and '('; otherwise it throws NoViableAltException:line 1:30 no viable alternative at input ')'.
As a knock-on effect, the various alternatives of expr that I listed above do not work either. Please suggest how I can allow spaces inside the SECATTR token and still have the input tokenized correctly for singlerule and expr.
Here is my grammar:

grammar Test;


options
{
  language = Java;
}




fragment DIVIDE : '/';
fragment PLUS : '+';
fragment MINUS : '-';
fragment STAR : '*';
fragment MOD : '%';
LPAREN : '(';
RPAREN : ')';
fragment COMMA : ',';
fragment COLON : ':';
fragment LANGLEBRACKET : '<';
fragment RANGLEBRACKET : '>';
fragment EQ : '=';
fragment NOT : '!';
fragment UNDERSCORE : '_';
fragment DOT : '.';
fragment GRTRTHANEQTO : RANGLEBRACKET EQ;
fragment LESSTHANEQTO : LANGLEBRACKET EQ;
fragment NOTEQ : NOT EQ;


WS : ('\t'|'\f'|'\n'|'\r'|' ')+ { $channel=HIDDEN; };


fragment A : ('a'|'A');
fragment B : ('b'|'B');
fragment C : ('c'|'C');
fragment D : ('d'|'D');
fragment E : ('e'|'E');
fragment F : ('f'|'F');
fragment G : ('g'|'G');
fragment H : ('h'|'H');
fragment I : ('i'|'I');
fragment J : ('j'|'J');
fragment K : ('k'|'K');
fragment L : ('l'|'L');
fragment M : ('m'|'M');
fragment N : ('n'|'N');
fragment O : ('o'|'O');
fragment P : ('p'|'P');
fragment Q : ('q'|'Q');
fragment R : ('r'|'R');
fragment S : ('s'|'S');
fragment T : ('t'|'T');
fragment U : ('u'|'U');
fragment V : ('v'|'V');
fragment W : ('w'|'W');
fragment X : ('x'|'X');
fragment Y : ('y'|'Y');
fragment Z : ('z'|'Z');


fragment Space : ' '+;



OP1 : (C O U N T | A V G | C O U N T D I S T I N C T | C A S T) ;

OP2 : DIVIDE|PLUS|MINUS|STAR|MOD
    | LANGLEBRACKET|RANGLEBRACKET|EQ|GRTRTHANEQTO|LESSTHANEQTO|NOTEQ
    | E Q U A L S | L I K E | N O T E Q U A L S | N O T L I K E | N O T N U L L;

OP3 : ((C O R R E S P O N D I N G | A N Y)|I);

OP4 : (A N D | O R);


DIGIT : ('0'..'9')+;
fragment Letter : ('a'..'z' | 'A'..'Z')+;


SECATTR
    : Letter DOT Letter
    | Letter Space Letter DOT Letter Space Letter
    | Letter Space Letter UNDERSCORE Letter DOT Letter Space Letter UNDERSCORE Letter
    | Letter Space Letter UNDERSCORE Letter Space Letter DOT Letter Space Letter UNDERSCORE Letter Space Letter
    ;

singlerule
    : SECATTR OP2 (DIGIT|Letter)
    | OP1 LPAREN SECATTR RPAREN OP2 (DIGIT|Letter)
    | SECATTR OP2 SECATTR
    | OP1 LPAREN SECATTR RPAREN OP2 OP1 LPAREN SECATTR RPAREN
    ;

expr
    : ((LPAREN? singlerule RPAREN?) OP4?)+
    | ((LPAREN (LPAREN singlerule RPAREN) OP4 (LPAREN singlerule RPAREN) RPAREN)+ (OP4 (LPAREN? singlerule RPAREN?))+ OP4?)+
    | (LPAREN (LPAREN singlerule RPAREN) OP4 (LPAREN singlerule RPAREN) RPAREN OP3)+
    ;


Mike (IMAP)

Feb 24, 2014, 9:29:01 AM
to ANTLR List
Have you tried moving the WS lexer rule below your SECATTR rule?  The lexer uses the first rule that it matches.



shootingfrog

Feb 24, 2014, 9:35:56 AM
to antlr-di...@googlegroups.com

I tried it. The problem still persists.

Jim Idle

Feb 24, 2014, 10:46:30 PM
to antlr-di...@googlegroups.com
I think that you are trying to do way too much in the lexer. Try moving most of those rules into the parser.

Jim
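
A minimal sketch of what moving SECATTR into the parser could look like, assuming a multi-word attribute name may be treated as a run of word tokens on each side of the dot. The names secattr, name and WORD are hypothetical and do not appear in the grammar above. Note that the sketch accepts any number of words per side, which is broader than the four SECATTR forms listed, and that it treats the underscore as part of a word:

```antlr
grammar SecAttrSketch;   // hypothetical grammar name, for illustration only

options { language = Java; }

// Parser rules. Whitespace goes to the hidden channel, so the parser
// never sees it: "aa aa_aa aa.aa aa_aa aa" arrives as
// WORD WORD WORD DOT WORD WORD WORD.
secattr : name DOT name ;
name    : WORD+ ;

// Lexer rules. DOT is a real token here (not a fragment) so that the
// parser rule above can reference it.
DOT  : '.' ;
WORD : ('a'..'z' | 'A'..'Z' | '_')+ ;
WS   : (' ' | '\t' | '\f' | '\n' | '\r')+ { $channel = HIDDEN; } ;
```

With this arrangement, no lexer rule contains a space, so COUNT (a a.b c) and COUNT(a a.b c) should produce the same visible token sequence. The keyword tokens (OP1 through OP4) would probably have to be listed before WORD so that they take precedence over it, and a name could then no longer contain one of those keywords as a separate word.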



shootingfrog

Feb 25, 2014, 12:43:42 AM
to antlr-di...@googlegroups.com
The only thing I really need tokenized is SECATTR. Apart from that, everything else is a parser rule. Is there a way to get SECATTR recognized through a parser rule instead? Please suggest.