Problem with ignoring whitespace


Anup Cowkur

Sep 30, 2015, 3:41:40 AM
to antlr-di...@googlegroups.com

I'm quite new to ANTLR. Looking at other examples, I used the following lexer rule to skip whitespace:

WHITESPACE: [ \t\r\n]-> skip;

This is my complete grammar file:

    grammar CoreQueries;

    // Parser Rules

    orExpression :  andExpression (orOp  andExpression)*;

    andExpression :  notExpression (andOp  notExpression)*;

    notExpression :  (notOp)? atom ;

    atom :  coreQuery | lp  orExpression  rp;

    coreQuery : noLabels | noDueDate | overdue | nDays | priority | all | toMe | toOthers | project | label | user ;

    orOp: '|';

    andOp: '&';

    notOp: '!';

    lp: LP;

    rp: RP;

    noLabels : NO_LABELS;

    noDueDate: NO_DUE_DATE;

    overdue : OVERDUE;

    nDays : N_DAYS;

    priority : PRIORITY;

    all : ALL;

    toMe : TO_ME;

    toOthers : TO_OTHERS;

    project : PROJECT;

    label : LABEL;

    user : USER;

    stringOrDate : STRING_OR_DATE;

    // Lexer Rules

    // Tokens

    LP : '(';

    RP : ')';

    NO_LABELS : NO LABELS;

    NO_DUE_DATE : NO   DUE  DATE;

    OVERDUE : O V E R D U E;

    N_DAYS : ANY_INTEGER_EXCEPT_ZERO  DAYS;

    PRIORITY : PRIORITY_QUERY_TITLE  ONE_TO_FOUR;

    ALL : A L L;

    TO_ME : TO  ME;

    TO_OTHERS : TO OTHERS ;

    PROJECT: PROJECT_QUERY_TITLE COLON ANY_TEXT_EXCEPT_OPERATORS_AND_PARENS | PROJECT_QUERY_TITLE COLON ANY_INTEGER | HASH ANY_TEXT_EXCEPT_OPERATORS_AND_PARENS | HASH ANY_INTEGER;

    LABEL : AT ANY_TEXT_EXCEPT_OPERATORS_AND_PARENS;

    USER : U COLON ANY_TEXT_EXCEPT_OPERATORS_AND_PARENS ;

    STRING_OR_DATE: Q COLON ANY_TEXT_EXCEPT_OPERATORS_AND_PARENS | ANY_TEXT_EXCEPT_OPERATORS_AND_PARENS;

    WHITESPACE: [ \t\r\n]-> skip;

    //Fragments

    fragment A : ('a'|'A');
    fragment B : ('b'|'B');
    fragment C : ('c'|'C');
    fragment D : ('d'|'D');
    fragment E : ('e'|'E');
    fragment F : ('f'|'F');
    fragment G : ('g'|'G');
    fragment H : ('h'|'H');
    fragment I : ('i'|'I');
    fragment J : ('j'|'J');
    fragment K : ('k'|'K');
    fragment L : ('l'|'L');
    fragment M : ('m'|'M');
    fragment N : ('n'|'N');
    fragment O : ('o'|'O');
    fragment P : ('p'|'P');
    fragment Q : ('q'|'Q');
    fragment R : ('r'|'R');
    fragment S : ('s'|'S');
    fragment T : ('t'|'T');
    fragment U : ('u'|'U');
    fragment V : ('v'|'V');
    fragment W : ('w'|'W');
    fragment X : ('x'|'X');
    fragment Y : ('y'|'Y');
    fragment Z : ('z'|'Z');

    fragment COLON : ':';

    fragment HASH : '#';

    fragment AT : '@';

    fragment PROJECT_QUERY_TITLE : P R O J E C T | P;

    fragment PRIORITY_QUERY_TITLE: P R I O R I T Y | P;

    fragment ONE_TO_FOUR : [1-4];

    fragment NO : N O;

    fragment LABELS : L A B E L S;

    fragment DUE : D U E;

    fragment DATE : D A T E;

    fragment DAYS : D A Y S;

    fragment TO : T O;

    fragment ME : M E;

    fragment OTHERS : O T H E R S;

    fragment ANY_INTEGER :('0'|'1'..'9''0'..'9'*);

    fragment ANY_INTEGER_EXCEPT_ZERO :('1'..'9''0'..'9'*);

    fragment ANY_TEXT_EXCEPT_OPERATORS_AND_PARENS: ~[<>#=&|!()]*;

I can't quite figure out why whitespace is not being ignored here.

For example, "nolabels" is recognised correctly but "no labels" is not.

What am I doing wrong?

Norman Dunbar

Sep 30, 2015, 6:49:01 AM
to antlr-di...@googlegroups.com

On 30/09/15 08:41, Anup Cowkur wrote:

<snip>

Hopefully I won't say the wrong thing here and make a complete fool of
myself - there are people on this list who know what they are talking
about ...

It is most likely the space between the fragments in your NO_LABELS and
NO_DUE_DATE lexer rules. Whitespace in the grammar only separates the
rule elements; it does not match a space in the input, and the skip rule
only applies between tokens, not inside one. It would work fine if you
changed those rules to something like this:


NO_LABELS : NO '_' LABELS;
NO_DUE_DATE : NO '_' DUE '_' DATE;

But, working on the assumption you need spaces and not underscores .....


One option is to add a SPACE fragment and change the lexer rules as follows:


NO_LABELS : NO SPACE LABELS ;
NO_DUE_DATE: NO SPACE DUE SPACE DATE ;

fragment SPACE : ' ';


That will work fine with your lexer rules as they are. Any other rule
that needs a space between words, such as TO_ME or TO_OTHERS, needs the
same change.
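
If more than one space, or a tab, may separate the words, the same idea can be widened. What follows is only a minimal sketch, assuming the rest of the grammar stays unchanged; INLINE_WS is a hypothetical fragment name that does not appear in the original grammar:

```antlr
// Sketch only: INLINE_WS is a hypothetical name, not part of the original grammar.
// It matches one or more spaces or tabs inside a single token.
NO_LABELS   : NO INLINE_WS LABELS ;
NO_DUE_DATE : NO INLINE_WS DUE INLINE_WS DATE ;

fragment INLINE_WS : [ \t]+ ;
```

Bear in mind that the lexer prefers the rule that matches the most characters and, on a tie, the rule listed first, so a catch-all rule such as STRING_OR_DATE can still win if it matches a longer stretch of the input.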


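Another option is to turn NO, LABELS, DUE and DATE into tokens in their
own right and combine them in the parser rules, so that the whitespace
falls between tokens and is skipped. I've minimised your grammar and
amended the parser rules for noLabels and noDueDate to show this -
this also works fine: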

grammar cq;

// Parser Rules

noLabels : NO LABELS; // Changed - removed underscore
noDueDate: NO DUE DATE; // Changed


// Lexer Rules

NO : N O;
LABELS : L A B E L S;
DUE : D U E;
DATE : D A T E;

WHITESPACE: [ \t\r\n]-> skip;


//Fragments

fragment A : [Aa];
fragment B : [Bb];

fragment D : [Dd];
fragment E : [Ee];

fragment L : [Ll];

fragment N : [Nn];
fragment O : [Oo];

fragment S : [Ss];
fragment T : [Tt];
fragment U : [Uu];


HTH

Cheers,
Norm.

Disclaimer: I'm not a compiler writer, nor do I play one on TV.


Jim Idle

Sep 30, 2015, 12:11:56 PM
to antlr-di...@googlegroups.com
The issue is that the lexer is being asked to do too much. Just let the lexer match single tokens unless whitespace is really significant. Don't try to enforce syntax in lexer rules. As ever, move errors and formulations as far down the tool chain as you can.

Jim
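
One way to read that advice, applied to a few rules from the question, is sketched below. This is a cut-down illustration and not the full CoreQueries grammar: the grammar name TokenSketch, the query start rule and the INT token are assumptions introduced here, and the check that the day count is not zero is assumed to move out of the lexer into later code.

```antlr
// Sketch only: a cut-down grammar, not the full CoreQueries grammar.
grammar TokenSketch;

// Parser rules combine single-word tokens; skipped whitespace may
// appear between any two of them.
query    : (toMe | toOthers | nDays) EOF ;
toMe     : TO ME ;
toOthers : TO OTHERS ;
nDays    : INT DAYS ;   // rejecting 0 is left to a later check, not the lexer

// Lexer rules: one token per word or number.
TO     : [Tt][Oo] ;
ME     : [Mm][Ee] ;
OTHERS : [Oo][Tt][Hh][Ee][Rr][Ss] ;
DAYS   : [Dd][Aa][Yy][Ss] ;
INT    : [0-9]+ ;

WHITESPACE : [ \t\r\n]+ -> skip ;
```

With rules shaped like this, "to me" and "tome" should both reach the toMe rule, because the lexer emits TO and ME as separate tokens and drops any whitespace in between.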


