QueryString not parsing correctly (Unable to consume all input)

Steve Welborn

Jun 5, 2016, 11:35:00 PM

to antlr-discussion

I am working on an issue where a query string containing a filter ($filter=name like test-import) only seems to return the first portion of the criteria ('test'). I am new to ANTLR, so I am going through how another team implemented it and could use some direction.

Here is the grammar that I am using:

grammar FilterParser;
    options {
 language = CSharp2;
 output=AST;
 }


    @parser::namespace {Namespace.Text}
    @lexer::namespace {Namespace.Text}


    @members {public static string[] TokenNameList { get { return tokenNames; } }}


 public root : filter EOF;
 filter
 : subFilter ( (AND | OR)^  subFilter )*
 ;
 subFilter
 :
 (NOT^)? predicate
 ;
 predicate
 :
 group
 | identifier comparisonOperator^ expression
 | identifier^ IS (NOT)? NULL
 | identifier^ (NOT)? (
 LIKE expression // only single char
  | BETWEEN expression AND expression
  | IN LPAREN expression (COMMA expression)* RPAREN
  )
 | INTEGER EQ INTEGER
 ;
 group : LPAREN filter RPAREN;
 expression    : constant | identifier;


 stringLiteral
 :
  UnicodeStringLiteral
 | ASCIIStringLiteral
 ;


 identifier
 :
  NonQuotedIdentifier
 | QuotedIdentifier
 ;


 constant
 : Iso8601DateTime | Currency | boolean | NULL | number |stringLiteral | GUID
 ;


 comparisonOperator
 :
  EQ | NEQ | LTE | LT | GTE| GT
 ;
 
 number : INTEGER | REAL | HexLiteral;


 boolean : TRUE | FALSE;


 A : 'a'|'A';
 B : 'b'|'B';
 D : 'd'|'D';
 E : 'e'|'E';
 F : 'f'|'F';
 G : 'g'|'G';
 H : 'h'|'H';
 I : 'i'|'I';
 K : 'k'|'K';
 L : 'l'|'L';
 M : 'm'|'M';
 N : 'n'|'N';
 O : 'o'|'O';
 P : 'p'|'P';
 Q : 'q'|'Q';
 R : 'r'|'R';
 S : 's'|'S';
 T : 't'|'T';
 U : 'u'|'U';
 W : 'w'|'W';
 X : 'x'|'X';
 Y : 'y'|'Y';
 TRUE : T R U E;
 FALSE : F A L S E;
 BETWEEN : B E T W E E N;
 LIKE : L I K E;
 NULL : N U L L;
 EQ : E Q|'=';
 AND : A N D | '&&';
 OR : O R | '||';
 NOT : N O T | '!';
 IS : I S;
 GT : '>'|G T;
 GTE : '>='|'!>'|G T E;
 LT : '<'|L T;
 LTE : '<='|'!<'|L T E;
 NEQ : '<>'|'!='|N E Q;
 IN : I N;
 COMMA : ',' ;
 LPAREN : '(' ;
 RPAREN : ')' ; 


 LETTER  : 'a'..'z'|'A'..'Z'|'_'|'#'|'@'|'\u0080'..'\ufffe'
 ;


 EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
 
 DOT: '.';
 REAL :   '-'?('0'..'9')+ '.' ('0'..'9')* EXPONENT?
 |   '-'?'.' ('0'..'9')+ EXPONENT?
 |   '-'?('0'..'9')+ EXPONENT;
 INTEGER : '-'?('0'..'9')+;
 HexLiteral : '0x' ('a'..'f' | '0'..'9')*;
 Currency
 : // generated as a part of NonQuotedIdentifier rule
 ('$' | '\u00a3'..'\u00a5' | '\u09f2'..'\u09f3' | '\u0e3f' | '\u20a0'..'\u20a4' | '\u20a6'..'\u20ab')
 (('0'..'9')+ '.' ('0'..'9')* )?
 ;


 Iso8601DateTime // yyyy-mm-dd [Thh:mm:ss[.fff]]Z
 // YEAR     Month       Day
 : '\'' ('0'..'9') ('0'..'9') ('0'..'9') ('0'..'9') '-' ('0'..'9') ('0'..'9') '-' ('0'..'9') ('0'..'9')
 ('T' ('0'..'9') ('0'..'9') ':' ('0'..'9') ('0'..'9') ':' ('0'..'9') ('0'..'9') ('.' (('0'..'9'))+)?)?  'Z'? '\'' 
 { Text = Text.Substring(1, Text.Length - 2); }
 ;


 GUID : '\'' HEX HEX HEX HEX HEX HEX HEX HEX '-' HEX HEX HEX HEX '-' HEX HEX HEX HEX '-' HEX HEX HEX HEX '-' HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX  '\'' 
 { Text = Text.Substring(1, Text.Length - 2); }
 ;
 fragment HEX : ('a'..'f' | 'A'..'F' | '0'..'9') ;


 WS  :   ( ' '
 | '\t'
 | '\r'
 | '\n'
 ) {$channel=HIDDEN;}
 ;
 NonQuotedIdentifier
 :
 ('A'..'Z'|'a'..'z' | '_' | '#' | '\u0080'..'\ufffe') (LETTER |  '0'..'9')* // first char other than '@'
 ;


 QuotedIdentifier
 :
 (
  '[' (~']')* ']' (']' (~']')* ']')* { Text = Text.Substring(1, Text.Length - 2); }
 | '"' (~'"')* '"' ('"' (~'"')* '"')* { Text = Text.Substring(1, Text.Length - 2); }
 )
 ;


 ASCIIStringLiteral
 :
 '\'' ((~'\'')|('\\\''))* '\'' ( '\'' ((~'\'')|('\\\''))* '\'' )* { Text = Text.Substring(1, Text.Length - 2).Replace("\\'", "'"); }
 ;


 UnicodeStringLiteral
 :
 'n' '\'' ((~'\'')|('\\\''))* '\'' ( '\'' ((~'\'')|('\\\''))* '\'' )* { Text = Text.Substring(2, Text.Length - 3).Replace("\\'", "'"); }
 ;

Debugging the code, I see that it does recognize the LIKE, but 'name like test-import' is being handled as a 'NonQuotedIdentifier', and when it reaches the dash ('-', or minus) it seems to stop processing, throws the error mentioned above, and returns just 'test'. I also notice that the dash is being recognized as a minus and placed in the REAL category instead of something else like NonQuotedIdentifier or even its own field.

Does this seem right? I am thinking I might have to add something to the .g file, but not sure what.

I have posted this question on a few other sites, but no one seems to want to answer it. Can someone here give me some insight? If more code or explanation is needed, I will be happy to provide it.

One last thing: we use ANTLR for more than just filtering, so the grammar is used company-wide, not just for my issue.

Thank you.

Eric Vergnaud

Jun 6, 2016, 1:26:18 AM

to antlr-discussion

Hi,

According to your grammar, '-' is not a valid non-quoted identifier character.

Maybe it should be quoted in the input?

Eric
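The point about '-' can be illustrated outside ANTLR. The following is a minimal Python sketch, assuming that plain regular expressions are a fair stand-in for the NonQuotedIdentifier, ASCIIStringLiteral, and QuotedIdentifier rules in the grammar above. The regexes are hand-written approximations (the repeated-quote alternatives are left out) and the helper name `matched_prefix` is hypothetical; none of this is what ANTLR generates.

```python
import re

# Hand-written approximations of three lexer rules from the grammar.
# They are illustrative stand-ins, not ANTLR's generated lexer.
LETTER_CHARS = r"A-Za-z_#@\u0080-\ufffe"
NON_QUOTED_IDENTIFIER = re.compile(
    r"[A-Za-z_#\u0080-\ufffe][" + LETTER_CHARS + r"0-9]*"
)
ASCII_STRING_LITERAL = re.compile(r"'(?:\\'|[^'])*'")
QUOTED_IDENTIFIER = re.compile(r'\[[^\]]*\]|"[^"]*"')


def matched_prefix(pattern, text):
    """Return the part of text the pattern matches from position 0, or None."""
    match = pattern.match(text)
    return match.group(0) if match else None


# '-' is not in the identifier character set, so the match stops before it.
unquoted = matched_prefix(NON_QUOTED_IDENTIFIER, "test-import")
# Quoted forms carry the dash inside a single token.
as_string = matched_prefix(ASCII_STRING_LITERAL, "'test-import'")
as_quoted_id = matched_prefix(QUOTED_IDENTIFIER, "[test-import]")

print(unquoted)      # test
print(as_string)     # 'test-import'
print(as_quoted_id)  # [test-import]
```

Under these assumptions, a filter written as $filter=name like 'test-import' would reach the parser with the whole value in one token, which the grammar accepts after LIKE because expression allows a constant or an identifier.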

Steve Welborn

Jun 6, 2016, 2:11:16 AM

to antlr-discussion

That is what I was thinking as well: creating a new field for it (i.e. DASH: '-' ;).

But I wonder if it would conflict with the '-' already being defined as part of EXPONENT.

I could try that, but it is a lot of work, so I wanted to confirm it first. It seems like each field defined has its own method in the code. It's a mess, in my opinion.
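For reference, '-' appears in the grammar in three places: as an optional leading sign in INTEGER and REAL, and as an optional sign inside EXPONENT. The rough Python sketch below, using hand-written regex approximations and a hypothetical helper `starts_number`, checks which inputs beginning with '-' those numeric rules could accept. It does not model how the generated lexer chooses between rules, so it only shows where a separate DASH rule would share its first character with existing rules.

```python
import re

# Approximations of EXPONENT, INTEGER and REAL from the grammar above.
# Illustrative only; ANTLR's generated lexer is not a set of regexes.
DIGITS = r"[0-9]+"
EXPONENT = r"[eE][+-]?" + DIGITS
INTEGER = re.compile(r"-?" + DIGITS)
REAL = re.compile(
    r"-?[0-9]+\.[0-9]*(?:" + EXPONENT + r")?"
    r"|-?\.[0-9]+(?:" + EXPONENT + r")?"
    r"|-?[0-9]+" + EXPONENT
)


def starts_number(text):
    """True if INTEGER or REAL (as approximated) matches at position 0."""
    return bool(INTEGER.match(text) or REAL.match(text))


print(starts_number("-import"))  # False
print(starts_number("-5"))       # True
print(starts_number("-.5e+3"))   # True
```

In this approximation, '-' followed by a letter fits neither numeric rule, while '-' followed by a digit or '.' does. A DASH rule would therefore overlap with INTEGER and REAL only on the first character, and the outcome would depend on what follows it.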

Eric Vergnaud

Jun 6, 2016, 3:29:50 AM

to antlr-discussion

I encourage you to read the ANTLR book to understand how it works.

You are likely to waste a lot of energy reverse engineering the product if you don't.

Steve Welborn

Jun 6, 2016, 1:29:24 PM

to antlr-discussion

I agree and will start digging into it. So far it has been a waste, since I've been looking into this for a week now. I appreciate you taking the time to help me with this.
