I am working on an issue where a query string filter ($filter=name like test-import) only seems to return the first portion of the criteria ('test'). I am new to ANTLR, so I am working through how another team has implemented it and would appreciate some direction.
Here is the grammar that I am using:
grammar FilterParser;
options {
language = CSharp2;
output=AST;
}
@parser::namespace {Namespace.Text}
@lexer::namespace {Namespace.Text}
@members {public static string[] TokenNameList { get { return tokenNames; } }}
public root : filter EOF;
filter
: subFilter ( (AND | OR)^ subFilter )*
;
subFilter
:
(NOT^)? predicate
;
predicate
:
group
| identifier comparisonOperator^ expression
| identifier^ IS (NOT)? NULL
| identifier^ (NOT)? (
LIKE expression // only single char
| BETWEEN expression AND expression
| IN LPAREN expression (COMMA expression)* RPAREN
)
| INTEGER EQ INTEGER
;
group : LPAREN filter RPAREN;
expression : constant | identifier;
stringLiteral
:
UnicodeStringLiteral
| ASCIIStringLiteral
;
identifier
:
NonQuotedIdentifier
| QuotedIdentifier
;
constant
: Iso8601DateTime | Currency | boolean | NULL | number | stringLiteral | GUID
;
comparisonOperator
:
EQ | NEQ | LTE | LT | GTE | GT
;
number : INTEGER | REAL | HexLiteral;
boolean : TRUE | FALSE;
A : 'a'|'A';
B : 'b'|'B';
D : 'd'|'D';
E : 'e'|'E';
F : 'f'|'F';
G : 'g'|'G';
H : 'h'|'H';
I : 'i'|'I';
K : 'k'|'K';
L : 'l'|'L';
M : 'm'|'M';
N : 'n'|'N';
O : 'o'|'O';
P : 'p'|'P';
Q : 'q'|'Q';
R : 'r'|'R';
S : 's'|'S';
T : 't'|'T';
U : 'u'|'U';
W : 'w'|'W';
X : 'x'|'X';
Y : 'y'|'Y';
TRUE : T R U E;
FALSE : F A L S E;
BETWEEN : B E T W E E N;
LIKE : L I K E;
NULL : N U L L;
EQ : E Q|'=';
AND : A N D | '&&';
OR : O R | '||';
NOT : N O T | '!';
IS : I S;
GT : '>'|G T;
GTE : '>='|'!>'|G T E;
LT : '<'|L T;
LTE : '<='|'!<'|L T E;
NEQ : '<>'|'!='|N E Q;
IN : I N;
COMMA : ',' ;
LPAREN : '(' ;
RPAREN : ')' ;
LETTER : 'a'..'z'|'A'..'Z'|'_'|'#'|'@'|'\u0080'..'\ufffe'
;
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
DOT: '.';
REAL : '-'?('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '-'?'.' ('0'..'9')+ EXPONENT?
| '-'?('0'..'9')+ EXPONENT;
INTEGER : '-'?('0'..'9')+;
HexLiteral : '0x' ('a'..'f' | '0'..'9')*;
Currency
: // generated as a part of NonQuotedIdentifier rule
('$' | '\u00a3'..'\u00a5' | '\u09f2'..'\u09f3' | '\u0e3f' | '\u20a0'..'\u20a4' | '\u20a6'..'\u20ab')
(('0'..'9')+ '.' ('0'..'9')* )?
;
Iso8601DateTime // yyyy-mm-dd [Thh:mm:ss[.fff]]Z
// YEAR Month Day
: '\'' ('0'..'9') ('0'..'9') ('0'..'9') ('0'..'9') '-' ('0'..'9') ('0'..'9') '-' ('0'..'9') ('0'..'9')
('T' ('0'..'9') ('0'..'9') ':' ('0'..'9') ('0'..'9') ':' ('0'..'9') ('0'..'9') ('.' (('0'..'9'))+)?)? 'Z'? '\''
{ Text = Text.Substring(1, Text.Length - 2); }
;
GUID : '\'' HEX HEX HEX HEX HEX HEX HEX HEX '-' HEX HEX HEX HEX '-' HEX HEX HEX HEX '-' HEX HEX HEX HEX '-' HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX '\''
{ Text = Text.Substring(1, Text.Length - 2); }
;
fragment HEX : ('a'..'f' | 'A'..'F' | '0'..'9') ;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
NonQuotedIdentifier
:
('A'..'Z'|'a'..'z' | '_' | '#' | '\u0080'..'\ufffe') (LETTER | '0'..'9')* // first char other than '@'
;
QuotedIdentifier
:
(
'[' (~']')* ']' (']' (~']')* ']')* { Text = Text.Substring(1, Text.Length - 2); }
| '"' (~'"')* '"' ('"' (~'"')* '"')* { Text = Text.Substring(1, Text.Length - 2); }
)
;
ASCIIStringLiteral
:
'\'' ((~'\'')|('\\\''))* '\'' ( '\'' ((~'\'')|('\\\''))* '\'' )* { Text = Text.Substring(1, Text.Length - 2).Replace("\\'", "'"); }
;
UnicodeStringLiteral
:
'n' '\'' ((~'\'')|('\\\''))* '\'' ( '\'' ((~'\'')|('\\\''))* '\'' )* { Text = Text.Substring(2, Text.Length - 3).Replace("\\'", "'"); }
;
While debugging, I can see that the operator is recognized as LIKE, but the text in 'name like test-import' is being treated as a NonQuotedIdentifier, and when processing reaches the dash ('-', or minus) it seems to stop, throws the error mentioned above, and returns just 'test'. I also notice that the dash is being recognized as a minus sign and routed into the REAL rule instead of something else, such as NonQuotedIdentifier or even its own token.
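To make the symptom concrete, here is a rough Python sketch of what I think is happening. It is not ANTLR and not our production code: the rule table is a hand-simplified stand-in for a few of the lexer rules above (the Unicode range is left out), and the longest-match loop is only my assumption about how the lexer picks a rule. The names RULES and tokenize are made up for this illustration.

```python
import re

# Simplified stand-ins for a few lexer rules from the grammar above.
# Assumption: longest match wins, and the earlier rule wins a tie.
RULES = [
    ("WS", re.compile(r"[ \t\r\n]+")),
    ("LIKE", re.compile(r"[lL][iI][kK][eE]")),
    ("REAL", re.compile(r"-?[0-9]+\.[0-9]*|-?\.[0-9]+")),
    ("INTEGER", re.compile(r"-?[0-9]+")),
    # '-' is not in this character set, mirroring LETTER in the grammar.
    ("NonQuotedIdentifier", re.compile(r"[A-Za-z_#][A-Za-z_#@0-9]*")),
]


def tokenize(text):
    """Return (tokens, errors) for text using the simplified rules."""
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            # No rule can start here: record the character and skip it.
            errors.append((pos, text[pos]))
            pos += 1
            continue
        name, m = best
        if name != "WS":  # whitespace goes to a hidden channel in the grammar
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens, errors


tokens, errors = tokenize("name like test-import")
print(tokens)
print(errors)
```

Under these simplified rules, 'test' and 'import' come out as two separate NonQuotedIdentifier tokens, and the '-' at index 14 matches nothing because REAL and INTEGER only accept a minus that is followed by a digit or a dot. That seems consistent with the filter stopping at 'test'.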
Does this seem right? I think I might have to add something to the .g file, but I am not sure what.
I have posted this question on a few other sites, but no one seems to want to answer it. Can someone here give me some insight? If more code or explanation is needed, I will be happy to provide it.
One last thing: we use ANTLR for more than just filtering, so the grammar is used company-wide and not only for my issue.
Thank you.