Hello,
I'm trying to write an ANTLR4 grammar in which a token accepts every character except the ones I explicitly exclude.
Currently I have:
String: [a-zA-Z0-9\.\*_\-\\\/\[\]@:]+ ;
StringWithSpaces: '\'' [a-zA-Z0-9 \*\-\\\/\(\),_\[\]\.@:]+ '\'';
This has worked fine up until now, but I would like to support all Unicode characters (including special characters) unless I explicitly exclude them.
For example, I don't want to allow a single quote ' unless it is escaped as \', and I don't want to allow spaces in a String that is not surrounded by quotes.
With the rest of the grammar as it currently stands, the query
FieldName = 'FieldValue' AND OtherField = 'OtherValue'
is parsed into an 'and query' with two child nodes that are 'equals queries', with the quotes left in the values. I'd like to get a similar outcome from, for example,
FieldName = 'My Sister\'s House' AND OtherField = 'план'
EDIT: Whoops, forgot the regex.
Something like this would work as a regex, but I've no idea how to translate it into ANTLR4 lexer rules:
String: (?:\\[(']|[^(' ])+;
StringWithSpaces: '\'' (?:\\'|[^'])+ '\'';
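To make the intended behaviour concrete, here is a small sketch that runs the two regexes above against the example query. It uses Python's `re` module purely as a convenient regex engine (that choice is my assumption and has nothing to do with ANTLR4), and the names `STRING`, `STRING_WITH_SPACES`, `query` and `quoted` are made up for the illustration.

```python
import re

# The two regexes from the post, with the surrounding quotes of
# StringWithSpaces folded into the pattern itself.
STRING = re.compile(r"(?:\\[(']|[^(' ])+")
STRING_WITH_SPACES = re.compile(r"'(?:\\'|[^'])+'")

# Example query from the post: an escaped quote and non-ASCII text.
query = r"FieldName = 'My Sister\'s House' AND OtherField = 'план'"

# Quoted values keep their quotes, as in the current grammar.
quoted = STRING_WITH_SPACES.findall(query)
print(quoted)

# Unquoted strings: any characters are fine, except spaces and bare quotes.
print(bool(STRING.fullmatch("FieldName")))   # plain ASCII
print(bool(STRING.fullmatch("план")))        # non-ASCII
print(bool(STRING.fullmatch(r"it\'s")))      # escaped quote is allowed
print(bool(STRING.fullmatch("it's")))        # bare quote is rejected
print(bool(STRING.fullmatch("two words")))   # space is rejected
```

This is the outcome I'm after from the lexer: the two quoted values come out as single tokens (escaped quote and Cyrillic text included), while an unquoted String stops at a space or an unescaped quote.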