Help with regex lookahead to antlr4


Daniel Dowling

Apr 13, 2016, 8:17:04 AM
to antlr-di...@googlegroups.com

Hello,
I'm trying to write an ANTLR4 grammar whose string tokens accept every character except the ones I explicitly exclude.

Currently I have:
String:  [a-zA-Z0-9\.\*_\-\\\/\[\]@:]+ ;
StringWithSpaces: '\'' [a-zA-Z0-9 \*\-\\\/\(\),_\[\]\.@:]+ '\'';

Which has worked fine up until now, but I would like to support all UTF-8 characters (including special characters) unless I define otherwise.
For example, I don't want to allow a single quote ' unless it is escaped \', and I don't want to allow spaces in the String that is not surrounded by quotes.

Based on the rest of the grammar currently, the query
FieldName = 'FieldValue' AND OtherField = 'OtherValue'

Is tokenised to an 'and query' with two nodes that are 'equals queries' with the quotes left in the values. So I'd like to get a similar outcome from, for example,
FieldName = 'My Sister\'s House' AND OtherField = 'план'

EDIT: Whoops, forgot the regex
Something like this would work in regex, but I've no idea how to translate it

String:  (?:\\[(']|[^(' ])+;
StringWithSpaces: '\'' (?:\\'|[^'])+ '\'';

Eric Vergnaud

Apr 13, 2016, 2:34:08 PM
to antlr-discussion
To accept all characters except specific ones, use the negation operator: ~[ ] accepts every character except a space.

Daniel Dowling

Apr 14, 2016, 5:06:25 AM
to antlr-di...@googlegroups.com
Hi Eric,
I've tried using the negative sign, but the problem is that I want to allow "\'" but not "'", so I can't just negate ' as this excludes the escaped one.

Eric Vergnaud

Apr 14, 2016, 11:42:02 AM
to antlr-discussion
You can combine a negative pattern with a positive one:

~[ ] | '\"'
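
Applied to the two rules from the original post, the same idea (a negated set combined with an explicit escape alternative) might look like the sketch below. It is untested against the rest of the grammar, which is not shown in the thread. ESC is a hypothetical helper fragment, not something from the original grammar, and the sketch assumes a backslash always starts a two-character escape, so the backslash is excluded from both negated sets.

```antlr
// Sketch only: rule names follow the original post; ESC is a hypothetical helper.

// Unquoted value: one or more escape pairs, or characters that are not
// whitespace, a single quote, an opening parenthesis, or a backslash.
String
    : ( ESC | ~[ \t\r\n'(\\] )+
    ;

// Quoted value: the surrounding quotes stay in the token text,
// as they do in the original grammar.
StringWithSpaces
    : '\'' ( ESC | ~['\\] )+ '\''
    ;

// A backslash followed by any single character, e.g. \' or \\ .
fragment ESC
    : '\\' .
    ;
```

With these rules, 'My Sister\'s House' should lex as a single StringWithSpaces token, and 'план' is accepted because the negated set matches any character it does not list. Two caveats: String is now very permissive, so keyword rules such as AND would need to appear before it in the grammar, and operator characters such as = would probably have to be added to its negated set so that FieldName=value is not swallowed as one token.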