Regexp MustCompile does not recognize escaped metacharacter .


lwnexgen

Jun 14, 2011, 10:55:23 AM
to golang-nuts
Hi Guys -

I have the following Regexp defined:

var split_exp = regexp.MustCompile("(.+?)(\.[^.]*$|$)")

This should be a valid regular expression as far as I can tell - it
works fine in other languages.

Anyone have any idea why the compiler throws "unknown escape
sequence: ."?

Thanks!

chris dollin

Jun 14, 2011, 10:59:16 AM
to lwnexgen, golang-nuts

Your \. escape is inside a string, where it doesn't mean anything.
You either want \\. (so that the string contains \.) or to use `...`
quotes, which don't do escapes, so the \. will be inside the
string.
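
A minimal sketch of the two spellings side by side. It assumes a Go release whose regexp package accepts the non-greedy `.+?` (Go 1 and later); the helper name `splitExt` and the sample file name are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Interpreted string literal: the backslash is doubled so that the
// regexp package receives the two characters \. (a literal dot).
var escaped = regexp.MustCompile("(.+?)(\\.[^.]*$|$)")

// Raw string literal: backticks do no escape processing, so \. is
// passed to the regexp package unchanged.
var raw = regexp.MustCompile(`(.+?)(\.[^.]*$|$)`)

// splitExt is a hypothetical helper that returns the base name and
// the extension (including the dot) captured by the pattern.
func splitExt(re *regexp.Regexp, name string) (string, string) {
	m := re.FindStringSubmatch(name)
	if m == nil {
		return name, ""
	}
	return m[1], m[2]
}

func main() {
	// Both literals produce the same pattern text.
	fmt.Println(escaped.String() == raw.String())
	fmt.Println(splitExt(raw, "archive.tar.gz"))
}
```

The raw form is usually easier to read, because the pattern in the source is exactly the pattern the regexp package sees.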

Chris

--
Chris "allusive" Dollin

Jan Mercl

Jun 14, 2011, 11:04:19 AM
to golan...@googlegroups.com

Kyle Lemons

Jun 14, 2011, 11:58:29 AM
to lwnexgen, golang-nuts
> var split_exp = regexp.MustCompile("(.+?)(\.[^.]*$|$)")
>
> This should be a valid regular expression as far as I can tell - it
> works fine in other languages.

In addition to the suggestion to use raw strings (backtick-quoted), it is also important to note that the current regexp package is very limited compared to other languages, so the comparison isn't particularly meaningful (yet).  I believe it is somewhat similar to egrep in syntax.  For instance, `\w` does not match word characters and `[-_a-zA-Z]` is not valid (the - must be escaped, even at the beginning and end of the class).  It is also important to note that the current regexp package has a replacement in the works, which will handle perl- and posix-style regexps much more fully.
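
Going by the description above, one way to sidestep both restrictions is to spell the classes out explicitly. This is only a sketch: the variable names are illustrative, and it has been written so that it also compiles with the later RE2-syntax regexp package, where the escaped `-` remains valid.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Explicit class instead of \w: ASCII letters, digits, underscore.
	word = regexp.MustCompile(`[0-9A-Za-z_]+`)

	// The leading - is escaped rather than relying on its position
	// in the class.
	ident = regexp.MustCompile(`[\-_a-zA-Z]+`)
)

func main() {
	fmt.Println(word.FindString("foo_bar-baz"))
	fmt.Println(ident.FindString("foo-bar_9"))
}
```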

~K 

Sam Gardner

Jun 14, 2011, 12:08:29 PM
to Kyle Lemons, golang-nuts
I'm noticing some other weird functionality with this - for instance, the (.+?) seems to throw an error ("repeated closure (**, ++, etc.)") because it checks for repeated metacharacters by specifically disallowing combinations of .,+, and ? (see line 473 in regexp.go).

Does anyone have a good writeup anywhere on how to get around these limitations in Go?

Thanks,
Sam

Kyle Lemons

Jun 14, 2011, 12:14:54 PM
to Sam Gardner, golang-nuts
The regexp package documentation explicitly states which regular expressions are allowed. Strictly speaking, these regular expressions can match any regular language. Non-greedy operators only change which match is returned when a pattern can match in more than one way. If you absolutely cannot construct your regular expression to match uniquely, and don't want to wait for the new regexp library, I believe there are Go bindings for the RE2 library somewhere.

> I'm noticing some other weird functionality with this - for instance, the (.+?) seems to throw an error ("repeated closure (**, ++, etc.)") because it checks for repeated metacharacters by specifically disallowing combinations of .,+, and ? (see line 473 in regexp.go).

It's not weird functionality, it's just missing some things you might be used to from perl/PCRE.
 
> Does anyone have a good writeup anywhere on how to get around these limitations in Go?

Write an unambiguous regular expression.  For instance, `<.*>` matches both "<html>" and "<html>...</html>" (the latter is chosen by the greedy operator) but you can make it unambiguous by e.g. `<[^>]*>`.
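
A small sketch of that difference, using only `regexp.MustCompile` and `FindString`; the variable names and the input string are chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Ambiguous: .* may run past the first '>' and stop at the last one.
	greedy = regexp.MustCompile(`<.*>`)

	// Unambiguous: [^>]* cannot cross a '>', so only one match is possible.
	unambiguous = regexp.MustCompile(`<[^>]*>`)
)

func main() {
	const input = "<html>...</html>"
	fmt.Println(greedy.FindString(input))      // <html>...</html>
	fmt.Println(unambiguous.FindString(input)) // <html>
}
```

Because the second pattern admits only one match from a given starting point, no non-greedy operator is needed to pick the short one.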

~K