regexp syntax and named Unicode character classes

Tom Payne

Jan 7, 2020, 1:21:58 PM

to golang-nuts

Hi,

tl;dr How should I use named Unicode character classes in regexps?

I'm trying to write a regular expression that matches Go identifiers, which start with a Unicode letter or underscore followed by zero or more Unicode letters, decimal digits, and/or underscores.

Based on the regexp syntax, and the variables in the unicode package which mention the classes "Letter" and "Number, decimal digit", I was expecting to write something like:

identifierRegexp := regexp.MustCompile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`)

However, this pattern does not compile, giving the error:

regexp: Compile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`): error parsing regexp: invalid character class range: `\p{Letter}`

Using the short names for the character classes (L for Letter, Nd for Number, decimal digit) does work, however:

identifierRegexp := regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)

You can play with these regexps on play.golang.org.
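
For anyone who wants to try the working pattern locally, here is a minimal sketch. The helper name isIdentifier and the sample strings are illustrative assumptions, and the pattern only checks the shape of an identifier; it does not reject keywords such as "func".

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierRegexp uses the short category names: L (letter) and
// Nd (decimal digit). \A and \z anchor the match to the whole input.
var identifierRegexp = regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)

// isIdentifier is a hypothetical helper that reports whether s has the
// shape of a Go identifier. Keywords are not excluded.
func isIdentifier(s string) bool {
	return identifierRegexp.MatchString(s)
}

func main() {
	samples := []string{"x", "_tmp9", "größe", "日本語", "9lives", "a-b", ""}
	for _, s := range samples {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```

The first four samples should be accepted, and the last three rejected because they start with a digit, contain a hyphen, or are empty.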

Is it simply an oversight that Unicode character classes like "Letter" and "Number, decimal digit" are not available in regexps, or should I be using them differently?

Many thanks,

Tom

Ian Lance Taylor

Jan 7, 2020, 1:36:02 PM

to Tom Payne, golang-nuts

The strings you can use with \p are the ones listed in
unicode.Categories and unicode.Scripts. So use \pL as you do in the
second example.
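
One way to check a name before putting it in a pattern is to look it up in those two maps. This is only a sketch: the helper name knownClass and the sample names are illustrative assumptions, and the set of names that regexp accepts may differ between Go releases, so the regexp/syntax documentation for your version is the authority.

```go
package main

import (
	"fmt"
	"unicode"
)

// knownClass is a hypothetical helper that reports whether name is a key
// of unicode.Categories or unicode.Scripts.
func knownClass(name string) bool {
	_, isCategory := unicode.Categories[name]
	_, isScript := unicode.Scripts[name]
	return isCategory || isScript
}

func main() {
	// Short category names and script names, as used in \p{...}.
	for _, name := range []string{"L", "Nd", "Lu", "Greek", "Han"} {
		fmt.Printf("%-6s %v\n", name, knownClass(name))
	}
}
```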

Ian

Tom Payne

Jan 7, 2020, 1:39:29 PM

to golang-nuts

Thank you :) Is this worth adding to the regexp/syntax documentation? I'd happily contribute a patch.

Ian Lance Taylor

Jan 7, 2020, 1:43:44 PM

to Tom Payne, golang-nuts

On Tue, Jan 7, 2020 at 10:39 AM Tom Payne <twp...@gmail.com> wrote:
>
> Thank you :) Is this worth adding to the regexp/syntax documentation? I'd happily contribute a patch.

I think so, if it can be described precisely and tersely. Thanks.

Ian

alan...@gmail.com

Jan 7, 2020, 3:44:53 PM

to golang-nuts

As Go's regular expressions are based on RE2, I always use the latter's documentation page to check what is and isn't allowed.