non-capturing group not working?


Andy W. Song

Apr 8, 2012, 2:21:06 AM
to golang-nuts
Please see this snippet:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	s := "datid=12345"
	// MustCompile panics on an invalid pattern instead of silently dropping the error.
	re := regexp.MustCompile("(?:datid=)[0-9]{5}")
	fmt.Println(re.FindString(s))
}

I expected it to print 12345, but the actual output is datid=12345. I'm using Go 1.

Thanks
Andy


--
---------------------------------------------------------------
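Where there is a will, there is a way: by breaking the cauldrons and sinking the boats, the hundred-and-two passes of Qin fell to Chu in the end.
Heaven does not fail those who persevere: by sleeping on brushwood and tasting gall, three thousand soldiers of Yue were able to swallow Wu.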

Michael Jones

Apr 8, 2012, 3:39:51 AM
to Andy W. Song, golang-nuts
http://play.golang.org/p/T0hG-lXkmj
--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Andy W. Song

Apr 8, 2012, 10:50:22 PM
to Michael Jones, golang-nuts
That works. Thanks.



tobias...@dynport.de

Oct 2, 2013, 3:56:16 PM
to golan...@googlegroups.com, Michael Jones
Is there any reason why FindString does not support that functionality? Why does FindString not respect non-capture groups? Is that on purpose?

The problem I have with FindStringSubmatch is that I have to check the length of the returned slice every time I use it, even if I only want the first group.

This is a helper I came up with: http://play.golang.org/p/4NfGmvtfYu but I do not see a reason for non-capture groups being ignored in FindString.
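The helper behind that link is not reproduced here. As a rough sketch of what such a wrapper could look like, the following uses a hypothetical function name, firstSubmatch, and does the length check in one place so callers do not have to repeat it. It recompiles the pattern on every call for brevity, which real code would probably avoid.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstSubmatch is a hypothetical helper (not the one from the linked
// playground). It returns the text of the first capturing group, or ""
// when the pattern does not match or has no capturing group.
func firstSubmatch(pattern, s string) string {
	re := regexp.MustCompile(pattern)
	m := re.FindStringSubmatch(s) // nil when there is no match
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(firstSubmatch(`datid=([0-9]{5})`, "datid=12345"))       // 12345
	fmt.Printf("%q\n", firstSubmatch(`datid=([0-9]{5})`, "no id here")) // ""
}
```

Returning "" for "no match" cannot be told apart from a group that matched the empty string; a second boolean return value would be one way around that.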

Russ Cox

Oct 3, 2013, 11:07:34 AM
to tobias...@dynport.de, golang-nuts, Michael Jones
I think you misunderstand what a non-capturing group is.

A capturing group is a ( ) that is recorded in the indexed match list, what other languages might call $1, $2, $3 and so on. A non-capturing group is a way to use ( ) without taking one of those numbers. Whether a group is capturing or not has no effect on the full string matched by the overall expression. The full string matched here is "datid=12345", and so that is what FindString returns.

You use non-capturing groups for the same reason you use parentheses in the arithmetic expression (x+y)*z: overriding the default operator precedence. The precedence is the same here with or without the group.
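For contrast, here is a small sketch of a case where the group does change precedence. The patterns and the helper names fullMatch and groupCount are made up for illustration: without the group, + applies only to b; with it, + applies to the whole of ab. NumSubexp shows that the non-capturing form takes no group number.

```go
package main

import (
	"fmt"
	"regexp"
)

// fullMatch returns the leftmost match of pattern in s ("" if none).
func fullMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// groupCount reports how many numbered (capturing) groups pattern has.
func groupCount(pattern string) int {
	return regexp.MustCompile(pattern).NumSubexp()
}

func main() {
	fmt.Println(fullMatch(`ab+`, "ababab"))     // ab
	fmt.Println(fullMatch(`(?:ab)+`, "ababab")) // ababab

	// The non-capturing group takes no number; the capturing one takes $1.
	fmt.Println(groupCount(`(?:ab)+`), groupCount(`(ab)+`)) // 0 1
}
```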

Put another way, (?:datid=)[0-9]{5} is exactly the same regular expression as datid=[0-9]{5}.
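A short sketch of that equivalence, using the string from the original question. The helper name matchAndGroup and the third pattern, datid=([0-9]{5}), are additions for illustration: the capturing group there is one way to get just the digits through FindStringSubmatch.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchAndGroup returns the full match (index 0) and the first capturing
// group (index 1). group is "" when the pattern has no capturing group,
// and both are "" when there is no match.
func matchAndGroup(pattern, s string) (whole, group string) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(s)
	if m == nil {
		return "", ""
	}
	if len(m) > 1 {
		group = m[1]
	}
	return m[0], group
}

func main() {
	s := "datid=12345"

	// Same regular expression written two ways: same full match.
	w1, _ := matchAndGroup(`(?:datid=)[0-9]{5}`, s)
	w2, _ := matchAndGroup(`datid=[0-9]{5}`, s)
	fmt.Println(w1, w2) // datid=12345 datid=12345

	// A capturing group around the digits makes them available as group 1.
	whole, digits := matchAndGroup(`datid=([0-9]{5})`, s)
	fmt.Println(whole, digits) // datid=12345 12345
}
```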

Russ

dombrowsk...@gmail.com

Oct 19, 2017, 11:27:04 AM
to golang-nuts
I'm adding this for others who have problems with non-capturing groups when using regular expressions in Google Apps Script:

I struggled to debug why this simple regex would not work for me in Google Apps Script: /(?:SKU\: \*)(\d*)/ (in this case I am trying to extract the SKU number from the body of an email).

The results were nonsensical, and I struggled off and on for a few days trying to resolve the issue while I played around with variations, always leaving in the non-capturing group. Then I stumbled on this reference in the Google docs, https://support.google.com/a/answer/1371417?hl=en, which suggested that only RE2 syntax is supported by Google. If you follow the link to RE2 on GitHub and then to the syntax page, you will see quite a few constructs that are not implemented; the non-capturing group is one of them: https://github.com/google/re2/wiki/Syntax.

As soon as I removed the non-capturing group, as Russ pointed out, the code worked flawlessly. This worked for me: /SKU\: \*(\d*)/

One of the main reasons I was thrown off for so long is that this regular expression works the way I need it to even though it has a non-capturing group: /(?:\*Total\*\s\n|cutoff\*\s\n)(.*?)\s*(?=$|\(|blu|Blu)/

Perhaps I was just lucky, and it is not really working correctly but correctly enough for my purposes. I now plan to simplify my regular expressions using the RE2 guidelines on GitHub.

roger peppe

Oct 20, 2017, 3:45:40 AM
to dombrowsk...@gmail.com, golang-nuts
On 19 October 2017 at 16:01, <dombrowsk...@gmail.com> wrote:
> I add this for others having problems with non capture groups when using
> regular expressions in google scripts:
>
> I struggled trying to debug why this simple regex would not work for me in
> google application script: /(?:SKU\: \*)(\d*)/ [in this case i am trying to
> extract the SKU number from the body of an email].
>
> The results were nonsensical and I struggled off and on for a few days
> trying to resolve the issue while i played around with variations - always
> leaving in the non-capture group. Then I stumbled on this reference in
> google docs https://support.google.com/a/answer/1371417?hl=en that suggested
> that only RE2 syntax is supported by Google. Following the link to RE2 on
> github and then to the syntax page you will see quite a few formats that are
> not implemented - non-capture group is one of them:
> https://github.com/google/re2/wiki/Syntax.
>
> As soon as I removed the non-capture group like Russ pointed out the code
> worked flawlessly. This worked for me: /SKU\: \*(\d*)/

Can you come up with a simple example where /SKU\: \*(\d*)/ works
but /(?:SKU\: \*)(\d*)/ does not? We use non-capturing groups and they
generally seem to work as expected, but perhaps you've found a bug.
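
For reference, here is how the two patterns behave in Go's regexp package, which follows RE2 syntax. This is only a sketch: the sample text and the helper names skuNumber and compiles are made up, and it says nothing about whichever engine the script in question actually runs on. The last line also checks a lookahead, (?=...), which the RE2 syntax page lists as not supported, while (?:re) is listed as a non-capturing group.

```go
package main

import (
	"fmt"
	"regexp"
)

// skuNumber returns capturing group 1 of pattern applied to body,
// or "" when there is no match.
func skuNumber(pattern, body string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(body)
	if len(m) < 2 {
		return ""
	}
	return m[1]
}

// compiles reports whether Go's regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	body := "Order details SKU: *98765 shipped" // made-up sample text

	fmt.Println(skuNumber(`SKU\: \*(\d*)`, body))     // 98765
	fmt.Println(skuNumber(`(?:SKU\: \*)(\d*)`, body)) // 98765

	// Non-capturing groups compile; lookahead does not.
	fmt.Println(compiles(`(?:a)b`), compiles(`a(?=b)`)) // true false
}
```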