Capture group question


anotherhoward

Mar 1, 2020, 7:28:42 AM
to BBEdit Talk
In the Pattern Playground, I am running this pattern -> (\d{3}[-.]){2}(\d{4})
with the data shown below.

Here is my input data:
123.179-9876
123.456-9876
123-456-9870
126-456-987
1257-456--0
123-456 
123450000

Three capture groups are shown in the Capture groups box.

[Screenshot: Screen Shot 2020-03-01 at 7.17.44 AM.png]


In Capture group 1, I expected the result to be at least `123.` followed by `179-`, not just `179-`.


Why is only `179-` displaying?


My Replace pattern is `\0`.


Fletcher Sandbeck

Mar 1, 2020, 11:00:58 AM
to bbe...@googlegroups.com
I think the problem is that the {2}, which repeats the preceding group, is outside the parentheses that mark the capture, so the group keeps only its last repetition. You can use a non-capturing group (?: ) to group patterns without creating another capture, and then wrap the entire repeated expression in parentheses so that all of it becomes the first capture.

((?:\d{3}[-.]){2})(\d{4})
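To see the difference outside BBEdit, here is a minimal sketch using Python's re module. It assumes that re treats a repeated capturing group and (?: ) the same way BBEdit's grep engine does for these simple patterns. The sketch runs both patterns over Howard's sample lines, with the trailing space after 123-456 dropped because it does not affect matching:

```python
import re

# Howard's sample lines (trailing space after "123-456" dropped).
data = "\n".join([
    "123.179-9876",
    "123.456-9876",
    "123-456-9870",
    "126-456-987",
    "1257-456--0",
    "123-456",
    "123450000",
])

# Original: {2} repeats a capturing group, so group 1 keeps
# only the last repetition.
original = re.compile(r"(\d{3}[-.]){2}(\d{4})")
# Fixed: (?: ) groups without capturing; the outer group captures
# the whole repeated run.
fixed = re.compile(r"((?:\d{3}[-.]){2})(\d{4})")

original_groups = [m.groups() for m in original.finditer(data)]
fixed_groups = [m.groups() for m in fixed.finditer(data)]

print(original_groups)
# [('179-', '9876'), ('456-', '9876'), ('456-', '9870')]
print(fixed_groups)
# [('123.179-', '9876'), ('123.456-', '9876'), ('123-456-', '9870')]
```

Only the three lines with two three-digit chunks and a four-digit tail match. Group 1 is the last chunk under the original pattern and the whole prefix under the fixed one.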

Hope this helps,

[fletcher]



--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/f4f6710d-bb51-4c21-9270-e43b6c9eb2c6%40googlegroups.com.

anotherhoward

Mar 1, 2020, 6:09:03 PM
to BBEdit Talk
fletcher,

Your change addressed my question. If you could explain what `?:` does, I would much appreciate it.

Howard

Tom Robinson

Mar 1, 2020, 9:12:27 PM
to BBEdit Talk
It lets you use parentheses without creating a capture group.

If you’re looking for ‘def’ in this line:

abc def

Then you could use:

(abc) (def)

But your ‘def’ would end up in capture group 2.

If you instead use:

(?:abc) (def)

Then ‘def’ will be in capture group 1.

(Capture groups are what you refer to in the replacement string with \1, \2, etc.)
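Here is a minimal sketch of the same example in Python's re module. It assumes that re handles (?: ) and \1-style references in replacement strings the way BBEdit's grep does, at least for this case:

```python
import re

line = "abc def"

# Both groups capture, so 'def' ends up in group 2.
both = re.search(r"(abc) (def)", line)
print(both.groups())  # ('abc', 'def')

# (?:abc) still groups 'abc' but does not capture it, so 'def' is group 1.
one = re.search(r"(?:abc) (def)", line)
print(one.groups())  # ('def',)

# In the replacement string, \1 now refers to 'def'.
replaced = re.sub(r"(?:abc) (def)", r"[\1]", line)
print(replaced)  # [def]
```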

Cheers

Howard

Mar 6, 2020, 10:33:58 PM
to BBEdit Talk
When using the Pattern Playground, in the search pattern's capture group #1 (see below), why is `847-` appearing rather than `717-`?

Search pattern: (\d{3}[.-]?){2}

Source text: 717-847-8015

Capture Groups:
#0: 717-847-
#1: 847-

Fletcher Sandbeck

Mar 6, 2020, 11:58:25 PM
to bbe...@googlegroups.com
There's a good discussion at the following URL. It happens because the regular expression is evaluated as a state machine: here the capturing group itself is repeated by the {2}, so the first match is discarded when the group matches a second time, and that second match is what you see in the results.

https://www.regular-expressions.info/captureall.html
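The behavior is easy to reproduce in Python's re module. For this example it keeps only the last iteration of a repeated capturing group, which is an assumption that it matches BBEdit here. The findall call at the end shows one way to recover every iteration: apply the inner pattern on its own to the overall match.

```python
import re

text = "717-847-8015"

m = re.search(r"(\d{3}[.-]?){2}", text)
whole = m.group(0)   # everything the {2} consumed
group1 = m.group(1)  # only the last repetition survives
print(whole, group1)  # 717-847- 847-

# Re-running the inner pattern on the overall match lists every repetition.
every = re.findall(r"\d{3}[.-]?", whole)
print(every)  # ['717-', '847-']
```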

[fletcher]



Roland Küffner

Mar 7, 2020, 5:12:30 AM
to bbe...@googlegroups.com
Fletcher, thanks for the very helpful link. To summarize that discussion:
the {2} tells the capture group to look twice for its pattern, but the capture group only saves the last instance it found. Putting the whole search term into another capture group should give you the desired result (as the #0 suggests).
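Here is Roland's suggestion sketched in Python's re, assuming it matches BBEdit's behavior for this pattern. Wrapping the repeated group in an outer group captures the full run. Making the inner group non-capturing, as in Fletcher's earlier fix, leaves only the outer capture.

```python
import re

text = "717-847-8015"

# Outer group captures the whole run; inner group keeps only the last repetition.
nested = re.search(r"((\d{3}[.-]?){2})", text).groups()
print(nested)  # ('717-847-', '847-')

# With a non-capturing inner group, only the outer capture remains.
outer_only = re.search(r"((?:\d{3}[.-]?){2})", text).groups()
print(outer_only)  # ('717-847-',)
```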

It is a little confusing, or unintuitive(*), that the pattern FINDS both instances but only CAPTURES the last one. But once you understand the mechanism, it is a lot easier to construct working patterns.

Regards, Roland

(*) Hm, on second thought, I cannot come up with a place where the abstract beauty of regular expressions is blurred by stains of things like intuition :-)


Neil Faiman

Mar 7, 2020, 9:01:50 AM
to BBEdit Talk Mailing List
On Mar 7, 2020, at 5:12 AM, Roland Küffner <medien...@gmail.com> wrote:

It is a little confusing or unintuitive(*), that the pattern FINDS both instances but only CAPTURES the last one. But once you understand the mechanism it is a lot easier to construct working patterns.

I think it may make more sense if you think of it this way: you get one capture group result for each parenthesized subexpression in the regular expression. That is, the set of capture groups maps onto the text of the regular expression as it is written.

Because of repetition, a parenthesized expression may match multiple substrings when the pattern is actually applied, which leaves the question of which of those matches is actually captured. The choice is somewhat arbitrary, but from a purely practical point of view, if the first match were captured, there would, in general, be no way to get the last match; but if the last match is captured, you can always get the first match as well with a little recoding of the regular expression.
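The recoding Neil describes can be sketched in Python's re. To capture an earlier repetition, unroll the {2} so that the wanted repetition gets its own group and the others are non-capturing. capture_nth is a hypothetical helper written only for this illustration, and the unrolled form assumes the repetition count is fixed.

```python
import re

text = "717-847-8015"
part = r"\d{3}[.-]?"  # one repetition from Howard's pattern

# Repeating the capturing group keeps only the LAST iteration.
last = re.search(f"({part}){{2}}", text).group(1)


def capture_nth(part, n, total, text):
    """Hypothetical helper: capture the n-th (1-based) of `total` fixed
    repetitions of `part` by unrolling the repetition around one group."""
    before = f"(?:{part}){{{n - 1}}}"    # n-1 uncaptured copies first
    after = f"(?:{part}){{{total - n}}}"  # the remaining uncaptured copies
    m = re.search(f"{before}({part}){after}", text)
    return m.group(1) if m else None


first = capture_nth(part, 1, 2, text)
print(first, last)  # 717- 847-
```

Capturing the last repetition needs no helper, because the repeated group already does that. This is the asymmetry Neil describes.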

Regards,

Neil Faiman