regex_(search)|(match) with repetition operators

Ralf Goertz

Apr 17, 2018, 5:02:01 AM

Hi,

is it possible to catch multiple matches with the repetition operators
"*", "+" and "{,}"?

#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main() {
    string s("foobar");
    regex r("([fb][ao][or]){2}");
    smatch sm;
    if (regex_search(s, sm, r)) {
        for (auto i : sm) cout << i << endl;
    }
}

That program only gives

foobar
bar

I would like to also catch the "foo" alone. Of course I could rewrite
the regex but in my real world problem I don't know how many iterations
there will be and I want to catch them all. Is there a way to do that?

Ben Bacarisse

Apr 17, 2018, 6:43:47 AM

Ralf Goertz <m...@myprovider.invalid> writes:

> is it possible to catch multiple matches with the repetition operators
> "*", "+" and "{,}"?

No. These operators describe a single pattern to be searched for.

When they apply to a pattern containing ()s, the std::smatch just tells
you something about how the pattern was matched -- what was last matched
by that sub-expression.

There is no provision to store the arbitrary number of matches that
might result from one single sub-expression.

> #include <iostream>
> #include <regex>
> #include <string>
>
> using namespace std;
>
> int main() {
>     string s("foobar");
>     regex r("([fb][ao][or]){2}");
>     smatch sm;
>     if (regex_search(s, sm, r)) {
>         for (auto i : sm) cout << i << endl;
>     }
> }
>
> That program only gives
>
> foobar
> bar
>
> I would like to also catch the "foo" alone. Of course I could rewrite
> the regex but in my real world problem I don't know how many iterations
> there will be and I want to catch them all. Is there a way to do that?

You will need to write a regexp that matches the smallest part you want
and match it repeatedly. Even then you might not get exactly what you want
because that is never entirely clear from a single example.
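
For the "foobar" example, matching the smallest part repeatedly could be
sketched with std::sregex_iterator as below. The helper name all_matches
is made up for illustration and is not part of <regex>.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every non-overlapping match of `unit`
// in `text`, scanning left to right.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::regex& unit) {
    std::vector<std::string> found;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), unit), end;
         it != end; ++it) {
        found.push_back(it->str());  // text of the whole match
    }
    return found;
}
```

Called with "foobar" and the regex "[fb][ao][or]" this should yield "foo"
and then "bar". Note that, unlike the {2} form, repeated searching does
not require the pieces to be adjacent: text between matches is skipped.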

--
Ben.

Ralf Goertz

Apr 17, 2018, 8:41:22 AM

Am Tue, 17 Apr 2018 11:43:36 +0100
schrieb Ben Bacarisse <ben.u...@bsb.me.uk>:

> Ralf Goertz <m...@myprovider.invalid> writes:
>
> > is it possible to catch multiple matches with the repetition
> > operators "*", "+" and "{,}"?
>
> No. These operators describe a single pattern to be searched for.
>
> When they apply to a pattern containing ()s, the std::smatch just tells
> you something about how the pattern was matched -- what was last
> matched by that sub-expression.
>
> There is no provision to store the arbitrary number of matches that
> might result from one single sub-expression.

Well, that's a pity.

> > Of course I could rewrite the regex but in my real world problem I
> > don't know how many iterations there will be and I want to catch
> > them all. Is there a way to do that?
>
> You will need to write a regexp that matches the smallest part you want
> and match it repeatedly. Even then you might not get exactly what
> you want because that is never entirely clear from a single example.

One of my real-world examples (there are many with differing complexity)
is the following:

some text 4.7 ( 2.3 ) 5.8 (6.2) 4.3 23.4 (2.9)

I need the numbers after "some text". There can be any number of numbers
but they come in pairs with the second parenthesized. However, that second
number is optional. So my regex looks something like

(([0-9.]+) +(\( *([0-9.]+) *\))?)+$

(of course there is potential for improvement since I don't want the
number to start or end with a "." and there should only be one "." in
each number)

Of course here I could use a regex without the trailing "+" and match
repeatedly. And in all my other use cases I could probably use other
tricks. But having many different scenarios where it would be beneficial
to be able to match repetitive patterns I wonder why it isn't possible.
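
A minimal sketch of the "regex without the trailing +, matched
repeatedly" variant follows. It makes a few assumptions that are not in
the post: the names NumberPair and extract_pairs are invented, the
parentheses around the second number are written as \( and \), the space
after the first number is made optional (" *") so that a number at the
end of the line still matches, and "some text" is assumed to contain no
digits or dots.

```cpp
#include <regex>
#include <string>
#include <vector>

// One number plus its optional parenthesized companion, kept as text.
struct NumberPair {
    std::string first;
    std::string second;  // empty when the parenthesized number is absent
};

// Hypothetical helper: search for one pair at a time instead of using
// an outer "(...)+" repetition.
std::vector<NumberPair> extract_pairs(const std::string& line) {
    static const std::regex unit(
        R"re(([0-9.]+) *(?:\( *([0-9.]+) *\))?)re");
    std::vector<NumberPair> pairs;
    for (std::sregex_iterator it(line.begin(), line.end(), unit), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // m[2] is unmatched when there is no "( ... )" part.
        pairs.push_back({m[1].str(),
                         m[2].matched ? m[2].str() : std::string()});
    }
    return pairs;
}
```

For the sample line this should produce four pairs, the third of which
("4.3") has an empty second member.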

Alf P. Steinbach

Apr 17, 2018, 10:06:12 AM

On 17.04.2018 14:41, Ralf Goertz wrote:
>
> [snip]

> One of my real-world examples (there are many with differing complexity)
> is the following:
>
> some text 4.7 ( 2.3 ) 5.8 (6.2) 4.3 23.4 (2.9)
>
>
> I need the numbers after "some text". There can be any number of numbers
> but they come in pairs with the second parenthesized. However, that second
> number is optional. So my regex looks something like
>
> (([0-9.]+) +(\( *([0-9.]+) *\))?)+$
>
> (of course there is potential for improvement since I don't want the
> number to start or end with a "." and there should only be one "." in
> each number)
>
> Of course here I could use a regex without the trailing "+" and match
> repeatedly. And in all my other use cases I could probably use other
> tricks. But having many different scenarios where it would be beneficial
> to be able to match repetitive patterns I wonder why it isn't possible.

Maybe invent a higher level pattern matching language?

Cheers!,

- Alf

Ben Bacarisse

Apr 17, 2018, 10:55:29 AM

Ralf Goertz <m...@myprovider.invalid> writes:

> Am Tue, 17 Apr 2018 11:43:36 +0100
> schrieb Ben Bacarisse <ben.u...@bsb.me.uk>:
>
>> Ralf Goertz <m...@myprovider.invalid> writes:
>>
>> > is it possible to catch multiple matches with the repetition
>> > operators "*", "+" and "{,}"?
>>
>> No. These operators describe a single pattern to be searched for.
>>
>> When they apply to a pattern containing ()s, the std::smatch just tells
>> you something about how the pattern was matched -- what was last
>> matched by that sub-expression.
>>
>> There is no provision to store the arbitrary number of matches that
>> might result from one single sub-expression.
>
> Well, that's a pity.

Yes, it can be useful but it's not widely supported. I imagine that's
because it is often clearer to do repeated matching and it would
complicate getting the results from otherwise simple patterns (though I
suppose it could be an option).

You can do it in Python with named captures using the extended
(3rd-party) regex module.

<snip>

> One of my real-world examples (there are many with differing complexity)
> is the following:
>
> some text 4.7 ( 2.3 ) 5.8 (6.2) 4.3 23.4 (2.9)
>
>
> I need the numbers after "some text". There can be any number of numbers
> but they come in pairs with the second parenthesized. However, that second
> number is optional. So my regex looks something like
>
> (([0-9.]+) +(\( *([0-9.]+) *\))?)+$
>

> Of course here I could use a regex without the trailing "+" and match
> repeatedly.

Yes, that's probably what you'll have to do, though in this case you
could just use sscanf or >> (again, in a loop).
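
The >> variant might look roughly like this. It is only a sketch: the
names Reading and read_numbers are invented, and it assumes the "some
text" prefix has already been removed from the string.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Reading {
    double value;
    bool has_paren;      // true if a "( x )" followed the value
    double paren_value;  // meaningful only when has_paren is true
};

// Hypothetical helper: read "x" or "x ( y )" items until input runs out.
std::vector<Reading> read_numbers(const std::string& numbers) {
    std::istringstream in(numbers);
    std::vector<Reading> out;
    double v;
    while (in >> v) {
        Reading r{v, false, 0.0};
        // Skip blanks, then look at the next character without consuming it.
        if ((in >> std::ws) && in.peek() == '(') {
            in.get();  // consume '('
            char close = 0;
            if (in >> r.paren_value >> close && close == ')') {
                r.has_paren = true;
            } else {
                break;  // malformed parenthesized part: stop parsing
            }
        }
        out.push_back(r);
    }
    return out;
}
```

There is no regex involved, so the input is validated less strictly than
with the pattern in the post; anything >> accepts as a double is taken.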

> And in all my other use cases I could probably use other
> tricks.

You might be able to generalise the loop into a function so that all
the cases are done in essentially the same way but that's impossible to
tell from here.
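
One possible shape for such a function, purely as an illustration (the
name for_each_match and its signature are assumptions, not something
from the standard library), is a template that runs the loop and hands
each match to a callback:

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: call on_match(const std::smatch&) for every
// non-overlapping match of `unit` in `text`; return the match count.
// `text` must stay alive during the call, since each smatch refers to it.
template <typename Callback>
std::size_t for_each_match(const std::string& text, const std::regex& unit,
                           Callback&& on_match) {
    std::size_t count = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), unit), end;
         it != end; ++it, ++count) {
        on_match(*it);  // the callback can read any capture group it needs
    }
    return count;
}
```

Each use case then supplies only its own regex for one unit and a lambda
that decides what to do with the capture groups of that unit.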

--
Ben.

Jorgen Grahn

Apr 24, 2018, 9:26:05 AM

I'd try this approach:
- Split away "some text"
- Split into not-quite-pairs with a regex.
- Parse each pair with a regex, or with std::strtod() and manual
  parsing (the latter is simplified by you knowing the text matches a
  certain regex).

Doing too much in one regex is dangerous in any language.
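
The three steps above could be sketched as follows. Everything named
here (Measurement, parse_line, chunk_re) is invented for illustration,
and the way "some text" is split away (everything before the first
digit) is an assumption about the input, not something stated in the
thread.

```cpp
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

struct Measurement {
    double value;
    bool has_extra;  // true if a parenthesized number followed
    double extra;    // meaningful only when has_extra is true
};

std::vector<Measurement> parse_line(const std::string& line) {
    std::vector<Measurement> result;

    // Step 1: split away the leading text (assumed to contain no digits).
    const std::string::size_type start = line.find_first_of("0123456789");
    if (start == std::string::npos) return result;
    const std::string numbers = line.substr(start);

    // Step 2: cut into chunks of the form "x" or "x ( y )".
    static const std::regex chunk_re(R"re([0-9.]+(?: *\([^)]*\))?)re");
    for (std::sregex_iterator it(numbers.begin(), numbers.end(), chunk_re),
         end; it != end; ++it) {
        const std::string chunk = it->str();

        // Step 3: the chunk is known to match chunk_re, so manual
        // parsing with std::strtod stays short.
        char* rest = nullptr;
        Measurement m{std::strtod(chunk.c_str(), &rest), false, 0.0};
        while (*rest == ' ') ++rest;
        if (*rest == '(') {
            char* after = nullptr;
            m.extra = std::strtod(rest + 1, &after);
            m.has_extra = (after != rest + 1);  // false if nothing parsed
        }
        result.push_back(m);
    }
    return result;
}
```

std::strtod honours the current C locale, so this relies on "." being the
decimal separator, which holds for the default "C" locale.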

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .