
find which subgroups don't match in regex


Shoryuken

Jul 17, 2008, 3:39:58 PM
Hello gents, here's a thing that's been confusing me for a while:

$regex="(\w+)\s([0-9]+)";

$a="Tom 1990"; # it's a match
$b="Jack xyz"; # not a match, because $2 doesn't match

But here's my question: exactly how do I inform the user about this
unmatched subgroup (i.e. $2 is the problem, $1 is fine, etc.)?

When a regex fails to match, is there a way to find out which subgroups
didn't match?

Thanks in advance.

Ben Morrow

Jul 17, 2008, 4:02:25 PM

Quoth Shoryuken <sakradev...@gmail.com>:

You can use /gc and \G to match one piece at a time, without losing your
place; something like

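use strict;
use warnings;

my @matches = qw/ \w+ \s [0-9]+ /;   # the pattern, split into pieces
my $string  = 'Jack xyz';

for my $match (@matches) {
    # \G anchors at the current pos; /c keeps pos when a match fails
    next if $string =~ /\G$match/gc;
    print "$match failed at position ", pos($string) // 0, "\n";
    last;    # later pieces cannot be placed once one has failed
}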
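
The loop above could be wrapped into a reusable routine along these lines. This is only a sketch: the sub name first_failing_piece and the piece labels are made up for illustration, and it assumes the pattern can be split into consecutive pieces that never need to backtrack into one another.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a list of [label, pattern] pairs over $string
# and report the first piece that cannot match at the current position.
sub first_failing_piece {
    my ($string, @pieces) = @_;
    pos($string) = 0;                       # start at the beginning
    for my $piece (@pieces) {
        my ($label, $re) = @$piece;
        next if $string =~ /\G$re/gc;       # /c keeps pos() on failure
        return { label => $label, pos => pos($string) };
    }
    return;                                 # every piece matched
}

my @pieces = (
    [ name  => qr/\w+/    ],
    [ space => qr/\s/     ],
    [ year  => qr/[0-9]+/ ],
);

for my $input ('Tom 1990', 'Jack xyz') {
    if (my $fail = first_failing_piece($input, @pieces)) {
        print "'$input': $fail->{label} failed at position $fail->{pos}\n";
    }
    else {
        print "'$input': all pieces matched\n";
    }
}
```

For 'Jack xyz' this reports that the year piece failed at position 5, which is the kind of message the original question asked for.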

Ben

--
Outside of a dog, a book is a man's best friend.
Inside of a dog, it's too dark to read.
b...@morrow.me.uk Groucho Marx

Leon Timmermans

Jul 17, 2008, 4:07:47 PM
On Thu, 17 Jul 2008 12:39:58 -0700, Shoryuken wrote:

> Hello gents, here's the thing been confusing me for a while:
>
> $regex="(\w+)\s([0-9]+)";
>

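Regular expressions aren't strings in Perl, so please don't store them in
strings. In a double-quoted string the backslashes in \w and \s are
dropped, so the pattern above is really "(w+)s([0-9]+)"; use qr// instead.
Also, [0-9] can be written more compactly as \d (though \d also matches
non-ASCII Unicode digits), and you could consider anchoring the regexp to
the beginning and the end of the string.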
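
A small sketch of both points, assuming the input is meant to be exactly one word, one whitespace character and one number. The variable names are illustrative only.

```perl
use strict;
use warnings;

# In a double-quoted string Perl drops the backslash from the unknown
# escapes \w and \s, so this is really the pattern "(w+)s([0-9]+)".
my $as_string = do { no warnings; "(\w+)\s([0-9]+)" };

# qr// compiles a real regex object; \A and \z anchor it to the whole string.
my $regex = qr/\A(\w+)\s(\d+)\z/;

for my $input ('Tom 1990', 'Jack xyz', 'Tom 1990 extra') {
    if (my ($name, $year) = $input =~ $regex) {
        print "'$input': name=$name year=$year\n";
    }
    else {
        print "'$input': no match\n";
    }
}
print "string version is: $as_string\n";
```

With the anchors in place, 'Tom 1990 extra' is rejected as well, which the unanchored pattern would have accepted.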

> $a="Tom 1990"; # it's a match
> $b="Jack xyz"; # not a match, because of $2 doesn't match ... but here's
> my question, exactly how to inform the users of this unmatched subgroup?
> (i.e. $2 is the problem, $1 is fine, etc.)
>

In this case, you could match against /\w+\s/. If that is present, then
the absence of a number is the problem.
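
That idea can be sketched as a chain of progressively shorter prefix checks. The sub name diagnose and its messages are hypothetical, and the anchored full pattern is an assumption about what the input should look like.

```perl
use strict;
use warnings;

# Hypothetical helper: explain why $input fails /\A\w+\s\d+\z/ by
# testing progressively shorter prefixes of the pattern.
sub diagnose {
    my ($input) = @_;
    return 'ok'                        if $input =~ /\A\w+\s\d+\z/;
    return 'number missing or invalid' if $input =~ /\A\w+\s/;
    return 'space missing after name'  if $input =~ /\A\w+/;
    return 'name missing';
}

print "$_ => ", diagnose($_), "\n" for 'Tom 1990', 'Jack xyz', 'Jack', '!!';
```

This only works because the pieces here are simple and strictly sequential; it does not generalise to arbitrary patterns.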

> For a regex matching, is there a way to find which subgroups don't
> match?
>

In the general case, no. The regex engine backtracks: a subgroup can fail
at many positions and with many lengths before the overall match either
succeeds or finally gives up, so there is no definitive moment of failure.

Leon Timmermans

xho...@gmail.com

Jul 17, 2008, 4:15:51 PM

There isn't a built-in way. You'd have to build it yourself, and that
will probably be non-trivial, as it would pretty much have to be an expert
system in your exact context, not just some standard Perl feature.

For example, whose "fault" is it that this doesn't match:

"1990 Tom" =~ /(\w+)\s(\d+)/;

Both subgroups will match individually, just not when put together.
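
A quick way to see this for yourself (a sketch only; the variable names are illustrative):

```perl
use strict;
use warnings;

my $input = '1990 Tom';

# Each subgroup finds something to match when tried alone...
my $word_alone  = $input =~ /(\w+)/ ? $1 : undef;   # "1990"
my $digit_alone = $input =~ /(\d+)/ ? $1 : undef;   # "1990"

# ...but the combined pattern needs a word, a space, then digits, in order.
my $combined = $input =~ /(\w+)\s(\d+)/ ? 1 : 0;

print "\\w+ alone matched '$word_alone'\n";
print "\\d+ alone matched '$digit_alone'\n";
print "combined pattern ", ($combined ? "matched" : "did not match"), "\n";
```

Neither subgroup is "wrong" on its own, so any error message has to come from knowledge of what the input was supposed to look like.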

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.

Shoryuken

Jul 18, 2008, 1:53:04 PM
On Jul 17, 5:02 pm, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth Shoryuken <sakradevanamin...@gmail.com>:

This is a great idea, thanks!

And thanks to the others for the good input, too!
