
Idea for new regex warning.


Benjamin Goldberg

Dec 18, 2002, 1:32:33 AM
to p5p
I've seen on comp.lang.perl.misc, a number of times that clooless
programmers do stuff like:

$string =~ /something/;
print $1;

Where a regex is performed, and no check is made to whether it succeeds
or not, and then a dollar-digit variable is used.
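A minimal sketch of the failure mode, with made-up input strings: when the second match fails, $1 still holds what the earlier successful match captured, so the unchecked code quietly prints stale data. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only for illustration.
my $first  = "name=alice";
my $second = "no equals sign here";

$first =~ /name=(\w+)/;          # succeeds, sets $1
my $got_first = $1;              # "alice"

$second =~ /name=(\w+)/;         # fails, but the result is never checked
my $got_second = $1;             # still "alice", left over from the first match

# Checking the result of the match avoids the stale value.
my $checked = $second =~ /name=(\w+)/ ? $1 : undef;

print "first: $got_first, second: $got_second\n";
print "checked: ", (defined $checked ? $checked : "(no match)"), "\n";
```

No warning is emitted for the unchecked version, since $1 is defined in both cases.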

Does it seem like a reasonable idea to make regex matching in void
context produce a warning?

Possibly, it might be limited to only doing so if the match fails, since
the user might have knowingly written a regex which "always" matches,
and thus he "knows" that the dollar-digit variables are valid.

Right now, I'm just interested in opinions on whether this sounds like a
good idea.

Also, I'd like to know how I could search through all of perl's
*.{pm,pl,t} files for regex matching being done in void context, so that
I can find out how many places might provoke this warning. I'm guessing
one of the B:: modules would do it, but I'm not sure how.

--
$..='(?:(?{local$^C=$^C|'.(1<<$_).'})|)'for+a..4;
$..='(?{print+substr"\n !,$^C,1 if $^C<26})(?!)';
$.=~s'!'haktrsreltanPJ,r coeueh"';BEGIN{${"\cH"}
|=(1<<21)}""=~$.;qw(Just another Perl hacker,\n);

Brent Dax

Dec 18, 2002, 2:19:02 AM
to Benjamin Goldberg, p5p
Benjamin Goldberg:
# Does it seem like a reasonable idea to make regex matching in
# void context produce a warning?
#
# Possibly, it might be limited to only doing so if the match
# fails, since the user might have knowingly written a regex
# which "always" matches, and thus he "knows" that the
# dollar-digit variables are valid.

This could cause heisenbugs:

sub foo { /something/ }

foo(); #warns
$x=foo(); #ok
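To illustrate why the context is only known at run time, here is a small sketch that uses wantarray in place of the match. The helper name report is hypothetical; the point is that the last statement of foo runs in whatever context foo itself was called in.

```perl
use strict;
use warnings;

my @log;

# Records the context this sub was called in.
sub report {
    push @log, !defined(wantarray) ? "void" : wantarray ? "list" : "scalar";
    return 1;
}

# The last statement of foo inherits the context of foo's caller,
# just as /something/ would in the example above.
sub foo { report() }

foo();             # void context
my $x = foo();     # scalar context
my @y = foo();     # list context

print join(", ", @log), "\n";
```

So the same sub body would warn or not depending on each call site, which is what makes such a warning hard to pin down.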

For the specific case you're thinking about, perhaps you could warn if
you use $NUMBER when the previous match wasn't successful.

--Brent Dax <bren...@cpan.org>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

"If you want to propagate an outrageously evil idea, your conclusion
must be brazenly clear, but your proof unintelligible."
--Ayn Rand, explaining how today's philosophies came to be


Abigail

Dec 18, 2002, 5:19:47 AM
to Benjamin Goldberg, p5p
On Wed, Dec 18, 2002 at 01:32:33AM -0500, Benjamin Goldberg wrote:
> I've seen on comp.lang.perl.misc, a number of times that clooless
> programmers do stuff like:
>
> $string =~ /something/;
> print $1;
>
> Where a regex is performed, and no check is made to whether it succeeds
> or not, and then a dollar-digit variable is used.
>
> Does it seem like a reasonable idea to make regex matching in void
> context produce a warning?
>
> Possibly, it might be limited to only doing so if the match fails, since
> the user might have knowingly written a regex which "always" matches,
> and thus he "knows" that the dollar-digit variables are valid.

I certainly use that often. It also shouldn't warn if the dollar-digit
variables aren't used - a regex could be used for its other side-effects -
even failing ones (consider the (?{ }), (??{ }) and (?(?{ })) constructs).

> Right now, I'm just interested in opinions on whether this sounds like a
> good idea.

I'm very sceptical towards warnings catering to the 'clooless' programmer,
and warnings that require experienced programmers to code around
constructs, or to turn warnings off.

Abigail

Slaven Rezic

Dec 18, 2002, 4:45:11 AM
to Benjamin Goldberg, p5p
Benjamin Goldberg <gol...@earthlink.net> writes:

> I've seen on comp.lang.perl.misc, a number of times that clooless
> programmers do stuff like:
>
> $string =~ /something/;
> print $1;
>
> Where a regex is performed, and no check is made to whether it succeeds
> or not, and then a dollar-digit variable is used.
>
> Does it seem like a reasonable idea to make regex matching in void
> context produce a warning?

Maybe the user wants to use just $&, $` or $':

$string =~ /something/;
print $&;

>
> Possibly, it might be limited to only doing so if the match fails, since
> the user might have knowingly written a regex which "always" matches,
> and thus he "knows" that the dollar-digit variables are valid.
>
> Right now, I'm just interested in opinions on whether this sounds like a
> good idea.
>
> Also, I'd like to know how I could search through all of perl's
> *.{pm,pl,t} files for regex matching being done in void context, so that
> I can find out how many places might provoke this warning. I'm guessing
> one of the B:: modules would do it, but I'm not sure how.
>

--
Slaven Rezic - slaven...@berlin.de

tkrevdiff - graphical display of diffs between revisions (RCS or CVS)
http://ptktools.sourceforge.net/#tkrevdiff

H.Merijn Brand

Dec 18, 2002, 6:25:19 AM
to slaven...@berlin.de, Perl 5 Porters
On Wed 18 Dec 2002 10:45, Slaven Rezic <slaven...@berlin.de> wrote:
> Benjamin Goldberg <gol...@earthlink.net> writes:
>
> > I've seen on comp.lang.perl.misc, a number of times that clooless
> > programmers do stuff like:
> >
> > $string =~ /something/;
> > print $1;
> >
> > Where a regex is performed, and no check is made to whether it succeeds
> > or not, and then a dollar-digit variable is used.
> >
> > Does it seem like a reasonable idea to make regex matching in void
> > context produce a warning?
>
> Maybe the user wants to use just $&, $` or $':
>
> $string =~ /something/;
> print $&;

or reset the (empty) search without using reset

"" =~ /BaD-áàéè/;

--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.8.0 & 633 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
WinNT 4, Win2K pro & WinCE 2.11. Smoking perl CORE: smo...@perl.org
http://archives.develooper.com/daily...@perl.org/ per...@perl.org
send smoke reports to: smokers...@perl.org, QA: http://qa.perl.org


Nicholas Clark

Dec 18, 2002, 6:33:52 AM
to Abigail, Benjamin Goldberg, p5p

To make this warning useful, we'd be assuming that the clueless programmer has
warnings turned on. I see a flaw in this plan. :-(

Nicholas Clark

Rafael Garcia-Suarez

Dec 18, 2002, 7:19:05 AM
to Benjamin Goldberg, perl5-...@perl.org
Benjamin Goldberg <gol...@earthlink.net> wrote:
> I've seen on comp.lang.perl.misc, a number of times that clooless
> programmers do stuff like:
>
> $string =~ /something/;
> print $1;
>
> Where a regex is performed, and no check is made to whether it succeeds
> or not, and then a dollar-digit variable is used.
>
> Does it seem like a reasonable idea to make regex matching in void
> context produce a warning?
>
> Possibly, it might be limited to only doing so if the match fails, since
> the user might have knowingly written a regex which "always" matches,
> and thus he "knows" that the dollar-digit variables are valid.

If the match has failed, $1 might be undef, unless there was a successful
match earlier in an enclosing scope. If it's undef it will warn when used.
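A short sketch of the "enclosing scope" case, using invented strings: a failed match inside a block still sees the $1 set by a successful match outside it, and a successful match inside the block only overrides $1 until the block ends.

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration.
my $outer   = "outer=1";
my $inner   = "inner=2";
my $nomatch = "nothing to see";
my ($in_block_after_fail, $in_block_after_success, $after_block);

$outer =~ /outer=(\d)/;              # succeeds: $1 is "1"
{
    $nomatch =~ /inner=(\d)/;        # fails: $1 still comes from the outer match
    $in_block_after_fail = $1;

    $inner =~ /inner=(\d)/;          # succeeds: $1 is "2" inside this block
    $in_block_after_success = $1;
}
$after_block = $1;                   # the outer match's value is visible again

print "$in_block_after_fail $in_block_after_success $after_block\n";
```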

> Right now, I'm just interested in opinions on whether this sounds like a
> good idea.

I don't like it, there are just too many ways this warning can be annoying
or useless.

if ($x =~ /(foo)/ or $x =~ /bar/) { print $1 }

if ($x =~ /(foo)/ or $x =~ /bar/) { print defined $1 ? $1 : "bar" }

if ($x =~ /(foo)/ or $y =~ /(foo)(bar)/) { print $2; }
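For concreteness, a sketch of what the first and third cases leave in the capture variables, with input values picked (as an assumption) so that each condition is true but the capture that gets used was never set.

```perl
use strict;
use warnings;

# Case 1: the first alternative fails, /bar/ succeeds but has no groups.
my $x = "bar";
my $case1 = "not reached";
if ($x =~ /(foo)/ or $x =~ /bar/) { $case1 = $1 }

# Case 3: the first alternative succeeds, but it only has one group.
my $y = "foo";
my $case3 = "not reached";
if ($y =~ /(foo)/ or $y =~ /(foo)(bar)/) { $case3 = $2 }

print "case 1: ", (defined $case1 ? $case1 : "undef"), "\n";
print "case 3: ", (defined $case3 ? $case3 : "undef"), "\n";
```

In both cases the match is in scalar context and succeeds, yet the variable being read is undefined.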

Paul Johnson

Dec 18, 2002, 7:41:34 AM
to Nicholas Clark, Abigail, Benjamin Goldberg, p5p

And the solution is obvious; make it a mandatory warning.

*ducks and runs*

--
Paul Johnson - pa...@pjcj.net
http://www.pjcj.net

Simon Cozens

Dec 18, 2002, 6:50:27 AM
to perl5-...@perl.org
gol...@earthlink.net (Benjamin Goldberg) writes:
> Also, I'd like to know how I could search through all of perl's
> *.{pm,pl,t} files for regex matching being done in void context, so that
> I can find out how many places might provoke this warning. I'm guessing
> one of the B:: modules would do it, but I'm not sure how.

Look at uninit.pm for ideas. If you really want to turn this into a
custom warning, consider using optimize(r?).pm

--
dd.c: sbrk(64); /* For good measure */
- plan9 has a bad day

Benjamin Goldberg

Dec 18, 2002, 11:41:49 PM
to perl5-...@perl.org
Rafael Garcia-Suarez wrote:

> Benjamin Goldberg wrote:
> > I've seen on comp.lang.perl.misc, a number of times that clooless
> > programmers do stuff like:
> >
> > $string =~ /something/;
> > print $1;
> >
> > Where a regex is performed, and no check is made to whether it
> > succeeds or not, and then a dollar-digit variable is used.
> >
> > Does it seem like a reasonable idea to make regex matching in void
> > context produce a warning?
> >
> > Possibly, it might be limited to only doing so if the match fails,
> > since the user might have knowingly written a regex which "always"
> > matches, and thus he "knows" that the dollar-digit variables are
> > valid.
>
> If the match has failed, $1 might be undef, unless there was a
> successful match earlier in an enclosing scope.

Yes, and that's precisely the situation where problems arise!

The user doesn't know whether $1 is undef, or still is the prior value
of $1 -- he can't know, since he didn't check whether or not the m//
succeeded.

> If it's undef it will warn when used.

But if $1 has a valid string value due to some prior successful match,
then what?

> > Right now, I'm just interested in opinions on whether this sounds
> > like a good idea.
>
> I don't like it, there are just too many ways this warning can be
> annoying or useless.
>
> if ($x =~ /(foo)/ or $x =~ /bar/) { print $1 }
>
> if ($x =~ /(foo)/ or $x =~ /bar/) { print defined $1 ? $1 : "bar" }
>
> if ($x =~ /(foo)/ or $y =~ /(foo)(bar)/) { print $2; }

None of these would trigger the warning, since in all of these cases,
the regex match is not being done in void context. It's being done in
scalar context.

Benjamin Goldberg

Dec 19, 2002, 12:11:04 AM
to Abigail, p5p
Abigail wrote:
>
> On Wed, Dec 18, 2002 at 01:32:33AM -0500, Benjamin Goldberg wrote:
> > I've seen on comp.lang.perl.misc, a number of times that clooless
> > programmers do stuff like:
> >
> > $string =~ /something/;
> > print $1;
> >
> > Where a regex is performed, and no check is made to whether it
> > succeeds or not, and then a dollar-digit variable is used.
> >
> > Does it seem like a reasonable idea to make regex matching in void
> > context produce a warning?
> >
> > Possibly, it might be limited to only doing so if the match fails,
> > since the user might have knowingly written a regex which "always"
> > matches, and thus he "knows" that the dollar-digit variables are
> > valid.
>
> I certainly use that often.

Yes; if you "know" the match will succeed, you generally don't bother to
put code in to do that check.

> It also shouldn't warn if the dollar-digit variables aren't used -

I hadn't suggested that the non-use of dollar-digit variables might
suppress the warning -- indeed, I hadn't even considered associating the
warning with the dollar-digit variables; rather, with the m// operator
itself.

> a regex could be used for its other side-effects - even failing ones
> (consider the (?{ }), (??{ }) and (?(?{ })) constructs).

What about regexen which don't have side-effects? Warn if the match
fails, and it doesn't have any of (?{ }), (??{ }) or (?(?{ })), and it's
in void context?

> > Right now, I'm just interested in opinions on whether this sounds
> > like a good idea.
>
> I'm very sceptical towards warnings catering to the 'clooless'
> programmer,

But aren't *all* warnings aimed at 'clooless' programmers? I mean, you
or I would *never* accidentally leave a variable uninitialized, right?

> and warnings that require experienced programmers to code
> around constructs, or to turn warnings off.

Don't all warnings that one might provoke require experienced
programmers to code around the construct, or turn warnings off?

Changing
my $x;
Into
my $x = "";

To avoid an uninitialized warning is, in my book, coding around the
construct to avoid a warning. How horrible would it be to change

m/$pattern/;
Into:
m/$pattern/ + 0;

In those situations where it's alright for the match to fail, and you
don't want to otherwise check whether or not it succeeded.

Benjamin Goldberg

Dec 19, 2002, 12:22:01 AM
to H.Merijn Brand, slaven...@berlin.de, Perl 5 Porters
H.Merijn Brand wrote:
>
> On Wed 18 Dec 2002 10:45, Slaven Rezic <slaven...@berlin.de> wrote:
> > Benjamin Goldberg <gol...@earthlink.net> writes:
> >
> > > I've seen on comp.lang.perl.misc, a number of times that clooless
> > > programmers do stuff like:
> > >
> > > $string =~ /something/;
> > > print $1;
> > >
> > > Where a regex is performed, and no check is made to whether it
> > > succeeds or not, and then a dollar-digit variable is used.
> > >
> > > Does it seem like a reasonable idea to make regex matching in void
> > > context produce a warning?
> >
> > Maybe the user wants to use just $&, $` or $':
> >
> > $string =~ /something/;
> > print $&;

But if the match fails, then $& will not be properly set. Without the
user checking what =~// evaluates to (that is, the match operation was
performed in void context), he cannot discover that the match has
failed. If he doesn't know the match failed, he cannot know that $&
isn't properly set.

> or reset the (empty) search without using reset
>

> "" =~ /BaD-áàéè/;

According to perldoc perlop:

If the PATTERN evaluates to the empty string, the last successfully
matched regular expression is used instead. In this case, only the g
and c flags on the empty pattern is honoured - the other flags are
taken from the original pattern. If no match has previously
succeeded, this will (silently) act instead as a genuine empty
pattern (which will always match).

So your example, "" =~ /BaD-áàéè/, will not reset this, since it's not a
successful match. Thus, imho, this should produce a warning.
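The perlop behaviour quoted above can be checked with a small sketch. The strings and the /BaD/ pattern are stand-ins for illustration; only the case where an earlier match has succeeded is exercised.

```perl
use strict;
use warnings;

my $s = "abc";
$s =~ /b/ or die "setup match failed";   # the last successful pattern is now /b/

# The empty pattern reuses /b/, so it fails on a string without "b".
my $reuses_last = ("xyz" =~ //) ? 1 : 0;

# A failing match does not replace the remembered pattern...
my $failed = ("" =~ /BaD/) ? 1 : 0;

# ...so the empty pattern still behaves like /b/ afterwards.
my $still_last = ("b" =~ //) ? 1 : 0;

print "reuses_last=$reuses_last failed=$failed still_last=$still_last\n";
```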

Rafael Garcia-Suarez

Dec 19, 2002, 3:14:56 AM
to Benjamin Goldberg, perl5-...@perl.org
Benjamin Goldberg <gol...@earthlink.net> wrote:

> Rafael Garcia-Suarez wrote:
> >
> > I don't like it, there are just too many ways this warning can be
> > annoying or useless.
> >
> > if ($x =~ /(foo)/ or $x =~ /bar/) { print $1 }
> >
> > if ($x =~ /(foo)/ or $x =~ /bar/) { print defined $1 ? $1 : "bar" }
> >
> > if ($x =~ /(foo)/ or $y =~ /(foo)(bar)/) { print $2; }
>
> None of these would trigger the warning, since in all of these cases,
> the regex match is not being done in void context. It's being done in
> scalar context.

That's what I meant; a useful warning would catch cases 1 and 3 above.
More precisely, cases where a $DIGIT variable is used and where the
last pattern match hasn't set it (either the last match was unsuccessful,
or it was successful but only set $1..$n and not the ${n+1} used.)
And even with that definition, I'm not sure I want this warning.
What would the warning message be? Thinking about the warning message
in the first place usually helps to pin down its meaning.

Abigail

Dec 19, 2002, 4:26:27 AM
to Benjamin Goldberg, p5p
On Thu, Dec 19, 2002 at 12:11:04AM -0500, Benjamin Goldberg wrote:
> Abigail wrote:
> >
> > It also shouldn't warn if the dollar-digit variables aren't used -
>
> I hadn't suggested that the non-use of dollar-digit variables might
> suppress the warning -- indeed, I hadn't even considered associating the
> warning with the dollar-digit variables; rather, with the m// operator
> itself.
>
> > a regex could be used for its other side-effects - even failing ones
> > (consider the (?{ }), (??{ }) and (?(?{ })) constructs).
>
> What about regexen which don't have side-effects? Warn if the match
> fails, and it doesn't have any of (?{ }), (??{ }) or (?(?{ })), and it's
> in void context?

But all regexes have side effects. If only because of $`, $& and $'.
But also the digit variables - even a regex without parens, which will
undef the digit variables on success. And then we have 'reset', the
empty regex, '? ?', pos and \G, which can all be influenced by a regex
in void context.

Also, what to do with s/// in void context? That too can set $1 and
friends, but s/// in void context is even more common than a match in
void context.

>
> > > Right now, I'm just interested in opinions on whether this sounds
> > > like a good idea.
> >
> > I'm very sceptical towards warnings catering to the 'clooless'
> > programmer,
>
> But aren't *all* warnings aimed at 'clooless' programmers? I mean, you
> or I would *never* accidentally leave a variable uninitialized, right?
>
> > and warnings that require experienced programmers to code
> > around constructs, or to turn warnings off.
>
> Don't all warnings that one might provoke require experienced
> programmers to code around the construct, or turn warnings off?
>
> Changing
> my $x;
> Into
> my $x = "";
>
> To avoid an uninitialized warning is, in my book, coding around the
> construct to avoid a warning. How horrible would it be to change
>
> m/$pattern/;
> Into:
> m/$pattern/ + 0;
>
> In those situations where it's alright for the match to fail, and you
> don't want to otherwise check whether or not it succeeded.


Quite horribly actually. It's confusing, and makes people wonder why
on earth you're adding 0 to it. I wouldn't at all compare that to
initializing variables - that warning is only issued if you are actually
using an uninitialized variable in operations where that doesn't make
much sense. You won't get it on the 'my $x;' line.
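This is easy to confirm with a sketch that counts warnings through a $SIG{__WARN__} handler (the counting approach is just one convenient way to observe it):

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $x;                              # declaration alone: no warning
my $count_after_declaration = @warnings;

my $text = "value: $x";             # using the undefined value: one warning
my $count_after_use = @warnings;

print "after declaration: $count_after_declaration\n";
print "after use:         $count_after_use\n";
```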

Abigail

Slaven Rezic

Dec 19, 2002, 5:55:27 AM
to Benjamin Goldberg, H.Merijn Brand, Perl 5 Porters
Benjamin Goldberg <gol...@earthlink.net> writes:

> H.Merijn Brand wrote:
> >
> > On Wed 18 Dec 2002 10:45, Slaven Rezic <slaven...@berlin.de> wrote:
> > > Benjamin Goldberg <gol...@earthlink.net> writes:
> > >
> > > > I've seen on comp.lang.perl.misc, a number of times that clooless
> > > > programmers do stuff like:
> > > >
> > > > $string =~ /something/;
> > > > print $1;
> > > >
> > > > Where a regex is performed, and no check is made to whether it
> > > > succeeds or not, and then a dollar-digit variable is used.
> > > >
> > > > Does it seem like a reasonable idea to make regex matching in void
> > > > context produce a warning?
> > >
> > > Maybe the user wants to use just $&, $` or $':
> > >
> > > $string =~ /something/;
> > > print $&;
>
> But if the match fails, then $& will not be properly set. Without the
> user checking what =~// evaluates to (that is, the match operation was
> performed in void context), he cannot discover that the match has
> failed. If he doesn't know the match failed, he cannot know that $&
> isn't properly set.

Maybe he *knows* that the match always succeeds (e.g. because of
former matches) and wants only to extract something from the string.

Regards,
Slaven

h...@crypt.org

Dec 19, 2002, 7:36:30 AM
to Benjamin Goldberg, perl5-...@perl.org
Benjamin Goldberg <gol...@earthlink.net> wrote:
:How horrible would it be to change

:
: m/$pattern/;
:Into:
: m/$pattern/ + 0;
:
:In those situations where it's alright for the match to fail, and you
:don't want to otherwise check whether or not it succeeded.

I'd find it fairly horrible.

Hugo

Benjamin Goldberg

Dec 19, 2002, 6:57:43 PM
to p5p, Abigail
Abigail wrote:
> Benjamin Goldberg wrote:
> > Abigail wrote:
> > >
> > > It also shouldn't warn if the dollar-digit variables aren't used -
> >
> > I hadn't suggested that the non-use of dollar-digit variables might
> > suppress the warning -- indeed, I hadn't even considered associating
> > the warning with the dollar-digit variables; rather, with the m//
> > operator itself.
> >
> > > a regex could be used for its other side-effects - even failing
> > > ones (consider the (?{ }), (??{ }) and (?(?{ })) constructs).
> >
> > What about regexen which don't have side-effects? Warn if the match
> > fails, and it doesn't have any of (?{ }), (??{ }) or (?(?{ })), and
> > it's in void context?
>
> But all regexes have side effects. If only because of $`, $& and $'.
> But also the digit variables - even a regex without parens, which will
> undef the digit variables on success.

But when a regex fails, it has no side effects unless it has (?{}) or
(??{}) in it.

A failing regex will not change $`, $&, $', or the digit variables.

> And then we have 'reset', the empty regex, '? ?', pos and \G, which
> can all be influenced by a regex in void context.

The empty regex is not influenced by a failing regex in void context,
only by a successful one.

A regex with ?? delimiters is not influenced by a failing regex in void
context, only by a successful one.

pos and \G are influenced by a failing regex in void context,
if-and-only-if that regex was done with the /g flag and without the /c
flag. (Note that I think it would be rather peculiar to have an m//g
done in void context where it's ok to fail.)

The reset operator influences ??, but is not itself influenced by
anything (it doesn't return any value, afaik).
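A sketch of the pos and capture behaviour described above, on a throwaway string. The matches are assigned to a variable so that they run in scalar context; the variable names are illustrative.

```perl
use strict;
use warnings;

my $str = "aaa";

my $ok = $str =~ /(a)/g;            # succeeds, pos($str) becomes 1
my $pos_after_success = pos($str);

$ok = $str =~ /x/g;                 # fails without /c: pos is reset
my $pos_after_fail = pos($str);

$ok = $str =~ /(a)/g;               # succeeds again, pos($str) is 1
$ok = $str =~ /x/gc;                # fails with /c: pos is kept
my $pos_after_fail_c = pos($str);

my $capture_after_fail = $1;        # the failed matches left $1 alone

print "pos after success:   $pos_after_success\n";
print "pos after failed /g: ", (defined $pos_after_fail ? $pos_after_fail : "undef"), "\n";
print "pos after failed /gc: $pos_after_fail_c\n";
print "capture after failures: $capture_after_fail\n";
```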

> Also, what to do with s/// in void context? That too can set $1 and
> friends, but s/// in void context is even more common than a match in
> void context.

The regex part of s/// isn't itself in void context -- its success or
failure *is* checked, and influences whether or not a substitution takes
place.

> > > > Right now, I'm just interested in opinions on whether this
> > > > sounds like a good idea.
> > >
> > > I'm very sceptical towards warnings catering to the 'clooless'
> > > programmer,
> >
> > But aren't *all* warnings aimed at 'clooless' programmers? I mean,
> > you or I would *never* accidentally leave a variable uninitialized,
> > right?
> >
> > > and warnings that require experienced programmers to code
> > > around constructs, or to turn warnings off.
> >
> > Don't all warnings that one might provoke require experienced
> > programmers to code around the construct, or turn warnings off?
> >
> > Changing
> > my $x;
> > Into
> > my $x = "";
> >
> > To avoid an uninitialized warning is, in my book, coding around the
> > construct to avoid a warning. How horrible would it be to change
> >
> > m/$pattern/;
> > Into:
> > m/$pattern/ + 0;
> >
> > In those situations where it's alright for the match to fail, and
> > you don't want to otherwise check whether or not it succeeded.
>
> Quite horribly actually. It's confusing, and makes people wonder why
> on earth you're adding 0 to it.

Hmm, you're right. m/(?:$pattern)?/ would be better.
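One caveat worth checking before relying on this form: because the optional group can match the empty string at the very start of the target, the match may succeed there without ever finding $pattern later in the string. A small sketch, with a made-up pattern and text:

```perl
use strict;
use warnings;

# Hypothetical pattern and text for illustration.
my $pattern = qr/world/;
my $text    = "hello world";

# The plain match scans forward and finds "world".
my $plain = $text =~ /($pattern)/ ? $1 : undef;

# The optional form always succeeds, but here it succeeds at offset 0
# with an empty match, so the capture is the empty string.
my $optional_ok      = $text =~ /((?:$pattern)?)/ ? 1 : 0;
my $optional_capture = $1;

print "plain: $plain\n";
print "optional matched: $optional_ok, captured: '$optional_capture'\n";
```

So the rewritten match never fails, but it is not a drop-in replacement when the match variables are used afterwards.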

> I wouldn't at all compare that to initializing variables - that warning
> is only issued if you are actually using an uninitialized variable in
> operations where that doesn't make much sense. You won't get it on the
> 'my $x;' line.

Hmm.... You're right. Uninitialized variables aren't the right thing to
compare it to.

A better comparison might be:

use vars qw($foo);
use SomePackage qw($foo);

Where SomePackage doesn't have or inherit an import method. This
silently fails, giving the false impression that you have the
$SomePackage::foo variable imported into your namespace.
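A self-contained sketch of that situation. SomePackage is defined inline and marked as loaded through %INC purely so the example runs as a single file; that setup is an assumption of the sketch, not part of the scenario. The sketch only checks that nothing gets imported; it does not check whether a particular perl release prints a diagnostic for the call.

```perl
use strict;
use warnings;

# Stand-in for a real module file: a package with a variable but
# deliberately no import method.
BEGIN {
    package SomePackage;
    our $foo = "value inside SomePackage";
    $INC{'SomePackage.pm'} = __FILE__;   # pretend the file was already loaded
}

use vars qw($foo);
use SomePackage qw($foo);    # there is no import method, so nothing is imported

my $imported = defined $foo ? 1 : 0;

print "imported: $imported\n";
print "still reachable as \$SomePackage::foo: $SomePackage::foo\n";
```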

Similarly:
$something =~ /Some match that fails./;

Silently fails, giving the false impression that the dollar digit
variables, and the empty match, and the $`, $&, and $' variables have
all been set to something.
