Negated Perl Regexp


Ronny

30 May 2006, 03:51:36
If I want to express that a variable $v does NOT match some regular
expression RE, I usually write this as

$v !~ /RE/ and print "string does not contain pattern\n"

Is there an easy way to write this in a positive way, i.e. using
$v =~ /.../ ?

I thought about using one of the zero-width lookahead operators, such
as

$v =~ /(?!RE)/ # DOES NOT WORK

but this does not work of course, because in general, somewhere within
$v there *will* be a position where RE does not match, even if RE
matches at some other position.
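
To make the failure concrete, here is a small sketch (the sample string and variable names are invented for illustration). An unanchored negative lookahead succeeds as soon as it finds any position where the pattern does not start, so it reports a match even though the string contains the pattern:

```perl
use strict;
use warnings;

my $v = "xx foo yy";    # sample input that DOES contain "foo"

# Unanchored negative lookahead: succeeds at position 0, where "foo" does not start.
my $lookahead_matches = ($v =~ /(?!foo)/) ? 1 : 0;

# The test that is actually wanted: true only if "foo" occurs nowhere in $v.
my $really_absent = ($v !~ /foo/) ? 1 : 0;

print "lookahead matches: $lookahead_matches, foo absent: $really_absent\n";
# prints "lookahead matches: 1, foo absent: 0"
```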


Background of what this is needed for: I'm writing tiny utilities in
Perl which act as filters for input text. Typically, the core of the
"program" contains something like

/$PATTERN/ && print(transform($_))

i.e. read all lines from stdin, and if they match some pattern, print
out a transformed version of the line. The pattern is supplied via
ARGV. This works fine, but I would also like the user of this utility
to be able to *invert* the sense (i.e. read all lines from stdin, and
if they do NOT match the pattern, etc.), as you can with grep (where
the option -v inverts the test). The key point here is that in this
particular application, I would prefer NOT to introduce an option such
as grep's "-v" to my utility, but to encode the "negation of the
pattern" into the pattern itself.

Is this possible at all within the realm of Perl regular expressions,
or do I have to invent my own workaround (which of course would be
possible)?

Mirco Wahab

30 May 2006, 04:04:59
Thus spoke Ronny (on 2006-05-30 09:51):

> Typically, the core of the "program" contains something like
>
> /$PATTERN/ && print(transform($_))

> ...


> This works fine, but I also would like the user of this utility
> to be able to *revert* the sense (i.e. read all lines from stdin,
> and if they DO NOT match the pattern, etc.),

you mean

print(transform($_)) unless /$PATTERN/;

or something?

Regards

Mirco

Xicheng Jia

30 May 2006, 04:39:56
Ronny wrote:
> If I want to express that a variable $v does NOT match some regular
> expression RE,
> I usually write this as
>
> $v !~ /RE/ and print "string does not contain pattern\n"

you can use "or"

$v =~ /RE/ or print "string does not contain pattern\n";

For easier maintenance, it might be better to write it in the following
form:

if (not $v =~ /RE/) {
    print "string does not contain pattern\n";
}

Xicheng

Veli-Pekka Tätilä

30 May 2006, 06:49:03
Ronny wrote:
> $v !~ /RE/ and print "string does not contain pattern\n"
> Is there an easy way to write this in a positive way, i.e using $v =~
> /.../ ?
I have a related question here. One case which the solutions posted so
far don't address is the use of compiled regular expressions with the qr
operator. Many modules can take qr regular expressions for filtering or
homing in on some particular datum. However, in some cases I'd like to
use a negated test for matching. I'm not really willing to extend the
original module code if I can avoid it. So, can one easily negate a
qr-regexp when the module code supposedly uses =~ for testing?

PS: The module in this case is:
Win32::IE::Mechanize

--
With kind regards Veli-Pekka Tätilä (vta...@mail.student.oulu.fi)
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila/


Xicheng Jia

30 May 2006, 07:38:30
Veli-Pekka Tätilä wrote:
> Ronny wrote:
> > $v !~ /RE/ and print "string does not contain pattern\n"
> > Is there an easy way to write this in a positive way, i.e using $v =~
> > /.../ ?
> I have a related question here. One case which the so far posted solutions
> don't address is the use of compiled regular expressions with the qr
> operator. Many modules can take qr regular expressions for filtering or
> homing in some particular datum. However, in some cases I'd like to use a
> negated test for matching. I'm not really willing to extend the original
> module code if I can avoid it. So, can one easily negate a qr-regexp when
> the module code supposedly uses =~ for testing?

So you want to match anything except strings matching the qr//
expression? Then you might want to try the following:

my $RE = qr/something here/;

if ($v =~ /^(?:(?!$RE).)*$/) {
    # any string $v that does not match $RE
}

(untested)
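
A quick sketch to exercise the idea. The sample pattern and strings are invented, and the \A, \z and /s in the wrapper are additions (assumptions on my part) so that "." also crosses newlines, which the pattern above does not do:

```perl
use strict;
use warnings;

my $RE = qr/foo/;    # example pattern, purely illustrative

# At every position: assert that $RE does not start here, then consume one character.
my $NOT_RE = qr/\A(?:(?!$RE).)*\z/s;

my %result;
for my $v ("bar", "", "a foo b", "line1\nline2", "line1\nfoo") {
    $result{$v} = ($v =~ $NOT_RE) ? 1 : 0;
    (my $shown = $v) =~ s/\n/\\n/g;    # make newlines visible in the output
    print "'$shown' => $result{$v}\n";
}
```

One caveat, as far as I can tell: if $RE can match the empty string (say qr/x*/), this form still accepts the empty string, so it is not an exact negation for such patterns.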
Xicheng

Ted Zlatanov

30 May 2006, 12:27:10
On 30 May 2006, ro.nald...@gmail.com wrote:

> Background of what this is needed for: I'm writing tiny utilities in
> Perl, which act as a filter for input text. Typically, the core of
> the "program" contains something like
>
> /$PATTERN/ && print(transform($_))
>
> i.e. read all lines from stdin, and if they match some pattern,
> print out a transformed version of the line. The pattern is supplied via
> ARGV. This works fine, but I also would like the user of this
> utility to be able to *revert* the sense (i.e. read all lines from
> stdin, and if they DO NOT match the pattern, etc.), like you have
> with grep (where the option -v reverts the test).

> The keypoint here is that in this particular application, I would
> prefer NOT to introduce an option such as grep's "-v" to my utility,
> but encode the "negation of the pattern" into the pattern itself.

You either ask the user to rewrite $PATTERN, or you give a -v option.
I don't understand how you would know *when* to negate the pattern
without a -v option.

> Is this possible at all within the realm of Perl regular
> expressions, or do I have to invent my own workaround (which of
> course would be possible)?

Usually yes, although it may not work nicely if you have code embedded
inside the regex, and there are many cases that are possible but
computationally very expensive. In any case, it's much more complicated
to invert a regex than to invert the test for that regex.

I honestly don't see a reason why you shouldn't provide a -v option,
or some way for the user to say "invert this pattern", and then act
upon that to invert the test. Maybe you can explain...

Ted

ska

31 May 2006, 04:34:31
Ronny wrote:
> particular application,
> I would prefer NOT to introduce an option such as grep's "-v" to my
> utility, but encode
> the "negation of the pattern" into the pattern itself.

Hmm, never played with this stuff in that depth before, but how about
this:

$yes = 'RE';
$no = 'RR';

foreach $s (qw/RE RERE aRERE aaaaRERE REa REREaa/) {
    $s =~ /^(?!.*${yes})/ and print "$yes FAILs on $s\n";
    $s =~ /^(?!.*${no})/ and print "$no neg-matches $s\n"
        or print "$no FAILS on $s\n";
}

So: ^ -> matches any string (hopefully :-)
(?!.*<<pattern>>) --> the start of the string MUST NOT be followed by
your pattern at any distance from the start

Dunno about efficiency.
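
For what it's worth, the same idea can be wrapped in a helper and compared against the plain !~ test. This is only a sketch: negate_re is a made-up name, and the \A, the non-greedy .*? and the /s modifier are my additions so that embedded newlines are covered too:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a pattern P in a lookahead so that the result Q
# matches exactly when P matches nowhere in the string.
sub negate_re {
    my ($re) = @_;
    return qr/\A(?!.*?$re)/s;    # /s lets '.' cross newlines
}

my $p = qr/RE/;
my $q = negate_re($p);

my @mismatches;
for my $s ("RE", "aRERE", "RR", "", "R\nE", "x\nRE") {
    my $by_test    = ($s !~ $p) ? 1 : 0;    # inverted test
    my $by_pattern = ($s =~ $q) ? 1 : 0;    # inverted pattern
    push @mismatches, $s if $by_test != $by_pattern;
}
print @mismatches ? "disagreement on: @mismatches\n" : "both approaches agree\n";
```

Agreement on a handful of strings is of course no proof, and it says nothing about speed.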

BTW: If you want to have a human enter the pattern, I'd go with the
flag alternative, e.g. the first character of the pattern is + ->
positive, - -> negative match.

-- ska

Ronny

31 May 2006, 06:37:51

Ted Zlatanov wrote:

> On 30 May 2006, ro.nald...@gmail.com wrote:
>
> > Background of what this is needed for: I'm writing tiny utilities in
> > Perl, which act as a filter for input text. Typically, the core of
> > the "program" contains something like
> >
> > /$PATTERN/ && print(transform($_))
> >
> > i.e. read all lines from stdin, and if they match some pattern,
> > print out a transformed version of the line. The pattern is supplied via
> > ARGV. This works fine, but I also would like the user of this
> > utility to be able to *revert* the sense (i.e. read all lines from
> > stdin, and if they DO NOT match the pattern, etc.), like you have
> > with grep (where the option -v reverts the test).
>
> > The keypoint here is that in this particular application, I would
> > prefer NOT to introduce an option such as grep's "-v" to my utility,
> > but encode the "negation of the pattern" into the pattern itself.
>
> You either ask the user to rewrite $PATTERN, or you give a -v option.
> I don't understand how you would know *when* to negate the pattern
> without a -v option.

You got the point exactly: I want the user to rewrite the pattern. The
question is, how to write a *negated* pattern using Perl RE syntax?

To the outside world (i.e. to the user), the interface always says
something like "Supply a pattern and you get a list of lines matching
the pattern" (actually, the lines returned are transformed, but this is
not the point here). Given *this* user interface, is it possible for
the user to specify a pattern with negated meaning - for example,
return all lines which do NOT contain the string "foo"?

A variation of this question could be: Return all the lines which
contain the strings "foo" and "bar", but ONLY if they do not contain
"baz" somewhere between "foo" and "bar". I.e. the lines

...foo.......bar......baz... (OK, baz after bar)
...baz......foo......bar.... (OK, baz before bar)
...foo..................bar... (OK, no baz)

should match, but the lines

...foo........baz......bar... (baz between foo and bar)
...foo........................... (bar missing)
...bar........................... (foo missing)

should not match. Is it possible to express THIS using a Perl regexp,
or does this exceed the power of Perl regular expressions? If there
is a solution to this foo/bar/baz problem, then there is obviously
one for my original problem as well.
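
One way this particular case could be written, as a sketch only: match "foo", then a run of characters in which "baz" never starts, then "bar". The variable names are invented, and "between" is read here as "between some foo and a later bar", which is an assumption about the intended meaning:

```perl
use strict;
use warnings;

# "foo", then any run of characters in which "baz" never starts, then "bar".
my $FOO_BAR_NO_BAZ = qr/foo(?:(?!baz).)*bar/;

my @lines = (
    "...foo.......bar......baz...",         # baz after bar
    "...baz......foo......bar....",         # baz before foo
    "...foo..................bar...",       # no baz
    "...foo........baz......bar...",        # baz between foo and bar
    "...foo...........................",    # bar missing
    "...bar...........................",    # foo missing
);

my @verdict = map { /$FOO_BAR_NO_BAZ/ ? 1 : 0 } @lines;
print "$verdict[$_] $lines[$_]\n" for 0 .. $#lines;
```

On these six lines the first three are accepted and the last three rejected. A line like "foo baz foo bar" would be accepted, because the second foo reaches bar without crossing baz.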

> > Is this possible at all within the realm of Perl regular
> > expressions, or do I have to invent my own workaround (which of
> > course would be possible)?
>
> Yes usually (for example, it may not work nicely if you have code
> embedded inside the regex, and there are many cases that are possible
> but computationally very expensive), but it's much more complicated to
> invert a regex than to invert the test for that regex.

Of course, one hack for my original problem would be to "invent" a
special character (say, an exclamation mark) which is allowed at the
very start of the expression, and just has the meaning "pattern has
negated meaning". My Perl code would then be:

if ($pattern =~ /^!(.*)$/)
{
    # negated meaning
    $pattern = $1; # drop ! from pattern
    print transform($line) unless ($line =~ $pattern);
}
else
{
    print transform($line) if ($line =~ $pattern);
}

This would do the job (and the exclamation mark here is just a "-v"
switch in disguise), but I wondered whether the same effect could also
be achieved by just changing the pattern in a suitable way.
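
The same hack could also be packaged so that the prefix is parsed once and the pattern compiled once, rather than per line. A minimal sketch, where make_matcher is a hypothetical name and the leading "!" is the invented convention described above:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-supplied spec into a predicate.
# A leading "!" inverts the sense of the test.
sub make_matcher {
    my ($spec) = @_;
    my $negate = ($spec =~ s/^!//) ? 1 : 0;    # strip the marker and remember it
    my $re     = qr/$spec/;                    # compile once, not once per line
    return sub {
        my $hit = ($_[0] =~ $re) ? 1 : 0;
        return $negate ? !$hit : $hit;
    };
}

my $wanted = make_matcher('!foo');
print "kept: $_\n" for grep { $wanted->($_) } ("foo bar", "baz", "food", "quux");
# prints "kept: baz" and "kept: quux"
```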

> I honestly don't see a reason why you shouldn't provide a -v option,

The reason is that I simplified the problem a great deal to make it
easier to discuss here. The interesting point for me is now finding out
whether the negation effect can be done solely within the pattern, or
has to be "moved outside" to the distinction between =~ and !~, or to
an if/unless construct.

I have read the man pages about pattern "negation" (such as the
"negative lookahead" construct), but I could not see whether it could
be applied to my case.

Ronald

Ronny

31 May 2006, 06:49:51

Mirco Wahab wrote:

No, the corresponding code would always be as stated. I think I did not
explain my problem in a very understandable way. See my reply to Ted
for a more elaborate explanation.

Here is perhaps a more mathematical formulation of the problem:

Given an arbitrary Perl regexp P, is it possible to derive from it
another regexp Q, with the property that for every string S the
following equation holds:

(S =~ P) == (S !~ Q)

(S matches P if and only if S does not match Q).

I.e. is there a general mechanism within the Perl regexp realm which
allows me to find a negated pattern for a given pattern?

Of course this is easy for specific patterns. For example, assume that
P is the pattern

[abc]

which means "every line which contains at least one a, b or c
somewhere". The negated pattern Q, "every line which contains none of
a, b or c", is then

^[^abc]*$

In this example, I have kind of "handcrafted" the negated pattern after
having investigated the original pattern. For the [abc] case, it was
easy to find the negated pattern, but in general, this might be hard,
so I wondered whether Perl provided a specific construct which just
negates a pattern.
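
Hand-written negations are easy to get subtly wrong at the edges, so it seems worth comparing a candidate against the !~ test on a few strings, including the empty one. A sketch comparing a "+" and a "*" variant of the character-class negation (the variable names and sample strings are invented):

```perl
use strict;
use warnings;

my $P      = qr/[abc]/;        # original pattern
my $Q_plus = qr/^[^abc]+$/;    # hand-crafted negation requiring at least one character
my $Q_star = qr/^[^abc]*$/;    # same, but allowing zero characters

my %agree = (plus => 1, star => 1);
for my $s ("", "xyz", "xay", "c", "x\ny") {
    my $want = ($s !~ $P) ? 1 : 0;    # what a true negation must return
    $agree{plus} = 0 if (($s =~ $Q_plus) ? 1 : 0) != $want;
    $agree{star} = 0 if (($s =~ $Q_star) ? 1 : 0) != $want;
}
print "plus agrees: $agree{plus}, star agrees: $agree{star}\n";
# prints "plus agrees: 0, star agrees: 1"
```

The "+" variant disagrees only on the empty string, which contains none of a, b or c but cannot satisfy "one or more".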

Ronald

Ronny

31 May 2006, 08:25:42

Xicheng Jia wrote:

> my $RE = qr/something here/;
>
> if ($v =~ /^(?:(?!$RE).)*$/) {
> # any string $v that doesnot match $RE
> }

Great! I think this is something I could use for *my* original problem
too!

Thank you for pointing this out!

Ronny

Mumia W.

31 May 2006, 09:27:06
Ronny wrote:
> [...]

> Maybe here a more mathematical formulation of the problem:
>
> Given an arbitrary Perl regexp P, is it then possible to derive from it
> another
> regexp Q, with the property that for every string S the following
> equation holds:
>
> (S =~ P) == (S !~ Q)
>
> (S matches P if S does not match Q, and vice versa).
>
> I.e. is there a general mechanism within the Perl regexp realm which
> allows
> me to find a negated pattern for a given pattern?
>

I don't think so, and given the complexity of RE's, it's probably
impossible. But all is not lost.

You could do what (Debian) aptitude does: Let the user place a prefix
code in the RE that specifies inversion, e.g.:

aptitude search '~niso-8859!~nbase'

This searches for all Debian packages that have the string iso-8859 in
their names, but excludes any that have 'base' in their names.

~n introduces an RE to match package names.
!~n introduces an RE to *not* match package names.

> Of course this is easy for specific pattern. For example, assume that P
> is
> the pattern
>
> [abc]
>
> which means "every line which either contains at least one a, b or c
> somewhere".
> The negated pattern Q, "every line which contains neither a, b or c" is
> then
>
> ^[^abc]*$
>
> In this example, I have kind of "handcrafted" the negated pattern after
> having
> investigated the original pattern. For the [abc] case, it was easy to
> find the

> negated pattern, but in general, this might be hard, [...]

Depending on the pattern, it might be so hard that supercomputers would
take an eternity to do it.


Mumia W.

31 May 2006, 12:06:14
Xicheng Jia wrote:
> [...]

> you want to match anything except those matching the qr//
> expression???? so you might want to try the following:
>
> my $RE = qr/something here/;
>
> if ($v =~ /^(?:(?!$RE).)*$/) {
> # any string $v that doesnot match $RE
> }
>
> (untested)
> Xicheng
>

Well, I tested it, and it seems pretty darn good, and just like Ronny, I
might end up using this in my programs if I can figure out how it works.
Thanks Xicheng.

Ted Zlatanov

31 May 2006, 12:52:47
On 31 May 2006, ro.nald...@gmail.com wrote:

> You exactly got the point: I want the user to rewrite the
> Pattern. The question is, how to write a *negated* pattern using
> Perl RE Syntax?

You can do it for some cases, but because of limitations on memory and
CPU cycles, most complex regexes can't be inverted in a reasonable
amount of time. When there's code inside, it gets even worse.

Look at the book "Higher-Order Perl" by Mark-Jason Dominus. It has a
long section on finding all the strings that can match a given regular
expression; if you read it carefully you'll see why inverting a
regular expression is generally a hard problem, just like producing all
the strings that match it.

Note also that if security is a concern, giving users regexp access is
equivalent to letting them run any code due to the code escapes
possible in Perl's regex interpreter. It may be simpler to give the
users a limited language with a NOT operator. Parse::RecDescent has
some good examples of this kind of parser in the distribution. The
users may also prefer this to the raw power of regexps, and it's what
I would do for a production system.

> Of course, one hack for my original problem would be to "invent" a
> special character (say, exclamation mark) which is allowed to be at
> the very start of the expression, and just has the meaning "pattern
> has negated meaning".

Yes :) That would be easiest.

>> I honestly don't see a reason why you shouldn't provide a -v option,
>
> The reason is because I simplified the problem very much so to make
> it better feasible to discuss here. The interesting point for me is
> not finding out whether the negation effect can be done solely
> within the pattern, or has to be "moved outside" to the distinction
> between =~ and !~, or if/unless construct.

It should be moved outside, so you can go on to finish the project :)

Ted

Brian McCauley

31 May 2006, 13:54:46

Xicheng Jia wrote:

> my $RE = qr/something here/;
>
> if ($v =~ /^(?:(?!$RE).)*$/) {
> # any string $v that doesnot match $RE
> }
>

I've not benchmarked it but I'd suspect that's less efficient than the
usual answer[1] the OP would have found if he'd been bothered to type
"negate regex" into a Usenet search engine on this newsgroup.

[1] The one ska gave.

Xicheng Jia

31 May 2006, 16:39:12

Here is an old post from Tom Christiansen which might best address this
problem:

http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/cf66e7281514182f/7af7898218075b5b?q=negate+regex&rnum=3#7af7898218075b5b

The notion of using (?:(?!$RE).)* to match anything except $RE is (as
far as I know) from Jeffrey Friedl's book "Mastering Regular
Expressions".

HTH,
Xicheng

Ronny

1 June 2006, 03:49:34

> I've not benchmarked it but I'd suspect that's less efficient than the
> usual answer[1] the OP would have found if he'd been bothered to type
> "negate regex" into a Usenet search engine on this newsgroup.

Point taken!

Ronald

Ted Zlatanov

1 June 2006, 10:17:10

This post does not mention that negating some regexes is
computationally prohibitive, and code escapes are a problem. Also,
the "Higher-Order Perl" book I mentioned came out after that post
(1999), and has some very interesting information in the chapter on
generating all the possible strings a regex can match. There are also
security considerations when you allow a user to provide you with a
regex. None of those things is answered by a naive Usenet search.

Furthermore, the real question was "why doesn't the OP want a -v flag?
How can he simulate it instead?" and not "how to negate a regex."
Usually that's the case when people ask for negating a regex, btw.

Ted
