I would like to alter this to something like.
unix:.* (WARN|NOTICE) && ![^core]
I have been told that TCL cannot handle logical NOTs, but I thought I'd just
check with the experts in this group for confirmation.
I tested the second expression above using awk and of course it
worked.
I have also tried
unix:.* (WARN|NOTICE).*[^core]{0}
I'm not sure if I have the syntax correct above, but what I'm trying to say is:
look for unix:, then any characters followed by a space, then EITHER WARN or
NOTICE, then any characters, but only when there are zero occurrences of core.
Hmm, hope someone understands what I'm trying to do. ;-)
Cheers
Craig.
Logical not is not part of any standard regular expression notation.
The regular languages are closed under complementation, which one can
see by taking a deterministic machine and swapping the accepting and
non-accepting states, but there isn't any straightforward
corresponding regular expression notation.
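To make the state-swapping argument concrete, here is a minimal sketch in Tcl. The three-state machine, the alphabet {a, b} and the names delta, accepts, allStates and accepting are all made up for illustration; the machine accepts strings containing "ab". The trick only works because the machine is deterministic and has a transition for every state and symbol.

```tcl
# Transition table for a hypothetical DFA over {a, b} that accepts
# strings containing "ab". Keys are "state,symbol".
array set delta {
    0,a 1   0,b 0
    1,a 1   1,b 2
    2,a 2   2,b 2
}
set allStates {0 1 2}
set accepting {2}

# Run the DFA from state 0 and report whether it stops in one of the
# given accepting states.
proc accepts {input acceptStates} {
    global delta
    set state 0
    foreach sym [split $input ""] {
        set state $delta($state,$sym)
    }
    expr {[lsearch -exact $acceptStates $state] >= 0}
}

# Complement: the same machine with accepting and non-accepting
# states swapped.
set complement {}
foreach q $allStates {
    if {[lsearch -exact $accepting $q] < 0} {
        lappend complement $q
    }
}

foreach w {ab bba aab b} {
    puts "'$w': contains ab = [accepts $w $accepting], complement = [accepts $w $complement]"
}
```

Every input is accepted by exactly one of the two machines, which is the closure property in action. Turning the swapped machine back into a regular expression is the part that has no neat notation.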
However, Tcl does support logical not at the expression level, so you
can do things like:
[regexp {unix:.* (WARN|NOTICE).*} $s] && ![regexp {unix:.* (WARN|NOTICE).*core} $s]
where the first regular expression is more general and the second one
matches a subset of the strings matched by the first.
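As a rough sketch of how that might look in a script, the two tests can be wrapped in a small proc. The proc name wantedLine and the sample log lines are invented for illustration; only the two regular expressions come from the text above. This uses nothing beyond basic regexp, so it should not need the newer regular expression package.

```tcl
# Hypothetical helper: true when a line is a unix: WARN/NOTICE message
# that does not mention "core" after the keyword.
proc wantedLine {s} {
    expr {[regexp {unix:.* (WARN|NOTICE).*} $s]
          && ![regexp {unix:.* (WARN|NOTICE).*core} $s]}
}

# Made-up sample lines, one per list element.
set lines {
    "unix: NOTICE: disk nearly full"
    "unix: WARN: process dumped core"
    "sendmail: NOTICE: queue flushed"
}
foreach line $lines {
    puts "[wantedLine $line] $line"
}
```

Only the first sample line passes both tests: the second mentions core, and the third never matches the more general expression.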
In the specific case you mention, though, a "regular expression" with
a negative forward assertion will work:
^unix:.* (WARN|NOTICE)(?!.*core)
The part (?!.*core) blocks matches in which "core" appears anywhere after
WARN or NOTICE.
Oh, I should say that Tcl 7.5 is very old. I think that it probably
doesn't support assertions, since the new regular expression package
came in with Tcl 8.1. Why on earth are you using such an ancient Tcl?
> I would like to alter this to something like.
> unix:.* (WARN|NOTICE) && ![^core]
I guess you do not mean "any character apart from c,e,o and r", right?
Because that's what [^core] means in a regular expression.
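A quick way to see this for yourself (the test strings are arbitrary examples):

```tcl
# [^core] matches exactly one character that is not c, o, r or e.
puts [regexp {^[^core]$} x]          ;# 1: "x" is outside the set
puts [regexp {^[^core]$} c]          ;# 0: "c" is in the excluded set
puts [regexp {^[^core]$} xy]         ;# 0: the class consumes only one character
puts [regexp {[^core]} "core dump"]  ;# 1: the space (or d, u, m, p) qualifies
```

The last line shows why the class is no use for rejecting the word: a string containing "core" still matches as soon as it has any other character in it.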
> I have been told that TCL cannot handle logical NOTs, but I thought I'd just
> check with the experts in this group for confirmation.
It can, just not in a single expression. You do
if {[regexp {.* ((WARN)|(NOTICE))} $String] && ![regexp core $String]} {
...
}
>
> I tested the second expression above using awk and of course it
> worked.
Not sure how you managed that. None of the expressions you posted
worked for the grep on /my/ machine.
>
> I have also tried
> unix:.* (WARN|NOTICE).*[^core]{0}
>
> I'm not sure if I have the syntax correct above, but what I'm trying to say is:
> look for unix:, then any characters followed by a space, then EITHER WARN or
> NOTICE, then any characters, but only when there are zero occurrences of core.
>
> Hmm, hope someone understands what I'm trying to do. ;-)
Lots of Greetings!
Volker
--
For email replies, please substitute the obvious.
> (WARN|NOTICE)
> I have a program that uses tcl7.5 regexp to find the following
> unix:.* (WARN|NOTICE)
> If you wanted to match either WARN or NOTICE the expression
> ought to be ((WARN)|(NOTICE)). Otherwise the only thing it
> matches is WARNOTICE.
Did you try it before answering?
(Documents) 1 % regexp {^(WARN|NOTICE)$} NOTICE
1
(Documents) 2 %
Ramon Ribó
Schelte.
--
set Reply-To [string map {nospam schelte} $header(From)]
It doesn't handle logical negation, since that's a surprisingly non-trivial
concept with REs for fairly deep reasons. However, it does provide some
techniques that can help. Most important among these is the negative
lookahead assertion, which is *almost* negation. It comes with some
restrictions, though, the most important of which are that lookaheads
cannot contain backreferences (not that that matters this time) and that
they are always non-greedy. They're also quite slow (though we mitigate
that by putting them at the end of the RE).
The RE you are looking for is probably something like this:
unix:.*\m(WARN|NOTICE)\M(?!.*\mcore\M.*$).*
You'll want to make sure you put that in {braces}, of course. (The \m
and \M ensure that you don't get tripped up by words like "scores"...)
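For anyone who wants to try it, here is a small sketch with made-up log lines (the sample text is not from the original post). It needs a Tcl with the newer regular expression package, so it will not run on 7.5.

```tcl
# The RE from above, in braces so Tcl leaves the backslashes alone.
set re {unix:.*\m(WARN|NOTICE)\M(?!.*\mcore\M.*$).*}

# Hypothetical sample lines; the last one contains "scores", not "core".
foreach line {
    "unix: WARN: low memory"
    "unix: NOTICE: httpd dumped core"
    "unix: NOTICE: scores updated"
} {
    puts "[regexp $re $line] $line"
}
```

The first and third lines should match and the second should not, since only the second has core as a whole word after the keyword.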
Donal (alas, no time to explain properly what this does today...)
[negative lookahead assertions]
>are always non-greedy. They're also quite slow
Indeed. Horribly slow. Can anyone give me any
idea why this is so, and what I can do to
ease the problem? I have an example (real,
practical) like this: Search for all occurrences
of a pattern which is two words followed by a
literal left parenthesis...
regexp -inline -all {\m\w+\s+\w+\s*\(} $str
This is great - it scans a 1MB $str, with about 2000
matches of the RE, in around half a second on my PC.
But now I add the criterion that the first of the
two words must not be the literal string "module";
this will reject around 150 of the 2000 matches...
{\m(?!module\M)\w+\s+\w+\s*\(}
and the execution time goes up by a factor of FIFTY.
Obviously, the pragmatic solution is to use my first
RE and then post-process all the matches; but why is
the second RE so very slow?
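For reference, the post-processing route might look something like the sketch below. The sample string is invented, and the capturing parentheses around the first word are an addition to the first RE so that the word can be tested afterwards; `ne` assumes Tcl 8.4 or later.

```tcl
# Hypothetical sample text; the real input is a 1MB string.
set str "module foo (a); wire bar (b); reg baz (c);"

set kept {}
# With -inline -all and one capture group, regexp returns a flat list:
# full match, captured first word, full match, captured first word, ...
foreach {match first} [regexp -inline -all {\m(\w+)\s+\w+\s*\(} $str] {
    if {$first ne "module"} {
        lappend kept $match
    }
}
puts $kept
```

Here kept ends up holding the "wire bar (" and "reg baz (" matches, with the "module foo (" match filtered out in plain Tcl rather than by the regexp engine.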
--
Jonathan Bromley, Consultant
Bill