python regex "negative lookahead assertions" problems

Jelle Smet

Nov 22, 2009, 8:58:09 AM

to pytho...@python.org

Hi List,

I'm trying to match lines in python using the re module.
The end goal is to have a regex which enables me to skip lines which have ok or warning in them.
But for some reason I can't get negative lookaheads working the way they're explained at "http://docs.python.org/library/re.html".

Consider this example:

Python 2.6.4 (r264:75706, Nov 2 2009, 14:38:03)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> line='2009-11-22 12:15:441 lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
>>> re.match('.*(?!warning)',line)
<_sre.SRE_Match object at 0xb75b1598>

I would expect that this would NOT match as it's a negative lookahead and warning is in the string.

Thanks,

--
Jelle Smet
http://www.smetj.net

Tim Chase

Nov 22, 2009, 9:52:39 AM

to je...@smetj.net, pytho...@python.org

> >>> import re
> >>> line='2009-11-22 12:15:441 lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
> >>> re.match('.*(?!warning)',line)
> <_sre.SRE_Match object at 0xb75b1598>
>
> I would expect that this would NOT match as it's a negative lookahead and warning is in the string.

This first finds everything (".*") and then asserts that
"warning" doesn't follow it, which is correct in your example.
You may have to assert that "warning" doesn't exist at every
point along the way:

re.match(r'(?:(?!warning).)*',line)

which will match up-to-but-not-including the "warning" text. If
you don't want it at all, you'd have to also anchor the far end

re.match(r'^(?:(?!warning).)*$',line)

but in the 2nd case I'd just as soon invert the test:

if 'warning' not in line:
    do_stuff()
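
To make the difference between the unanchored and the anchored form concrete, here is a minimal sketch. The two sample lines are made up and much shorter than the one in the original post, and the names `tempered` and `anchored` are only illustrative.

```python
import re

# Made-up sample lines, shorter than the one in the original post.
with_warning = "2009-11-22 12:15:44 dfh warning qlsfj"
without_warning = "2009-11-22 12:15:44 dfh qlsfj"

tempered = re.compile(r'(?:(?!warning).)*')
anchored = re.compile(r'^(?:(?!warning).)*$')

# Unanchored: the match stops right before "warning".
prefix = tempered.match(with_warning).group(0)
print(repr(prefix))

# Anchored: the whole line has to be free of "warning".
anchored_hit = anchored.match(with_warning)
anchored_clean = anchored.match(without_warning)
print(anchored_hit)                 # None
print(anchored_clean is not None)   # True

# The plain substring test gives the same answer for this purpose.
print('warning' not in without_warning)  # True
```

Note that the unanchored pattern still returns a match object for a line containing "warning" (it just matches the prefix), so it cannot be used as a yes/no test on its own.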

-tkc

Helmut Jarausch

Nov 22, 2009, 10:05:47 AM

'.*' eats all of line. Now, when at end of line, there is no 'warning' anymore, so it matches.
What are you trying to achieve?
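
A quick way to see the greedy behaviour is to look at the span of the match. This is only a sketch with a made-up, shortened sample line:

```python
import re

line = "2009-11-22 12:15:44 dfh warning qlsfj"  # made-up sample line

# '.*' runs to the end of the line; the lookahead is then tested
# at that final position, where nothing (so no 'warning') follows.
m = re.match(r'.*(?!warning)', line)
print(m.span() == (0, len(line)))  # True

# Even a line that consists only of 'warning' matches in full.
only = re.match(r'.*(?!warning)', 'warning')
print(only.span())  # (0, 7)
```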

If you just want to single out lines with 'ok' or 'warning' in them, why not just
if re.search('(ok|warning)', line): call_skip
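
Applied to several lines at once, that could look roughly like the sketch below. The sample lines and the names `skip`, `lines` and `kept` are invented for illustration; they are not from the original post.

```python
import re

skip = re.compile(r'(ok|warning)')

# Invented sample log lines.
lines = [
    "2009-11-22 12:15:44 disk ok",
    "2009-11-22 12:15:45 cpu warning high load",
    "2009-11-22 12:15:46 fan critical",
]

# Keep only the lines in which neither word occurs anywhere.
kept = [l for l in lines if not skip.search(l)]
print(kept)
```

Using `search` rather than `match` matters here, because `match` only looks for the pattern at the start of the string.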

Helmut.

--
Helmut Jarausch

Lehrstuhl fuer Numerische Mathematik
RWTH - Aachen University
D 52056 Aachen, Germany

MRAB

Nov 22, 2009, 11:32:49 AM

to pytho...@python.org

Tim Chase wrote:
>> >>> import re
>> >>> line='2009-11-22 12:15:441 lmqkjsfmlqshvquhsudfhqf qlsfh qsduidfhqlsiufh qlsiuf qldsfhqlsifhqlius dfh warning qlsfj lqshf lqsuhf lqksjfhqisudfh qiusdfhq iusfh'
>> >>> re.match('.*(?!warning)',line)
>> <_sre.SRE_Match object at 0xb75b1598>
>>
>> I would expect that this would NOT match as it's a negative lookahead
>> and warning is in the string.
>

> This first finds everything (".*") and then asserts that "warning"
> doesn't follow it, which is correct in your example. You may have to
> assert that "warning" doesn't exist at every point along the way:
>
> re.match(r'(?:(?!warning).)*',line)
>
> which will match up-to-but-not-including the "warning" text. If you
> don't want it at all, you'd have to also anchor the far end
>
> re.match(r'^(?:(?!warning).)*$',line)
>
> but in the 2nd case I'd just as soon invert the test:
>
> if 'warning' not in line:
>     do_stuff()
>

The trick is to think what positive lookahead you'd need if you wanted
to check whether 'warning' is present:

'(?=.*warning)'

and then negate it:

'(?!.*warning)'

giving you:

re.match(r'(?!.*warning)', line)
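
A small sketch of how this behaves, with made-up sample lines. The second pattern, which rejects both 'ok' and 'warning', is my own extension of the idea to the original question and not something posted above; the names `no_warning` and `neither` are illustrative.

```python
import re

no_warning = re.compile(r'(?!.*warning)')

clean = "2009-11-22 12:15:44 backup finished"      # made-up
noisy = "2009-11-22 12:15:44 backup warning disk"  # made-up

# On success the match is zero-width: the lookahead consumes nothing.
m = no_warning.match(clean)
print(m.span())                 # (0, 0)
print(no_warning.match(noisy))  # None

# Assumed extension: reject a line containing either word.
neither = re.compile(r'(?!.*(?:ok|warning))')
print(neither.match(clean) is not None)                  # True
print(neither.match("2009-11-22 12:15:44 backup ok"))    # None
```

Since '.' does not match a newline by default, this only looks at the first line of a multi-line string unless re.DOTALL is used.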

Helmut Jarausch

Nov 23, 2009, 4:20:30 AM

Probably you don't want words like 'joke' to match 'ok'.
So, a better regex is

if re.search(r'\b(ok|warning)\b', line): SKIP_ME
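
One thing to watch with word boundaries: the pattern has to be a raw string, because in an ordinary string literal '\b' is the backspace character. A short sketch with made-up test strings:

```python
import re

# In a normal string literal, '\b' is a backspace (0x08), not a word boundary.
print('\b' == '\x08')  # True

# Without the r prefix the pattern looks for literal backspaces and finds nothing.
broken = re.search('\b(ok|warning)\b', 'status ok')
print(broken)  # None

# With a raw string, \b reaches the re module as a word boundary.
word = re.compile(r'\b(ok|warning)\b')
hit = word.search('status ok')
joke = word.search('that was a joke')
print(hit.group(1))  # ok
print(joke)          # None
```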