REGEX "not contains"

Brent

Mar 3, 2008, 10:14:11 AM
I have written a small regex statement that finds an href anchor,
whatever text is used to describe it, and a date present on the other
side of the anchor. This text might look something like this:

<a href="http://somelink.com">The description</a> (12/29/2007 10:34 PM)

However, HTML being what it is, I can't guarantee that there won't be
extra spaces and other bits of text beyond what appears in this example,
so I've had to write a fairly expansive regex. Right now the pattern
is this:

<a.*?href="(?<url>.*?)".*?>(?<tag>.*?)</a>.*?\((?<date>\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\w{2})\)

The problem is that the pattern matches from the first <a href> in the
document through to the first date it comes across. That first match
contains a lot of text, including other hrefs, and in particular the
href-date combo I was actually hoping it would find.

Is there any way to tell the pattern to reject a match if it contains
another href? I tried adding a (?!http:\/\/), but that lookahead only
seems to work as a prefix check, and not for text in the middle of a
match.

Thanks for any help.

--Brent
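
For anyone trying to reproduce this, here is a minimal sketch of the behaviour described above, along with one commonly used way to get a "does not contain" effect: repeating a negative lookahead before every consumed character. It is written in Python rather than the regex flavour used in the post, so the named groups use (?P<name>...) instead of (?<name>...). The sample HTML, the variable names, and the choice to block on "<a" rather than "http://" are all assumptions made for illustration.

```python
import re

# Hypothetical sample: the first anchor has no date, the second one does.
html = (
    '<a href="http://first.example">No date here</a> some filler text '
    '<a href="http://somelink.com">The description</a> (12/29/2007 10:34 PM)'
)

# Date portion, same shape as in the original pattern.
DATE = r'\((?P<date>\d{2}/\d{2}/\d{4}\s\d{2}:\d{2}\s\w{2})\)'

# Original shape: the lazy ".*?" after </a> can still run across later anchors.
loose = re.compile(
    r'<a.*?href="(?P<url>.*?)".*?>(?P<tag>.*?)</a>.*?' + DATE,
    re.DOTALL,
)

# Each "." is consumed only if a new "<a" tag does not start at that position,
# so the match cannot span a second anchor.
NO_ANCHOR = r'(?:(?!<a\b).)*?'
strict = re.compile(
    r'<a\b[^>]*?href="(?P<url>[^"]*)"[^>]*>(?P<tag>' + NO_ANCHOR + r')</a>'
    + NO_ANCHOR + DATE,
    re.DOTALL | re.IGNORECASE,
)

loose_match = loose.search(html)
strict_match = strict.search(html)

print(loose_match.group('url'))   # URL of the first anchor (the unwanted pairing)
print(strict_match.group('url'))  # URL of the anchor that directly precedes the date
```

The idea is that a lookahead written once only checks the single position where it sits, which would explain why (?!http:\/\/) appeared to work only as a prefix. Wrapping it as (?:(?!X).)* applies the check at every step instead. For anything beyond simple, predictable markup, an HTML parser is likely to be more robust than a regex.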
