
Stupid regex problem, s/// catching extra letter

Jason C

Jul 18, 2012, 12:01:58 AM
I know better than to work late at night, but sometimes it just can't be helped :-)

I'm doing a simple s///, converting "www." to "http://www." when "www." occurs without a preceding "http://". Here's what I'm doing:

$text = "www.example.com";
$text =~ s#[^(http://)]www\.#http://www\.#gi;
print $text;

If $text is this, though:

$text = "<div>www.example.com</div>";

the regex is catching the > in <div>, printing:

<divhttp://www.example.com</div>

Where am I screwing up?

Christian Winter

Jul 18, 2012, 12:57:00 AM
You don't want to use a character class (square brackets).
[^(http://)] tells perl to match any single character not listed
inside the square brackets after the negation (^), so this
might as well read [^)(/:hpt].
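A minimal sketch (assuming Perl 5 with strict and warnings; the variable names are illustrative and not from the thread) that shows the character class accepting or rejecting one character at a time, and reproduces both symptoms of the original substitution:

```perl
use strict;
use warnings;

# [^(http://)] is a negated character class: it matches ONE character
# that is none of ( ) h t p : /, not the seven-character string "http://".
my $as_written = qr{[^(http://)]};
my $reordered  = qr{[^)(/:hpt]};

# Both classes accept and reject exactly the same characters.
for my $ch ('>', 'x', 'h', '/', '(') {
    my $r1 = $ch =~ /\A$as_written\z/ ? 'match' : 'no match';
    my $r2 = $ch =~ /\A$reordered\z/  ? 'match' : 'no match';
    print "'$ch': $r1 / $r2\n";
}

# The class consumes the '>' in front of "www.", so the '>' is replaced.
(my $broken = '<div>www.example.com</div>') =~ s#[^(http://)]www\.#http://www.#gi;
print "$broken\n";    # prints <divhttp://www.example.com</div>

# The class also demands SOME character before "www.", so a bare URL
# at the very start of the string is left unchanged.
(my $bare = 'www.example.com') =~ s#[^(http://)]www\.#http://www.#gi;
print "$bare\n";      # prints www.example.com
```

The second case is easy to miss: with nothing in front of "www.", there is no character for the class to match, so the substitution silently does nothing.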

What you're trying to do is a zero width negative look-behind
assertion.
s#(?<!http://)www\.#http://www.#gi should do the trick.
The "(?<!...)" tells the regex engine to only match the following
pattern if it is not preceded by the pattern in the look-behind,
without capturing anything.
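A short sketch of the look-behind version applied to a few inputs (assuming Perl 5; `add_scheme` is a hypothetical helper name used only for illustration). Because `(?<!http://)` is zero-width, nothing in front of "www." is consumed or replaced:

```perl
use strict;
use warnings;

# Prefix "www." with "http://" unless "http://" is already right before it.
# (?<!http://) is a zero-width negative look-behind: it inspects the text
# before "www." without consuming it, so a '>' in front stays in place.
sub add_scheme {    # hypothetical helper, not from the thread
    my ($text) = @_;
    $text =~ s#(?<!http://)www\.#http://www.#gi;
    return $text;
}

my @inputs = (
    'www.example.com',
    '<div>www.example.com</div>',
    '<a href="http://www.example.com">link</a>',
);
my @outputs = map { add_scheme($_) } @inputs;
print "$_\n" for @outputs;
```

This prints http://www.example.com, then <div>http://www.example.com</div>, and leaves the already-prefixed link unchanged. Note that the look-behind only knows about the literal "http://", so an "https://www." URL would still get a second scheme prepended.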

"perldoc perlre" has good explanations for character classes
and look-around assertions.

-Chris

Jason C

Jul 18, 2012, 1:05:20 AM
On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
> What you're trying to do is a zero width negative look-behind
> assertion.
> s#(?<!http://)www\.#http://www.#gi should do the trick.
> The "(?<!...)" tells the regex engine to only match the following
> pattern if it is not preceded by the pattern in the look-behind,
> without capturing anything.
>
> "perldoc perlre" has good explanations for character classes
> and look-around assertions.
>
> -Chris

Thanks for the help, Chris. Character classes aren't exactly intuitive when a symbol changes definition completely based on context, so I'm still struggling with that a little.

The modification you suggested was perfect, though! Thanks again :-)

Rainer Weikusat

Jul 18, 2012, 8:30:56 AM
A character class denotes an unordered set of characters, meaning

[^http://]
[^htp:/]
[^:pppppth/]
[^:/hpt]
[^h:t/p]

all represent identical sets, and each of them matches a single character.
But you wanted to match the string http://, and a regex that matches a
string is just the string itself, IOW, THIS sequence of characters.
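The point that order and repetition inside a class don't matter can be checked mechanically. A minimal sketch (assuming Perl 5; variable names are illustrative) that tests the five classes above against every printable ASCII character, then contrasts a class with the literal string:

```perl
use strict;
use warnings;

# The five classes above: each is the negated set { h t p : / },
# written in a different order or with repeated members.
my @classes = (
    qr{[^http://]}, qr{[^htp:/]}, qr{[^:pppppth/]}, qr{[^:/hpt]}, qr{[^h:t/p]},
);

# Test every printable ASCII character against every class.
my $all_agree = 1;
for my $ch (map { chr($_) } 32 .. 126) {
    my @hits = map { $ch =~ /\A$_\z/ ? 1 : 0 } @classes;
    $all_agree = 0 if grep { $_ != $hits[0] } @hits;
}
print $all_agree ? "all five classes agree\n" : "classes differ\n";

# A class never matches more than one character...
my ($first) = @classes;
my $class_hit = 'http://' =~ /\A$first\z/ ? 1 : 0;

# ...whereas the literal string http:// matches itself.
my $literal_hit = 'http://' =~ m{\Ahttp://\z} ? 1 : 0;
print "class: $class_hit, literal: $literal_hit\n";
```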