JC> On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:
>> why do you think the # marks the start of a regex? only if you use m//
>> can you change the regex delim from /.
JC> Thanks to you, too, Uri. Like I replied to Ben a second ago, I
JC> thought that since you could replace the delimiter in s/// ad hoc,
JC> that you could in m//, too. Learn something new every day! :-)
but s/// has the s to mark the next char as the delimiter. a bare
=~ #...# has no leading operator, so the # just starts a comment. also,
using # as the delimiter is a bad idea as it confuses many readers.
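a small sketch of the delimiter rule, with a made-up $text string (not from the original code). it only shows that the leading m is what lets you pick another delimiter, and that braces avoid both the escaping and the # confusion:

```perl
use strict;
use warnings;

# hypothetical sample input, just for illustration
my $text = 'see http://example.com/a/b for details';

# with the leading m, any delimiter works; braces mean no escaped slashes
my ($with_m) = $text =~ m{(http://\S+)};

# with plain // every / inside the pattern has to be escaped
my ($with_slash) = $text =~ /(http:\/\/\S+)/;

print "$with_m\n$with_slash\n";
```

both forms capture the same thing; the m{} version is just easier to read.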
>> finally,
>> why are you parsing out urls with a regex when there are modules that do
>> it correctly?
JC> Two reasons:
JC> 1. I've been working with regex for a year or two, and while it's
JC> by no means a strong point in my vocabulary (yet), I'm at least
JC> familiar enough with it to usually figure it out.
good that you are studying them but it still is the wrong tool for
this. learning when regexes aren't a good solution is part of learning
regexes.
JC> 2. I briefly looked for a module that would handle this correctly,
JC> but wasn't sure what to look for. And, I'm not sure that it
JC> warrants the including of a full module if it could potentially be
JC> done in a simple regex. If you can recommend a module that would
JC> be more stable and/or faster than what I'm doing, though, then I
JC> would definitely appreciate the reference!
JC> FWIW, this modification did work:
JC> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
it will break if the opening quote is " and the value has a ' inside
it, since ["'] accepts either quote as the closer. that is perfectly
legal html but you can't parse it that way.
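if you stay with a regex anyway, one way around the mixed quote problem is to capture the opening quote and require the same one to close, with a backreference. this is only a sketch on a made-up $html string, and it is still not an html parser: unquoted attributes, a > inside an attribute value, comments and so on will still trip it up.

```perl
use strict;
use warnings;

# hypothetical sample input: the first href has a ' inside a "-quoted value
my $html = q{<a href="it's.html">one</a> <a href='two.html'>two</a>};

my @links;

# (["']) captures the opening quote, \1 requires the same char to close it
while ($html =~ m{<a\b[^>]*?\bhref=(["'])(.*?)\1[^>]*>(.*?)</a>}gsi) {
    # $1 is the quote char, $2 the href value, $3 the link text
    my ($url, $label) = ($2, $3);
    push @links, "$url => $label";
}

print "$_\n" for @links;
```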
JC> Admittedly, I'm not sure why $2 is stored long enough for the if()
JC> statement, but inside of the if() statement it's empty. Storing
JC> them to a different variable worked for this purpose, but if
JC> there's a better way, I'm very much open to it.
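you need to read more about regexes and the $1 stuff. the capture
variables keep their values until the next successful match, and since
they are global any regex that runs in between will clobber them.
copying them to your own variables right after the match is the normal
way to do it.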
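here is a sketch of what most likely happened, assuming the if() condition itself ran a regex (the original if() isn't shown, so that part is a guess, and $text and the /click/ test are made up). a successful match with no groups resets $2, while the lexical copies are untouched:

```perl
use strict;
use warnings;

# hypothetical sample input
my $text = '<a href="x.html">click here</a>';
my @seen;

while ($text =~ m{(<a[^>]*>)(.*?)(</a>)}gsi) {
    # copy the captures right away, before any other regex runs
    my ($open, $label, $close) = ($1, $2, $3);

    if ($label =~ /click/) {
        # the match in the condition succeeded and has no groups,
        # so $2 from the outer match is no longer available here
        push @seen, defined $2 ? '$2 still set' : '$2 is gone';
        push @seen, "label is $label";
    }
}

print "$_\n" for @seen;
```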
uri