Can't find a syntax error, hoping a second set of eyes will help
Jason C  
Newsgroups: comp.lang.perl.misc
From: Jason C <jwcarl...@gmail.com>
Date: Sun, 23 Sep 2012 21:09:43 -0700 (PDT)
Local: Mon, Sep 24 2012 12:09 am
Subject: Can't find a syntax error, hoping a second set of eyes will help
Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy :-)

while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
  if ($2 =~ /^http/i) {
    $text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi;
  }

}

The error is on the while() line (at least, I remove it and no more error). The error just says:

syntax error at blah.cgi line 239, near "if"
syntax error at blah.cgi line 246, near "}"

The purpose of the function is to remove the <a href=...></a> code in submitted text, but only if the linked text begins with http.

TIA,

Jason


 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Mon, 24 Sep 2012 05:52:55 +0100
Local: Mon, Sep 24 2012 12:52 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth Jason C <jwcarl...@gmail.com>:

> Can someone look at this and tell me what I'm messing up? I've been
> coding all night, and my eyes have gone fuzzy :-)

> while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

                 ^^ m

(I would suggest finding a highlighting editor. It makes this sort of
syntactic mistake much easier to spot.)

Ben


 
Uri Guttman  
Newsgroups: comp.lang.perl.misc
From: Uri Guttman <u...@stemsystems.com>
Date: Mon, 24 Sep 2012 01:22:39 -0400
Local: Mon, Sep 24 2012 1:22 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

>>>>> "JC" == Jason C <jwcarl...@gmail.com> writes:

  JC> Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy :-)
  JC> while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {

why do you think the # marks the start of a regex? only if you use m//
can you change the regex delim from /.
and ^ will not invert a char class for \1 as \1 isn't a char class
element. so even if you fix the regex delim, that will fail. finally,
why are you parsing out urls with a regex when there are modules that do
it correctly?

uri


 
Jason C  
Newsgroups: comp.lang.perl.misc
From: Jason C <jwcarl...@gmail.com>
Date: Mon, 24 Sep 2012 02:28:11 -0700 (PDT)
Local: Mon, Sep 24 2012 5:28 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

On Monday, September 24, 2012 1:03:03 AM UTC-4, Ben Morrow wrote:

> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
>                  ^^ m

> (I would suggest finding a highlighting editor. It makes this sort of
> syntactic mistake much easier to spot.)

Thanks, Ben. I didn't realize the m//; was required; since you can change the delimiter with s/// ad hoc, I thought you could here, too.

I'm using Notepad++, and while it helps me catch opening and ending brackets, it didn't do a lot in recognizing syntax errors (at least, not that I know of). What editor do you recommend?


 
Jason C  
Newsgroups: comp.lang.perl.misc
From: Jason C <jwcarl...@gmail.com>
Date: Mon, 24 Sep 2012 02:35:19 -0700 (PDT)
Local: Mon, Sep 24 2012 5:35 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:
> why do you think the # marks the start of a regex? only if you use m//
> can you change the regex delim from /.

Thanks to you, too, Uri. Like I replied to Ben a second ago, I thought that since you could replace the delimiter in s/// ad hoc, that you could in m//, too. Learn something new every day! :-)

> and ^ will not invert a char class for \1 as \1 isn't a char class
> element. so even if you fix the regex delim, that will fail.

Oh. Now THAT I did NOT know at all! It does explain a few other errors I've had, though, and couldn't figure out.

> finally,
> why are you parsing out urls with a regex when there are modules that do
> it correctly?

Two reasons:

1. I've been working with regex for a year or two, and while it's by no means a strong point in my vocabulary (yet), I'm at least familiar enough with it to usually figure it out.

2. I briefly looked for a module that would handle this correctly, but wasn't sure what to look for. And, I'm not sure that it warrants the including of a full module if it could potentially be done in a simple regex. If you can recommend a module that would be more stable and/or faster than what I'm doing, though, then I would definitely appreciate the reference!

FWIW, this modification did work:

while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
  $pattern = $1$2$3;
  $repl = $2;

  if ($2 =~ /^http/i) {
    $text =~ s/$pattern/$repl/gsi;
  }

}

Admittedly, I'm not sure why $2 is stored long enough for the if() statement, but inside of the if() statement it's empty. Storing them to a different variable worked for this purpose, but if there's a better way, I'm very much open to it.

 
Peter Makholm  
Newsgroups: comp.lang.perl.misc
From: Peter Makholm <pe...@makholm.net>
Date: Mon, 24 Sep 2012 11:49:31 +0200
Local: Mon, Sep 24 2012 5:49 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Jason C <jwcarl...@gmail.com> writes:
>> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
>>                  ^^ m

> Thanks, Ben. I didn't realize the m//; was required; since you can
> change the delimiter with s/// ad hoc, I thought you could here, too.

You can change the delimiter, but the m is only optional when you use
the // delimiters.

//Makholm
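
The delimiter rule described above can be sketched as follows. This is a minimal illustration, assuming a made-up sample string in `$text` (it is not taken from the original post):

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original post.
my $text = '<a href="http://example.com/">http://example.com/</a>';

# With the default / delimiter the leading m is optional,
# but the slash in </a> has to be escaped.
my $slash = $text =~ /<\/a>/ ? 1 : 0;

# With any other delimiter the m is required; without it, "#" would
# simply start an ordinary Perl comment.
my $hash  = $text =~ m#</a>#  ? 1 : 0;
my $brace = $text =~ m{</a>}  ? 1 : 0;

print "slash=$slash hash=$hash brace=$brace\n";
```

All three forms match the same text; only the first is allowed to drop the `m`.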


 
Marc Girod  
Newsgroups: comp.lang.perl.misc
From: Marc Girod <marc.gi...@gmail.com>
Date: Mon, 24 Sep 2012 03:30:37 -0700 (PDT)
Local: Mon, Sep 24 2012 6:30 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help
On Sep 24, 10:28 am, Jason C <jwcarl...@gmail.com> wrote:

> What editor do you recommend?

GNU emacs with cperl-mode

Marc


 
anotheranne  
Newsgroups: comp.lang.perl.misc
From: anotheranne <anothera...@nowhere.com>
Date: Mon, 24 Sep 2012 12:19:23 +0000 (UTC)
Local: Mon, Sep 24 2012 8:19 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Jason C wrote:
> Can someone look at this and tell me what I'm messing up? I've been coding all night, and my eyes have gone fuzzy :-)

> while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
>   if ($2 =~ /^http/i) {
>     $text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi;
>   }
> }

Whatever other errors your regex may have, I would suggest that
you stick with the regular m// and s/// constructs. You should of
course then escape the '/' in </a> . Changing this should make it run.

Don't use # as an eye-easy replacement for / because a) it is the perl
character for a comment, and b) in a regex (at least with the /x
modifier) it is also a metacharacter.  Trouble will come your way if
you use this.

If you do want to get away from // and /// then use balanced
delimiters like m{} and s{}{} . See p319 in Friedl MASTERING REGULAR
EXPRESSIONS. O'Reilly.

When you use any alternative to m//, the m is mandatory. Only when
using // can you omit the m. Thus // and m{} are both valid constructs.

Also you can remove the ';' after the gsi

hope this helps.

anotheranne


 
anotheranne  
Newsgroups: comp.lang.perl.misc
From: anotheranne <anothera...@nowhere.com>
Date: Mon, 24 Sep 2012 12:42:09 +0000 (UTC)
Local: Mon, Sep 24 2012 8:42 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Jason C wrote:
> On Monday, September 24, 2012 1:03:03 AM UTC-4, Ben Morrow wrote:

>> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
>>                  ^^ m

>> (I would suggest finding a highlighting editor. It makes this sort of
>> syntactic mistake much easier to spot.)

> Thanks, Ben. I didn't realize the m//; was required; since you can change the delimiter with s/// ad hoc, I thought you could here, too.

> I'm using Notepad++, and while it helps me catch opening and ending brackets, it didn't do a lot in recognizing syntax errors (at least, not that I know of). What editor do you recommend?

Padre is a nice perl IDE.

http://padre.perlide.org/

anotheranne


 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Mon, 24 Sep 2012 15:37:47 +0100
Local: Mon, Sep 24 2012 10:37 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth Peter Makholm <pe...@makholm.net>:

> Jason C <jwcarl...@gmail.com> writes:

> >> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
> >>                  ^^ m

> > Thanks, Ben. I didn't realize the m//; was required; since you can
> > change the delimiter with s/// ad hoc, I thought you could here, too.

> You can change the delimiter, but the m is only optional when you use
> the // delimiters.

Or ??, but that has special semantics.

Ben


 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Mon, 24 Sep 2012 15:53:15 +0100
Local: Mon, Sep 24 2012 10:53 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth anotheranne <anothera...@nowhere.com>:

> Jason C wrote:

> > Can someone look at this and tell me what I'm messing up? I've been
> > coding all night, and my eyes have gone fuzzy :-)

> > while ($text =~ #<a[^>]* href=(["'])*[^\1>]*\1[^>]*?>(.*?)</a>#gsi) {
> >   if ($2 =~ /^http/i) {
> >     $text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi;
> >   }
> > }

> Whatever other errors your regex may have, I would suggest that
> you stick with the regular m// and s/// constructs. You should of
> course then escape the '/' in </a> . Changing this should make it run.

That's a bad idea. Perl has changeable delimiters for a reason: to avoid
huge unreadable nests of /\/\\/.

> Don't use # as an eye-easy replacement for / because a) it is the perl
> character for a comment, and b) in a regex (at least with the /x
> modifier) it is also a metacharacter.  Trouble will come your way if
> you use this.

Nonsense. Perl is perfectly capable of getting this right.

> Also you can remove the ';' after the gsi

...but that would probably also be a bad idea.

Ben


 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Mon, 24 Sep 2012 15:48:28 +0100
Local: Mon, Sep 24 2012 10:48 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth Jason C <jwcarl...@gmail.com>:

> On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:

> > finally,
> > why are you parsing out urls with a regex when there are modules that do
> > it correctly?

> Two reasons:

> 1. I've been working with regex for a year or two, and while it's by no
> means a strong point in my vocabulary (yet), I'm at least familiar
> enough with it to usually figure it out.

> 2. I briefly looked for a module that would handle this correctly, but
> wasn't sure what to look for.

HTML::LinkExtor, probably, depending on what you're trying to do.
Perhaps one of the other HTML::Parser-based modules.

> FWIW, this modification did work:

> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
>   $pattern = $1$2$3;

              ^^    ^^
I think not...

>   $repl = $2;

>   if ($2 =~ /^http/i) {
>     $text =~ s/$pattern/$repl/gsi;

This almost certainly doesn't do what you think. If nothing else, you
want to \Q $pattern. What are you trying to do here: strip tags? Why not
just do one s/// (or, you know, use a module)?

>   }
> }

> Admittedly, I'm not sure why $2 is stored long enough for the if()
> statement, but inside of the if() statement it's empty. Storing them to
> a different variable worked for this purpose, but if there's a better
> way, I'm very much open to it.

The $N variables last until the next successful pattern match. In this
case, the '$2 =~ /^http/i' in the condition of the if clears them all
(even though it doesn't capture anything).

In general I prefer to assign captures to real variables right away:

    while (my ($tag, $url) = m#(<a...>(.*?)</a>)#gsi) {

(notice also that captures can be nested, and DTRT).

Ben
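
A minimal sketch of the "copy captures right away" advice, assuming a simplified tag pattern and a made-up input string. Note that a list assignment in the `while` condition puts m//g in list context, where it returns all matches at once instead of advancing one match per iteration, so this sketch keeps the match in scalar context and copies `$1` and `$2` on the first line of the loop body:

```perl
use strict;
use warnings;

# Hypothetical input, not taken from the original post.
my $text = '<a href="http://example.com/">http://example.com/</a>';

my @seen;
# Scalar-context m//g advances one match per iteration.
while ($text =~ m{(<a\b[^>]*>(.*?)</a>)}gsi) {
    # Copy the captures before running any other regex.
    my ($tag, $label) = ($1, $2);

    if ($label =~ /^http/i) {
        # That match succeeded and has no groups of its own, so $2 is
        # now undefined; the lexical copies are unaffected.
        push @seen, [ $tag, $label, defined $2 ? 1 : 0 ];
    }
}

printf "label=%s, \$2 still defined=%d\n", $_->[1], $_->[2] for @seen;
```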


 
Scott Bryce  
Newsgroups: comp.lang.perl.misc
From: Scott Bryce <sbr...@scottbryce.com>
Date: Mon, 24 Sep 2012 09:11:20 -0600
Local: Mon, Sep 24 2012 11:11 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help
On 9/24/2012 3:28 AM, Jason C wrote:

> I'm using Notepad++,

I assume that means you are on a Windows box.

> What editor do you recommend?

I like UltraEdit.

 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Mon, 24 Sep 2012 17:54:04 +0100
Local: Mon, Sep 24 2012 12:54 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth Jason C <jwcarl...@gmail.com>:

> > (I would suggest finding a highlighting editor. It makes this sort of
> > syntactic mistake much easier to spot.)

> I'm using Notepad++, and while it helps me catch opening and ending
> brackets, it didn't do a lot in recognizing syntax errors (at least, not
> that I know of). What editor do you recommend?

Personally I use Vim, which runs on Unix/Mac/Windows, but it takes a
little getting used to. The GUI version (which is probably what you
would use on Windows) has menus and mouse support as you would expect,
and there is an 'easy' mode which makes it behave more like a
Windows-style point-and-type editor, but I'm not sure I see the point of
using a programmer's editor if you're not going to learn to use it
properly.

Ben


 
Uri Guttman  
Newsgroups: comp.lang.perl.misc
From: Uri Guttman <u...@stemsystems.com>
Date: Mon, 24 Sep 2012 15:43:42 -0400
Local: Mon, Sep 24 2012 3:43 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

>>>>> "JC" == Jason C <jwcarl...@gmail.com> writes:

  JC> On Monday, September 24, 2012 1:23:40 AM UTC-4, Uri Guttman wrote:
  >> why do you think the # marks the start of a regex? only if you use m//
  >> can you change the regex delim from /.

  JC> Thanks to you, too, Uri. Like I replied to Ben a second ago, I
  JC> thought that since you could replace the delimiter in s/// ad hoc,
  JC> that you could in m//, too. Learn something new every day! :-)

but s/// has the s to mark the next char. =~ ## has no leading marker so it
would just be a comment. also using # for the delimiter is just a bad
idea as it confuses many readers.

  >> finally,
  >> why are you parsing out urls with a regex when there are modules that do
  >> it correctly?

  JC> Two reasons:

  JC> 1. I've been working with regex for a year or two, and while it's
  JC> by no means a strong point in my vocabulary (yet), I'm at least
  JC> familiar enough with it to usually figure it out.

good that you are studying them but it still is the wrong tool for
this. learning when regexes aren't a good solution is part of learning
regexes.

  JC> 2. I briefly looked for a module that would handle this correctly,
  JC> but wasn't sure what to look for. And, I'm not sure that it
  JC> warrants the including of a full module if it could potentially be
  JC> done in a simple regex. If you can recommend a module that would
  JC> be more stable and/or faster than what I'm doing, though, then I
  JC> would definitely appreciate the reference!

  JC> FWIW, this modification did work:

  JC> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {

it will fail if the opening quote is " and the string has a ' inside
it. perfectly legal html but you can't parse it that way.

  JC> Admittedly, I'm not sure why $2 is stored long enough for the if()
  JC> statement, but inside of the if() statement it's empty. Storing
  JC> them to a different variable worked for this purpose, but if
  JC> there's a better way, I'm very much open to it.

you need to read more about regexes and the $1 stuff. they live until
the next regex is run (they are global).

uri


 
Jason C  
Newsgroups: comp.lang.perl.misc
From: Jason C <jwcarl...@gmail.com>
Date: Mon, 24 Sep 2012 17:54:33 -0700 (PDT)
Local: Mon, Sep 24 2012 8:54 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

On Monday, September 24, 2012 11:03:04 AM UTC-4, Ben Morrow wrote:
> > FWIW, this modification did work:

> > while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
> >   $pattern = $1$2$3;
>               ^^    ^^
> I think not...

Blah, sorry; that's what I get for trying to type up dummy code at 5am. In practice, I put it in quotes:

$pattern = "$1$2$3";

> >   if ($2 =~ /^http/i) {
> >     $text =~ s/$pattern/$repl/gsi;

> This almost certainly doesn't do what you think. If nothing else, you
> want to \Q $pattern.

Excellent point about \Q. What do you mean, though, that it doesn't do what I think?

> What are you trying to do here: strip tags?

Yes and no. I'm using a contenteditable instead of a textarea, and I've discovered that when someone copy-and-pastes an URL from Chrome or FF, it's automatically making the URL a link. Eg:

<a href="http://www.google.com">http://www.google.com</a>

But of course, if you just type the address, then it doesn't. So on my end, I was using URI::Find to convert addresses to links, and ending up with a mess like:

<a href="<a href="http://www.google.com">http://www.google.com</a>"><a href="http://www.google.com">http://www.google.com</a></a>

So, my goal here is to remove the <a href> tag, but only if the linked text is an URL.

> Why not
> just do one s/// (or, you know, use a module)?

I had originally tried doing it with a simple s///, but couldn't figure out how to make it conditional. Like this:

$text =~ s#<a[^>]*? href=(["'])*([^\1>]*)\1[^>]*?>(.*?)</a>#$2#gsi
  if ($3 =~ /^http/i);

This worked correctly if I removed the if() statement. In testing, I changed the replacement to:

1 - $1, 2 - $2, 3 - $3

just to make sure that $3 did begin with http, and it did, so I couldn't figure out why the if() wasn't catching it unless it was dropping the $3 value before reaching the if().

> > Admittedly, I'm not sure why $2 is stored long enough for the if()
> > statement, but inside of the if() statement it's empty. Storing them to
> > a different variable worked for this purpose, but if there's a better
> > way, I'm very much open to it.

> The $N variables last until the next successful pattern match. In this
> case, the '$2 =~ /^http/i' in the condition of the if clears them all
> (even though it doesn't capture anything).

Ahh, that makes sense. I mistakenly thought that, since I wasn't assigning $N, then they would retain the previous value.

> In general I prefer to assign captures to real variables right away:

>     while (my ($tag, $url) = m#(<a...>(.*?)</a>)#gsi) {

> (notice also that captures can be nested, and DTRT).

Great to know! Thanks.
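
The \Q point discussed above can be illustrated with a small sketch. The strings in `$text`, `$pattern`, and `$repl` are invented for the example; the URL deliberately contains `?` and `.`, which are regex metacharacters:

```perl
use strict;
use warnings;

# Hypothetical strings, chosen so the pattern contains metacharacters.
my $text    = 'see <a href="http://example.com/?q=1">link</a> now';
my $pattern = '<a href="http://example.com/?q=1">link</a>';
my $repl    = 'link';

# Without \Q, "/?" means "an optional slash", so the literal "?" in the
# text is never matched and the substitution does nothing.
(my $unquoted = $text) =~ s/$pattern/$repl/;

# With \Q...\E every metacharacter in $pattern is matched literally.
(my $quoted = $text) =~ s/\Q$pattern\E/$repl/;

print "$unquoted\n$quoted\n";
```

Only the second substitution changes the string, which is one way an unquoted interpolated pattern can silently fail to do what was intended.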

 
Jason C  
Newsgroups: comp.lang.perl.misc
From: Jason C <jwcarl...@gmail.com>
Date: Mon, 24 Sep 2012 17:56:33 -0700 (PDT)
Local: Mon, Sep 24 2012 8:56 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

On Monday, September 24, 2012 11:03:04 AM UTC-4, Ben Morrow wrote:
>     while (my ($tag, $url) = m#(<a...>(.*?)</a>)#gsi) {

In this, how does it know that we're testing $text? Or, did you mean to type something like:

while (my ($tag, $url) = $text =~ m#(<a...>(.*?)</a>)#gsi)


 
Jason C  
Newsgroups: comp.lang.perl.misc
From: Jason C <jwcarl...@gmail.com>
Date: Mon, 24 Sep 2012 18:04:17 -0700 (PDT)
Local: Mon, Sep 24 2012 9:04 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

On Monday, September 24, 2012 3:44:44 PM UTC-4, Uri Guttman wrote:
>   JC> while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {

> it will fail if the opening quote is " and the string has a ' inside
> it. perfectly legal html but you can't parse it that way.

I'll probably discard this idea and pursue a module, like you guys suggested. But for the sake of learning...

I recognized this issue, too, which is why I was originally using [^\1], like so:

(["'])*([^\1>]*)\1

I think it was you that pointed out that I can't negate a backreference like that, though.

What would be the correct way to do this, if I can't negate a backreference as a character class?


 
Jim Gibson  
Newsgroups: comp.lang.perl.misc
From: Jim Gibson <jimsgib...@gmail.com>
Date: Mon, 24 Sep 2012 18:26:32 -0700
Local: Mon, Sep 24 2012 9:26 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help
In article <6d53b708-9e94-4bc9-8707-d9a130b2da2c@googlegroups.com>,

Capture the leading delimiter and use a backreference that is not in a
character class:

  while ($text =~ m{(<a[^>]* href=(["']).*?\2.*?>)(.*?)(</a>)}gsi) {
                                           ^^

--
Jim Gibson


 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Tue, 25 Sep 2012 09:40:09 +0100
Local: Tues, Sep 25 2012 4:40 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth Jason C <jwcarl...@gmail.com>:

> On Monday, September 24, 2012 11:03:04 AM UTC-4, Ben Morrow wrote:

> >     while (my ($tag, $url) = m#(<a...>(.*?)</a>)#gsi) {

> In this, how does it know that we're testing $text? Or, did you mean to
> type something like:

> while (my ($tag, $url) = $text =~ m#(<a...>(.*?)</a>)#gsi)

Just so :). Sorry...

Ben


 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Tue, 25 Sep 2012 10:28:35 +0100
Local: Tues, Sep 25 2012 5:28 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth Jason C <jwcarl...@gmail.com>:

> On Monday, September 24, 2012 11:03:04 AM UTC-4, Ben Morrow wrote:
> > > FWIW, this modification did work:

> > > while ($text =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi) {
> > >   $pattern = $1$2$3;
<snip>
> > >   if ($2 =~ /^http/i) {
> > >     $text =~ s/$pattern/$repl/gsi;

> > This almost certainly doesn't do what you think. If nothing else, you
> > want to \Q $pattern.

> Excellent point about \Q. What do you mean, though, that it doesn't do
> what I think?

Well, for one thing, this link

    <a href="http://html5.org">HTML5</a>

will be stripped. I don't think that's what you meant.

You're doing this backwards. You want to use HTML::Parser (or perhaps
HTML::TokeParser) to separate tags from text, and then just apply
URI::Find to 'text' sections which aren't already inside an <a> element.

...No. Maybe it would be clearer if you wrote it like this:

    if ($3 =~ /^http/i) {
        $text =~ s#...#...#gsi;
    }

(which is *exactly* equivalent)? The 'if' condition executes first, so
$3 is something completely random from the previous pattern match; and
in any case, the if covers the *whole* s///, not just one iteration.

You need to push the condition inside the s///. The obvious way of doing
that is

    s#<a ...>http:.*?</a>#$2#gsi;

though in more difficult cases you can use s///ge and put a ?: or
equivalent in the RHS.

Ben
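
A minimal sketch of the s///ge approach mentioned above, with the condition moved into the replacement side. The tag pattern is a deliberately simplified assumption (it has the same limitations on real-world HTML as any other regex), and the input string is made up:

```perl
use strict;
use warnings;

# Hypothetical input: one pasted URL link and one ordinary link.
my $text = '<a href="http://www.google.com">http://www.google.com</a> and '
         . '<a href="http://html5.org">HTML5</a>';

# /e evaluates the replacement as Perl code for each match, so the test
# on the link text runs once per link instead of once for the whole s///.
$text =~ s{(<a\b[^>]*>(.*?)</a>)}{
    my ($tag, $label) = ($1, $2);
    $label =~ /^http/i ? $label : $tag;
}gsie;

print "$text\n";
```

The link whose text starts with http is reduced to its text, while the HTML5 link is put back unchanged.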


 
Ben Morrow  
Newsgroups: comp.lang.perl.misc
From: Ben Morrow <b...@morrow.me.uk>
Date: Tue, 25 Sep 2012 10:53:32 +0100
Local: Tues, Sep 25 2012 5:53 am
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help

Quoth Jim Gibson <jimsgib...@gmail.com>:

That's not the same in general: .*? doesn't *want* to match a quote, but
it will if necessary to make the whole match succeed. In this particular
case it doesn't change anything because there is nothing between the \2
and the next .*?, but for instance these two

    m{<a href="[^"]*">}
    m{<a href=".*?">}

don't match the same thing. The second will match q{<a href="foo"">},
because the .*? will match a quote if forced, but the first will not.

The correct way to match 'everything until $rx' is (?:(?!$rx).)*, so in
this case

    m{... href=(["'])(?:(?!\2).)*\2 ...}

(which would certainly benefit from /x).

Ben
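
The difference described above can be checked with a short sketch. The first string is the malformed tag from the post; the second input and the variable names are assumptions made for illustration. The backreference is `\1` here only because this standalone pattern has a single capturing group before the value:

```perl
use strict;
use warnings;

my $odd = q{<a href="foo"">};   # the malformed tag from the post

# [^"]* can never consume a quote, so the stray quote blocks the match.
my $class_matches = $odd =~ m{<a href="[^"]*">} ? 1 : 0;

# .*? prefers to stop early but will swallow a quote when that is the
# only way for the rest of the pattern to match.
my $lazy_matches  = $odd =~ m{<a href=".*?">}   ? 1 : 0;

# "Everything up to the same quote that opened the value": the negative
# lookahead refuses to step over the captured quote character.
my $attr = q{<a href="it's here" class='x'>};   # hypothetical input
my ($quote, $value) = $attr =~ m{
    href = (["'])          # opening quote, captured as \1
    ( (?: (?!\1) . )* )    # any characters that are not that quote
    \1                     # the matching closing quote
}xs;

print "class=$class_matches lazy=$lazy_matches value=$value\n";
```

With /x the pattern can be spread over several lines and commented, which is the readability benefit hinted at in the post.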


 
Eli the Bearded  
Newsgroups: comp.lang.perl.misc
From: Eli the Bearded <*...@eli.users.panix.com>
Date: Wed, 26 Sep 2012 21:09:34 +0000 (UTC)
Local: Wed, Sep 26 2012 5:09 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help
In comp.lang.perl.misc, Jason C  <jwcarl...@gmail.com> wrote:

> 2. I briefly looked for a module that would handle this correctly, but
> wasn't sure what to look for. And, I'm not sure that it warrants the
> including of a full module if it could potentially be done in a simple
> regex. If you can recommend a module that would be more stable and/or
> faster than what I'm doing, though, then I would definitely appreciate
> the reference!

Do you want to deal with human generated HTML? You'll find that a
"simple" regex will fail you.

http://www.panix.com/~eli/some.links.html

:r! cat $PHTML/some.links.html
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>linky</title>
</head>
<body>
<h1>linky</h1>
<ul>
<li><a href = http://www.google.com/ > Space no quotes </a>
        (this link gives validation errors)</li>
<li><a href = 'http://www.google.com/'> Space single quotes </a></li>
<li><a href='http://www.google.com/'> End space single quotes </a ></li>
<li><a
href
=
'http://www.google.com/'

> No spaces (newlines) single quotes </a
></li>

<li><a href="http://www.google.com/"> No spaces (tabs) double quotes </a     ></li>
</ul>
</body>
</html>

That's not even trying to be an exhaustive way to break it.

Elijah
------
no javascript, for example
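
As a rough check of the point above, the following sketch runs the pattern posted earlier in the thread against the five list items from the test page. The heredoc is a copy of those lines and the counter name is an assumption:

```perl
use strict;
use warnings;

# The <li> lines from the test page above, copied into a heredoc.
my $html = <<'HTML';
<li><a href = http://www.google.com/ > Space no quotes </a>
        (this link gives validation errors)</li>
<li><a href = 'http://www.google.com/'> Space single quotes </a></li>
<li><a href='http://www.google.com/'> End space single quotes </a ></li>
<li><a
href
=
'http://www.google.com/'

> No spaces (newlines) single quotes </a
></li>

<li><a href="http://www.google.com/"> No spaces (tabs) double quotes </a     ></li>
HTML

# Count how many links the earlier pattern finds.
my $count = 0;
$count++ while $html =~ m#(<a[^>]* href=["'].*?["'].*?>)(.*?)(</a>)#gsi;

print "matched $count of 5 links\n";
```

The pattern requires a literal ` href=` immediately followed by a quote and a literal `</a>`, so whitespace around `=` or inside the closing tag is enough to defeat it.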


 
Kaz Kylheku  
Newsgroups: comp.lang.perl.misc
From: Kaz Kylheku <k...@kylheku.com>
Date: Wed, 26 Sep 2012 22:54:15 +0000 (UTC)
Local: Wed, Sep 26 2012 6:54 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help
On 2012-09-26, Eli the Bearded <*...@eli.users.panix.com> wrote:

>:r! cat $PHTML/some.links.html

UUOC infects the vi command line!

 :r!cat <file>    ->    :r <file>


 
Eli the Bearded  
Newsgroups: comp.lang.perl.misc
From: Eli the Bearded <*...@eli.users.panix.com>
Date: Wed, 26 Sep 2012 23:38:09 +0000 (UTC)
Local: Wed, Sep 26 2012 7:38 pm
Subject: Re: Can't find a syntax error, hoping a second set of eyes will help
In comp.lang.perl.misc, Kaz Kylheku  <k...@kylheku.com> wrote:

> On 2012-09-26, Eli the Bearded <*...@eli.users.panix.com> wrote:
> >:r! cat $PHTML/some.links.html

> UUOC infects the vi command line!

>  :r!cat <file>    ->    :r <file>

You got me. I tend to use :r! a lot in posts, and didn't optimize
it down to :r here.

Elijah
------
map * "yyy@y


 