
Replacing expression in a file from mechanize


Nospam

Dec 23, 2006, 12:01:55 PM
Basically, I have a local HTML file called file1.html. In addition to the
rest of the HTML, it contains a series of links to a particular domain,
which match the regular expression /on\.fe/. I am trying to follow each of
these links. The page behind each one contains a link to another page,
which I would like to capture with the regular expression /www\.arax/.
I then want to replace each /on\.fe/ link in file1.html with its
corresponding /www\.arax/ link.

This is what I have come up with so far, and I am a little stuck:

#! perl\bin\perl

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

open(FILE, "< file1.html") || print "Unable to open the file file1 \n";

while (<FILE>)
{
    if($_ =~ /on\.fe/)
    {
        my $url = $_;
        print $mech->uri."\n";
        $mech->get($_);
        $mech->content();
        if($mech->content()=~ /www\.arax/)
        {
            my $url2 = $mech->content() =~ /www\.arax/;
            print $mech->uri."\n";
            s/$url/$url2/;
            print;
        }
    }
}


close(FILE);
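
One detail in the loop above: s/$url/$url2/ interpolates $url as a regular
expression, so characters such as "." and "?" in a URL act as
metacharacters rather than literal text. A minimal sketch of the effect,
assuming hypothetical URLs on made-up example hosts (none of these values
come from file1.html):

```perl
use strict;
use warnings;

# Hypothetical line and URLs, standing in for one line of file1.html
my $line = '<a href="http://on.fe.example/link1/?id=1" target="_blank">';
my $url  = 'http://on.fe.example/link1/?id=1';
my $url2 = 'http://www.arax.example/v/abc123';

# Without \Q...\E the "?" in $url is a quantifier, so the pattern
# does not match the literal URL and nothing is replaced
(my $unquoted = $line) =~ s/$url/$url2/;

# \Q...\E escapes regex metacharacters so $url is matched literally
(my $quoted = $line) =~ s/\Q$url\E/$url2/;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

In the original loop $url also holds the whole input line, trailing
newline included, so the URL would need to be extracted from the line
before it is used as a pattern or passed to get().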

kens

Dec 23, 2006, 4:26:58 PM

Hi,
I have never used the WWW::Mechanize module, and I am a little confused by
your code (could just be me).

The statement "my $url2 = $mech->content() =~ /www\.arax/;" is not going to
set $url2 to a string, if that was your intent. Since you already know
that the regular expression matches (from the preceding 'if' statement),
$url2 is set to 1 (true), indicating there was a match.

Did you just want the following?

my $url2 = $mech->content();
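
If the intent was instead to capture the matching URL itself, one option
is a capture group evaluated in list context. A minimal sketch, assuming a
hypothetical string in place of $mech->content() and a made-up example
host; the pattern is only an illustration and would need adjusting to the
real pages:

```perl
use strict;
use warnings;

# Hypothetical page content standing in for $mech->content()
my $content = '<embed src="http://www.arax.example/v/abc123">';

# Scalar context: the match returns 1 (true) or an empty string (false)
my $flag = $content =~ /www\.arax/;

# List context with a capture group: the parentheses around $url2
# make the match return the captured text instead of a flag
my ($url2) = $content =~ m{(http://www\.arax[^"\s]*)};

print "flag: $flag\n";
print "url2: $url2\n";
```

The only difference between the two assignments is the context: the
parentheses on the left-hand side put the match in list context, where it
returns the contents of its capture groups.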

Ken

Mumia W. (on aioe)

Dec 23, 2006, 7:34:56 PM

Neither your prose nor your program gives me a feel for what you're
trying to do. Can we see some sample data for both file1.html and one of
the files containing "www\.arax"?

--
paduille.4...@earthlink.net
http://home.earthlink.net/~mumia.w.18.spam/

Nospam

Dec 24, 2006, 9:30:03 AM

"Mumia W. (on aioe)" <paduille.4...@earthlink.net> wrote in message
news:emkmp5$fh0$1...@aioe.org...

A sample of the HTML code from file1.html:


<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link1/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 2</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link2/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 3</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link3/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 4</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link4/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 5</span></a> </span></li>

The content of the page at http://...online.feeds.com/link1/, for example, is:


<body>
...
</td></tr><tr><td
style="height:81%;width:100%;padding:0;text-align:left;"><embed
src="http://...arax.../v/gomlckZfGYU..." </embed> </td>
</tr>
<tr>
<td style="height:13%;width:100%;padding:0;text-align:left;">
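
Given data shaped like the samples above, the whole substitution could be
sketched roughly as follows. This is only an illustration under stated
assumptions: rewrite_links and resolve are hypothetical helper names, the
hosts are made up, the target URL is assumed to sit in a src or href
attribute, and pages are served from a hash so the example runs without
network access.

```perl
use strict;
use warnings;

# Return the first src/href URL matching $to_re on the page behind $url,
# or $url unchanged if it does not match $from_re or nothing is found.
sub resolve {
    my ($url, $from_re, $to_re, $fetch) = @_;
    return $url unless $url =~ $from_re;
    my $page = $fetch->($url);
    return $url unless defined $page;
    my ($found) = grep { $_ =~ $to_re } $page =~ /(?:src|href)="([^"]*)"/g;
    return defined $found ? $found : $url;
}

# Rewrite every href in $html through resolve().
# $fetch is a code reference that takes a URL and returns page content.
sub rewrite_links {
    my ($html, $from_re, $to_re, $fetch) = @_;
    $html =~ s{href="([^"]*)"}{ 'href="' . resolve($1, $from_re, $to_re, $fetch) . '"' }ge;
    return $html;
}

# Hypothetical stand-ins for the linked pages and for file1.html
my %pages = (
    'http://on.fe.example/link1/' => '<embed src="http://www.arax.example/v/abc123">',
);
my $fetch = sub { $pages{ $_[0] } };
my $html  = '<a href="http://on.fe.example/link1/">Links Part 2</a> '
          . '<a href="http://other.example/">Other</a>';

print rewrite_links($html, qr/on\.fe/, qr/www\.arax/, $fetch), "\n";
```

With WWW::Mechanize, $fetch could be something like
sub { $mech->get($_[0]); $mech->content }. Handling of failed requests is
left out of the sketch.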


Gunnar Hjalmarsson

Dec 24, 2006, 7:47:11 PM
Nospam wrote:
> Basically I have a local html file,

The guy is multi-posting.
http://www.thescripts.com/forum/thread580426.html

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
