Good day, dear list,
First of all, I apologize for asking a question that might have been
asked a million times before.
I have some problems with MozRepl timeouts in a web thumbnail scraper
that runs on an openSUSE Linux box.
I am trying to find a better solution in Ruby, Python, or PHP, but
if you have ideas for reworking the Perl script, I would be glad too.
The question: Is there a way to specify the Net::Telnet timeout with
WWW::Mechanize::Firefox?
At the moment my internet connection [normally a quite fast DSL one] is
very slow, and sometimes I get this error with $mech->get():
command timed-out at /usr/local/share/perl/5.12.3/MozRepl/Client.pm line 186
I tried this one: $mech->repl->repl->timeout(100000);
Unfortunately it does not work: Can't locate object method "timeout"
via package "MozRepl"
The documentation says this should work:
$mech->repl->repl->setup_client( { extra_client_args => { timeout => 180 } } );
Problem: I have a list of 2500 websites and need to grab a thumbnail
screenshot (!) of each of them. How do I do that?
I could try to fetch the sites with Perl; Mechanize would be a
good thing.
Note: I only need the results as thumbnails that are a maximum of 240
pixels in the long dimension.
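The 240-pixel limit can be turned into concrete target dimensions with a little arithmetic. The following is only a sketch: `thumb_size` is a hypothetical helper name, not part of Imager or WWW::Mechanize::Firefox, and it assumes the full-size width and height of the screenshot are already known.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): scale (width, height) so that
# the longer side is at most $max pixels, keeping the aspect ratio.
sub thumb_size {
    my ($w, $h, $max) = @_;
    my $long = $w > $h ? $w : $h;
    return ($w, $h) if $long <= $max;    # already small enough
    my $factor = $max / $long;
    # Round to whole pixels, but never go below 1 pixel
    my $tw = int($w * $factor + 0.5) || 1;
    my $th = int($h * $factor + 0.5) || 1;
    return ($tw, $th);
}

my ($tw, $th) = thumb_size(1024, 768, 240);
print "${tw}x${th}\n";    # 240x180
```

As far as I understand the Imager documentation, `$img->scale(xpixels => 240, ypixels => 240, type => 'min')` should produce a thumbnail that fits the same 240x240 box, so the helper is mainly useful for checking the expected result.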
At the moment I have a solution which is slow and does not give back
thumbnails.
How can I make the script run faster with less overhead, and have it
write out the thumbnails?
My prerequisites:
the Firefox add-on MozRepl (addon/mozrepl/)
the module WWW::Mechanize::Firefox
the module Imager
This is my source. Here is a snippet [example] of the sites I have in
the URL list.
urls.txt [the list of sources in a file]
www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com
... and so on and so forth.
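The entries in urls.txt are bare host names. As a sketch of how such lines could be cleaned up before fetching, the two helpers below trim each line, add an `http://` scheme when none is present, and derive a file name that is safe even if an entry contains a path. Both `normalize_url` and `url_to_filename` are hypothetical names, and the choice of `http://` as the default scheme is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: trim a line from urls.txt and add a scheme if missing.
# Returns nothing for blank lines.
sub normalize_url {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;
    return unless length $line;
    $line = "http://$line" unless $line =~ m{^[a-z][a-z0-9+.-]*://}i;
    return $line;
}

# Hypothetical helper: turn a URL into a file name without slashes.
sub url_to_filename {
    my ($url) = @_;
    (my $name = $url) =~ s{^[a-z][a-z0-9+.-]*://}{}i;    # drop the scheme
    $name =~ s/^www\.//i;                                # drop leading "www."
    $name =~ s{/+$}{};                                   # drop trailing slashes
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;                    # replace unsafe characters
    return "$name.png";
}

for my $line ("www.google.com", " news.bbc.co.uk ", "http://www.cnn.com/world/") {
    my $url = normalize_url($line) or next;
    print "$url -> ", url_to_filename($url), "\n";
}
```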
What I have tried already; here it is:
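#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

open(my $in, '<', 'urls.txt') or die "Cannot open urls.txt: $!";
while (my $url = <$in>) {
    chomp $url;
    next unless length $url;    # skip blank lines
    print "$url\n";
    $mech->get($url);
    my $png = $mech->content_as_png();

    # Derive the file name from the host, e.g. www.cnn.com -> cnn.com.png
    (my $name = $url) =~ s/^www\.//;
    $name .= ".png";

    open(my $out, '>', $name) or die "Cannot write $name: $!";
    binmode $out;               # PNG data is binary
    print {$out} $png;
    close($out) or die "Cannot close $name: $!";
    sleep 5;
}
close($in);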
Well, this does not take care of the size.
See the output on the command line:
linux-vi17:/home/martin/perl # perl mecha_test_1.pl
www.google.com
www.cnn.com
www.msnbc.com
command timed-out at /usr/lib/perl5/site_perl/5.12.3/MozRepl/Client.pm line 186
linux-vi17:/home/martin/perl #
Question: how can I extend the solution to make sure that it does
not stop on a timeout?
Note again: I only need the results as thumbnails that are a maximum
of 240 pixels in the long dimension.
As a prerequisite, I have already installed the module Imager.
And how can I make the script run faster with less overhead while
writing out the thumbnails?
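One generic way to keep a single slow site from stopping the whole run is to wrap each fetch in `eval` together with `alarm`. This is a sketch only: `with_timeout` is a hypothetical helper, the `select` call merely stands in for a slow `$mech->get`, and it is an untested assumption that the MozRepl connection remains usable after a call has been interrupted this way.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, but give up after $seconds.
# Returns (1, result) on success and (0, error message) on failure.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timed out after ${seconds}s\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending after a failure
    return $ok ? (1, $result) : (0, $err || "unknown error\n");
}

# Stand-in for a slow page load: block for 3 seconds with a 1 second limit.
my ($ok, $value) = with_timeout(1, sub { select(undef, undef, undef, 3); "loaded" });
print $ok ? "done: $value\n" : "failed: $value";
```

Because the helper returns a status instead of dying, the calling loop can log the failure and move on to the next URL.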
Update: in addition to this post there is a PerlMonks thread:
perlmonks.org/?node_id=901572
I also tried out this one:
$mech->repl->repl->setup_client( { extra_client_args => { timeout => 5*60 } } );
and putting the links into @list and using eval:
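while (@list) {
    # Take from the front so that a failed link really goes to the end
    my $link = shift @list;
    print "trying $link\n";
    eval {
        $mech->get($link);
        sleep 5;
        my $png = $mech->content_as_png();
        (my $name = $link) =~ s/^www\.//;    # was "$_", which is not set in this loop
        $name .= ".png";
        open(my $out, '>', $name) or die "Cannot write $name: $!";
        binmode $out;                        # PNG data is binary
        print {$out} $png;
        close($out) or die "Cannot close $name: $!";
    };    # an eval block is a statement and needs this semicolon
    if ($@) {
        print "link: $link failed\n";
        # Put it at the end of the list; note that a link which
        # always fails is retried forever
        push(@list, $link);
        next;
    }
    print "$link is done!\n";
}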
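A loop that requeues failed links never ends if one site is permanently unreachable. The sketch below caps the number of attempts per link. `process_with_retries` is a hypothetical helper, the limit of 3 is an arbitrary choice, and the callback only simulates failures; in the real script it would contain the `$mech->get` and `content_as_png` steps.

```perl
use strict;
use warnings;

# Hypothetical helper: call $work->($item) for each item. Failures are
# requeued at the end of the list, with at most $max_tries attempts per item.
sub process_with_retries {
    my ($items, $max_tries, $work) = @_;
    my @queue = @$items;
    my (%tries, @done, @failed);
    while (@queue) {
        my $item = shift @queue;
        $tries{$item}++;
        if (eval { $work->($item); 1 }) {
            push @done, $item;
        }
        elsif ($tries{$item} < $max_tries) {
            push @queue, $item;     # try again after the others
        }
        else {
            push @failed, $item;    # give up on this one
        }
    }
    return (\@done, \@failed);
}

# Simulated fetch: "www.cnn.com" fails once, "bad.example" always fails.
my %calls;
my ($done, $failed) = process_with_retries(
    [qw(www.google.com www.cnn.com bad.example)], 3,
    sub {
        my ($link) = @_;
        $calls{$link}++;
        die "timeout\n" if $link eq 'bad.example';
        die "timeout\n" if $link eq 'www.cnn.com' && $calls{$link} == 1;
        return 1;
    },
);
print "done: @$done\n";
print "failed: @$failed\n";
```

The list of failed links can be written to a file afterwards and fed into a later run, for example when the connection is faster.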
Question: is there a Ruby / Python / PHP solution that runs more
efficiently, or can you suggest a Perl solution that is more stable?
I look forward to hearing from you.
Greetings,
Martin