Fwd: WWW-Mechanize issues

Igor Korot

Dec 4, 2011, 3:11:26 AM
to www-mecha...@googlegroups.com
---------- Forwarded message ----------
From: Igor Korot <ikor...@gmail.com>
Date: Sun, Dec 4, 2011 at 12:09 AM
Subject: Fwd: WWW-Mechanize issues
To: www-mecha...@googlegroups.com


At Andy Lester's suggestion, I'm forwarding this here.

Any help appreciated.

I'm running Gentoo Linux with perl-5.12 and WWW-Mechanize-1.66.

I can upgrade to version 1.710 of WWW-Mechanize, but if this is a known problem
I need to mention it in the documentation.

Thank you.


---------- Forwarded message ----------
From: Igor Korot <ikor...@gmail.com>
Date: Sat, Dec 3, 2011 at 10:36 PM
Subject: WWW-Mechanize issues
To: an...@petdance.com


Hi,
My name is Igor and currently I'm trying to write a web crawler using Perl.
I found the WWW-Mechanize module and it is very useful. However, I
have some questions.

1. On the page
http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/FAQ.pod,
at the very bottom, there is the following passage:

[quote]
Mech is a big memory pig! I'm running out of RAM!

Mech keeps a history of every page, and the state it was in. It
actually keeps a clone of the full Mech object at every step along the
way.

You can limit this stack size with the [b]stack_depth[/b] parm in the
new() constructor. If you set [b]stack_size[/b] to 0, Mech will not
keep any history.
[/quote]

So, what is the name of the parameter: stack_depth or stack_size? Or
are they two different variables? If they are different, how does
one use them?

2. When I test the script, it works, but when I run it again the very
next second, it fails. I need to wait at least 4-5 minutes before
running it again for it to succeed.

Here is the code I use:

[code]
#!/usr/bin/perl -w
use DBI;
use JSON;
use WWW::Mechanize;

my $url = shift;
my $browser = WWW::Mechanize->new();
my ($content, $json, $parsed_text, $company_name, $company_url);
eval
{
       $browser->get( $url );
#       die "Can't get the companies list.\n" unless( $browser->status );
       $content = $browser->content();
#       die "Can't get companies names.\n" unless( $browser->status );
       $json = new JSON;
       $parsed_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose
           ->allow_singlequote->allow_barekey->decode( $content );
[/code]

The offending line is:

$browser->get( $url );

The script just dies inside this call.
If I uncomment the "die ..." lines, nothing changes: the script still
dies inside the get() call.
Reading the FAQ mentioned in question 1 didn't help.
Any idea how to track down the problem?
It would be great if I could let the script continue so that it reaches the die() message.

Thank you.

Andy Lester

Dec 4, 2011, 3:17:22 PM
to www-mecha...@googlegroups.com

On Dec 4, 2011, at 2:11 AM, Igor Korot wrote:

> So, what is the name of the parameter: stack_depth or stack_size? Or
> are they two different variables? If they are different, how does
> one use them?

It's stack_depth.  Looks like stack_size is a typo.

xoa
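
For reference, a minimal sketch of passing the confirmed parameter name to the constructor. It assumes WWW::Mechanize is installed; the value 0 is taken from the FAQ passage quoted above ("Mech will not keep any history"), and the print line is only there to show the setting took effect.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# stack_depth (not stack_size) is the constructor parameter.
# 0 asks Mech not to keep any page history.
my $mech = WWW::Mechanize->new( stack_depth => 0 );

# The accessor of the same name reports the current setting.
print "stack_depth is ", $mech->stack_depth, "\n";
```

A small positive value instead of 0 would presumably keep back() usable for a few pages while still bounding memory use.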


Igor Korot

Dec 4, 2011, 11:59:50 PM
to WWW::Mechanize users
Thank you, Andy.
What about issue #2?
I have WWW-Mechanize-1.66. It's not the latest and greatest, but I can't
call get() twice consecutively.
Was this a known problem? Is it solved in the latest 1.71?



Andy Lester

Dec 5, 2011, 12:01:06 AM
to www-mecha...@googlegroups.com

On Dec 4, 2011, at 10:59 PM, Igor Korot wrote:

> Thank you, Andy.
> What about issue #2?
> I have WWW-Mechanize-1.66. It's not the latest and greatest, but I can't
> call get() twice consecutively.
> Was this a known problem? Is it solved in the latest 1.71?

I don't know.  I don't have the time to get into the specifics of your question.  I'm sorry.

You could also try perlmonks.org.

xoa


Paul Miller

Dec 5, 2011, 6:50:01 AM
to WWW::Mechanize users
On Dec 4, 11:59 pm, Igor Korot <ikoro...@gmail.com> wrote:
> call get() twice consecutively.

To be fair, I'm on whatever the latest version is, but hardly more than
a couple of days go by when I don't use Mech for something. I'm pretty
much 100% sure I could call get() consecutively on every major release
in history. Why is it that you can't upgrade to the latest version?
Authors are much more likely to help you if you try the latest version.
Also, if it works for the author (and it almost surely does, or he
wouldn't have released it), then you'll probably have to provide a lot
more detail about what's going wrong.

Try running something really simple, like this. If this works, then
the problem is probably in your other code. If it doesn't, I would
check your firewalls and things.

time perl -MWWW::Mechanize -e '$m = WWW::Mechanize->new; for(1..5) { $m->get("http://ip.kfr.me/"); print $m->content }'
75.x.x.x
75.x.x.x
75.x.x.x
75.x.x.x
75.x.x.x

real 0m0.477s
user 0m0.130s
sys 0m0.017s

At the very least, try to tell us what error Mech produced. You hide
it with the eval. Try something like eval { $mech->get($blarg); 1 }
or die "hrm, mech borked: $@"
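
The eval idiom above can be sketched in a self-contained form. To keep it runnable without a network or the module itself, fetch_page() here is a hypothetical stand-in for $mech->get(); its error text and the URL are made up for illustration. try_fetch() is likewise an illustrative helper, not part of Mech.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $mech->get($url): it dies the way a
# failing request does when autocheck is on. The message is invented.
sub fetch_page {
    my ($url) = @_;
    die "Error GETting $url: 500 Can't connect\n";
}

# Returns undef on success, or the captured error text on failure.
# The trailing "1" makes eval return true only if nothing died.
sub try_fetch {
    my ($url) = @_;
    my $ok = eval { fetch_page($url); 1 };
    return $ok ? undef : $@;
}

my $error = try_fetch('http://example.invalid/');
print "fetch failed: $error" if defined $error;
```

The point of the pattern is that $@ is read immediately after the eval, so the original error message is reported instead of being silently discarded.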

Andy Lester

Dec 5, 2011, 9:45:19 AM
to www-mecha...@googlegroups.com

On Dec 5, 2011, at 5:50 AM, Paul Miller wrote:

> At the very least, try to tell us what error Mech produced.  You hide
> it with the eval.  Try something like eval { $mech->get($blarg); 1 }
> or die "hrm, mech borked: $@"

Or better yet, turn off autocheck.

my $mech = WWW::Mechanize->new( autocheck => 0 );

Then you can (and must) manually check the return.

my $res = $mech->get( $blarg );
if ( $res->code == 200 ) ...
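
A slightly fuller sketch of the manual check, assuming WWW::Mechanize is installed. The file: URL is a placeholder chosen so the request fails without touching the network; is_success and status_line are methods of the HTTP::Response object that get() returns.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# With autocheck off, a failed request no longer dies inside get().
my $mech = WWW::Mechanize->new( autocheck => 0 );

# Placeholder URL (assumed not to exist) so the failure path is exercised.
my $res = $mech->get('file:///no/such/file/for/this/sketch');

if ( $res->is_success ) {
    print "fetched ", length( $mech->content ), " bytes\n";
}
else {
    # status_line combines the numeric code and the message.
    print "request failed: ", $res->status_line, "\n";
}
```

Checking is_success rather than comparing against 200 also treats other 2xx responses as successful, which is usually what a crawler wants.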