CHI and WWW::Mechanize::Cached

J.C. Wren

Nov 8, 2010, 10:34:41 PM
to Perl-Cache Discuss
Let me preface this by saying if this is not the appropriate forum for
this question, please let me know.

I'm using WWW::Mechanize::Cached with CHI for some web scraping. The
majority of the web pages I fetch are cacheable, but my "top level"
pages are not. Data in the top level pages will tell me if my
individual cached pages can be retrieved from the cache or if they
need to be re-fetched. I'm unclear on how CHI integrates with
WWW::Mechanize and WWW::Mechanize::Cached, and whether there's a
way to invalidate a page. I know the URL of the page that I need to
fetch, but I don't know how that relates to the cache until after I
fetch it.

Basically, I'm looking for the equivalent of saying
$mechanize->get ($url, cache => 'ignore'); or somesuch, where it will
fetch the page regardless, and update the cache with the newly fetched
page. Failing that, I'd settle for $mech->cache->clear ($url); or
similar, to clear only that URL from the cache so it will be re-fetched.

I spent some time hunting around trying to see if there was a more
appropriate forum, but didn't find anything (which probably means
there *is* a better place :) ).

I also write Perl like C (I'm a C programmer by trade), and while I
can read and write generic Perl, reverse engineering the fancy OO
stuff is a little beyond me. I looked at the various .pm files for
CHI, WWW::Mechanize::Cached, and a few others, and didn't find what I
think I'm looking for. So I like to think I did some research before
I went asking a question that likely has some simple answer.

Thanks,
--jc

Mike Friedman

Nov 8, 2010, 11:21:09 PM
to perl-cach...@googlegroups.com
Looking under the hood, it appears that the 'cache' method on WWW::Mechanize::Cached is read/write (not too clear from the docs), so you should just be able to do something like:

if ( $want_to_delete_url ) {
    $mech->cache->remove( $url );
}
$mech->get( $url );


Mike


J.C. Wren

Nov 9, 2010, 10:33:11 AM
to perl-cach...@googlegroups.com
OK, gave that a shot, but not much luck.

WWW::Mechanize usually deals with URLs as URI objects, although there is a method to get the absolute URL as a text string.  Passing the URI object to $mech->cache->remove() throws the following error:

encountered object 'WWW::Mechanize::Link=ARRAY(0x9ab7c90)', but neither allow_blessed nor convert_blessed settings are enabled at /usr/lib/perl5/site_perl/5.12.2/CHI/Serializer/JSON.pm line 16.

while passing the absolute URL generates this error:

encountered object 'http://www.something.com?arg=1', but neither allow_blessed nor convert_blessed settings are enabled at /usr/lib/perl5/site_perl/5.12.2/CHI/Serializer/JSON.pm line 16.

--jc