
Login/password for Yahoo group archives: sample script, need help!


Slava
Feb 20, 2002, 12:41:59 PM
Hello everybody.

I wrote a Perl script (ActivePerl for Win32) that downloads messages
from a Yahoo group archive. Actually, I have two scripts for now. The
first one downloads all messages except every fourth one, which is
skipped because Yahoo redirects to an ad page instead of showing the
message. My second script downloads only every fourth message. For now,
both work only when access to the archived messages is public, that is,
when it doesn't require logging in as a group member.
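
One way to get the skipped messages in a single pass might be to notice when a request was redirected and ask for the same message again. The sketch below is untested against Yahoo and assumes two things: that any redirect on a message URL is the ad page, and that once the ad has been served (and its cookies stored), a second request for the same URL returns the message itself. `fetch_message` and `$max_tries` are hypothetical names; the code relies only on LWP::UserAgent's `get` and HTTP::Response's `previous`, which returns the redirect response that led to the final one (undef when there was no redirect).

```perl
use strict;
use warnings;

# Fetch one archive message, retrying when the request was redirected.
# Assumption: a redirect means Yahoo showed its interstitial ad page.
# $ua is expected to behave like an LWP::UserAgent: get() follows the
# redirect and returns the final response, whose previous() method
# returns the redirect response that led to it (undef if none).
sub fetch_message {
    my ($ua, $url, $max_tries) = @_;
    $max_tries ||= 2;    # one normal attempt plus one retry by default

    my $response;
    for my $attempt (1 .. $max_tries) {
        $response = $ua->get($url);
        # No redirect in the chain: this should be the message page.
        return $response unless $response->previous;
    }
    # Still redirected after every attempt; let the caller decide.
    return $response;
}
```

In the script, the line that calls `$ua->request(...)` for each message could instead call `fetch_message($ua, $base_ref . $i)`. The retry only helps if the assumption above holds; if Yahoo also redirects for other reasons (for example to a login page for members-only archives), the last redirected response is returned unchanged.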

I am posting the first script here for your review. I am still a
beginner and would appreciate any help getting past the login/password
obstacle. "SalviaD" is the group I am trying to download, but you can
test the script with any other group you like.
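
For the login question, a common pattern with LWP is to submit the login form with `post` on the same user agent that carries the cookie jar, so the session cookies the server sets are sent with every later archive request. The sketch below shows only that pattern under stated assumptions: the login URL, the form field names (including any hidden `<input>` fields the form contains), and the idea that non-members are redirected away from members-only pages are all placeholders to check against the real login page's HTML, not known Yahoo values. `login_and_check` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Log in by submitting the login form, then confirm that a members-only
# page is reachable. Assumptions (hypothetical, check the real page):
# $login_url is the form's "action" URL and $fields holds the form's
# input names and values, hidden fields included. $ua should be an
# LWP::UserAgent with a cookie jar attached, so the session cookies set
# by the login response are sent with the later GET.
sub login_and_check {
    my ($ua, $login_url, $fields, $members_url) = @_;

    # post() with a hash ref sends an
    # application/x-www-form-urlencoded body.
    my $login = $ua->post($login_url, $fields);

    # LWP does not follow redirects for POST by default, so a login
    # form that answers with a redirect is accepted here as well.
    return 0 unless $login->is_success || $login->is_redirect;

    # Assumed behaviour: a non-member gets redirected away from the
    # archive, so a members-only page that loads without any redirect
    # suggests the login worked.
    my $check = $ua->get($members_url);
    return ($check->is_success && !$check->previous) ? 1 : 0;
}
```

A call might look like `login_and_check($ua, $login_url, { $user_field => $user, $pass_field => $pass }, $base_ref . '1')`, where the four variables are hypothetical and must be filled in from the real form. Because the ad-page redirect also sets `previous`, a failed check may be worth repeating once before concluding that the login did not work.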

********************************************

#!c:/perl/bin/perl -w

use strict;
use warnings;
use diagnostics;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;

my $base_ref = 'http://groups.yahoo.com/group/SalviaD/message/';
my $work_dir = 'D:/!sites/salviad/archive/!';       # local path
my $ref      = 'http://groups.yahoo.com/group/SalviaD/messages';

my $jar  = $work_dir . '/cookie_jar.txt';  # file that stores cookies
my $size = 10;                             # just to test, download 10 msgs

# new() loads $jar if it exists; autosave writes it back on exit.
my $cookie_jar = HTTP::Cookies->new(
    file           => $jar,
    autosave       => 1,
    ignore_discard => 1,
);

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla ' . $ua->agent);
$ua->timeout(30);
# With a cookie jar attached, the user agent adds stored cookies to each
# request and extracts new ones from each response, so no manual
# add_cookie_header/extract_cookies calls are needed.
$ua->cookie_jar($cookie_jar);

for my $i (1 .. $size) {               # starting from the 1st message
    # The message URLs are just numbers without an extension.
    my $url  = $base_ref . $i;
    my $path = $work_dir . '/' . $i . '.html';

    my $request = HTTP::Request->new(GET => $url);
    $request->header('Accept' => 'text/html', 'Referer' => $ref);

    # request() follows redirects for GET requests by default.
    my $response = $ua->request($request);

    # request() always returns a response object, so check the status.
    unless ($response->is_success) {
        warn "Message $i: ", $response->status_line, "\n";
        next;
    }

    open(my $raw_file, '>', $path) or die "Cannot create $path: $!";
    binmode($raw_file);    # write the bytes exactly as received
    # content() is the page body only; as_string() would also write the
    # status line and HTTP headers into the .html file.
    print $raw_file $response->content;
    close($raw_file) or die "Cannot close $path: $!";
}

$cookie_jar->save;

exit(0);

*********************************************************
Sincerely,
Slava
