
Stuck automating login to reuters.com and getting a page


Carl Wells

Jan 16, 2011, 7:48:13 AM
to beginn...@perl.org
Hi,

I hope you don't mind my newbie question. I'm new to web programming (and somewhat rusty with programming in general). I'm out of work and trying to teach myself C++, Perl, SQL and other skills, and to do this I've set myself a project. As part of this project I need to access data from this URL:

http://www.reuters.com/finance/stocks/incomeStatement/detail?perType=ANN&symbol=BATS.L

The problem I'm having is that this redirects to the reuters.com login page. I've tried both reusing existing cookie files from Internet Explorer (I had to rename these because the cookie file name included my user name, which contains a space and an @, e.g. fred bumb...@honeypot.org, and either Perl didn't like that or my syntax was wrong) and setting up Perl to receive a new cookie from the site. Neither has worked for me. I've spent the past 3 days trying to glue together bits of code from various Google searches and the CPAN module descriptions for LWP and Mechanize. An example of code that's not working for me is below:

#!/usr/local/bin/perl -w
use strict;
use Crypt::SSLeay;
use LWP::UserAgent;
use LWP::Simple;
use HTTP::Request::Common qw(POST);
use HTTP::Cookies;

my $ua = LWP::UserAgent->new;
my $cookie_jar = HTTP::Cookies->new(file     => "lwpcookies2.txt",
                                    autosave => 1);
$ua->cookie_jar($cookie_jar);
$ua->agent('Mozilla/5.0');
my $url = 'https://commerce.us.reuters.com/login/pages/login/login.do';
my $req = POST $url, ['login' => 'Fredbumblebee', 'password' => 'BzzZZZ!'];
my $res = $ua->request($req);
$cookie_jar->extract_cookies($res);

if ($res->is_success) {
    # print out result to look at headers
    print $res->as_string;

    # access page with cookie secured after logged in
    my $req = HTTP::Request->new(GET => 'http://www.reuters.com/finance/stocks/incomeStatement/detail?perType=ANN&symbol=BATS.L');
    $cookie_jar->add_cookie_header($req);
    $res = $ua->request($req);
    #print $res->as_string;
} else {
    print "Failed: ", $res->status_line, "\n";
}

The cookie file only contains #LWP-Cookies-1.0. I'm currently trying to use the Live HTTP Headers add-on in Firefox to figure out what is being passed to and from the web server, but I am a bit out of my depth :(.

Once I've done this for BATS I'm planning to get a few more pages for other stocks, so I'm guessing I'll want to create a session rather than create a new cookie and log in again for each page request. I also don't want to hammer their site. I gather one can use a 'sleep' command; do you have any advice on this?

I've managed to use HTML::TableExtract to get the tables I want from other reuters.com pages which didn't require the free logon, but no joy here! I started out using C++/cURL/tidylib/TinyXML but moved to Perl as it's so much easier to use. Once I have done this I'll want to call Perl from C++ so that I can pass my data into C++ objects. I've already looked into this and am finding it tricky: running a simple Perl script from C++ is fine, but calling Perl with modules such as LWP has not worked for me yet. I've read the docs but have not managed to get the XS thing to run; Perl was saying it couldn't run dynamic code in this way. Does anyone know a good, easy-to-use Perl wrapper for C++? There are several, but they all seem to be from 2003 and I'm not sure they will work.

If some kind soul would help me out or even suggest what I might need to read to find my solution that would be very much appreciated!!

Thanks,

Carl



Mike Williams

Jan 16, 2011, 8:31:55 AM
to Carl Wells, beginn...@perl.org
Carl,

Hi there. One big problem here is that the login page contains JavaScript.
If the JavaScript code is not run on the client side, then whatever you post
will not work.
Sorry, but I do not know of a way around this; I've tried similar things in
the past and had no luck. For forms that do not include JavaScript,
WWW::Mechanize works nicely.

If you have any other specific Perl issues, post them here. Or maybe someone
knows a way around this JavaScript issue and we'll both learn something.

You're tackling a lot of different things at one time: C++, Perl, SQL, CGI
and XS. You may want to consider narrowing your focus a bit.

As far as the C++ issues go, I would suggest that you first have the Perl
code write the data to disk, then have the C++ code read it off the disk, or
just do whatever you intend to do with the data all in Perl. Calling Perl
from C++ is a large can of worms. Unless you have a compelling reason for
doing this, I would advise against it. It might be easier to put your C++
code in a library and then create a wrapper around that with XS. One
approach that generally works for me is to identify stand-alone tasks, get
them working by themselves, then work on making them play together.
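
A minimal sketch of the "write it to disk first" hand-off, using only core Perl. The names csv_field and write_csv are made up for illustration, and the quoting is deliberately simple (commas, double quotes and newlines only); a CPAN module such as Text::CSV would be the sturdier choice for real data.

```perl
use strict;
use warnings;

# Quote one field for CSV: wrap it in double quotes when it contains a
# comma, a double quote or a line break, doubling any embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ($value =~ /[",\r\n]/) {
        (my $escaped = $value) =~ s/"/""/g;
        return qq{"$escaped"};
    }
    return $value;
}

# Write each row (an array reference) to $path as one CSV line.
# Returns the number of rows written.
sub write_csv {
    my ($path, @rows) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    for my $row (@rows) {
        print {$out} join(',', map { csv_field($_) } @$row), "\n";
    }
    close $out or die "Cannot close $path: $!";
    return scalar @rows;
}
```

A call might look like write_csv('statement.csv', ['symbol', 'item', 'value'], ['BATS.L', 'Revenue', '0']), where the file name, column names and the value are placeholders rather than real figures. The C++ side then only needs to read a plain text file line by line.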

Mike

Carl Wells

Jan 16, 2011, 11:38:08 AM
to Mike Williams, beginn...@perl.org
Thanks Mike. I thought that might be the case, so it looks like learning
some JavaScript is in order! I want to get that data direct from
source.

Yes, it crossed my mind that sticking with Perl might be easier. I've
lost almost 2 months already (I was ill for 3 weeks with flu) and am
quite results-oriented now! I can always try to redo bits later as
necessary. So I may use Perl to stick the data into SQL/Access and then
use C++ to get it back out and manipulate it. By the time I'm at that
stage I should have more of an overall structure to what I'm trying to
do!

If anyone else has any ideas/advice, much appreciated!

Best,

Carl


Greg

Jan 16, 2011, 3:28:14 PM
to Carl Wells, beginn...@perl.org
Hi Carl, if I read your post correctly, you're trying to scrape some
data off a website using the Perl LWP methods. This is a common
task for Perl. May I suggest that you do some research on
scraping with Perl? You will find that there are several approaches to
navigating the target site. Your user agent should be able to
respond to a login request from the target site and proceed to the
next page the site presents, as well as make selections from drop-down
boxes, fill in text entry fields and press the submit buttons.
Check perl.com for some tutorials. WWW::Mechanize may be the
module you're looking for, or a combination of LWP::UserAgent and the
Perl Expect.pm may be hacked together.
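
On the session and 'sleep' questions from the original post, here is a rough sketch of fetching several pages through one user agent with a pause between requests. The names statement_url and fetch_politely are invented for illustration, the URL pattern is the one quoted earlier in the thread, and the fetcher is passed in as a code reference so the loop does not care whether LWP::UserAgent or WWW::Mechanize does the actual request.

```perl
use strict;
use warnings;

# Build the income-statement URL for one ticker symbol.
# Assumes the symbol needs no URL escaping (true for e.g. BATS.L).
sub statement_url {
    my ($symbol) = @_;
    return 'http://www.reuters.com/finance/stocks/incomeStatement/detail'
         . "?perType=ANN&symbol=$symbol";
}

# Call $fetch->($url) once per symbol, sleeping $delay seconds between
# requests (but not before the first one). Returns a hash reference
# mapping each symbol to whatever the fetcher returned.
sub fetch_politely {
    my ($fetch, $delay, @symbols) = @_;
    my %pages;
    for my $i (0 .. $#symbols) {
        sleep $delay if $i > 0 && $delay > 0;
        $pages{ $symbols[$i] } = $fetch->( statement_url($symbols[$i]) );
    }
    return \%pages;
}
```

With LWP, the fetcher could be sub { $ua->get($_[0])->decoded_content }, where $ua is a single LWP::UserAgent created once with cookie_jar => {} and logged in beforehand, so the same cookies are sent with every page request. The delay is a judgment call; a few seconds per page is a common starting point, not a figure taken from the site's terms.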

I found, when I last wrote a scraping script, that it helped to
manually walk through each and every step, look at the source of each
page and record the form widgets' names and what they were supposed
to contain, then reproduce the same experience programmatically with
the script.
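
The "record the form widgets" step can be done from Perl as well as by reading the page source. The helper below is only a sketch: describe_forms is a made-up name, and it assumes it is handed an object that behaves like WWW::Mechanize, whose forms method returns HTML::Form objects.

```perl
use strict;
use warnings;

# Return one text line per form and per form field on the current page.
# $mech is assumed to behave like a WWW::Mechanize object: forms() returns
# HTML::Form objects, and each of their inputs() offers type(), name()
# and value().
sub describe_forms {
    my ($mech) = @_;
    my @lines;
    my $n = 0;
    for my $form ($mech->forms) {
        $n++;
        push @lines, sprintf('form %d: %s %s', $n, $form->method, $form->action);
        for my $input ($form->inputs) {
            my $name  = defined $input->name  ? $input->name  : '(unnamed)';
            my $value = defined $input->value ? $input->value : '';
            push @lines, sprintf('  %-10s %s = "%s"', $input->type, $name, $value);
        }
    }
    return @lines;
}
```

With WWW::Mechanize installed, one would create $mech with WWW::Mechanize->new, call $mech->get on the login URL, then print "$_\n" for describe_forms($mech). Once the field names are known, $mech->submit_form(with_fields => { login => ..., password => ... }) fills in and submits the matching form. The names login and password here are simply the ones used in Carl's script, not verified against the page, and any fields that the page builds with JavaScript will not appear in the dump.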

hope this helps

Greg
