
get framed page with LWP Parse Perl


Joan Coll - NUS VIRTUAL

Jan 21, 2000
Hi,
I'm trying to get a framed page from another server with the
LWP::Simple Perl module, but the <NOFRAMES> content appears if the URL
contains frames.
Do you know of Perl code to parse the page and get the framed pages?
I also use HTML::Parse to change links to absolute.
Thanks a lot. I've been working on this for days.
- Joan


Tony Curtis

Jan 21, 2000
Joan Coll - NUS VIRTUAL <joan...@nusvirtual.com> writes:

> I'm trying to get a framed page from another server
> with the LWP::Simple Perl module, but the <NOFRAMES>
> content appears if the URL contains frames. Do you
> know of Perl code to parse the page and get the
> framed pages? I also use HTML::Parse to change
> links to absolute.

My guess is that the remote server is checking the
supplied UserAgent name to see if it thinks <FRAME>s
will be understood.

Try setting the UserAgent name in your request call
(perldoc LWP::UserAgent) to something Netscape- or
IE-like and see if that helps.

Silly remote server :-(

hth
tony

Jonathan Stowe

Jan 22, 2000
On 21 Jan 2000 18:52:43 +0000 Tony Curtis wrote:
> Joan Coll - NUS VIRTUAL <joan...@nusvirtual.com> writes:
>
>> I'm trying to get a framed page from another server
>> with the LWP::Simple Perl module, but the <NOFRAMES>
>> content appears if the URL contains frames. Do you
>> know of Perl code to parse the page and get the
>> framed pages? I also use HTML::Parse to change
>> links to absolute.
>
> My guess is that the remote server is checking the
> supplied UserAgent name to see if it thinks <FRAME>s
> will be understood.
>

No, it isn't. The page that defines the frameset is sent, and if the
user agent can do frames it will then request the individual pages
that make up the frameset and build the frames. If the user agent
can't handle frames, the <NOFRAMES> part is displayed instead.

> Try setting the UserAgent name in your request call
> (perldoc LWP::UserAgent) to something Netscape- or
> IE-like and see if that helps.
>

This won't help. The OP will have to create a user agent that understands
frames.

/J\
--
Jonathan Stowe <j...@gellyfish.com>
<http://www.gellyfish.com>
** Uri Guttman - Have You CPANed Backward.pm Yet ? **

Jonathan Stowe

Jan 23, 2000
On Fri, 21 Jan 2000 19:16:54 +0100 Joan Coll - NUS VIRTUAL wrote:
> Hi,
> I'm trying to get a framed page from another server with the
> LWP::Simple Perl module, but the <NOFRAMES> content appears if the URL
> contains frames.
> Do you know of Perl code to parse the page and get the framed pages?
> I also use HTML::Parse to change links to absolute.
> Thanks a lot. I've been working on this for days.

OK. This one seems to be coming up quite a bit of late, and the last time
it did I kind of implied that I might write an example.

You need to make a 'user agent' that understands a frameset well enough
to retrieve the constituent frames: this is relatively simple given
HTML::Parser.

The following is pretty dumb: all it does is print the contents of all
the pages to STDOUT. You may want to put more of your processing in
sub start rather than simply printing the stuff. It is run by supplying
a full URL on the command line:


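#!/usr/bin/perl -w

package Framething;

use strict;

use LWP::UserAgent;
use HTTP::Request;
require HTML::Parser;
use URI;

# HTML::Parser 2.x style: subclass the parser and override start().
@Framething::ISA = qw(HTML::Parser);

my $starturl = shift || die "No url supplied\n";

# Base for resolving relative frame URLs; reset for each page fetched.
my $baseuri = URI->new($starturl);

# Queue of URLs still to fetch; start() appends the frames it finds.
my @urls = ($starturl);

my $agent  = LWP::UserAgent->new;
my $parser = Framething->new;

$agent->agent("Gelzilla/666");

while ( my $url = shift @urls )
{
    my $request = HTTP::Request->new( GET => $url );

    my $result = $agent->request($request);

    if ( $result->is_success )
    {
        # Resolve this page's frames relative to the page itself,
        # so that nested framesets are handled correctly.
        $baseuri = $result->base;

        print $result->as_string;
        $parser->parse( $result->content );
        $parser->eof;    # finish this document before parsing the next
    }
    else
    {
        print "Error: " . $result->status_line . "\n";
    }
}

# Called by HTML::Parser for every start tag.
sub start
{
    my ( $self, $tag, $attr, $attrseq, $orig ) = @_;

    if ( $tag eq 'frame' )
    {
        if ( exists $attr->{src} )
        {
            my $thisuri = URI->new( $attr->{src} );
            push @urls, $thisuri->abs($baseuri);
        }
    }
}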

The above is based on version 2.* of HTML::Parser: it requires that the
HTML::Parser class be subclassed so that sub start can be overridden. As I
previously said I would start presenting some examples for the version 3.*
API, here's the same thing as you might write it for the latest and greatest
version of HTML::Parser:


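#!/usr/bin/perl -w

use strict;

use LWP::UserAgent;
use HTTP::Request;
use HTML::Parser;
use URI;

my $starturl = shift || die "No url supplied\n";

# Base for resolving relative frame URLs; reset for each page fetched.
my $baseuri = URI->new($starturl);

# Queue of URLs still to fetch; start() appends the frames it finds.
my @urls = ($starturl);

my $agent = LWP::UserAgent->new;

# HTML::Parser 3.x style: register a handler and name its arguments.
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start, "tagname, attr" ],
);

$agent->agent("Gelzilla/666");

while ( my $url = shift @urls )
{
    my $request = HTTP::Request->new( GET => $url );

    my $result = $agent->request($request);

    if ( $result->is_success )
    {
        # Resolve this page's frames relative to the page itself,
        # so that nested framesets are handled correctly.
        $baseuri = $result->base;

        print $result->as_string;
        $parser->parse( $result->content );
        $parser->eof;    # finish this document before parsing the next
    }
    else
    {
        print "Error: " . $result->status_line . "\n";
    }
}

# Start-tag handler: queue the target of every <frame src="...">.
sub start
{
    my ( $tag, $attr ) = @_;

    if ( $tag eq 'frame' && exists $attr->{src} )
    {
        my $thisuri = URI->new( $attr->{src} );
        push @urls, $thisuri->abs($baseuri);
    }
}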

As you can see there is not a great deal of difference in this particular
example - there is just no need to create your own sub-class of HTML::Parser
and you control what things get passed as arguments to your handler sub.
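
If you want to try the frame-extraction step on its own without touching the
network, something like the following might do. It is only a sketch: the sub
name frame_urls, the sample frameset and the www.example.com URL are all made
up for illustration, and it assumes HTML::Parser 3.* and URI are installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser;
use URI;

# Return the absolute URL of every <frame src="..."> found in $html,
# resolved against $base (the URL the document was fetched from).
# frame_urls is a hypothetical helper, not part of any module.
sub frame_urls {
    my ( $html, $base ) = @_;
    my @found;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ( $tag, $attr ) = @_;
                return unless $tag eq 'frame' && defined $attr->{src};
                push @found, URI->new_abs( $attr->{src}, $base )->as_string;
            },
            'tagname, attr',
        ],
    );

    $parser->parse($html);
    $parser->eof;    # signal end of document so nothing stays buffered

    return @found;
}

# Made-up frameset standing in for a page fetched with LWP.
my $html = <<'HTML';
<html>
<frameset cols="20%,80%">
  <frame src="menu.html" name="menu">
  <frame src="/docs/main.html" name="main">
  <noframes><body>Your browser does not support frames.</body></noframes>
</frameset>
</html>
HTML

print "$_\n" for frame_urls( $html, 'http://www.example.com/site/index.html' );
```

The handler is a closure here so that the list of URLs stays local to the
sub rather than living in a file-level variable.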

Joan Coll - NUS VIRTUAL

Jan 24, 2000
Thanks for your help. ;-)
Now I'm trying to install URI because it is not in the standard Perl library
(Windows box). I don't have the privileges to do that, maybe with push order...
Joan



David Cassell

Jan 24, 2000
Joan Coll - NUS VIRTUAL wrote:
>
> Thanks for your help. ;-)
> Now I'm trying to install URI because it is not in the standard Perl library
> (Windows box). I don't have the privileges to do that, maybe with push order...

Since you're on an NT box, you'll want to use ppm to install your
modules. If you are concerned that you cannot install your modules in the
set of directories that we would refer to as @INC [you can see these by
typing perl -V and reading through the output], then use perldoc to read
about your options for installing these where it is convenient, and then
letting Perl know where to look. At a command prompt, type:

perldoc -q module
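
As a rough idea of what "letting Perl know where to look" can amount to, one
of the options is the lib pragma. This is only a sketch: the directory below
is a made-up placeholder, so substitute wherever you actually managed to put
the modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical private module directory; replace with your own path.
use lib '/home/joan/perl-lib';

# The private directory is now searched before the standard ones.
print "$_\n" for @INC;
```

Any use or require after that use lib line will search the private directory
first.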

David
--
David Cassell, OAO Corp. cas...@mail.cor.epa.gov
Senior Computing Specialist
mathematical statistician

Joan Coll

Jan 26, 2000
I can't get to a command prompt because I have no (Telnet) rights to do
that on the NT box.
Maybe I'll go back to a UNIX box if I don't find a solution.

Joan


David Cassell

Jan 26, 2000
Joan Coll wrote:
>
> I can't get to a command prompt because I have no (Telnet) rights to do
> that on the NT box.
> Maybe I'll go back to a UNIX box if I don't find a solution.

Well, you can use perldoc to read the documentation and the FAQ on
*any* machine where Perl is installed [properly]. Don't let access
problems keep you from the best source of info. It's also available
at CPAN [www.cpan.org] without the install.
