1. The url after the href= within the following tags <link
rel="alternate" and />
So if there is <link rel="alternate" type="application/atom+xml"
title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
the http://hello.typepad.com/hello/atom.xml
2. everything between the following tags <title> and </title>
so if there is <title>hello, typepad</title> I want hello, typepad
3. everything between the tags <h2 id="banner-description"> and </h2>
4. Finally, I would like the results to be saved to a delimited file in
the following format:
column 1: original url
column 2: data obtained from step 1
column 3: data obtained from step 2
column 4: data obtained from step 3
if there is no result for any one of the steps a null should be saved.
I would like to thank whoever can provide me with the code in advance,
Thank you.
it is highly unlikely that anyone will do so for a simple "thanks".
check out jobs.perl.org for someone willing to follow orders in return
for compensation.
-jp
> So I need code that will go through a list of URLs (formatted as
> http://www.google.com) and for each url get the following information:
>
> 1. The url after the href= within the following tags <link
> rel="alternate" and />
That's just one tag.
> So if there is <link rel="alternate" type="application/atom+xml"
> title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
> the http://hello.typepad.com/hello/atom.xml
use HTML::TokeParser::Simple;
http://search.cpan.org/~ovid/HTML-TokeParser-Simple-3.15/
> 2. everything between the following tags <title> and </title>
> so if there is <title>hello, typepad</title> I want hello, typepad
Ditto.
> 3. everything between the tags <h2 id="banner-description"> and </h2>
Ditto.
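
A minimal sketch of how steps 1 to 3 might look with HTML::TokeParser::Simple,
assuming the page source is already in a string. The helper name
extract_fields is hypothetical, and taking only the first match of each item
is an assumption, not something the original post specifies.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: returns (feed URL, title, banner description).
# Any item that is not found comes back as undef.
sub extract_fields {
    my ($html) = @_;
    my $parser = HTML::TokeParser::Simple->new( string => $html );
    my ( $feed, $title, $banner );

    while ( my $token = $parser->get_token ) {
        next unless $token->is_start_tag;

        if ( $token->is_start_tag('link') ) {
            # Step 1: href of the first <link rel="alternate" ...> tag.
            my $rel = $token->get_attr('rel');
            $feed = $token->get_attr('href')
                if !defined $feed && defined $rel && lc $rel eq 'alternate';
        }
        elsif ( $token->is_start_tag('title') ) {
            # Step 2: text up to the closing </title>.
            $title = $parser->get_trimmed_text('/title')
                unless defined $title;
        }
        elsif ( $token->is_start_tag('h2') ) {
            # Step 3: text of <h2 id="banner-description">.
            my $id = $token->get_attr('id');
            $banner = $parser->get_trimmed_text('/h2')
                if !defined $banner
                && defined $id
                && $id eq 'banner-description';
        }
    }
    return ( $feed, $title, $banner );
}
```

Fetching each URL (for example with LWP) is left out here; the sketch only
covers the parsing side.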
> 4. Finally, I would like the results to be saved to a delimited file in
> the following format:
>
> column 1: original url
> column 2: data obtained from step 1
> column 3: data obtained from step 2
> column 4: data obtained from step 3
Trivial.
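
For what it is worth, a rough sketch of the output step using only core Perl.
The helper name format_row, the tab delimiter, and writing the literal string
"null" for a missing value are all assumptions; the original post only asks
for "a delimited file" and "a null".

```perl
use strict;
use warnings;

# Hypothetical helper: build one tab-delimited row from the original URL
# and the three extracted values. Missing or empty values become "null".
sub format_row {
    my ( $url, @fields ) = @_;
    $#fields = 2;    # force exactly three data columns (pads with undef)
    return join "\t", $url, map {
        my $v = ( defined $_ && length $_ ) ? $_ : 'null';
        $v =~ s/[\t\r\n]+/ /g;    # keep the delimiter unambiguous
        $v;
    } @fields;
}

# Example: the banner description was not found on this page.
print format_row(
    'http://hello.typepad.com/',
    'http://hello.typepad.com/hello/atom.xml',
    'hello, typepad',
    undef,
), "\n";
```

If the data may contain the delimiter itself and must be preserved exactly, a
CSV module from CPAN would be a safer choice than a plain join.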
> if there is no result for any one of the steps a null should be saved.
>
>
> I would like to thank whoever can provide me with the code in advance,
That's not how it works here. Feel free to compose a proper post showing us
what you have tried - after having read and followed the posting guidelines
- and help will flow.
If you don't want to be bothered with that, you might be able to generate
enough "warm glow" to motivate me to help by making a donation of $1000 or
more to the Perl foundation:
http://donate.perl-foundation.org/index.pl?node=Fund+Drive+Details&selfund=2
If you don't want to bother with either of those, then try:
Sinan
PS: For the record, I am in no way affiliated with the Perl Foundation.
I have not yet donated. I should. Oh, the guilt.
--
A. Sinan Unur <1u...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
> So I need code that will go through a list of URLs (formatted as
> http://www.google.com) and for each url get the following information:
>
> 1. The url after the href= within the following tags <link
> rel="alternate" and />
>
> So if there is <link rel="alternate" type="application/atom+xml"
> title="Atom" href="http://hello.typepad.com/hello/atom.xml" /> I want
> the http://hello.typepad.com/hello/atom.xml
>
>
> 2. everything between the following tags <title> and </title>
> so if there is <title>hello, typepad</title> I want hello, typepad
>
> 3. everything between the tags <h2 id="banner-description"> and </h2>
I use HTML::TreeBuilder for this, since it makes life really easy. See
http://johnbokma.com/perl/ for several examples (Web automation).
For example, step 3 can be done as:
use HTML::TreeBuilder;

my $root = HTML::TreeBuilder->new_from_content( $content );
:
:
my @column4;
push @column4, $_->as_trimmed_text
    for $root->look_down( _tag => 'h2', id => 'banner-description' );
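
Steps 1 and 2 could probably be handled the same way. The following is only a
sketch: the helper name feed_and_title is hypothetical, it takes the first
match of each element, and it assumes rel is written exactly as "alternate".

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns (feed URL, title), undef where not found.
sub feed_and_title {
    my ($content) = @_;
    my $root = HTML::TreeBuilder->new_from_content($content);

    # In scalar context look_down returns the first match, or undef.
    my $link  = $root->look_down( _tag => 'link', rel => 'alternate' );
    my $title = $root->look_down( _tag => 'title' );

    my @result = (
        $link  ? $link->attr('href')     : undef,
        $title ? $title->as_trimmed_text : undef,
    );
    $root->delete;    # release the parse tree
    return @result;
}
```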
> I would like to thank whoever can provide me with the code in advance,
> Thank you.
I can provide the code, and forms to thank me are here:
http://johnbokma.com/wish-list.html
Either Object Oriented Perl or Perl Best Practices would be fine with me
since directly and indirectly you will contribute back to the Perl
community.
--
John Bokma Freelance software developer
&
Experienced Perl programmer: http://castleamber.com/
> Subject: Need help with parsing data
What part is it that you need help with?
(You should use a module that understands XHTML data if you need
to process XHTML data.)
> I would like to thank whoever can provide me with the code in advance,
What makes you think that someone will write your program for you?
--
Tad McClellan SGML consulting
ta...@augustmail.com Perl programming
Fort Worth, Texas