Or Hpricot ...
>
> --
> Andy Lester => an...@petdance.com => www.petdance.com => AIM:petdance
--
thanks,
-pate
-------------------------
http://on-ruby.blogspot.com
WWW::Mechanize is a wrapper around Hpricot, just as the Perl
WWW::Mechanize is a wrapper around LWP. It handles lots of the
drudgery.
You could check out my older (but still useful, I think) article on this:
http://www.rubyrailways.com/data-extraction-for-web-20-screen-scraping-in-rubyrails
It needs some polishing and a section on Hpricot (I'm working on that
right now), but even as it stands it should be of some help.
BTW, I am about to release (within the next few days) a powerful web
extraction language written in Ruby. It is based on Mechanize and
Hpricot and it really does a lot of the heavy lifting (although I may be a
little bit biased for obvious reasons :-) - you will see for
yourself next week).
Peter
__
http://www.rubyrailways.com
> require 'net/http'
> website = Net::HTTP.get 'www.yahoo.com', '/'
Now you have the source code of the yahoo.com start page in website. To see it:
> puts website
The Net::HTTP documentation has more examples:
http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/index.html
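If you also want to know whether the request actually succeeded, a slightly longer variant might look like the following. It's only a sketch: fetch_page is a made-up helper name, and raising on anything that isn't a 2xx response is my assumption about what you'd want. Note that it does not follow redirects.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): fetch a page and return
# its body as a String, raising if the server does not answer with 2xx.
def fetch_page(url)
  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)

  # Net::HTTPSuccess is the parent class of all 2xx response classes.
  unless response.is_a?(Net::HTTPSuccess)
    raise "Request to #{url} failed: #{response.code} #{response.message}"
  end

  response.body
end
```

Calling `puts fetch_page(some_url)` then prints the page source when the server answers with a 2xx status, and fails loudly otherwise.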
Martin
> anon1m0us wrote:
>> Hi;
>> No clue how to do this. My program needs to go to a website, read
>> data and process it. Don't know where to even begin! How do I go to
>> a website in Ruby? How do I start reading the data?
<snip>
> BTW, I am about to release (within the next few days) a powerful web
> extraction language written in Ruby. It is based on Mechanize and
> Hpricot and it really does a lot of the heavy lifting (although I may
> be a little bit biased for obvious reasons :-) - you will see
> for yourself next week).
After finding your article on screen scraping *very* useful, I'm
really looking forward to this!
Gav
I am happy to hear this... Web scraping can be very, very tedious
(even with a superb tool like scRUBYt! :-)), so I will need a lot of
users to try it on a lot of pages to help find and report problems
and end up with a really stable system. On the pages I am testing, it
works perfectly (and it already has a decent feature set); however, so
far, nearly every time I went to a previously unseen page there were
some problems...
Still, as you will see, it will be worth the time to report problems,
because for complex scenarios the solution will be much
faster and more robust than hand-coded stuff...
Back to coding :)
Cheers,
Peter
Great example, but because I'm lazy, I prefer open-uri:
> require 'open-uri'
> puts open('http://www.yahoo.com').read
Probably better to get familiar with Net::HTTP, but when that gets
old... :-)
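One caveat for anyone trying the open-uri snippet on a current Ruby: as far as I know, calling plain open with a URL stopped working in Ruby 3.0, and URI.open (available since Ruby 2.5) is the replacement. A minimal sketch, where read_url is just a name made up for illustration:

```ruby
require 'open-uri'

# Hypothetical helper: return the body of the page at `url` as a String.
# URI.open is provided by open-uri; the block form closes the IO for us.
# open-uri raises OpenURI::HTTPError if the server returns an error status.
def read_url(url)
  URI.open(url) { |io| io.read }
end
```

With that in place, `puts read_url('http://www.yahoo.com')` does the same job as the one-liner above.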