How to extract data from websites?

yaoyao

Aug 9, 2012, 9:44:22 AM

to emacs-o...@googlegroups.com

How can I do this without using Nokogiri or Mechanize, and regardless of the structure of the web page? Any ideas?

ddoherty03

Aug 10, 2012, 4:08:46 PM

to emacs-o...@googlegroups.com

Try Selenium. It drives an actual browser, so it is able to work where some other tools fail. Start with the Firefox plugin for Selenium, then look into the Ruby WebDriver interface.

Good luck.

Jacob Tjørnholm

Aug 10, 2012, 5:37:55 PM

to emacs-o...@googlegroups.com

It's hard to come up with a useful answer without some more detail :)

But for just grabbing the contents of a website, I often use curl.
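
For anyone who wants the same "just grab the contents" step from Ruby without Nokogiri or Mechanize, here is a minimal sketch using only the standard library. The helper name `fetch_page` and the redirect limit of 5 are my own assumptions for illustration, not something from this thread.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fetch the raw body of a page over http/https,
# following up to `limit` redirects (the default of 5 is an arbitrary choice).
def fetch_page(url, limit = 5)
  raise ArgumentError, "too many redirects" if limit.zero?

  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes.
  raise ArgumentError, "only http/https URLs are supported" unless uri.is_a?(URI::HTTP)

  response = Net::HTTP.get_response(uri)
  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against the current URL.
    fetch_page(URI.join(url, response["location"]).to_s, limit - 1)
  else
    raise "request failed with status #{response.code}"
  end
end

# Usage (performs a real network request, so it is left commented out):
#   html = fetch_page("http://example.com/")
#   File.write("page.html", html)
```

This only returns the raw HTML, like curl does. Pages that build their content with JavaScript would still need a real browser.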

/Jacob

Sathish Kumar

Aug 14, 2012, 2:36:33 PM

to emacs-o...@googlegroups.com

Hi,
APIfy is a useful service for extracting data from websites as JSON using simple CSS/XPath selectors.

Tutorial: http://apify.heroku.com/tutorial/create
Example API: http://apify.heroku.com/resources/4fca535156983f0001000002

Regards,
Sathish

Lennart Borgman

Aug 14, 2012, 7:04:16 PM

to emacs-o...@googlegroups.com

To get the CSS/XPath selectors, you can use the XCPath bookmarklet:

http://dl.dropbox.com/u/848981/it/xp/xp.html

yao liu

Aug 15, 2012, 4:06:11 AM

to emacs-o...@googlegroups.com

What I want is to extract particular information from a website regardless of its CSS/XPath structure.
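
One way to read "regardless of structure" is to ignore the markup and match on what the values look like in the visible text. The sketch below is only an illustration of that idea using the Ruby standard library. The function names, the small entity table, and the two patterns (e-mail addresses and dollar prices) are my own assumptions. Stripping tags with regular expressions is a rough heuristic, not a real HTML parser.

```ruby
# Small, deliberately incomplete entity table (an assumption for this sketch).
ENTITIES = {
  "&amp;" => "&", "&lt;" => "<", "&gt;" => ">",
  "&quot;" => '"', "&#39;" => "'", "&nbsp;" => " "
}.freeze

# Reduce HTML to its visible text: drop script/style blocks, strip the
# remaining tags, decode a few common entities, and collapse whitespace.
def visible_text(html)
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, " ")
  text = text.gsub(/<[^>]+>/, " ")
  text = text.gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES)
  text.gsub(/\s+/, " ").strip
end

# Pull out values by their shape rather than by their position in the markup.
def extract_patterns(html)
  text = visible_text(html)
  {
    emails: text.scan(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/),
    prices: text.scan(/\$\s?\d+(?:\.\d{2})?/)
  }
end

page = '<div><p>Contact: <b>sales@example.com</b></p>' \
       '<span class="x">Price: $19.99</span></div>'
p extract_patterns(page)
```

The trade-off is that this survives layout changes but only finds data with a recognisable textual shape. Anything that depends on where a value sits on the page still needs selectors.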
