How to extract data from websites?

yaoyao

Aug 9, 2012, 9:44:22 AM
to emacs-o...@googlegroups.com
Without using Nokogiri or Mechanize, and regardless of the structure of the web page. Any ideas?

ddoherty03

Aug 10, 2012, 4:08:46 PM
to emacs-o...@googlegroups.com
Try Selenium. It drives an actual browser, so it is able to work where some other tools fail. Start with the Firefox plugin for Selenium, then look into the Ruby webdriver interface.
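
A minimal sketch of what the Ruby webdriver route might look like. It assumes the selenium-webdriver gem and Firefox are installed; `page_text` is a made-up helper name for illustration, not part of Selenium, and the URL comes from the command line.

```ruby
# Sketch only: assumes the selenium-webdriver gem and Firefox are installed.
# `page_text` is a hypothetical helper, not a Selenium method.

# Load a page in a real browser and return the text a user would see.
# `driver` is anything that responds to #get and #find_element, such as
# the object returned by Selenium::WebDriver.for(:firefox).
def page_text(driver, url)
  driver.get(url)                             # the browser loads the page, scripts included
  driver.find_element(tag_name: 'body').text  # rendered text of the <body> element
end

# Run as: ruby page_text.rb http://example.com/
if __FILE__ == $PROGRAM_NAME && ARGV.first
  require 'selenium-webdriver'
  driver = Selenium::WebDriver.for(:firefox)
  begin
    puts page_text(driver, ARGV.first)
  ensure
    driver.quit                               # always close the browser window
  end
end
```

Because the helper only relies on `get` and `find_element`, it can be tried out with a stand-in object before a real browser is wired up.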

Good luck.

Jacob Tjørnholm

Aug 10, 2012, 5:37:55 PM
to emacs-o...@googlegroups.com
It's hard to come up with a useful answer without some more detail :)

But for just grabbing the contents of a website, I often use curl. 
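
For reference, a rough Ruby counterpart to a plain `curl URL` call, using only the standard library. The helper names `fetch` and `plain_text` are assumptions made for this sketch. The tag stripping is a crude regex pass that ignores page structure entirely; it is not a real HTML parser and does not decode entities.

```ruby
require 'net/http'
require 'uri'

# Fetch the raw body of a page, roughly what `curl URL` prints.
# Sketch only: does not follow redirects and does not run JavaScript.
def fetch(url)
  Net::HTTP.get(URI(url))
end

# Crude, structure-agnostic text extraction (hypothetical helper):
# drop <script>/<style> blocks, strip the remaining tags, collapse whitespace.
def plain_text(html)
  html.gsub(%r{<(script|style)\b.*?</\1>}mi, ' ')
      .gsub(/<[^>]+>/, ' ')
      .gsub(/\s+/, ' ')
      .strip
end

# Possible usage (needs network access):
#   puts plain_text(fetch('http://example.com/'))
```

From the resulting text, particular values can then be picked out with ordinary string matching, whatever markup the page happened to use.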

/Jacob

Sathish Kumar

Aug 14, 2012, 2:36:33 PM
to emacs-o...@googlegroups.com
Hi,
APIfy is a useful service for extracting data from websites as JSON using simple CSS/XPath selectors.

Tutorial: http://apify.heroku.com/tutorial/create
Example API: http://apify.heroku.com/resources/4fca535156983f0001000002
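
To illustrate the general idea of mapping selectors to JSON (this is not APIfy's API, just a local sketch), something like the following could work with Ruby's bundled REXML and JSON libraries. `extract_as_json`, the field names, and the sample markup are all made up for the example. REXML only parses well-formed XML/XHTML, which many real pages are not.

```ruby
require 'json'
require 'rexml/document'

# Hypothetical helper: takes markup plus a hash of field name => XPath
# and returns a JSON string with the matched text for each field.
# Assumes the markup is well-formed XML/XHTML; REXML rejects tag soup.
def extract_as_json(markup, selectors)
  doc = REXML::Document.new(markup)
  data = selectors.transform_values do |xpath|
    REXML::XPath.match(doc, xpath).map { |node| node.text.to_s.strip }
  end
  JSON.generate(data)
end

# Made-up sample page for illustration.
page = <<~XHTML
  <html><body>
    <h1>Books</h1>
    <ul>
      <li class="title">Dune</li>
      <li class="title">Emma</li>
    </ul>
  </body></html>
XHTML

puts extract_as_json(page, 'heading' => '//h1', 'titles' => "//li[@class='title']")
```

Each field comes back as an array because an XPath expression may match any number of nodes.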

Regards,
Sathish

Lennart Borgman

Aug 14, 2012, 7:04:16 PM
to emacs-o...@googlegroups.com
To get the CSS/XPath selectors, you can use the XCPath bookmarklet:

http://dl.dropbox.com/u/848981/it/xp/xp.html

yao liu

Aug 15, 2012, 4:06:11 AM
to emacs-o...@googlegroups.com
I want to extract particular information from a website regardless of its CSS/XPath structure.