Nokogiri Capability


Stuart Holmes

Apr 15, 2014, 11:40:52 AM
to nokogi...@googlegroups.com
All... I have a question centered on Nokogiri's capability to gather data from various credit union websites. I have been trying to figure out for some time how to effectively gather auto rates from credit union websites.
Mike

I found your name attached to Nokogiri and I have a couple of questions. For several years I have been dabbling with a website that will enable consumers to search for local credit unions and see corresponding auto rates. The issue I have is that in order to get the credit union auto rate data, I would have to do one of the following:

Manually visit each credit union's website and enter the rate data daily.
Give the credit unions an interface through which they can enter the rates daily.
Scrape the credit unions' websites, parse the data, and recompile it in the proper format.

I would prefer the scraping approach, but have not been able to find any technology that could enable this. Another layer of complexity is that every credit union's website may have the data structured slightly differently, so the scraping would have to be somewhat intelligent.

Could Nokogiri accomplish this? I appreciate your comments.


Ben Langfeld

Apr 16, 2014, 8:12:18 AM
to nokogi...@googlegroups.com
You may want to look at Mechanize, which is built on Nokogiri: http://mechanize.rubyforge.org/


Jack Royal-Gordon

Apr 16, 2014, 4:28:39 PM
to nokogi...@googlegroups.com
I would second Ben’s suggestion. I’ve been using Mechanize/Nokogiri for a couple of years to scrape information from several websites. That said, neither of these packages is a panacea: you will have to spend a great deal of time analyzing each website to determine the best way to extract information from it. Then you’ll have to deal with the website owners making changes that invalidate your code. Do not undertake this project unless you are prepared to dedicate yourself to it in perpetuity (or as long as you want the project to be useful). With that warning in mind, happy scraping!