Best language to use to write scrapers


Rickyhow

Oct 15, 2011, 10:57:47 AM
to ScraperWiki
I just learned about ScraperWiki by listening to the Floss Weekly
podcast. Now I am completely sold on the idea! http://www.youtube.com/watch?v=TtkCA2Fducw

My core interest is the semantic web. ScraperWiki (along with
Google Refine) looks like a perfect solution for moving from
unstructured to structured, linked data.

So now I am learning how to create scrapers. Unfortunately my
programming experience is with Java and JavaScript, not Python, PHP,
or Ruby.

My questions
- which language would be best for me to use to create scrapers?
- what percentage of scrapers are written in each of the three languages?

Zarino Zappia

Oct 15, 2011, 2:45:02 PM
to scrap...@googlegroups.com
Hi Ricky,

Python's the most popular language on ScraperWiki, has the most third-party libraries, and is arguably the simplest to pick up. PHP's syntax is a little closer to JavaScript, though, so you personally might find that easier, if a little limiting in the long run. My advice: just pick one and give it a go. We have lots of tutorials and documentation in all three languages (http://scraperwiki.com/docs), so you'll probably find you're up and running far quicker than you think.

Zarino

Thad Guidry

Oct 16, 2011, 11:49:44 PM
to scrap...@googlegroups.com
Hi Ricky,

You're not limited to just ScraperWiki, but for collaborating on and maintaining scrapers it is certainly the place to be!

You can also do fairly robust scraping with browser add-ons. One of my favorites is iOpus iMacros for Firefox and Google Chrome.

You can also do basic URL fetches (GETs) with Google Refine, and then use Refine's GREL syntax along with its Jsoup integration to parseHtml().select() to your heart's content.
I have begun to document HTML web scraping with Refine at http://code.google.com/p/google-refine/wiki/StrippingHTML
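
For a rough idea of what that parse-then-select step looks like outside Refine, here is a minimal sketch in Python (the language suggested earlier in the thread), using only the standard library. It is not Refine's or Jsoup's API: the class TagTextCollector, the function select_text, and the sample HTML are all made up for illustration, and it only selects by tag name rather than by full CSS selector. The HTML is an inline string so nothing touches the network; in a real scraper you would fetch the page first, for example with urllib.request.urlopen.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0       # greater than 0 while inside the target tag
        self.results = []    # one string per matched element
        self._buffer = []    # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        # Text of nested tags (e.g. <b>) is kept; the tags themselves are dropped.
        if self.depth > 0:
            self._buffer.append(data)


def select_text(html, tag):
    """Return the text content of each <tag> element in the HTML string."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results


# Sample input, invented for this example.
sample = (
    "<table><tr><td>Alice</td><td>42</td></tr>"
    "<tr><td>Bob <b>Jr.</b></td><td>7</td></tr></table>"
)
print(select_text(sample, "td"))
```

On ScraperWiki itself you would more likely reach for one of the third-party HTML libraries, which support proper selectors; the sketch just shows that the underlying idea is small.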
