Is there a Clojure lib for web scraping?


Z.A

May 16, 2012, 5:33:05 AM
to Clojure
Hi
Is there a good Clojure lib for web scraping? I intend to collect
story links using regex from a news site's home page, then visit each
link to gather photos and text.
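
As a rough sketch of the link-collection step described here, using only clojure.core and java.net.URL: the sample HTML, the base URL, and the function names below are made up for illustration, and a regex like this will miss links written with single quotes or unusual attribute layouts.

```clojure
;; Hypothetical base URL and page content, standing in for a real home page.
(def base-url "http://news.example.com/")

(def sample-html
  (str "<a href=\"/story/1\">One</a>"
       "<a href=\"/story/2\">Two</a>"))

(defn extract-links
  "Returns the value of every double-quoted href attribute found in html."
  [html]
  ;; re-seq yields [whole-match group-1] vectors; keep only the captured group.
  (map second (re-seq #"href=\"([^\"]+)\"" html)))

(defn absolute-url
  "Resolves a possibly relative link against the base URL."
  [base link]
  (str (java.net.URL. (java.net.URL. base) link)))

(def story-urls
  (map #(absolute-url base-url %) (extract-links sample-html)))
```

Each entry in story-urls could then be fetched with (slurp url) and processed the same way to pull out text and image links.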

Thanks
Zubair

David Powell

May 16, 2012, 9:26:36 AM
to clo...@googlegroups.com
Check out enlive [1]; it works as both a web templating library and a
screen scraping library.
It basically lets you pull in HTML, and then match and extract content
using CSS-style rules.

[1] https://github.com/cgrand/enlive
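
For illustration, the CSS-style matching described above might look roughly like this. It is a minimal sketch that assumes enlive is on the classpath; the sample markup, the "story" class, and the story-links function are invented for the example and do not come from any real site.

```clojure
(ns scrape.sketch
  (:require [net.cgrand.enlive-html :as html]))

;; Made-up markup standing in for a news home page.
(def sample-page
  (str "<div class=\"story\"><a href=\"/news/1\">First story</a></div>"
       "<div class=\"story\"><a href=\"/news/2\">Second story</a></div>"
       "<a href=\"/about\">About</a>"))

(defn story-links
  "Returns the href of every <a> nested inside a div with class \"story\"."
  [nodes]
  (map #(get-in % [:attrs :href])
       (html/select nodes [:div.story :a])))

;; Parse the string into enlive nodes, then select from them.
(def links (story-links (html/html-snippet sample-page)))
```

For a live page, (html/html-resource (java.net.URL. "http://...")) would replace html-snippet, and html/text can pull the text content out of a selected node.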

--
Dave

Michael Klishin

May 16, 2012, 10:10:52 AM
to clo...@googlegroups.com
Z.A:

> Is there a good Clojure lib for web scraping? I intend to collect
> story links using regex from a news site's home page, then visit each
> link to gather photos and text.

Take a look at Crawlista [1]. It primarily tries to cover link and content extraction (there is no actual
crawling/requests/throttling part) but it was created for use cases similar to yours.

1. https://github.com/michaelklishin/crawlista

MK


Benny Tsai

May 16, 2012, 5:00:34 PM
to clo...@googlegroups.com
There's also clj-webdriver [1], which can drive a browser and also offers CSS-style querying capability.

Z.A

May 17, 2012, 1:19:57 AM
to Clojure
Thanks. I'll try all of these. Previously I was using Perl/WWW::Mechanize
for such uses. Now I want to do everything in Clojure.