Is there a Clojure lib for web scraping?

Z.A

May 16, 2012, 5:33:05 AM

to Clojure

Hi
Is there a good Clojure lib for web scraping? I intend to collect
story links using regex from a news site's home page, then visit each
link to gather photos and text.

Thanks
Zubair

David Powell

May 16, 2012, 9:26:36 AM

to clo...@googlegroups.com

Check out enlive [1]; it works as both a web templating library and a
screen scraping library.
It basically lets you pull in HTML, and then match and extract content
using CSS-style rules.

[1] https://github.com/cgrand/enlive
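
As a rough illustration of the CSS-style matching described above, here is a minimal sketch. It assumes Enlive is on the classpath, and the sample markup, the `story` class name, and the `story-links` helper are all hypothetical, made up for this example rather than taken from any real site.

```clojure
(ns scrape-sketch
  (:require [net.cgrand.enlive-html :as html]))

;; Hypothetical markup standing in for a news site's home page.
;; For a live page, html/html-resource accepts a java.net.URL instead.
(def sample-page
  "<html><body>
     <div class=\"story\"><a href=\"/news/1\">First story</a></div>
     <div class=\"story\"><a href=\"/news/2\">Second story</a></div>
     <a href=\"/about\">About</a>
   </body></html>")

(defn story-links
  "Return the href of every <a> nested inside a div with class \"story\"."
  [nodes]
  (->> (html/select nodes [:div.story :a])   ; CSS-style descendant selector
       (map #(get-in % [:attrs :href]))))    ; pull the href attribute from each node

;; Parse the string into Enlive nodes, then extract the story links.
(story-links (html/html-snippet sample-page))
```

Each returned link could then be fetched and queried the same way to collect the text and photos of a story; using a selector rather than a regex tends to hold up better when the markup shifts slightly.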

--
Dave

Michael Klishin

May 16, 2012, 10:10:52 AM

to clo...@googlegroups.com

Z.A:

> Is there a good Clojure lib for web scraping. I intend to collect
> story links using regex from a news site's home page, then visit each
> link to gather photos and text.

Take a look at Crawlista [1]. It primarily tries to cover link and content extraction (there is no actual
crawling/requests/throttling part) but it was created for use cases similar to yours.

[1] https://github.com/michaelklishin/crawlista

MK

Benny Tsai

May 16, 2012, 5:00:34 PM

to clo...@googlegroups.com

There's also clj-webdriver [1], which can drive a browser and also offers CSS-style querying capability.

[1] https://github.com/semperos/clj-webdriver

Z.A

May 17, 2012, 1:19:57 AM

to Clojure

Thanks. I'll try all of these. Previously I was using Perl's
WWW::Mechanize for such uses. Now I want to do everything in Clojure.
