HarvestMan crawler 2.0 beta released!

Lukasz Szybalski

Nov 5, 2008, 7:09:00 PM
to harvestm...@lists.berlios.de
How are you crawling and downloading websites, files, and images?
Do you need something better?
It's time for a change!
Download the beta version of the HarvestMan crawler today!

HarvestMan is a modular, extensible and flexible web crawler program
written in pure Python. HarvestMan can be used to download files from
websites according to a number of customized rules and constraints. It
can be used to find information from websites matching keywords or
regular expressions. The latest version of HarvestMan supports more
than 60 customization options.
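
To make "matching keywords or regular expressions" concrete, here is a
minimal sketch of that kind of filter. It uses only the Python standard
library (written for Python 3) and is not HarvestMan's own API; the
function name find_matches and the sample text are assumptions made up
for illustration.

```python
import re


def find_matches(text, keywords=(), patterns=()):
    """Return the keywords and regex patterns that occur in text.

    Hypothetical helper for illustration; not part of HarvestMan.
    """
    lowered = text.lower()
    # Keywords are matched case-insensitively as plain substrings.
    hits = [kw for kw in keywords if kw.lower() in lowered]
    # Patterns are matched as regular expressions anywhere in the text.
    hits += [p for p in patterns if re.search(p, text)]
    return hits


# Sample page text (made up for this example).
page = "HarvestMan 2.0.4 beta is a web crawler written in pure Python."
print(find_matches(page, keywords=["crawler", "java"],
                   patterns=[r"\d+\.\d+\.\d+"]))
```

A real crawl applies rules like these to each fetched page, so only
pages that produce a match are kept or reported.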

Download the files here:
http://harvestman-crawler.googlecode.com/files/Harvestman-2.0.4beta.tar.gz

Unpack and install:
tar -xzvf Harvestman-2.0.4beta.tar.gz
cd Harvestman-2.0.4beta
python setup.py install

Run the self-test, then create a config file:
harvestman --selftest
harvestman --genconfig
(--genconfig opens a simple web GUI. Add the site you want to crawl
and all its details, then save the config XML file.)

Run harvestman with the saved config file:
harvestman -C mycrawl.xml
or drive harvestman from the command line; -h lists the options:
harvestman -h
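
If you want to start a crawl from a script, the run step can be wrapped
in a few lines of Python. This is only a sketch for Python 3: the helper
names build_command and run_crawl are made up for illustration, and the
only harvestman option used is the -C flag shown in this post.

```python
import shutil
import subprocess


def build_command(config_path):
    """Build the argument list for a config-file crawl."""
    # "-C <file>" is the flag this post uses to run from a saved config.
    return ["harvestman", "-C", config_path]


def run_crawl(config_path):
    """Run the crawl and return its exit code, or None if not installed."""
    # Skip quietly when the harvestman executable is not on PATH.
    if shutil.which("harvestman") is None:
        return None
    return subprocess.run(build_command(config_path), check=False).returncode


if __name__ == "__main__":
    print(build_command("mycrawl.xml"))
```

Passing the command as a list avoids shell quoting problems if the
config file name contains spaces.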

Project website:
http://code.google.com/p/harvestman-crawler/


Please forward this to anybody who might be interested!

Thank you,
Harvestman Team
