Hi Pradeep,
On 2015-05-22 14:36, Pradeep Kumar wrote:
> Does this mean we can't run LDSpider on Hadoop?
I think that's right: LDSpider will not run on Hadoop as is.
LDSpider uses multiple threads, but those are Java threads
running inside a single JVM. Hadoop's parallel processing model,
which distributes work across the nodes of a cluster, is something
entirely different. Plus, LDSpider needs to keep state, which
would imply something like HBase in a Hadoop infrastructure.
LDSpider has been designed to crawl small- to medium-sized
datasets, and I think it does that well.
We are currently doing some internal development along the lines
of LDSpider. What are your requirements?
Best regards,
Andreas.