LDSpider on Hadoop

Pradeep Kumar

May 22, 2015, 3:41:22 AM
to ldsp...@googlegroups.com
How do I run LDSpider in a distributed environment?

Andreas Harth

May 22, 2015, 7:07:24 AM
to ldsp...@googlegroups.com
On 2015-05-22 09:41, Pradeep Kumar wrote:
> How do I run LDSpider in a distributed environment?

You don't.

Pradeep Kumar

May 22, 2015, 8:36:38 AM
to ldsp...@googlegroups.com
Does this mean we can't run LDSpider on Hadoop?

Andreas Harth

May 22, 2015, 5:56:37 PM
to ldsp...@googlegroups.com
Hi Pradeep,

On 2015-05-22 14:36, Pradeep Kumar wrote:
> Does this mean we can't run LDSpider on Hadoop?

I think so.

LDSpider uses multiple threads, but those are Java threads running
inside a single JVM. The parallel processing model in Hadoop is
something entirely different. Plus, LDSpider needs to keep state
during a crawl, which would imply something like HBase in a Hadoop
infrastructure.
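
To illustrate the difference, here is a minimal sketch of a
thread-based crawler that keeps its state in shared memory. It is
written in Python for brevity and is not LDSpider's actual code; the
names crawl, demo_links and LINKS are made up for this example, and
the link graph is a hard-coded stand-in for the web. The point is
that every worker thread reads and updates the same "seen" set, which
only works because all threads live in one process. Independent
Hadoop tasks on different machines share no such memory, so that
state would have to move to an external store.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical link graph standing in for the web (no network access).
LINKS = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def demo_links(uri):
    """Return the outgoing links of a URI in the toy graph."""
    return LINKS.get(uri, [])


def crawl(seed, fetch_links, workers=4):
    """Breadth-first crawl with a thread pool and shared in-memory state."""
    seen = {seed}            # shared state: URIs already scheduled
    lock = threading.Lock()  # guards `seen` across worker threads
    frontier = [seed]

    def visit(uri):
        # Runs on a worker thread; records newly discovered URIs.
        new = []
        for link in fetch_links(uri):
            with lock:
                if link not in seen:
                    seen.add(link)
                    new.append(link)
        return new

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Visit the whole current round in parallel, then build the next.
            batches = pool.map(visit, frontier)
            frontier = [uri for batch in batches for uri in batch]
    return seen


if __name__ == "__main__":
    print(sorted(crawl("a", demo_links)))
```

Moving this to Hadoop would mean replacing the in-process set and
lock with a shared store that all tasks can reach, which is where
something like HBase would come in.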

LDSpider has been designed to crawl small- to medium-sized
datasets, and I think it does that well.

We are doing some internal development right now in the direction
of LDSpider. What are your requirements?

Best regards,
Andreas.