Make LDSpider behave like a non-linked-data spider

Altruist

Jun 17, 2013, 8:41:43 PM
to ldsp...@googlegroups.com
Hello All

Is it possible for LDSpider to behave like a regular crawler, one that does not necessarily look for semantic links but follows regular hyperlinks and crawls based on the seed file?

I know that this defeats the purpose of LDSpider, but there are certain features of LDSpider that I need, and I need it to function like a regular crawler. I am using the Java API. Any guidance would be greatly appreciated.

Thank you.

Andreas Harth

Jun 18, 2013, 6:11:54 PM
to ldsp...@googlegroups.com
Hi,
You could set up Any23, which may have HTML handlers. You have to
make sure that the links get parsed out and fed back into the queue
(IIRC via the LinkHandler). Also, you might need to change the
Accept header and the Content-Type header check when downloading.
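
For illustration only, here is a rough sketch of the three pieces mentioned above (parsing links out of HTML, feeding them back into a queue, and adjusting the Accept and Content-Type handling). It uses only the JDK and does not call LDSpider or Any23; the class HtmlLinkSketch, its methods, and the ACCEPT_HTML value are all hypothetical names made up for this example, and the regex is a naive stand-in for a real HTML parser. How to wire something like this into LDSpider's LinkHandler would need to be checked against the LDSpider source.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of LDSpider or Any23.
class HtmlLinkSketch {
    // Assumed Accept header value that asks for HTML instead of RDF formats.
    static final String ACCEPT_HTML = "text/html,application/xhtml+xml;q=0.9";

    // Naive href matcher; captures the target up to any '#fragment'.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"'#]+)[^\"']*[\"']",
            Pattern.CASE_INSENSITIVE);

    // Content-Type check: accept HTML responses, ignoring parameters such as charset.
    static boolean isHtml(String contentType) {
        if (contentType == null) return false;
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return type.equals("text/html") || type.equals("application/xhtml+xml");
    }

    // Parse hyperlinks out of a page and resolve them against the page's own URI.
    static List<URI> extractLinks(String html, URI base) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                URI resolved = base.resolve(m.group(1).trim());
                String scheme = resolved.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) {
                    links.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not valid URIs.
            }
        }
        return links;
    }

    // Breadth-first crawl: every extracted link is fed back into the queue.
    // 'fetch' returns the page body, or null if the page is missing or not HTML.
    // Returns the URIs that were attempted, in visiting order.
    static Set<URI> crawl(List<URI> seeds, Function<URI, String> fetch, int maxPages) {
        Deque<URI> queue = new ArrayDeque<>(seeds);
        Set<URI> visited = new LinkedHashSet<>();
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI next = queue.poll();
            if (!visited.add(next)) continue; // already seen
            String html = fetch.apply(next);
            if (html == null) continue;
            queue.addAll(extractLinks(html, next));
        }
        return visited;
    }

    public static void main(String[] args) {
        // In-memory "web" so the sketch runs without network access.
        Map<String, String> pages = Map.of(
                "http://example.org/", "<a href=\"/a.html\">A</a> <a href=\"#top\">top</a>",
                "http://example.org/a.html", "<a href=\"b.html#x\">B</a>");
        Set<URI> seen = crawl(List.of(URI.create("http://example.org/")),
                uri -> pages.get(uri.toString()), 10);
        System.out.println(seen);
    }
}
```

In a real crawl the fetch function would issue the HTTP request with ACCEPT_HTML as the Accept header and return null whenever isHtml rejects the response's Content-Type.
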

Good luck,
Andreas.