Iteratively call the crawler.

Altruist

Jan 21, 2013, 3:01:49 PM
to ldsp...@googlegroups.com

Hello All,

I need to crawl a large set of URLs that are stored in a file. Adding all of the URLs to the Frontier at once leads to an out-of-memory error, so I need to call the crawler iteratively, with a fixed batch of URLs read from the file in each iteration. As I understand it, LDSpider supports specifying seed input from a file on the command line, but not through the API.

Is calling LDSpider iteratively the only way to handle crawling a large set of seed URLs? I have not implemented the iterative approach yet, but I wanted to double-check whether anyone has a better solution.
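
For concreteness, here is a rough, untested sketch of the batching loop I have in mind. The LDSpider calls (Crawler, BasicFrontier, addUri, evaluateBreadthFirst) follow the wiki example, so the exact signatures and parameters may differ in the release you use; BATCH_SIZE and the class name are placeholders of my own.

import java.io.BufferedReader;
import java.io.FileReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import com.ontologycentral.ldspider.Crawler;
import com.ontologycentral.ldspider.frontier.BasicFrontier;
import com.ontologycentral.ldspider.frontier.Frontier;

// Reads seed URIs from a file in fixed-size batches and runs one
// crawl per batch, so the whole seed list never sits in memory.
public class BatchedSeedCrawl {

    static final int BATCH_SIZE = 1000; // tune to the available heap

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        List<URI> batch = new ArrayList<URI>();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.length() == 0) continue;
            batch.add(new URI(line));
            if (batch.size() == BATCH_SIZE) {
                crawlBatch(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) crawlBatch(batch); // final partial batch
        in.close();
    }

    static void crawlBatch(List<URI> seeds) throws Exception {
        // Fresh frontier and crawler per batch, so each iteration's
        // state can be garbage-collected before the next one starts.
        Frontier frontier = new BasicFrontier();
        for (URI u : seeds) {
            frontier.addUri(u);
        }
        Crawler crawler = new Crawler();
        // In practice an output sink would be set here as well
        // (e.g. via setOutputCallback, as in the wiki example).
        // Breadth-first crawl as in the wiki example; the exact
        // parameters (depth, max URIs, max PLDs) may differ in
        // your LDSpider version -- please check before running.
        crawler.evaluateBreadthFirst(frontier, 0, -1, -1);
    }
}

The point is just that a fresh Frontier and Crawler per batch keeps the memory footprint bounded by BATCH_SIZE rather than by the size of the whole seed file.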

Thank you.