Altruist
Jan 21, 2013, 3:01:49 PM
to ldsp...@googlegroups.com
Hello All,
I need to crawl a large set of URLs that are stored in a file. Since adding all the URLs to the Frontier at once would lead to out-of-memory exceptions, I need to be able to call the crawler iteratively, with a fixed-size set of URLs read from the file in each iteration. As I understand it, LDSpider does not support specifying input from a file through the API, but does so from the command line.
Is calling LDSpider iteratively the only way to handle crawling a large set of seed URLs? I have not implemented the iterative calling yet, but I wanted to double-check whether anyone has a better solution.
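For concreteness, this is roughly the iterative approach I have in mind. It is only a sketch: BatchedSeeds, forEachBatch, and the crawlBatch callback are placeholder names of my own, not LDSpider API, and the callback stands in for wherever a fresh crawler and frontier would be set up for each batch. The batch size of 1000 is an arbitrary guess.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: streams seed URLs from a file in fixed-size batches,
// so only one batch is held in memory at a time.
class BatchedSeeds {

    // Reads one URL per line, skips blank lines, and hands each full batch
    // (plus a final partial batch, if any) to crawlBatch.
    // Returns the number of batches that were handed over.
    static int forEachBatch(Path seedFile, int batchSize,
                            Consumer<List<String>> crawlBatch) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader =
                 Files.newBufferedReader(seedFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String url = line.trim();
                if (url.isEmpty()) {
                    continue; // ignore blank lines in the seed file
                }
                batch.add(url);
                if (batch.size() == batchSize) {
                    crawlBatch.accept(batch);
                    batches++;
                    // Start a new list so the previous batch can be garbage collected.
                    batch = new ArrayList<>(batchSize);
                }
            }
        }
        if (!batch.isEmpty()) {
            crawlBatch.accept(batch); // final partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: BatchedSeeds <seed-file>");
            return;
        }
        // Placeholder callback: this is where a new crawler would be created,
        // seeded with the batch, and run to completion before the next batch.
        int n = forEachBatch(Paths.get(args[0]), 1000,
            batch -> System.out.println("would crawl " + batch.size() + " seeds"));
        System.out.println(n + " batches processed");
    }
}
```

Each batch would get its own crawler run, so the frontier never holds more than one batch of seeds plus whatever links that run discovers.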
Thank You.