Starting with LDSpider


albertonm81

May 8, 2013, 6:58:47 AM
to ldsp...@googlegroups.com
Hello and thanks in advance.
I am starting to use LDSpider and I am already having problems with the first part of the development. I get errors on these lines:

frontier.setBlacklist(CrawlerConstants.BLACKLIST);
frontier.add(new URI(seedUri));

Eclipse says that the method setBlacklist is not defined and, for the second line, that the method add(URI) is not applicable for the type Frontier.

I also have some questions:
What is a Frontier, and where can I find information about it? I cannot find it in the documentation.
What is the seedUri? I imagine it refers to the URI used as the starting point of the crawl, doesn't it?

Thank you again and regards.

albertonm81

May 8, 2013, 7:48:06 AM
to ldsp...@googlegroups.com
OK, I was able to fix the problem with the URI and I now have all the answers about it, but I still have not found any information about Frontiers. It would be helpful if anyone could give me an answer or tell me where to find information.
Thank you, and sorry for these basic questions; I am just getting started.
Regards

Andreas Harth

May 9, 2013, 1:43:08 AM
to ldsp...@googlegroups.com
Hi,

On 08/05/13 04:48, albertonm81 wrote:
> Ok I could fix the problem with the URI and now I have all the answers
> about it, but I still didn't find information about the Frontiers. So it
> could be helpful if anyone could give me an answer about it or where to
> find information.

Read up on crawling on Wikipedia:

"A Web crawler starts with a list of URLs to visit, called the seeds. As
the crawler visits these URLs, it identifies all the hyperlinks in the
page and adds them to the list of URLs to visit, called the crawl
frontier. URLs from the frontier are recursively visited according to a
set of policies." [1]

LDSpider follows the standard HTML crawler architecture, but groks
RDF documents and links.
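
The seed/frontier loop described in the quote can be sketched in plain Java. This is only a toy illustration under stated assumptions, not LDSpider's own Frontier class: `FrontierSketch`, `crawl`, and the in-memory `links` map are made-up names, and a real crawler would fetch each document over HTTP and extract its links (RDF links, in LDSpider's case) instead of looking them up in a map. It needs Java 9 or later for `List.of` and `Map.of`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration of the seed/frontier loop; NOT LDSpider's Frontier class.
public class FrontierSketch {

    // links: hypothetical in-memory "web" mapping each URI to the URIs it links to.
    // Returns the URIs in the order they were visited.
    public static List<String> crawl(Map<String, List<String>> links, List<String> seeds) {
        Deque<String> frontier = new ArrayDeque<>(seeds); // URIs still to visit
        Set<String> seen = new HashSet<>(seeds);          // URIs already queued once
        List<String> visited = new ArrayList<>();

        while (!frontier.isEmpty()) {
            String uri = frontier.poll();                 // FIFO policy: breadth-first
            visited.add(uri);
            for (String target : links.getOrDefault(uri, List.of())) {
                if (seen.add(target)) {                   // add() returns false for duplicates
                    frontier.add(target);                 // newly discovered link joins the frontier
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "http://example.org/a", List.of("http://example.org/b", "http://example.org/c"),
            "http://example.org/b", List.of("http://example.org/c", "http://example.org/a"),
            "http://example.org/c", List.of());
        // Single seed; the frontier grows as links are discovered.
        System.out.println(crawl(links, List.of("http://example.org/a")));
    }
}
```

The "set of policies" in the quote corresponds here to two choices: the queue order (first in, first out, so the crawl is breadth-first) and the `seen` set, which keeps a URI from being queued twice. A blacklist would be one more such policy, applied before a URI is added to the frontier.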

Good luck!

Best regards,
Andreas.

[1] http://en.wikipedia.org/wiki/Web_crawler

albertonm81

May 9, 2013, 5:02:37 PM
to ldsp...@googlegroups.com

Thanks for the information about Frontiers, but maybe I didn't explain myself clearly. What I was also asking is whether there is any documentation on the methods of the Frontier class.
Thanks again.
Regards.

