Crawler crawler = new Crawler(5);
Frontier frontier = new BasicFrontier();
frontier.add(new URI("http://**********")); -> This is where I put the URI of the dump, isn't it?
Next comes the LinkFilter. If I want to download only one dataset, for example DBpedia, without also downloading the datasets it links to, do I set that here? How?
LinkFilter linkFilter = new LinkFilterDefault(frontier);
crawler.setLinkFilter(linkFilter);
The next step is to set the ContentHandler. If I want to download datasets in any format, is it mandatory to use an Any23 server?
ContentHandler contentHandler = new ContentHandlers(new ContentHandlerRdfXml(), new ContentHandlerNx());
crawler.setContentHandler(contentHandler);
Finally I have to set the Sink, where all the information from the downloaded dataset is stored. How do I set it up to save that information as RDF?
OutputStream os = new FileOutputStream("/Users/*************"); -> Do I put here the path to the file where I want the information to be written?
Sink sink = new SinkCallback(new CallbackNQOutputStream(os));
crawler.setOutputCallback(sink);
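To make the Sink question above concrete: as far as I understand, CallbackNQOutputStream serializes each crawled statement as one N-Quads line into whatever OutputStream you hand it, so the FileOutputStream path simply decides which file those lines end up in. Here is a minimal stdlib-only sketch of that output side, without the crawler itself; the temp file and the sample quad are hypothetical, just to show the line format the sink would write.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SinkOutputSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical output file; in the crawler this is the path given to FileOutputStream.
        Path out = Files.createTempFile("crawl", ".nq");

        // A sample statement in N-Quads form: subject, predicate, object, graph, one per line.
        String quad = "<http://example.org/s> <http://example.org/p> "
                + "<http://example.org/o> <http://example.org/g> .";

        // Write it through a plain OutputStream, the way a sink writes crawled data.
        try (OutputStream os = new FileOutputStream(out.toFile())) {
            os.write((quad + "\n").getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back to show what landed on disk.
        System.out.println(Files.readAllLines(out).get(0));
    }
}
```

So, if I understand correctly, pointing the FileOutputStream at a file ending in .nq and letting the sink write to it should give me the crawled data as RDF in N-Quads form.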
Thank you, I look forward to your answers.
Regards