I'm working on converting an Abot project into an AbotX project for parallelized crawling.
The way I read it, each site should be seeded into the ParallelCrawlerEngine with a single URL.
But how do you handle a site that you want to seed with multiple URLs? For example, a few key pages on the site that regularly carry new or updated links, which you want to make sure get picked up and crawled right away.
I assume you aren't meant to create a separate SiteToCrawl for each one?
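For context, here's what I'd end up with if I did create one SiteToCrawl per seed URL, all pointing into the same site. I'm going from the AbotX readme's SiteToCrawlProvider example, so the exact setup may be off, and the URLs are placeholders:

```csharp
using AbotX2.Parallel;
using AbotX2.Poco;

var siteToCrawlProvider = new SiteToCrawlProvider();

// One SiteToCrawl per seed URL, even though all three
// belong to the same site -- this is what feels wrong.
siteToCrawlProvider.AddSitesToCrawl(new List<SiteToCrawl>
{
    new SiteToCrawl { Uri = new Uri("https://example.com/") },
    new SiteToCrawl { Uri = new Uri("https://example.com/news") },
    new SiteToCrawl { Uri = new Uri("https://example.com/releases") }
});
```

Is that the intended pattern, or is there a way to hand a single SiteToCrawl several seed URLs?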
Another thing that isn't clear to me from the AbotX docs: where do I inject custom implementations at the crawler level? My guess is to do it inside the CrawlerInstanceCreated event handler, but I'm not sure this is right:
crawlEngine.CrawlerInstanceCreated += (sender, eventArgs) =>
{
    eventArgs.Crawler.Impls.HtmlParser = new MyHyperlinkParser();
    eventArgs.Crawler.Impls.CrawlDecisionMaker = new MyCrawlDecisionMaker();
};
?