Using a scheduler to preload URLs, but the links on those preloaded URLs are not scheduled


ghislain borremans

Feb 19, 2021, 3:29:04 PM
to Abot Web Crawler
At present I load, for example, 10 URLs from a database with a scheduler.
I pass the scheduler in the constructor and those pages get crawled, but the links found on those pages are not processed.
I assume this is because I added a scheduler to the constructor.
Is it possible to use the scheduler to load the first set of pages and then disconnect it so that the built-in scheduler takes over?
If so, how can this be done?

Best regards
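For reference, the preloading setup described above might look like the following (a sketch assuming Abot 2.x: the `Scheduler` type, the `PoliteWebCrawler` constructor parameter order, and null-for-default arguments are taken from the Abot source and should be checked against the version in use):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Abot2.Core;
using Abot2.Crawler;
using Abot2.Poco;

public static class PreloadedCrawl
{
    public static async Task RunAsync(IReadOnlyList<Uri> seedUris)
    {
        var config = new CrawlConfiguration { MaxPagesToCrawl = 1000 };

        // Preload the scheduler with the URIs read from the database.
        var scheduler = new Scheduler();
        foreach (var uri in seedUris)
            scheduler.Add(new PageToCrawl(uri));

        // The same scheduler instance keeps receiving the links discovered
        // during the crawl, so there is no separate "built-in" scheduler
        // to hand over to; passing null for the other dependencies makes
        // Abot use its defaults.
        var crawler = new PoliteWebCrawler(
            config, null, null, scheduler, null, null, null);

        await crawler.CrawlAsync(seedUris.First());
    }
}
```

Note that discovered links are judged internal relative to the root uri passed to `CrawlAsync`, which is why preloaded pages on other hosts may have their links skipped, as the reply below explains.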

sjdirect

Feb 22, 2021, 5:23:40 PM
to ghislain borremans, Abot Web Crawler

Adding links to the scheduler should still allow Abot/AbotX to process the links on each page. However, depending on the root uri (the first link crawled), you may need to adjust the following config settings.

/// <summary>
/// Whether pages external to the root uri should be crawled
/// </summary>
public bool IsExternalPageCrawlingEnabled { get; set; }

/// <summary>
/// Whether pages external to the root uri should have their links crawled. NOTE: IsExternalPageCrawlingEnabled must be true for this setting to have any effect
/// </summary>
public bool IsExternalPageLinksCrawlingEnabled { get; set; }

You can also change what the crawler considers "internal" by assigning a custom delegate to the crawler's IsInternalUriDecisionMaker property.
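Put together, the two settings and the delegate might be wired up as follows (a sketch assuming Abot 2.x; `allowedHosts` is a hypothetical set built from the preloaded database URLs, not part of Abot):

```csharp
using System;
using System.Collections.Generic;
using Abot2.Crawler;
using Abot2.Poco;

var config = new CrawlConfiguration
{
    // Crawl pages outside the root uri...
    IsExternalPageCrawlingEnabled = true,
    // ...and also follow the links found on those external pages.
    IsExternalPageLinksCrawlingEnabled = true
};

var crawler = new PoliteWebCrawler(config);

// Alternatively, widen what "internal" means so the preloaded hosts are
// scheduled like the root host. allowedHosts is a hypothetical example.
var allowedHosts = new HashSet<string> { "example.com", "example.org" };
crawler.IsInternalUriDecisionMaker = (uriInQuestion, rootUri) =>
    uriInQuestion.Host == rootUri.Host
    || allowedHosts.Contains(uriInQuestion.Host);
```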


ghislain borremans

Feb 23, 2021, 11:59:16 AM
to Abot Web Crawler
Thank you for the feedback. I will test these settings.

On Monday, February 22, 2021 at 11:23:40 PM UTC+1, sjdirect wrote:


Apr 30, 2021, 2:16:41 AM
to Abot Web Crawler
I suffer from a similar problem.

This has no effect:

            crawler.IsInternalUriDecisionMaker = (uriInQuestion, rootUri) =>
            {
                return true;
            };

            crawler.ShouldDownloadPageContentDecisionMaker = (crawledPage, crawlContext) =>
            {
                var decision = new CrawlDecision { Allow = true };
                return decision;
            };
Neither does this:

            var config = new CrawlConfiguration
            {
                IsExternalPageCrawlingEnabled = true,
            };
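One way to see why "nothing happens" is to hook Abot's crawl events and log each decision (a sketch assuming the Abot 2.x event names; `crawler` is the PoliteWebCrawler instance from the snippets above):

```csharp
// Log every stage so skipped pages report a reason (e.g. external page,
// robots.txt, crawl depth) instead of failing silently.
crawler.PageCrawlStarting += (sender, e) =>
    Console.WriteLine($"Crawling {e.PageToCrawl.Uri}");

crawler.PageCrawlCompleted += (sender, e) =>
    Console.WriteLine($"Completed {e.CrawledPage.Uri}");

crawler.PageCrawlDisallowed += (sender, e) =>
    Console.WriteLine($"Page disallowed {e.PageToCrawl.Uri}: {e.DisallowedReason}");

crawler.PageLinksCrawlDisallowed += (sender, e) =>
    Console.WriteLine($"Links skipped on {e.CrawledPage.Uri}: {e.DisallowedReason}");
```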

Should I switch to the commercial version? Thank you.