Hello,
I'm always getting a null or empty e.CrawledPage.Content.Text in the PageCrawlCompleted(object sender, PageCrawlCompletedArgs e) event handler.
The code looks like this:
var urls = new List<string>();
for (int nmid = 17518000; nmid <= 17618000; nmid += 1)
urls.Add("https://www.foo.bar/" + nmid.ToString() + "/product/data");
var config3 = new CrawlConfiguration
{
MaxConcurrentThreads = 100,
MaxPagesToCrawl = 1000000,
DownloadableContentTypes = "application/json;charset=UTF-8",
UserAgentString = _chrome,
IsExternalPageCrawlingEnabled = false,
IsExternalPageLinksCrawlingEnabled = false,
MaxCrawlDepth = 1000000
};
var scheduler = new UrlScheduler(urls);
var decisionDefault = new CrawlDecisionMaker();
var crawler = new PoliteWebCrawler(config, decisionDefault, null, scheduler, null, null, null, null, null);
crawler.PageCrawlCompleted += PageCrawlCompleted;
}
class UrlScheduler : Scheduler
{
/// <summary>
/// Instantiate the URL queue with a list of URLs.
/// </summary>
/// <param name="urls">URLs to add to the crawl queue.</param>
public UrlScheduler(IEnumerable<string> urls)
: base()
{
this.Add(urls.Select(url => new PageToCrawl(new Uri(url))));
}
}
foo.bar (and each URL in the list) returns valid JSON.
The log in debug mode shows no errors.
Thank you in advance.
Best,
Vlad