crawling all pages using Abot Webcrawler

chakrapani beemanaboina

Sep 1, 2016, 5:52:04 AM

to Abot Web Crawler

Hi all,

I am using the Abot web crawler to crawl the "http://bestessayexperts.com/" website, but the crawler crawled only 26 pages.

The same website has 153,000 pages in the Google search index.

Below is the Abot web crawler configuration that I set in the app.config file:

<abot>
  <crawlBehavior
    maxConcurrentThreads="2"
    maxPagesToCrawl="1000"
    maxPagesToCrawlPerDomain="0"
    maxPageSizeInBytes="0"
    userAgentString="Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
    crawlTimeoutSeconds="0"
    downloadableContentTypes="text/html, text/plain"
    isUriRecrawlingEnabled="false"
    isExternalPageCrawlingEnabled="false"
    isExternalPageLinksCrawlingEnabled="false"
    httpServicePointConnectionLimit="200"
    httpRequestTimeoutInSeconds="15"
    httpRequestMaxAutoRedirects="7"
    isHttpRequestAutoRedirectsEnabled="true"
    isHttpRequestAutomaticDecompressionEnabled="false"
    isSendingCookiesEnabled="false"
    isSslCertificateValidationEnabled="false"
    isRespectUrlNamedAnchorOrHashbangEnabled="false"
    minAvailableMemoryRequiredInMb="5"
    maxMemoryUsageInMb="0"
    maxMemoryUsageCacheTimeInSeconds="0"
    maxCrawlDepth="1000"
    isForcedLinkParsingEnabled="false"
    maxRetryCount="0"
    minRetryDelayInMilliseconds="0" />
  <politeness
    isRespectRobotsDotTextEnabled="true"
    isRespectMetaRobotsNoFollowEnabled="false"
    isRespectAnchorRelNoFollowEnabled="false"
    isIgnoreRobotsDotTextIfRootDisallowedEnabled="false"
    robotsDotTextUserAgentString="abot"
    maxRobotsDotTextCrawlDelayInSeconds="5"
    minCrawlDelayPerDomainMilliSeconds="0" />
  <extensionValues>
  </extensionValues>
</abot>

How can I crawl all pages using the Abot web crawler?

Waiting for your reply. Thanks in advance.

Regards,

Chakrapani

sjdi...@gmail.com

Sep 1, 2016, 12:12:10 PM

to chakrapani beemanaboina, Abot Web Crawler

I ran the crawl through Screaming Frog (another crawler) and it found 22 pages, so it doesn't look like your configuration is the issue. Abot can only crawl the hyperlinks that it finds. When I look at that home page I see only 13 anchor tags, and the pages that I clicked on have pretty much the same links (mostly just nav links). I suspect that there is no way for a crawler to get to all the pages that you see in your Google index search (i.e., there is no link path to all of them). I did notice that there is a sitemap.xml with a ton of links. I suspect that the sitemap was submitted to Google and that's how it knows about them. Abot does not crawl the sitemap.xml file for links, but since it is so flexible you could modify its behavior to do so. A little more information here. Maybe you can reply to that thread and ask what his solution was.
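
To make the sitemap idea concrete, here is a minimal sketch of pulling the page URLs out of a sitemap document so they could be used as extra crawl targets. It is written in Python with only the standard library and is independent of Abot; the function name `extract_sitemap_urls` and the `example.com` sample are hypothetical, and it assumes the sitemap follows the standard sitemaps.org format. Downloading the file and handing the URLs to Abot are not shown.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


# Hypothetical sample standing in for the real sitemap.xml of a site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/page-1</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE):
        print(url)
```

Each URL returned this way is a page the crawler might never reach by following links alone, which would explain the gap between the 26 crawled pages and the much larger number in the search index.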


just...@gmail.com

Nov 10, 2017, 3:08:09 PM

to Abot Web Crawler

I used your configuration to crawl http://www.classgist.com and it crawled over 100 pages. Your configuration isn't the problem.
