Documents located in the same folder will crawl while others will
return the error.
Net Diag Returns...
DNS Server xxx TCP connection timed out - ACL'ed out?
DNS Server xxx OK
DNS Server xxx OK
NTP Server xxx OK
NTP Server xxx OK
SMTP Server xxx Host not Responding
SMTP Server xxx OK
Test URL http://www.xxx.com/robots.txt Host not Responding
Test URL http://www.xxx.com/robots.txt OK - pingable
This is generally not good:
> Test URL http://www.xxx.com/robots.txt Host not Responding
Do you get this exact same result every time, or is it intermittent?
The fact that the error is intermittent during the crawl would also
point to a possible network issue. Some things to look at:
1. Do you see the crawler's requests coming through in your webserver logs?
2. What happens when you request http://www.xxx.com/robots.txt from a
browser? Clear your cache and restart your browser first, so that a
cached session isn't hiding any security in front of the page. The GSA
must receive a 200 or 404 for robots.txt in order to crawl the site
(there is a quick check sketched after this list).
http://code.google.com/apis/searchappliance/documentation/46/admin_crawl/Introduction.html#robots
If you need to log in, make sure you add the credentials for that URL
pattern to Crawler Access.
3. Check your network speed/duplex settings on the back of the
appliance (orange port) and make sure they match your switch settings.
You could also try auto-negotiate or another setting to see if it
helps.
4. Take a tcpdump between your webserver and your GSA. Do you see any
packets getting dropped?
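If you want a quick way to see exactly which status code comes back for
robots.txt from outside a browser, here is a rough Python sketch you can
run from another machine on the same network segment as the GSA. The URL
is just the placeholder from this thread, so substitute your real
hostname:

    import urllib.request
    import urllib.error

    # Placeholder host from this thread -- replace with your real webserver.
    url = "http://www.xxx.com/robots.txt"

    try:
        # Plain GET with no cookies, roughly what the crawler sends.
        with urllib.request.urlopen(url, timeout=10) as resp:
            print(resp.status, resp.reason)  # 200 means robots.txt was served
    except urllib.error.HTTPError as e:
        # A 404 is also fine for the GSA; 401/403/5xx are a problem.
        print(e.code, e.reason)
    except urllib.error.URLError as e:
        # DNS failure, connection refused, or timeout -- this is the
        # "Host not Responding" case.
        print("Connection problem:", e.reason)

Run it a few times in a row; if it flips between 200 and a timeout the
way the crawl does, that points at the network rather than the appliance.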
Let us know what you find.
Brian
On Oct 9, 7:29 am, "Michael Kamara" <mikekam...@gmail.com> wrote:
> Also, some documents will crawl, then receive the robots.txt error on a
> second crawl, then crawl.
>