Hi All,
I have noticed that when LDSpider is given a URL to follow, it apparently reads the site's robots.txt. I say that because, when I point it at a specific page on a website, I see the following messages in the console:
1357845366 280 127.0.0.1 TCP_MISS/200 2154 GET
http://www.guardian.co.uk/robots.txt - NONE/- text/plain
Jan 10, 2013 2:16:06 PM com.ontologycentral.ldspider.http.internal.ResponseGzipUncompress process
INFO: gzip compression
Does this mean that LDSpider will respect the Disallow directives in robots.txt for all URLs passed into the com.ontologycentral.ldspider.frontier.Frontier class?
I am passing a set of URLs to the Frontier class, and I need to make sure that LDSpider skips any URL that matches a Disallow directive in the corresponding site's robots.txt file.
Can anyone please confirm LDSpider's behavior?
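For concreteness, this is the kind of check I am hoping LDSpider performs internally. The sketch below is only my own illustration using the JDK (it is not LDSpider's code): it collects the Disallow paths in the "User-agent: *" group and tests a URL path against them by prefix match, as the original Robots Exclusion standard describes.

```java
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {

    // Collect Disallow path prefixes from the "User-agent: *" group
    // of a robots.txt body. Comments (#) and empty values are ignored.
    public static List<String> disallowedPaths(String robotsTxt) {
        List<String> paths = new ArrayList<>();
        boolean applies = false;
        for (String raw : robotsTxt.split("\n")) {
            String line = raw.trim();
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash).trim();
            }
            String lower = line.toLowerCase();
            if (lower.startsWith("user-agent:")) {
                applies = line.substring("user-agent:".length()).trim().equals("*");
            } else if (applies && lower.startsWith("disallow:")) {
                String path = line.substring("disallow:".length()).trim();
                if (!path.isEmpty()) {
                    paths.add(path);  // empty Disallow means "allow everything"
                }
            }
        }
        return paths;
    }

    // A URL path is disallowed if it starts with any Disallow prefix.
    public static boolean isDisallowed(String path, List<String> disallowed) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /search\nDisallow: /sendarticle/\n";
        List<String> rules = disallowedPaths(robots);
        System.out.println(isDisallowed("/search?q=news", rules));   // true
        System.out.println(isDisallowed("/world/2013/jan", rules));  // false
    }
}
```

If LDSpider applies an equivalent filter to every URL it dequeues from the Frontier, that would answer my question.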
Thank You.