Google site crawling/indexing policy rumor
I've been getting a lot of queries over the last few days regarding a
rumor, spreading rapidly, that Google has changed or will soon change
its site crawling policy to omit indexing of sites that do not have a
robots.txt file. The rumor appears to trace back to this Google
webmaster community video:
https://support.google.com/webmasters/community-video/360202946/fix-robots-txt-unreachable-error-website-not-indexing
This video does seem to say that robots.txt is the first file on a
site that Google will attempt to retrieve, and that if it can't be
retrieved, crawling and indexing stop at that point. Refusing to
crawl/index sites without robots.txt files would make no sense to me --
vast numbers of older sites (and many others) have never had robots.txt
files, because their operators never felt a need to publish crawling
restrictions.
Also note that this Google support doc:
https://support.google.com/webmasters/answer/6062598?hl=en
explicitly says that robots.txt files are not required for crawling/indexing.
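For what it's worth, the distinction that seems to underlie the video
is how the robots.txt request fails, not whether the file exists at
all: a plain 404 is documented as meaning "no crawling restrictions,"
while a robots.txt that is unreachable due to server errors is what
can actually stall crawling. Here's a minimal sketch (Python, with a
hypothetical site URL; not Google's actual logic) of how one might
check which case applies to a given site:

    import urllib.request
    import urllib.error

    def robots_txt_status(site):
        # Rough sketch of how a crawler interprets the robots.txt
        # fetch; NOT Google's actual logic, just the documented outlines.
        url = site.rstrip("/") + "/robots.txt"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return f"{resp.status}: robots.txt present; its rules apply"
        except urllib.error.HTTPError as e:
            if 400 <= e.code < 500:
                # Missing robots.txt (e.g. 404): no crawl restrictions.
                return f"{e.code}: no robots.txt; crawling is not blocked"
            # 5xx: the "robots.txt unreachable" case the video is about.
            return f"{e.code}: robots.txt unreachable; crawling may stall"
        except urllib.error.URLError as e:
            return f"unreachable ({e.reason}); crawling may stall"

    # Example (hypothetical site):
    # print(robots_txt_status("https://www.example.com"))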
Several sources at Google (thanks, all!) have now confirmed directly to
me that robots.txt files are NOT required for indexing, and no change
to this policy is in the pipeline. Of course, if your robots.txt file
points to a sitemap that doesn't actually exist, you might create a
suboptimal crawling situation, but again, Google's official statement
to me is that robots.txt files are NOT required to be present for a
site to be crawled and indexed by Google.
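To illustrate that sitemap point: a robots.txt file can advertise
sitemaps via "Sitemap:" lines, and those references are easy to
sanity-check yourself. A rough sketch (Python, hypothetical site URL,
assumes the robots.txt itself exists; not a Google tool):

    import urllib.request
    import urllib.error

    def check_sitemaps_in_robots(site):
        # Read robots.txt and verify that every Sitemap: URL it
        # lists actually resolves (illustrative sketch only).
        robots_url = site.rstrip("/") + "/robots.txt"
        with urllib.request.urlopen(robots_url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        for line in body.splitlines():
            if line.lower().startswith("sitemap:"):
                sitemap_url = line.split(":", 1)[1].strip()
                try:
                    with urllib.request.urlopen(sitemap_url, timeout=10) as s:
                        print(f"OK  {sitemap_url} ({s.status})")
                except urllib.error.URLError as e:
                    print(f"BAD {sitemap_url} ({e})")

    # Example (hypothetical site):
    # check_sitemaps_in_robots("https://www.example.com")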
- - -
--Lauren--
Lauren Weinstein
lau...@vortex.com (https://www.vortex.com/lauren)
Lauren's Blog: https://lauren.vortex.com
Mastodon: https://mastodon.laurenweinstein.org/@lauren
Signal: By request on need to know basis
Founder: Network Neutrality Squad: https://www.nnsquad.org
PRIVACY Forum: https://www.vortex.com/privacy-info
Co-Founder: People For Internet Responsibility
_______________________________________________
privacy mailing list
https://lists.vortex.com/mailman/listinfo/privacy