> Feed readers are not expected to follow robots.txt, as they are not
> web robots, they are user agents.
Hrm. That is not an entirely accurate assumption to make ;)
I assume the disallow rules are put in to prevent feeds showing up in
search results, but the side effect is that they are preventing
feed-aware bots from efficiently indexing new posts.
It's mine (well, the company I'm working for, anyway).
It's a web crawler, but for sites with feeds, it tries to use the feed
links rather than following all links on the site.
Feedbot seems to be doing something similar... they say on their site
that they ignore robots.txt rules for feed XML files, but the problem
I'm having with feedproxy.google.com is that *everything* is
disallowed, not just the feeds...
Who would we contact to have this changed to the correct "/~r" pattern
if this is the case?
I'm hesitant to begin ignoring robots.txt with our application, which
will often fetch an HTML page simply to determine where associated
feeds are located.
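
For illustration, here is a minimal sketch of the "fetch an HTML page to
find its feeds" step, assuming the page advertises its feeds with
<link rel="alternate"> tags. The names FeedLinkParser and discover_feeds,
and the example.com URL, are made up for this example and are not taken
from any real crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects the href of every <link rel="alternate"> tag that names a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower().strip()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return absolute feed URLs advertised in an HTML document (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


# Hypothetical page content; a real crawler would fetch this over HTTP,
# after checking that robots.txt allows the request.
sample = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
<link rel="stylesheet" href="/style.css">
</head><body></body></html>"""

feeds = discover_feeds(sample, "http://example.com/blog/")
print(feeds)
```

A blanket Disallow stops this step before it starts, since the crawler
cannot fetch the HTML page that would point it to the feed.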
Best,
James
On Aug 19, 6:27 am, Franklin Tse [Community Expert] wrote:
We will soon be applying the exact same robots.txt pattern to
feedproxy.google.com as has been in place on feeds.feedburner.com for
quite some time:
User-agent: *
Disallow: /~a/
This should permit all readers/crawlers that previously retrieved feed
content, but now get a blocked response, to start working properly
again. Apologies for the inconvenience!
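
As a rough check of what this pattern means for a crawler that honors
robots.txt, the sketch below feeds the two rule sets discussed in this
thread to Python's standard urllib.robotparser. The user agent name and
the paths under feedproxy.google.com are placeholders, not real feeds.

```python
from urllib.robotparser import RobotFileParser


def build_rules(lines):
    """Parse robots.txt lines from memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# The pattern quoted above: only /~a/ is disallowed.
narrow = build_rules(["User-agent: *", "Disallow: /~a/"])

# The situation described earlier in the thread: everything is disallowed.
blanket = build_rules(["User-agent: *", "Disallow: /"])

agent = "ExampleFeedCrawler"  # placeholder user agent
feed_url = "http://feedproxy.google.com/~r/example-feed"  # placeholder path
other_url = "http://feedproxy.google.com/~a/example"      # placeholder path

narrow_feed_ok = narrow.can_fetch(agent, feed_url)
narrow_other_ok = narrow.can_fetch(agent, other_url)
blanket_feed_ok = blanket.can_fetch(agent, feed_url)

print(narrow_feed_ok, narrow_other_ok, blanket_feed_ok)
```

Under the narrow rule the /~r/ path is fetchable and only /~a/ is
blocked, while the blanket rule blocks both.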