Following 301 redirect on robots.txt

Brendan Reekie

Jan 23, 2025, 2:32:50 PM
to ZAP User Group
Hi,

I'm currently running a spider scan on a domain, and the GET request for robots.txt receives an HTTP 301 with a new location for the file. Is there any configuration or option that enables the spider scan to follow the redirect?

Thanks in advance,
Brendan

kingthorin+zap

Jan 24, 2025, 11:35:44 AM
to ZAP User Group
This is kind of a lose-lose for us. If we follow it, that serves you well; if we don't follow it, that serves others well.
I "guess" we could add an option for it 🤷‍♂️ Though that's kinda questionable as well.

Now that I'm thinking about it more, does the standard even allow for redirects? I'm pretty sure it's supposed to exist at the root or under /.well-known (maybe not even that: https://en.wikipedia.org/wiki/Well-known_URI).
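If robots.txt is only ever looked up at the root of a host, as suggested above, its location can be derived from any URL on that host. A minimal Python sketch under that assumption (the function name `robots_txt_url` is hypothetical and not part of ZAP; it deliberately ignores /.well-known):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and authority (host and optional port), replace the
    # path with /robots.txt, and drop any query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For example, `robots_txt_url("https://example.com:8443/app/login?next=/")` returns `"https://example.com:8443/robots.txt"`.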

Here's how Google treats it, though I couldn't find a specific standard or RFC that states whether redirects are supported. Arguably, from my point of view, if it isn't a 200 right off the bat then it isn't in the expected location and shouldn't be found/obeyed. (Yes, I understand that ZAP isn't using it the same way as web crawlers, but we've got to be reasonable somehow.)

> When requesting a robots.txt file, the HTTP status code of the server's response affects how the robots.txt file will be used by Google's crawlers. The following table summarizes how Googlebot treats robots.txt files for different HTTP status codes.

> 3xx (redirection)
> Google follows at least five redirect hops as defined by RFC 1945 and then stops and treats it as a 404 for the robots.txt file. This also applies to any disallowed URLs in the redirect chain, since the crawler couldn't fetch rules due to the redirects.

> Google doesn't follow logical redirects in robots.txt files (frames, JavaScript, or meta refresh-type redirects).
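The Google behavior quoted above (follow a bounded number of redirect hops, then give up and treat the file as missing) could be sketched roughly as follows. This is an illustration under stated assumptions, not how ZAP or Googlebot is implemented: `fetch` is a hypothetical callable standing in for an HTTP client that does not follow redirects itself, and the limit is fixed at exactly five hops where Google only says "at least five".

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetch function: returns (status_code, location_or_None, body)
# and does NOT follow redirects on its own.
Fetch = Callable[[str], Tuple[int, Optional[str], str]]

REDIRECT_CODES = {301, 302, 303, 307, 308}
MAX_HOPS = 5  # assumption: exactly five, matching Google's "at least five"


def fetch_robots_following(
    url: str, fetch: Fetch, max_hops: int = MAX_HOPS
) -> Tuple[int, str]:
    """Fetch robots.txt, following up to max_hops redirects.

    Returns (status, body). A redirect without a Location header, or a
    chain longer than max_hops, is treated as a 404.
    """
    # One initial request plus at most max_hops followed redirects.
    for _ in range(max_hops + 1):
        status, location, body = fetch(url)
        if status not in REDIRECT_CODES:
            return status, body
        if not location:
            return 404, ""
        # Location may be relative; resolve it against the current URL.
        url = urljoin(url, location)
    # Too many hops: stop and treat robots.txt as not found.
    return 404, ""
```

Passing a fake `fetch` backed by a dict makes this easy to exercise offline; a redirect loop such as `/robots.txt` pointing back to `/robots.txt` ends in `(404, "")` after the sixth request.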

Simon Bennetts

Jan 28, 2025, 9:19:48 AM
to ZAP User Group
Agreed.
I think we should only handle 200s on a robots.txt file. Anything else is much more likely to be a redirect to the home page or an error page.
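The stricter policy proposed here (use robots.txt only when the first response is a 200, and never follow a redirect) could look something like the Python sketch below. It assumes, hypothetically, that the goal is to collect the paths named in `Allow`/`Disallow` lines for further crawling; `fetch` and `robots_paths_if_200` are illustrative names, not ZAP APIs.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical fetch function: returns (status_code, location_or_None, body)
# and does NOT follow redirects on its own.
Fetch = Callable[[str], Tuple[int, Optional[str], str]]


def robots_paths_if_200(url: str, fetch: Fetch) -> List[str]:
    """Return paths from Allow/Disallow lines, but only when the very first
    response is a 200. Any redirect or error status yields no paths."""
    status, _location, body = fetch(url)
    if status != 200:
        # A 3xx here is treated like any other non-200: the file is ignored
        # rather than chasing a possible redirect to a home or error page.
        return []
    paths: List[str] = []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            value = value.strip()
            if value:  # an empty Disallow names no path, so skip it
                paths.append(value)
    return paths
```

The design choice is simply to never look at the `Location` header: a moved robots.txt is indistinguishable here from a redirect to an unrelated page, so both are ignored.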

Cheers,

Simon