[Cross-posting to news:comp.infosystems.www.servers.misc, for my
questions aren't specific to Apache.]
[…]
> Baiduspider does not respect "/robots.txt"
I've just checked that it occasionally tries to GET /robots.txt
from one of my HTTP servers. I've yet to check whether it
actually respects it, though (I don't have one just now).
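
For what it's worth, a check like that could probably be scripted
against the access log. Below is only a rough sketch in Python, using
the standard urllib.robotparser module. The robots.txt content, the
bot names other than Baiduspider, and the request list are all made
up for illustration, and extracting (bot name, path) pairs from a
real log is left out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real server.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /private/
"""


def violations(robots_txt, requests):
    """Return the (bot, path) pairs that the given rules disallow.

    `requests` holds (bot name, path) pairs, assumed to have been
    extracted from an access log beforehand.  Pass the short bot
    name (e.g. "Baiduspider"), not the full User-Agent header:
    the parser only looks at the text before the first "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    found = []
    for bot, path in requests:
        # Fetching robots.txt itself is never a violation.
        if path == "/robots.txt":
            continue
        if not parser.can_fetch(bot, path):
            found.append((bot, path))
    return found


# Made-up sample of logged requests.
sample = [
    ("Baiduspider", "/robots.txt"),
    ("Baiduspider", "/index.html"),
    ("SomeOtherBot", "/index.html"),
    ("SomeOtherBot", "/private/x"),
]

for bot, path in violations(ROBOTS_TXT, sample):
    print(bot, "requested disallowed path", path)
```

A bot that fetches /robots.txt and then still shows up in the output
is presumably ignoring the rules, although a request made before the
file was last changed would be flagged too, so the timestamps need a
look as well.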
> nor repeated 403's. I block its entire set of IP ranges in my
> firewall.
BTW, is there a kind of blacklist of such unconscientious bots
(or networks)? Or a kind of DNSBL?
TIA.
--
FSF associate member #7257
No DNSBL that I'm aware of. There are a handful of web sites dedicated to
user-agent identification, but no list of malicious bots per se.