Last year we transitioned from DNS round-robin load balancing to placing our Tomcat servers running THREDDS behind an Apache mod_proxy_balancer setup for our pools of *.hycom.org servers (e.g., tds.hycom.org, ncss.hycom.org, etc.). All looked well until today, when I noticed that the robots.txt we had in place before was no longer present and had been replaced by an empty robots.txt, which has resulted in an uptick in web crawlers feeding off our data servers. The extra traffic has likely been overloading the servers and causing the OPeNDAP timeouts.
We have just fixed the missing robots.txt issue (http://tds.hycom.org/robots.txt) and are now waiting for the crawlers to recognize this update and "honor" the Disallowed URIs for THREDDS (which are endless rabbit holes of data). If the issue does not die down in the next 24 hours, we will look for other potential culprits or manually block certain crawlers.
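
For anyone who wants to see why the empty file mattered, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It compares how a well-behaved crawler would interpret an empty robots.txt versus one with Disallow rules. The rules, the `ExampleBot` user agent, and the dataset path below are hypothetical placeholders for illustration, not the actual contents of our robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the real file served at
# tds.hycom.org may list different paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thredds/dodsC/
Disallow: /thredds/catalog/
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if a compliant crawler may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical crawler name and dataset path.
    path = "/thredds/dodsC/some/dataset"
    print("empty robots.txt allows crawl:", is_allowed("", "ExampleBot", path))
    print("fixed robots.txt allows crawl:", is_allowed(ROBOTS_TXT, "ExampleBot", path))
```

An empty robots.txt places no restrictions on any crawler, which is consistent with the uptick we saw. Note that this only models crawlers that follow robots.txt; ones that ignore it would still need to be blocked manually.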