I have written a full service for checking URLs, domains and so on, and GSB is one aspect of the URL checks. Now that I am testing it, I am running my service against a list of many thousands of known bad URLs. This generates quite a lot of hits against the hash prefixes, which of course means that unless I have cached the full hashes for a prefix within the last 30 minutes, I have to ask the Google service for them. Because a lot of these URLs are in the database, I am generating lots of full hash requests.
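As a rough illustration of the 30-minute caching described above, here is a minimal Python sketch. It is not my actual code: the class name `FullHashCache`, the `fetch` callable (standing in for whatever makes the real full hash request) and the injectable `clock` are all hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, List, Tuple

CACHE_TTL_SECONDS = 30 * 60  # the 30-minute window mentioned in the post


class FullHashCache:
    """Keeps full hashes per prefix so repeat hits do not trigger new requests."""

    def __init__(self,
                 fetch: Callable[[bytes], List[bytes]],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical: performs the real full hash request
        self._ttl = ttl
        self._clock = clock
        # prefix -> (time the entry was fetched, full hashes for that prefix)
        self._entries: Dict[bytes, Tuple[float, List[bytes]]] = {}

    def get(self, prefix: bytes) -> List[bytes]:
        now = self._clock()
        entry = self._entries.get(prefix)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, no request needed
        hashes = self._fetch(prefix)  # cache miss or expired entry
        self._entries[prefix] = (now, hashes)
        return hashes
```

With a list of known bad URLs, almost every distinct prefix is a cache miss the first time it is seen, so the cache only helps with repeats and does not reduce the initial burst of requests.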
I am also trying to load test my service, so I am running the checks as fast as I can deliver them (and I have not even got to multiple processes yet). However, once I hit about 500 full hash requests (which happens pretty quickly in this case, of course), the call always returns 503 - Service unavailable, at least for a while.
So, I know that we are supposed to back off on requesting updates for the hashes and so on if a 503 occurs, and that is all implemented. But if the requests for full hashes are going to fail, and we are supposed to back off from asking for more for minutes or hours, then it is impossible to build a real-life system around GSB. It can work for a few users, where of course most of the requests will not be for malicious URLs, but I am eventually looking at lots of servers, and they will all appear to come from one IP address behind a NAT server.
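To make the problem concrete, here is a minimal Python sketch of a generic exponential back-off on 503 responses. The class name `BackoffState`, the one-minute base delay and the eight-hour cap are assumptions picked for illustration; they are not the values from the protocol documentation and not my actual implementation.

```python
class BackoffState:
    """Tracks consecutive 503 responses and when the next request is allowed."""

    def __init__(self, base_seconds: float = 60.0,
                 max_seconds: float = 8 * 3600.0) -> None:
        self.base_seconds = base_seconds  # assumed first delay
        self.max_seconds = max_seconds    # assumed upper bound on the delay
        self.consecutive_errors = 0
        self.next_allowed = 0.0           # timestamp before which we must wait

    def record(self, status: int, now: float) -> None:
        """Update the state after a response with the given HTTP status."""
        if status == 503:
            self.consecutive_errors += 1
            # Double the delay for each consecutive error, up to the cap.
            delay = min(self.base_seconds * 2 ** (self.consecutive_errors - 1),
                        self.max_seconds)
            self.next_allowed = now + delay
        else:
            # Any other response ends the back-off.
            self.consecutive_errors = 0
            self.next_allowed = now

    def may_request(self, now: float) -> bool:
        return now >= self.next_allowed
```

While `may_request` returns False, no URL that matches an uncached prefix can be confirmed, and since every server behind the NAT shares one IP address, they would presumably all end up waiting together.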
So, assuming that I am getting the 503 error because the service thinks I am making too many requests, is there any way for me to request that my full hash requests are not throttled? I know that some people probably abuse it and go through all the prefixes asking for all the full hashes immediately, but I am definitely not doing that.
It seems that there is an ability to request more usage for the query API, but I don't see anything about the Protocol v2 system. Perhaps this limit was just not thought through and I am the first person to be load testing an array of servers with a list of URLs that generate many hits?
Thanks for any help,
Jim
I will email this to the usage request email for the lookup API, in case it is the same person that would deal with this :)