Limit on full hash request


Chethan prakash

Feb 20, 2012, 2:56:04 AM
to google-safe-...@googlegroups.com
Is there an upper limit on the number of prefixes for which we can request full hashes in a single request?

As of now, any request for more than 4000 hashes fails.

Garrett Casto

Feb 21, 2012, 2:22:36 PM
to google-safe-...@googlegroups.com
4000 is the limit. I'm not sure if this is documented anywhere, but we never expected anyone to come close to hitting this number. The designed use case is that this is sent per page load, so at most you would be requesting the hash prefixes of every resource on a page. Why exactly are you requesting 4000 prefixes at a time?
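
To make the per-page-load use case concrete, here is a minimal Python sketch of collecting the hash prefixes for one page and refusing to exceed the 4000-prefix limit mentioned above. It assumes 4-byte SHA-256 prefixes and skips the URL canonicalization and host/path expansion a real client performs; `prefixes_for_page` and `hash_prefix` are hypothetical helper names, not part of any Google library.

```python
import hashlib

MAX_PREFIXES_PER_REQUEST = 4000  # limit stated in this thread
PREFIX_BYTES = 4                 # assumption: 4-byte SHA-256 prefixes


def hash_prefix(expression: str) -> bytes:
    """Return the leading bytes of the SHA-256 digest of one URL expression."""
    return hashlib.sha256(expression.encode("utf-8")).digest()[:PREFIX_BYTES]


def prefixes_for_page(expressions, limit=MAX_PREFIXES_PER_REQUEST):
    """Deduplicated prefixes for the resources of one page, in first-seen order.

    Raises ValueError if the page would need more prefixes than `limit`.
    """
    seen = set()
    prefixes = []
    for expression in expressions:
        prefix = hash_prefix(expression)
        if prefix not in seen:
            seen.add(prefix)
            prefixes.append(prefix)
    if len(prefixes) > limit:
        raise ValueError(
            f"{len(prefixes)} prefixes exceeds the per-request limit of {limit}"
        )
    return prefixes
```

A typical page references at most a few hundred resources, so a client built this way should stay far below the limit.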


Chetan prakash

Feb 22, 2012, 3:37:08 AM
to google-safe-...@googlegroups.com
We store the full hash alongside each prefix in our local DB, because we do not want to make any further calls to GSB when we do lookups.

So we request the full hashes for all the prefixes we receive in one GSB call, which is sometimes more than 4000.

Thanks

Sam Cleaver

Feb 22, 2012, 10:12:06 AM
to Google Safe Browsing API
This seems like a counter-intuitive way of doing it. By doing 4000
hash lookups at the same time as your update (so updating, retrieving
chunks, decoding, adding to the database, then calculating SHA-256 for
each and every prefix and looking up each one), you're putting a hell
of a strain on both your server and Google's. The whole idea of
switching to the chunk-based system is to make it more manageable. If
all developers started hammering the servers with requests for
full-length hashes (when they should be requesting them on the fly, as
required), then I'm sure GSB would be nowhere near as lenient as it
currently is with regard to the API.

Furthermore, it makes no sense at all to store the full-length hash for
every prefix when, if you implemented it properly, you'd only ever need
to make a request perhaps once in every 10-20 user requests.

--Sam
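
The on-the-fly approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the official client: prefixes are taken to be 4 bytes, `fetch_full_hashes` is a hypothetical stand-in for the real full-length hash request, and cache expiry is left out.

```python
import hashlib
from typing import Callable, Dict, Iterable, Set

PREFIX_BYTES = 4  # assumption: 4-byte SHA-256 prefixes


class OnDemandChecker:
    """Checks the local prefix set first and asks for full hashes only on a hit."""

    def __init__(
        self,
        local_prefixes: Set[bytes],
        fetch_full_hashes: Callable[[bytes], Iterable[bytes]],
    ) -> None:
        self.local_prefixes = local_prefixes
        # Hypothetical callable standing in for the full-length hash request.
        self.fetch_full_hashes = fetch_full_hashes
        self.cache: Dict[bytes, Set[bytes]] = {}
        self.remote_calls = 0

    def is_listed(self, expression: str) -> bool:
        full_hash = hashlib.sha256(expression.encode("utf-8")).digest()
        prefix = full_hash[:PREFIX_BYTES]
        # Most lookups stop here: no prefix match, so no remote call is made.
        if prefix not in self.local_prefixes:
            return False
        # Prefix hit: fetch the full hashes once, then answer from the cache.
        if prefix not in self.cache:
            self.remote_calls += 1
            self.cache[prefix] = set(self.fetch_full_hashes(prefix))
        return full_hash in self.cache[prefix]
```

A real client would also need to expire cached full hashes as the lists are updated; the sketch only shows why most lookups never reach the server.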



Garrett Casto

Feb 22, 2012, 12:11:12 PM
to google-safe-...@googlegroups.com
So I still don't think I understand why you don't want to make calls to our server during page load. Is it just for latency? We try to be very fast on these hash completion requests, and generally have an end-user latency of 50-100 ms. So as long as you send off the Safe Browsing request and the request for content at the same time, you should very rarely ever be slowing down page load. Last time I checked, our stats in Chrome suggested that this happens about 0.01% of the time. If it's for some other reason, I'd like to know what it is.
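
One way to picture "send both requests at the same time" is the Python sketch below. It is only an illustration: `fetch_content` and `check_safebrowsing` are hypothetical callables supplied by the caller, and a real proxy or browser would likely use its own asynchronous I/O rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def load_page(url, fetch_content, check_safebrowsing):
    """Start the content fetch and the Safe Browsing check together.

    Returns the content, or None if the check reports the URL as unsafe.
    Both callables are hypothetical stand-ins provided by the caller.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        content_future = pool.submit(fetch_content, url)
        verdict_future = pool.submit(check_safebrowsing, url)
        # Both calls are already in flight, so the total wait is roughly
        # the slower of the two rather than their sum.
        is_unsafe = verdict_future.result()
        content = content_future.result()
    return None if is_unsafe else content
```

With this shape, the check only delays the page when it takes longer than the content fetch itself.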

Chetan prakash

Feb 22, 2012, 1:52:25 PM
to google-safe-...@googlegroups.com
Thanks for the response.

The way our system is designed, we have a cron job that makes the call to GSB and updates the DB. Lookups are done by a separate component that accesses the same DB. For us the main goal is to reduce the lookup time; we are OK with the cron job taking a little more time.

One more point to add: GSB does not return a full-length hash for every add prefix. It only sends full-length hashes for prefixes that are in an add chunk but not in a sub chunk. Apart from the increase in data storage, are there any other drawbacks to this design?

Since we are storing the full-length hashes, we won't be calling GSB to get the hash every time a positive match happens. I am assuming that will also reduce the load on GSB, right?

Chethan Prakash

Patrick Kelley

Feb 22, 2012, 2:12:06 PM
to google-safe-...@googlegroups.com
It almost seems like a violation of the policy. However, they are simply trying to enumerate all full hashes, not attempting to convert these full hashes to plaintext. In any case, it definitely seems like an abuse of the API.

The client should store the lists as it receives them and make no attempt to convert a hashed list to plaintext. 


Maybe the API documentation should be modified to explicitly ban this enumeration in the section titled "HTTP Request for Full-Length Hashes".

Sam Cleaver

Feb 22, 2012, 2:56:17 PM
to Google Safe Browsing API
> Since we are storing the full length hashes , we wont be making call to GSB
> to get the hash every time when the positive match happens. I am assuming
> that will reduce the load on GSB also right?

The design is flawed. Most of the time your client will never even get
to the stage of having to do a full-length lookup, so it's not
reducing any load on GSB. Even if the client did do full-length
lookups regularly, hammering GSB with 4000 requests at once could be
compared to a DoS attack (my firewall would block you for sure).


Chetan prakash

Feb 22, 2012, 3:11:15 PM
to google-safe-...@googlegroups.com
1. We are making a single request with all the four prefixes at once, so we are not bombarding Google; we are actually making fewer requests.

2. I agree that most of the time we might not even need the full hash. But our use case is a little different. We will be looking up a lot of URLs per second, and we would like to optimize that flow.

Silence

Feb 22, 2012, 8:32:24 PM
to Google Safe Browsing API
I have the same scenario and also can't get all the full hashes.
Maybe we could use different accounts to fetch them in separate
requests, but I haven't tested it, so it may not work.



Garrett Casto

Feb 22, 2012, 9:06:29 PM
to google-safe-...@googlegroups.com
On Wed, Feb 22, 2012 at 12:11 PM, Chetan prakash <yours...@gmail.com> wrote:
> 1.We are making a single request with all the four prefixes at once, so we are not bombarding google, actually we are making less number of requests.

By 4 here, you mean up to 4000, yes? I don't think that you're going to be making that many requests, though your requests are obviously going to be more expensive to process.

> 2. I agree that most of the time we might not even be needing the full hash. But our use case is little different. We will be looking up lot of urls per second , and we would like to optimize that flow.

Like I said before, I don't think that speed is really a good reason for changing the workflow like this unless you have some good measurements that say otherwise. Really it doesn't matter much if one person makes these extra requests, but I hope you can see why we don't want everyone to be doing it. If for your particular use case (proxying large amounts of traffic?) this doesn't work, I'd like to know why.