Local Database cache, removals, and indexes.

Eriq VanBibber

Apr 20, 2018, 3:06:09 PM
to Google Safe Browsing API
I'm in the process of implementing threatListUpdates:fetch to maintain a local database for processing.

Something that is unclear from the docs relates to index ordinal positions and removals.

As I understand this process, I 'fetch' the "short" hashes as an ordered list of values. The returned list is split into entries by the "hash size".
(I also understand that it's base64 encoded.)
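For readers following along, the decoding step can be sketched in Python. This assumes the JSON shape `{"prefixSize": 4, "rawHashes": "<base64>"}` inside a `rawHashes` addition, where the decoded bytes are `prefixSize`-byte prefixes laid end to end with no separators; the function name is hypothetical.

```python
import base64


def split_raw_hashes(raw_hashes_b64: str, prefix_size: int) -> list[bytes]:
    """Decode a base64 rawHashes string and cut it into fixed-size prefixes.

    Hypothetical helper: assumes the decoded bytes are prefixes of exactly
    `prefix_size` bytes each, concatenated with no separators.
    """
    blob = base64.b64decode(raw_hashes_b64)
    if prefix_size <= 0 or len(blob) % prefix_size != 0:
        # A length that doesn't divide evenly suggests a truncated or corrupt payload.
        raise ValueError("decoded length is not a multiple of prefixSize")
    return [blob[i:i + prefix_size] for i in range(0, len(blob), prefix_size)]


encoded = base64.b64encode(bytes.fromhex("f1630001f1630002")).decode("ascii")
print([p.hex() for p in split_raw_hashes(encoded, 4)])  # ['f1630001', 'f1630002']
```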

So, if I have the following imaginary hashes in an initial fetch (FULL_UPDATE)...

Array
Index  |  HashCode
000    |  0xF1630001
001    |  0xF1630002
002    |  0xF1630003
003    |  0xF1630004
004    |  0xF1630005
005    |  0xF1630006

And then I later make a second request for updates (using the client 'state' value), and that result has:

ADDITIONS:
Array
Index  |  HashCode
000    |  0xF16300A1
001    |  0xF16300B2

REMOVALS:
Removal Index:
[1]
[4]

I assume we add the 2 "new" hashes to the end of the previous list (giving a total of 8 values now).
Then after adding, we remove index "1" and "4" from the original list of 6?

I suspect, then, that I have to process the removal indexes in reverse order, since each removal will cause the indexes of the items after it to change.
Either that, or I first "identify" the hashes by index, then reprocess the list and remove each one.
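Both strategies give the same result as long as every removal index refers to the list as it was before any removal is applied. A hypothetical Python sketch of each (names are illustrative only):

```python
def _check_indices(indices: list[int], length: int) -> None:
    """Reject indices that don't point into the original list."""
    if any(i < 0 or i >= length for i in indices):
        raise IndexError("removal index out of range")


def remove_descending(items: list, indices: list[int]) -> list:
    """Delete by position, highest index first, so each deletion only shifts
    elements whose indices have already been handled."""
    _check_indices(indices, len(items))
    result = list(items)
    for i in sorted(set(indices), reverse=True):
        del result[i]
    return result


def remove_by_lookup(items: list, indices: list[int]) -> list:
    """Resolve every index against the unmodified list, then rebuild it in
    one pass without those positions."""
    _check_indices(indices, len(items))
    doomed = set(indices)
    return [x for i, x in enumerate(items) if i not in doomed]


hashes = ["F1630001", "F1630002", "F1630003", "F1630004", "F1630005", "F1630006"]
print(remove_descending(hashes, [1, 4]))
# ['F1630001', 'F1630003', 'F1630004', 'F1630006']
```

This only covers the mechanics of deleting by position; which list the indices refer to, and whether additions come first, is what the rest of the thread settles.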

Or, are we supposed to only "mark" indexes 1 and 4 as removed, but let them continue to occupy the index so that the index position of hashes never change?

The previous part is only slightly confusing, and only because nothing is written about it.

However, what happens on a 3rd update?

I'm quite concerned about index restructuring in this process.
Assuming the 3rd update is not a full_update replacement, if the removals in the 3rd update also reference indexes 1 and 4, do those indexes refer to the hashes now in those positions as a result of the previous removal, or to the same hashes as before?

I hope the ambiguity around removals is clear and can be addressed.

-Eriq



Ben Sanders

Apr 23, 2018, 11:48:45 AM
to Google Safe Browsing API
The logic is as follows:
1. Remove old indexes before adding.
2. When adding, insert the new items into your data structure and sort them to get the new indexes.

Practically, implementations don't have to keep the sorted version of a list in memory, instead opting for a hashmap or similar.

That means on update, they briefly sort all the hash prefixes for a particular list in an auxiliary structure, and then remove the applicable items from their hashmap, and then just delete their sorted copy (and finally, insert new items into the map).
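Ben's two rules can be sketched in Python as follows. To keep it short, this version stores each list as a plain sorted list of `bytes` rather than a hashmap, and relies on Python's `bytes` comparison (byte by byte, with a shorter prefix sorting first), which we assume matches the API's lexicographic order. `apply_partial_update` is a hypothetical name.

```python
def apply_partial_update(db: list[bytes], removal_indices: list[int],
                         additions: list[bytes]) -> list[bytes]:
    """Apply one partial update to a single threat list.

    Removal indices point into the database *as sorted before this update*,
    so they are resolved first; additions are merged afterwards and the
    result is re-sorted, which defines the indices the next update uses.
    """
    current = sorted(db)
    doomed = set(removal_indices)
    if any(i < 0 or i >= len(current) for i in doomed):
        raise IndexError("removal index out of range")
    kept = [h for i, h in enumerate(current) if i not in doomed]
    return sorted(kept + list(additions))


old = [bytes.fromhex(h) for h in
       ["f1630001", "f1630002", "f1630003", "f1630004", "f1630005", "f1630006"]]
second = apply_partial_update(old, [1, 4],
                              [bytes.fromhex("f16300a1"), bytes.fromhex("f16300b2")])
print([h.hex() for h in second])
# ['f1630001', 'f1630003', 'f1630004', 'f1630006', 'f16300a1', 'f16300b2']
```

Under these rules, a third update that again lists removals [1, 4] refers to the re-sorted list, so it would remove `f1630003` and `f16300a1`, not the entries removed the first time.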

Eriq VanBibber

Apr 23, 2018, 4:44:04 PM
to Google Safe Browsing API
OK, just so I'm sure I understand: the indexes for removal are based on the entire hash set being sorted first. The sorting is ordinal/byte-based as well, I assume?

I downloaded the full hash set (which is currently over 2 MB of 4-byte hashes!). Unless we create an index for sorting, sorting this list each time could become CPU intensive. Are we guaranteed that the 'rawHashes' value returned is "pre-sorted"?

Also, this leads to a second question about hash size. Since it is possible to have different hash sets for different key sizes, I assume the indexing for the different key sizes is separate and independent, correct? Should each hash set for a key size be kept separate and sorted separately when removing values?

Thanks much.

Ben Sanders

Apr 23, 2018, 5:29:32 PM
to google-safe-...@googlegroups.com
It is for the entire set, but each list is considered separately. So the SOCIAL_ENGINEERING list is going to be separate from a MALWARE list, which is already indicated in the responses. I believe the lists do arrive sorted on initial download, but I don't think that is guaranteed. They are certainly in-order if you request Rice encoding. 
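Because sorted order on download isn't guaranteed, one cheap safeguard is to verify the order in a single linear pass and only sort when needed. A minimal sketch with hypothetical names; note that in Python itself `sorted()` is already close to linear on presorted input (Timsort), so the explicit check mostly earns its keep as a diagnostic or in environments whose sort doesn't exploit existing order.

```python
from itertools import islice


def is_sorted(prefixes: list[bytes]) -> bool:
    """True if every prefix is <= its successor in byte-wise order."""
    return all(a <= b for a, b in zip(prefixes, islice(prefixes, 1, None)))


def ensure_sorted(prefixes: list[bytes]) -> list[bytes]:
    """Return the prefixes in byte-wise order, sorting only if necessary."""
    return prefixes if is_sorted(prefixes) else sorted(prefixes)
```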

A mitigating factor for the CPU usage is that the update call should only be done once per half hour (as prescribed by the server responses), so while it does use some CPU to sort during update, it is infrequent. The API is designed to save bandwidth (as well as being privacy preserving) for end users, most of whom will only rarely need to perform a full hash check.

"Removals are zero-based indices in the lexicographically-sorted client database pointing at entries that should be removed from the local database."

So the sorted lists could have hashes in order:
0xaabbccdd 0xaabbccddee 0xffbbccdd

The removals aren't split by prefix size the way the additions are.
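Since additions arrive grouped by prefix size while removals do not, one way to apply this is to merge all size groups for a list into one collection, sort it byte-wise, and resolve removal indices against that. A hypothetical sketch using Ben's three example prefixes:

```python
def merge_additions(additions_by_size: dict[int, list[bytes]]) -> list[bytes]:
    """Merge per-size addition groups into a single byte-wise sorted list."""
    merged: list[bytes] = []
    for size, group in additions_by_size.items():
        if any(len(p) != size for p in group):
            raise ValueError(f"prefix of the wrong length in the {size}-byte group")
        merged.extend(group)
    return sorted(merged)


db = merge_additions({
    4: [bytes.fromhex("aabbccdd"), bytes.fromhex("ffbbccdd")],
    5: [bytes.fromhex("aabbccddee")],
})
print([p.hex() for p in db])  # ['aabbccdd', 'aabbccddee', 'ffbbccdd']
# A removal index of 1 would therefore refer to the 5-byte prefix aabbccddee.
```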

--
You received this message because you are subscribed to a topic in the Google Groups "Google Safe Browsing API" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/google-safe-browsing-api/0ujjVp_NeWc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to google-safe-browsi...@googlegroups.com.
To post to this group, send email to google-safe-...@googlegroups.com.
Visit this group at https://groups.google.com/group/google-safe-browsing-api.
For more options, visit https://groups.google.com/d/optout.

Eriq VanBibber

Apr 23, 2018, 5:47:11 PM
to Google Safe Browsing API
Ben,

Thanks for the prompt responses.  It's much appreciated.
To explain the intended use: this is NOT something that an end user will be accessing from a computer or device directly.
We're building a new DNS proxy solution with some advanced features for global DNS locality, optimization, and security. From a security perspective, we want our DNS proxy to have an option to check hostnames against Google's database so that the DNS query can be dropped or refused before a browser or other app even has a chance. As such, an enterprise may have 100s or 1000s of users funneling through this solution.
We hope to avoid even providing IP addresses for bad names with this solution, which would cut down on much of the traffic.
As such, I have to harden and optimize this aspect as much as possible.

Also, since I will only have hostname values, I assume I'll simply be constructing the value as http://dns-host-name-queried/ for the hash lookup... correct? I haven't seen mention of a separate hostname/domain-name list.

Regarding the removal indexes: when we get hash sets of different key sizes, should we union all of them first and then sort, and that gives the indexes for removals?
If so, would a removal of (for example) a 5-byte hash ALWAYS have an index greater than that of the last 4-byte hash in the ENTIRE database?

E.g. (I'm using one char per byte for brevity):

0x0000
0x0001
0x0004
0x0303
0x0401
0x00131
0x00313

A removal of the 5-byte hash '00313' would have an index of 6. Is this what is being said?
Conceptually, by sorting lexicographically, larger hash sizes appear after smaller ones... is that the intent?

-Eriq

Ben Sanders

Apr 23, 2018, 6:24:52 PM
to google-safe-...@googlegroups.com
The API is definitely optimized for end user devices, but of course it will still work for your use case. The caching described at https://developers.google.com/safe-browsing/v4/caching will benefit your implementation even more than the ones built into browsers, as many people could get the same matching hash prefix from clicking the same link.

Since you're only using hostnames, you will check fewer permutations (maximum 5), following https://developers.google.com/safe-browsing/v4/urls-hashing#suffixprefix-expressions

For the domain "a.b.c.d.e.f.g"
You would try the hashes of:
"a.b.c.d.e.f.g/"
"c.d.e.f.g/"
"d.e.f.g/"
"e.f.g/"
"f.g/"

Note how b.* was skipped for this deep subdomain. Also a "/" is appended, but the scheme is stripped.
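The host-suffix rule can be sketched as follows. This is our reading of the linked suffix/prefix section, not Google's code: it assumes the hostname is already canonicalized apart from case and a trailing dot (DNS names often arrive as `name.`), skips the bare top-level domain, and checks IP addresses only as-is. The function name is hypothetical.

```python
import ipaddress


def host_suffix_expressions(host: str) -> list[str]:
    """Return up to five host-only expressions, each with a trailing '/'."""
    host = host.rstrip(".").lower()
    try:
        ipaddress.ip_address(host)
        return [host + "/"]  # IP hosts are checked exactly, with no suffixes
    except ValueError:
        pass
    parts = host.split(".")
    hosts = [host]  # the exact hostname is always checked
    # Start from the last five labels, drop one leading label at a time,
    # and stop before the bare top-level domain.
    for i in range(max(len(parts) - 5, 0), len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate not in hosts:
            hosts.append(candidate)
    return [h + "/" for h in hosts]


print(host_suffix_expressions("a.b.c.d.e.f.g"))
# ['a.b.c.d.e.f.g/', 'c.d.e.f.g/', 'd.e.f.g/', 'e.f.g/', 'f.g/']
```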

Regarding sorting: today I have learned something new, that there are multiple definitions of lexicographic ordering. According to https://en.wikipedia.org/wiki/Lexicographical_order, the version you are talking about (where shorter 'words' are always before longer ones) is used in combinatorics and is also called shortlex order. This is not how the API does it.

The way the API does it is like if you were sorting for a printed dictionary:
cat
catty
caw

So longer prefixes can be interleaved in shorter ones for the full ordering. Since these are hash prefixes, you could almost just consider the rest of the bytes of the hash 0x00 for sorting purposes, though that wouldn't be very memory efficient (so 0xaabbccdd00000000000000....). There would still be a potential issue sorting 0xaabbccdd against 0xaabbccdd00, but that shouldn't ever happen.
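The difference between the two orderings is easy to see with Python `bytes`, whose default comparison is the plain lexicographic ("printed dictionary") kind. A small illustration using Ben's example words:

```python
words = [b"caw", b"catty", b"cat"]

# Plain lexicographic ("printed dictionary") order. Python compares bytes
# element by element, and a value that is a prefix of another sorts first.
dictionary_order = sorted(words)

# Shortlex: shorter values always first, equal lengths compared
# lexicographically. Per this thread, the API does NOT use this order.
shortlex_order = sorted(words, key=lambda w: (len(w), w))

# Ben's mental model: pad every prefix with 0x00 up to the 32-byte SHA-256
# length. For inputs like these it matches dictionary order; it would only
# tie on pairs such as aabbccdd vs aabbccdd00, which shouldn't occur.
padded_order = sorted(words, key=lambda w: w.ljust(32, b"\x00"))

print(dictionary_order)  # [b'cat', b'catty', b'caw']
print(shortlex_order)    # [b'cat', b'caw', b'catty']
print(padded_order)      # [b'cat', b'catty', b'caw']
```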

Fortunately, updates are still infrequent, so hopefully the amortized CPU impact is small.



Eriq VanBibber

Apr 23, 2018, 6:46:13 PM
to Google Safe Browsing API
Thanks for the detailed response.

I'll be sure to use a non-collated sort in the database where I'll save this info. I now have to see how SQLite sorts by default; I may just have to convert to hex strings first and then sort as text, but I'll figure something out.
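For what it's worth, SQLite's documentation states that BLOB values are compared with `memcmp()`, so a column holding raw prefix bytes as BLOBs should already sort in the byte-wise order the API uses, without a collation or hex conversion. A minimal sketch using Python's built-in `sqlite3` module; the table, column, and function names are hypothetical.

```python
import sqlite3
from typing import Optional


def open_prefix_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical one-row-per-prefix schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prefixes (list_id TEXT NOT NULL, prefix BLOB NOT NULL)")
    # The index lets ORDER BY prefix within one list avoid a full sort.
    conn.execute("CREATE INDEX idx_list_prefix ON prefixes (list_id, prefix)")
    return conn


def prefix_at(conn: sqlite3.Connection, list_id: str, index: int) -> Optional[bytes]:
    """Return the prefix at a zero-based position in byte order, or None."""
    row = conn.execute(
        "SELECT prefix FROM prefixes WHERE list_id = ? ORDER BY prefix LIMIT 1 OFFSET ?",
        (list_id, index),
    ).fetchone()
    return row[0] if row else None


conn = open_prefix_db()
data = [bytes.fromhex(h) for h in ("ffbbccdd", "aabbccddee", "aabbccdd")]
conn.executemany("INSERT INTO prefixes VALUES (?, ?)", [("MALWARE", p) for p in data])
print(prefix_at(conn, "MALWARE", 1).hex())  # aabbccddee
```

Two caveats: values must be stored consistently as BLOBs (SQLite sorts every TEXT value before every BLOB value), and `OFFSET` walks rows one by one, so resolving many removal indices is cheaper with a single ordered scan than with one query per index.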

Regards,
Eriq


Eriq VanBibber

Apr 24, 2018, 2:12:35 PM
to Google Safe Browsing API
So, a feedback comment on this whole thing...

Parsing and using JSON is a pain :). I know there is the protocol buffer option as well (but no docs on how to do it), and I'm not ready to use that either...
But for the JSON results, some additional metadata would be VERY helpful.

Not sure if or when such could be added, but here's what i propose:

{
  "listUpdateResponsesCount": 1,
  "listUpdateResponses": [
    {
      "threatType": "MALWARE",
      "threatEntryType": "URL",
      "platformType": "WINDOWS",
      "responseType": "FULL_UPDATE",
      "additionsCount": 1,
      "additions": [
        {
          "compressionType": "RAW",
          "rawHashes": {
            "prefixSize": 4,
            "rawHashesCount": 1000000,
            "rawHashesCharCount": 2211521,
            "rawHashes": "..."
          }
        }
      ]
    }
  ]
}

My reasoning: with such new metadata, a parser can read these values to pre-allocate space, or to validate results after reading/parsing. I realize that you have the checksum at the end, but one would have to read ALL the data first, then hash it. However, if a count didn't match up front, processing could stop early or never start.
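As we read the v4 docs, the `checksum.sha256` field is the SHA-256 of the client's sorted database after the update has been applied, and a client whose result doesn't match should discard the update and retry later. A minimal sketch of that check, assuming the JSON form where the digest is base64-encoded; the function name is hypothetical.

```python
import base64
import hashlib


def checksum_matches(prefixes: list[bytes], checksum_b64: str) -> bool:
    """SHA-256 over all prefixes, sorted byte-wise and concatenated,
    compared with the base64-encoded digest from the response."""
    digest = hashlib.sha256(b"".join(sorted(prefixes))).digest()
    return digest == base64.b64decode(checksum_b64)


db = [bytes.fromhex("ffbbccdd"), bytes.fromhex("aabbccdd")]
expected = base64.b64encode(hashlib.sha256(bytes.fromhex("aabbccddffbbccdd")).digest()).decode()
print(checksum_matches(db, expected))  # True
```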

The 2+ MB for the entire hash set (for one list so far!) means several passes over the data to "figure things out". In an effort to be efficient with code and CPU time, I'd like to work in a single forward-only pass over the data as much as possible.

Also, in .NET and Java (and I suspect many other compilers/interpreters), strings are converted to Unicode in memory. I have had to write my own JSON parser that works solely on the returned bytes so that memory consumption is optimized and not wasteful. If I use a common JSON-to-object framework, the 'rawHashes' will double in size in memory, and I'd still have the original copy, so 3 to 4 times the original size!

For the arrays in the JSON result, having a value with the count of array elements is valuable for pre-allocation and bounds checks. It also helps in cases where there's a desire to skip over X number of elements first.

Anyway, I hope this feedback is valuable and has a chance of becoming real.

Regards,
Eriq

Eriq VanBibber

Apr 25, 2018, 11:19:06 AM
to Google Safe Browsing API
OK, new question on removals.

If I'm maintaining a local cache from 2 lists, and I get an update to remove an element, and the resulting hash exists in both lists locally, could I get the hash from the index in the first list and simply find that same hash in the other list and remove it from there? The intent is to avoid a second sort of the hashes to line up the indexes.

Regards,
Eriq

Ben Sanders

Apr 25, 2018, 6:14:01 PM
to google-safe-...@googlegroups.com
Since the removals are specific to a list (threatType/threatEntryType/platformType), it is possible for a hash prefix to be in multiple lists, but then only removed from one of them. For instance, the hash prefixes in two different lists could be for two different URLs! Also, a single hash prefix in one list can have multiple matching full hashes for that list. Finally, if a single URL was for some reason on both the social engineering and malware lists, it could be removed from one but still be on the other. So you can't just take a single hash prefix removal and blanket-apply it elsewhere.
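Put another way, removal indices are scoped to the (threatType, threatEntryType, platformType) triple. A hypothetical sketch of a store that enforces this by never letting an index from one list touch another (class and method names are illustrative):

```python
from bisect import bisect_left

# Hypothetical key: (threatType, threatEntryType, platformType)
ListKey = tuple[str, str, str]


class ThreatListStore:
    """One independent, byte-wise sorted prefix list per threat list."""

    def __init__(self) -> None:
        self._lists: dict[ListKey, list[bytes]] = {}

    def apply_update(self, key: ListKey, removal_indices: list[int],
                     additions: list[bytes]) -> None:
        # Indices are resolved only against this list's own sorted contents.
        current = sorted(self._lists.get(key, []))
        doomed = set(removal_indices)
        if any(i < 0 or i >= len(current) for i in doomed):
            raise IndexError("removal index out of range")
        kept = [h for i, h in enumerate(current) if i not in doomed]
        self._lists[key] = sorted(kept + list(additions))

    def contains(self, key: ListKey, prefix: bytes) -> bool:
        entries = self._lists.get(key, [])
        i = bisect_left(entries, prefix)  # binary search in the sorted list
        return i < len(entries) and entries[i] == prefix


malware = ("MALWARE", "URL", "WINDOWS")
social = ("SOCIAL_ENGINEERING", "URL", "WINDOWS")
shared = bytes.fromhex("aabbccdd")

store = ThreatListStore()
store.apply_update(malware, [], [shared])
store.apply_update(social, [], [shared])
store.apply_update(malware, [0], [])  # removes it from MALWARE only
print(store.contains(malware, shared), store.contains(social, shared))  # False True
```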

From your other post, re:protobufs, you can see how our open source client (written in Go) does it:
In particular api.go handles the http stuff, setting the content type to "application/x-protobuf" instead of "application/json", and setting a query parameter "?alt=proto"

I will forward your comments regarding the json serialized form of the data. Protobufs do address several of your complaints. For instance, the raw hashes are actually byte arrays, but when we convert to json, they are converted to base64 encoded strings (since json doesn't have a 'byte field' type). Most languages can handle byte arrays with minimal overhead, but like you said strings can sometimes have unexpected size blowups. Protobufs also do encode length fields inside of them, so the decoder takes advantage of that.

Also, for parsing json in a streaming fashion in Java, you could look at the Jackson libraries (https://github.com/FasterXML/jackson). It contains both streaming and regular decoding. Beyond that, I always encourage profiling to find where the real CPU issues are. Since updates are infrequent (~30 minutes, and often implemented in a non-blocking way in a background thread), you might have more bang-for-your-buck optimizing elsewhere.

Eriq VanBibber

Apr 25, 2018, 6:28:52 PM
to Google Safe Browsing API
Ben,

Again, thanks for your prompt and great answers!

I have successfully moved to protobufs for this work... so we are obviously thinking alike on that. It definitely solves several issues :), including the size of the returned data!
So, we can set aside the JSON stuff for now, but I still think it would be valuable for other APIs, not just GSB, to include some additional metadata in their responses... sounds like you might feel the same since you pushed it up to the team to consider. Thanks!

Your answer is spot on and answered several unasked questions I had. I know what to do now to maintain this local database, including the sorting in SQLite (for anyone else reading: casting the hash value as TEXT before calling ORDER BY gives the proper sort sequence, without actually converting to strings).

Another vague area in the docs, for me, is the full hashes that I'll eventually get from the find call (if a match is found in the short hashes). I feel like I should cache these results for some time. Remember that I may have 1000s of users funneling through this proxy... it seems silly to keep asking Google for the same answer. But full hashes may flip between "on list" and "off list" more often than the short hashes.
So, any suggestions on how long to hold a full hash before submitting the query again? I'm thinking once a day or so, but I really don't know where to start.

And finally, and this may be way out of your space: is there something smart we could do to tell a Chrome browser, for example, that we already did the GSB lookup on the domain? That could save the client PC a duplicate step, which in an enterprise could substantially reduce bandwidth and traffic. I'm thinking we could respond to a unique DNS query that normal DNS servers wouldn't understand and would normally answer with "huh? query not understood".

Ben Sanders

Apr 25, 2018, 7:07:41 PM
to google-safe-...@googlegroups.com
Yes, caching is important, though we do limit how long results are cached. That is so that if we make a mistake (blocking a good site), we can quickly unblock people from getting to the site, even if their browser has cached it. So cached responses are typically on the order of minutes (often 5 minutes for a positive match). 

https://developers.google.com/safe-browsing/v4/caching

It describes both *positive* match and *negative* match caching. The examples in that document show how those two concepts interact for various cases. As I mentioned before, I think your centralized servers will still immensely benefit from caching. Imagine a link to a known phishing site being sent to an entire company. Many of the people who are tricked could click the link within minutes of receiving it, but only incur one or two API requests from your end.
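The positive/negative caching rules on that page can be sketched roughly as below. This is a simplified reading, not an official client: we assume each matched full hash carries its own positive cache duration, the response carries one negative cache duration for the requested prefix, and times are in seconds. Class and method names are hypothetical; the clock is injectable so the behavior can be tested.

```python
import time
from typing import Callable, Dict, Optional


class FullHashCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._positive: Dict[bytes, float] = {}  # full hash -> expiry time
        self._negative: Dict[bytes, float] = {}  # hash prefix -> expiry time

    def store(self, prefix: bytes, matches: Dict[bytes, float],
              negative_ttl: float) -> None:
        """Record one find response: each matched full hash with its own TTL,
        plus a negative TTL covering every other hash under this prefix."""
        now = self._clock()
        for full_hash, ttl in matches.items():
            self._positive[full_hash] = now + ttl
        self._negative[prefix] = now + negative_ttl

    def lookup(self, prefix: bytes, full_hash: bytes) -> Optional[bool]:
        """True = cached unsafe, False = cached safe, None = ask the server."""
        now = self._clock()
        expiry = self._positive.get(full_hash)
        if expiry is not None:
            # An expired positive entry must be re-checked, not treated as safe.
            return True if now < expiry else None
        negative_expiry = self._negative.get(prefix)
        if negative_expiry is not None and now < negative_expiry:
            return False
        return None
```

In a shared proxy, one `store` call after a find request then answers every user who hits the same prefix until the entries expire.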

As far as signaling to chrome/firefox/safari goes, I am not aware of any remote mechanism. Chrome (and I believe the other browsers) will start doing safebrowsing checks in parallel as soon as a resource is requested (including the DNS request). It is possible to disable safebrowsing in these browsers with the appropriate flags/configuration settings (different for each), though I would recommend against that. Since you can only check domains, you will of course miss when a more specific url path is blacklisted. For instance, if someone uploads a malicious file to Github, the block will only cover the malicious file or project while everyone else can still use Github as normal. Safe Browsing also provides other services for browsers, including download protection (https://wiki.mozilla.org/Security/Download_Protection).

Eriq VanBibber

Apr 25, 2018, 8:18:04 PM
to Google Safe Browsing API
Thanks again Ben.

So, we'll start by caching full hashes for a few minutes, but ultimately we will leave this configurable by an admin of the solution. Thanks for providing some guidance on that.
I'll also review the caching link you sent... I missed it somehow.

Here's a question that may ultimately have no importance or impact, but I'd like to know anyway: are the hashes in the 'fetch' result expected to be read in network byte order? It's sort of like endianness, but I know these values are not numbers, so it may not be relevant as long as I always read the bytes the same way. But if Google later added features that provided a single hash, I'd want to know the byte order of the value now.

For now I'm going to code for network order (as the bytes appear in the byte array).

Regards,
Eriq

Eriq VanBibber

Apr 25, 2018, 9:11:54 PM
to Google Safe Browsing API
One last question: is there a test list with removals that can be used to validate how my code is working?

Right now all of the work is based on assumptions... not a good premise for enterprise software ;-).

For example, is there a threatType="test"/platformType="testplatform"/threatEntryType="testType" list?

If not, could there be?

As you state, the frequency of removals is quite low, but I'd hate to remove the wrong one when it does occur.

-Eriq

Ben Sanders

Apr 26, 2018, 12:29:54 PM
to google-safe-...@googlegroups.com
I said the frequency of updates is low. The response from downloading a list (or update) tells you how long to wait before updating again; that is typically 30 minutes. You will probably have additions and removals most times you update. What I was trying to say before was that time spent updating is small compared to the time spent checking.

There is no test list currently, but you're not the only one interested in that. There are test urls at http://testsafebrowsing.appspot.com/. The first section in particular is relevant to you (some of the others are for download protection, which is different). None of them are domain-only though, and they are never removed from the various lists.

I'm not sure I fully understand your byte-order question. Like you say, the hashes are not numbers (except in the sense that you can represent anything with a number...). The prefixes are the first 4 (or more) bytes from the output of sha256. The full hashes sent back are simply the entire 32 byte value.
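Ben's description can be checked directly: a prefix is just the first bytes of the SHA-256 digest, taken in the order the digest is produced, so no integer and no byte-order conversion is involved. A minimal sketch (function name hypothetical); `"abc"` is used only because its SHA-256 value is a well-known test vector, not because it is a real expression.

```python
import hashlib


def prefix_and_full_hash(expression: str, prefix_size: int = 4) -> tuple[bytes, bytes]:
    """SHA-256 an expression such as "example.com/" and return
    (first prefix_size bytes, full 32-byte digest), in digest byte order."""
    full = hashlib.sha256(expression.encode("utf-8")).digest()
    return full[:prefix_size], full


prefix, full = prefix_and_full_hash("abc")
print(prefix.hex())                        # ba7816bf
print(len(full), full.startswith(prefix))  # 32 True
```

If an implementation does choose to pack 4-byte prefixes into integers, reading them big-endian (`int.from_bytes(prefix, "big")`) keeps integer order identical to byte-wise order for prefixes of equal length.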

Eriq VanBibber

Apr 27, 2018, 4:02:49 PM
to Google Safe Browsing API
Thanks Ben!

I think all my questions are now answered. Sorry for misunderstanding the frequency element. I have no concerns about how often to pull the list, even if that changed to an hour. The product will offer admins the ability to choose a time interval, but if it is less than Google's required delay it will be ignored and the Google-imposed delay will be used. There may be some who only want an update once an hour or so... who knows?
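The clamping rule described here can be sketched as follows, assuming the `minimumWaitDuration` field of the fetch response arrives in the protobuf JSON duration format (a decimal number of seconds followed by `s`, e.g. `"1800s"`); function names are hypothetical. With protobufs, the same value arrives as a structured Duration instead of a string.

```python
from typing import Optional


def parse_duration(value: str) -> float:
    """Parse a protobuf JSON Duration string such as "1800s" or "1593.242s"."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(value[:-1])


def next_update_delay(admin_interval_s: Optional[float], minimum_wait: str) -> float:
    """Use the admin's interval when it is at least the server's minimum wait;
    otherwise fall back to the server-imposed minimum."""
    server_min = parse_duration(minimum_wait)
    if admin_interval_s is None:
        return server_min
    return max(admin_interval_s, server_min)


print(next_update_delay(3600, "1800s"))     # 3600
print(next_update_delay(600, "1593.242s"))  # 1593.242
```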

After getting the data into my SQL tables and testing the sorting, I don't have as much concern as before. Thankfully, SQLite supports BLOB indexes and can cast a BLOB to TEXT for sorting, so the lexicographical sort lines up as needed.

Your last statement about byte order clarified it properly. Until now I did not know that the short hashes were simply the prefixes of the full hashes. That makes SO much sense now. In my opinion, that information should be published in the docs.

Thanks again.

-Eriq

Ben Sanders

Apr 27, 2018, 4:10:02 PM
to google-safe-...@googlegroups.com
Glad I could help! Good luck with your project.

Eriq VanBibber

Apr 28, 2018, 12:34:54 PM
to Google Safe Browsing API
Thanks.  

I did find one last question to validate.


I assume it is up to me to decide how much of the full hash to use when checking my local cache?
If I know (from my own tracking) that my local database has 4-, 5-, and 8-byte prefixes, I should start with the longest prefix length first, and if I don't have an 8-byte match, then check the 5- and 4-byte prefixes.
I should then send the longest matching prefix from my local database to Google for the fullHashes:find method.

Does this sound right? This isn't well detailed in the docs.

I also assume that if I have an 8-byte prefix in my local database, there should be 4- and 5-byte hashes as well. Is it possible to have ONLY a longer prefix and not the shorter ones? (That doesn't seem right to me.)

-Eriq

Alex Wozniak

Apr 29, 2018, 10:31:49 PM
to google-safe-...@googlegroups.com
Hi Eriq,

Yes, you are responsible for handling those local lookups. You can check out our open source client for one possible data structure to facilitate the hash collision check: https://github.com/google/safebrowsing/blob/master/hash.go#L99

For a given set of local hash prefixes, you will collide with at most one local prefix for a given full hash. This is because the set will not contain any overlapping prefixes. In this case, you will send a request with that prefix if there is a collision.
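
Because a single list never contains overlapping prefixes, a sorted array plus binary search is enough to find that one possible collision: any prefix of the full hash must be the greatest stored prefix that sorts at or before it. This is a simpler, hypothetical structure for illustration, not a port of the hash.go code in Google's client.

```python
import bisect
from typing import Iterable, Optional


class PrefixSet:
    """Hypothetical sorted set of non-overlapping hash prefixes from one list."""

    def __init__(self, prefixes: Iterable[bytes]) -> None:
        self._sorted = sorted(set(prefixes))
        # In sorted order, if any prefix overlaps another, some adjacent
        # pair overlaps too, so checking neighbors is sufficient.
        for a, b in zip(self._sorted, self._sorted[1:]):
            if b.startswith(a):
                raise ValueError(f"overlapping prefixes: {a.hex()} / {b.hex()}")

    def lookup(self, full: bytes) -> Optional[bytes]:
        """Return the single stored prefix of `full`, or None."""
        i = bisect.bisect_right(self._sorted, full)
        if i == 0:
            return None
        candidate = self._sorted[i - 1]  # greatest prefix <= full hash
        return candidate if full.startswith(candidate) else None


s = PrefixSet([b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd\xee"])
print(s.lookup(b"\xaa\xbb\xcc\xdd\xee" + bytes(27)))  # prints b'\xaa\xbb\xcc\xdd\xee'
```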

Across multiple sets of local prefixes, you may collide more than once at various lengths for a given full hash. You're right that our documentation doesn't specify the behavior explicitly for this scenario. Your proposal for sending the longest would work.
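
When several lists collide at different lengths, the "send the longest" approach can be sketched as building one fullHashes:find request body from the longest colliding prefix. The field names below follow my reading of the v4 FindFullHashesRequest JSON (client, clientStates, threatInfo, threatEntries[].hash as base64) and should be checked against the API reference; the function, parameter names, and example values are hypothetical.

```python
import base64
import json
from typing import List, Optional


def build_find_request(colliding_prefixes: List[bytes],
                       client_id: str,
                       client_version: str,
                       client_states: List[str],
                       threat_types: List[str]) -> Optional[dict]:
    """Build a v4 fullHashes:find request body for the longest colliding prefix."""
    if not colliding_prefixes:
        return None  # nothing collided locally, so no request is needed
    longest = max(colliding_prefixes, key=len)
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "clientStates": client_states,  # states saved from earlier list updates
        "threatInfo": {
            "threatTypes": threat_types,
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            # Raw prefix bytes are base64-encoded in the JSON body.
            "threatEntries": [{"hash": base64.b64encode(longest).decode("ascii")}],
        },
    }


body = build_find_request([b"\x01\x02\x03\x04", b"\x01\x02\x03\x04\x05\x06\x07\x08"],
                          "example-client", "1.0", ["c3RhdGU="], ["MALWARE"])
print(json.dumps(body["threatInfo"]["threatEntries"]))  # prints [{"hash": "AQIDBAUGBwg="}]
```

The serialized body would then be POSTed to https://safebrowsing.googleapis.com/v4/fullHashes:find with an API key.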

Hopefully that helps.

Alex

--
You received this message because you are subscribed to the Google Groups "Google Safe Browsing API" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-safe-browsi...@googlegroups.com.