I am puzzled by something I just found in my database for Google
Safe Browsing v2: the same host key and prefix appear in two different
add chunks.
Chunk_num  Host_key    Prefix      creation_date
31992      0x0021DC6F  0x523AB964  2011-04-21 15:23:41:217
31800      0x0021DC6F  0x523AB964  2011-04-21 15:25:02:590
I have the saved data to prove that both rows came from the same
download from Google. I believe there should not be duplicates across
different add chunks. Can someone explain to me whether this is okay?
On the other hand, it may be fine, since sub chunks need to carry the
add chunk number anyway.
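For reference, here is a minimal sketch of how the two rows above could be stored and how such duplicates could be listed. The table name `add_prefixes` and its layout are my own assumptions, not anything defined by the protocol; the only point is that keying on the chunk number lets the same host key and prefix appear once per add chunk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the chunk number is part of the key, so the same
# (host_key, prefix) pair may be stored once per add chunk.
conn.execute(
    "CREATE TABLE add_prefixes ("
    " chunk_num INTEGER, host_key INTEGER, prefix INTEGER,"
    " PRIMARY KEY (chunk_num, host_key, prefix))"
)

# The two rows from the post above.
rows = [
    (31992, 0x0021DC6F, 0x523AB964),
    (31800, 0x0021DC6F, 0x523AB964),
]
conn.executemany("INSERT INTO add_prefixes VALUES (?, ?, ?)", rows)

# List every (host_key, prefix) pair that occurs in more than one add chunk.
duplicates = conn.execute(
    "SELECT host_key, prefix, COUNT(*) FROM add_prefixes"
    " GROUP BY host_key, prefix HAVING COUNT(*) > 1"
).fetchall()
print(duplicates)
```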
Thank you!
DCS DCS
--
You received this message because you are subscribed to the Google Groups "Google Safe Browsing API" group.
It seems that Google places no restrictions on the data it sends to
clients, which makes the database design inefficient for the client.
This is a minor issue, since we can spend some time speeding it up.
The main problem is when we have to update the full paths for these
prefixes. Suppose a prefix is deleted from one chunk, but another
chunk with the same prefix is not deleted. Are we supposed to keep
the full path for that prefix, or do we have to pull it every time?
(It looks like we have to pull it every time, since we can't know
whether what we have is up to date, even from the last pull.)
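To make the deletion case concrete, here is a small sketch of the bookkeeping I have in mind. It assumes a sub entry names the add chunk it cancels, so a prefix stays listed as long as at least one add chunk still contains it. The function names and the dictionary layout are made up for illustration, and this says nothing about when the full paths themselves need to be pulled again.

```python
from collections import defaultdict

# Hypothetical layout: prefix -> set of add chunk numbers that list it.
add_chunks_by_prefix = defaultdict(set)

def apply_add(chunk_num, prefix):
    """Record that an add chunk lists this prefix."""
    add_chunks_by_prefix[prefix].add(chunk_num)

def apply_sub(add_chunk_num, prefix):
    """Cancel the prefix for one specific add chunk only."""
    chunks = add_chunks_by_prefix.get(prefix)
    if chunks is None:
        return
    chunks.discard(add_chunk_num)
    if not chunks:
        # No add chunk lists the prefix any more, so drop it entirely.
        del add_chunks_by_prefix[prefix]

def is_listed(prefix):
    """True while at least one add chunk still lists the prefix."""
    return prefix in add_chunks_by_prefix

apply_add(31992, 0x523AB964)
apply_add(31800, 0x523AB964)

apply_sub(31800, 0x523AB964)
still_listed_after_one_sub = is_listed(0x523AB964)

apply_sub(31992, 0x523AB964)
listed_after_both_subs = is_listed(0x523AB964)
```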
Regards,
DCS DCS
Thanks for your time and reply. Since the full path depends only on
the prefix, we don't need to be concerned with the host key or chunk
number for that purpose; for efficiency it only has to be tied to the
prefix. That means a full path applies to the same prefix in different
chunks, which may even have different host keys in the download. I am
just concerned that one prefix may have multiple full paths. (I guess
that could be the case, from basic reasoning.) Tell me if it is otherwise.
The problem is: how can you be sure the full paths you have are up to date?
By the way, don't do the sorting; that will slow down the search in SQL.
Regards,
DCS DCS
Hi Sam:
I am more concerned with efficiency. We query the database with at
most 30 prefixes (5 hosts * 6 paths). Because host and prefix are
duplicated across different chunks, I have no control over how many
records come back or how many unique chunk numbers appear in the
query result. Can you guarantee that at most 30 records will match
in the database (not counting full hashes)?
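One way I could bound the result on my side, whatever the server sends, is to ask only for distinct prefixes. The sketch below assumes a table `add_prefixes` of my own design and a helper `matching_prefixes` that I made up; the third row is invented test data. With `SELECT DISTINCT`, the number of rows returned can never exceed the number of candidate prefixes passed in, no matter how many chunks repeat them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; duplicates across chunks are allowed.
conn.execute(
    "CREATE TABLE add_prefixes (chunk_num INTEGER, host_key INTEGER, prefix INTEGER)"
)
conn.executemany(
    "INSERT INTO add_prefixes VALUES (?, ?, ?)",
    [
        (31992, 0x0021DC6F, 0x523AB964),
        (31800, 0x0021DC6F, 0x523AB964),
        (31801, 0x0021DC6F, 0x11111111),  # invented row, for illustration only
    ],
)

def matching_prefixes(conn, candidates):
    """Return each candidate prefix that is present, at most once."""
    if not candidates:
        return []
    placeholders = ",".join("?" * len(candidates))
    sql = (
        "SELECT DISTINCT prefix FROM add_prefixes "
        f"WHERE prefix IN ({placeholders})"
    )
    return [row[0] for row in conn.execute(sql, candidates)]

# In a real lookup this list would hold up to 30 prefixes.
candidates = [0x523AB964, 0x22222222]
hits = matching_prefixes(conn, candidates)
print(hits)
```

If the chunk numbers are needed as well, they would have to be fetched in a second query for the matched prefixes only, and that result is not bounded by 30.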
Regards,
DCS DCS
I tested a bunch of sites that have malware, but I don't have pages
that contain malware for testing. If you have such URLs handy, can
you send some of them to me? I just can't find them on the
Internet.
Thank you!
Peter Zhou
On Tue, Jun 14, 2011 at 2:35 PM, Garrett Casto <gca...@google.com> wrote:
>
>