Any progress in Google Safe Browsing Test?

Sihai

Jun 21, 2010, 11:00:19 AM
to Google Safe Browsing API
Hi,
Has anyone made progress on the Google Safe Browsing URL test?
If I use md5("http://malware.testing.google.test/testing/malware/"), I
can't get the right MD5, but
if I take out "http://" and make it
md5("malware.testing.google.test/testing/malware/"), it matches
"dc5178cc1a0820bc434c83d2f089f105".

Questions:
1) Should the canonicalized URL include "http://"?
2) Where/how can we get more samples for the test? That's important,
because I can see the URL behind the MD5.

Thanks,

Simon

MarkD

Jun 29, 2010, 1:03:39 AM
to Google Safe Browsing API
Hi,
I'm still working on my implementation, but I think I can help out
here.

First, after canonicalization you need to enumerate all the
prefix/suffix pairs. See the section "Simplified Regular Expression Lookup".
There you will see that the scheme (that's the http:// part), as well as
the username, password, and port, are disregarded.

Second, you need to use SHA-256 hashing, not MD5. See the section "shavar
List Format".

Here is the diagnostic output of my client.
Note that I treat all prefixes as being 4 bytes long and store
anything else in a separate field.
Strictly speaking, the downloaded prefix may be any whole number of
bytes.

I second the call for more test data. Perhaps a separate list with
a well-defined sequence of updates, so we can test our chunk
handling as well as hash lookups.

Mark D


Lookup http://malware.testing.google.test/testing/malware/

Enumeration: malware.testing.google.test/testing/malware/
Hash = 518640453f8b2a5f0d43bc225152f49530be2a40bfe2bab60aaaee7a67b10890
Looking up 51864045
Found full hash 518640453f8b2a5f0d43bc225152f49530be2a40bfe2bab60aaaee7a67b10890 in list 0

Enumeration: malware.testing.google.test/
Hash = 8d0e716913e10a54652f4c2ae6ef307096b6a77ee7296fd09698749587df9937
Looking up 8d0e7169
No prefix match for 8d0e7169

Enumeration: malware.testing.google.test/testing/
Hash = dfbadad14a99ef077884f34185252e9cdac3a1840820e89c58b6e4a3e6d1a3ec
Looking up dfbadad1
No prefix match for dfbadad1

Enumeration: testing.google.test/testing/malware/
Hash = 9c38fc920152d02df8b642eb882d9fc6427c3d8b4959b52e8e15203d2e88bab1
Looking up 9c38fc92
No prefix match for 9c38fc92

Enumeration: testing.google.test/
Hash = b2ae8c6f3287afb9ffb8a7ffdf2794c00eecd184308f27b276acebdc766a2cbe
Looking up b2ae8c6f
No prefix match for b2ae8c6f

Enumeration: testing.google.test/testing/
Hash = d130efcc14a58b6c0dc6e0f6bd5a9a983312b0ab78486901334d6e73c91227ff
Looking up d130efcc
No prefix match for d130efcc

Enumeration: google.test/testing/malware/
Hash = d158940c9d1b8071c3ba8e72227097134f5ca9d153019f51f797adbf4490ba9f
Looking up d158940c
No prefix match for d158940c

Enumeration: google.test/
Hash = 29ae2be7fb37c6704bcc61bed84949d8cc7761c5e60bce81223311f97b642d33
Looking up 29ae2be7
No prefix match for 29ae2be7

Enumeration: google.test/testing/
Hash = 5165e348c9f27d59158c679a4ae6a5aeb51098d4bf134839ce7274498298f4c7
Looking up 5165e348
No prefix match for 5165e348
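
For readers following along, the enumeration and hashing steps MarkD describes might be sketched roughly as below. This is a minimal illustration in Python, not MarkD's client: the function names are hypothetical, the URL is assumed to be canonicalized already, IP-address hosts are not special-cased, and the limits (up to four extra host suffixes, up to four directory prefixes) are my reading of the "Simplified Regular Expression Lookup" section.

```python
import hashlib
from urllib.parse import urlsplit


def host_suffixes(host: str) -> list[str]:
    """Exact host, then suffixes taken from at most the last five labels."""
    labels = host.split(".")
    candidates = [host]
    # Drop leading labels one at a time, stopping before the bare TLD.
    start = max(len(labels) - 5, 1)
    for i in range(start, len(labels) - 1):
        suffix = ".".join(labels[i:])
        if suffix not in candidates:
            candidates.append(suffix)
    return candidates


def path_prefixes(path: str, query: str) -> list[str]:
    """Exact path (with and without query), then "/" and directory prefixes."""
    candidates = []
    if query:
        candidates.append(f"{path}?{query}")
    candidates.append(path)
    # Directory components only; the final file name (if any) is excluded.
    segments = path.split("/")[1:-1]
    prefix = "/"
    extras = [prefix]
    for segment in segments:
        prefix += segment + "/"
        extras.append(prefix)
    for extra in extras[:4]:
        if extra not in candidates:
            candidates.append(extra)
    return candidates


def lookup_expressions(url: str) -> list[str]:
    """Host/path combinations to hash; scheme, userinfo and port are dropped."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    return [h + p for h in host_suffixes(host)
            for p in path_prefixes(path, parts.query)]


def sha256_hex(expression: str) -> str:
    """Full SHA-256 hash of one expression, as lowercase hex."""
    return hashlib.sha256(expression.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    for expr in lookup_expressions(
            "http://malware.testing.google.test/testing/malware/"):
        full = sha256_hex(expr)
        # A 4-byte prefix is the first 8 hex characters of the full hash.
        print(expr, full[:8], full)
```

For the test URL this yields the same nine expressions, in the same order, as the diagnostic output above; MarkD's output reports 51864045 as the prefix of the first one.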

Beaver6813

Jul 4, 2010, 12:30:49 PM
to Google Safe Browsing API
Hi,

The canonicalized URL should include http://; a load of examples have
been put here:
http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization
Basically, your application should match all of those, i.e.
http://host/%25%32%35%25%32%35
should give
http://host/%25%25

I'll post some MD5s generated from my app on
http://code.google.com/p/phpgsb/
when I get a chance!

--Sam
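
The percent-escape example in Sam's post can be reproduced with a short sketch. It assumes the rule from the Canonicalization section is: unescape repeatedly until nothing changes, then escape each byte that is <= ASCII 32, >= 127, "#", or "%" using uppercase hex. The function names are hypothetical, and only the escaping step is covered (no host or path normalization).

```python
import re

# One percent-escape: "%" followed by exactly two hex digits.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def unescape_fully(data: bytes) -> bytes:
    """Decode percent-escapes repeatedly until the input stops changing."""
    while True:
        decoded = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
        if decoded == data:
            return data
        data = decoded


def escape_once(data: bytes) -> str:
    """Escape control bytes, space, non-ASCII bytes, '#' and '%' exactly once."""
    out = []
    for byte in data:
        if byte <= 32 or byte >= 127 or byte in b"#%":
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


def canonicalize_path_escapes(path: str) -> str:
    """Apply the unescape-then-escape step to a URL path."""
    return escape_once(unescape_fully(path.encode("utf-8")))


if __name__ == "__main__":
    print(canonicalize_path_escapes("/%25%32%35%25%32%35"))  # /%25%25
```

Note that the scheme is kept in the canonicalized URL but, as MarkD points out earlier in the thread, it is dropped from the expressions that are actually hashed.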

Kevin

Jul 9, 2010, 6:06:30 PM
to Google Safe Browsing API
The header of the response body I get when looking up 51864045 is
goog-malware-shavar:20045:32.
However, I wasn't able to find add chunk number 20045 in the data
I downloaded using the redirect URL. Is anyone having the same issue?
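
A small sketch of how that header line might be split, assuming its three colon-separated fields are the list name, the add chunk number, and the length in bytes of the hash data that follows (32 would be one SHA-256 hash). The type and function names are hypothetical.

```python
from typing import NamedTuple


class FullHashHeader(NamedTuple):
    list_name: str
    add_chunk: int
    hash_len: int  # assumed: number of hash-data bytes following the header


def parse_full_hash_header(line: str) -> FullHashHeader:
    """Split 'LISTNAME:ADDCHUNK:HASHLEN' from the right, into typed fields."""
    list_name, add_chunk, hash_len = line.strip().rsplit(":", 2)
    return FullHashHeader(list_name, int(add_chunk), int(hash_len))


if __name__ == "__main__":
    header = parse_full_hash_header("goog-malware-shavar:20045:32")
    print(header.list_name, header.add_chunk, header.hash_len)
```

With the add chunk number pulled out this way, it can be compared directly against the chunk numbers stored in the local database.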

Beaver6813

Jul 11, 2010, 1:05:09 PM
to Google Safe Browsing API
I get a match for 20045 in goog-malware-shavar. Its chunklen on my end,
however, is apparently 5282 (and there are quite a few hostkeys within
it).
Have you been continuously updating? It could be that your database is
corrupt or incomplete. (It doesn't get everything in the first
update.)

--Sam

Kevin

Jul 11, 2010, 9:18:57 PM
to Google Safe Browsing API
I've been continuously updating my database in the last two days.

Currently, for goog-malware-shavar, the minimum add chunk number I
have is 22081 and the maximum is 22526.

--Kevin

Beaver6813

Jul 13, 2010, 11:54:05 AM
to Google Safe Browsing API
The minimum I have is 19590, maximum is 22578. I'm writing up a few
test tools at the moment that'll hopefully help people out with
confirming their data etc. What script are you using to process the
chunks?

--Sam

Kevin

Jul 13, 2010, 1:28:25 PM
to Google Safe Browsing API
I'm using Java.

I even tried saving the chunks to the file system and grepping over them,
but couldn't find that chunk ID.

-Kevin

Beaver6813

Jul 13, 2010, 5:35:58 PM
to Google Safe Browsing API
Hmm, it's difficult to debug; there could be so many reasons. What do your
request ranges look like? Are there any gaps, e.g. 100-150,152-260?
That could cause the GSB server to give weird responses.

If you want to look up any hashes in the future to see what chunks they
show up in, I've built a couple of testing tools at http://gsbtool.beaver6813.com/

Let me know how it goes

--Sam

Kevin

Jul 13, 2010, 6:06:42 PM
to Google Safe Browsing API
Here's the request. It certainly has lots of gaps.

googpub-phish-shavar;a:
58622,59232,59444,59834,59887,60036,60145-60146,60172-60173,60179-60181,60185,60189-60190,60201,60207-60219,60222-60247,60250-60259,60262-60312,60315-60319,60323,60327-60340,60344-60352,60355-60360,60363,60366-60370,60373-60431,60436-60475,60479-60483,60486-60492,60495-60496,60500-60503,60506-60509,60512-60554,60557-60570,60574-60577,60580-60583,60587-60606,60609-60617,60620-60623,60626-60632,60635,60640-60643,60648-60654,60657-60719,60722-60732,60735-60812,60815-60845,60849-60859,60862-60863,60867-60877,60880-60883,60886-60889,60893-60899,60902-60907,60911-60980,60983,60987-61008,61011-61029,61032-61115,61118-61133,61136-61137,61140-61160,61163,61166-61232,61235-61286,61289-61291,61294-61300,61304-61362,61365-61371,61374-61380,61383-61395,61398-61400,61403-61408,61411-61416,61419,61422-61474,61477-61481,61484-61485,61491-61504,61507,61510-61513,61517,61520-61529,61532-61535,61538-61542,61545-61582,61585-61608,61611-61648,61651-61663,61666-61673,61676-61754,61757-61763,61766-61786,61789,61792-61794,61797-61869,61872-61874,61877-61881,61884-61996,62000-62011,62014-62022,62025-62044,62047-62105,62109-62114,62118-62136,62139-62140,62143-62146,62149-62178,62181-62245,62248-62259,62262-62263,62266,62269,62272-62273,62276-62287,62291-62298,62302-62317,62320-62321,62324,62327-62328,62331-62367,62370-62391,62394,62398-62402,62405-62413,62417-62419,62422-62432,62435-62510,62513-62518,62521-62522,62525-62531,62534-62554,62557-62560,62563-62627,62630-62672,62676-62686,62689-62699,62703-62705,62708-62796,62799-62808,62811-62832,62835-62898,62903-62912,62915-62938,62941-62957,62960-62975,62978-63097,63100-63125,63129-63145,63148-63154,63157-63188,63191-63227,63230-63244,63247-63250,63253-63257,63260-63262,63265-63274,63277,63280-63283,63286-63363,63366-63370,63373-63377,63380-63384,63387-63396,63400,63404-63435,63438-63521,63524-63526,63529-63534,63539-63545,63548-63572,63576-63638,63641-63646,63649-63655,63658-63662,63665-63682,63686-63688,63691-63730,63733-63786,63789-63816,63
820,63823,63826-63828,63831-63836,63839-63860,63863-63874,63877,63884-63892,63895-63898,63901-63905,63910-63917,63921-63926,63929-63930,63934-63935,63938-63952,63955-63957,63960-63962,63966,63969-63994,63997-64000,64003-64009,64012-64013,64017-64083,64086-64113,64116-64126,64129-64133,64136-64139,64142-64144,64147-64151,64154-64160,64163,64166-64176,64179-64216,64219-64244,64249-64263,64266-64267,64270-64271,64274-64276,64281-64284,64287-64295,64298-64299,64302-64304,64307-64311,64314-64329,64332-64337,64341-64342,64345-64357,64360-64369,64372-64375,64378,64381-64393,64396-64402,64405-64407,64410-64414,64417-64420,64423-64529,64532-64545,64548-64561,64564-64590,64593-64600,64604-64620,64623-64645,64648-64702,64706-64797,64800-64810,64813-64837,64840,64843-64856,64859-64955,64958-64964,64967-64974,64977-64982,64985-64986,64990-65008,65011-65035,65038-65097,65100-65112,65115-65116,65119-65127,65130-65133,65137-65165,65168-65171,65174-65231,65234-65251,65254,65257,65260-65265,65269-65270,65273-65282,65285-65305,65308-65331,65334-65353,65356-65384,65387,65390-65407,65410-65518,65521-65523,65526-65533,65536-65551,65554-65560,65563-65569,65572-65578,65581-65626,65630-65666,65669-65677,65680-65687,65690-65749,65752-65807,65810-66090,66093-66103,66106-66141,66144-66173,66176,66179-66182,66185-66223,66226-66250,66253-66268,66271-66332,66335-66339,66342-66352,66356-66362,66365-66460,66463-66467,66470-66471,66474-66539,66542-66569,66572-66577,66580-66582,66586-66591,66595-66657,66661-66709,66712-66716,66719-66735,66738-66746,66749-66796,66799-66820,66823-66843,66846-66862,66865-66869,66872-66875,66878-66888,66891-66905,66908-66928,66931-66940,66943-66960,66964-66966,66970-66977,66981-67021,67024-67042,67045-67048,67051-67052,67055-67063,67066-67067,67070-67111,67114-67199,67202-67235,67238-67246,67249-67259,67262-67368,67371-67372,67375-67385,67388,67391-67418,67421-67644,67647-67669,67672-67770,67773-67810,67813-67814,67817-67918,67921-68271:s:
1676,1679,1683-1686,1691-1694,1698-1702,1707,1710-1712,1716-1718,1723-1726,1729-1735,1739-1740,1743-1744,1747-1750,1755-1756,1759-1762,1765,1768-1772,1775-1776,1780-1794,1797-1806,1810-1820,1823,1826-1829,1832-1859,1862-1871,1874-1875,1878-1885
goog-malware-shavar;a:
22081-22084,22087-22092,22095-22100,22103-22133,22136-22224,22227-22254,22257-22311,22314-22395,22398-22403,22406-22413,22416,22419-22449,22452-22478,22481-22533,22536-22585:s:
34150-34157,34160-34171,34174-34316,34319-34336,34339-34347,34350-34369,34372-34373,34376-34402,34405,34408-34413,34416-34419,34422-34440,34443-34452,34456-34459,34462-34464,34467-34475,34478-34482,34485-34489,34492-34514,34517-34535,34538-34546,34549-34562,34565-34657,34660-34712,34715-34775,34778-34856

-Kevin
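
To check a range list like the one above for holes, something along these lines could be used. It is a minimal sketch with hypothetical function names, assuming the list is a comma-separated mix of single chunk numbers and "lo-hi" ranges, as in the request shown.

```python
def parse_ranges(spec: str) -> list[tuple[int, int]]:
    """Turn '100-150,152,160-170' into [(100, 150), (152, 152), (160, 170)]."""
    ranges = []
    for item in spec.split(","):
        lo, _, hi = item.strip().partition("-")
        # A bare number is treated as a one-element range.
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def find_gaps(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the missing (first, last) chunk numbers between ranges."""
    gaps = []
    ordered = sorted(ranges)
    for (_, prev_hi), (next_lo, _) in zip(ordered, ordered[1:]):
        if next_lo > prev_hi + 1:
            gaps.append((prev_hi + 1, next_lo - 1))
    return gaps


if __name__ == "__main__":
    # Start of the goog-malware-shavar add-chunk list from the request above.
    spec = "22081-22084,22087-22092,22095-22100"
    print(find_gaps(parse_ranges(spec)))  # [(22085, 22086), (22093, 22094)]
```

Cross-checking the reported gaps against the chunk numbers actually seen in the downloaded data may show whether chunks are being dropped at parse time or were never received.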

Beaver6813

Jul 14, 2010, 5:52:02 AM
to Google Safe Browsing API
Wow, that sure is a beast of a request! There's definitely something
wrong with the way your script is parsing chunks. (You'll probably
have to clear your database and start again.) Are you storing chunks
even if they're empty? (You should, as sometimes the server gives out
legit empty chunks.) If you are, then the parser either was broken at
some point (and you haven't reset your database since) or is still not
reading chunks correctly.

Sending a request like that to GSB will most probably confuse it, as it
will constantly try to fill in the gaps, which it doesn't handle very
well. And if the chunks weren't added the first time, then they probably
won't get added the second time. I don't usually code in Java, but I
don't mind having a look at the parser to see if I can spot any
obvious errors. Send it over to s...@beaver6813.com if you want me to
take a look.

My requests look like:
googpub-phish-shavar;s:1644-1886:a:58622-68332
goog-malware-shavar;s:31010-34872:a:19614-22601
Very rarely do I get more than one set of ranges.

--Sam
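
As an illustration of how stored chunk numbers collapse into a request like Sam's, here is a minimal sketch. The function names are hypothetical, and the "s:" before "a:" field order simply mirrors the lines Sam posted rather than any documented requirement. Empty chunks need to be counted as stored, otherwise each one shows up as a gap.

```python
from typing import Iterable


def to_ranges(chunks: Iterable[int]) -> str:
    """Collapse chunk numbers into a compact '1-3,5,7-9' style list."""
    nums = sorted(set(chunks))
    if not nums:
        return ""
    parts = []
    start = prev = nums[0]
    for n in nums[1:]:
        if n == prev + 1:
            prev = n
            continue
        # Run ended: emit it and start a new one.
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        start = prev = n
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)


def request_line(list_name: str, add_chunks: Iterable[int],
                 sub_chunks: Iterable[int]) -> str:
    """Build one request line, following the layout of Sam's example."""
    fields = []
    sub = to_ranges(sub_chunks)
    add = to_ranges(add_chunks)
    if sub:
        fields.append("s:" + sub)
    if add:
        fields.append("a:" + add)
    return f"{list_name};" + ":".join(fields)


if __name__ == "__main__":
    print(request_line("goog-malware-shavar",
                       range(19614, 22602), range(31010, 34873)))
```

When every chunk in a span is stored, including the empty ones, each field collapses to a single range, which matches what Sam reports seeing.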