Re: Domain lookup in GSB v2

DumbGuy

Sep 12, 2012, 6:17:13 AM
to google-safe-...@googlegroups.com
This brings up a good point.
 
I was under the impression that GSB was domain-based -- that tracking/reporting of malware or phishing was for the entire domain. Can someone please confirm or deny this?
 
I use the Lookup API, but I'm assuming it's the same data as the DB that most people discuss here.
 
Perhaps I jumped to conclusions on this when I first began experimenting with the Lookup API. For example, we all know that http://www.ianfette.org/ is a seeded test for malware. But I noticed that made-up URLs there also test positive for malware:
http://www.ianfette.org/nothinghere.php
http://www.ianfette.org/blah/blah/blah
 
So, in my app I've only been checking at the domain level, based on the above test. Sounds like I've been making some incorrect assumptions, but Maarten's question has now got me wondering how it all fits together.
 
Maybe a domain-level malware flag naturally includes all (fictitious, even) paths/URLs at that domain? But a malware-flagged URL doesn't imply malware at the domain-level?
 
What's the scoop on this?
 
Thanks for any clarifications!
-DG

On Friday, September 7, 2012 12:11:32 PM UTC-7, Maarten wrote:
Separate topic from my previous post...

I would like to be able to determine if _any_ paths on a domain are being blacklisted. For instance, one of the testing URLs (http://malware.testing.google.test/testing/malware/) has a path on it. However, if I just look up the domain (http://malware.testing.google.test/), it comes back as being okay. The lookup only reports malware when I put in the full URL (http://malware.testing.google.test/testing/malware/).

What would I change in my queries to be able to just issue the domain name and get back a list of all the full URLs that are on the blacklist?

Example:


DumbGuy

Sep 12, 2012, 6:38:28 AM
to google-safe-...@googlegroups.com
FWIW, the same "trickle down" effect on fictitious paths doesn't seem to be limited to just the seeded domain ianfette.org...
 
-DG

Garrett Casto

Sep 12, 2012, 1:01:44 PM
to google-safe-...@googlegroups.com
On Wed, Sep 12, 2012 at 3:17 AM, DumbGuy <mim...@purejts.com> wrote:
This brings up a good point.
 
I was under the impression that GSB was domain-based -- that tracking/reporting of malware or phishing was for the entire domain. Can someone please confirm or deny this?
 

This is not true. See https://developers.google.com/safe-browsing/developers_guide_v2#RegexLookup for more information, but the basic idea is that there are patterns that are blocked (e.g. "evil.com/path") where subdomains or additional path components are also blocked (e.g. "www.evil.com/path" or "evil.com/path/index.html"). Note that "evil.com/other" is not blocked in this case.

As for the original question, there is not currently a way to enumerate all bad patterns on a domain. Why do you want this information instead of just looking up individual URLs?
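
A rough sketch of how that pattern matching plays out on the client side, based on my reading of the v2 guide's host-suffix/path-prefix rules. It skips URL canonicalization, IP-address hosts, and the hashing step, and the names `lookup_expressions`, `is_blocked`, and `blocked_patterns` are made up for illustration. It also assumes the blocked pattern is stored with a trailing slash ("evil.com/path/"), since the directory prefixes generated for a URL always end in "/".

```python
from urllib.parse import urlsplit


def lookup_expressions(url):
    """Return the host-suffix/path-prefix combinations to check for a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").strip(".")
    path = parts.path or "/"

    # Hosts: the exact hostname, plus up to four suffixes taken from the
    # last five components; the bare top-level domain is never tried.
    labels = host.split(".")
    hosts = [host]
    for i in range(max(1, len(labels) - 5), len(labels) - 1):
        hosts.append(".".join(labels[i:]))

    # Paths: exact path with query, exact path without query, then "/"
    # and up to three deeper directory prefixes, each ending in "/".
    paths = []
    if parts.query:
        paths.append(path + "?" + parts.query)
    paths.append(path)
    prefix = "/"
    prefixes = [prefix]
    for segment in [s for s in path.split("/") if s][:-1][:3]:
        prefix += segment + "/"
        prefixes.append(prefix)
    paths.extend(p for p in prefixes if p not in paths)

    return [h + p for h in hosts for p in paths]


def is_blocked(url, blocked_patterns):
    """True if any generated expression appears in the blocked set."""
    return any(expr in blocked_patterns for expr in lookup_expressions(url))


blocked_patterns = {"evil.com/path/"}  # hypothetical blocklist entry
print(is_blocked("http://www.evil.com/path/index.html", blocked_patterns))  # True
print(is_blocked("http://evil.com/other", blocked_patterns))                # False
print(is_blocked("http://evil.com/", blocked_patterns))                     # False
```

This is why a flag on a whole host covers every made-up path beneath it, while a flag on one path says nothing about the bare domain.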
 
--
You received this message because you are subscribed to the Google Groups "Google Safe Browsing API" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-safe-browsing-api/-/awU9dnXZM-4J.

To post to this group, send email to google-safe-...@googlegroups.com.
To unsubscribe from this group, send email to google-safe-browsi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-safe-browsing-api?hl=en.

DumbGuy

Sep 13, 2012, 12:19:27 AM
to google-safe-...@googlegroups.com
Thank you, Garrett, for the link and clarification -- and the prompt reply to boot. Being only a Lookup API user, I hadn't reviewed the v2 docs.
 
...Time to get tweakin' on my app.
 
-DG

Maarten

Sep 13, 2012, 11:52:36 AM
to google-safe-...@googlegroups.com
On Wednesday, September 12, 2012 1:01:46 PM UTC-4, Garrett Casto wrote:

As for the original question, there is not currently a way to enumerate all bad patterns on a domain. Why do you want this information instead of just looking up individual URLs?

Primarily because I work for a web hosting company. I know the domains that are served from my servers, but getting a list of all the full URLs would be impractical (there are over 6 million domains involved). What I'm looking to do is notify my customers if and when any URL in their domain ends up on the blacklist.

I could (in theory) put together a list of every file on the filesystem, but between RewriteRules and potential parameters it would be almost impossible to get a complete list of all the responsive URLs for a given domain.

--Maarten

Julien Sobrier

Sep 13, 2012, 12:33:59 PM
to google-safe-...@googlegroups.com
The local database contains the host prefix for each entry, so you
could filter your 6 million domains against those host prefixes. You
would then only have to look up URLs for the domains that match.
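
A minimal sketch of that filtering step, with several assumptions: as I read the v2 guide, a host key is the last two (and, when present, last three) hostname components followed by "/", and the stored prefix is the first 4 bytes of its SHA-256 hash. `local_host_prefixes` is a hypothetical set standing in for whatever your local database holds, and the function names are made up for illustration.

```python
import hashlib


def host_keys(hostname):
    """Host-key strings for a hostname: last two and last three components."""
    labels = hostname.lower().strip(".").split(".")
    keys = [".".join(labels[-2:]) + "/"]
    if len(labels) >= 3:
        keys.append(".".join(labels[-3:]) + "/")
    return keys


def host_key_prefix(key):
    """First 4 bytes of the SHA-256 digest of a host-key string."""
    return hashlib.sha256(key.encode("ascii")).digest()[:4]


def domains_worth_checking(domains, local_host_prefixes):
    """Keep only the domains whose host key appears in the local database."""
    return [
        d for d in domains
        if any(host_key_prefix(k) in local_host_prefixes for k in host_keys(d))
    ]


# Hypothetical local database containing a single host-key prefix.
local_host_prefixes = {host_key_prefix("evil.com/")}
hosted = ["www.evil.com", "good.org", "shop.good.org"]
print(domains_worth_checking(hosted, local_host_prefixes))  # ['www.evil.com']
```

A match here only means some pattern under that host is listed, so the individual URLs still have to be looked up afterwards.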

Julien Sobrier

Garrett Casto

Sep 13, 2012, 1:11:10 PM
to google-safe-...@googlegroups.com
On Thu, Sep 13, 2012 at 8:52 AM, Maarten <mbroek...@gmail.com> wrote:
The way this normally works is that you sign up for Webmaster Tools for all the domains that you control, and you will receive e-mail notification if we detect any phishing or malware pages on these sites.
 
