handling dead/potentially malicious links


Kurt Seifried

Jul 4, 2022, 2:50:43 PM
to GSD Discussion Group
So today while doing some data analysis I ran across Mandriva.com, which is now very much dead and advertising casinos, among other shenanigans.


That domain will never host real content again, and it appears to be hosting potentially malicious content. As I see it, we have a few options:

1) Do nothing
2) Remove all mandriva.com URLs from the data (mostly in other namespaces)
3) Change the URLs to indicate that they are possibly malicious, e.g. the classic "hxxp://" method used for malware sites (mostly in other namespaces)
4) Add meta data to the URL JSON to reflect that they are unsafe (mostly in other namespaces)
5) Add our own data in the GSD namespace to reflect that the following URLs are dead/dangerous
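Option #3 is the easiest to sketch. A minimal defang helper in the "hxxp://" style could look like the following (the function name and bracketed-dot convention are illustrative, not existing GSD tooling):

```python
import re

def defang(url: str) -> str:
    """Rewrite a URL so it is not accidentally clickable or fetchable.

    Uses the classic "hxxp://" scheme rewrite plus bracketed dots in the
    hostname, e.g. "http://mandriva.com/x" -> "hxxp://mandriva[.]com/x".
    """
    url = re.sub(r"^http", "hxxp", url)  # http -> hxxp, https -> hxxps
    scheme, sep, rest = url.partition("://")
    if sep:
        host, slash, path = rest.partition("/")
        rest = host.replace(".", "[.]") + slash + path  # only defang the host
    return scheme + sep + rest

print(defang("http://mandriva.com/security/advisories"))
# hxxp://mandriva[.]com/security/advisories
```

The downside, as noted below, is that this mutates URLs that mostly live in other namespaces.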

If we do #2, #3, or #4, then we're touching data mostly in the cve.org and nvd.nist.gov namespaces, which creates all kinds of technical problems, so I'm inclined to say "no" to those.

I feel like #1 is the lazy way out. But a safe default.

I think #5 is the way to go, but it means we need richer URL data than OSV supports. We could just add it in, and anyone using the OSV schema can simply ignore our additional data. Plus, it sets the expectation that we are stewards of the data, which I think is a good thing to some degree. I don't see other organizations doing this, or even allowing others to do it for their data.

Thoughts/comments?

Kurt Seifried (He/Him)
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance

Emily Fox

Jul 5, 2022, 12:12:54 PM
to GSD Discussion Group, ksei...@cloudsecurityalliance.org
I think #5 is the best path forward. Adding it in makes sense for us; however, we should probably let OSV know that they should support identifying unsafe URLs, because this problem is only going to get worse as people retire domains and decide to sell them.

Kurt Seifried

Jul 5, 2022, 1:28:21 PM
to Marcin Antkiewicz, GSD Discussion Group
So ... this is a scale problem as well. There are loooots of unsafe links due to domain rot and the like.

My thinking, after a quick perusal of some data, is that we ideally need a primary list of unsafe domains with a few main buckets:

a) expired (so they don't work, and they may become malicious in the future, especially because of SEO backlinks)
b) clearly no longer under original vendor control (e.g. mandriva.com), and maybe the content is gone as well
c) serving unsafe content (e.g. mandriva.com)
d) still under original vendor control, but the security content has since been replaced (e.g. one CNA replaced all their advisories with marketing pages), so not unsafe, but useless

All of which also speaks to mirroring the content when we process it (if nothing else, so we can show what we based our data on; see previous email threads on this).
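The buckets above could be modeled as simple tags on a domain list. A sketch, assuming a domain may carry several tags at once (all names here are invented for illustration):

```python
from enum import Enum
from urllib.parse import urlparse

class DomainStatus(Enum):
    EXPIRED = "expired"                     # (a) registration lapsed, may turn malicious
    NEW_OWNER = "new-owner"                 # (b) no longer under original vendor control
    UNSAFE = "unsafe"                       # (c) serving unsafe/malicious content
    CONTENT_REPLACED = "content-replaced"   # (d) vendor kept the domain, advisories gone

# A domain can land in more than one bucket, e.g. mandriva.com is both
# under a new owner and serving unsafe content.
UNSAFE_DOMAINS: dict[str, set[DomainStatus]] = {
    "mandriva.com": {DomainStatus.NEW_OWNER, DomainStatus.UNSAFE},
}

def url_is_trusted(url: str) -> bool:
    """True unless the URL's host appears on the unsafe-domain list."""
    host = urlparse(url).hostname or ""
    return host not in UNSAFE_DOMAINS

print(url_is_trusted("http://mandriva.com/en/support"))  # False
```

Whether a URL can hold multiple tags at once versus moving through them as a lifecycle is discussed further down the thread; a set of tags per domain supports both readings.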

I do NOT support using the database_specific field unless we design a process for using it as an experimental placeholder, with a path to officially integrating popular/useful things into the schema. We need to encourage things that grow/improve the schema. I know OSV wants a lightweight thing; to that end, I would suggest what I suggested for the CVE JSON schema: a core "MUST" schema (e.g. ID, description, vendor, products, vuln type, sources) and a nice-to-have "SHOULD/CAN" schema (CVSS, CPE, etc.).

I mean unless you're downloading those URLs or pushing them to customers to download or click on, it doesn't really matter. But for the people that do, it does matter. The same can be said for descriptions, and so on.


Kurt Seifried (He/Him)
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance

On Tue, Jul 5, 2022 at 10:15 AM Marcin Antkiewicz <mar...@kajtek.org> wrote:
I feel like #1 is the lazy way out. But a safe default.

I think #5 is the way to go, but it means we need richer URL data than OSV supports. We could just add it in, and anyone using the OSV schema can simply ignore our additional data. Plus, it sets the expectation that we are stewards of the data, which I think is a good thing to some degree. I don't see other organizations doing this, or even allowing others to do it for their data.

Thoughts/comments?

+1 on #5. OSV supports a top level 'database_specific' field, I would add an object there indicating which reference is superseded or invalidated. Archive.org retained the content: https://web.archive.org/web/20150523052634/http://www.mandriva.com/en/support/security/advisories/advisory/MDVSA-2009:146/?name=MDVSA-2009:146

Marcin Antkiewicz

Emily Fox

Jul 5, 2022, 4:36:53 PM
to GSD Discussion Group, ksei...@cloudsecurityalliance.org, Marcin Antkiewicz
Love the groupings - let's do this!

I recommend the below to clean them up a bit, ordered by potential concern:
a) content moved
b) new owner
c) expired
d) unsafe

is it possible that a URL will have multiple of these at a given time or should this serve more as the lifecycle of URLs?

I don't see any reason why OSV shouldn't support a MUST schema plus a nice-to-have schema. (SBOM is set up to do that...) It grants the flexibility to continue addressing new use cases as we discover them (or remember that they are a problem to be addressed).

Josh Bressers

Jul 5, 2022, 10:12:00 PM
to Emily Fox, GSD Discussion Group, ksei...@cloudsecurityalliance.org, Marcin Antkiewicz
On Tue, Jul 5, 2022 at 4:36 PM Emily Fox <themoxie...@gmail.com> wrote:
Love the groupings - let's do this!

I recommend the below to clean them up a bit, ordered by potential concern:
a) content moved
b) new owner
c) expired
d) unsafe

is it possible that a URL will have multiple of these at a given time or should this serve more as the lifecycle of URLs?

I'm not a fan of premature optimization, but I suspect we will need multiple tags. My suspicion is someday there will be many more of these.
 

I don't see any reason why OSV shouldn't support a MUST schema plus a nice-to-have schema. (SBOM is set up to do that...) It grants the flexibility to continue addressing new use cases as we discover them (or remember that they are a problem to be addressed).


I am also a fan of such a model. There will never be a one-size-fits-all schema, but there will always be some minimal details.
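The MUST/SHOULD split could be checked with plain validation over a record. The field names below follow Kurt's CVE JSON suggestion upthread and are assumptions for illustration, not an OSV proposal:

```python
# Core "MUST" fields every record needs; "SHOULD" fields are optional enrichments.
MUST_FIELDS = {"id", "description", "vendor", "products", "vuln_type", "sources"}
SHOULD_FIELDS = {"cvss", "cpe"}

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = [f"missing required field: {f}"
                for f in sorted(MUST_FIELDS - record.keys())]
    # Unknown fields are not fatal: they are exactly the "experimental
    # placeholder" material that could later graduate into the schema.
    unknown = record.keys() - MUST_FIELDS - SHOULD_FIELDS
    problems += [f"unknown field (candidate for schema growth): {f}"
                 for f in sorted(unknown)]
    return problems

minimal = {"id": "GSD-EXAMPLE-0001", "description": "example", "vendor": "Mandriva",
           "products": ["mandriva-linux"], "vuln_type": "dead-link", "sources": []}
print(validate(minimal))  # []
```

The design choice: missing MUST fields reject the record, while extra fields are merely flagged, which keeps the core lightweight without blocking experimentation.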

--
     Josh

Marcin Antkiewicz

Jul 8, 2022, 11:29:43 AM
to Kurt Seifried, GSD Discussion Group
I feel like #1 is the lazy way out. But a safe default.

I think #5 is the way to go, but it means we need richer URL data than OSV supports. We could just add it in, and anyone using the OSV schema can simply ignore our additional data. Plus, it sets the expectation that we are stewards of the data, which I think is a good thing to some degree. I don't see other organizations doing this, or even allowing others to do it for their data.

Thoughts/comments?

+1 on #5. OSV supports a top level 'database_specific' field, I would add an object there indicating which reference is superseded or invalidated. Archive.org retained the content: https://web.archive.org/web/20150523052634/http://www.mandriva.com/en/support/security/advisories/advisory/MDVSA-2009:146/?name=MDVSA-2009:146
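A sketch of what that database_specific object might look like, using the archived Mandriva advisory above. OSV only reserves the top-level key; the inner field names ("url_status", "archived_copy") and the GSD ID are invented for illustration:

```python
import json

# Hypothetical GSD entry fragment: the OSV "references" list stays untouched,
# and database_specific records which URLs are dead/unsafe and where an
# archived copy of the original content lives.
entry = {
    "id": "GSD-EXAMPLE-0002",
    "references": [
        {"type": "ADVISORY",
         "url": "http://www.mandriva.com/en/support/security/advisories/advisory/MDVSA-2009:146/"},
    ],
    "database_specific": {
        "url_status": [
            {"url": "http://www.mandriva.com/en/support/security/advisories/advisory/MDVSA-2009:146/",
             "status": ["new-owner", "unsafe"],
             "archived_copy": "https://web.archive.org/web/20150523052634/http://www.mandriva.com/en/support/security/advisories/advisory/MDVSA-2009:146/?name=MDVSA-2009:146"},
        ]
    },
}
print(json.dumps(entry["database_specific"], indent=2))
```

Consumers using plain OSV ignore the extra object entirely; GSD-aware tools can use it to suppress or annotate the dead link.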

Marcin Antkiewicz