Mirroring security data - Was: Adding the GrypDB data into GSD


Josh Bressers

Apr 6, 2022, 4:15:51 PM
to Weston Steimel, GSD Discussion Group, Oliver Chang
On Wed, Apr 6, 2022 at 10:51 AM Weston Steimel <weston....@gmail.com> wrote:
OSV uses the source identifiers as the primary id for the record when available and then uses the aliases and related fields to reference other identifiers.  The currently used prefixes and original sources of data are identified at https://ossf.github.io/osv-schema/#id-modified-fields.  I have thought it'd be really cool to get all of the various linux distro feeds also exporting OSV format to osv.dev at least, but haven't yet had time to work on that.  Perhaps that is something Oliver has already been considering? 

This mail had me questioning why GSD wasn't doing something like this, and I think there are a few reasons we didn't want to do this.

First, there are a lot of potential vulnerability/advisory sources. Too many to try to track. By using a GSD ID then adding an alias that can be anything, we don't have to add explicit support for new identifier sources.

The other piece of this, the most important piece, is that if you only pull from a source, the data is immutable. The GSD doesn't want to be immutable; we want everyone to be able to help contribute data to everything. I think trying to recycle an existing ID, when you are modifying the underlying data, will not be appreciated. I can also see instances where it creates confusion if one source disagrees with another source and the same identifier is used.

I think this is also why we want lots of IDs instead of trying to minimize IDs. If we have one ID for a linux distribution advisory, and that advisory contains 3 CVE IDs, how do we add commentary if we try to overload the CVE IDs by adding the advisory ID as an alias in 3 different places?

I'm going to go with one GSD ID per "thing" where our definition of thing is pretty simple. There are plenty of integers :)
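
As a rough illustration of the "one GSD ID per thing" idea, here is a minimal Python sketch. The record layout, the GSD numbers, the advisory name, and the CVE numbers are all invented for the example and are not the real GSD schema. It only tries to show two things: aliases are free-form strings, so a new identifier source needs no explicit support, and an advisory that covers three CVEs gets its own record, so commentary on the advisory lives in one place.

```python
from collections import defaultdict

# Hypothetical records: every "thing" gets its own GSD ID.
# All identifiers and field names below are invented placeholders.
records = {
    "GSD-2022-1000001": {"aliases": ["CVE-2022-0001"], "notes": []},
    "GSD-2022-1000002": {"aliases": ["CVE-2022-0002"], "notes": []},
    "GSD-2022-1000003": {"aliases": ["CVE-2022-0003"], "notes": []},
    # The distro advisory is its own thing and points at the three CVE records.
    "GSD-2022-1000004": {
        "aliases": ["EXAMPLE-DISTRO-SA-2022-01"],
        "related": ["GSD-2022-1000001", "GSD-2022-1000002", "GSD-2022-1000003"],
        "notes": [],
    },
}


def build_alias_index(recs):
    """Map each alias string to the GSD IDs that carry it."""
    index = defaultdict(list)
    for gsd_id, rec in recs.items():
        for alias in rec["aliases"]:
            index[alias].append(gsd_id)
    return index


index = build_alias_index(records)

# Commentary on the advisory goes on the advisory's record only,
# not on each of the three CVE records.
advisory_id = index["EXAMPLE-DISTRO-SA-2022-01"][0]
records[advisory_id]["notes"].append("Fixed packages shipped in the 2022-04 update.")
```

Because the index treats aliases as opaque strings, adding a record from a source nobody has seen before is just another dictionary entry.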

Thanks all for the input!

-- 
     Josh

Oliver Chang

Apr 6, 2022, 7:07:36 PM
to Josh Bressers, Weston Steimel, GSD Discussion Group
On Thu, 7 Apr 2022 at 06:15, Josh Bressers <jo...@bress.net> wrote:
On Wed, Apr 6, 2022 at 10:51 AM Weston Steimel <weston....@gmail.com> wrote:
OSV uses the source identifiers as the primary id for the record when available and then uses the aliases and related fields to reference other identifiers.  The currently used prefixes and original sources of data are identified at https://ossf.github.io/osv-schema/#id-modified-fields.  I have thought it'd be really cool to get all of the various linux distro feeds also exporting OSV format to osv.dev at least, but haven't yet had time to work on that.  Perhaps that is something Oliver has already been considering? 

We're hoping to do this for Debian soon (https://github.com/ossf/osv-schema/issues/24), but we will need a lot of help for other distros. If you (Weston), or any others in this group are interested, this is work that would be rewarded as part of Linux Foundation's https://sos.dev program :) 
 

This mail had me questioning why GSD wasn't doing something like this, and I think there are a few reasons we didn't want to do this.

First, there are a lot of potential vulnerability/advisory sources. Too many to try to track. By using a GSD ID then adding an alias that can be anything, we don't have to add explicit support for new identifier sources.

The other piece of this, the most important piece, is that if you only pull from a source, the data is immutable. The GSD doesn't want to be immutable; we want everyone to be able to help contribute data to everything. I think trying to recycle an existing ID, when you are modifying the underlying data, will not be appreciated. I can also see instances where it creates confusion if one source disagrees with another source and the same identifier is used.

A model we've been trying to promote with OSV is for vulnerability databases to publish their own repo of community editable advisories (e.g. GitHub, Python, Rust). This splits the work, and more closely reflects how open source development in general is distributed (and scales).

I think there's some value to recycling IDs -- it reduces indirection. If a database wants to add its own commentary (and uses the OSV format), it could potentially add this to the `database_specific` fields in there, and easily keep everything else in sync from the original source, rather than fork and introduce fragmentation. 
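
A minimal sketch of what that could look like, assuming a mirror that keeps its own notes under a namespaced key inside the OSV `database_specific` object. The `example_mirror` key, the `resync` helper, and the record contents are hypothetical; only the `id`, `modified`, `summary`, `aliases`, and `database_specific` field names come from the OSV schema.

```python
import copy

# Hypothetical key under which this mirror stores its own commentary.
NAMESPACE = "example_mirror"


def resync(upstream, local):
    """Take every field from the upstream record, carrying over only our own notes."""
    merged = copy.deepcopy(upstream)
    ours = local.get("database_specific", {}).get(NAMESPACE)
    if ours is not None:
        merged.setdefault("database_specific", {})[NAMESPACE] = copy.deepcopy(ours)
    return merged


# Placeholder records; the ID and contents are invented.
local = {
    "id": "EXAMPLE-2022-0001",
    "modified": "2022-04-01T00:00:00Z",
    "summary": "Old summary",
    "database_specific": {
        NAMESPACE: {"commentary": "Not exploitable in our default configuration."}
    },
}
upstream = {
    "id": "EXAMPLE-2022-0001",
    "modified": "2022-04-06T00:00:00Z",
    "summary": "Updated summary from the original source",
    "aliases": ["CVE-2022-0001"],
}

merged = resync(upstream, local)
```

The ID stays the one the original source assigned, everything outside the namespaced key follows upstream, and the local commentary survives each sync.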

Kurt Seifried

Apr 7, 2022, 12:27:33 AM
to Oliver Chang, Josh Bressers, Weston Steimel, GSD Discussion Group
Another aspect to consider is the Venn diagram aspect of how people/projects/etc. group vulnerabilities. I don't mean just a cluster of CVEs that varies a bit by vendor, but some vendors assign a unique identifier to each instance, some take a more CVE-like approach of "same vuln type, same versions, same fix, into the bucket you go!", and some just glue all sorts of random stuff together. Then on the consumption side things aren't all equal, e.g. one project may consume it as an update, another might backport the specific fix, and another may simply slap a compensating control on it.

Having the ability to arbitrarily slice, dice, glue and otherwise mangle IDs into various subsets is very powerful and, most importantly, I think it's way easier to join stuff together than it is to take it apart, which goes to the "many smaller files or one big one" discussion.
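
The "easier to join than to take apart" point can be sketched in a few lines of Python. The record shape, the IDs, and the `join` helper are invented for illustration and do not reflect any real GSD or OSV tooling.

```python
def join(records, group_id):
    """Combine several fine-grained records into one grouped view."""
    return {
        "id": group_id,
        "members": sorted(r["id"] for r in records),
        "aliases": sorted({a for r in records for a in r["aliases"]}),
        "fixes": sorted({f for r in records for f in r["fixes"]}),
    }


# Hypothetical fine-grained records, one per vulnerability.
small = [
    {"id": "GSD-2022-1000001", "aliases": ["CVE-2022-0001"], "fixes": ["commit-aaa"]},
    {"id": "GSD-2022-1000002", "aliases": ["CVE-2022-0002"], "fixes": ["commit-bbb"]},
]

bundle = join(small, "EXAMPLE-BUNDLE-1")
```

Joining is a mechanical union. Starting from `bundle` alone, there is no way to tell which fix belongs to which CVE, so splitting it back apart needs information the grouped record no longer carries.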

Also, it's easier right now to simply decompose the advisories to what vendors provide (e.g. unique CVEs), and as we build better systems that can parse the advisories, look at the commits and so on, we can start to decompose them further (e.g. to the commit level), giving finer granularity. I can't imagine that, as time goes on and we get better at this, people will want "larger" blobs of data that they themselves have to decompose if needed.

-Kurt
 