Data enrichment from CNAs (and URL only ids)


Kurt Seifried

Mar 14, 2022, 12:41:34 PM
to GSD Discussion Group
So I updated a pile of GSD entries (the ones with CVE aliases) from a set of trusted sources (mostly CNAs):

archlinux
debian (CNA)
mageia
redhat (CNA)
suse (CNA)
ubuntu (CNA)

using some simple scripts:


TL;DR: one set of scripts generates a CSV file of CVE,URL and another script walks the GSD files and updates the ".GSD.references[]" array. 
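
For anyone who wants to reproduce the second step, here is a minimal sketch of what such an update script could look like. It is not the actual script. It assumes each GSD file is a JSON object with the CVE alias at ".GSD.alias" and a list of URL strings at ".GSD.references"; the function names are made up for illustration.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def load_cve_urls(csv_path):
    """Read rows of CVE,URL into {cve_id: [url, ...]}, keeping first-seen order."""
    urls = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            cve_id, url = row[0].strip(), row[1].strip()
            if cve_id and url and url not in urls[cve_id]:
                urls[cve_id].append(url)
    return urls


def add_references(entry, new_urls):
    """Append URLs missing from entry["GSD"]["references"]; return how many were added."""
    refs = entry.setdefault("GSD", {}).setdefault("references", [])
    added = 0
    for url in new_urls:
        if url not in refs:
            refs.append(url)
            added += 1
    return added


def update_tree(root, urls_by_cve):
    """Walk root for *.json files and update those whose alias appears in the CSV."""
    changed = []
    for path in sorted(Path(root).rglob("*.json")):
        entry = json.loads(path.read_text())
        alias = entry.get("GSD", {}).get("alias")  # assumed location of the CVE alias
        if alias in urls_by_cve and add_references(entry, urls_by_cve[alias]):
            path.write_text(json.dumps(entry, indent=2) + "\n")
            changed.append(path)
    return changed
```

Only files that actually gain a reference are rewritten, so running it a second time with the same CSV should produce an empty commit, which keeps the diff as small as the data allows.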

Some comments/problems:

Commits with 10,000+ file changes are impossible for a human to review. We need a better CI/CD pipeline: I accidentally deleted a few files, and Josh Bressers' script recreated them. I also accidentally created a few improperly formed files, fixed in https://github.com/cloudsecurityalliance/gsd-database/pull/2288
So: no deletions without double-checking, confirm that file creation is proper (e.g. filename/path), that sort of thing.
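
A check along these lines could run in CI before a bulk commit is merged. This is only a sketch: it assumes the repository layout is `<year>/<N>xxx/GSD-<year>-<number>.json`, which is not spelled out in this post, and `check_changes` is a hypothetical helper that takes (status, path) pairs like the ones `git diff --name-status` prints. Renames are not handled.

```python
import re
from pathlib import PurePosixPath

# Assumed layout (not stated in the post): <year>/<N>xxx/GSD-<year>-<number>.json
GSD_NAME = re.compile(r"^GSD-(\d{4})-(\d{4,})\.json$")


def check_changes(changes):
    """Return a list of problems for (status, path) pairs from a change listing."""
    problems = []
    for status, path in changes:
        if status.startswith("D"):
            problems.append(f"deleted: {path}")  # every deletion needs a human look
            continue
        if not status.startswith("A"):
            continue  # modifications are not checked here
        parts = PurePosixPath(path).parts
        match = GSD_NAME.match(parts[-1])
        if match is None or len(parts) != 3:
            problems.append(f"bad name or depth: {path}")
            continue
        year, number = match.groups()
        bucket = number[:-3] + "xxx"  # e.g. 1002352 -> 1002xxx
        if parts[0] != year or parts[1] != bucket:
            problems.append(f"wrong directory: {path}")
    return problems
```

A non-empty result would fail the build, so a 10,000-file commit gets reduced to a short list of things a person actually has to look at.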

There are a LOT of CVEs in public use (over 400) from CNAs that are still in the reserved state (years later in some cases). There's a handful missing entirely from the database. Attached is a text file. (Side note: there are large "gaps" in the CVE IDs, e.g. of the one thousand CVE-2021-2xxx IDs only 459 are public, so more than half are still missing.) So when we find data that links to CVEs (or other databases) that we can't confirm because the data is missing (or not visible, e.g. paywalled), we just have to assess the trust level of the source and use it or not, and ideally give some source hints/data on how it got there.
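
The gap count in the side note is easy to recompute from any list of public CVE IDs. A rough sketch, where `block_coverage` is a made-up helper and the caller supplies the list of public IDs from whatever source they trust:

```python
import re

CVE_ID = re.compile(r"^CVE-(\d{4})-(\d{4,})$")


def block_coverage(public_ids, year, start, end):
    """Count the public IDs in CVE-<year>-<start>..<end> and list the ones not seen."""
    seen = set()
    for cve_id in public_ids:
        match = CVE_ID.match(cve_id)
        if match and int(match.group(1)) == year:
            number = int(match.group(2))
            if start <= number <= end:
                seen.add(number)
    missing = [f"CVE-{year}-{n:04d}" for n in range(start, end + 1) if n not in seen]
    return len(seen), missing
```

For the CVE-2021-2xxx block this would be called as `block_coverage(ids, 2021, 2000, 2999)`; with 459 public IDs the missing list has 541 entries.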

This will be an increasing problem as we branch out. There are basically 3 sets of data:

1) Public general: there used to be OSVDB, now there is basically just CVE and some CERTs. Most references can be verified, but that requires a lot of manual processing.
2) Public ecosystem specific, e.g. https://github.com/rubysec/ruby-advisory-db. These are almost all trustworthy and correct.
3) Private: there are tons of private databases, and who knows what they have or don't have. The quality here varies enormously.

This was discussed on the call more, but TL;DR: We need sourcing and attribution/analysis so people can figure out if they want to use this data or not. 

Kurt Seifried (He/Him)
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance
GSD-public-data-but-reserved-CVE.txt