So I updated a pile of GSDs (ones with CVE aliases) from a set of trusted sources (mostly CNAs), using some simple scripts:
TL;DR: one set of scripts generates a CSV file of CVE,URL pairs, and another script walks the GSD files and updates the ".GSD.references" array.
The scripts play it safe: no deletions without double checking, confirming that file creation is proper (e.g. the filename/path), that sort of thing.
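The scripts themselves aren't attached, but the update side might look roughly like this sketch in Python. The file layout, function names, and key casing here are assumptions; the only firm details from the above are the CSV of CVE,URL pairs and the append-only update of the references array:

```python
import csv
import json
from pathlib import Path

def load_mapping(csv_path):
    """Read a CSV of CVE,URL rows into a dict of CVE -> list of URLs."""
    mapping = {}
    with open(csv_path, newline="") as f:
        for cve, url in csv.reader(f):
            mapping.setdefault(cve.strip(), []).append(url.strip())
    return mapping

def update_references(gsd_file, urls):
    """Append any new URLs to the GSD references array; never delete.
    Returns the list of URLs actually added."""
    data = json.loads(Path(gsd_file).read_text())
    refs = data.setdefault("GSD", {}).setdefault("references", [])
    added = [u for u in urls if u not in refs]
    refs.extend(added)
    if added:
        Path(gsd_file).write_text(json.dumps(data, indent=2) + "\n")
    return added
```

A wrapper would then map each CVE to its GSD file path and call `update_references` per file, logging anything it added.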
There are a LOT of CVEs in public use (over 400) from CNAs that are still in reserved state (years later, in some cases), and a handful missing entirely from the database. Attached is a text file listing them.

Side note: there are large gaps in the CVE IDs. For example, of the one thousand IDs in the CVE-2021-2xxx range, only 459 are public CVEs, so more than half are still missing. When we find data that links to CVEs (or other databases) that we can't confirm because the data is missing (or not visible, e.g. paywalled), we just have to assess the trust level of the source and use it or not, and ideally give some source hints/data on how it got there.
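As a rough illustration of how those gaps can be measured, here's a sketch (the set of public IDs would come from walking the actual GSD/CVE data; the function and sample numbers below are purely illustrative):

```python
def count_gaps(public_ids, year, block_start, block_size=1000):
    """Given a set of known-public CVE IDs, count how many IDs in a
    thousand-ID block (e.g. CVE-2021-2000..2999) are public vs missing."""
    expected = {f"CVE-{year}-{n}" for n in range(block_start, block_start + block_size)}
    present = expected & set(public_ids)
    return len(present), block_size - len(present)
```

Running this per block over the whole year would show exactly where the holes cluster.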
This will be an increasing problem as we branch out. There are basically 3 sets of data:
1) Public general: there used to be OSVDB, now there is basically just CVE and some CERTs. Most references can be verified, but that requires a lot of manual processing.
3) Private: there are tons of private databases; who knows what they have or don't have, and the quality varies enormously.
This was discussed more on the call, but TL;DR: we need sourcing and attribution/analysis so people can figure out whether they want to use this data or not.
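One way to carry that attribution would be a small provenance wrapper on each reference. This is purely a sketch of the idea, not an agreed schema; all field names here are hypothetical:

```python
def attributed_reference(url, source, verified, notes=""):
    """Wrap a reference URL with enough provenance for a consumer to
    judge whether to trust it. Field names are hypothetical."""
    return {
        "url": url,
        "source": source,      # e.g. "CNA advisory", "CERT", "private DB"
        "verified": verified,  # True if confirmed against public data
        "notes": notes,        # how it got here, paywall status, etc.
    }
```

A consumer could then filter on `verified` and `source` to decide which references meet their own trust bar.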