So... this is a scale problem as well. There are loooots of unsafe links due to domain rot, etc.
After a quick perusal of some data, my thinking is that we ideally need a primary list of unsafe domains, sorted into a few main buckets:
a) expired (so they don't work, and they may become malicious in the future, especially because of SEO backlinks)
b) clearly no longer under the original vendor's control (e.g. mandriva.com), and maybe the content is gone as well?
d) still under the original vendor's control, but the security content has since been replaced (e.g. one CNA replaced all of its advisories with marketing pages), so not unsafe, but useless.
This also argues for mirroring the content when we process it (if nothing else, so we can show what we based our data on; see previous email threads on this).
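A rough sketch of what such a bucketed list could look like, assuming we key it by registered domain and let subdomains inherit their parent's bucket. The bucket names, `DomainEntry`, `UNSAFE_DOMAINS`, and `classify_reference` are all hypothetical; mandriva.com is the example from above, and `expired-vendor.example` is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import urlparse


class DomainBucket(Enum):
    """Hypothetical bucket names mirroring the categories above."""
    EXPIRED = "expired"                              # a) registration lapsed
    OUT_OF_VENDOR_CONTROL = "out_of_vendor_control"  # b) no longer the original vendor
    CONTENT_REPLACED = "content_replaced"            # d) vendor-controlled, content gone


@dataclass(frozen=True)
class DomainEntry:
    domain: str
    bucket: DomainBucket
    note: str = ""


# Illustrative entries only: mandriva.com is the example from the text,
# expired-vendor.example is a placeholder (.example is a reserved TLD).
UNSAFE_DOMAINS = {
    e.domain: e
    for e in [
        DomainEntry("mandriva.com", DomainBucket.OUT_OF_VENDOR_CONTROL),
        DomainEntry("expired-vendor.example", DomainBucket.EXPIRED),
    ]
}


def classify_reference(url: str) -> Optional[DomainBucket]:
    """Return the bucket for a reference URL's domain, or None if it is not listed.

    Subdomains match their parent (www.mandriva.com -> mandriva.com), but a bare
    TLD such as "com" is never looked up on its own.
    """
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then each parent domain in turn.
    for i in range(len(parts) - 1):
        entry = UNSAFE_DOMAINS.get(".".join(parts[i:]))
        if entry is not None:
            return entry.bucket
    return None
```

Keying on the domain keeps the list small at scale; bucket d) may in practice need URL-level entries, since a vendor can replace some advisory pages and not others.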
I do NOT support using the database_specific field unless we design a process for using it as an experimental placeholder, with a path to officially integrating popular/useful things into the schema. We need to encourage things that grow/improve the schema. I know OSV wants something lightweight, and to that end I would suggest what I suggested for the CVE JSON schema: a core "MUST" schema (e.g. ID, description, vendor, products, vuln type, sources) and a nice-to-have "SHOULD/CAN" schema (CVSS, CPE, etc.).
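To make the MUST vs. SHOULD/CAN split concrete, here is a minimal sketch of how a validator might treat the tiers, plus a way to spot experimental fields popular enough to consider promoting. The field names and helper functions are hypothetical and are not the actual OSV or CVE JSON schema field names; the `min_share` threshold is likewise just an assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Hypothetical field names: the MUST tier follows the list in the text
# (ID, description, vendor, products, vuln type, sources).
MUST_FIELDS = ("id", "description", "vendor", "products", "vuln_type", "sources")
SHOULD_FIELDS = ("cvss", "cpe")
KNOWN_FIELDS = set(MUST_FIELDS) | set(SHOULD_FIELDS)


def check_record(record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Missing MUST fields are errors, missing SHOULD fields are warnings,
    and any other keys are treated as experimental extensions."""
    def missing(fields: Iterable[str]) -> List[str]:
        return [f for f in fields if record.get(f) in (None, "", [], {})]

    return {
        "errors": missing(MUST_FIELDS),
        "warnings": missing(SHOULD_FIELDS),
        "extensions": sorted(k for k in record if k not in KNOWN_FIELDS),
    }


def promotion_candidates(records: List[Dict[str, Any]], min_share: float = 0.5) -> List[str]:
    """Extension fields used by at least `min_share` of records (hypothetical promotion rule)."""
    counts = Counter(k for r in records for k in check_record(r)["extensions"])
    return sorted(k for k, n in counts.items() if n / len(records) >= min_share)
```

The point of the third bucket is that experimental data stays visible and countable rather than disappearing into an opaque catch-all field.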
I mean, unless you're downloading those URLs or pushing them to customers to download or click on, it doesn't really matter. But for the people who do, it matters. The same can be said for descriptions, and so on.