Hi,
On Sun, 15 Aug 2021, Josh Bressers wrote:
> Hi all,
>
> I spent a bit more time today with JSON-LD. This is truly one of those things where the more I learn the less I know.
>
> So anyway, here's what looks like 15 minutes of effort but took me longer than I want to admit :)
>
> https://github.com/joshbressers/uvi-tools/tree/json-ld/json-ld
>
> I used Node.js for this experiment, mostly because I can. I detest all languages equally, so be sure to tell me why your favorite is best :P
> We'll probably want lots of language examples in the future.
Looks good to me. Ideally, we want to support as many languages as
possible, but Node (well, JavaScript in general) is basically the language
that people use to interact with the Web, so it makes sense to start there.
> I took a Linux kernel CVE ID, CVE-2021-38208, and created some JSON-LD that links to the NVD JSON. I'm loading the NVD JSON via the jsonld library, which works (tm). I'm not sure if this is considered
> kosher in the JSON-LD world.
It works for now, but yes, we probably want to have a gateway service
which translates the CVE4/CVE5 stuff to something more palatable.
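To make the gateway idea a bit more concrete, here is a minimal sketch of the translation step, assuming the input is a CVE JSON 4.0-style record. It reads only CVE_data_meta.ID and the English entries of description.description_data and ignores everything else. The function name cve4ToJsonLd, the https://uvi.whatever/cve/ IRI prefix and the output shape are hypothetical, not anything we have agreed on.

```javascript
// Hypothetical context URL, borrowed from the @context example in this thread.
const UVI_CONTEXT = "https://uvi.whatever/ns/uvi";
// Hypothetical IRI prefix for minting node IDs from CVE IDs.
const CVE_IRI_PREFIX = "https://uvi.whatever/cve/";

// Translate a CVE JSON 4.0-style record into a small JSON-LD node.
// Only CVE_data_meta.ID and description.description_data are read;
// every other field in the record is ignored in this sketch.
function cve4ToJsonLd(record) {
  const id = record.CVE_data_meta.ID;
  const english = (record.description?.description_data ?? [])
    .filter((d) => d.lang === "eng")
    .map((d) => d.value);
  return {
    "@context": UVI_CONTEXT,
    "@id": CVE_IRI_PREFIX + id,
    "@type": "Vulnerability",
    // First English description, or null if the record has none.
    description: english.length > 0 ? english[0] : null,
  };
}

// Usage with a placeholder record (the real description text is elided).
const record = {
  data_type: "CVE",
  data_format: "MITRE",
  data_version: "4.0",
  CVE_data_meta: { ID: "CVE-2021-38208" },
  description: { description_data: [{ lang: "eng", value: "(description text)" }] },
};
console.log(cve4ToJsonLd(record)["@id"]); // prints https://uvi.whatever/cve/CVE-2021-38208
```

A CVE 5.0 translator would be a second function with the same output shape, so consumers would not need to care which upstream format a record came from.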
> I think in order to load data from places like GitHub we will need to build some tooling that knows how to look up certain things, like a kernel git repo for example. I'm thinking we should build some sort
> of vulnerability library to help with this.
Yes, and then we would implement a gateway for that as well, which handles
the translation, like with the CVEs.
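A rough sketch of how such a library could dispatch lookups: a registry of resolvers, each of which recognises one kind of identifier and maps it to a URL a gateway could fetch. The registry, the resolve() function and especially the "linux:<sha>" identifier form are hypothetical; the NVD detail-page and git.kernel.org cgit URL patterns are the public web pages, not data APIs.

```javascript
// Hypothetical resolver registry: each entry recognises one kind of
// identifier and maps it to a URL a gateway could fetch.
const resolvers = [
  {
    name: "cve",
    matches: (id) => /^CVE-\d{4}-\d{4,}$/.test(id),
    // NVD's human-readable detail page; a real gateway would likely use a data API.
    toUrl: (id) => `https://nvd.nist.gov/vuln/detail/${id}`,
  },
  {
    name: "linux-commit",
    // Assumption: kernel fix commits are written as "linux:<40-hex-digit SHA-1>".
    matches: (id) => /^linux:[0-9a-f]{40}$/.test(id),
    toUrl: (id) =>
      "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=" +
      id.slice("linux:".length),
  },
];

// Return the first resolver that recognises the identifier, plus its URL.
function resolve(id) {
  const entry = resolvers.find((candidate) => candidate.matches(id));
  if (!entry) {
    throw new Error(`no resolver registered for ${id}`);
  }
  return { resolver: entry.name, url: entry.toUrl(id) };
}

console.log(resolve("CVE-2021-38208").url);
// prints https://nvd.nist.gov/vuln/detail/CVE-2021-38208
```

Adding a new source (Alpine advisories, fuzzer findings) would then mean registering one more entry rather than touching the callers.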
> So anyway
>
> I think I've realized a few things today about linking all of this together, feel free to fork this into multiple email threads if that makes more sense.
>
> I don't want to duplicate any data that exists somewhere else in the graph. For example, a description already exists in the NVD data and another exists in the kernel git commit. Unless
> there's a new description for us to add, we can use one of those. I think it's very common today for vulnerability data sets to heavily duplicate each other (which makes sense given what we have).
> Duplication annoys me.
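One way to avoid the duplication is to link rather than copy: our node carries only references (by @id) to the nodes that already hold a description, and consumers follow those links. Below is a minimal sketch over a flat, already-loaded graph; the "seeAlso" property name, the EXAMPLE commit IRI and the placeholder description strings are all hypothetical.

```javascript
// Index a flat graph (an array of JSON-LD-style node objects) by "@id".
function indexById(graph) {
  return new Map(graph.map((node) => [node["@id"], node]));
}

// Follow a node's "seeAlso" references (a hypothetical property name) and
// collect descriptions from the referenced nodes instead of copying them in.
function descriptionsFor(graph, id) {
  const byId = indexById(graph);
  const node = byId.get(id);
  if (!node) return [];
  return (node.seeAlso ?? [])
    .map((ref) => byId.get(ref["@id"]))
    .filter((target) => target && typeof target.description === "string")
    .map((target) => ({ source: target["@id"], description: target.description }));
}

// Placeholder graph: the UVI node holds links only; the text lives upstream.
const graph = [
  {
    "@id": "https://uvi.whatever/cve/CVE-2021-38208",
    "@type": "Vulnerability",
    seeAlso: [
      { "@id": "https://nvd.nist.gov/vuln/detail/CVE-2021-38208" },
      { "@id": "https://uvi.whatever/linux/commit/EXAMPLE" },
    ],
  },
  { "@id": "https://nvd.nist.gov/vuln/detail/CVE-2021-38208", description: "(NVD description)" },
  { "@id": "https://uvi.whatever/linux/commit/EXAMPLE", description: "(commit message)" },
];

console.log(descriptionsFor(graph, "https://uvi.whatever/cve/CVE-2021-38208").length); // prints 2
```

We would only add a description of our own when neither upstream one is adequate.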
>
> Is there an easy way to pick a "category" or some other identifier type? In my example it's pointing directly at a CVE ID. That's going to be a very different set of data than findings from a fuzzer,
> or a collection of Alpine security advisories. Or is it?
We can use compound types for this, something like:
{
"type": ["Vulnerability", "Kernel"],
...
}
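One consequence of compound types is that consumers have to accept "type" as either a single string or an array. A minimal sketch of a membership check, assuming "type" is aliased to "@type" in the context (as in the @context further down) and that the document is in compacted form; it compares strings only and does not expand IRIs, so "Kernel" and "uvi:Kernel" would not match. The helper names are hypothetical.

```javascript
// With compound types, "type" may hold a single string or an array of
// strings, so normalise to an array before checking membership.
// Assumes "type" is aliased to "@type"; a bare "@type" is also accepted.
function typesOf(node) {
  const t = node.type ?? node["@type"];
  if (t == null) return [];
  return Array.isArray(t) ? t : [t];
}

function hasType(node, type) {
  return typesOf(node).includes(type);
}

const advisory = { type: ["Vulnerability", "Kernel"] };
console.log(hasType(advisory, "Kernel")); // prints true
console.log(hasType({ type: "Vulnerability" }, "Kernel")); // prints false
```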
Alternatively, we can have a separate subtype field if the compounding
approach seems unnatural:
{
"@context": [
"https://uvi.whatever/ns/uvi",
{
"type": "@type",
"subtype": { "@id": "uvi:subtype", "@type": "@vocab" },
"Kernel": "uvi:Kernel"
}
],
"type": "Vulnerability",
"subtype": "Kernel"
}
That would allow tooling to prefer the kernel vulnerability data over the
NVD data, or whatever.
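For example, tooling could rank records by subtype and keep the most preferred one. A minimal sketch; the preference list, and "NVD" as a subtype value in particular, are assumptions for illustration.

```javascript
// Hypothetical preference order for the "subtype" field: earlier wins.
// "NVD" as a subtype value is an assumption for illustration.
const SUBTYPE_PREFERENCE = ["Kernel", "NVD"];

// Unknown subtypes rank after every listed one.
function rank(entry) {
  const i = SUBTYPE_PREFERENCE.indexOf(entry.subtype);
  return i === -1 ? SUBTYPE_PREFERENCE.length : i;
}

// Pick the most preferred record; on a tie the earlier record is kept.
function preferred(records) {
  if (records.length === 0) return undefined;
  return records.reduce((best, r) => (rank(r) < rank(best) ? r : best));
}

const chosen = preferred([
  { type: "Vulnerability", subtype: "NVD", source: "nvd" },
  { type: "Vulnerability", subtype: "Kernel", source: "kernel" },
]);
console.log(chosen.source); // prints kernel
```

Keeping the preference list as data rather than code would let each consumer choose its own ordering.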
> Do we want to track absolutely everything with a new identifier? If we look at how OSV is handling this, data from other ecosystems is imported and then re-exported with the ecosystem identifier. Maybe this
> is a question we ignore for now.
>
> So tear this up; I know it sucks, but it's meant to help drive discussion.
It looks perfectly fine to me as a starting point.
Ariadne