CVE publishes a duplicate of a GSD - what should the GSD do?

Kurt Seifried

Feb 3, 2022, 11:25:01 PM
to GSD Discussion Group
So we have a situation where CVE published an ID after the GSD ID was published, creating a duplicate of ours. 

From the gsd-database repo (https://github.com/cloudsecurityalliance/gsd-database)

$ cd 2022/1000xxx/
$ git log GSD-2022-1000006.json

commit be5271f3766a85f01b4a51a88b818ed7fe6d64f2
Author: Josh Bressers <jo...@bress.net>
Date: Thu Jan 20 21:04:49 2022 -0600

Fix the namespaces

commit 967ea5c2a6ee70e24c6f17e8d7dad7fec11d9f86
Author: Kurt Seifried <ku...@seifried.org>
Date: Fri Jan 7 21:20:21 2022 -0800

Created GSD-2022-1000006.json

=========================

From the cvelist repo (https://github.com/cveproject/cvelist)

$ cd 2021/42xxx/
$ git log CVE-2021-42392.json

commit d03873140ec6356cd51b0109ee1beddb9ec83a08
Author: CVE Team <cve-r...@mitre.org>
Date: Wed Jan 19 23:01:12 2022 +0000

"-Synchronized-Data."

commit 6f006cbb55fd521630771af08c52ed4ecbced75e (origin/release/20220112-1700)
Author: CVE Team <cve-r...@mitre.org>
Date: Wed Jan 12 17:01:06 2022 +0000

"-Synchronized-Data."

commit 0729feeb33094d4613e11757ad04a29839417ba8 (origin/release/20220107-2300)
Author: CVE Team <cve-r...@mitre.org>
Date: Fri Jan 7 23:01:05 2022 +0000

"-Synchronized-Data."

commit 7248e9b9112fe8570909e57e6b80b78d32b3969b (origin/release/20211014-1900)
Author: CVE Team <cve-r...@mitre.org>
Date: Thu Oct 14 19:00:58 2021 +0000

"-Synchronized-Data."

=========================
Ok, their git commits are not so helpful so I manually checked:

d03873140ec6356cd51b0109ee1beddb9ec83a08
added a url to CVE-2021-42392.json

6f006cbb55fd521630771af08c52ed4ecbced75e
added a url to CVE-2021-42392.json

0729feeb33094d4613e11757ad04a29839417ba8
basic CVE data added to CVE-2021-42392.json

7248e9b9112fe8570909e57e6b80b78d32b3969b
set to RESERVED state, with no info in CVE-2021-42392.json, so this was part of an assignment block. CNAs often reserve big blocks in advance (while I was doing CVEs at Red Hat I often asked for blocks of 500 CVEs at a time).

=========================

So basically what happened is this: I saw a JFrog advisory and added it to GSD because I didn't see it in the CVE data, and then 2 hours later CVE published their data. That's not bad for them; usually they're a few days behind. Take the current Samba CVE-2021-44142: I added it to GSD 2 days ago and CVE still hasn't published any data (see https://www.cve.org/CVERecord?id=CVE-2021-44142), and it's been half a week now.

So what should we do? We have several options as I see it:

1) leave both entries, add a note to each in the GSD data section cross-linking them
2) reject the older GSD-2021-42392 and adopt a simple "first to publish" rule, so GSD-2022-1000006 is the primary
3) reject GSD-2022-1000006, since we can update our data more easily, and update GSD-2021-42392 with the missing data from GSD-2022-1000006
4) other options?

Also I'd like to point out that whatever we decide for this case doesn't have to be a hard rule, e.g. we can do this on a case by case basis, or adopt a rule for a while and see how it works out. So what do you, the community, think we should do?


Kurt Seifried (He/Him)
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance

Weston Steimel

Feb 4, 2022, 7:34:57 AM
to Kurt Seifried, GSD Discussion Group
Perhaps using the aliases field from OSV?  
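
A rough sketch of how the aliases idea could work, for illustration only. The OSV schema does define `aliases` as a list of IDs that refer to the same vulnerability; the two records and the helper names below are made up, and this is not how GSD stores data today.

```python
# Hypothetical OSV-style records; only "id" and "aliases" matter here.
gsd_first = {
    "id": "GSD-2022-1000006",
    "aliases": ["CVE-2021-42392", "GSD-2021-42392"],
}
cve_mirror = {"id": "GSD-2021-42392", "aliases": ["CVE-2021-42392"]}


def known_ids(record):
    """Return the record's own ID plus every alias it lists."""
    return {record["id"], *record.get("aliases", [])}


def same_vulnerability(a, b):
    """Treat two records as the same issue if their ID sets overlap."""
    return bool(known_ids(a) & known_ids(b))
```

With aliases in the data itself, tooling could spot that both entries describe one issue without relying on anything outside the JSON.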

Or what if one of the documents just became a symlink to the other?

Anyways, just a couple of initial thoughts.  Certainly keen to hear what others think.

Thanks,
--Weston Steimel

Emily Fox

Feb 4, 2022, 12:00:05 PM
to GSD Discussion Group, weston....@gmail.com, GSD Discussion Group, ksei...@cloudsecurityalliance.org

I like the idea of the symlink.  It allows folks that are already familiar with the GSD ID to have the CVE as well.

Kurt Seifried

Feb 4, 2022, 12:11:40 PM
to Emily Fox, GSD Discussion Group, weston....@gmail.com
One comment: symbolic links only work on filesystems. If someone imports this data into a NoSQL store they lose the clue that "FileA is FileB"; they only know they have two identical files. Do those need to be kept in sync? If something like a symbolic link were to be used, there would also need to be a data-level clue of the relationship (e.g. in the JSON file), such as the relationship data I've been toying with:

{
  "relationships": [
    {
      "type": "child_of",
      "target": "GSD-0123-01234567"
    },
    {
      "type": "duplicate_of",
      "target": "GSD-0123-01234555"
    }
  ]
}

Which leads to some questions. Do we need to list all relationships in both files, e.g. does a "child of" need a corresponding "parent of" in the other one? Discovery is certainly easier that way; otherwise you have to crawl the entire dataset to see if anything points at the GSD you're interested in. And do we assume all relationships are only to/from this GSD? I don't think we need to over-engineer it to allow GSD-A to list a relationship between GSD-B and GSD-C.
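
To make the discovery cost concrete, here is a minimal sketch of the "crawl the entire dataset" approach: one pass that builds the reverse links, so only one side has to store the relationship. It assumes the "relationships" layout above; the inverse names ("parent_of", "duplicated_by") and the source ID GSD-0123-01234999 are placeholders, not anything that has been decided.

```python
from collections import defaultdict

# Assumed inverse names; only "child_of" and "duplicate_of" appear above.
INVERSE = {"child_of": "parent_of", "duplicate_of": "duplicated_by"}


def build_reverse_index(records):
    """Map each target ID to the (inverse_type, source_id) pairs pointing at it.

    `records` maps a GSD ID to its parsed JSON.
    """
    index = defaultdict(list)
    for source_id, data in records.items():
        for rel in data.get("relationships", []):
            # Unknown types get a generic "target_of:<type>" label.
            inverse = INVERSE.get(rel["type"], "target_of:" + rel["type"])
            index[rel["target"]].append((inverse, source_id))
    return dict(index)


records = {
    "GSD-0123-01234999": {  # hypothetical entry holding the relationships
        "relationships": [
            {"type": "child_of", "target": "GSD-0123-01234567"},
            {"type": "duplicate_of", "target": "GSD-0123-01234555"},
        ]
    },
    "GSD-0123-01234567": {},
}
reverse = build_reverse_index(records)
```

The trade-off is that every consumer has to run (or trust) an index like this, whereas listing both directions in the files makes each file self-describing at the price of keeping two entries consistent.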

I'm more inclined to keep this purely at the data level (e.g. a JSON entry) rather than start mixing where we store data (symbolic links, etc.). We already rely upon git for change tracking/commit messages (effectively the changelog for each GSD), which at least for now people can parse with tools if they do an import. I suspect (but might be wrong) that having all the data in JSON is the best way to go longer-term, because it fixes the situation where some of the important data exists only in git (external to the JSON entries), and it makes moving the data between different systems easier.

Kurt Seifried (He/Him)
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance

Paul Scarrone

Feb 4, 2022, 12:49:40 PM
to Kurt Seifried, Emily Fox, GSD Discussion Group, weston....@gmail.com
This is merging into https://jsonapi.org/ territory and because of that I would say that the concern should be a separate file that can be combined later to produce a unified response or ingested into a DB. More of a concern of future tooling than of the stored data.

That said, I think we should keep metadata on "conflicts/collisions" somewhere in git.

An alternative thought: because git is the record of change, if we detect a conflict/collision we could make a unilateral decision to change the file's identifier retroactively. The cause of the conflict and the original file that was renamed could be inferred from history without much pain. That does require a little more automation, though, and some branching on conflict. It gets considerably more annoying if there is a large batch, which I could see happening if an outage in CVE ingestion (due to any number of service outages) delays data ingest by days.

Keeping a separate linking file does provide the normal way that graph datasets are managed for relationships, though. The question is: is it more important for the filesystem to be human readable, or for the relationship to be reasoned about? It seems to me in many cases we are leaving a lot of the metadata in git anyway, for now.
--
@PaulSCoder

Nate Burton

Feb 4, 2022, 12:49:46 PM
to Kurt Seifried, Emily Fox, GSD Discussion Group, weston....@gmail.com
I like the idea of the CVE-equivalent GSD identifier (i.e. CVE-2021-42392 -> GSD-2021-42392) being a drop-in alternative for the CVE. For example, if I wanted to build tooling that pulled in the maximal amount of information about a vulnerability knowing only the CVE identifiers, I could start with the GSD equivalent of the CVE (GSD-2021-42392) and it would contain the superset of the CVE plus additional crowdsourced information.

To me this would imply we should either:
  • add some json in GSD-2021-42392 to link the duplicate GSD entry GSD-2022-1000006, and vice versa
  • or come up with automation to detect dupes like GSD-2022-1000006 and migrate the data back into GSD-2021-42392 and then delete or mark GSD-2022-1000006 as duplicate
The former (which maps to your option 1. leave both entries, add a note to each in the GSD data section cross-linking them) seems easier in the short-term and a more consistent pattern as I sense duplicates will be inevitable based on timing.
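
For the automation route, one possible heuristic (purely a sketch, not something the project has agreed on) is to flag entries that share a reference URL, such as the same vendor advisory. The flat "references" layout and the example.com URLs below are assumptions for illustration, not the real GSD schema or real advisory links.

```python
from collections import defaultdict


def find_possible_duplicates(records):
    """Group record IDs that share at least one reference URL.

    `records` maps an ID to a dict with a "references" list of URLs
    (an assumed layout). Returns {url: [ids...]} for shared URLs only.
    """
    by_url = defaultdict(set)
    for record_id, data in records.items():
        for url in data.get("references", []):
            # Ignore a trailing slash so trivially different URLs still match.
            by_url[url.rstrip("/")].add(record_id)
    return {url: sorted(ids) for url, ids in by_url.items() if len(ids) > 1}


records = {
    "GSD-2022-1000006": {"references": ["https://example.com/advisory-1/"]},
    "GSD-2021-42392": {"references": ["https://example.com/advisory-1",
                                      "https://example.com/other"]},
    "GSD-2021-44142": {"references": ["https://example.com/samba"]},
}
candidates = find_possible_duplicates(records)
```

A match here is only a hint that a human (or a stricter check) should look at the pair; one advisory can legitimately cover several distinct issues.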

I don't think we should go with your option 2. reject the older GSD-2021-42392 and adopt a simple "first to publish" rule, so GSD-2022-1000006 is the primary -- as I think that results in less consistency about where to expect data.

Thanks,

Nate



Nate Burton

Princ Technical Security Engineer
Cloud, Container, Compute Security
Paranoids
He/him

M 4108199829

Josh Bressers

Feb 4, 2022, 6:32:54 PM
to Nate Burton, Kurt Seifried, Emily Fox, GSD Discussion Group, weston....@gmail.com
I think this is a great discussion.

Duplicates are going to happen, I really like this concept of a "symlink" from one ID to another.

I think the best state is to have ONE identifier as the place that stores the details, not having data spread over multiple identifiers.

GSD-2022-1000006 is a great example. While the GSD existed first, the CVE mirror has the NVD and CVE namespaces in it. As these are read only in the GSD ecosystem, I think it would make the most sense to move all relevant content to that ID. I would be comfortable with a loose rule that if a GSD has a CVE ID, we default to that ID because GSD is very nimble.

We could then leave the existing json to look like

{
  "GSD": {
    "id": "GSD-2022-1000006",
    "alias": "GSD-2021-42392"
  }
}

This should make it very obvious that this ID is a duplicate. I'm not super worried about the exact details (we can bikeshed over that later). I think the important decision for now is that we should treat a duplicate as a pointer to somewhere else, which I think we all agree on given the responses.
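
A minimal sketch of what "move all relevant content to that ID and leave a pointer" might look like in tooling. It assumes each file is a dict with a "GSD" object as in the JSON above; the function name, the "description" field, and the "namespaces" placeholder are hypothetical.

```python
import copy


def fold_into_canonical(duplicate, canonical):
    """Return (stub, merged) without modifying either input.

    Fields from the duplicate's "GSD" object are copied into the canonical
    record only where the canonical record has no value yet, so nothing
    that already exists there is overwritten.
    """
    merged = copy.deepcopy(canonical)
    for key, value in duplicate["GSD"].items():
        if key != "id":
            merged["GSD"].setdefault(key, copy.deepcopy(value))
    # The old ID keeps only a pointer to where the data now lives.
    stub = {"GSD": {"id": duplicate["GSD"]["id"],
                    "alias": canonical["GSD"]["id"]}}
    return stub, merged


duplicate = {"GSD": {"id": "GSD-2022-1000006", "description": "placeholder text"}}
canonical = {"GSD": {"id": "GSD-2021-42392"}, "namespaces": {}}  # placeholder content
stub, merged = fold_into_canonical(duplicate, canonical)
```

Returning new objects rather than editing in place keeps the operation easy to review as a single git commit touching both files.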

-- 
     Josh

Kurt Seifried

Feb 19, 2022, 11:47:09 PM
to Josh Bressers, Nate Burton, Emily Fox, GSD Discussion Group, weston....@gmail.com
Also, there's the case of "I got a GSD and then asked for a CVE, which they assigned", which is exactly what I did with


to see what would happen if I asked for a CVE. Much to my surprise they assigned one within 24 hours:


which has the GSD reference at the end, as you can see. I suspect a lot of people will ask for a GSD/CVE at the same time and, by virtue of our automation/tooling/process, we'll usually be faster than CVE.


Kurt Seifried

Feb 28, 2022, 12:23:32 PM
to Josh Bressers, Nate Burton, Emily Fox, GSD Discussion Group, weston....@gmail.com
On Fri, Feb 4, 2022 at 4:32 PM Josh Bressers <jo...@bress.net> wrote:
We could then leave the existing json to look like

{
  "GSD": {
    "id": "GSD-2022-1000006",
    "alias": "GSD-2021-42392"
  }
}

I think duplicates are common enough that they should be a first-class citizen, so along with id/alias I'd add "duplicate_of", which is a list of one or more other IDs and would supersede alias, e.g.:

{
  "GSD": {
    "id": "GSD-2022-1000006",
    "duplicate_of": ["GSD-2021-42392"]
  }
}
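
A sketch of how a consumer might follow these pointers, assuming "duplicate_of" takes precedence over "alias" as described. The function name is hypothetical, and picking the first listed ID when there are several is an assumption the thread hasn't settled.

```python
def resolve_canonical(gsd_id, records):
    """Follow duplicate_of (or, failing that, alias) pointers to the final ID.

    `records` maps a GSD ID to its parsed JSON. IDs that are not in
    `records` are returned as-is.
    """
    seen = set()
    current = gsd_id
    while current not in seen:
        seen.add(current)
        gsd = records.get(current, {}).get("GSD", {})
        # duplicate_of supersedes alias when both are present.
        targets = gsd.get("duplicate_of") or (
            [gsd["alias"]] if "alias" in gsd else []
        )
        if not targets:
            return current
        current = targets[0]  # assumption: the first listed ID is preferred
    raise ValueError("pointer loop involving " + current)


records = {
    "GSD-2022-1000006": {"GSD": {"id": "GSD-2022-1000006",
                                 "duplicate_of": ["GSD-2021-42392"]}},
    "GSD-2021-42392": {"GSD": {"id": "GSD-2021-42392"}},
}
canonical_id = resolve_canonical("GSD-2022-1000006", records)
```

The loop check matters once pointers can chain: two entries that each mark the other as a duplicate would otherwise never resolve.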

-Kurt