data conversion - missing data?

Kurt Seifried

Sep 4, 2022, 12:20:11 PM
to GSD Discussion Group
So there's a bunch of data in GSD, and I'm working on converting it to OSV format. 

One common pattern, in CVE and NVD for example, is to set data to:

"" (empty)
" " (single space)
"n/a" (not applicable, although from a brief look it is applicable; it appears they put "n/a" instead of "" or " " because why not?)

I want to standardize the data and not have magic values in it, so we have a few options:

1) Just pass along whatever is literally there and hand the mess to the next person; easy but not useful.
2) Standardize on "" (so convert " " and "n/a" to ""), so it's clear that something should be there but isn't, which gives people a chance to enrich the data.
3) Use something like null, true, or false, but I feel these are confusing and don't really represent what should be here. E.g. NULL makes sense for a key that actually should be blank/not exist, where we want to show we are explicitly setting it to NULL. As for true and false: "it's true, there should be a value here?" (sort of joking, sort of serious?)
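
A minimal sketch of what option 2 could look like, assuming the placeholder set is just the three values listed above. The names `PLACEHOLDERS` and `normalize_empty` are made up for illustration and are not part of any GSD tooling; matching "n/a" case-insensitively is also an assumption.

```python
# Hypothetical helper: collapse placeholder values seen in CVE/NVD data to "".
# Values are compared after stripping whitespace and lowercasing, so " " and
# "N/A" are treated the same as "" and "n/a" (an assumption, not a GSD rule).
PLACEHOLDERS = {"", "n/a"}


def normalize_empty(value):
    """Return "" for placeholder strings; return every other value unchanged."""
    if isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
        return ""
    return value


samples = ["", " ", "n/a", "N/A", "openssl", None]
print([normalize_empty(v) for v in samples])
# prints ['', '', '', '', 'openssl', None]
```

Non-string values such as None pass through untouched, so this only standardizes the placeholders and does not decide what "empty" should mean.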

My thought for now: in the short term, just pass along whatever data exists, and longer term go with "" and see whether it causes problems.
Kurt Seifried (He/Him)
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance

Josh Buker

Sep 4, 2022, 2:26:16 PM
to Kurt Seifried, GSD Discussion Group
Imo, any data pulled in from another source should stay 1:1 with that source. We can monkey with it in the GSD namespace(s), but not in the original.

As for the data conversion to OSV/GSD format, my preference for empty values would be, in this order: NULL/nil, empty string/array, literally anything else.

Kurt Seifried

Sep 4, 2022, 5:17:07 PM
to Josh Buker, GSD Discussion Group
On Sun, Sep 4, 2022 at 12:26 PM Josh Buker <jbu...@cloudsecurityalliance.org> wrote:
> Imo, any data pulled in from another source should stay 1:1 with that source. We can monkey with it in the GSD namespace(s), but not in the original.

Correct, I'm not talking about the original. I'm talking about the conversion to OSV, so the {osv:{}} space in a GSD file.
 
> As for the data conversion to OSV/GSD format, my preference for empty values would be, in this order: NULL/nil, empty string/array, literally anything else.

Right, but what counts as an empty value? ""? " "? "n/a"? I would argue these are all empty, especially if you count them; there are tens of thousands.

I think we should go with "" for "empty but needs to be filled" and reserve NULL for when it really should be NULL, e.g. we looked at it and this key actually should be empty, so rather than omit it (and make people wonder if we missed it) we intentionally set it to NULL.
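
A rough sketch of how a consumer could tell those states apart, assuming the converted data is plain JSON. The function name, the state labels, and the keys in the sample record are invented for illustration, and whether the OSV schema accepts null for a given key would still need checking against the schema.

```python
import json


def field_state(record, key):
    """Classify one key of a parsed JSON object (labels are hypothetical)."""
    if key not in record:
        return "omitted"             # key absent: nothing is claimed either way
    value = record[key]
    if value is None:
        return "intentionally null"  # JSON null: reviewed, should be empty
    if value == "":
        return "needs a value"       # empty string: something belongs here
    return "present"


# Sample record with made-up keys and values.
record = json.loads('{"name": "", "vendor": null, "product": "example"}')
for key in ("name", "vendor", "product", "version"):
    print(key, "->", field_state(record, key))
```

The point of the sketch is only that "", null, and a missing key are three distinguishable states in JSON, so each can carry its own meaning if everyone agrees on it.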

Josh Bressers

Sep 4, 2022, 6:25:27 PM
to Kurt Seifried, Josh Buker, GSD Discussion Group
I'm not sure we should make that distinction using data types. If data is incomplete, that should be noted with tags, not with an empty string.

-- 
     Josh

Kurt Seifried

Sep 4, 2022, 10:39:57 PM
to Josh Bressers, Josh Buker, GSD Discussion Group
I agree, but with a product name in affected, for example, how do we do that without breaking the OSV schema? E.g. if we have:

"name": "value"

How do we modify the value? Make it a dictionary with metadata? Apply some metadata on top?
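
One possible shape for the "metadata on top" idea, sketched under assumptions: it leaves "name" as a plain string and records the tag in the `database_specific` object of the same affected entry, which the OSV schema, as far as I understand it, leaves free-form. The `incomplete_fields` key and the `tag_incomplete` helper are invented here and were not proposed in this thread.

```python
import json


def tag_incomplete(affected_entry, field_path):
    """Record field_path as incomplete without changing the field's type."""
    extra = affected_entry.setdefault("database_specific", {})
    tags = extra.setdefault("incomplete_fields", [])  # hypothetical key
    if field_path not in tags:
        tags.append(field_path)
    return affected_entry


# Sample affected entry; the package values are placeholders.
entry = {"package": {"ecosystem": "PyPI", "name": ""}}
tag_incomplete(entry, "package.name")
print(json.dumps(entry, indent=2))
```

The other option raised, turning the value itself into a dictionary, would change the type of "name" from a string to an object, which is the schema-breaking case the question is worried about.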
