OSV.dev's data quality initiatives in 2024


Andrew Pollock

May 8, 2024, 9:37:55 PM
to osv-discuss
Hello,

One of my areas of focus this year is OSV.dev's data quality story, broadly split into data ingestion and downstream user benefits. The desired end state of this work is described below:

Data ingestion
  • There is a human-readable, authoritative definition of the minimum quality bar that an OSV record must meet to be acceptable for import by OSV.dev
    • Where possible, integrated into the JSON schema definition and the schema documentation
  • Tooling exists for OSV record creators to validate that they meet the minimum quality bar at record creation time
    • Where possible, relying on JSON schema validation
  • The minimum quality bar is sustainably enforced by OSV.dev
  • Records below the quality bar are not imported into or exported by OSV.dev
  • OSV record providers have a machine-readable way to reason about their existing published records that do not meet this quality bar

Downstream user benefits
  • OSV.dev downstream users have a way to reason about records that are absent from OSV.dev because of failure to meet the minimum quality bar
  • OSV.dev downstream users have a clearly defined user journey to make corrections to OSV records served by OSV.dev with minimal overhead for all parties

If any of this is of interest, or you have any questions or concerns, feel free to start a conversation (preferably on-list).

regards

Andrew

--


Andrew Pollock

Software Engineer, Google Open Source Security Team | apol...@google.com

Google LLC