Enforced Source Code Location in Package Repos?


Kit Plummer

May 6, 2021, 7:46:46 AM
to David Wheeler, mil...@googlegroups.com
One thing I’m struggling with in my OSS research is why package repos don’t enforce and validate source code URLs and license checks.

At a minimum, make the Source URL an explicit field in the metadata.  Most package managers allow devs to arbitrarily submit project and source URLs, and don’t offer any adjudication.  This wouldn’t be that big a deal if devs did their due diligence and validated things themselves.

The biggest problem is that without explicit fields in metadata there is no “good” way to automate the validation.

In some basic analysis of NPM, well over 1/3 of packages either don’t point to their source or have an invalid URL.
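
As a rough illustration of the kind of automated check meant here, below is a minimal sketch, assuming package.json-style metadata where `repository` is either a string or an object with a `url` key. The function names and the "missing" / "invalid" / "ok" labels are hypothetical, and the check is purely syntactic: it does not confirm that the URL resolves to a real repo.

```python
from collections import Counter
from urllib.parse import urlparse


def extract_source_url(metadata):
    """Return the declared repository URL from package.json-style metadata, or None."""
    repo = metadata.get("repository")
    if isinstance(repo, dict):
        repo = repo.get("url")
    if isinstance(repo, str) and repo.strip():
        return repo.strip()
    return None


def source_url_status(metadata):
    """Classify a package as 'missing', 'invalid', or 'ok' (syntax check only)."""
    url = extract_source_url(metadata)
    if url is None:
        return "missing"
    # Repository URLs are often written with a "git+" prefix; strip it before parsing.
    if url.startswith("git+"):
        url = url[4:]
    parsed = urlparse(url)
    # Shorthand forms such as "user/repo" are not handled and count as invalid here.
    if parsed.scheme in ("http", "https", "git", "ssh") and parsed.netloc:
        return "ok"
    return "invalid"


def summarize(packages):
    """Count how many packages fall into each status."""
    return Counter(source_url_status(p) for p in packages)
```

Running `summarize` over a dump of registry metadata would give the missing/invalid/ok split; a stricter version would also try to reach each URL.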

Curious what you guys think about this.  Thoughts?

Kit

John Scott

May 6, 2021, 7:52:06 AM
to Kit Plummer, David Wheeler, mil...@googlegroups.com
I’m sorta shocked that no one has tried to force a cleanup of the package managers, just to enforce simple rules like:
- don’t let folks download software with high or critical CVEs (or at least put an extra step, a speed bump, in front of the download)
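
A minimal sketch of such a speed bump, assuming the package manager already has CVSS v3 base scores for the known CVEs of a package (7.0 and above is rated High or Critical on that scale). The function name, the `confirmed` flag, and the return labels are hypothetical.

```python
def download_decision(cvss_scores, confirmed=False):
    """Decide whether a download proceeds, given CVSS v3 base scores of known CVEs.

    Scores of 7.0 or higher are High or Critical. Those downloads are not
    blocked outright; they need an explicit confirmation first.
    """
    worst = max(cvss_scores, default=0.0)
    if worst < 7.0:
        return "allow"
    return "allow" if confirmed else "needs-confirmation"
```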

-------------------------------------------
John Scott
--
--
You received this message because you are subscribed to the "Military Open Source Software" Google Group.
To post to this group, send email to mil...@googlegroups.com
To unsubscribe from this group, send email to mil-oss+u...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/mil-oss?hl=en
 
www.mil-oss.org

John Janek

May 6, 2021, 11:38:48 AM
to mil...@googlegroups.com
I'd say I'm surprised, but I'm not really. I'd even crosshatch it with last update. How many of those packages have been abandoned?

I wonder if you could set up a public blacklist - like the spam lists - in JSON. "Here are the repos which don't point back to source, which haven't been touched in X days, which have 1 contributor", whatever the high-risk indicators are. Once the data is collected, you just need to ask people to consume it, which is easier. No one wants to do the heavy lift, but I bet if you put out a list of risky packages, ears would perk up.
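
A minimal sketch of what such a JSON list could look like, assuming each package record already carries a `source_url`, a `days_since_update`, and a `contributors` count. The field names, flag names, and the 365-day default are hypothetical placeholders for whatever indicators the methodology settles on.

```python
import json


def risk_flags(pkg, max_idle_days=365):
    """Return the list of risk indicators that apply to one package record."""
    flags = []
    if not pkg.get("source_url"):
        flags.append("no-source-url")
    idle = pkg.get("days_since_update")
    if idle is not None and idle > max_idle_days:
        flags.append("stale")
    contributors = pkg.get("contributors")
    if contributors is not None and contributors <= 1:
        flags.append("single-contributor")
    return flags


def build_list(packages, max_idle_days=365):
    """Serialize every flagged package, with its reasons, as a JSON document."""
    entries = []
    for pkg in packages:
        flags = risk_flags(pkg, max_idle_days)
        if flags:
            entries.append({"name": pkg["name"], "flags": flags})
    return json.dumps({"max_idle_days": max_idle_days, "packages": entries}, indent=2)
```

Publishing the threshold alongside the entries keeps the determination reproducible by anyone consuming the list.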


Kit Plummer

May 6, 2021, 11:45:26 AM
to mil...@googlegroups.com
That is an interesting thought.  Thinking about it: a publicly available, “bus-factor” blacklist.  I’d need to make all the analysis data publicly available, which is a bit of a challenge, because there is a lot of it.

Kit

John Janek

May 6, 2021, 12:08:25 PM
to mil...@googlegroups.com
I don't think so, I mean you could if you wanted to, I suppose. More important would be just to publish the methodology openly (maybe even as a spec itself, that'd be meta), and then the final determination.

But there's nothing that says you couldn't do method, analysis, list, too. But I'd break it up into different datasets for consumption.

Jim Kinney

May 6, 2021, 1:44:58 PM
to mil...@googlegroups.com, John Janek
If the repo can't provide the source, block the package or block the repo. At some point a code review will be required.

My $0.02 - a tool that by default downloads itself automatically from off-site repos on startup is dangerous. NPM and docker are on the top of my list of locations to block at the outbound firewall for production systems (and devops hate me).
--
Computers amplify human error
Super computers are really cool

Kit Plummer

May 6, 2021, 2:17:41 PM
to mil...@googlegroups.com
I think the challenge here is that the package managers would be expected to adjudicate each package during its deployment/upload to the repo.  I think there is value in that, though I have not had much luck talking to central repo operators.

I think the two pieces of metadata here are license and source URL.  If you claim OSS, then you MUST have a source URL that validates against a real repo.  From that there is the challenge of what the validation is - a commit hash, or something else.  For dynamic languages, where the package itself contains the source, you could possibly compare code.  But for binary packages I'm not sure what that validation is.
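
For the source-in-package case, a minimal sketch of the "compare code" idea, assuming you have already unpacked the package and checked out the repo at the claimed commit, and loaded both as dicts mapping relative path to file bytes. The function names and result keys are hypothetical; real packages often ship generated or minified files that will never match the repo byte-for-byte, so this only shows the basic comparison.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def compare_trees(package_files, repo_files):
    """Compare packaged files against the repo checkout (both: path -> bytes).

    Returns the packaged paths that do not exist in the repo, and the paths
    that exist in both but differ in content.
    """
    missing = sorted(p for p in package_files if p not in repo_files)
    modified = sorted(
        p for p in package_files
        if p in repo_files and digest(package_files[p]) != digest(repo_files[p])
    )
    return {"missing_in_repo": missing, "modified": modified}
```

An empty result for both keys would mean every packaged file matches the checkout; anything else needs a human look.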
