Beyond SPDX


Nick Vidal

Sep 5, 2023, 2:42:07 PM
to clearly...@googlegroups.com
Hi everyone,

In today's meeting, there was a request from GitHub for ClearlyDefined to have more licensing coverage, beyond SPDX. In other words, there's a need to cover not just open source licenses, but proprietary ones as well. GitHub is in the process of setting up a local harvest, and they plan on sharing this licensing metadata with others.

This is a common request that I have already heard from other community members. I would like to hear feedback from everyone about this. If this is something that you would also be interested in, please let us know.

Thanks,
Nick

Josh Berkus

Sep 5, 2023, 2:46:19 PM
to Nick Vidal, clearly...@googlegroups.com
On 9/5/23 11:41, Nick Vidal wrote:
> This is a common request that I have already heard from other community
> members. I would like to hear feedback from everyone about this. If this
> is something that you would also be interested in, please let us know.

I can see the value, but it sounds like an impossible task. How many
thousands of proprietary licenses are there across the industry?

--
-- Josh Berkus
Kubernetes Community Architect
OSPO, OCTO

Jeff McAffer

Sep 5, 2023, 3:33:02 PM
to Josh Berkus, Nick Vidal, clearly...@googlegroups.com
The requirement is real but, as Josh says, the task is challenging. IIRC there are two main challenges: automation and namespacing. Right now most of the scanners have some set of regular expressions, templates, or some such that they use to identify given text as a particular license. For arbitrary text, that's hard to do. For some of the scenarios you can rely on just having a discoverable identifier (no need for matching); for others you cannot, and automation is essential.

Namespace management is just hard. Some have proposed using internet domains (à la Java package naming) but many licenses are, or evolve to be, not "owned" by a particular organization.

In the past a few folks have discussed using the SPDX "LicenseRef-" syntax and some auto-generated hashing of unrecognized license text. That combined with an alias registry gets you the ability to have automated detection and a human-readable, manageable namespace.

The idea is that unrecognized license text is hashed and then referenced as "LicenseRef-XYZABC123" (or some such). Off the bat, all such licensed packages are correlated and so can be "cleared" by legal teams together and collaboratively. Over time curators may come to see that hash as the "FooBar" license and then register an alias for the hash. "LicenseRef-XYZABC123" and "LicenseRef-FooBar" are then interchangeable (with the latter being more user-friendly). Variants of FooBar with different hashes can also be aliased to FooBar. It is even possible that FooBar ends up being a recognized SPDX license. If it retains that name, all's good; if it gets a new name, register an alias for that too.

On the practical user side, sticking with SPDX valid syntax allows for the continued use of SPDX tooling and integrations.

Jeff


Eric Schultz

Sep 5, 2023, 5:52:03 PM
to clearly...@googlegroups.com
Maybe other folks are interested in this but as a community member I'm personally a bit turned off by making it easier to use and create proprietary software. I use ClearlyDefined and participate to make open source software easier to make and maintain. From my point of view, if anything, I DON'T want to make it easier to make proprietary software.

Others' mileage may vary, of course, but I don't think I'm alone among current or potential community members.

Eric

Philippe Ombredanne

Sep 29, 2023, 11:39:50 AM
to Nick Vidal, clearly...@googlegroups.com
Hi Nick:
FWIW, ScanCode, which is used in ClearlyDefined, has the largest open
database of licenses that I know of this side of the galaxy quadrant ;)
https://scancode-licensedb.aboutcode.org/ tracks both open source and
proprietary licenses, about 2,000 of them. ScanCode can
detect, exactly and approximately, a large range of known and unknown
licenses thanks to its small language model based on ~30,000 license
samples.
So this is already in ClearlyDefined and all these licenses come with
a stable SPDX LicenseRef. It would be stupid not to leverage this or
to create something else for this.

--
Cordially
Philippe Ombredanne

Nick Vidal

Sep 29, 2023, 1:46:56 PM
to Philippe Ombredanne, clearly...@googlegroups.com
Hi Philippe,

That's a great resource, thanks for sharing. I'll be forwarding this info to GitHub.

Kind regards,
Nick

E. Lynette Rayle

Feb 5, 2024, 3:50:17 PM
to clearlydefined
I copied this discussion to clearlydefined/service/Issue#1008, which proposes use of LicenseRef to support licenses beyond SPDX. If you have additional comments, it would be great to get them in that issue. This will give transparency to our decision-making process.

Thanks Nick and everyone who commented.  It's great to see this discussion happening.

Regards,
E. Lynette Rayle