accessing set of packages where declared != concluded


Luis Villa

Oct 23, 2025, 1:04:52 PM
to clearly...@googlegroups.com
Is there a good way (via API or otherwise) to get a list of packages where concluded is not null, and where declared != concluded? The purpose is to create a testing data set to help understand when/how those differ - doesn't need to be super-comprehensive, a random sampling of 50-100 would be completely fine.

--

Luis Villa | Sonar

VP, Legal: Product and Policy


sonar.com | lu.is | Pronouns: he/him


Nick Vidal

Oct 23, 2025, 3:29:36 PM
to Luis Villa, Philippe Ombredanne, clearly...@googlegroups.com
Hi Luis,

I'm putting you in touch with Philippe, who might be able to help you. We are working to make the ClearlyDefined data more accessible and Philippe is leading that effort. 

Kind regards, 
Nick 



Philippe Ombredanne

Nov 4, 2025, 3:13:43 PM
to Nick Vidal, Luis Villa, clearly...@googlegroups.com
Hi Luis!

> On Thu, Oct 23, 2025, 2:04 PM 'Luis Villa' via clearlydefined <clearly...@googlegroups.com> wrote:
>>
>> Is there a good way (via API or otherwise) to get a list of packages where concluded is not null, and where declared != concluded? The purpose is to create a testing data set to help understand when/how those differ - doesn't need to be super-comprehensive, a random sampling of 50-100 would be completely fine.

Your best bet would be to look at the curations in
https://github.com/clearlydefined/curated-data/tree/master/curations
In the common case, the curation is the concluded license, and the
declared license is the one detected by ScanCode.
Note that the difference could be due to:
- a bug in ScanCode (which may already have been corrected)
- missing license info
- lack of clarity in the declared license, even if it was detected correctly
- ambiguity with a complex license
- all of the above!
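
A minimal sketch of how such a sample could be drawn, assuming a local
clone of the curated-data repository and assuming curation files use the
.yaml extension. The function names and the record shape (dicts with
"declared" and "concluded" keys) are hypothetical and only illustrate the
"concluded is not null and declared != concluded" filter; parsing the
curation files and fetching the detected license are left out.

```python
import random
from pathlib import Path


def sample_curation_files(curations_dir, k=50, seed=None):
    """Return up to k randomly chosen curation files under curations_dir."""
    # Assumption: curation files use the .yaml extension.
    files = sorted(Path(curations_dir).rglob("*.yaml"))
    rng = random.Random(seed)
    return rng.sample(files, min(k, len(files)))


def differing(records):
    """Keep records whose concluded license is set and differs from declared.

    Each record is a hypothetical dict with "declared" and "concluded" keys.
    """
    def norm(expr):
        # Naive normalization (case and whitespace only); not SPDX-aware.
        return " ".join(expr.split()).lower() if expr else None

    return [
        r for r in records
        if r.get("concluded") is not None
        and norm(r.get("declared")) != norm(r["concluded"])
    ]
```

The comparison is deliberately simple: equivalent license expressions
written differently (for example with reordered operands) would still be
reported as different, so a license-expression-aware comparison may be
preferable for a real data set.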

If you can share the results of your experiment, that would be awesome.
--
Cordially
Philippe Ombredanne
AboutCode.org
Package URL (PURL), ScanCode, DejaCode, PurlDB and VulnerableCode
Book a call at https://cal.com/pombreda