Adding two licenses to a dataset


linda.re...@dans.knaw.nl

Apr 7, 2025, 7:29:45 AM
to Dataverse Users Community
Following up on several discussions about licenses already taking place in the Dataverse Community (see for example this issue: https://github.com/IQSS/dataverse/issues/4391), we (DANS) have also started to look into this issue.    
At the moment licenses are set at dataset or version level, and only one license is possible. This means that in a dataset which contains a mixed bag of open and restricted files, all open files are bound by the restricted license. 
We propose a dataset could have two licenses, one applying to open and one to restricted access files. These two licenses are specified at dataset level. Which license applies to a file depends on whether the file has restricted access or not. This makes it easy for users to specify the appropriate license via both UI and API, without having to manually select a license for each file. Only minimal alterations are needed to achieve this change.
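
As a rough illustration of the rule described above, the following sketch resolves a file's license from its restriction status. The names `DatasetLicenses` and `license_for_file`, and the example license strings, are hypothetical and are not part of Dataverse; this is only one way the idea could be modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLicenses:
    """Hypothetical pair of dataset-level licenses (not an existing Dataverse type)."""
    open_license: str
    restricted_license: str


def license_for_file(licenses: DatasetLicenses, restricted: bool) -> str:
    """Pick the license that applies to a file based only on its restriction flag."""
    return licenses.restricted_license if restricted else licenses.open_license


# Example dataset: one open documentation file, one restricted data file.
licenses = DatasetLicenses(
    open_license="CC-BY-4.0",
    restricted_license="Custom restricted-use terms",
)
files = {"codebook.pdf": False, "survey.tab": True}  # file name -> restricted?
resolved = {name: license_for_file(licenses, r) for name, r in files.items()}
```

With this model, changing a file's restriction status changes its license without any per-file selection by the depositor.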
 
We're currently looking for insights and feedback from the community. What do you think about this solution?

 - Linda Reijnhoudt (DANS-KNAW)

o.be...@fz-juelich.de

Apr 7, 2025, 10:02:51 AM
to Dataverse Users Community
Hi Linda!

I have thought about this topic several times over the past year. Some of our scientists have related questions, and conversations at conferences showed me that combining different licenses in one publication is quite a common use case. This is especially true for software publications, for which support has recently been added. In software publications it is completely normal to have different parts under different licenses, either because code was inherited from older code, forked, or remixed, or because areas such as test data and documentation are licensed differently.

Now I'm wondering if your ideas and others that asked for support of file level licenses might align with my idea of introducing support for the REUSE framework (https://reuse.software) into Dataverse.
It's a well-regarded standard that is also used in industry. It comes with a community and an ecosystem of tools that make applying licenses easier and allow compliance to be checked automatically.

Maybe a Dataset in Dataverse could have a REUSE "license" and contain (as defined in the REUSE spec) the necessary license files and annotations within the contained files. That would still leave the option to index these licenses for all the files in a later step of an implementation, also making the licenses searchable. The support can be introduced gradually, maybe starting with a "license" file, continuing with a compliance check, and so on.
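
To make the idea of per-file annotations a bit more concrete: REUSE relies on `SPDX-License-Identifier` tags in file headers, with the license texts kept in a `LICENSES/` directory. The sketch below only extracts those tags from a file's text. It is not the REUSE tool itself and not an existing Dataverse feature, and the function name `spdx_expressions` is made up for illustration.

```python
import re

# Matches the tag that REUSE-style headers use to declare a license.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expressions(text: str) -> list[str]:
    """Return the SPDX license expressions declared in a file's text."""
    results = []
    for match in SPDX_TAG.finditer(text):
        expr = match.group(1).strip()
        # Drop a trailing block-comment closer such as "*/" or "-->".
        expr = re.sub(r"\s*(\*/|-->)\s*$", "", expr)
        results.append(expr)
    return results


sample = """# SPDX-FileCopyrightText: 2025 Example Author
# SPDX-License-Identifier: Apache-2.0
print("hello")
"""
found = spdx_expressions(sample)
```

Something along these lines could feed an index of licenses per file at a later stage; the official REUSE tooling would be the better choice for real compliance checks.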

Note: This still leaves the edge case of dual or multi-licensing not being expressible in Dataverse, but I'm not sure whether this is a hard requirement for anything. (And it didn't sound like you asked for that.)

Cheers,
Oliver

Philip Durbin

Apr 7, 2025, 10:41:11 AM
to dataverse...@googlegroups.com
Speaking of what's expressible, does anyone know how DataCite handles multiple licenses within the same dataset?

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dataverse-community/6be6078b-fa5c-4ecc-ada3-863d405fc01en%40googlegroups.com.


--

Richard Dennis

Apr 7, 2025, 10:59:38 AM
to Dataverse Users Community

Dear Linda, Oliver, and Dataverse Community,

Thank you for raising this important topic and for the thoughtful proposals on improving licensing flexibility within Dataverse.

The dual licensing model Linda proposed and the REUSE framework integration suggested by Oliver offer promising avenues to enhance license granularity, improve usability, and align with open science practices. I agree that these proposals address real challenges, particularly for datasets that contain a mix of open and restricted content or composite materials such as software, documentation, and test data.

That said, I believe it is essential that we further ground these discussions in the actual needs of the research community. Have researchers actively requested file-level or differentiated licensing in your respective institutions or projects? If so, in what disciplines or use cases?

From my experience, there is often a gap between what developers implement and what researchers find intuitive or necessary in their day-to-day workflows. It would be helpful to gather more concrete feedback or user stories to understand better how widespread this need is and to determine whether a significant portion of the Dataverse user base would adopt and benefit from such functionality.

While Dataverse can indeed be used as a software repository in certain contexts, it is fundamentally a research data repository. As such, we should carefully consider whether development efforts should prioritize features outside its core mission. With limited resources, there is always the risk of diverting attention from foundational enhancements that would benefit the broader user community.

In summary, I fully support further exploration of both proposals, especially if accompanied by clear user-driven requirements. I would recommend:

  • Conducting a targeted needs assessment across institutions and disciplines,

  • Identifying specific use cases where these licensing models add value,

  • Evaluating implementation costs relative to broader strategic priorities for Dataverse.

Thank you again for initiating this valuable discussion. 

Best regards,

Richard Dennis

University of Copenhagen

Sherry Lake

Apr 7, 2025, 11:23:52 AM
to dataverse...@googlegroups.com
DataCite "rights" has cardinality 0-n (zero or more): https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf

I'm not sure whether, or how, a specific license can be linked to specific files.

--
Sherry


Dorothea Iglezakis

Apr 7, 2025, 12:26:59 PM
to Dataverse Users Community
Dear Linda,

In our instance, we also have several datasets with more than one license, most of them because they consist partly of software code and partly of data. At the moment, we handle this by defining a custom license that states which parts of the data are licensed under which license; see, for example, darus-1851 and darus-3303. This is not optimal, because the licenses are not machine-readable (at least not without effort). We would therefore be very interested in a way to assign two or more licenses to a dataset and to state clearly which license belongs to which part of the dataset, without having to assign a license to each file.

But your solution (binding the license to the restriction status of a file) would not help us, because our mixed datasets normally do not have any file restriction.

Kind regards,

Dorothea Iglezakis (DaRUS)

linda.re...@dans.knaw.nl wrote on Monday, April 7, 2025 at 13:29:45 UTC+2:

Dorothea Iglezakis

Apr 8, 2025, 8:11:37 AM
to Dataverse Users Community
If file DOIs are activated, each file could have its own rightsList. The dataset could then carry a list of all rights that occur in the dataset.
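
A minimal sketch of that aggregation, assuming each file already carries one rights entry: the dataset-level list is simply the de-duplicated set of the file-level entries. The key names (`rights`, `rightsUri`, `rightsIdentifier`, `rightsIdentifierScheme`) follow the DataCite JSON representation as I understand it and should be checked against the schema. The function `dataset_rights_list` and the file names are hypothetical.

```python
def dataset_rights_list(file_rights: dict[str, dict]) -> list[dict]:
    """Collect each distinct rights entry once, in first-seen order."""
    seen = set()
    rights_list = []
    for rights in file_rights.values():
        key = rights["rightsIdentifier"]
        if key not in seen:
            seen.add(key)
            rights_list.append(rights)
    return rights_list


CC_BY = {
    "rights": "Creative Commons Attribution 4.0 International",
    "rightsUri": "https://creativecommons.org/licenses/by/4.0/legalcode",
    "rightsIdentifier": "CC-BY-4.0",
    "rightsIdentifierScheme": "SPDX",
}
MIT = {
    "rights": "MIT License",
    "rightsUri": "https://opensource.org/licenses/MIT",
    "rightsIdentifier": "MIT",
    "rightsIdentifierScheme": "SPDX",
}

# Hypothetical per-file rights, as they might be sent for file DOIs.
per_file = {"data.csv": CC_BY, "README.md": CC_BY, "analysis.py": MIT}
dataset_level = dataset_rights_list(per_file)
identifiers = [r["rightsIdentifier"] for r in dataset_level]
```

The dataset record would then list both licenses, while the file-level records say which license applies to which file.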

Vaidas Morkevičius

Apr 9, 2025, 5:01:27 PM
to dataverse...@googlegroups.com
Dear Linda,

Just like DaRUS, we use custom licenses with descriptions of how files and metadata are licensed. This is suboptimal, and we would certainly be interested in alternatives (multiple licenses, specifying a license for each file in the dataset, or similar).

Best wishes,
--
Vaidas Morkevičius


linda.re...@dans.knaw.nl

Jun 2, 2025, 7:48:02 AM
to Dataverse Users Community

Thank you for the feedback on our proposal. We understand that, for many use cases within the Community, our initial solution may not be suitable. We're therefore working on a new proposal that will offer greater flexibility while staying aligned with the current workflow.

Our revised approach proposes setting one default license at the dataset level, which would automatically apply to all newly uploaded files. Additionally, users would have the option to override this default and assign a different license at the individual file level, similar to how file restrictions or embargoes are currently handled. This approach does not limit the number of different licenses used within a dataset and does not place restrictions on the accessibility. 
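
To illustrate how this could behave, here is a small sketch in which a file's effective license is its own override if one is set, and the dataset default otherwise. `FileEntry`, `license_override` and `effective_license` are hypothetical names chosen for this example; they do not describe the actual Dataverse data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    """Hypothetical file record with an optional file-level license."""
    name: str
    license_override: Optional[str] = None


def effective_license(file: FileEntry, dataset_default: str) -> str:
    """Use the file-level license when present, else fall back to the dataset default."""
    if file.license_override is not None:
        return file.license_override
    return dataset_default


dataset_default = "CC-BY-4.0"
files = [
    FileEntry("README.md"),                           # inherits the default
    FileEntry("analysis.py", license_override="MIT"), # overridden per file
]
effective = {f.name: effective_license(f, dataset_default) for f in files}
```

Because the override is optional, datasets that only need one license would keep working exactly as they do today.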

We have also explored other options, including the REUSE protocol suggested by Oliver. However, we believe this updated solution better aligns with Dataverse's overall design, and provides a more immediate and user-friendly way to manage licenses through the user interface.

Does this revised approach address your use cases?

Linda Reijnhoudt (DANS-KNAW)

Oliver Bertuch

Jun 2, 2025, 7:52:20 AM
to 'linda.re...@dans.knaw.nl' via Dataverse Users Community

> Our revised approach proposes setting one default license at the dataset level, which would automatically apply to all newly uploaded files. Additionally, users would have the option to override this default and assign a different license at the individual file level, similar to how file restrictions or embargoes are currently handled. This approach does not limit the number of different licenses used within a dataset and does not place restrictions on the accessibility.

Sounds reasonable to me!

It also seems extensible at a later point to automatically ingest REUSE definitions from an upload, using the existing file ingest framework (currently used to analyze statistical files, etc.).

Cheers,
Oliver

-- 
Oliver Bertuch
Forschungszentrum Jülich GmbH
Zentralbibliothek / Central Library
Forschungsdatenmanagement / Research Data Management
Entwicklung von Forschungssoftware / Research Software Engineering

52425 Jülich
+49 2461 61-85370
https://www.fz-juelich.de/zb

Philip Durbin

Jun 2, 2025, 9:44:13 AM
to dataverse...@googlegroups.com
Sounds reasonable to me as well. People have asked for per-file licensing for a while so I think it would be a popular feature!


Philipp Conzett

Jun 2, 2025, 3:50:21 PM
to Dataverse Users Community
Hi Linda,

Thank you for working on this! Could you please elaborate on what you mean by "default license at the dataset level"? Is the idea to have a license at dataset level (as is the case today) and at file level?

Best,
Philipp

linda.re...@dans.knaw.nl

Jul 29, 2025, 10:20:23 AM
to Dataverse Users Community

Dear Dataverse Community,

At DANS-KNAW, we are working on adding support for file-level licensing, in addition to dataset-level licensing. To find a solution that works more broadly, we are looking for diverse use cases and current workarounds within the Dataverse platform.

If you are interested in this topic, we would love to hear from you, including but not limited to the community members who already responded to our previous question. Please reach out to us at info at dans.knaw.nl (with "file licensing" in the subject line) so we can have a conversation to better understand your needs. We would like to discuss what functionality you would like to see, how you are currently handling this, and what your users need.

Feel free to share this invitation with others who might have relevant input!

Regards,
Linda Reijnhoudt, DANS-KNAW

Christian Bischof

Aug 27, 2025, 5:38:33 AM
to Dataverse Users Community

Hi,

In general, most of our documentation files are CC BY licensed, and the restricted data files are licensed for scientific use. At the moment, we have quite long custom dataset terms, e.g. https://data.aussda.at/dataset.xhtml?persistentId=doi:10.11587/LBAHIZ However, we plan to use standard license terms (https://guides.dataverse.org/en/6.2/api/native-api.html#license-management-api), so we will only have a link to the license texts, e.g. https://aussda.at/en/aussda-scientific-use-licence-for-data-and-cc-by-for-documentation/.

Having a predefined license (CC BY) for documentation files and additional licenses for data files sounds great.

Best Christian

linda.re...@dans.knaw.nl

Sep 1, 2025, 3:25:50 AM
to Dataverse Users Community
Dear Dataverse Users Community, 

You're invited to join an online discussion on the topic of file level licenses, taking place on Thursday, September 4th from 3:00 PM to 4:00 PM CEST.
If you're interested, we'd be happy to have you join the conversation.
More details can be found in this Google Doc

Kind regards,
Linda Reijnhoudt (DANS-KNAW)

Philip Durbin

Sep 4, 2025, 7:48:34 AM
to dataverse...@googlegroups.com
Thanks for the invitation! I'm planning on joining. Sounds like it's in a little over an hour. See you there!

We also have a thread going in Zulip: #community > Support for Licenses on the File Level
