Multiple License Support Proposal Request for Community Feedback


James Myers

Mar 24, 2021, 1:11:43 PM
to Dataverse Users Community

All,

As I mentioned in the community calls, several of us have been working to refine a proposal from DANS that would allow admins to configure the Dataverse software to support additional/alternate licenses for Datasets. In doing this, we’ve reviewed related issues and email discussions, and considered backward compatibility issues, user interface impacts, and the potential for future enhancements. The resulting proposal is below.

 

In brief:

 

The proposed work will make it possible for site administrators to configure their Dataverse installation with additional/alternate pre-defined license choices. Admins will also be able to decide whether custom licenses can be created (i.e. by filling in some of the existing fields on the Terms tab), and whether there is a default license and, if so, which license it is. Users will be able to select from a list of configured licenses prior to publishing a Dataset version, or to enter custom terms (if custom licenses are allowed). The chosen license will also be more visible to both Dataset creators and those browsing or downloading from a Dataset, i.e. on the dataset page and in the publish dialog.

 

The purpose of this email is to give the community a heads-up that work is proceeding on a frequently requested feature and to solicit feedback from community members regarding a) whether there are any concerns about this work being merged into a future Dataverse version, and b) whether there are ‘friendly amendments’ where small amounts of work could make the implementation more useful. Given that the proposed work is generally backward compatible (admins can choose to only support CC-0 or custom terms going forward), we’re hopeful that there won’t be significant concerns, but we certainly want to know if there are. With respect to friendly amendments, please look closely at the “Limitations” section in the text below, which gives the rationale for scoping the work as presented and discusses several previously requested enhancements (such as file-level licenses) in terms of both work-arounds and potential future development.

 

The text below provides further detail about the proposed work. It in turn is drawn from a living google doc where earlier discussion and comments are also captured. If you have comments about the proposed work, feel free to respond to this email and/or add comments to the linked document.

 

Thanks to the team at IQSS, to those who’ve contributed to prior discussions and submitted issues on this topic, and to DANS both for supporting the effort and helping to refine the proposal!

 

Cheers,

  -- Jim

 

PS: This is one example of many new enhancements being proposed for Dataverse where we’re trying to support a process that includes community input while also considering the impacts on Dataverse design and usability and the requirements, timelines, and scope limitations of those able to do the work. Towards that end, we’ve tried to have a process that is open but that also relies on a smaller volunteer group to make progress towards one or more concrete proposals before making a final broad request for feedback. I’m hoping that a similar process can be used in further community-member sponsored work and in working groups, so I’d also appreciate feedback about the process and where we might improve going forward. (Email me, comment in the community working group slack, or just start a new email thread for the group so we can keep the topic separate from the discussion of multiple licenses.)

 

Multiple License Support

 

Current State: 

Dataverse allows only CC-0 or custom terms to be specified for a Dataset. There is broad interest in supporting additional licenses in a managed, machine readable way.

 

Consensus Proposal: 

As part of its efforts, DANS has worked with IQSS and GDCC to define an update to support multiple licenses that it will implement and contribute to the Dataverse community. 

 

Changes: 

  • The primary visual changes will appear on the Dataset Terms tab where the option to select CC-0 or enter ‘Terms of Use’ will be replaced with the ability to select a configured license from a list, with an optional configuration allowing a ‘Custom’ entry that will allow users to type new terms.
  • The display of a license will include a name, optional icon, short description, and URL, with the expectation that the URL will point to a webpage with the complete license terms.
  • In addition, since several of the current text inputs on the Terms tab may conflict with a predefined license, these fields will now only show when a ‘Custom’ license is chosen. These fields are:
    • Terms of Use (which was only shown when CC-0 was not selected)
    • Confidentiality Declaration (which always shows in the current release, as is the case with the rest of these fields)
    • Special Permissions
    • Restrictions
    • Citation Requirements
    • Depositor Requirements
    • Conditions
    • Disclaimer
  • A custom license is the combination of entries in these fields and the URL defined for a custom license will redirect to show these entries (e.g. redirecting to the Terms tab for the dataset).
  • The ‘Terms of Access’ field, whose wording will be adjusted to clarify that it applies only to restricted files (i.e. “Terms of Access for Restricted Files”), and all of the descriptive fields on the current Terms page will remain editable regardless of license.
  • Similarly, other changes to default text will emphasize License and/or Data Use Agreement (DUA) rather than ‘Waiver’, etc.
  • The selected license will show on the Terms tab, in the pop-up download dialog where users will be asked to ‘Accept’ the terms to continue, and will display on the main Dataset page.
  • Admins will use new API calls, similar to that for external tools, to register, update, or delete specific licenses. Deleting a license will only be possible if it is not used in any Dataset. An additional option, to register a license as ‘inactive’ will allow existing published Dataset versions to use a given license while prohibiting new Datasets/new versions of Datasets from using it. 
  • Without admin action, the default license will remain CC-0 and CC-0 will remain the only license supported ‘out-of-the-box’, along with the ability to define a custom license using the set of terms listed above. 
  • Admins will be able to disallow the creation of custom licenses. The default will be to allow them, as is the case now.
  • Admins will be able to specify a different default license if desired, or to not have a default license. The additional displays of the selected license on the dataset page and in the publish dialog will make the current selection (or the fact that no license has been chosen) more obvious to the user.
  • Compatibility: Legacy use of the Terms tab entries will be interpreted as a custom license. If a CC-0 waiver was selected and additional Terms fields (from the subset listed above) were filled out, CC-0 information will be prepended in the Terms of Use field. Admins will be encouraged to review any cases where Terms entries were made and potentially change those datasets to use a relevant standard license. (Specifically, it appears common for people to have added ‘CC-By’ or another license in the ‘Terms of Use’ field, and those datasets could easily be changed to indicate use of CC-By from the standard list. Updating the dataset in this way would make the display of the license and the metadata exports clearer.)
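
To make the admin-facing rules in the list above concrete, here is a minimal in-memory sketch in Python. It is not the Dataverse implementation or its API: the class, method, and field names (`License`, `LicenseRegistry`, `in_use`, `out_of_the_box`, the `Custom` label) are all hypothetical and exist only to illustrate the proposed behaviour, i.e. register/deactivate/delete, deletion only when unused, an optional default, and an optional ‘Custom’ choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

CUSTOM = "Custom"  # hypothetical label for the custom-terms choice


@dataclass
class License:
    # Display attributes named in the proposal: name, optional icon,
    # short description, and a URL pointing to the full license terms.
    name: str
    url: str
    short_description: str = ""
    icon_url: Optional[str] = None
    active: bool = True  # inactive licenses stay valid on existing versions


@dataclass
class LicenseRegistry:
    licenses: Dict[str, License] = field(default_factory=dict)
    in_use: Set[str] = field(default_factory=set)  # names used by any dataset
    default_name: Optional[str] = None             # None means no default
    allow_custom: bool = True                      # proposed default: allowed

    def register(self, lic: License) -> None:
        """Register a new license, or update an existing one by name."""
        self.licenses[lic.name] = lic

    def set_active(self, name: str, active: bool) -> None:
        """Mark a license active or inactive without removing it."""
        self.licenses[name].active = active

    def delete(self, name: str) -> None:
        """Delete a license, but only if no dataset uses it."""
        if name in self.in_use:
            raise ValueError(f"{name} is used by a dataset and cannot be deleted")
        del self.licenses[name]
        if self.default_name == name:
            self.default_name = None

    def choices(self) -> List[str]:
        """Choices offered for new datasets and new dataset versions."""
        names = [n for n, lic in self.licenses.items() if lic.active]
        if self.allow_custom:
            names.append(CUSTOM)
        return names

    def select(self, name: str) -> str:
        """Record that a new dataset version uses the named choice."""
        if name not in self.choices():
            raise ValueError(f"{name} is not available for new dataset versions")
        if name != CUSTOM:
            self.in_use.add(name)
        return name


def out_of_the_box() -> LicenseRegistry:
    """Without admin action: CC0 only, CC0 as default, custom terms allowed."""
    reg = LicenseRegistry(default_name="CC0 1.0")
    reg.register(License("CC0 1.0", "https://creativecommons.org/publicdomain/zero/1.0/"))
    return reg
```

In this sketch, marking a license inactive removes it from `choices()` but leaves it in `licenses`, which mirrors the idea that already-published versions keep their license while new versions cannot pick it.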

Benefits:

  • Adds a widely requested feature to support multiple license options for Datasets
  • Avoids potential conflicts between user-entered terms and license terms
  • Allows local admins to define which licenses are allowed and whether users can create a custom license.
  • Licenses are ‘machine-readable’ and they can be included in metadata exports in standard ways. Custom licenses are also given a unique URL that can be used in exports and they still retain the structure provided by the multiple input fields on the Terms page (‘Terms of Use’ and those listed above)
  • The ‘out-of-the-box’ configuration will retain CC-0 as the only and default standard license and will still allow entering custom terms instead.
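
As an illustration of the ‘machine-readable’ point above, the following Python sketch shows one way an exporter might resolve a dataset's license to a name and URL. Everything here is an assumption made for illustration: the function name `license_for_export`, the `dataverse.example.org` host, and the `/licenses/custom/...` URL pattern are hypothetical, and the proposal itself only says that a custom license URL will redirect to show the custom entries (e.g. the dataset's Terms tab).

```python
from typing import Dict, Optional
from urllib.parse import quote

# Hypothetical table of registered standard licenses (name -> terms URL).
STANDARD_LICENSES: Dict[str, str] = {
    "CC0 1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def license_for_export(
    selection: Optional[str],
    dataset_id: str,
    base_url: str = "https://dataverse.example.org",  # placeholder host
) -> Optional[Dict[str, str]]:
    """Return the license name and URL to put in a metadata export."""
    if selection is None:
        return None  # no license chosen; nothing to export
    if selection == "Custom":
        # Custom licenses are dataset specific, so the URL embeds the
        # dataset identifier; this URL pattern is purely illustrative.
        return {
            "name": "Custom",
            "url": f"{base_url}/licenses/custom/{quote(dataset_id, safe='')}",
        }
    # Standard licenses share one URL across all datasets that use them.
    return {"name": selection, "url": STANDARD_LICENSES[selection]}
```

Two datasets with identical custom terms would still get two different URLs under this scheme, which matches the ‘Custom licenses are Dataset specific’ limitation described below.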

Limitations:

  • Registered licenses apply to a Dataset and all of its files; there is no direct support for assigning different licenses to different files. As a workaround, administrators can define licenses that include provisions for different types of files (e.g. restricted and unrestricted files or those with specific tags). Similarly, if custom licenses are allowed, users can reference licenses in their text (i.e. the ‘Terms of Use’ entry could be ‘CC-0 for ‘Documentation’ files and CC-By for ‘Data’ files’.)
    • Rationale: Supporting licenses at the file level was seen as requiring a much larger effort, with the workarounds above providing somewhat better support for this use case than the current release. The proposed changes are also expected to be compatible with/reduce the additional effort required for supporting file-level licenses if/when that is developed.
  • Licenses must be registered via API: Use of an external service to provide a list of open source licenses that an admin can choose from would be a logical extension of the current mechanism. Similarly, an admin user interface rather than API would be possible.
    • Rationale: These types of additions were not added to the proposal simply because they add to the required effort and because they should be straight-forward additions if pursued in follow-on efforts.
  • Custom licenses are Dataset specific: Adding the same text in the Terms tab inputs for different datasets will result in two custom licenses (different URLs) with the same content. Similarly, it is not possible for custom terms added for one dataset to be reused in another by referencing the URL. Workarounds include setting Terms in a Dataset template, or for an admin to assemble custom terms into a new license that could then be registered and added to the list of choices.
    • Rationale: Leaving things this way is mostly an issue of additional effort being needed if terms don’t have a 1-1 relationship with datasets as they do now. However, there was also concern that automatically making new custom licenses as visible as, and as easy to reuse as, the configured standard licenses was not a great option, so exactly how custom licenses should be treated would also require more thought and discussion.
  • The list of available licenses is global within a Dataverse instance and not per Dataverse collection. 
    • Rationale: This was also seen as something that could be added in future work.
  • Allowing a ‘Terms of Access’ field with a standard license choice could lead to conflicts.
    • Rationale: There was discussion as to whether terms for restricted files should only be allowed when selecting ‘Custom’, but doing this without also disallowing file restriction itself could be confusing as well (if I select CC-By and there are restricted files, what does that mean?). In the end, allowing this field to always be available, and potentially adjusting its name, tooltip help, and description in the guide to indicate its purpose and relationship to the selected license, was seen as the better choice. It was also understood that, if this is a particular concern in the community, an option to ‘not allow restricted files to have additional terms’ with standard licenses could be added.
  • Some repositories would like to require users to select a license prior to publication. The proposed design does not enforce this, i.e. it does not disable the publish button or cause publication to fail if the repository does not specify a default license and the user has not entered anything.
    • Rationale: Additional thought/discussion will be needed to decide how a no license case should be handled, so the current design just proposes to address this by making the license (or a lack thereof) more visible - on the dataset page itself (not just in the Terms tab) and in the publish dialog. Further, as in the current Dataverse release, admins will be able to specify a default license if they wish to ensure that all publications have a license. A future addition could add a mechanism to enforce having the user specify a license before they are allowed to publish.

 

fa...@kb.dk

Jul 26, 2021, 6:54:13 AM
to Dataverse Users Community
Hi everyone,

I was just reading about the new requirements for open and FAIR research data in the Horizon Europe framework programme. In the Annotated Grant Agreement, it says that data must be published with the latest version of CC0 or CC-BY (or equivalent), whereas metadata must always be published with CC0 (or equivalent). See https://ec.europa.eu/info/funding-tenders/opportunities/docs/2021-2027/common/guidance/aga_en.pdf, page 160

I wonder how it would work in practice when users want to use different licenses for metadata and files. It seems related to the first limitation mentioned above:
Registered licenses apply to a Dataset and all of its files; there is no direct support for assigning different licenses to different files. As a workaround, administrators can define licenses that include provisions for different types of files (e.g. restricted and unrestricted files or those with specific tags). Similarly, if custom licenses are allowed, users can reference licenses in their text (i.e. the ‘Terms of Use’ entry could be ‘CC-0 for ‘Documentation’ files and CC-By for ‘Data’ files’.)

Does anyone have experience with that? Or any other good ideas?

Best regards,
Falco

Philip Durbin

Aug 11, 2021, 10:39:30 AM
to dataverse...@googlegroups.com
I don't have much to add except that Philipp Conzett from DataverseNO started a discussion about metadata licensing here: https://github.com/IQSS/dataverse/issues/6888


Philipp at UiT

Aug 14, 2021, 12:26:47 PM
to Dataverse Users Community
Hi Falco,

My interpretation is that the license(s) specified in the dataset Terms tab are about the data in a dataset, not the metadata as entered in the metadata schema(s). Currently, there is no machine-readable way to specify a metadata license in Dataverse. That's why I opened the issue Phil mentioned above (https://github.com/IQSS/dataverse/issues/6888). If by metadata you mean metadata included in, or being, a file, there has been a request for being able to specify license/terms of use at file level, see e.g. https://github.com/IQSS/dataverse/issues/1753 and https://github.com/IQSS/dataverse/issues/4391. I couldn't find any more current issue. Maybe Jim has some more up-to-date news.

Best, Philipp

James Myers

Aug 16, 2021, 4:42:14 PM
to dataverse...@googlegroups.com

I guess the one comment I have is that I think metadata has always been assumed to be released as CC0, and metadata export formats that support indicating the metadata license should include CC0 for the metadata. I’m not sure, though, whether the formats that support it actually have that info now, or whether the code might assume the same license as the dataset when it shouldn’t (as the dataset can only have CC0/none until the multiple license work is merged). For example, the OAI-ORE format has explicit sections separating metadata/license info about the metadata from metadata/license info about the dataset itself, but I see the current version doesn’t include a CC0 license as it should. (It does show a creator (the Dataverse’s branding name) and a modification date (when the export is done) for the metadata that are not the same as the dataset’s creator/publication date.)

 

To move #6888 forward, it would probably be useful to have a list of which export formats support a metadata license and which elements should be used for it, so we can add them (probably easier to test if we get the multiple licenses work in first so we can see that the licenses are different).

 

W.r.t. different licenses on different files, there’s no work yet to support that beyond what will be possible in the multiple-licenses PR, which would allow you to specify a ‘custom license’ whose terms would list the application of different standard licenses to individual files. (An improvement on the current case in some ways, but still not really machine readable.)

 

  -- Jim
