April 6th deadline RE: Multiple License Support Proposal Request for Community Feedback

James Myers

Apr 1, 2021, 5:38:26 PM
to dataverse...@googlegroups.com

All,

Thanks for the comments on the proposal on how to support multiple licenses in the Dataverse software. I wanted to follow up with a request that any further comments/questions be submitted by April 6th. (Comments are always welcome, but to affect the planned work rather than follow-on efforts, we need them soon.)

 

Also - while the basic proposal is to add support for new licenses, there are changes proposed to the Terms tab and to default settings that could impact current practices and may be important to you. To make that clearer, I've included a few 'FAQ' items below (also in the google doc) that address some specific situations and explain how they would be handled with the proposed changes. Questions/comments/suggestions about these details are also welcome.

 

Lastly, since we also have the community meetings scheduled next week, we have an opportunity to discuss/answer questions there as well.

 

Thanks,

--Jim

 

 

Multiple License Frequently Asked Questions List:



1.    In our repository, users have sometimes selected a CC-0 license and then specified additional terms in fields such as Confidentiality Declaration, Restrictions, Citation Requirements, Depositor Requirements, Conditions, and Disclaimer. How will these existing datasets be handled?

a.            Will it be possible to continue this practice, i.e. selecting a standard license and then specifying additional terms in fields such as Confidentiality Declaration, Restrictions, Citation Requirements, Depositor Requirements, Conditions, and Disclaimer?



A.            In reviewing existing Datasets, we've seen that most uses of these Terms tab fields either conflict with the meaning of CC-0 or address aspects that CC-0 doesn't cover (or both). The intended meaning in both cases appears to be that the dataset is ‘licensed under CC-0 except where its terms are superseded by the specific entries in the listed fields.’ In either case, characterizing such datasets as using a CC-0 license, particularly in machine-readable metadata, as is done in the current Dataverse release, is not appropriate.

 

With the proposed changes, these fields on the Terms tab will only be available if the user selects 'Custom License'. If an administrator does not allow 'Custom' as an option, the listed terms fields would not be available (and thus users would not be able to add terms that undermine or alter the allowed licenses).

 

For existing datasets, we propose to address these cases with an automated script that would a) switch them to having a 'Custom' license and b) prepend a statement in the 'Terms of Use' field such as "This Dataset was published with a Creative Commons CC-0 license modified by the following terms: " (final language TBD, suggestions welcome). This will work regardless of whether an admin allows a Custom license for new datasets or not (see FAQ #5).
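
As a rough illustration of what such a script might do, here is a minimal Python sketch. The dictionary keys (`license`, `termsOfUse`, and the per-field names) are made up for this example and are not the actual Dataverse data model, and the prepended sentence is just the placeholder wording quoted above.

```python
# Hypothetical field names; the real Dataverse data model may differ.
TERMS_FIELDS = [
    "confidentialityDeclaration",
    "specialPermissions",
    "restrictions",
    "citationRequirements",
    "depositorRequirements",
    "conditions",
    "disclaimer",
]

# Placeholder wording taken from the proposal; the final language is TBD.
PREFIX = ("This Dataset was published with a Creative Commons CC-0 license "
          "modified by the following terms: ")


def migrate_terms(dataset: dict) -> dict:
    """Return a copy of `dataset`, switched to a 'Custom' license if needed."""
    updated = dict(dataset)
    # A dataset needs migration only if CC-0 is combined with extra terms.
    has_extra_terms = any((updated.get(f) or "").strip() for f in TERMS_FIELDS)
    if updated.get("license") == "CC0" and has_extra_terms:
        updated["license"] = "Custom"
        updated["termsOfUse"] = PREFIX + (updated.get("termsOfUse") or "")
    return updated
```

Datasets that use CC-0 with none of the listed fields filled in pass through unchanged, which matches the intent that only the conflicting cases are rewritten.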

 

For new publications, 

·         If a Dataverse repository allows Custom licenses, users can use similar language (to that shown above for the existing data case) in the 'Terms of Use' field to indicate their intent to modify an existing standard license.

·         If a Dataverse repository does not allow Custom licenses to be defined by users, it would still be possible for an administrator to define such a compound license and register it as one of the available standard license options (see FAQ #6 below).



2.            The "Disclaimer" field doesn't seem like one that would conflict with a license - shouldn't it be allowed with a standard license?



A.            It's possible that there are important use cases for this and we could consider modifying the proposal (now, or in a future update) to enable adding a Disclaimer with a standard license. That said, it is expected that many licenses will address this (in fact, CC-0 provides a disclaimer in section 4.b), so this would re-open the door to conflicts.



3.            Our users have datasets in which CC-0 has been selected but restricted files have additional constraints that are described in the Terms of Access field. How can this case be handled?



A.            The ability to enter 'Terms of Access' that apply to restricted files continues to be allowed with the proposed changes, regardless of which license is selected. The proposal does include changing the name of this field to "Terms of Access for Restricted Files" to clarify that it applies only to restricted files. While we recognize that 'Terms of Access' can also conflict with a license in a manner similar to the fields discussed above, restricting files itself presents a potential conflict: what does it mean if a dataset has a 'CC-0' license and a file is restricted (and access requests aren't allowed)? Given that, we felt that keeping a 'Terms of Access for Restricted Files' entry available would be better than allowing restriction but not showing this field.



4.            Will there be a default set of allowed or recommended licenses included in the Dataverse software release?



A.            For backward compatibility, the proposal is to only include 'CC-0' by default and to then provide a set of possible/recommended licenses with example commands to show how to register them. (This would be similar to how previewers are managed - they are not included in the Dataverse software release, but there is a GDCC GitHub page where you can cut/paste a list of curl commands that will register a standard set of previewers.) If the community develops a recommended list(s) of licenses, it would be straightforward to provide the example commands to register them in a given repository. (Also note that there are comprehensive lists of open source licenses (e.g. SPDX), and the ability for an admin to select a license from such a list has already been discussed as potential future work.)



5.            We are migrating datasets with license X into our Dataverse repository but we don't want to allow users to create/publish new datasets/versions of datasets with that license. How do we manage this?



A.            The proposed work includes the concept of whether a license is 'active' or not. Only 'active' licenses will appear in the list from which users can make a selection. Changing a license from 'active' to 'inactive' allows a repository to retire an obsolete license - previously published dataset versions would still use this license but future publication would require use of one of the currently allowed choices. Similarly, the proposal specifies that the admin API calls to import/migrate datasets allow specification of licenses that are 'inactive', thus allowing import of datasets with a given license from other repositories without making that license available for ongoing use.
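
To make the active/inactive distinction concrete, here is a small Python sketch. The `License` class, its attribute names, and the `via_admin_import` flag are illustrative assumptions, not the proposed implementation or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    uri: str
    active: bool = True  # inactive licenses stay on old versions only


def selectable(licenses):
    """Licenses a user may pick when creating or publishing a new version."""
    return [lic for lic in licenses if lic.active]


def may_assign(lic, via_admin_import=False):
    """Inactive licenses are accepted only through the admin import path."""
    return lic.active or via_admin_import
```

In this sketch a retired 'License X' never shows up in `selectable(...)`, yet `may_assign(license_x, via_admin_import=True)` still lets a migration attach it to imported datasets.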



6.            Our Dataverse repository contains special collections with specific terms or a "Data Use Agreement". How can these be handled?



A.            There are two options:

·         An admin can create a predefined 'Collection X License' that can be made available as a choice when selecting a license. The expectation in the design is that such a license would be hosted on an external webpage (and thus has a URL) and that it would be registered with a Dataverse instance by specifying a name, short description, URL, and optional logo. (Such a license could potentially be assembled by concatenating the existing terms fields into a single document.) In this use case, it would also be possible to select this particular license in a template for a given Dataverse collection, thus making it the default for Datasets created within the special collection.

·         If custom licenses are allowed, an existing set of Terms could be copied into any Dataset or added to a Dataverse collection metadata template to provide default values for any Dataset created in the collection.

 

The second option is essentially what is possible in Dataverse today. The former option provides a potential improvement in that the special collection license would be given a standard URL and it would be possible to recognize, in a machine-readable way, that two datasets from the collection share the same license.
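
A short Python sketch of the first option follows. The key names (`shortDescription`, `uri`, `iconUrl`, `licenseUri`) and the example.org URL are assumptions made for illustration; the actual registration format has not been defined in this thread.

```python
import json


def license_record(name, short_description, uri, icon_url=None):
    """Build a registration record; the key names are assumptions."""
    record = {"name": name, "shortDescription": short_description, "uri": uri}
    if icon_url:
        record["iconUrl"] = icon_url  # the logo is optional
    return record


def share_license(dataset_a, dataset_b):
    """Two datasets share a license when they point at the same license URL."""
    return dataset_a["licenseUri"] == dataset_b["licenseUri"]


record = license_record(
    "Collection X License",
    "Data Use Agreement for special collection X",
    "https://example.org/licenses/collection-x",
)
print(json.dumps(record, indent=2))
```

Because every dataset in the collection would carry the same URL, a check like `share_license` becomes a plain string comparison, which is the machine-readability gain described above.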

 

It should also be noted that while we are primarily using the term 'license' when referring to what can be selected in the current documentation, we are considering using the term "License/DUA" or "License/Data Use Agreement" for the license field. Nominally, the name/description of registered licenses could use the term "License" or "Data Use Agreement" as desired.




7.            Our users have a requirement to set different licenses on different files in a dataset. How can this be handled?



A.            Providing full support for specifying a license per file was deemed out of scope for now. There are two options as a work-around:

·         If custom licenses are allowed, a user can add 'Terms of Use' that indicate specific licenses for specific datafiles (or types of datafiles, e.g. those with a 'Documentation' tag).

·         An admin can create one or more predefined 'mixed' licenses that can be made available as choices when selecting a license.

 

The primary limitation of these approaches is that machine-readability is limited. For example, consider a dataset with a custom license with "Terms of Use" stating "CC-0 for Documentation files and CC-By for Data files". In metadata exports, this dataset would be seen as having a custom license with a URL that would direct to the HTML page with the Terms shown (nominally the Dataset page with the Terms tab shown in the current proposal) rather than any direct reference to CC-0 or CC-By.
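
The limitation can be illustrated with a toy export function in Python. The URL pattern and dictionary keys below are invented for the example; the proposal only says that a custom license's URL would lead to a page showing the terms.

```python
def license_for_export(dataset):
    """Return the (name, URL) pair a metadata export would carry.

    For a custom license the URL points at the dataset's own Terms page,
    so any free-text mention of CC-0 or CC-By is invisible to a machine.
    The URL pattern and key names here are made up for illustration.
    """
    if dataset["license"] == "Custom":
        return ("Custom", f"https://example.org/dataset/{dataset['id']}/terms")
    return (dataset["license"], dataset["licenseUri"])
```

A harvester reading the export of the 'mixed' dataset would see only "Custom" plus a link, and would have to fetch and interpret the HTML page to learn which licenses apply to which files.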



8.            Our users have indicated that files are embargoed in some of the fields on the Terms tab (and plan to manually update the dataset to un-restrict files at the correct date). How can that be handled with the proposed changes?



A.            We hope that the proposed implementation of embargoes in the Dataverse software will provide a better way to manage this. (That update may even be in the same Dataverse release as support for multiple licenses.) In the meantime/only considering this proposal, such a dataset would be considered to have a custom license and wouldn't require further changes. (See FAQ #1 about automatic changes to datasets that use CC-0 and custom terms.) When the embargo ends, if there are no other custom terms, the terms for the post-embargo version could be updated to use CC-0 (or another standard license).



9.            Why are you making this so complex? :-)



A.            :-) Hopefully the complexity is really coming from the complexity of licensing itself, in combination with the rich functionality available in Dataverse, rather than something this proposal is injecting. That said, the proposed work is trying to make it easier for administrators to restrict users to standard licenses if desired and to avoid the potential for conflicting terms that exists today. In practice, we hope that the choice to select a standard license or enter custom terms will not be any harder to use than having a CC-0 waiver and several potentially conflicting fields visible.

 

 

From: dataverse...@googlegroups.com [mailto:dataverse...@googlegroups.com] On Behalf Of James Myers
Sent: Wednesday, March 24, 2021 1:12 PM
To: Dataverse Users Community
Subject: [Dataverse-Users] Multiple License Support Proposal Request for Community Feedback

 

All,

As I mentioned in the community calls, several of us have been working to refine a proposal from DANS that would allow admins to configure the Dataverse software to support additional/alternate licenses for Datasets. In doing this, we’ve reviewed related issues and email discussions, and considered backward compatibility issues, user interface impacts, and the potential for future enhancements. The resulting proposal is below.

 

In brief:

 

The proposed work will make it possible for site administrators to configure their Dataverse installation with additional/alternate pre-defined license choices. Admins will also be able to decide whether custom licenses can be created (i.e. by filling in some of the existing fields on the Terms tab), and whether there is a default license and, if so, which license it is. Users will be able to select from a list of configured licenses prior to publishing a Dataset version or to enter custom terms (if custom licenses are allowed). The chosen license will also be more visible to both Dataset creators and those browsing or downloading from a Dataset, i.e. on the dataset page and in the publish dialog.

 

The purpose of this email is to give the community a heads-up that work is proceeding on a frequently requested feature and to solicit feedback from community members regarding a) whether there are any concerns about this work being merged into a future Dataverse version, and b) whether there are ‘friendly amendments’ where small amounts of work could make the implementation more useful. Given that the proposed work is generally backward compatible (admins can choose to only support CC-0 or custom terms going forward), we’re hopeful that there won’t be significant concerns, but we certainly want to know if there are. With respect to friendly amendments, please look closely at the “Limitations” section in the text below, which explains the rationale for scoping the work as presented and discusses several previously requested enhancements (such as file-level licenses) in terms of both work-arounds and potential future development.

 

The text below provides further detail about the proposed work. It in turn is drawn from a living google doc where earlier discussion and comments are also captured. If you have comments about the proposed work, feel free to respond to this email and/or add comments to the linked document.

 

Thanks to the team at IQSS, to those who’ve contributed to prior discussions and submitted issues on this topic, and to DANS both for supporting the effort and helping to refine the proposal!

 

Cheers,

  -- Jim

 

PS - This is one example of many new enhancements being proposed for Dataverse where we’re trying to support a process that includes community input while also considering the impacts on Dataverse design and usability and the requirements, timelines, and scope limitations of those able to do the work. Towards that end, we’ve tried to have a process that is open but have also relied on a smaller volunteer group to make progress towards a concrete proposal before making a final broad request for feedback. I’m hoping that a similar process can be used in further community-member sponsored work and in working groups, so I’d appreciate feedback about the process and where we might improve going forward. (Email me, comment in the community working group slack, or just start a new email thread for the group so we can keep the topic separate from the discussion of multiple licenses.)

 

Multiple License Support

 

Current State: 

Dataverse allows only CC-0 or custom terms to be specified for a Dataset. There is broad interest in supporting additional licenses in a managed, machine readable way.

 

Consensus Proposal: 

As part of its efforts, DANS has worked with IQSS and GDCC to define an update to support multiple licenses that it will implement and contribute to the Dataverse community. 

 

Changes: 

  • The primary visual changes will appear on the Dataset Terms tab where the option to select CC-0 or enter ‘Terms of Use’ will be replaced with the ability to select a configured license from a list, with an optional configuration allowing a ‘Custom’ entry that will allow users to type new terms.
  • The display of a license will include a name, optional icon, short description, and URL, with the expectation that the URL will point to a webpage with the complete license terms.
  • In addition, since several of the current text inputs on the Terms tab may conflict with a predefined license, these fields will now only show when a ‘Custom’ license is chosen. These fields are:
    • Terms of Use (which was only shown when CC-0 was not selected)
    • Confidentiality Declaration (which always shows in the current release, as is the case with the rest of these fields)
    • Special Permissions
    • Restrictions
    • Citation Requirements
    • Depositor Requirements
    • Conditions
    • Disclaimer
  • A custom license is the combination of entries in these fields and the URL defined for a custom license will redirect to show these entries (e.g. redirecting to the Terms tab for the dataset).
  • The ‘Terms of Access’ field will remain editable regardless of license, as will all of the descriptive fields on the current Terms page. Its wording will be adjusted to clarify that it applies only to restricted files (i.e. “Terms of Access for Restricted Files”).
  • Similarly, other changes to default text will emphasize License and/or Data Use Agreement (DUA), not say ‘Waiver’, etc.
  • The selected license will show on the Terms tab, in the pop-up download dialog where users will be asked to ‘Accept’ the terms to continue, and will display on the main Dataset page.
  • Admins will use new API calls, similar to that for external tools, to register, update, or delete specific licenses. Deleting a license will only be possible if it is not used in any Dataset. An additional option, to register a license as ‘inactive’ will allow existing published Dataset versions to use a given license while prohibiting new Datasets/new versions of Datasets from using it. 
  • Without admin action, the default license will remain CC-0 and CC-0 will remain the only license supported ‘out-of-the-box’, along with the ability to define a custom license using the set of terms listed above. 
  • Admins will be able to disallow the creation of custom licenses. The default will be to allow them, as is the case now.
  • Admins will be able to specify a different default license if desired, or to not have a default license. The additional displays of the selected license on the dataset page and in the publish dialog will make the current selection (or the fact that no license has been chosen) more obvious to the user.
  • Compatibility: Legacy use of the Terms tab entries will be interpreted as a custom license. If a CC-0 waiver was selected and additional Terms fields (from the subset listed above) were filled out, CC-0 information will be prepended in the Terms of Use field. Admins will be encouraged to review any cases where Terms entries were made and potentially change those datasets to use a relevant standard license. Specifically, it appears common for people to have added ‘CC-By’ or another license in the ‘Terms of Use’ field, and those datasets could easily be changed to indicate use of CC-By from the standard list. Updating a dataset in this way would make the display of the license and the metadata exports clearer.
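
The admin-facing rules in the list above (CC-0 only out of the box, an optional 'Custom' entry, deletion allowed only for unused licenses) can be summarized in a toy in-memory model. This is only a sketch in Python; the class and attribute names are hypothetical and say nothing about how the actual API calls will look.

```python
class LicenseRegistry:
    """Toy in-memory model of the admin rules described in the list above."""

    def __init__(self):
        self.licenses = {"CC0": {"active": True}}  # only CC-0 out of the box
        self.default = "CC0"      # admins may change this or set it to None
        self.allow_custom = True  # custom terms are allowed by default
        self.usage = {}           # license name -> number of Datasets using it

    def register(self, name, active=True):
        self.licenses[name] = {"active": active}

    def delete(self, name):
        # Deleting is only possible if no Dataset uses the license.
        if self.usage.get(name, 0) > 0:
            raise ValueError(f"{name} is in use; mark it inactive instead")
        del self.licenses[name]

    def choices(self):
        """Entries a user would see in the license list on the Terms tab."""
        names = [n for n, lic in self.licenses.items() if lic["active"]]
        return names + (["Custom"] if self.allow_custom else [])
```

With no admin action, `LicenseRegistry().choices()` yields CC-0 plus 'Custom', mirroring the current behaviour; marking a license inactive or turning off `allow_custom` simply shortens that list.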

Benefits:

  • Adds a widely requested feature to support multiple license options for Datasets
  • Avoids potential conflicts between user-entered terms and license terms
  • Allows local admins to define which licenses are allowed and whether users can create a custom license.
  • Licenses are ‘machine-readable’ and they can be included in metadata exports in standard ways. Custom licenses are also given a unique URL that can be used in exports and they still retain the structure provided by the multiple input fields on the Terms page (‘Terms of Use’ and those listed above)
  • The ‘out-of-the-box’ configuration will retain CC-0 as the only and default standard license and will still allow entering custom terms instead.

Limitations

  • Registered licenses apply to a Dataset and all of its files; there is no direct support for assigning different licenses to different files. As a workaround, administrators can define licenses that include provisions for different types of files (e.g. restricted and unrestricted files or those with specific tags). Similarly, if custom licenses are allowed, users can reference licenses in their text (e.g. the ‘Terms of Use’ entry could be “CC-0 for ‘Documentation’ files and CC-By for ‘Data’ files”).
    • Rationale: Supporting licenses at the file level was seen as requiring a much larger effort, with the workarounds above providing somewhat better support for this use case than the current release. The proposed changes are also expected to be compatible with/reduce the additional effort required for supporting file-level licenses if/when that is developed.
  • Licenses must be registered via API: Use of an external service to provide a list of open source licenses that an admin can choose from would be a logical extension of the current mechanism. Similarly, an admin user interface rather than API would be possible.
    • Rationale: These types of additions were left out of the proposal simply because they add to the required effort and because they should be straightforward additions if pursued in follow-on efforts.
  • Custom licenses are Dataset specific: Adding the same text in the Terms tab inputs for different datasets will result in two custom licenses (different URLs) with the same content. Similarly, it is not possible for custom terms added for one dataset to be reused in another by referencing the URL. Workarounds include setting Terms in a Dataset template, or for an admin to assemble custom terms into a new license that could then be registered and added to the list of choices.
    • Rationale: Leaving things this way is mostly an issue of additional effort being needed if terms don’t have a 1-1 relationship with datasets as they do now. However, there was also concern that automatically making new custom licenses as visible as, and as easy to reuse as, the configured standard licenses was not a great option, so exactly how custom licenses should be treated would also require more thought and discussion.
  • The list of available licenses is global within a Dataverse instance and not per Dataverse collection. 
    • Rationale: This was also seen as something that could be added in future work.
  • Allowing a ‘Terms of Access’ field with a standard license choice could lead to conflicts.
    • Rationale: There was discussion as to whether terms for restricted files should only be allowed when selecting ‘Custom’, but doing this without also disallowing file restriction itself could be confusing as well (if I select CC-By and there are restricted files, what does that mean?). In the end, allowing this field to always be available, and potentially adjusting its name, tooltip help, and description in the guide to indicate its purpose and relationship to the selected license, was seen as the better choice. It was also understood that, if this is a particular concern in the community, an option to ‘not allow restricted files to have additional terms’ with standard licenses could be added.
  • Some repositories would like to require users to select a license prior to publication. The proposed design does not enforce this, i.e. it does not disable the publish button or cause publication to fail if the repository does not specify a default license and the user has not entered anything.
    • Rationale: Additional thought/discussion will be needed to decide how a no license case should be handled, so the current design just proposes to address this by making the license (or a lack thereof) more visible - on the dataset page itself (not just in the Terms tab) and in the publish dialog. Further, as in the current Dataverse release, admins will be able to specify a default license if they wish to ensure that all publications have a license. A future addition could add a mechanism to enforce having the user specify a license before they are allowed to publish.

 


Julian Gautier

Apr 2, 2021, 12:51:41 PM
to Dataverse Users Community
This is awesome! Thanks as always Jim! Tania and I discussed the proposal with folks at the Harvard Dataverse Repository and thought it might be helpful for the community to see their feedback, too. Generally they were happy about the proposed changes. Here's the summary:
  • If you restrict a file, they prefer that you MUST either enter information in the "Terms of Access for Restricted Files" field OR enable access request (using the Access Request feature). Using the Access Request feature isn't always possible because some collections have a separate process for requesting access to restricted files. So in those cases information about that process should be available, and using the Terms of Access for Restricted Files should be required.
  • Datasets should always include information about how the data can/can't be used. They prefer that depositors not be able to publish datasets with no license or no custom terms, instead of only making it more apparent to the depositor that they're about to publish a dataset with no information about how it can and can't be used. This is happening now and it causes confusion.
  • When choosing a license from the list, it would be helpful if the UI somehow provides more information about the license.
  • When someone finds a dataset with a custom license/DUA, they see "Custom terms" on the dataset page. Some collections limit use to university affiliates, and it would be helpful if they saw that upfront as well.


Amber Leahey

Apr 6, 2021, 10:41:44 AM
to Dataverse Users Community
Hi all, I commented in the Github issue for this, but I think this would be a great feature. The current workflow we have set up uses the templates feature so that any new depositor is offered the option to select a pre-defined template with the name of the CC license, or a Custom option should they choose it. It would be great to make this more "required" and to make users more aware of the default template as CC0 before publishing (we heard from our community on that last point; some have suggested a pop-up that appears after clicking the publish button to ensure licensing is correct/appropriate beforehand).
Here is a screenshot of the deposit form w/CC templates setup:
[Attachment: Screen Shot 2021-03-31 at 1.05.48 PM.png]

Sebastian Karcher

Apr 6, 2021, 12:24:06 PM
to dataverse...@googlegroups.com
Hi,
Thanks for working on this. I have reviewed this for QDR and this looks great for us, in particular the ability to specify a custom license which populates the terms (I hope I understood that correctly). We remain interested in file-level licensing, but completely understand this is out of scope for this.

Sebastian



--
Sebastian Karcher, PhD
www.sebastiankarcher.com