Handling datasets that won't comply with licensing rules after v5.10


Philipp at UiT

Apr 23, 2022, 9:10:39 AM
to Dataverse Users Community
We've started preparing our Dataverse installation for multiple license support (see the release notes for v5.10), in particular for handling datasets that won't comply with the licensing rules after v5.10. I have a couple of questions about the section "Handling Datasets that No Longer Comply With Licensing Rules" in the v5.10 release notes:

1. What do the migration/update commands in the release notes actually change? Do they (a) modify the cell values in the database table for the affected dataset versions, so that the changes are made without creating a new version of the datasets, or (b) create a new version of each dataset with values that comply with the new licensing rules?

2. What about datasets whose latest published version complies with the new licensing rules, but which have one or more earlier versions that do not? Are these datasets fine for migration? I guess this depends on the answer to question 1: if they're not fine, do the migration/update commands change the database values for the earlier versions without creating a new version of the dataset?

Best,
Philipp

James Myers

Apr 25, 2022, 7:54:52 AM
to dataverse...@googlegroups.com

The automated changes are made directly in the database. They affect all dataset versions and don’t create a new version.

Nominally, they do not change the semantics, i.e. if you had a CC0 waiver and additional terms, the auto-update gives you a custom license whose terms are CC0 plus your original terms.

Similarly, for the no terms or license case, the auto-update gives you a custom license stating that no license/terms have been specified. You may not like the exact text in the auto-update, which would be a reason to make manual changes, but you don’t really have to worry that the auto-update is changing the licensing on old versions in any semantically meaningful sense (perhaps lawyers would disagree and be concerned that a blank terms of use field is not the same as text stating that no terms were specified).
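
To make the rule above concrete, here is a minimal Python sketch of how the auto-update treats each case. It is only an illustration of the behavior described in this message, not the actual migration script: the function name `auto_update`, the `"CUSTOM"` marker, and both text constants are hypothetical placeholders, and the real wording written by the v5.10 scripts may differ.

```python
from typing import Optional, Tuple

# Placeholder wording (assumption): the text the real v5.10 scripts write may differ.
CC0_PREFIX = "CC0 waiver with the following additional terms: "
NO_TERMS_TEXT = "No license or terms have been specified."


def auto_update(license_name: Optional[str], terms: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (license, terms) after the migration rule sketched in this thread.

    license_name is "CC0" or None (no license); terms is the free-text terms field.
    """
    has_terms = bool(terms and terms.strip())
    if license_name == "CC0" and has_terms:
        # CC0 plus extra terms: becomes custom terms that restate CC0 and keep the original text.
        return ("CUSTOM", CC0_PREFIX + terms.strip())
    if license_name == "CC0":
        # Plain CC0 already complies, so nothing changes.
        return ("CC0", None)
    if not has_terms:
        # Neither a license nor terms: custom terms saying that nothing was specified.
        return ("CUSTOM", NO_TERMS_TEXT)
    # Custom terms without a license already comply and are kept as they are.
    return ("CUSTOM", terms.strip())
```

In every branch the original text survives unchanged inside the result, which is why the update is not expected to alter the meaning of the terms on older versions.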

 

-- Jim


Philipp at UiT

Apr 26, 2022, 3:01:05 AM
to Dataverse Users Community
Thanks for clarifying, Jim!
Philipp

Philipp at UiT

Jun 18, 2022, 7:32:05 AM
to Dataverse Users Community
I've been doing some more work on preparing our installation for multiple license support, and I found that in more than half of the "CC0 + non-empty terms" cases no other terms or restrictions are displayed in the Terms tab, although some text is stored in the corresponding fields in the database. As I curated some of these datasets, I'm almost certain that they had some terms specified in addition to CC0 in the initial DRAFT state, but that I convinced the depositor that CC0, together with the reminder about the Community Norms and good scientific practices, would be enough to ensure proper citation. I then removed any other text from the Terms fields before the dataset was published. Still, it now seems that the original text from these fields is stored in the database.

I guess that in cases like these, we can safely remove any content in the Terms fields in the database?
Could you provide the SQL update query for such cases? Thanks!

Best, Philipp

James Myers

Jun 20, 2022, 11:04:15 AM
to dataverse...@googlegroups.com

Philipp,

I’m not sure I understand. The terms displayed should be coming from the database, so I’m not sure how you would see them in the db and not in the display. The terms are per version, so if you have more than one version you may need to check that you’re looking at the same one in the db and the UI. The only other thing I can think of is that some of the fields in the termsofuseandaccess table are still allowed when using a license (i.e. the ones that still show on demo.dataverse.org when you edit terms and have a license selected). Those would be visible in the db but wouldn’t get caught by the upgrade scripts in the release notes (they only check for the fields that appear when you select custom terms).

The release notes do have queries to find all the dataset versions where there is a conflict, and then queries to null all relevant termsofuseandaccess fields for the given termsofuseandaccess id(s) you want. I think those should be all that’s needed, but again, I’m not sure I’m following what you’re seeing.
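
As a rough illustration of that find-then-null workflow, here is a minimal Python sketch that runs against an in-memory SQLite table. These are not the queries from the release notes: only the table name `termsofuseandaccess` comes from this thread, while the reduced column set, the `'CC0'`/`'NONE'` license values, and the helper names `make_demo_db`, `find_conflicts`, and `null_terms` are assumptions made for the example. For a real installation, use the queries in the v5.10 release notes.

```python
import sqlite3

# Simplified assumption: a real installation has more terms fields than these three.
TERM_FIELDS = ["termsofuse", "restrictions", "citationrequirements"]


def make_demo_db() -> sqlite3.Connection:
    """Build a toy stand-in for the termsofuseandaccess table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE termsofuseandaccess ("
        "id INTEGER PRIMARY KEY, license TEXT, "
        "termsofuse TEXT, restrictions TEXT, citationrequirements TEXT)"
    )
    conn.executemany(
        "INSERT INTO termsofuseandaccess VALUES (?, ?, ?, ?, ?)",
        [
            (1, "CC0", None, None, None),               # plain CC0: no conflict
            (2, "CC0", "Please cite.", None, None),     # CC0 plus terms: conflict
            (3, "NONE", "Custom terms.", None, None),   # custom terms only: no conflict
        ],
    )
    return conn


def find_conflicts(conn: sqlite3.Connection) -> list:
    """Return ids of rows that combine CC0 with any non-null terms field."""
    any_term = " OR ".join(f"{field} IS NOT NULL" for field in TERM_FIELDS)
    rows = conn.execute(
        "SELECT id FROM termsofuseandaccess "
        f"WHERE license = 'CC0' AND ({any_term}) ORDER BY id"
    )
    return [row[0] for row in rows]


def null_terms(conn: sqlite3.Connection, ids: list) -> None:
    """Set every terms field to NULL for the given row ids only."""
    if not ids:
        return
    assignments = ", ".join(f"{field} = NULL" for field in TERM_FIELDS)
    placeholders = ", ".join("?" for _ in ids)
    conn.execute(
        f"UPDATE termsofuseandaccess SET {assignments} WHERE id IN ({placeholders})",
        list(ids),
    )
    conn.commit()
```

Listing the conflicts first and passing only the reviewed ids to the update keeps the change limited to rows you have actually checked, and because the terms are per version, each row should be compared with the matching version in the UI before it is nulled.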

Philipp at UiT

Jun 24, 2022, 9:38:35 AM
to Dataverse Users Community
Thanks, Jim! Sorry, my bad! I only now realize that the field "Terms of Use" doesn't display unless you click "Edit Terms Requirements" and unselect "No, do not apply CC0".

But I have another follow-up question:
For datasets that are derived from other sources, where should we put information about the license/ToU under which these sources were reused? Currently, we do this in the Terms of Use field, after we have described the license for the derived dataset. For an example, see https://doi.org/10.18710/VMUP44. In this example, we chose a CC BY-NC 4.0 license.

Once we have activated multiple license support, we won't be able to put this information into the Terms of Use field. Should we put it into the Data Sources field in the Citation Metadata schema instead? I think that, at least for CC BY licenses, this information should actually be part of the citation information.

Best, Philipp

James Myers

Jun 24, 2022, 10:36:17 AM
to dataverse...@googlegroups.com

Philipp, that’s an interesting example. Your idea of moving that type of info to the Data Sources field seems reasonable to me (as someone with no legal training).

In general, I think a main decision point would be whether anything you want to say affects the end user or not. I.e. if in your case the user is only bound by CC BY-NC, then using that license and finding other places to put the info makes sense. If there’s something in the current terms of use that would restrict the user beyond the CC BY-NC terms, it would probably be better to do something like using custom terms to start and saying in the ‘Terms of Use’ field that ‘this is licensed under CC BY-NC except for …’ (essentially what the default update does). If Custom Terms hides the generally open nature too much, a third option would be to create one or more custom licenses (as QDR has done) and use those.

 

For info that is informative but doesn’t affect the license, I think either the Terms fields that remain visible (e.g. Original Archive or Data Access Place, if and as appropriate) or metadata fields like Data Sources could be appropriate. Since the fields in the Terms tab are fairly specific/constrained by DDI, more freeform info probably fits better in metadata somewhere.

 

I also see in your example that there’s an interesting twist, in that one might be able to use the data commercially if one gets permission from the two sources that restrict that. This seems like a possible alternative license that we can’t easily describe.

 

Hope that helps,
