metadata field for retention period


Laura Huisintveld

May 11, 2022, 3:49:14 AM
to Dataverse Users Community
Dear all,

In our Dataverse instance (DataverseNL), we have some datasets on medical topics. Under our national legislation, these medical datasets should be deleted after a defined period (10 or 15 years). It would therefore be useful to have a metadata field that stores the retention period, so that datasets reaching the end of their retention period can be found easily. Is anyone else interested in this metadata field?
Should it be incorporated into an existing block, such as the citation block?

If there is no interest in the community, we might have to create a custom block containing this field.

Kind regards,
Laura


Dieuwertje Bloemen

May 12, 2022, 7:27:58 AM
to Dataverse Users Community
Hi,

We briefly discussed it with our team (KU Leuven RDR) and are also interested in this field, as retention periods are part of our national legislation too. We were thinking of something along the lines of a retention start date plus a controlled list of possible retention periods to choose from. This would make it easier to integrate our archiving/preservation system with our repository and to keep track of the retention periods. For us, it would be an interesting addition to the main citation block.
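
To illustrate the idea of a start date combined with a controlled list of periods, here is a minimal Python sketch. The labels and the `retention_end` helper are hypothetical and are not part of any Dataverse metadata block; the 10- and 15-year values are simply the periods mentioned earlier in this thread.

```python
from datetime import date

# Hypothetical controlled list of retention periods (label -> years).
RETENTION_PERIODS_YEARS = {"10 years": 10, "15 years": 15}


def retention_end(start: date, period_label: str) -> date:
    """Return the date on which the retention period ends."""
    years = RETENTION_PERIODS_YEARS[period_label]
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # The start date is 29 February and the target year is not a leap year.
        return start.replace(year=start.year + years, day=28)
```

Storing the start date and the period separately, rather than only the end date, would keep the stored values tied to the controlled list while still allowing the end date to be derived when needed.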

Kind regards,
Dieuwertje

Barbosa, Sonia

May 12, 2022, 11:39:16 AM
to dataverse...@googlegroups.com
I am totally on board with this and can join any conversations around this need. We even have cases where the terms expire and change, not all directly related to "deleting" the data but more on expired restrictions/ToU, and having a field to help capture this would be great.



--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/3757b500-4be8-4542-b3e8-87ab69b0baf1n%40googlegroups.com.


--



Sonia Barbosa
Manager of Data Curation, The Harvard Dataverse Repository
Manager of the Murray Research Archive, IQSS
The Dataverse Project
Data Science
Harvard University


Laura Huisintveld

Feb 9, 2023, 7:48:51 AM
to Dataverse Users Community
Hi all, 
We have created a GitHub issue describing our idea for files with a retention period in Dataverse:
https://github.com/IQSS/dataverse/issues/9375

Kind regards, Laura
On Thursday, May 12, 2022 at 17:39:16 UTC+2, sbar...@g.harvard.edu wrote:

paul...@dans.knaw.nl

Feb 9, 2023, 8:05:56 AM
to Dataverse Users Community
Hi all, 
Just to be clear: we are suggesting something different from adding an extra date metadata field.
Instead, we propose something similar to the embargo, but reversed: rather than making the file available after a period, the file is blocked from being downloaded and/or viewed after that period.

Regards, 
Paul

On Wednesday, May 11, 2022 at 09:49:14 UTC+2, Laura Huisintveld wrote:

Dieuwertje Bloemen

Feb 9, 2023, 9:15:20 AM
to Dataverse Users Community
Hi,

I think the idea of adding a retention period would be great. Though for me, it wouldn't be strange to have it as a metadata field on the dataset level.
But if I'm understanding correctly, what you propose is basically the reverse of the file embargo function? Entering a date at which the file will become unavailable and providing the reason why?

Kind regards,
Dieuwertje

paul...@dans.knaw.nl

Feb 9, 2023, 9:36:31 AM
to Dataverse Users Community
Hi Dieuwertje, 

You are right, it is a kind of reverse embargo.
Because, like an embargo, it automatically changes the availability of the file, we suggest making the functionality similar as well.
Setting it at the file level is important for us: we have datasets for which only some files must be 'retended', which leaves the dataset in a much more useful state.
The automation is also important; otherwise you would have to remember to search for the datasets that must be 'retended'.

Kind regards, 
Paul

On Thursday, February 9, 2023 at 15:15:20 UTC+1, dieuwertj...@kuleuven.be wrote:

Sebastian Karcher

Feb 9, 2023, 9:56:06 AM
to dataverse...@googlegroups.com
I like this in principle, but would making the file unavailable actually comply with mandates to delete/destroy data, which I understand these are? 

--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Laura Huisintveld

Feb 10, 2023, 7:14:41 AM
to Dataverse Users Community
Hi Sebastian,

We do not want to delete/destroy data automatically; that is why our proposal says that some kind of action (like the pre- and post-publish workflows) should be configurable as well. This action could be an e-mail notification to the admin of the dataverse (a call for action), or another automated process.

Kind regards,
Laura

On Thursday, February 9, 2023 at 15:56:06 UTC+1, sebastiank...@u.northwestern.edu wrote:

Dieuwertje Bloemen

Feb 15, 2023, 7:41:15 AM
to Dataverse Users Community
Hi Laura,

I asked around for what our university's policy on retention periods is, and it seems to be a bit different. Instead of an overall maximum retention period, it appears that there is a minimum retention period. So, not in all cases would all the data be deleted once the mandatory retention period has passed. For example, if further research continues on it that was disclosed in some way in the informed consent form, it could be that the data has to be retained. Though I'm still looking into the precise rules that are in place as it seems to be a bit iffy to figure it out exactly.
I can imagine this would make the idea of a reversed embargo not very practical for our university's set-up.
So, I think we would first have to explore with the different dataverses around the world what the actual retention policy is, before a feature can be developed. I can imagine that the rules outside the EU could differ even more.

Kind regards,
Dieuwertje

Laura Huisintveld

Feb 24, 2023, 5:22:04 AM
to Dataverse Users Community
Hi Dieuwertje,

Thanks for your input!
If I understand correctly, you would like to store the date after which the minimum retention period has passed? And no action should happen automatically once that date has passed? And would this minimum retention period apply to a dataset as a whole? I guess you could then also filter on deposit date or publication date to see which datasets have been in your repository for longer than x years?

Our retention period would apply specifically to files. I guess we could also add a metadata field at the file level to hold a date, and then use a method similar to the one Philip mentioned in the GitHub issue (https://github.com/IQSS/dataverse/issues/9375#issuecomment-1427042553): use a cron job to check these date fields.
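
As a rough illustration of what such a scheduled check could look like, here is a minimal Python sketch. It makes no Dataverse API calls: `FileRecord`, `files_past_retention` and `notification_text` are hypothetical names, and how the records would be fetched and how the message would be sent are deliberately left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FileRecord:
    # Hypothetical record; not a Dataverse data structure.
    file_id: int
    retention_end: Optional[date]  # None means no retention date was set


def files_past_retention(records: List[FileRecord], today: date) -> List[FileRecord]:
    """Return the files whose retention end date is on or before `today`."""
    return [
        r for r in records
        if r.retention_end is not None and r.retention_end <= today
    ]


def notification_text(due: List[FileRecord]) -> str:
    """Build a call-for-action message for an admin; sending it is left out."""
    ids = ", ".join(str(r.file_id) for r in due)
    return f"{len(due)} file(s) past retention date: {ids}"
```

A cron job could run this once a day and only notify an admin, so that nothing is deleted automatically.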

If there are other use cases, I hope to hear about them on this forum! Feedback is still welcome.

Kind regards,
Laura


On Wednesday, February 15, 2023 at 13:41:15 UTC+1, dieuwertj...@kuleuven.be wrote:

Dieuwertje Bloemen

Feb 27, 2023, 4:50:47 AM
to Dataverse Users Community
Hi Laura,

That's the way we would probably use it, though it could be used however a repository instance wishes. I don't think the publication date could be used as the start date of the retention period, because I believe the retention period starts at the moment of data collection or at the end of the project, and that doesn't necessarily coincide with the publication date. But I just wanted to share our use case to open up the discussion.
I also think that applying it at the file level would be great, but I'm not sure how realistic that is, given the varying interpretations of what the end of the retention period actually means.

Kind regards,
Dieuwertje
