destroying datasets


Philip Durbin

Aug 30, 2017, 1:52:36 PM
to dataverse...@googlegroups.com
Hi Erin,

Good question. I'm changing the subject of this email to be about destroying datasets. This feature isn't even documented, so it should be used with great care. If a DOI or Handle has been published, we always want there to be at least a "tombstone" showing that it once existed, which is what the process of deaccessioning[1] provides. Destroying a dataset permanently deletes it from the database and the file system.

In development, I use the destroy feature all the time because I'm constantly creating test datasets. I just checked with Sonia, a curator for Harvard Dataverse, and she destroys datasets that are test datasets (please use https://demo.dataverse.org instead for testing) or spam. The idea is that the DOI should be freed up to be used for real data. She uses a script with a superuser API token.

I hope this helps,

Phil


On Wed, Aug 30, 2017 at 12:45 PM, Erin MacPherson <Erin.Ma...@dal.ca> wrote:
Hi Phil,
I wanted to ask a related question - the "destroy" API can be used on published datasets other than "test" data, correct? Is this what you would recommend in the occasional case where someone has published sensitive or confidential data and it needs to be removed? Or do you just recommend deaccessioning? I believe it was mentioned in June at the meeting that we can technically delete it if no one has already downloaded it - otherwise we would deaccession. I just need to clarify the wording for our Terms.
Thank you,
Erin


On Wednesday, August 30, 2017 at 10:33:32 AM UTC-3, Philip Durbin wrote:
Thanks for calling in, Courtney. Here are the notes I took during yesterday's call at https://docs.google.com/document/d/1GD6eXEpdKQPBkZby_xKmPop5Q9mOuX5ksjQmVKEwhho/edit?usp=sharing

Phil

2017-08-29 Dataverse Community Call

Agenda

* Community Development Efforts
* Community Topics
 
Attendees

* Danny Brooke (IQSS)
* Phil Durbin (IQSS)
* Chris Perry (IQSS)
* Pete Meyer (HMS)
* Courtney Mumma (TDL)

Notes

* Community Development Efforts
   * Pinned topic "Which GitHub issues are being worked on by the Dataverse community?" on Google Group: https://groups.google.com/d/msg/dataverse-community/X2diSWYll0w/ikp1TGcfBgAJ
   * "Dev Efforts by the Dataverse Community" spreadsheet: https://docs.google.com/spreadsheets/d/1pl9U0_CtWQ3oz6ZllvSHeyB0EG1M_vZEC_aZ7hREnhE/edit?usp=sharing
   * Amber from Scholars Portal now has write access to the spreadsheet. Others are welcome to request write access as well.
* Community Topics
   * (Danny) Department of Transportation thread - "US Department of Transportation - Data Repositories Conformant with the DOT Public Access Plan": https://groups.google.com/d/msg/dataverse-community/nImLPepZMvs/nk_p-yR9CAAJ
      * (Danny) I'll send you Jon's message
   * (Courtney) Testing 4.7.1 with Ryan's team. Will pass along feedback. 9 data liaisons.
   * (Danny) We've merged most of the S3 work for the next release. We're still looking at a performance issue. This is slated for 4.8.
   * (Courtney) Question from Nick about deleting files in a deaccessioned dataset.
      * (Phil) You can use the "destroy" API endpoint with a superuser API token to destroy test data. It isn't documented but here's the code we use in our test suite: https://github.com/IQSS/dataverse/blob/v4.7.1/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java#L551
      * (Phil) There's a related issue: Guides: Add information on deleting datasets, dataverses and destroy command - https://github.com/IQSS/dataverse/issues/2593
   * (Pete) How many users at TDL self-publish vs. using the "submit for review" feature?
      * (Courtney) It's up to each of the institutions we host for.
   * (Danny) Is running a fork still necessary for TDL?
      * (Courtney) We just started testing 4.7.1.

On Mon, Aug 28, 2017 at 4:55 PM, <danny...@g.harvard.edu> wrote:
Hi all,

Please join us for the Dataverse Community Call tomorrow (8/29) at Noon EDT! Details:


Thanks!

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsubscribe...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/470641e6-c7dc-4ee0-883c-151101c3cab7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Erin MacPherson

Aug 31, 2017, 9:47:21 AM
to Dataverse Users Community, philip...@harvard.edu
Thank you Phil this is very helpful!
Have a great day,
Erin

Nick Lauland

Sep 15, 2017, 2:31:42 PM
to Dataverse Users Community
Hi Phil!

Not sure this is the right place... Here at TDL we've made excellent use of the destroy API, but there are still a couple of stubborn test items that are stuck.

The datasets include geo-related data, and here's part of the stack trace from a destroy attempt:

Caused by: org.postgresql.util.PSQLException: ERROR: update or delete on table "dvobject" violates foreign key constraint "fk_worldmapauth_token_datafile_id" on table "worldmapauth_token"
Detail: Key (id)=(615) is still referenced from table "worldmapauth_token".

My guess is that a while ago when testing geoconnect these items may have gotten registered more than once.

I'm thinking that deleting all records related to the dvobjects in question would work, but before editing the database by hand, I really wanted to see if this had come up before.

Philip Durbin

Sep 15, 2017, 2:37:24 PM
to dataverse...@googlegroups.com
It sounds highly related to this issue:

Destroy Dataset: Cannot destroy a dataset that has a mapped shape file - https://github.com/IQSS/dataverse/issues/4093

We talked about that issue on Wednesday during sprint planning. As you can tell, it's a foreign key constraint so I suppose you have my blessing to hack on your database if you're comfortable with that.
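For anyone reading an error like the one quoted earlier in this thread, here is a minimal Python sketch that pulls out which table is blocking the delete and which key it still references. This is not part of Dataverse; the function name and the regular expressions are my own illustration, written against the wording of that one PostgreSQL message. It only parses text and does not touch the database.

```python
import re

# Patterns written against the PostgreSQL message quoted in this thread.
# They are illustrative assumptions, not an official error format spec.
FK_ERROR = re.compile(
    r'on table "(?P<parent>\w+)" violates foreign key constraint '
    r'"(?P<constraint>\w+)" on table "(?P<child>\w+)"'
)
KEY_DETAIL = re.compile(r'Key \((?P<column>\w+)\)=\((?P<value>[^)]*)\)')


def parse_fk_violation(message):
    """Return the tables, constraint, and key named in an FK violation.

    Returns None if the message does not look like a foreign key violation.
    """
    fk = FK_ERROR.search(message)
    key = KEY_DETAIL.search(message)
    if not fk or not key:
        return None
    # parent: table the delete was attempted on (here, dvobject)
    # child: table that still holds a reference (here, worldmapauth_token)
    return {**fk.groupdict(), **key.groupdict()}
```

Run on the stack trace above, this reports that the dvobject row with id 615 is still referenced from worldmapauth_token, which is the row that would have to go before the destroy can succeed.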

I hope this helps,

Phil


Philip Durbin

Oct 24, 2017, 11:41:44 PM
to dataverse...@googlegroups.com
You can destroy datasets like this but the API token must belong to a superuser:

curl -H 'X-Dataverse-key: 2395bbb2-5ac6-40b1-b15d-0308bc0053d2' -X DELETE 'https://demo.dataverse.org/api/datasets/:persistentId/destroy?persistentId=doi:10.5072/FK2/3NQEGS'
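
If you would rather script this than run curl by hand, the same request could look roughly like the following in Python. This is a sketch using only the standard library, not an official client. The function names are hypothetical and the token in the usage comment is a placeholder for a real superuser API token. The endpoint path and the X-Dataverse-key header are taken from the curl command above.

```python
import urllib.parse
import urllib.request


def build_destroy_request(base_url, persistent_id, api_token):
    """Build (but do not send) the DELETE request for the destroy endpoint."""
    # Keep ':' and '/' unescaped so the URL matches the curl example.
    query = urllib.parse.urlencode({"persistentId": persistent_id}, safe=":/")
    url = f"{base_url.rstrip('/')}/api/datasets/:persistentId/destroy?{query}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"X-Dataverse-key": api_token},
    )


def destroy_dataset(base_url, persistent_id, api_token):
    """Send the destroy request. This is irreversible, so use with care.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, for
    example when the token does not belong to a superuser.
    """
    req = build_destroy_request(base_url, persistent_id, api_token)
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode("utf-8")


# Usage (placeholder token; only run against datasets you really want gone):
# status, body = destroy_dataset(
#     "https://demo.dataverse.org",
#     "doi:10.5072/FK2/3NQEGS",
#     "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
# )
```

Splitting the request building from the sending makes it easy to print and double-check the URL before anything is actually deleted.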

Good luck with your launch! Don't forget about getting in touch to get your installation added to the map at dataverse.org: http://guides.dataverse.org/en/4.8/installation/config.html#putting-your-dataverse-installation-on-the-map-at-dataverse-org

On Tue, Oct 24, 2017 at 10:25 PM, Jacky Wong <wongka...@gmail.com> wrote:
Dear Philip, Sonia

We are getting ready to launch our dataverse @ https://researchdata.nie.edu.sg 

However, there are some test dummy datasets which we unfortunately published in our production server. 

We want to "destroy" these test dummy datasets. I see from your note below Sonia has a way to do so. Can you / Sonia share the way to go about doing this please?

Thank you. 


Jacky Wong

Oct 26, 2017, 2:29:26 AM
to Dataverse Users Community
Dear Philip, 

Thank you so much for the exact API command. It is exactly what I needed and works so well!

Yes, we are at the final stages of preparation. We are targeting a launch in Jan 2018. And I will surely add our installation to the map at dataverse.org.

Thank you so much for all the good work you and the team have done! :)

Philip Durbin

Oct 26, 2017, 7:34:38 AM
to dataverse...@googlegroups.com
Great! I'm glad that worked for you. I also just added that curl command to https://github.com/IQSS/dataverse/issues/2593 so people can find it there as well.

I've been meaning to reorganize the API Guide and make it easier to follow.

Again, good luck on your launch!

Phil
