Deleting test data


Péter Király

Nov 15, 2018, 12:05:54 PM
to dataverse...@googlegroups.com
Dear Dataverse Community,

So far we have been testing Dataverse in Göttingen, and within weeks
we would like to launch it as a more visible and usable service (still
in beta). During the preparation period we had many testers who
uploaded, and sometimes published, throwaway material just to help us
test the system. Some users in this period, however, have published
real data alongside academic papers. We would like to clear the
database and delete the test data, but keep the important datasets.

The documentation suggests that we should drop the whole database and
recreate a fresh one, but with this step we would lose the important
materials as well. Once data has been published, the only action
available is deaccessioning, which, I guess, keeps the data.

The third way (and that's why I've tried to understand the database
schema) would be to manipulate Dataverse with SQL commands, delete the
unwanted files from storage, and then reindex what is left. I hope
that we are not the first to try something like this.
Do you happen to have a script for this task?

Best,
Péter

--
Péter Király
software developer
GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal
http://linkedin.com/in/peterkiraly

danny...@g.harvard.edu

Nov 15, 2018, 12:29:27 PM
to Dataverse Users Community
Hi Péter,

Great to hear about the upcoming launch. There is a destroy command that's not well documented. It should help you with the test content. Phil wrote about it here:

Philip Durbin

Nov 15, 2018, 1:39:26 PM
to dataverse...@googlegroups.com
Hi Péter,

Your story reminds me a lot of when we were pushing beta releases to https://demo.dataverse.org in the lead-up to the release of Dataverse 4.0. A couple of people here were adding realistic-looking datasets, and we didn't want them to have to manually re-enter all that metadata after a database drop. That's when we (though I can't take any credit for it) invented the "native JSON (Dataverse-specific)" format mentioned at http://guides.dataverse.org/en/4.9.4/admin/metadataexport.html and http://guides.dataverse.org/en/4.9.4/user/dataset-management.html#supported-metadata. With it we could serialize the datasets we wanted to preserve to JSON, drop the database, create the datasets again using the API at http://guides.dataverse.org/en/4.9.4/api/native-api.html#create-a-dataset-in-a-dataverse (which was also invented at this time), and then re-upload the files. I believe this only works well when a dataset has a single version, but in theory it might work to download the full version history of a dataset and add each version one by one.
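
A rough sketch of that export-then-recreate round trip, in Python with only the standard library. The server URL, API token, and dataverse alias are placeholders, and the helper names are made up for illustration. The endpoint paths follow the guide pages linked above as far as I understand them, and the shape of the exported JSON (metadata under `datasetVersion.metadataBlocks`) is an assumption, so please check both against your own installation before relying on this. The functions only build the requests; nothing is sent.

```python
import json
import urllib.parse
import urllib.request

# Placeholders (assumptions): replace with your installation's values.
SERVER = "https://dataverse.example.edu"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"


def export_request(persistent_id):
    """Build a GET request for the native JSON export of one dataset."""
    query = urllib.parse.urlencode(
        {"exporter": "dataverse_json", "persistentId": persistent_id}
    )
    return urllib.request.Request(
        f"{SERVER}/api/datasets/export?{query}", method="GET"
    )


def to_create_payload(exported):
    """Keep only the metadata blocks of the exported version.

    Assumption: the export wraps the metadata in
    exported["datasetVersion"]["metadataBlocks"].
    """
    blocks = exported["datasetVersion"]["metadataBlocks"]
    return {"datasetVersion": {"metadataBlocks": blocks}}


def create_request(dataverse_alias, payload):
    """Build a POST request that creates a dataset in the given dataverse."""
    return urllib.request.Request(
        f"{SERVER}/api/dataverses/{dataverse_alias}/datasets",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Dataverse-key": API_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually run it, each request would be passed to `urllib.request.urlopen`: the export requests before the database drop (saving the responses to disk), the create requests afterwards. The files themselves still have to be re-uploaded separately.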

I imagine your situation happens a lot. You're nearing launch. You're telling your users to try out the system. There's a mix of stuff you want to keep and stuff you want to delete. (Or "destroy", as Danny indicated. See also https://github.com/IQSS/dataverse/issues/2593 ). Other folks in the community who have gone through this might have some advice.
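
For the mixed keep/delete case, something along these lines might do, again as a sketch under assumptions. It assumes the destroy endpoint is `DELETE /api/datasets/{id}/destroy` called with a superuser API token in the `X-Dataverse-key` header; the server URL, token, and database ids are placeholders, and `plan_destroy` is a made-up helper name. It only builds the requests so the list can be reviewed before anything irreversible happens.

```python
import urllib.request

# Placeholders (assumptions): replace with your installation's values.
SERVER = "https://dataverse.example.edu"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # assumed superuser token


def destroy_request(dataset_id):
    """Build a DELETE request that destroys one dataset by database id."""
    return urllib.request.Request(
        f"{SERVER}/api/datasets/{dataset_id}/destroy",
        headers={"X-Dataverse-key": API_TOKEN},
        method="DELETE",
    )


def plan_destroy(all_ids, keep_ids):
    """Return destroy requests for every dataset id not marked to keep."""
    keep = set(keep_ids)
    return [destroy_request(i) for i in all_ids if i not in keep]


# Review the plan first; send each request with urllib.request.urlopen
# only once the list looks right.
for request in plan_destroy([11, 12, 13, 14], keep_ids=[12]):
    print(request.get_method(), request.full_url)
```

Keeping the "build" and "send" steps apart is deliberate: destroy cannot be undone, so a dry run that prints the URLs is cheap insurance.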

I hope this helps,

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/d0f7ef1a-679f-43a6-a06f-95cbe4d4315b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Péter Király

Nov 15, 2018, 2:58:54 PM
to dataverse...@googlegroups.com
Dear Danny and Phil,

Thanks for your answers. I'll try these methods tomorrow. The
example for the destroy/delete API is about datasets. Does it work
for dataverses as well?

Best,
Péter

Philip Durbin

Nov 15, 2018, 3:08:31 PM
to dataverse...@googlegroups.com
As long as dataverses are empty (i.e. you have deleted or destroyed the datasets first), you can delete them, even if they are published, even from the GUI.
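
If you would rather script this than click through the GUI, here is a sketch under assumptions. It assumes the API call is `DELETE /api/dataverses/{alias}` with an API token in the `X-Dataverse-key` header, and that nested dataverses also have to be removed before their parent (the second point is my assumption, not something stated above). The aliases, server URL, and token are placeholders, and `deletion_order` is a made-up helper.

```python
import urllib.request

# Placeholders (assumptions): replace with your installation's values.
SERVER = "https://dataverse.example.edu"
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"


def delete_dataverse_request(alias):
    """Build a DELETE request for one (already empty) dataverse."""
    return urllib.request.Request(
        f"{SERVER}/api/dataverses/{alias}",
        headers={"X-Dataverse-key": API_TOKEN},
        method="DELETE",
    )


def deletion_order(children, start):
    """Return aliases in post-order, so children come before parents."""
    order = []
    for child in children.get(start, []):
        order.extend(deletion_order(children, child))
    order.append(start)
    return order


# Hypothetical test hierarchy: alias -> list of child aliases.
tree = {"sandbox": ["team-a", "team-b"], "team-a": ["team-a-tmp"]}
for alias in deletion_order(tree, "sandbox"):
    request = delete_dataverse_request(alias)
    print(request.get_method(), request.full_url)
```

The datasets inside each dataverse still need to be deleted or destroyed first; this only covers the order of the dataverses themselves.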

