Move datasets between Dataverse instances


Valentina Pasquale

Oct 16, 2020, 4:07:08 AM
to dataverse...@googlegroups.com
Hi Dataverse users,

I was wondering if someone knows a way (shared script, Dataverse function, etc.) to move datasets (in draft version) between different Dataverse instances? I know this is possible between dataverses in the same instance through the Admin dashboard.
Correlated to that, I was also looking for tools to export draft datasets from Dataverse and move them to other open-access repositories (e.g. Zenodo or disciplinary repositories) for publication.

Thanks for the help!

All best,

Valentina Pasquale

Research Data Management Specialist
valentina...@iit.it

www.iit.it




Philip Durbin

Oct 19, 2020, 11:46:19 AM
to dataverse...@googlegroups.com
Hi Valentina,

Moving datasets from one Dataverse installation to another has come up two other times recently.

In the Sep 8th community call, Kaitlin Newson from Scholars Portal asked about it and some notes are captured here: https://docs.google.com/document/d/125BiVNJ-dmHRef0oOYfNc7vXOgQ0IupqKaeaPNwWkl0/edit?usp=sharing

On Oct 14th Don Sizemore from UNC and I talked about it in IRC: http://irclog.iq.harvard.edu/dataverse/2020-10-14#i_129992

My take on it is that it's straightforward to harvest metadata from one installation of Dataverse to another using OAI-PMH, but actually moving datasets is more work. You need to get the metadata over using the import API, which accepts either Dataverse's native JSON format or DDI (an XML-based standard): http://guides.dataverse.org/en/5.1.1/api/native-api.html#import-a-dataset-into-a-dataverse

Neither of these import methods handles versions. (The DDI-based migration from DVN 3 to Dataverse 4 did handle versions.)
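
For reference, a minimal sketch of that import call in Python (the server URL, collection alias, API token, and DOI below are placeholders; see the linked guide for the exact parameters, including release=yes|no):

import json
import requests

SERVER = "https://demo.dataverse.org"        # target installation (placeholder)
API_TOKEN = "xxxxxxxx-xxxx-xxxx"             # token of a user who can create datasets there
COLLECTION = "root"                          # alias of the target collection
PID = "doi:10.5072/FK2/EXAMPLE"              # persistent ID to keep for the imported dataset

with open("dataset.json") as f:              # native JSON exported from the source installation
    metadata = json.load(f)

resp = requests.post(
    f"{SERVER}/api/dataverses/{COLLECTION}/datasets/:import",
    params={"pid": PID, "release": "no"},    # keep it unpublished in the target
    headers={"X-Dataverse-key": API_TOKEN},
    json=metadata,
)
print(resp.status_code, resp.json())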

Then you'll need to put the file data into place.

Then you'll need to think about the DOI. Do you give the dataset a new DOI? If so, do you put the old DOI in the new dataset under the "Other ID" field? Don't forget to repoint the old DOI to the new one.

There's some cool stuff that's been talked about, like BagIt import.

Finally, there's been some great work at DANS and other places toward importing or migrating datasets from non-Dataverse systems to Dataverse. I believe that a good starting point is https://github.com/AUSSDA/pyDataverse

I hope this helps!

Phil

p.s. For your question about exporting drafts... I would start by figuring out a common format, a format that Dataverse can export and that the other tool can import. Here's a list of supported export formats: http://guides.dataverse.org/en/5.1.1/user/dataset-management.html#metadata-export-formats
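
For example, fetching one of those export formats over the API looks roughly like this (server and DOI are placeholders; note that the export endpoint only serves published versions, so for a draft you would fetch the native JSON of the draft version instead):

import requests

SERVER = "https://demo.dataverse.org"        # placeholder
PID = "doi:10.5072/FK2/EXAMPLE"              # placeholder DOI of a published dataset

# "ddi" is one of the supported exporters; others include "dataverse_json"
# and "schema.org" (see the list of export formats linked above).
resp = requests.get(
    f"{SERVER}/api/datasets/export",
    params={"exporter": "ddi", "persistentId": PID},
)
print(resp.text)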


Valentina Pasquale

Nov 2, 2020, 10:48:09 AM
to dataverse...@googlegroups.com
Hi Philip, hi all,

many thanks.
My question was focused on unpublished datasets, because we would like to set up an institutional Dataverse repository and make it available to scientists for internal use (e.g. to store and document data already during research). At the same time, we would like to give them the possibility to publish their final datasets on an external open-access repository directly from our Dataverse instance, or through a connector that would allow them to export datasets from our private Dataverse (thus in draft version) and move them to another repository for publication. The destination repository could be a disciplinary repository or, for instance, a public Dataverse repository like Harvard Dataverse (paying a fee if required). We realize this is not a standard workflow, but we think that managing a public installation would be much more demanding for us at this stage than taking advantage of existing open-access repositories for publication. At the same time, we would like to support our scientists with the storage (and preservation) of private datasets.

We were wondering if someone is already testing or using a similar workflow. We think it should be more straightforward to integrate two Dataverse instances than to integrate Dataverse with repositories based on different platforms (such as Zenodo).

Thank you again for your valuable help.

All best,

Valentina





Philip Durbin

Nov 2, 2020, 11:52:47 AM
to dataverse...@googlegroups.com
Hi Valentina,

Yes, I agree that copying draft datasets (and their files) from one Dataverse server to another should be relatively straightforward as long as you are comfortable with programming.

I would probably do this with pyDataverse, but there is also a new client library in JavaScript (which I haven't used).

The basic pattern would be something like this (a rough sketch in code follows the list):

- Download the draft dataset as native JSON.
- Download all the files from the draft dataset.
- Create a new dataset in the target installation using the native JSON.
- Upload the files to the new dataset.
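
A rough sketch of that pattern with plain requests calls (a sketch only, not a finished tool: server names, tokens, and IDs below are placeholders, error handling and file-level metadata are left out, and the downloaded version JSON has to be trimmed into the upload shape):

import requests

SOURCE = "https://source.example.edu"     # placeholders: adjust for your two installations
TARGET = "https://target.example.edu"
SOURCE_TOKEN = "xxxxxxxx-source-token"    # token with access to the draft
TARGET_TOKEN = "xxxxxxxx-target-token"    # token allowed to create datasets in the target
PID = "doi:10.5072/FK2/EXAMPLE"           # persistent ID of the draft dataset
COLLECTION = "root"                       # alias of the target collection

# 1. Download the draft dataset as native JSON.
draft = requests.get(
    f"{SOURCE}/api/datasets/:persistentId/versions/:draft",
    params={"persistentId": PID},
    headers={"X-Dataverse-key": SOURCE_TOKEN},
).json()["data"]

# 2. Download all the files from the draft dataset.
local_files = []
for entry in draft.get("files", []):
    file_id = entry["dataFile"]["id"]
    filename = entry["dataFile"]["filename"]
    r = requests.get(
        f"{SOURCE}/api/access/datafile/{file_id}",
        headers={"X-Dataverse-key": SOURCE_TOKEN},
    )
    with open(filename, "wb") as out:
        out.write(r.content)
    local_files.append(filename)

# 3. Create a new dataset in the target installation using the native JSON.
#    The downloaded version JSON has to be trimmed into the upload shape,
#    roughly {"datasetVersion": {"metadataBlocks": ...}}.
payload = {"datasetVersion": {"metadataBlocks": draft["metadataBlocks"]}}
created = requests.post(
    f"{TARGET}/api/dataverses/{COLLECTION}/datasets",
    headers={"X-Dataverse-key": TARGET_TOKEN},
    json=payload,
).json()["data"]
new_pid = created["persistentId"]

# 4. Upload the files to the new dataset.
for filename in local_files:
    with open(filename, "rb") as fh:
        requests.post(
            f"{TARGET}/api/datasets/:persistentId/add",
            params={"persistentId": new_pid},
            headers={"X-Dataverse-key": TARGET_TOKEN},
            files={"file": fh},
        )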

Now, there are many more details I'm glossing over, such as preserving descriptions, tags, etc. for the files (as well as file hierarchy), but this is all possible. You'll also want to ensure that the target installation has the same metadata fields enabled. You might need to include some manual steps, such as setting terms*.

I hope this helps. Please keep the questions coming and I hope others jump in with their experiences.

Thanks,

Phil

* There's an open issue about setting terms via API at https://github.com/IQSS/dataverse/issues/5899


Stefan Kasberger

Nov 5, 2020, 9:32:10 AM
to Dataverse Users Community
Hi Valentina,

as the developer of pyDataverse, and as someone who has already migrated one Dataverse instance to another, here are some thoughts on that.

In general: you have to map the downloaded JSON from the source dataset to the JSON format needed for the upload.
In pyDataverse: so far this functionality is not implemented, but pyDataverse should make what you want much easier than starting from scratch (there is an API module, and data structures for Dataverses, Datasets, and Datafiles). The missing piece is to map the downloaded JSON structure to the .set() function (https://pydataverse.readthedocs.io/en/develop/reference.html#pyDataverse.models.DVObject.set). Once it is inside the pyDataverse data structure for a dataset or datafile, you have all the functionality of the module at hand (then it gets easy). :)
Use the develop branch of pyDataverse for this (for both usage and development). I will release a new version of pyDataverse in the next few weeks with the functionality that is already available there. Also check out the develop documentation at https://pydataverse.readthedocs.io/en/develop/

If you plan to use pyDataverse for this and work out a mapping from download JSON to upload JSON, please get in touch with me so we can share efforts. I will implement this feature in the future, but it is not clear right now when.
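
To illustrate the idea, here is a rough sketch; the flat attribute names and the NativeApi/create_dataset calls follow the develop documentation, so treat the exact keys and signatures as assumptions to verify there:

from pyDataverse.api import NativeApi
from pyDataverse.models import Dataset

# Flat key/value pairs mapped by hand from the downloaded native JSON;
# the attribute names follow the pyDataverse Dataset model.
flat = {
    "title": "My migrated dataset",
    "author": [{"authorName": "Doe, Jane"}],
    "datasetContact": [{"datasetContactName": "Doe, Jane",
                        "datasetContactEmail": "jane@example.org"}],
    "dsDescription": [{"dsDescriptionValue": "Example description."}],
    "subject": ["Other"],
}

ds = Dataset()
ds.set(flat)                                                  # load the mapped metadata into the model
api = NativeApi("https://target.example.edu", "API_TOKEN")    # placeholders
resp = api.create_dataset("root", ds.json())                  # create the dataset in the target collection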


Cheerz, Stefan

Night Owl

Nov 19, 2020, 5:01:51 PM
to Dataverse Users Community
Stefan, I am interested in trying pyDataverse to move about 10 datasets between Dataverse instances. I would love any help you might be willing to give on its use and the JSON mapping.

Night owl

Stefan Kasberger

Dec 7, 2020, 6:01:54 AM
to Dataverse Users Community
Hi Night Owl,

please write me an email at stefan.k...@univie.ac.at so we can talk about this in more detail. In general: the incoming dictionary must be mapped to the flat structure of pyDataverse.
I am working on this feature, so maybe we can work together.

Cheerz

Jamie Jamison

Aug 25, 2021, 3:05:13 PM
to Dataverse Users Community
Just wanted to be sure I understand this.  The Migration/Publish API moves the metadata.  Dataset files have to be moved separately?

Thanks,

Jamie

James Myers

Aug 25, 2021, 3:43:22 PM
to dataverse...@googlegroups.com

Jamie,

 

Yes. The migrate API is really two calls (see https://guides.dataverse.org/en/latest/developers/dataset-migration-api.html), and in between you can/have to add files and can add any additional metadata (e.g. if you wanted to record the fact that they've been moved in an existing or custom metadata field).

 

(That design allows you to use normal or direct upload for files. Also, once the dataset has been created, there isn't any difference between uploading files to a migrated dataset versus a newly created one, so no new migrate API calls were needed for uploading files. FWIW: if you have a Bag created from the original Dataverse, DVUploader can perform all the steps together. If not, you can get a zip of the datafiles and, once you've done the first migrate API call to create the dataset, either upload the zip (with normal upload) and have Dataverse unzip the files, or unzip the zip file locally and use DVUploader to upload them. pyDataverse works as well, but DVUploader supports direct upload if you need that.)
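
For example, a minimal sketch of that "upload the zip after the first migrate call" step in Python (server, token, and PID are placeholders):

import requests

SERVER = "https://target.example.edu"        # placeholder
API_TOKEN = "xxxxxxxx-xxxx-xxxx"             # placeholder
PID = "doi:10.5072/FK2/EXAMPLE"              # PID of the dataset created by the first migrate call

# Normal (non-direct) upload of the zip of datafiles; Dataverse unpacks
# the zip into individual files on ingest.
with open("datafiles.zip", "rb") as fh:
    resp = requests.post(
        f"{SERVER}/api/datasets/:persistentId/add",
        params={"persistentId": PID},
        headers={"X-Dataverse-key": API_TOKEN},
        files={"file": ("datafiles.zip", fh, "application/zip")},
    )
print(resp.status_code, resp.json())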

 

If you have suggestions for making the guides clearer/more detailed, please send an issue/PR!

 

-- Jim

Ken Mankoff

Mar 2, 2022, 9:43:38 AM
to Dataverse Users Community
Dear Valentina,

I am wondering if you have developed these scripts and are willing to share them? I have found the other recent thread on data migration (https://groups.google.com/g/dataverse-community/c/LJuWjkeb9gE/m/DijRhysqBQAJ) but not any actual scripts pre-built to do the work.

Thanks,

   -k.

Kaitlin Newson

Mar 2, 2022, 10:15:24 AM
to Dataverse Users Community
At Scholars Portal we developed scripts to migrate data between instances (but they do not use the newer migration APIs). You can see them here and there are some more details in the README: https://github.com/scholarsportal/dataverse-migration-scripts

Valentina Pasquale

Mar 10, 2022, 5:59:28 AM
to dataverse...@googlegroups.com
Dear Ken,
we haven't implemented that functionality yet. At that time I was looking for a temporary solution to move unpublished datasets from our Dataverse instance to other open archives for publication, but we decided in the end to enable publication directly in our Dataverse.
I am sorry I cannot help you.
All best,
Valentina


Nikos Askitas

Oct 18, 2023, 8:02:43 AM
to Dataverse Users Community
We are currently developing a Python module to do just that. It is in a good state at the moment and could be helpful for others in a relatively short amount of time. Currently it is being developed to serve our own Dataverse project, but we are thinking about publishing the Python module on GitHub if others can benefit from it. If you can help me evaluate the demand, we will put more energy into development and shorten the release time. Please let me know.



Philip Durbin

Oct 18, 2023, 10:02:16 AM
to dataverse...@googlegroups.com
Hi Nikos,


Thanks!

Phil


Nikos Askitas

Oct 18, 2023, 10:26:33 AM
to Dataverse Users Community
Thank you, Philip. I will personally put more energy into releasing this ASAP. Stay tuned. N.

Nikos Askitas

Nov 22, 2023, 8:00:03 AM
to Dataverse Users Community
We wrote a little Python module to do such things. Let me know whether it helps: https://github.com/iza-institute-of-labor-economics/idsc.dataverse

Philip Durbin

Nov 29, 2023, 3:10:33 PM
to dataverse...@googlegroups.com

Nikos Askitas

Nov 30, 2023, 5:48:42 AM
to dataverse...@googlegroups.com
Thanks Philip, we sent requests to add it to both lists. N.
