[dataverse-client-python] Duplicating a dataset: problems with adding fields to or updating the metadata of a dataset


Romain MOUGIN

Sep 14, 2015, 11:39:01 AM
to Dataverse Users Community
Hi there!

I'm a new Dataverse user and I'm trying to write a function that duplicates a dataset (only the metadata, of course). To do this, I'm using the Dataverse Python client, but I've run into some problems with it. As I see it, I have two options for my project:
- get the metadata, modify some fields outside the metadataBlocks, and then call the "update_metadata" function directly
- get the metadata JSON, parse it, and use the "create_dataset" function of the Python API to create a new dataset with all the recovered fields

Both ideas run into a problem. The first one doesn't work, for two reasons: the "dataset.update_metadata()" function doesn't work, even with the dataset's own metadata. I tried to update the metadata of a published dataset with its own metadata (the JSON that "dataset.get_metadata()" returned), and this doesn't work. That's also why the "dataset.create_draft()" function doesn't seem to work: it does the same thing, and ... no draft version appears :(
The second one doesn't work for me either. I can create a new dataset with basic info, such as multiple authors, but I can't specify anything more specific. For example, I can give a list of authors, but I found no way to specify author affiliations or identifiers. At best I can give a list of values for the author field, but I can't give a dictionary with more specific values. I even tried to use "utils.add_field()" directly to add fields to the dataset. Maybe I'm not using the right keys. Are they different from the keys in the generated JSON? The strange thing is that "utils.add_field()" is what "dataverse.create_dataset()" uses to create the dataset, isn't it?
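To illustrate the "dictionary with more specific values" idea, here is a minimal sketch (standard library only) of how a compound author entry with an affiliation can be expressed in Dataverse 4 native-API JSON. It assumes the `typeName` / `multiple` / `typeClass` / `value` layout used in the citation block's `fields` list, and the sub-field names `authorName` and `authorAffiliation`; check both against the citation block of your own installation. The helpers `primitive` and `author_field` are hypothetical and are not part of dataverse-client-python.

```python
import json


def primitive(type_name, value):
    """Build a single-valued primitive field (hypothetical helper)."""
    return {
        "typeName": type_name,
        "multiple": False,
        "typeClass": "primitive",
        "value": value,
    }


def author_field(authors):
    """Build the compound, multi-valued "author" field (hypothetical helper).

    `authors` is a list of (name, affiliation) tuples; affiliation may be None.
    Sub-field names follow the assumed citation-block layout.
    """
    entries = []
    for name, affiliation in authors:
        entry = {"authorName": primitive("authorName", name)}
        if affiliation:
            entry["authorAffiliation"] = primitive("authorAffiliation", affiliation)
        entries.append(entry)
    return {
        "typeName": "author",
        "multiple": True,
        "typeClass": "compound",
        "value": entries,
    }


# Placeholder names, for illustration only.
field = author_field([("Author, First", "Example University"), ("Author, Second", None)])
print(json.dumps(field, indent=2))
```

Each author becomes one dictionary inside the compound field's `value` list, which is why a flat list of author names cannot carry affiliations.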

I suspect that in both cases the problem is "me", so I'd like to know whether any of you have worked with the Dataverse Python client, whether you succeeded in adding specific keys and values to a dataset, and whether I could see a working example of this.
All my tests were made on dataverse-demo (my dataverse is here: https://dataverse-demo.iq.harvard.edu/dataverse/grimdataverse), with the API token given to me. Are there any permission restrictions on this site? Maybe that would explain some of this?

If you need more info to help me, such as my code, sample data, etc., I would be glad to provide it :)

P.S.: Sorry for my terrible English, I'm just a French guy trying to figure out what he's done wrong :(

Philip Durbin

Sep 14, 2015, 1:56:29 PM
to dataverse...@googlegroups.com
Hi Romain!

Thank you for your interest in Dataverse and for trying out both the API and the Python library!

It's quite possible that you are seeing the bug at https://github.com/IQSS/dataverse/issues/2441 but to test this can you please try https://beta.dataverse.org which has a fix for that issue that's coming in Dataverse 4.2? You'll need to sign up for a new account and generate a new API token.

If you are *still* seeing the problem on the "beta" site, can you please create an issue under https://github.com/IQSS/dataverse-client-python and show some of your code (but not your API token, of course)? If nothing else, we can at least make sure the documentation is clear and that there's a test for the problem you're reporting.

Thanks!

Phil

p.s. If it helps, I have a shell script that I think does something similar to what you're talking about. It downloads the JSON for a dataset, makes a small change to the dataset title in the JSON file, and updates the dataset using the updated JSON file: https://github.com/IQSS/dataverse/blob/v4.1/scripts/search/tests/edit-dataset-finch1
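The "change the title in the JSON" step of that script can also be sketched in Python. This is a minimal sketch, assuming the downloaded dataset-version JSON keeps its fields under `metadataBlocks -> citation -> fields`, each with a `typeName` and a `value`; `set_field_value` is a hypothetical helper, and the download and upload steps (done with curl in the script) are left out.

```python
import copy


def set_field_value(version, type_name, new_value, block="citation"):
    """Return a copy of `version` with one primitive field's value replaced.

    `version` is assumed to contain version["metadataBlocks"][block]["fields"],
    a list of field dicts. The input dict is not modified.
    """
    updated = copy.deepcopy(version)
    for field in updated["metadataBlocks"][block]["fields"]:
        if field["typeName"] == type_name:
            field["value"] = new_value
            return updated
    raise KeyError(f"field {type_name!r} not found in block {block!r}")


# Minimal sample shaped like the assumed version JSON.
version = {
    "metadataBlocks": {
        "citation": {
            "fields": [
                {"typeName": "title", "multiple": False,
                 "typeClass": "primitive", "value": "Dataset1"},
            ]
        }
    }
}
edited = set_field_value(version, "title", "changedTitle")
```

Working on a deep copy keeps the original JSON around, which makes it easy to compare before and after when an update seems not to take effect.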


--------------------------------------------------------------------------
All emails sent from the Sciences Po mail system must comply with the terms of use.
To view them, visit:

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To post to this group, send email to dataverse...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/e5cfc650-e2aa-4968-abc1-4a4026ce52cb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Romain MOUGIN

Sep 15, 2015, 3:58:16 AM
to Dataverse Users Community, philip...@harvard.edu
Thank you for your help! I'm going to try the beta site and see whether it works there.

I'll be back soon.

Romain MOUGIN

Sep 15, 2015, 4:56:42 AM
to Dataverse Users Community, philip...@harvard.edu
OK, now I can properly update a dataset's metadata, that's great! I'm just having a strange problem: I modified the metadata of a dataset (the title field, whose value was "Dataset1"). That created a draft version of the dataset (titled "changedTitle"). I deleted this draft version ("changedTitle"). Now I can't find the dataset named "Dataset1" with its title "Dataset1". I can only find a dataset named "changedTitle", but ... in my dataverse, I see only one dataset, named "Dataset1" (you can see it here: https://beta.dataverse.org/dataverse/grimdataverse). When I list all the datasets in my dataverse with the Python library, I get only one dataset, called "changedTitle". What magic trick happened here? 0o

Philip Durbin

Sep 15, 2015, 8:30:34 AM
to dataverse...@googlegroups.com
I just made myself a "superuser" on the beta site so I can see draft versions of your dataset, and it looks like the draft with a title of "changedTitle" was never deleted. I can still see it (screenshot attached).

So maybe deleting the draft didn't work? Did you see an error?

Please feel free to join in at http://chat.dataverse.org to chat about this if you want. It's very strange if you weren't able to delete a draft.

Phil

changedTitle_-_Grim_Dataverse_(testing_purpose)_Dataverse_-_2015-09-15_08.25.39.png

Romain MOUGIN

Sep 15, 2015, 11:08:45 AM
to Dataverse Users Community, philip...@harvard.edu
I just realized there is indeed a "draft version" in the version summary of the dataset "Dataset1". The problem is, I had deleted it, and after that it disappeared from my dataverse: I couldn't see it anymore in my list of datasets. But in the "Dataset1" version summary it still exists, and I can access it, so ... when I deleted the draft version from the dataverse's dataset list, did the draft just become ... hidden? That's how I understand it, because the draft still exists and is accessible via the version summary of the main dataset. I got no visible error when I deleted it, and it was done manually through the website, not with my scripts.

I just ran a test: I created a dataset manually, published it, created a draft version, and finally deleted the draft. I can no longer access the deleted draft version. Same thing with a dataset created by my scripts: same steps, same result. It seems that what I did with my Python code at the beginning broke something, and I can't say how or why right now, because I can't reproduce the error.

I will try to reproduce it; it's really strange. I won't delete my dataset "Dataset1", because the problem is clearly visible on it, in case you want to investigate further.

Anyway, thanks again for your help, and I will try to join your chat when I can. I'm working on multiple projects at my job, and our Dataverse work is only just starting. I think I'll be lurking in your chat next week, when I'll be working on this full time. See ya!

Philip Durbin

Sep 15, 2015, 1:00:24 PM
to dataverse...@googlegroups.com
We would be very interested in any code you can give to reproduce the bug. https://beta.dataverse.org is a bit of a moving target, however. You created your "Dataset1" when it was running v. 4.2 build 39-e866445 and now it's running build 62-d8de9ea.

We also just reindexed that installation so the "changedTitle" draft "card" on the homepage is back. Again, this is because the draft was never deleted from the database (which is the bug, really). After reindexing, Solr is back in sync with the database.
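Because the search index (Solr) and the database can disagree, one way to check what the database actually holds is to look at a dataset's list of versions rather than at search results. A minimal sketch, assuming each version object returned by the API carries a `versionState` key with values such as `"DRAFT"` or `"RELEASED"`; the helper `draft_versions` and the sample data are hypothetical.

```python
def draft_versions(versions):
    """Return the version dicts whose (assumed) versionState is DRAFT."""
    return [v for v in versions if v.get("versionState") == "DRAFT"]


# Hypothetical sample: a released version plus a draft that was never deleted.
versions = [
    {"id": 2, "versionState": "DRAFT"},
    {"id": 1, "versionState": "RELEASED", "versionNumber": 1},
]
for v in draft_versions(versions):
    print("draft still present, version id", v["id"])
```

If a draft shows up here after a delete, it still exists in the database, regardless of what the search results show.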

Please feel free to try to reproduce bugs on https://apitest.dataverse.org as well, which runs the current released version (Dataverse 4.1). If you want, you could try adding a test to https://github.com/IQSS/dataverse-client-python/blob/master/dataverse/test/test_dataverse.py ! :)

Phil


Romain MOUGIN

Sep 22, 2015, 8:42:23 AM
to Dataverse Users Community, philip...@harvard.edu
Hi there!

I'm back on my Dataverse project, so I'm back here again. I haven't reproduced last time's bug (the draft version that was deleted but not really "deleted"), so I don't have any news for you about that :S

On the other hand, I succeeded in duplicating datasets and parsing the metadata JSON in order to change what I want. I now have a question about the metadata: is it possible, or will it be possible in the future, to "create" custom fields in the dataset metadata? Some of the fields I need might not be available in the current Dataverse version, for example. What if I need new fields? What can I do in that case?
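Romain doesn't post his duplication code, but the approach he describes (parse the metadata JSON, change what is needed, submit it as a new dataset) might look roughly like the sketch below. It assumes the create-dataset payload has the shape `{"datasetVersion": {"metadataBlocks": ...}}` and that only the source version's `metadataBlocks` need to be carried over; `duplicate_payload` and its `overrides` mapping are hypothetical, and the sketch only overrides single-valued primitive fields.

```python
import copy
import json


def duplicate_payload(source_version, overrides=None):
    """Build a create-dataset payload from an existing version's metadata.

    Only metadataBlocks are copied; ids, version state, files, etc. are
    dropped. `overrides` maps primitive field typeNames to new values.
    """
    blocks = copy.deepcopy(source_version["metadataBlocks"])
    for type_name, value in (overrides or {}).items():
        found = False
        for block in blocks.values():
            for field in block.get("fields", []):
                if field["typeName"] == type_name and field.get("typeClass") == "primitive":
                    field["value"] = value
                    found = True
        if not found:
            raise KeyError(f"primitive field {type_name!r} not found")
    return {"datasetVersion": {"metadataBlocks": blocks}}


# Hypothetical source version, trimmed to the parts the sketch uses.
source = {
    "id": 42,
    "versionState": "RELEASED",
    "metadataBlocks": {
        "citation": {
            "displayName": "Citation Metadata",
            "fields": [
                {"typeName": "title", "multiple": False,
                 "typeClass": "primitive", "value": "Dataset1"},
            ],
        }
    },
}
payload = duplicate_payload(source, {"title": "Dataset1 (copy)"})
print(json.dumps(payload, indent=2))
```

Copying only the metadata blocks avoids sending back server-assigned values such as ids and version state, which a new dataset should not inherit.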

Philip Durbin

Sep 22, 2015, 9:01:41 AM
to dataverse...@googlegroups.com
If you look at the spreadsheet linked from http://guides.dataverse.org/en/4.1/user/appendix.html you'll see a number of tabs for "custom" metadata. Is this what you had in mind? Hopefully, you can use one of the existing metadata fields listed there...

- Citation Metadata
- Geospatial Metadata
- Social Science & Humanities Metadata
- Astronomy and Astrophysics Metadata
- Life Sciences Metadata

... but if not, perhaps you could help the Dataverse team define a new metadata block (or improve an existing one), which we ship as TSV files (exported from the spreadsheet mentioned above): https://github.com/IQSS/dataverse/tree/v4.1/scripts/api/data/metadatablocks
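For anyone drafting a new block, it can help to read the existing TSV files programmatically. The sketch below assumes the layout these files appear to use: section header lines start with `#` (for example `#metadataBlock`, `#datasetField`, `#controlledVocabulary`) and list column names, and data rows follow with an empty first column. The parser is generic and validates no particular column; the sample text is made up for illustration, not copied from a shipped block.

```python
def parse_metadata_block_tsv(text):
    """Group TSV rows into sections keyed by their '#...' header name.

    Returns {section_name: [row_dict, ...]}, where each row_dict maps the
    header's column names to that row's values.
    """
    sections = {}
    header = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = line.split("\t")
        if cells[0].startswith("#"):
            header = cells
            sections.setdefault(cells[0][1:], [])
        elif header is not None:
            # Column 0 of a data row is empty; pair the rest with header names.
            sections[header[0][1:]].append(dict(zip(header[1:], cells[1:])))
    return sections


# Illustrative sample only; real blocks have many more columns.
sample = (
    "#metadataBlock\tname\tdataverseAlias\tdisplayName\n"
    "\tmyBlock\t\tMy Custom Metadata\n"
    "#datasetField\tname\ttitle\tfieldType\n"
    "\tsurveyWave\tSurvey Wave\ttext\n"
    "\tsampleSize\tSample Size\tint\n"
)
blocks = parse_metadata_block_tsv(sample)
```

Plain `split("\t")` is used instead of the `csv` module so that quote characters inside descriptions are kept as literal text.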

Phil




Romain MOUGIN

Sep 22, 2015, 9:21:38 AM
to Dataverse Users Community, philip...@harvard.edu
Exactly what I was "trying" to talk about, thank you :) But for my datasets, only the "Citation Metadata" block is available. I can't access the Geospatial or Astrophysics sheets, for example. Maybe I'm looking in the wrong place? Or is it not implemented yet? I haven't tried to update the metadata with a JSON file in that format; should I proceed that way?

Philip Durbin

Sep 22, 2015, 9:30:21 AM
to dataverse...@googlegroups.com
I've lost track of which server you're testing with but if you provide a link to the dataverse you created, it would be helpful.

In that dataverse, you should have control over which metadata fields can be used in your dataverse. We document this as "you can update the metadata elements used for datasets within the dataverse, change which metadata fields are hidden, required, or optional, and update the facets you would like displayed for browsing the dataverse" at http://guides.dataverse.org/en/4.1/user/dataverse-management.html#general-information but I'll also attach a screenshot of what it looks like when you click "Edit" and then "General Information" for your dataverse.

Phil

Finches_Dataverse_-_2015-09-22_09.27.51.png

Romain MOUGIN

Sep 22, 2015, 9:45:30 AM
to Dataverse Users Community, philip...@harvard.edu
Thanks a lot! I was wondering whether it was something from Dataverse 3 or Dataverse 4. My bad, it was in Dataverse 4; I just wasn't able to find it. Sorry for bothering you with my blindness :D

For my tests, I'm working on your beta site (https://beta.dataverse.org/) while waiting for the 4.2 release. I will ask my colleagues whether the metadata models match what they need for their metadata.

Thanks again and see you soon ;)