OAI harvesting from Zenodo failures


Valentina Pasquale

Nov 16, 2021, 9:55:23 AM
to Dataverse Users Community
Hi Dataverse users,

I am trying to harvest metadata from Zenodo to Dataverse, using a harvesting client.
I have some failures and I am now trying to understand what goes wrong.
Settings: metadata format = 'oai_dc', archive type = 'Generic OAI archive'. I don't know if these are the best choices, as I haven't tried anything else so far; I took them from another post where they were mentioned.

1) <message>Exception processing getRecord(), oaiUrl=https://zenodo.org/oai2d, identifier=oai:zenodo.org:2549479, edu.harvard.iq.dataverse.api.imports.ImportException, Failed to import harvested dataset: class edu.harvard.iq.dataverse.util.json.ControlledVocabularyException (Value 'eng' does not exist in type 'language')</message>

I suspect this exception occurs because Zenodo does not control the values in the 'language' field and accepts free text. The suggested text is e.g. 'eng'. In fact, if one types 'eng' and waits long enough, a drop-down menu eventually appears from which one can select "English", but a plain "eng" is also accepted. If the value is selected from the drop-down, the import into Dataverse runs smoothly. So I think the only way to avoid this is to correct the metadata in Zenodo before importing into Dataverse (where the 'language' value is controlled). Any other ideas?
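For anyone who wants to find the affected records before harvesting, here is a minimal Python sketch that lists the `dc:language` values in an oai_dc record that are not in an accepted set. It runs outside Dataverse and is not how the importer itself validates anything. `ACCEPTED_LANGUAGES` is a hypothetical, deliberately tiny subset (the real list is the 'language' controlled vocabulary of the Dataverse installation), and the function name and sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumption: tiny illustrative subset, NOT the real Dataverse vocabulary.
ACCEPTED_LANGUAGES = {"English", "Italian", "French"}


def unaccepted_languages(record_xml):
    """Return the dc:language values that are not in ACCEPTED_LANGUAGES."""
    root = ET.fromstring(record_xml)
    values = [
        el.text.strip()
        for el in root.iter("{%s}language" % DC_NS)
        if el.text
    ]
    return [value for value in values if value not in ACCEPTED_LANGUAGES]


# Made-up oai_dc payload with the free-text value from the error above.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example record</dc:title>
  <dc:language>eng</dc:language>
</oai_dc:dc>"""

print(unaccepted_languages(SAMPLE))  # prints ['eng']
```

Records flagged this way are the ones whose 'language' value would need to be corrected in Zenodo first.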

2) For some records I got these two other errors:
Error calling GetRecord - GetRecord request failed. HTTP error code 502
Error calling GetRecord - GetRecord request failed. HTTP error code 504
Do you think these could be caused by time-outs? (Zenodo seems to be very slow to reply...)
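To check whether the 502/504 responses are transient, the same GetRecord request can be repeated outside Dataverse with a few retries. The following is only a diagnostic sketch: the function names, the set of retryable status codes, the number of attempts and the delays are all assumptions chosen for illustration, and this is not what the Dataverse harvesting client does internally. The request parameters (`verb`, `identifier`, `metadataPrefix`) are the standard OAI-PMH ones.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Assumption: gateway-type errors are treated as transient and retried.
RETRYABLE = {502, 503, 504}


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord URL."""
    query = urllib.parse.urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query


def fetch_with_retry(url, attempts=4, base_delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on 502/503/504 with exponential backoff."""
    for attempt in range(attempts):
        try:
            with opener(url, timeout=60) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors or when attempts run out.
            if err.code not in RETRYABLE or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    url = get_record_url("https://zenodo.org/oai2d", "oai:zenodo.org:2549479")
    print(len(fetch_with_retry(url)), "bytes received")
```

If a record eventually comes back after one or two retries, that points to a time-out or load problem on the server side rather than to a problem with the record itself.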

3) When setting up the harvesting client, the list of available sets is sometimes completely empty; at other times it contains only part of the OAI sets (i.e. Zenodo communities), and in that case a warning is displayed saying that not all sets could be retrieved because of time-out problems. Do you think these issues could be solved somehow, e.g. by changing a setting?
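For reference, an OAI-PMH server may return the ListSets response in several pages linked by a `resumptionToken`, so a client has to follow the tokens to the end to see every set. Below is a minimal sketch of that paging loop, independent of Dataverse. The function names are hypothetical, and `fetch` is an assumed callable that takes a URL and returns the response body as text, so that time-outs and retries can be handled by the caller.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_list_sets_page(xml_text):
    """Return (setSpec values, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    specs = [el.text for el in root.iter(OAI_NS + "setSpec")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    has_token = token_el is not None and token_el.text
    token = token_el.text.strip() if has_token else None
    return specs, token


def list_all_sets(base_url, fetch):
    """Follow resumption tokens until the server reports the last page."""
    specs, token = [], None
    while True:
        params = {"verb": "ListSets"}
        if token:
            # Per OAI-PMH, resumptionToken is sent alone with the verb.
            params["resumptionToken"] = token
        url = base_url + "?" + urllib.parse.urlencode(params)
        page_specs, token = parse_list_sets_page(fetch(url))
        specs.extend(page_specs)
        if not token:
            return specs
```

Counting the pages needed to reach the end of Zenodo's set list, and how long each one takes, would show whether a partial list in the client is simply a matter of the time-out being hit part-way through.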

Thanks for the help!

Best wishes,

Valentina Pasquale valentina...@iit.it
IIT Dataverse (Istituto Italiano di Tecnologia)

 






danny...@g.harvard.edu

Nov 23, 2021, 9:53:03 AM
to Dataverse Users Community
Hi Valentina, thanks for the questions! I think you're probably correct on 1), and the other two issues will need some more investigation, perhaps working with the Zenodo team. Are you still experiencing all of these issues, or have any been resolved? 

Can you create an issue in GitHub for 1)? We could possibly add some code for better handling, or at least a better error message.

- Danny

Valentina Pasquale

Dec 10, 2021, 9:20:11 AM
to dataverse...@googlegroups.com
Hi Danny, hi Philip,

Many thanks for your continued support.
It looks like I cannot reproduce any of those errors for the moment, at least not until https://github.com/IQSS/dataverse/issues/8290 and https://github.com/IQSS/dataverse/issues/8289 are fixed.
I think problem 1) could be solved on the Zenodo side (annoying as that is): I do not think Dataverse should accept values other than those in the associated controlled vocabulary.
For problem 2), I cannot really tell whether this depended on issues on the Zenodo server side, so I would need to do more tests (that I cannot do now). Problem 3) is the one already reported in GitHub issue #8289.

All best,

Valentina


