Metadata sent by Dataverse to DataCite


Michel Bamouni

Jul 4, 2019, 5:18:34 AM
to Dataverse Users Community
Hi,

We use DataCite as DOI provider in our Dataverse installation.
We have noticed that when we publish a dataset on Dataverse, Dataverse doesn't send all the metadata to DataCite, only the fields that are mandatory for DataCite.

We would like to know whether it is planned for Dataverse to send all metadata to DataCite in future versions.

Best regards,

Michel

Péter Király

Jul 4, 2019, 5:22:02 AM
to dataverse...@googlegroups.com
Hi Michel,

It is a good topic; I'd also add a specific issue. I found that the
files were exported as datasets, and thus they "pollute" my ORCID
profile (which was automatically updated).

Best,
Péter

--
Péter Király
software developer
GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal
http://linkedin.com/in/peterkiraly

Philipp at UiT

Jul 5, 2019, 7:24:29 AM
to Dataverse Users Community

> I found that the files were exported as datasets

Note that this is due to the issue I have described earlier; see GitHub issue #5086.

Best, Philipp



Gautier, Julian

Jul 5, 2019, 9:35:20 AM
to dataverse...@googlegroups.com
Hi Michel,

Thanks for bringing this up, Michel! We plan for Dataverse to send DataCite all dataset metadata that can be mapped to DataCite's metadata fields. I think these are the most relevant issues in the Dataverse GitHub repo: https://github.com/IQSS/dataverse/issues/2917, https://github.com/IQSS/dataverse/issues/2778 and https://github.com/IQSS/dataverse/issues/5889. We're particularly interested in 2778, so my current focus is helping the Dataverse community agree on how to send metadata about related journal articles and other research objects, since it's closely related to the Make Data Count work being done now.

When I look at the DataCite XML exports from DataCite Search (like this one), I see that installations running the latest Dataverse versions, including Data INRA, send more than the required metadata, but as you wrote, Dataverse could be sending more. Some of the metadata mapping work was already done when Dataverse started exporting OpenAIRE metadata (https://github.com/IQSS/dataverse/issues/4257). And Dataverse forks by our friends at Consorcio Madroño (https://edatos.consorciomadrono.es) and the Qualitative Data Repository (https://data.qdr.syr.edu/dataverse/main) added other fields to their DataCite metadata exports.

But as far as I know most Dataverse installations running the more recent versions of Dataverse actually send DataCite the same amount of metadata upon DOI registration and when publishing new dataset versions. QDR is the exception - it sends related publication metadata. (If datasets were published with DataCite DOIs before Dataverse started sending additional metadata to DataCite, and new versions of those datasets haven't been published since, DataCite might not be sent the additional metadata, like the relationship between datasets and their files that have DOIs, unless additional steps are taken; see https://github.com/IQSS/dataverse/issues/5144. I think you can see this problem by comparing metadata DataCite has from Data INRA for this older dataset versus this newer dataset.)

Peter, it's a great point that ORCID profiles display files of datasets the same way they display the datasets, so there can be lots of noise. If/when Dataverse sends DataCite file metadata that says that it's a file (like resourceType = Datafile or DataDownload), wouldn't ORCID need to do something so that it automatically ignores those things (and you don't have to hide or remove the records from your profile manually)?

--
Julian Gautier
Product Research Specialist, IQSS

Michel Bamouni

Jul 22, 2019, 8:45:09 AM
to Dataverse Users Community
Hi Julian,

If I understand correctly, when a dataset has files, these files are sent to DataCite as related identifiers of the dataset, and that was false?
I also understand that recent versions of Dataverse send more metadata to DataCite. Can you tell me which version of Dataverse does this (if you have the version number, of course)?

best regards,

Michel

Julian Gautier

Jul 22, 2019, 12:41:18 PM
to Dataverse Users Community
Hi Michel,

> If I understand correctly, when a dataset has files, these files are sent to DataCite as related identifiers of the dataset, and that was false?

You're right. When a Dataverse repository registers DataCite DOIs for a file, the file metadata sent to DataCite includes information about what dataset the file is a part of.

File metadata sent to DataCite:
...    
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">doi:10.7910/DVN/G7NNUL</relatedIdentifier>
...

where doi:10.7910/DVN/G7NNUL is the DOI of the dataset. 
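
For anyone who wants to check this relationship in an exported record, here is a minimal Python sketch. It assumes a trimmed-down, namespace-free fragment wrapped in a `<resource>` element; a real DataCite record declares an XML namespace and contains many more elements, so the lookup would need adjusting. The function name `parent_datasets` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment built around the relatedIdentifier
# element shown above. Real DataCite XML is namespaced and much larger.
FILE_METADATA = """
<resource>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">doi:10.7910/DVN/G7NNUL</relatedIdentifier>
  </relatedIdentifiers>
</resource>
"""


def parent_datasets(xml_text):
    """Return the identifiers this record says it is a part of."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter("relatedIdentifier")
        if el.get("relationType") == "IsPartOf" and el.text
    ]


print(parent_datasets(FILE_METADATA))
```

On this sample the function returns a single-element list holding the dataset DOI.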

This extra metadata was added in Dataverse 4.9.2 (https://github.com/IQSS/dataverse/issues/4782). 

I'm not sure what's false. Could you write more about what you mean?

> I also understand that recent versions of Dataverse send more metadata to DataCite. Can you tell me which version of Dataverse does this (if you have the version number, of course)?

As far as I can tell, 4.9.2 was the last time that information was added to the metadata that Dataverse sends to DataCite.

Dataverse 4.11 includes a fix to the file metadata that Dataverse sends to DataCite: https://github.com/IQSS/dataverse/issues/5546. So another community member improved how Dataverse formats the file metadata it sends to DataCite.

Dataverse 4.12 includes a fix for how Dataverse sends metadata when a repository is missing information that DataCite requires, like author (https://github.com/IQSS/dataverse/issues/5559).

I think the code that controls what metadata is sent to DataCite is called DOIDataCiteRegisterService.java, and a history of changes made to that code is at https://github.com/IQSS/dataverse/commits/c62af8f4b73337f1be8b61aef9e5d42a0990ab36/src/main/java/edu/harvard/iq/dataverse/DOIDataCiteRegisterService.java

I hope this helps!

Julian

Michel Bamouni

Jul 23, 2019, 4:57:35 AM
to Dataverse Users Community
Hi Julian,

Thanks for your reply.

I think it was false for the files of a dataset to appear as related data in DataCite, because I think this block should contain information about the dataset's related publications or related datasets. Related publications and related datasets are in the citation metadata block.
Below is a screenshot of these metadata.
Best regards,

Michel

Julian Gautier

Jul 23, 2019, 11:03:24 AM
to dataverse...@googlegroups.com
Ah, I think I get it. DataCite uses its relatedIdentifier property for any kind of relationship between a dataset and something else. The DataCite schema documentation lists 30-something terms for indicating types of relationships (PDF, page 46).

...
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="HasPart">1234/AA123456/ABCDEF</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsSupplementTo" resourceTypeGeneral="Text">10.1000/BB01234</relatedIdentifier>
...

So this XML could come from a repository that uses HasPart to indicate the file's relationship to the dataset. And the repository is using IsSupplementTo to indicate any relationship a dataset has with a research article (with the resourceTypeGeneral="Text").
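
To make the distinction concrete, here is a small Python sketch that groups the related identifiers in a fragment like the one above by their relationType, so files (HasPart) and articles (IsSupplementTo) end up in separate lists. The sample closes the `<relatedIdentifiers>` element so that it parses, and it ignores the XML namespace a real DataCite record would have; `group_by_relation` is a made-up name, not part of Dataverse or DataCite tooling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two example identifiers from the fragment above, in a closed,
# namespace-free wrapper so the snippet can be parsed on its own.
SAMPLE = """
<relatedIdentifiers>
  <relatedIdentifier relatedIdentifierType="DOI" relationType="HasPart">1234/AA123456/ABCDEF</relatedIdentifier>
  <relatedIdentifier relatedIdentifierType="DOI" relationType="IsSupplementTo" resourceTypeGeneral="Text">10.1000/BB01234</relatedIdentifier>
</relatedIdentifiers>
"""


def group_by_relation(xml_text):
    """Map each relationType to the list of identifiers that use it."""
    groups = defaultdict(list)
    for el in ET.fromstring(xml_text).iter("relatedIdentifier"):
        groups[el.get("relationType")].append((el.text or "").strip())
    return dict(groups)


groups = group_by_relation(SAMPLE)
print(groups.get("HasPart", []))         # identifiers of parts, e.g. files
print(groups.get("IsSupplementTo", []))  # identifiers of related articles
```

A consumer of the metadata could use a grouping like this to treat file relationships and publication relationships differently, which is the separation Michel is asking about.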

The GitHub issue https://github.com/IQSS/dataverse/issues/2778 is about making it easier for Dataverse installations to send to DataCite information about the relationship between datasets and articles (or other "publications").

Issue https://github.com/IQSS/dataverse/issues/5998 is about related datasets. It is its own GitHub issue because we would also use different relationTypes to indicate the relationships between two datasets. (And also because by default, the related dataset field is a single text box, so one proposal is adding the kind of structure you show in your screenshot.)

We want Dataverse admins to be able to choose how their depositors define the relationships between their datasets and other research objects. But the only way to customize the metadata in the citation metadata block is to fork Dataverse. So first we need to figure out how Dataverse admins can customize their metadata without forking Dataverse, which is part of what https://github.com/IQSS/dataverse/issues/6030 is about.

There's some overlap in how DataCite defines the relationTypes, and data repositories that are using them now aren't using them the same way. So I'm hoping that the Dataverse community can agree on how to use each of the relationTypes, even though I think some Dataverse repositories will want to use only one or a few and some will want to use many. Soon I plan to post proposed definitions of relationTypes for related publications in https://github.com/IQSS/dataverse/issues/2778.

More generally, the Dataverse crosswalk shows how Dataverse fields are mapped to DataCite properties for the DataCite export (column H) and the OpenAIRE export (column I) (the OpenAIRE export uses the DataCite schema). The GitHub issue https://github.com/IQSS/dataverse/issues/5889 is about syncing or merging these exports. You can see that in the OpenAIRE export, a lot of mapping has already been done. I think some of it can be improved, and then we can send a lot more metadata to DataCite.