Metadata sent by Dataverse to DataCite


Michel Bamouni

Jul 4, 2019, 5:18:34 AM
to Dataverse Users Community
Hi,

We use DataCite as the DOI provider in our Dataverse installation.
We have noticed that when we publish a dataset in Dataverse, Dataverse doesn't send all of the metadata to DataCite, only the fields that DataCite requires.

We would like to know whether there are plans for Dataverse to send all metadata to DataCite in future versions.

Best regards,

Michel

Péter Király

Jul 4, 2019, 5:22:02 AM
to dataverse...@googlegroups.com
Hi Michel,

It is a good topic, and I'd like to add a specific issue. I found that the
files were exported as datasets, and thus they "pollute" my ORCID
profile (which was updated automatically).

Best,
Péter

--
Péter Király
software developer
GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal
http://linkedin.com/in/peterkiraly

Philipp at UiT

Jul 5, 2019, 7:24:29 AM
to Dataverse Users Community

> I found that the files were exported as datasets

Note that this is due to the issue I have described earlier; see GitHub issue #5086.

Best, Philipp



Gautier, Julian

Jul 5, 2019, 9:35:20 AM
to dataverse...@googlegroups.com
Hi Michel,

Thanks for bringing this up, Michel! We plan for Dataverse to send DataCite all dataset metadata that can be mapped to DataCite's metadata fields. I think these are the most relevant issues in the Dataverse GitHub repo: https://github.com/IQSS/dataverse/issues/2917, https://github.com/IQSS/dataverse/issues/2778 and https://github.com/IQSS/dataverse/issues/5889. We're particularly interested in 2778, so my current focus is helping the Dataverse community agree on how to send metadata about related journal articles and other research objects, since it's closely related to the Make Data Count work being done now.

When I look at the DataCite XML exports from DataCite Search (like this one), I see that installations running the latest Dataverse versions, including Data INRA, send more than the required metadata, but as you wrote, Dataverse could be sending more. Some of the metadata mapping work was already done when Dataverse started exporting OpenAIRE metadata (https://github.com/IQSS/dataverse/issues/4257). And the Dataverse forks run by our friends at Consorcio Madroño (https://edatos.consorciomadrono.es) and the Qualitative Data Repository (https://data.qdr.syr.edu/dataverse/main) added other fields to their DataCite metadata exports.

But as far as I know most Dataverse installations running the more recent versions of Dataverse actually send DataCite the same amount of metadata upon DOI registration and when publishing new dataset versions. QDR is the exception - it sends related publication metadata. (If datasets were published with DataCite DOIs before Dataverse started sending additional metadata to DataCite, and new versions of those datasets haven't been published since, DataCite might not be sent the additional metadata, like the relationship between datasets and their files that have DOIs, unless additional steps are taken; see https://github.com/IQSS/dataverse/issues/5144. I think you can see this problem by comparing metadata DataCite has from Data INRA for this older dataset versus this newer dataset.)

Peter, it's a great point that ORCID profiles display files of datasets the same way they display the datasets, so there can be lots of noise. If/when Dataverse sends DataCite file metadata that says that it's a file (like resourceType = Datafile or DataDownload), wouldn't ORCID need to do something so that it automatically ignores those things (and you don't have to hide or remove the records from your profile manually)?

--
Julian Gautier
Product Research Specialist, IQSS

Michel Bamouni

Jul 22, 2019, 8:45:09 AM
to Dataverse Users Community
Hi Julian,

If I understand correctly, when a dataset has files, these files are sent to DataCite as related identifiers of the dataset, and that was false?
I also understand that recent versions of Dataverse send more metadata to DataCite. Can you tell me which version of Dataverse does this (if you have the version number, of course)?

best regards,

Michel

Julian Gautier

Jul 22, 2019, 12:41:18 PM
to Dataverse Users Community
Hi Michel,

> If I understand correctly, when a dataset has files, these files are sent to DataCite as related identifiers of the dataset, and that was false?

You're right. When a Dataverse repository registers DataCite DOIs for a file, the file metadata sent to DataCite includes information about what dataset the file is a part of.

File metadata sent to DataCite:
...    
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">doi:10.7910/DVN/G7NNUL</relatedIdentifier>
...

where doi:10.7910/DVN/G7NNUL is the DOI of the dataset. 

This extra metadata was added in Dataverse 4.9.2 (https://github.com/IQSS/dataverse/issues/4782). 

I'm not sure what's false. Could you write more about what you mean?

> I also understand that recent versions of Dataverse send more metadata to DataCite. Can you tell me which version of Dataverse does this (if you have the version number, of course)?

As far as I can tell, 4.9.2 was the last time that information was added to the metadata that Dataverse sends to DataCite.

Dataverse 4.11 includes a fix to the file metadata that Dataverse sends to DataCite: https://github.com/IQSS/dataverse/issues/5546. So another community member improved how Dataverse formats the file metadata it sends to DataCite.

Dataverse 4.12 includes a fix for how Dataverse sends metadata when a repository is missing information that DataCite requires, like author (https://github.com/IQSS/dataverse/issues/5559).

I think the code that controls what metadata is sent to DataCite is in DOIDataCiteRegisterService.java, and a history of the changes made to that code is at https://github.com/IQSS/dataverse/commits/c62af8f4b73337f1be8b61aef9e5d42a0990ab36/src/main/java/edu/harvard/iq/dataverse/DOIDataCiteRegisterService.java

I hope this helps!

Julian

Michel Bamouni

Jul 23, 2019, 4:57:35 AM
to Dataverse Users Community
Hi Julian,

Thanks for the reply.

I think it was wrong to have the files of a dataset appear as related data in DataCite, because I think this block should contain information about the dataset's related publications or related datasets. Related publications and related datasets are in the citation metadata block.
Below is a screenshot of these metadata fields.
Best regards,

Michel

Julian Gautier

Jul 23, 2019, 11:03:24 AM
to dataverse...@googlegroups.com
Ah, I think I get it. DataCite uses its relatedIdentifier property for any kind of relationship between a dataset and something else. The DataCite schema documentation lists 30-something terms for indicating types of relationships. (PDF, page 46).

...
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="HasPart">1234/AA123456/ABCDEF</relatedIdentifier>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsSupplementTo" resourceTypeGeneral="Text">10.1000/BB01234</relatedIdentifier>
...

So this XML could come from a repository that uses HasPart to indicate the file's relationship to the dataset. And the repository is using IsSupplementTo to indicate any relationship a dataset has with a research article (with the resourceTypeGeneral="Text").
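
To make the relationType idea concrete, here is a minimal sketch in Python (standard library only) that groups the relatedIdentifier entries of a DataCite record by relationType. It assumes a complete, well-formed record, so the fragment above is wrapped in a resource element here; the function name and the dictionary keys are only illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def relations_by_type(datacite_xml):
    """Group relatedIdentifier entries by their relationType attribute."""
    root = ET.fromstring(datacite_xml)
    grouped = defaultdict(list)
    for el in root.iter():
        # Compare on the local name so the DataCite namespace does not matter.
        if el.tag.rsplit("}", 1)[-1] != "relatedIdentifier":
            continue
        grouped[el.get("relationType")].append(
            {
                "identifier": (el.text or "").strip(),
                "identifier_type": el.get("relatedIdentifierType"),
                # resourceTypeGeneral is optional, so this may be None.
                "resource_type": el.get("resourceTypeGeneral"),
            }
        )
    return dict(grouped)


# The fragment from the message above, wrapped so that it parses on its own.
SAMPLE = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="HasPart">1234/AA123456/ABCDEF</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsSupplementTo" resourceTypeGeneral="Text">10.1000/BB01234</relatedIdentifier>
  </relatedIdentifiers>
</resource>"""

for relation, entries in relations_by_type(SAMPLE).items():
    print(relation, [entry["identifier"] for entry in entries])
```

Grouping by relationType keeps file relationships (HasPart, IsPartOf) separate from publication relationships (such as IsSupplementTo), which is the distinction discussed in this thread.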

The GitHub issue https://github.com/IQSS/dataverse/issues/2778 is about making it easier for Dataverse installations to send to DataCite information about the relationship between datasets and articles (or other "publications").

Issue https://github.com/IQSS/dataverse/issues/5998 is about related datasets. It is its own GitHub issue because we would also use different relationTypes to indicate the relationships between two datasets. (And also because by default, the related dataset field is a single text box, so one proposal is adding the kind of structure you show in your screenshot.)

We want Dataverse admins to be able to choose how their depositors define the relationships between their datasets and other research objects. But the only way to customize the metadata in the citation metadata block is to fork Dataverse. So first we need to figure out how Dataverse admins can customize their metadata without forking Dataverse, which is part of what https://github.com/IQSS/dataverse/issues/6030 is about.

There's some overlap in how DataCite defines the relationTypes, and data repositories that are using them now aren't using them the same ways. So I'm hoping that the Dataverse community can agree on how to use each of the relationTypes, even though I think some Dataverse repositories will want to use only one or a few and some will want to use many. Soon I plan to post in https://github.com/IQSS/dataverse/issues/2778 proposed definitions for relationTypes for related publications.

More generally, the Dataverse crosswalk shows how Dataverse fields are mapped to DataCite properties for the DataCite export (column H) and the OpenAIRE export (column I) (the OpenAIRE export uses the DataCite schema). The GitHub issue https://github.com/IQSS/dataverse/issues/5889 is about syncing or merging these exports. You can see that in the OpenAIRE export, a lot of mapping has already been done. I think some of it can be improved, and then we can send a lot more metadata to DataCite.

Philipp Conzett

Nov 3, 2025, 2:12:53 AM
to Dataverse Users Community
We are currently upgrading from v6.3 to v6.6 and are again looking into what metadata is sent from Dataverse to DataCite. With v6.4, a relation type sub-field was introduced to the Related Publication metadata field, but from what I see in installations already running a post-v6.4 version, this information is not passed on to DataCite. See, e.g., this dataset on the Harvard Dataverse: https://doi.org/10.7910/DVN/8DCXQF. I cannot find information on the related publication in the DataCite metadata export [1].

Does anyone have any updated information or thoughts on this?

Thanks!
Philipp

[1] DataCite metadata export for https://doi.org/10.7910/DVN/8DCXQF as of 2025-11-03:
<resource xmlns="http://datacite.org/schema/kernel-4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.5/metadata.xsd">
<identifier identifierType="DOI">10.7910/DVN/8DCXQF</identifier>
<creators>
<creator>
<creatorName nameType="Personal">Krajewski, Andrew</creatorName>
<givenName>Andrew</givenName>
<familyName>Krajewski</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="https://orcid.org">https://orcid.org/0000-0001-5958-546X</nameIdentifier>
<affiliation>The University of Texas at Dallas</affiliation>
</creator>
<creator>
<creatorName nameType="Personal">Pickett, Justin</creatorName>
<givenName>Justin</givenName>
<familyName>Pickett</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="https://orcid.org">https://orcid.org/0000-0002-8519-2659</nameIdentifier>
<affiliation>University at Albany, State University of New York</affiliation>
</creator>
<creator>
<creatorName nameType="Personal">Jacobs, Bruce</creatorName>
<givenName>Bruce</givenName>
<familyName>Jacobs</familyName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="https://orcid.org">https://orcid.org/0000-0002-3356-3382</nameIdentifier>
</creator>
</creators>
<titles>
<title>Replication Data for: How People Choose Between Criminal Opportunities</title>
</titles>
<publisher>Harvard Dataverse</publisher>
<publicationYear>2025</publicationYear>
<subjects>
<subject>Social Sciences</subject>
<subject>criminological theory</subject>
<subject>experiment</subject>
<subject>decision-making</subject>
<subject>deterrence</subject>
<subject>rational choice</subject>
</subjects>
<contributors>
<contributor contributorType="ContactPerson">
<contributorName nameType="Personal">Krajewski, Andrew</contributorName>
<givenName>Andrew</givenName>
<familyName>Krajewski</familyName>
<affiliation>The University of Texas at Dallas</affiliation>
</contributor>
</contributors>
<dates>
<date dateType="Submitted">2025-06-02</date>
<date dateType="Available">2025-10-30</date>
</dates>
<resourceType resourceTypeGeneral="Dataset"/>
<sizes>
<size>4534</size>
<size>512542</size>
</sizes>
<formats>
<format>application/x-stata-syntax</format>
<format>text/tab-separated-values</format>
</formats>
<version>1.0</version>
<rightsList>
<rights rightsURI="info:eu-repo/semantics/openAccess"/>
<rights rightsURI="http://creativecommons.org/publicdomain/zero/1.0" rightsIdentifier="CC0-1.0" rightsIdentifierScheme="SPDX" schemeURI="https://spdx.org/licenses/" xml:lang="en">Creative Commons CC0 1.0 Universal Public Domain Dedication.</rights>
</rightsList>
<descriptions>
<description descriptionType="Abstract">This repository contains the data file and the Stata do file for Krajewski, Pickett, and Jacobs&apos; article &quot;How People Choose Between Criminal Opportunities.&quot; &lt;b&gt; Abstract: &lt;/b&gt; The explanatory power of criminological theories may differ across decision-making stages, because involvement decisions (the choice to become involved in crime) and event decisions (the choice between criminal opportunities) are theoretically distinct. Although our understanding of offender decision-making has advanced greatly in recent years, event decisions remain understudied. Rational choice theory (RCT) indicates that crime benefits, arrest risk, sanction severity, opportunity cost, and payout timeliness should drive event decisions. Other scholarship indicates that the presence of co-offenders and victim type may also matter. To test the causal effects of each of these factors, we conducted a paired-profile conjoint experiment with a national sample (N = 1,023), wherein participants collectively evaluated over 10,000 criminal opportunities. Consistent with RCT, crime benefits, arrest risk, and sanction severity exerted sizeable effects on event decisions. Victim type also mattered, such that participants preferred to target wealthy individuals and large corporations. Other factors (e.g., co-offenders, opportunity cost) had weaker effects. Event decision-making was mostly similar regardless of participants’ self-control or past offending. Our experiment suggests that RCT may be especially useful for explaining event decisions, even if other theories provide a stronger account of involvement decisions.</description>
</descriptions>
</resource>

mireia alcala

Nov 3, 2025, 3:22:12 AM
to dataverse...@googlegroups.com

Hi Philipp,

In our repository, we are running exactly version 6.4 of Dataverse, and in our case, this functionality is indeed exported to DataCite. For example, in the XML we see:

<relatedIdentifiers> 
<relatedIdentifier relationType="IsReferencedBy" relatedIdentifierType="DOI">10.3390/biomedicines10050958</relatedIdentifier> 
</relatedIdentifiers>

Therefore, the issue must lie in versions released after 6.4.

Best regards,

Mireia


--

Mireia Alcalá Ponce de León
676012634
ap.m...@gmail.com

Julian Gautier

Nov 4, 2025, 9:24:22 AM
to Dataverse Users Community
Whoops, I replied directly to Mireia instead of replying in this thread like I wanted to. Posting below for Philipp and anyone else following the conversation. And Mireia replied directly to me to confirm that Dataverse v6.4 is sending related publication information to DataCite as expected.

My reply:

Hey Philipp and Mireia,

For the dataset at https://doi.org/10.7910/DVN/8DCXQF, the related publication info isn't sent to DataCite because the depositor didn't include an identifier.

But for the dataset at https://doi.org/10.7910/DVN/FR9JMK, the "Related Publication" field does include an identifier, so that info is sent to DataCite.

I haven't heard of or seen any bugs with how this works in post-v6.4 versions of Dataverse.

Julian Gautier (he/him)
Product Research Specialist, IQSS
Interested in helping test Dataverse? Sign up for user experience research

Philipp Conzett

Nov 6, 2025, 5:25:19 AM
to Dataverse Users Community
Hi Mireia and Julian,

Thanks for clarifying! I only checked the information in the metadata export: 

Is there a way in Dataverse to check what is sent to DataCite, other than checking the metadata record at DataCite?

Best,
Philipp

Julian Gautier

Nov 6, 2025, 11:37:19 AM
to Dataverse Users Community
Ah, there's no way in the Dataverse application, like in a browser or using the Dataverse API, for us to see which metadata can be sent to DataCite and how. There's been some discussion in threads in our Google Group and in GitHub issues about making the Dataverse API capable of providing that sort of info, like maybe an endpoint we could use to see what metadata is exported by a particular Dataverse installation and how it's exported.

The Dataverse Guide mentions a metadata crosswalk that we keep in a Google Sheet. Does using that help?

It's meant to show which metadata the latest version of Dataverse includes in the exports that ship with Dataverse, like the DataCite export in your link (https://dataverse.harvard.edu/api/datasets/export?exporter=Datacite&persistentId=doi:10.7910/DVN/FR9JMK), which should also reflect what Dataverse repositories that register DOIs send to DataCite.
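
For anyone who wants to check an export from a script rather than in a browser, here is a minimal Python sketch (standard library only). The URL pattern is taken from the link above; the function names are made up for illustration, and there is no error handling, so treat it as a starting point rather than a finished tool.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def export_url(server, pid, exporter="Datacite"):
    """Build the metadata export URL, following the pattern of the link above."""
    query = urllib.parse.urlencode(
        {"exporter": exporter, "persistentId": pid}, safe=":/"
    )
    return f"{server.rstrip('/')}/api/datasets/export?{query}"


def related_identifiers(datacite_xml):
    """Return (relationType, identifier) pairs found in a DataCite XML record."""
    root = ET.fromstring(datacite_xml)
    return [
        (el.get("relationType"), (el.text or "").strip())
        for el in root.iter()
        # Compare on the local name so the DataCite namespace does not matter.
        if el.tag.rsplit("}", 1)[-1] == "relatedIdentifier"
    ]


def fetch_related_identifiers(server, pid):
    """Download the export and list its related identifiers (network call)."""
    with urllib.request.urlopen(export_url(server, pid), timeout=30) as resp:
        return related_identifiers(resp.read())


# Example call (needs network access):
# fetch_related_identifiers("https://dataverse.harvard.edu", "doi:10.7910/DVN/FR9JMK")
```

An empty list means the export contains no relatedIdentifier elements at all, which is what the export for doi:10.7910/DVN/8DCXQF earlier in this thread shows.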

I just updated the crosswalk a bit to try to make it clearer that Dataverse sends to DataCite what users enter in the Related Publication field only when they include something in the Identifier and Identifier Type fields.
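
To illustrate that rule, here is a small Python sketch. It is not Dataverse's actual code (that is written in Java); the dictionary keys and the fallback relation type are hypothetical placeholders chosen for this example. It only shows the behaviour described above: a Related Publication entry becomes a relatedIdentifier only when both the identifier type and the identifier are filled in.

```python
import xml.etree.ElementTree as ET


def related_identifiers_element(related_publications, default_relation="IsSupplementTo"):
    """Build a relatedIdentifiers element from Related Publication entries.

    Entries without an identifier type or an identifier are skipped, which
    mirrors the rule described above. default_relation is an arbitrary
    placeholder used when an entry has no relation type.
    """
    parent = ET.Element("relatedIdentifiers")
    for pub in related_publications:
        id_type = (pub.get("identifier_type") or "").strip()
        identifier = (pub.get("identifier") or "").strip()
        if not id_type or not identifier:
            continue  # citation-only entries are not sent
        el = ET.SubElement(
            parent,
            "relatedIdentifier",
            relatedIdentifierType=id_type,
            relationType=pub.get("relation_type") or default_relation,
        )
        el.text = identifier
    return parent


publications = [
    # Citation text only: skipped.
    {"citation": "An article cited without an identifier"},
    # Identifier type and identifier present: included.
    {
        "relation_type": "IsReferencedBy",
        "identifier_type": "DOI",
        "identifier": "10.3390/biomedicines10050958",
    },
]

print(ET.tostring(related_identifiers_element(publications), encoding="unicode"))
```

With the two sample entries, only the second one ends up in the output, which matches the difference between the two Harvard Dataverse datasets discussed above.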

qqm...@hotmail.com

Nov 6, 2025, 11:57:54 AM
to Dataverse Users Community
FWIW: The content of the DataCite exporter is what is sent to DataCite. They should only be different if the export or the content at DataCite is out of date due to some error.

You can query for what's at DataCite via the PIDs API: https://guides.dataverse.org/en/latest/api/native-api.html#get-info-on-a-pid . It doesn't do a comparison though.
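
A minimal Python sketch of that query, in case it is useful. The /api/pids path and the X-Dataverse-key header reflect my reading of the linked guide section, and I am assuming the call needs a superuser API token; please check the guide for your Dataverse version before relying on it. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def pid_info_request(server, pid, api_token):
    """Prepare a GET request for the PID info endpoint (assumed path: /api/pids)."""
    query = urllib.parse.urlencode({"persistentId": pid}, safe=":/")
    return urllib.request.Request(
        f"{server.rstrip('/')}/api/pids?{query}",
        headers={"X-Dataverse-key": api_token},  # API token header (assumption)
    )


def get_pid_info(server, pid, api_token):
    """Send the request and return the decoded JSON response (network call)."""
    request = pid_info_request(server, pid, api_token)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


# Example call (needs network access and a valid token):
# get_pid_info("https://dataverse.example.edu", "doi:10.7910/DVN/FR9JMK", "your-api-token")
```

As noted above, this only returns what is held at DataCite; comparing it with the local DataCite export is a separate step.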

If the FeatureFlag ONLY_UPDATE_DATACITE_WHEN_NEEDED is set, this will only update DataCite if what's at DataCite doesn't match what is generated locally at Dataverse. 

-- Jim