OAI-PMH and Datacite metadata format


Laura Huisintveld

Nov 22, 2017, 8:05:59 AM
to Dataverse Users Community
Hello everybody,

I was wondering if there are any plans to make harvesting with OAI-PMH possible in Datacite-format in the near future?
At the moment, DDI, Dublin Core and JSON are the supported metadata formats for harvesting.

We are now harvesting in oai_dc, but we would like to harvest the Dataverse metadata in Datacite-format.
I saw some work is going on for exporting metadata in DataCite format from the user interface: https://github.com/IQSS/dataverse/issues/3697. Will this also affect the OAI-PMH export possibilities?
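
For anyone following along, the difference between harvesting in one format or another comes down to the metadataPrefix argument of the OAI-PMH request. Below is a minimal sketch of how such request URLs could be built. The base URL is a placeholder, and "oai_datacite" is only an assumed prefix name for illustration; nothing in this thread confirms what prefix a DataCite export would use. A server that does not offer the requested prefix answers with a cannotDisseminateFormat error.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute your installation's OAI-PMH base URL.
BASE_URL = "https://dataverse.example.edu/oai"


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL from a verb and its optional arguments."""
    query = {"verb": verb}
    # Skip arguments that were passed as None so they do not appear in the URL.
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{base_url}?{urlencode(query)}"


# Ask the server which metadata formats it can disseminate.
formats_url = oai_request_url(BASE_URL, "ListMetadataFormats")

# Harvest records in Dublin Core, as described in the message above.
dc_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

# Same request with a DataCite prefix; "oai_datacite" is a hypothetical name.
datacite_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_datacite")

print(formats_url)
print(dc_url)
print(datacite_url)
```

Checking the ListMetadataFormats response first is a cheap way to see which prefixes a given server actually supports before configuring a harvester.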

Thanks,

Laura

julian...@g.harvard.edu

Nov 22, 2017, 11:17:01 AM
to Dataverse Users Community
Hi Laura,

Could you open a GitHub issue for this? I looked over Dataverse's harvesting-related GitHub issues and it doesn't look like there are plans to make OAI-PMH harvesting possible in DataCite format. I'm not sure how work on being able to export DataCite metadata from the user interface would affect OAI-PMH possibilities. It might make it easier to do code-wise?

In the github issue, could you share why you would like to use Datacite metadata for harvesting?

I think this template shows the DataCite metadata Dataverse produces. So right now, using DataCite for harvesting would mean more information loss than using oai_dc, at least until more metadata is made available in DataCite format (#2917).

Thanks,
Julian

Laura Huisintveld

Nov 27, 2017, 7:50:53 AM
to Dataverse Users Community
Hi Julian,

Thanks for your reaction.

I did not know that the datacite metadata export from Dataverse would contain less information than the dc output.
So in this case it won't be useful for us to switch to harvesting in DataCite. Should I open a GitHub issue anyway, in case more DataCite fields will be available in the future?

I saw this metadata mapping document (https://docs.google.com/spreadsheets/d/10Luzti7svVTVKTA-px27oq3RxCUM-QbiTkm8iMd5C54/edit#gid=0 ) so I thought the datacite export would already exist according to this mapping.

Main reason for my question is that we saw some harvested records in dc that had mapped the depositor-field to the contributor-field.
We use the harvested records in another application, and in this application the contributor-field is used only for persons that contributed to the actual content of the dataset, and not for depositors.
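
Until the mapping changes, one possible workaround on the harvesting side is to filter dc:contributor values against a list of known depositors before loading records into the downstream application. The sketch below assumes such a list exists locally (for example, the names of the research data librarians); an oai_dc record by itself does not say which contributor was the depositor. The sample record and all names in it are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented oai_dc record; real harvested records will carry more elements.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example dataset</dc:title>
  <dc:creator>Researcher, Rita</dc:creator>
  <dc:contributor>Collector, Carl</dc:contributor>
  <dc:contributor>Librarian, Lee</dc:contributor>
</oai_dc:dc>"""


def content_contributors(record_xml, known_depositors):
    """Return dc:contributor values, leaving out names known to be depositors."""
    root = ET.fromstring(record_xml)
    names = [
        element.text.strip()
        for element in root.findall(f"{{{DC_NS}}}contributor")
        if element.text
    ]
    return [name for name in names if name not in known_depositors]


# Hypothetical local list of depositors who did not contribute to the content.
print(content_contributors(SAMPLE_RECORD, {"Librarian, Lee"}))
```

This only works when depositor names are known in advance and spelled consistently, which is one reason a fix in the export mapping itself would be more robust.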

Kind regards,
Laura





Mercè Crosas

Nov 27, 2017, 8:51:57 AM
to dataverse...@googlegroups.com
Yes, I would suggest opening a GitHub issue. DataCite metadata will expand, and it's becoming a more appropriate standard for data than DC, so it will be useful to have it as a choice for exporting metadata (and also for importing/depositing datasets at some point).

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/34b48078-40ff-4434-a221-a5dd14a5f92e%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Laura Huisintveld

Nov 29, 2017, 8:25:32 AM
to Dataverse Users Community
I made a GitHub issue: https://github.com/IQSS/dataverse/issues/4318.
It is my first, so let me know if I need to correct it :) .

Thanks!



Gautier, Julian

Nov 30, 2017, 4:32:45 PM
to dataverse...@googlegroups.com
      I saw this metadata mapping document (https://docs.google.com/spreadsheets/d/10Luzti7svVTVKTA-px27oq3RxCUM-QbiTkm8iMd5C54/edit#gid=0 ) so I thought the datacite export would already exist according to this mapping.

The crosswalk both documents how metadata is currently mapped to other standards and is used for planning purposes, and that has caused some confusion before. I've tried shading cells and leaving comments in the doc to show what isn't being mapped in current versions of Dataverse (as I learn about them), but I'm thinking it might be best to maintain a separate document for planning purposes and, for the document you referenced, start using some version control.

      Main reason for my question is that we saw some harvested records in dc that had mapped the depositor-field to the contributor-field.
      We use the harvested records in another application, and in this application the contributor-field is used only for persons that contributed to the actual content of the dataset, and not for depositors.

Thanks for making that GitHub issue. Please let us know if there's anything we could do to help fix this mapping issue if it needs to be fixed in the short term.

Thanks!
Julian 




--
Julian Gautier
Product Research Specialist, IQSS

Janet McDougall - Australian Data Archive

Dec 5, 2017, 2:39:42 AM
to Dataverse Users Community
Hi all,
I'm wondering why the Dataverse 'depositor' field is being mapped to 'contributor', as DC does not have a 'depositor' element and the DataCite contributorType does not have a 'depositor' value. Would it be more accurate to not include depositor in the DC export at this point?

Let me know if I've misinterpreted the question:

      Main reason for my question is that we saw some harvested records in dc that had mapped the depositor-field to the contributor-field.
      We use the harvested records in another application, and in this application the contributor-field is used only for persons that contributed to the actual content of the dataset, and not for depositors.

DDI maps 2.1.2.2 othId to DC Contributor which I've highlighted in the spreadsheet -  ADA has been mapping our Nesstar DDI to Dataverse Contributor field.  I have mapped the DDI elements and attributes to the Parent/child fields as written in the TSV metadata block so I can really understand how the metadata is being used.  
It is not complete and I haven't worked on it for quite a while so there may be issues.

I've uploaded a subset of the ADA mapping spreadsheet based on yours https://docs.google.com/spreadsheets/d/1gnDsfzI2TRkonuunhaI_GXFvO4QHFMEI36yTv7E1BlI/edit#gid=1200812887 . 

thanks
Janet

julian...@g.harvard.edu

Dec 5, 2017, 2:43:53 PM
to Dataverse Users Community
Hi Janet. I haven't been able to see the crosswalk you're working on (https://docs.google.com/spreadsheets/d/1gnDsfzI2TRkonuunhaI_GXFvO4QHFMEI36yTv7E1BlI/edit#gid=1200812887). (Just requested access.) 

But I think the question is: how much of a contribution should we assume a depositor is making to the dataset? I think depositor is mapped to dc.contributor because we didn't want to assume that the act of depositing wasn't enough of a "contribution" in Dublin Core's very broad sense of the word (and of course no other DC element is appropriate, so the alternative is not including depositor in the DC export). But are you wondering if, since DataCite doesn't have a field to record who deposited the data, and since DDI already maps othId to dc.contributor, this is evidence that the larger community agrees that the act of depositing isn't a large enough contribution to include depositor in dc.contributor? (DataCite does have an "Other" contributorType where they suggest putting things like depositor names.)
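
To make the "Other" option concrete, here is a rough sketch of what a depositor recorded as a DataCite contributor could look like. The element and attribute names (contributor, contributorName, contributorType) come from the DataCite schema; the helper function and the depositor name are made up, and the sketch leaves out the DataCite kernel namespace and the enclosing contributors element that a full record would have. It is not what Dataverse currently exports.

```python
import xml.etree.ElementTree as ET


def depositor_as_other_contributor(depositor_name):
    """Build a DataCite-style <contributor> element typed as "Other"."""
    contributor = ET.Element("contributor", {"contributorType": "Other"})
    name = ET.SubElement(contributor, "contributorName")
    name.text = depositor_name
    return contributor


# Invented depositor name, for illustration only.
element = depositor_as_other_contributor("Librarian, Lee")
print(ET.tostring(element, encoding="unicode"))
```

A harvester reading such a record could then tell a depositor apart from, say, a Data Collector by looking at contributorType, which is the distinction that plain dc:contributor cannot express.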

Very glad for your question,
Julian

Janet McDougall - Australian Data Archive

Dec 5, 2017, 5:36:36 PM
to Dataverse Users Community
Hi Julian

Thanks for your response. I definitely agree that the 'depositor' makes a valuable contribution, but DC doesn't have a depositor field, so it becomes problematic to migrate between standards if depositor is mapped to a different element, as Laura indicated. DataCite has the capacity to use 'Other' for depositor, as you've noted, and hopefully they will include a depositor value in contributorType soon. It would be helpful for us. I guess if you look at the (limited) purpose of DC, it was built to describe materials at a high level, whereas DDI 2.5 was more specific, but it doesn't anticipate the most recent requirements, ORCID for example, that you are including in your metadata.

I hope to finish the mapping soon. I have found mapping directly between DDI elements and attributes and the Dataverse TSV syntax helpful, but it may be more confusing?
J

Laura Huisintveld

Dec 6, 2017, 3:10:20 AM
to Dataverse Users Community
Hi Janet and Julian,

In our case, where the depositor is often the research data librarian, it is not necessary to include the depositor in the metadata as contributor. He/she did not contribute to the actual content of the dataset. If the depositor were not included in the DC export, it would solve our problem, but I can imagine that others might think differently about this. (Still, I would like to use DataCite, as it is more specific than DC.)
Best, Laura

julian...@g.harvard.edu

Dec 8, 2017, 4:13:51 PM
to Dataverse Users Community
Great! Thanks Laura and Janet. Looks like there's agreement so far! I'm thinking of how to identify others whose data might be affected by this change, so that they can weigh in as well.

But since you'd like to use DataCite for harvesting, and changing this mapping wouldn't help, I'm thinking of including the change in this issue about adding more metadata to the oai_dc export.

There are a few issues related to augmenting the DataCite export (and making it OpenAIRE compliant), and we've been thinking about how to organize and work on them.

Looking forward to hearing your thoughts.

Best,
Julian

Laura Huisintveld

Dec 13, 2017, 5:56:48 AM
to Dataverse Users Community
Hi Julian,

Just to make sure I understand: Do you mean adding to this Github issue the proposal to not include the depositor field from Dataverse in the dc:contributor field?
If so, I think that would be a good idea.

Best, Laura


Gautier, Julian

Dec 13, 2017, 8:23:28 AM
to dataverse...@googlegroups.com
Hi Laura,

Yes, exactly. Thank you for following up.

Best,
Julian



Pete Meyer

Dec 14, 2017, 12:57:15 PM
to Dataverse Users Community
Hi everyone,


On Wednesday, December 6, 2017 at 3:10:20 AM UTC-5, Laura Huisintveld wrote:
      Hi Janet and Julian,

      In our case, where the depositor is often the research data librarian, it is not necessary to include the depositor in the metadata as contributor. He/she did not contribute to the actual content of the dataset. If the depositor would not be included in the DC export, it would solve our problem, but I can imagine that others might think different about this. (Still, I would like to use Datacite as it is more specific than DC)

You could also make the argument that a depositor has contributed to the publication of the dataset, even if they haven't contributed to the content of the dataset.  Different fields have different standards though; possibly this should be something configurable to allow for that?
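
As a rough illustration of the "configurable" idea, the sketch below gates the depositor-to-contributor mapping behind a per-installation switch. Everything here is hypothetical: the DcExportConfig class, the depositor_as_contributor flag, and the dictionary layout of the dataset are invented for the example and do not correspond to existing Dataverse settings or code.

```python
from dataclasses import dataclass


@dataclass
class DcExportConfig:
    # Hypothetical switch; not an existing Dataverse setting.
    depositor_as_contributor: bool = True


def dc_contributors(dataset, config):
    """Collect the names that would be written to dc:contributor."""
    values = [contributor["name"] for contributor in dataset.get("contributors", [])]
    depositor = dataset.get("depositor")
    # Only add the depositor when the installation has opted in.
    if config.depositor_as_contributor and depositor:
        values.append(depositor)
    return values


# Invented dataset metadata, for illustration only.
dataset = {
    "contributors": [{"name": "Collector, Carl", "type": "Data Collector"}],
    "depositor": "Librarian, Lee",
}

print(dc_contributors(dataset, DcExportConfig()))
print(dc_contributors(dataset, DcExportConfig(depositor_as_contributor=False)))
```

With a switch like this, an installation whose depositors are mostly librarians could turn the mapping off, while one where depositors are usually the researchers could leave it on.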

Best,
Pete 
 

julian...@g.harvard.edu

Dec 14, 2017, 2:06:22 PM
to Dataverse Users Community
Thanks Pete. When you say that in some fields a depositor should be listed as a contributor even though she hasn't contributed to the content, would any of the "contributor types" in Dataverse's contributor field describe how else a depositor contributed?

Funder
Sponsor
Hosting Institution
Data Manager
Project Leader
Research Group
Related Person
Editor
Other
Researcher
Supervisor
Work Package Leader
Rights Holder
Project Member
Project Manager
Data Curator
Data Collector

All of these types are mapped to dc:contributor in the oai_dc export.

Pete Meyer

Dec 14, 2017, 2:37:39 PM
to Dataverse Users Community
Hi Julian,

From https://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf#%5B%7B%22num%22%3A58%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C0%2C605%2Cnull%5D (which is attempting to be a link to the appendix in the PDF, and may or may not succeed); DataManager or Editor might be closest - although I'm not sure if either is an exact match.  

I may be splitting semantic hairs between dataset and data publication too finely though (e.g. my interpretation would be that HostingInstitution makes sense as a contributor for a data publication because they're hosting it; even though the institution may not have necessarily had anything to do with the dataset itself).

Best,
Pete