OpenAIRE compliancy for Dataverse


Bollini Andrea

Mar 2, 2018, 12:10:42 PM
to dataverse...@googlegroups.com
Dear all,
I’m glad to announce that our proposal to enhance the interoperability
of several open source platforms has been selected for funding by
OpenAIRE, see
https://www.4science.it/en/2018/02/23/4science-awarded-by-openaire/
Our proposal includes the implementation of the Data Repository
Guidelines in Dataverse, more specifically support for the DataCite
schema 4.1, so that Dataverse is ready for the new version of the
guidelines, which is expected soon.
We have already found an issue on GitHub where this is being discussed

https://github.com/IQSS/dataverse/issues/4257

and posted a comment. Philip suggested also sharing this news on the
mailing list to raise more awareness.

I will be happy to contribute to developing a general solution that works
for everyone and that can hopefully be included by default in a future
Dataverse version, so please get in touch if you have any local
customization or plan related to this topic!

Andrea


--
Andrea Bollini
Chief Technology and Innovation Officer

4Science, www.4science.it
office: Via Edoardo D'Onofrio 304, 00155 Roma, Italy
mobile: +39 333 934 1808
skype: a.bollini
linkedin: andreabollini
orcid: 0000-0002-9029-1854

an Itway Group Company
Italy, France, Spain, Portugal, Greece, Turkey, Lebanon, Qatar, U.A.Emirates


Mercè Crosas

Mar 2, 2018, 4:24:42 PM
to dataverse...@googlegroups.com
This is great news, Andrea!

We will definitely coordinate with you to include your contributions to the main Dataverse code.

Would you be interested in writing (together with us) a news story or blog post about this success?

Merce


--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/fa0f9621-8e98-5edd-86b9-4fd1dbb4b005%404science.it.
For more options, visit https://groups.google.com/d/optout.

Bollini Andrea

Mar 9, 2018, 3:14:14 AM
to dataverse...@googlegroups.com, Mercè Crosas

Hi Merce,

yes, we will be happy to co-write a post on that. I guess it will be more valuable once we have some concrete outcomes, or at least some work in progress, to share.

This is something that we expect in early April, do you agree?

Andrea


julian...@g.harvard.edu

Jun 5, 2018, 3:27:02 PM
to Dataverse Users Community
Hi everyone,

I'm hoping we can use this thread to make sure we all agree on the scope of issue 4275, "As an installation admin, I want my repository to export OpenAIRE-compliant metadata to improve discoverability, reusability of research data", and on how it relates to other issues.

Scope of 4275

I think I left the title of the issue vague so that we could fill in the details later, but it left some room for misinterpretation! As Juan pointed out, OpenAIRE compliancy means OpenAIRE being able to harvest OpenAIRE-compliant metadata from the repository. While Dataverse exposing OpenAIRE-compliant metadata in the UI, in that dropdown on the metadata tab, is a bonus, the real value is in being able to share metadata about data in the way OpenAIRE is requiring, by using OAI-PMH to harvest OpenAIRE-compliant metadata. Do we all agree?

If so I'll keep in the issue description: "The definition of done for this issue will be a Dataverse admin being able to have OpenAIRE harvest OpenAIRE-compliant metadata from her installation."


What does success for 4275 mean?

When I looked at the PR from our colleagues at 4Science, it seemed that it includes the ability to create OAI-PMH sets with the OpenAIRE-compliant metadata format. So to me the PR fulfills the scope of 4275. To test, can we use the OpenAIRE validator at https://www.openaire.eu/validator/welcome to make sure this works?
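For anyone who wants to see what "OpenAIRE harvesting over OAI-PMH" means at the request level, here is a minimal sketch. It only builds a standard OAI-PMH ListRecords URL. The base URL is made up, and the `oai_datacite` prefix and `openaire_data` set name are assumptions that should be checked against the OpenAIRE data guidelines and against whatever the PR actually registers.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    'verb', 'metadataPrefix' and 'set' are standard OAI-PMH arguments;
    the values passed in below are assumptions, not taken from the PR.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical installation URL, prefix and set name, for illustration only.
print(list_records_url("https://demo.example.org/oai", "oai_datacite", "openaire_data"))
```

Fetching that URL from an installation and inspecting the records it returns is roughly what a harvester such as OpenAIRE would do first.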

Andrea, you wrote that
"we have included the implementation of the Data Repository Guidelines in Dataverse, more specifically the support for the datacite schema 4.1, to be ready for the new version of the guidelines that are expected soon."
This sounds like OpenAIRE's metadata requirements, which are based on DataCite 3.1, are going to change to be in line with the changes made in the DataCite 4.1 metadata schema. So the metadata that this PR produces should validate against the DataCite 4.1 schema. Is that right? To test, can we just make sure that the xml validates against the DataCite 4.1 schema?
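As a rough illustration of a first check on an export, the sketch below parses a record and reports which mandatory DataCite 4 top-level properties are absent. It is a sanity check only and not a substitute for validating against the 4.1 XSD with a schema-aware tool. The helper name and the sample record are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by DataCite kernel-4 records.
DATACITE4_NS = "http://datacite.org/schema/kernel-4"

# Top-level elements that are mandatory in DataCite 4.x.
MANDATORY = ["identifier", "creators", "titles", "publisher",
             "publicationYear", "resourceType"]


def missing_mandatory(xml_text):
    """Return the mandatory top-level elements missing from a record.

    Hypothetical helper: it checks presence only, not content or order.
    """
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{DATACITE4_NS}}}resource":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [name for name in MANDATORY
            if root.find(f"{{{DATACITE4_NS}}}{name}") is None]


# Made-up, deliberately incomplete record.
sample = """<resource xmlns="http://datacite.org/schema/kernel-4">
  <identifier identifierType="DOI">10.5072/FK2/EXAMPLE</identifier>
  <titles><title>Example dataset</title></titles>
</resource>"""

print(missing_mandatory(sample))
```

An empty list only means the mandatory elements are present; attribute values and controlled vocabularies still need the real schema validation.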


Relationship to other issues

Issue 3697: Download DataCite metadata through UI:
- Since the PR includes a metadata dropdown, I think this satisfies this issue, especially if it produces xml that is valid. And I think it's already been tested. (Users are able to download metadata in that format.)

Issue 4318: Let other repositories harvest DataCite metadata from Dataverse repositories:
- Since the PR allows Dataverse admins to create harvesting sets with DataCite metadata, does it satisfy this issue? If the metadata should validate against the DataCite 4.1 schema, I think it does. Would success mean that a non-Dataverse repository can harvest OpenAIRE-compliant metadata from a Dataverse repository? (I think Laura, who opened this issue for DANS - DataverseNL, will be at the Community Meeting next week, perfect time to follow up.)

Here's a crude diagram of how I picture this PR satisfying these three issues:

[diagram not shown]

Thanks,
Julian

Juan Corrales

Jun 6, 2018, 1:43:27 AM
to Dataverse Users Community
Thanks Julian,
  I have thought a little about these issues.
  • "the real value is in being able to share metadata about data in the way OpenAIRE is requiring, by using OAI-PMH to harvest OpenAIRE-compliant metadata. Do we all agree?" I do.
  • "To test, can we use the OpenAIRE validator at https://www.openaire.eu/validator/welcome to make sure this works?" The OpenAIRE validator works with DataCite schema 3, not with DataCite schema 4.1. I do not think there is a validator for the new schema yet.
  • "the metadata that this PR produces should validate against the DataCite 4.1 schema. Is that right? To test, can we just make sure that the xml validates against the DataCite 4.1 scheme?" I have validated the results of my tests with jhove, but I do not know whether a general validation is possible. There is another problem: the new OpenAIRE guidelines are still in draft, and the guidelines (http://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/latest/application_profile.html) and the xsd file (http://schema.datacite.org/meta/kernel-4.1/metadata.xsd) have some different requirements.
  • "Since the PR includes a metadata dropdown, I think this satisfies this issue, especially if it produces xml that is valid. And I think it's already been tested. (Users are able to download metadata in that format.)" I think so.
  • "Would success mean that a non-Dataverse repository can harvest OpenAIRE-compliant metadata from a Dataverse repository?" It will be true once the new metadata guidelines are approved. I am not sure that these guidelines are backward compatible.
Juan

Mercè Crosas

Jun 6, 2018, 8:57:24 AM
to dataverse...@googlegroups.com
I'm all for supporting the DataCite metadata schema. There is a simple mapping some of us have worked on from schema.org (which we support) to DataCite and other standards (it might not use the right version of the DataCite schema). See:


Could we discuss this at the Dataverse Community Meeting?

Merce


----------
Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, IQSS, Harvard University


julian...@g.harvard.edu

Jun 8, 2018, 1:38:37 PM
to Dataverse Users Community
Perfect, thanks Merce. This PR is in line with the mapping in your article. And fortunately it doesn't look like the differences between the DataCite schemas affect that mapping.

Discussing this during the Dataverse Community Meeting would be great. Or maybe before? Danny announced a pre-conference meeting at IQSS (in the building across from the conference location) and posted an agenda whose topics I think overlap a good deal with this OpenAIRE discussion and work. Maybe we could include this discussion in those meetings and host a Zoom call for those interested who won't be in Boston?

julian...@g.harvard.edu

Jun 18, 2018, 4:33:15 PM
to Dataverse Users Community
Thanks, Juan! There wasn't a group discussion about this during the Community Meeting, so I'm hoping we can continue here.
  • "To test, can we use the OpenAIRE validator at https://www.openaire.eu/validator/welcome to make sure this works?" The OpenAIRE validator works with DataCite schema 3, not with DataCite schema 4.1. I do not think there is a validator for the new schema yet.

Even though the validator will not work with the DataCite 4.1 schema, will OpenAIRE still be able to harvest from Dataverse?

You've pointed to the draft guidelines for literature repositories at http://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/latest/application_profile.html. I've been referring to the guidelines for Data Archives at https://guidelines.openaire.eu/en/latest/data/index.html, but I can't find their latest draft version. I just want to make sure that using the guidelines for literature repositories is okay, since there are some differences.

I've copied from the github issue my four metadata mapping concerns (so I don't confuse myself by discussing in two places):
    • But maybe it's best to try following how closedAccess and restrictedAccess are used by Zenodo (which doesn't use closedAccess as "toll gated"):
      • If any of the files in the dataset are restricted and the option to request access is enabled (people are allowed to request access), the dataset is restrictedAccess
      • If any of the files in the dataset are restricted and the option to request access is disabled, the dataset is closedAccess
 
  • Dataverse's "language" field and OpenAIRE's "language" property should define the language of the resource (i.e. dataset). But here the Dataverse language field is being used to populate the xml language attributes of certain OpenAIRE fields:
    • <title xml:lang="English">Historical Climate Model Output Of Echam5-Wiso From 1871-2011 At T106 Resolution</title>
    • When a Dataverse depositor chooses English, she's saying that the dataset is in English. But this PR uses English to describe the language of the metadata as well, which isn't always true.
    • Could the xml attributes not be used for now?
  • It looks like the nameType attribute is always set to "Personal" (as opposed to "Organizational") if there's a comma in the entry. But a comma doesn't guarantee that the entry is a personal name. nameType isn't mandatory. Can it be removed?

  • There might be an issue with the funder property and Dataverse's Grant Information field. I haven't had a chance to explore how the funder information in Dataverse's Grant Information fields is mapped to OpenAIRE.
Juan wrote: The funder information section does not have a complete definition in the OpenAIRE guidelines (which sections are mandatory): "Grant Agency" is mapped to "funderName" and "Grant Number" to "awardNumber", and the result validates against the xsd.

Unfortunately, Dataverse has two different compound fields for funding information: the "Grant Information" fields Juan is referring to, and the Contributor field when it is used with Dataverse's contributor type called "Funder". The Funder contributor type was used in DataCite 3.1 but removed in the newer DataCite schemas in favor of their funding properties. I'm going to propose in a separate GitHub issue that Dataverse remove the Funder contributor type and move those funder names to the Grant Agency field.

But the last time I checked the PR, when the contributor type is Funder, the xml is invalid (again because DataCite removed "Funder" contributorType). So I propose either:
  • When the Dataverse contributor type is funder, the name should be mapped to Grant Agency or
  • When the Dataverse contributor type is funder, it's not mapped to anything in the DataCite metadata
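To make two of the proposals above concrete, here is a small sketch of the Zenodo-style access-rights rule and of the first funder option (moving Funder contributors into the funding properties). The function names and the dictionary shapes are hypothetical and are not the PR's actual code. Returning openAccess when no files are restricted is an assumption that the thread does not state, and embargoes are ignored.

```python
def access_rights(has_restricted_files, request_access_enabled):
    """Map a dataset's file restrictions to an OpenAIRE access term.

    Assumption: a dataset without restricted files is openAccess.
    """
    if not has_restricted_files:
        return "openAccess"
    return "restrictedAccess" if request_access_enabled else "closedAccess"


def split_funders(contributors, grants):
    """Move 'Funder' contributors into DataCite-style funding references.

    contributors: list of {"type": ..., "name": ...} (hypothetical shape)
    grants:       list of {"agency": ..., "number": ...} (hypothetical shape)
    Returns (remaining_contributors, funding_references).
    """
    # Grant Agency -> funderName, Grant Number -> awardNumber.
    funding_refs = [{"funderName": g["agency"], "awardNumber": g.get("number")}
                    for g in grants]
    remaining = []
    for contributor in contributors:
        if contributor["type"] == "Funder":
            # Emit a funding reference instead of an invalid contributorType.
            funding_refs.append({"funderName": contributor["name"],
                                 "awardNumber": None})
        else:
            remaining.append(contributor)
    return remaining, funding_refs


# Made-up example data.
others, funding = split_funders(
    [{"type": "Funder", "name": "Example Foundation"},
     {"type": "Editor", "name": "Doe, Jane"}],
    [{"agency": "Example Agency", "number": "ABC-123"}],
)
print(access_rights(True, False), others, funding)
```

The second option, dropping Funder contributors entirely, would be the same loop without the `funding_refs.append` call.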

Thanks!
Julian

Juan Corrales

Jun 18, 2018, 5:48:08 PM
to dataverse...@googlegroups.com
Hi Julian!

On Mon, Jun 18, 2018 at 22:33, <julian...@g.harvard.edu> wrote:
Thanks, Juan! There wasn't a group discussion about this during the Community Meeting, so I'm hoping we can continue here.
  • To test, can we use the OpenAIRE validator at https://www.openaire.eu/validator/welcome to make sure this works? OpenAIRE validator works with Datacite schema 3, not with datacite schema 4.1. I think that there are not a validator for new schema.

Even though the validator will not work with the DataCite 4.1 schema, will OpenAIRE still be able to harvest from Dataverse?

I think not yet. Andrea, do you have more information? Otherwise we can ask Pedro Prince about the roadmap for implementing the new guidelines.
 

You've pointed to the draft guidelines for literature repositories at http://openaire-guidelines-for-literature-repository-managers.readthedocs.io/en/latest/application_profile.html. But I've been referring to the guidelines for Data Archives here at https://guidelines.openaire.eu/en/latest/data/index.html, but I can't find its latest draft version. I just want to make sure that using the guidelines for the literature repositories is okay. There are some differences.

I am sorry, I had seen the error but had not written to you yet. I have written to Pedro Prince to ask whether the data guidelines will be updated. He responded that only the schema will be updated, not the guidelines. He hopes to have the new literature guidelines in production this month and the data guidelines "a little later". The data guidelines will apply to Dataverse; the literature guidelines will not.
 

Good observation. Thanks Julian.
 

Philipp at UiT

Feb 21, 2019, 1:56:41 AM
to Dataverse Users Community
Is there any news on the work on OpenAIRE compliance for Dataverse? We recently discovered that funder information is not exported (correctly) via OAI-PMH. I guess OpenAIRE compliance would solve all such metadata issues.

Best, Philipp

danny...@g.harvard.edu

Feb 21, 2019, 5:52:55 PM
to Dataverse Users Community
Hi Philipp, there hasn't been progress on this as far as I know. The PR would need to be updated from the develop branch, and a few minor fixes would need to be made. Is there anyone out there in the community interested in taking this to completion? :)

Philipp at UiT

Feb 21, 2019, 11:38:21 PM
to Dataverse Users Community
Hi Danny,

Thanks for this heads-up. We are participating in an EU project called SSHOC: An Open Cloud for Social Sciences and Humanities. The task I'm involved in is about establishing a European installation of Dataverse for institutions that do not have the resources to run their own instances. I have raised the issue of Dataverse compliance with OpenAIRE with the task leader. Hopefully, we can include this in the task deliverables.

Best, Philipp