adding datasets with existing DOIs?

Aaron Curtis

Jun 11, 2020, 9:11:10 PM
to Dataverse Users Community
Hello,

My institution would like to use our Dataverse instance to host documents that we've already published and that therefore already have DOIs. I couldn't find an easy way to do this in the web UI. How would you recommend doing it? Additionally, we have a large number of papers as PDFs that we'd like to add automatically. Is this a use case that Dataverse supports?

Thanks,
Aaron Curtis

Sebastian Karcher

Jun 12, 2020, 10:03:10 AM
to Dataverse Users Community
Hi Aaron,

See the API documentation here: http://guides.dataverse.org/en/latest/api/native-api.html#import-a-dataset-into-a-dataverse on how to import with existing PIDs.
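For readers following along, here is a minimal sketch of what that import call might look like from Python, based on the `:import` endpoint described in the linked guide. The server URL, dataverse alias, PID, and token in the usage note are placeholders, and `build_import_request` is a hypothetical helper name, not part of any library. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import urllib.parse
import urllib.request


def build_import_request(server_url, dataverse_alias, pid, api_token,
                         json_bytes, release=False):
    """Build (but do not send) a POST request for the native API ':import' endpoint.

    json_bytes is the dataset description in Dataverse's native JSON format.
    """
    # The existing persistent identifier is passed as the 'pid' query parameter.
    query = urllib.parse.urlencode(
        {"pid": pid, "release": "yes" if release else "no"}
    )
    url = (
        f"{server_url.rstrip('/')}/api/dataverses/"
        f"{dataverse_alias}/datasets/:import?{query}"
    )
    return urllib.request.Request(
        url,
        data=json_bytes,
        method="POST",
        headers={
            "X-Dataverse-key": api_token,  # API token of the importing account
            "Content-Type": "application/json",
        },
    )
```

To actually send it, something like `urllib.request.urlopen(build_import_request("https://dataverse.example.edu", "mylib", "doi:10.1234/ABC", token, payload))` should work, assuming the account behind the token has permission to import; check the guide for the exact requirements of your version.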

FWIW, when we migrated this wasn't available and we used the procedure described here: https://groups.google.com/d/msg/dataverse-community/j1D8Fy9FJbc/ZiR3jsuCCQAJ

Could you say more about what you mean by "adding papers automatically"? I'm not clear what you're after here (though I'll note that, generally speaking, Dataverse is really focused on archiving data, so features that relate to archiving research papers, preprints, etc. don't really exist the way they might in a more generalist platform like EPrints or DSpace).

Sebastian

Aaron Curtis

Jun 16, 2020, 7:04:27 PM
to Dataverse Users Community
Great, thanks, I was able to upload with an existing DOI using the API.

What I really meant by "automatically" is this: ideally, I would give Dataverse a folder of PDFs of published papers, and it would fetch or create the metadata it needs (authors, abstract, etc.) and add each paper as a dataset. Zotero can do this... but I guess it isn't really a common use case for Dataverse.

If Dataverse cannot fetch or extract the metadata for me, is there a way to import metadata into Dataverse from a format like BibTeX or EndNote? Or, similarly, can I convert a BibTeX file into a Dataverse .json or DDI file to use with the API?

Philip Durbin

Jun 17, 2020, 4:11:39 PM
to dataverse...@googlegroups.com
First of all, you're already talking to a Zotero expert (Sebastian), so he might know how Zotero does that PDF trick. I suppose enough metadata is squirreled away in the PDF somewhere. Extracting that metadata isn't a trick Dataverse knows, but you're welcome to create an issue about it at https://github.com/IQSS/dataverse/issues . Or maybe Zotero could be used in a pipeline that ends with deposit into Dataverse? And Sebastian is right. It's fine to put PDFs in Dataverse but it's oriented toward data.

For the converter question, I just attempted to answer it at https://groups.google.com/d/msg/dataverse-community/ujJFzPodTJk/_wqObbh5BQAJ
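As a rough illustration of the first half of such a converter (not the approach from the linked thread), the sketch below pulls the fields out of a single BibTeX entry using only the standard library. `parse_bibtex_entry` and `split_authors` are hypothetical helper names, and the regular expression only handles simple `key = {value}` or `key = "value"` fields with at most one level of nested braces; a real BibTeX parser would be more robust. The resulting values would still need to be mapped onto the `typeName` fields of Dataverse's native JSON.

```python
import re

# Matches  key = {value}  (one level of nested braces allowed) or  key = "value".
FIELD_RE = re.compile(
    r'(\w+)\s*=\s*(?:\{((?:[^{}]|\{[^{}]*\})*)\}|"([^"]*)")', re.S
)


def parse_bibtex_entry(entry):
    """Return a dict of lower-cased field names to cleaned-up values."""
    fields = {}
    for key, braced, quoted in FIELD_RE.findall(entry):
        value = braced or quoted
        value = re.sub(r"[{}]", "", value)  # drop protective inner braces
        fields[key.lower()] = " ".join(value.split())  # collapse whitespace
    return fields


def split_authors(author_field):
    """BibTeX separates authors with the word 'and'."""
    return [name.strip() for name in author_field.split(" and ") if name.strip()]
```

For example, `split_authors(parse_bibtex_entry(entry)["author"])` yields one name per author, which is roughly the shape the native JSON author field expects.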

It would be nice for Dataverse to support additional formats for creating datasets beyond DDI and its native JSON format. Please feel free to create issues for these too. One per format, please, because we like to work in small chunks.

Finally, I don't know if you're attending #dataverse2020 but there are two sessions about flexible metadata on Thursday and Friday if you're interested: https://projects.iq.harvard.edu/dcm2020

I hope this helps,

Phil




Julian Gautier

Jun 17, 2020, 4:39:06 PM
to Dataverse Users Community
Hi Aaron,

Have you considered changing the metadata exports for each article published? One issue with using Dataverse as it is for publishing research objects that aren't data is that the dataset metadata that Dataverse publishes will be indexed in places that are expecting data, like Google Dataset Search, and won't be indexed in places that index articles, like Google Scholar. You could edit some of the metadata exports your installation produces (like the DataCite, Schema.org and Dublin Core exports) so that they indicate that what's being published are articles, and not include the DDI export since it describes data and isn't meant to describe articles. I imagine this would be a fork you'd have to maintain, since I don't think Dataverse has plans to support the publication of research articles.

Aaron Curtis

Jun 26, 2020, 2:14:38 PM
to Dataverse Users Community
Thanks for the replies, all. I ended up writing a Python script using pyDataverse that:
  • extracts the DOI from the PDF
  • queries doi.org to get the author, title, etc.
  • translates those fields into Dataverse native metadata
  • uploads using the API
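
The steps above could be sketched roughly as follows. This is not the script from this thread; it is a minimal standard-library illustration under several assumptions: the PDF text has already been extracted by some PDF library, doi.org returns CSL JSON through content negotiation, and the function names (`find_doi`, `fetch_csl_json`, `csl_to_native`) are hypothetical. Only the title and author fields are mapped; a real deposit would also need the other required citation fields, such as contact, description, and subject.

```python
import json
import re
import urllib.request

# Common pattern for modern DOIs; it will miss some unusual ones.
DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-like string in text extracted from a PDF, or None."""
    match = DOI_RE.search(text)
    # Trim punctuation that often trails a DOI at the end of a sentence.
    return match.group(0).rstrip(".,;)") if match else None


def fetch_csl_json(doi):
    """Ask doi.org for CSL JSON metadata via content negotiation (needs network)."""
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def primitive(type_name, value):
    """Build one single-valued primitive field in Dataverse native JSON."""
    return {"typeName": type_name, "multiple": False,
            "typeClass": "primitive", "value": value}


def csl_to_native(csl):
    """Map CSL JSON title and authors onto a partial native JSON citation block."""
    authors = [
        {"authorName": primitive(
            "authorName",
            f"{a.get('family', '')}, {a.get('given', '')}".strip(", "))}
        for a in csl.get("author", [])
    ]
    fields = [
        primitive("title", csl.get("title", "")),
        {"typeName": "author", "multiple": True,
         "typeClass": "compound", "value": authors},
    ]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}
```

The dictionary returned by `csl_to_native(fetch_csl_json(find_doi(text)))` can then be completed with the remaining required fields, serialized with `json.dumps`, and uploaded through the API.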
One point of clarification is that we're looking at using Dataverse internally, without exposing it to the internet, at least at first.

We are exploring use cases including an internal pre-publication server, data hosting for experiments, and reproducibility work (I was very interested in the paper by Trisovic et al. 2020).

Aaron

Stefan Kasberger

Jul 14, 2020, 6:59:26 PM
to Dataverse Users Community
Hi,

I am the developer of pyDataverse, so it's great to hear that you used it.

Do you have the script somewhere online, or can you share it with me? It would be interesting to see whether some parts (especially the doi.org parsing/mapping) would make sense for a future pyDataverse release.

Regards, Stefan

Aaron Curtis

Jul 22, 2020, 3:31:51 PM
to Dataverse Users Community
Thanks for your work on pyDataverse! It's just a hacky script but I put it up as a gist: https://gist.github.com/foobarbecue/4738be626392855ef92541e479c7d0c8