File DOIs

Durand, Gustavo

Sep 28, 2017, 2:09:48 PM

to dataverse...@googlegroups.com

Hi all,

In the past year, the Dataverse team has been involved with several other groups in preparing guidelines for data citations. A preprint of the paper, "A Data Citation Roadmap for Scholarly Data Repositories", is available at:

* https://www.biorxiv.org/content/early/2016/12/28/097196

Dataverse already adheres to most of the required guidelines, and in order to better support guideline #2, "Persistent identifiers for datasets must support multiple levels of granularity, where appropriate", we are working on adding individual DOIs (or other persistent identifiers) for files. You can follow along with the progress (and add feedback) at:

https://github.com/IQSS/dataverse/issues/2438

Please let us know if you think this would be a good point for discussion for a future community call.

Thanks,

Gustavo

Durand, Gustavo

Dec 15, 2017, 5:32:57 PM

to dataverse...@googlegroups.com

Hi all,

We have a follow-up question for the community on this and would appreciate any feedback. As a reminder, the GitHub issue is:

https://github.com/IQSS/dataverse/issues/2438

As we work to complete the functionality for file DOIs, we are trying to decide what exactly persistent identifiers for files will look like. In particular, this discussion is about the local part of the identifier, so for a DOI like:

http://dx.doi.org/10.5072/FK2/BXOJPJ

we are referring to what comes after the shoulder "FK2/", i.e. "BXOJPJ".
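To make the terminology concrete, here is a minimal Python sketch that splits the example DOI into its parts. The function name `split_doi` is hypothetical and is not part of Dataverse; the shoulder has to be passed in because it is an installation setting and cannot be inferred from the DOI itself.

```python
from urllib.parse import urlparse


def split_doi(url, shoulder):
    """Split a DOI URL into (authority prefix, shoulder, local part).

    Illustrative helper only; `shoulder` must be supplied by the caller.
    """
    path = urlparse(url).path.lstrip("/")        # e.g. "10.5072/FK2/BXOJPJ"
    authority, _, rest = path.partition("/")     # "10.5072" and "FK2/BXOJPJ"
    if not rest.startswith(shoulder):
        raise ValueError(f"{url!r} does not use shoulder {shoulder!r}")
    return authority, shoulder, rest[len(shoulder):]


print(split_doi("http://dx.doi.org/10.5072/FK2/BXOJPJ", "FK2/"))
# ('10.5072', 'FK2/', 'BXOJPJ')
```

The last element of the tuple is the part this discussion is about.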

There has already been some discussion in the issue about whether file DOIs should be completely arbitrary or should include the dataset DOI's identifier. Including it would be purely for human readability and usability: internally the DOI would be stored in the same manner either way, and the system would not infer any meaning from it. There have been good arguments for both, so rather than limit Dataverse to one option, we plan to make this configurable.

We currently support a configuration option that controls how the DOIs used by datasets are generated:

* randomString, e.g. BXOJPJ, GKSTMU, MESSI1
* sequentialNumber, e.g. 10001, 10002, 10003

Our plan is to add another configuration option for file DOIs that controls whether they are dependent on or independent of the dataset DOI. Combining this with the existing setting, you would have four cases for files:

* dependent randomString, e.g. BXOJPJ/RHBISG
* dependent sequentialNumber, e.g. 10001/1
* independent randomString, e.g. RHBISG
* independent sequentialNumber, e.g. 10002
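The four cases can be sketched in a few lines of Python. This is only an illustration of the proposal as described here, not the Dataverse implementation: the class name, the starting value of 10001, the six-character alphabet, and the per-dataset file counter are all assumptions, and a real system would also need to check random strings for collisions.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits  # assumed character set


class IdentifierGenerator:
    """Hypothetical generator for dataset and file identifiers."""

    def __init__(self, style, file_format, start=10001, seed=None):
        if style not in ("randomString", "sequentialNumber"):
            raise ValueError(f"unknown style: {style}")
        if file_format not in ("dependent", "independent"):
            raise ValueError(f"unknown file format: {file_format}")
        self.style = style
        self.file_format = file_format
        self._rng = random.Random(seed)
        self._next_number = start        # counter shared by independent IDs
        self._files_in_dataset = {}      # per-dataset counter for dependent IDs

    def _new_local(self):
        if self.style == "randomString":
            return "".join(self._rng.choice(ALPHABET) for _ in range(6))
        value = self._next_number
        self._next_number += 1
        return str(value)

    def dataset_id(self):
        return self._new_local()

    def file_id(self, dataset_id):
        if self.file_format == "independent":
            return self._new_local()
        if self.style == "randomString":
            return f"{dataset_id}/{self._new_local()}"
        # dependent sequentialNumber: number the files within the dataset
        count = self._files_in_dataset.get(dataset_id, 0) + 1
        self._files_in_dataset[dataset_id] = count
        return f"{dataset_id}/{count}"


gen = IdentifierGenerator("sequentialNumber", "dependent")
ds = gen.dataset_id()
print(ds, gen.file_id(ds), gen.file_id(ds))  # 10001 10001/1 10001/2
```

Switching the second argument to "independent" makes the file identifiers draw from the same counter as the datasets, which is the case the first question below asks about.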

So some questions are:

* We know of use cases for the first three (Harvard, SBGrid, and QDR, respectively). Does anyone see a use case for the last choice? Is there any reason not to allow this particular combination? (It has been pointed out that it might be confusing or messy to have one dataset DOI be 10001 and the next be 10010 because the first dataset's files used up identifiers 10002 through 10009.)
* Are there any other formats that we should be handling that are not supported by the above?
* Any other thoughts on any of this? :)
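The gap described in the first question can be reproduced with a short Python sketch. It assumes, as in the example, that datasets and files draw from one shared counter starting at 10001; the function name is hypothetical.

```python
from itertools import count


def dataset_ids_with_shared_counter(file_counts, start=10001):
    """Return the dataset identifiers assigned when each dataset's files
    take their identifiers from the same counter as the datasets."""
    counter = count(start)
    dataset_ids = []
    for n_files in file_counts:
        dataset_ids.append(next(counter))   # the dataset itself
        for _ in range(n_files):            # its files consume the next IDs
            next(counter)
    return dataset_ids


# First dataset has 8 files, second has none, third has 3.
print(dataset_ids_with_shared_counter([8, 0, 3]))  # [10001, 10010, 10011]
```

With eight files in the first dataset, the second dataset lands on 10010, so consecutive datasets no longer have consecutive identifiers.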

Thanks,

Gustavo

Philip Durbin

Dec 20, 2017, 3:41:59 PM

to dataverse...@googlegroups.com

I don't know if this helps or not but Derek and I have been attempting
to document the four possible configuration options for persistent
identifiers for files that we are proposing would be available. The
draft we're working on is here (:IdentifierGenerationStyle and
:DataFilePIDFormat sections):
https://github.com/IQSS/dataverse/blob/895a0aa61b6139881a08139880c4cbed7fbdb61a/doc/sphinx-guides/source/installation/config.rst#id176

If I could fit it all in a single screenshot, I would. :)

Questions and feedback on this email thread are welcome, as always!

Phil

--
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin

Sherry Lake

Jan 2, 2018, 11:44:53 AM

to Dataverse Users Community

Is there a use case for the file DOI to have a dependent sequentialNumber appended to the dataset's randomString?

If the dataset DOI has the random string BKOED, could the file DOI be BKOED/10001, so that the dataset DOI is always the first part of the file DOI?