DOI versioning

Philipp at UiT

Jun 1, 2017, 1:49:58 AM
to Dataverse Users Community
I just saw that Zenodo has introduced DOI versioning, cf. https://blogs.openaire.eu/?p=2010.
Is this something Dataverse is considering?

Best,
Philipp

Philip Durbin

Jun 1, 2017, 6:26:16 AM
to dataverse...@googlegroups.com
Thanks for the link to the Zenodo article. There was some lively discussion on this topic on the Force11 Software Citation Working Group* mailing list in early April under the thread "Question: Who does DOI versioning well [and doesn't rely on dot suffixing to do it]" but I'm not aware of any plans to change the way Dataverse handles DOIs in the context of versioning.

What do you and others from the community think?

Phil

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsubscribe...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/5ef3876b-9e70-4076-8b94-3d46c146ebe1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Sebastian Karcher

Jun 1, 2017, 8:45:47 AM
to dataverse...@googlegroups.com
While this is probably more important for software citation, I think having PIDs for individual versions of a dataset is basically a must going forward. How else are you going to have machine-actionable citations? Or, in the words of some obscure working paper on the topic:

Persistent identifiers for datasets must support multiple levels of granularity to support both the
citation of a specific version and/or individual dataset, as well the citation of an unspecified
version of a dataset and/or a collection of primary data.
I also think the Zenodo implementation makes exactly the right choices on how to do this: force version changes only on file changes (similar to what DV does already) and do not use suffixes.


--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Mercè Crosas

Jun 1, 2017, 8:47:56 AM
to dataverse...@googlegroups.com
Yes, of course, I agree :)


Steven McEachern

Jun 1, 2017, 11:22:03 AM
to Dataverse Users Community
Hi all,

We've had an ongoing discussion here at the Australian Data Archive (ADA) about this. I agree with Sebastian's points, particularly that the end-game here is machine-actionable citations. The DV DOI as it currently stands is NOT machine-actionable as best I can tell - or am I missing something? Although you can reference a dataset through a combination of the DOI and the version number, I think that is sub-optimal - why not just have a DOI assigned to the specific version?

It's problematic for us in terms of our longitudinal and time series datasets, where each new wave of data collected results in a new release (version) - but right now NOT a new DOI. Given Dataverse's current versioning approach, we will be implementing a policy where each release is established as a new dataset. This will allow a user to reference the specific release/version of the data - but I think the better option really would be to increment the DOI with each version.

(There is an additional complication when you think about variable metadata - if I change a label or a column name, should these trigger a new version? And a new DOI?)

ADA staff would be happy to contribute to a more detailed discussion if that would be of interest.

Regards,
Steve

-------------------------------
Dr. Steven McEachern
Director
Australian Data Archive
-------------------------------


Philip Durbin

Jun 1, 2017, 1:47:49 PM
to dataverse...@googlegroups.com
I like the idea of machine-readable citations but these would be expressed in JSON, XML, or some other standard format, right?

Dataverse already exports machine-readable citations in the following formats when you click the "Cite Dataset" button:

- EndNote XML
- RIS
- BibTeX

I'll include examples below from http://dx.doi.org/10.7910/DVN/LDJ7MS, but I will note that there is no version number in any of these machine-readable citation formats. Here they are:

## EndNote XML

<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <records>
    <record>
      <ref-type name="Online Database">45</ref-type>
      <contributors>
        <authors>
          <author>Bakshy, Eytan</author>
          <author>Messing, Solomon</author>
          <author>Adamic, Lada</author>
        </authors>
      </contributors>
      <titles>
        <title>Replication Data for: Exposure to Ideologically Diverse News and Opinion on Facebook</title>
      </titles>
      <section>2015-05-07</section>
      <dates>
        <year>2015</year>
      </dates>
      <publisher>Harvard Dataverse</publisher>
      <urls>
        <related-urls>
          <url>http://dx.doi.org/10.7910/DVN/LDJ7MS</url>
        </related-urls>
      </urls>
      <electronic-resource-num>doi/10.7910/DVN/LDJ7MS</electronic-resource-num>
    </record>
  </records>
</xml>

## RIS

Provider: Harvard Dataverse
Content: text/plain; charset="us-ascii"
TY  - DBASE
T1  - Replication Data for: Exposure to Ideologically Diverse News and Opinion on Facebook
AU  - Bakshy, Eytan
AU  - Messing, Solomon
AU  - Adamic, Lada
DO  - doi/10.7910/DVN/LDJ7MS
PY  - 2015
UR  - http://dx.doi.org/10.7910/DVN/LDJ7MS
PB  - Harvard Dataverse
ER  -

## BibTeX

@data{LDJ7MS_2015,
author = {Bakshy, Eytan and Messing, Solomon and Adamic, Lada},
publisher = {Harvard Dataverse},
title = {Replication Data for: Exposure to Ideologically Diverse News and Opinion on Facebook},
year = {2015},
doi = {10.7910/DVN/LDJ7MS},
url = {http://dx.doi.org/10.7910/DVN/LDJ7MS}
}

I guess I'm wondering if we could stick a version number in each of these as a solution. :)

To be clear, I have no knowledge of any of these citation standards, or if they are even standards, and if there's a place to put a version number.
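
For BibTeX at least, there is a plausible place for a version number: biblatex defines a `version` field, while classic BibTeX styles generally ignore fields they do not recognise. The Python sketch below shows roughly what such an entry could look like. The helper `bibtex_entry` and the version value `2.0` are hypothetical, used only for illustration; this is not how the Dataverse exporter is implemented, and the thread does not say which version of this dataset is current.

```python
def bibtex_entry(key, fields, version=None):
    """Render a BibTeX @data entry, optionally with a version field.

    Hypothetical helper for illustration; not Dataverse's exporter.
    """
    items = dict(fields)
    if version is not None:
        # Assumption: the dataset version is carried in a "version" field.
        items["version"] = version
    body = ",\n".join(f"{name} = {{{value}}}" for name, value in items.items())
    return f"@data{{{key},\n{body}\n}}"


fields = {
    "author": "Bakshy, Eytan and Messing, Solomon and Adamic, Lada",
    "publisher": "Harvard Dataverse",
    "title": "Replication Data for: Exposure to Ideologically Diverse News and Opinion on Facebook",
    "year": "2015",
    "doi": "10.7910/DVN/LDJ7MS",
    "url": "http://dx.doi.org/10.7910/DVN/LDJ7MS",
}

# "2.0" is a made-up version number, used only to show the output shape.
entry = bibtex_entry("LDJ7MS_2015", fields, version="2.0")
print(entry)
```

Whether a reference manager actually displays or round-trips that field depends on the style in use, so this addresses the export side only.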

Thanks,

Phil




Durand, Gustavo

Jun 1, 2017, 2:03:56 PM
to dataverse...@googlegroups.com
This definitely seems like a good lunch table topic for the community meeting! Now to just figure out how to clone myself, so I can be at all of them at once. :)

At the PIDapalooza conference last year, I discussed some of the questions* we were thinking about in regards to this during my presentation. I'll include that slide here.

* a new question: what do we do with the current datasets that have only one DOI tied to multiple versions and have been cited as such?

Inline image 1


Sebastian Karcher

Jun 1, 2017, 3:56:14 PM
to dataverse...@googlegroups.com
Hi all,
thanks for engaging with this; happy to talk more.
Phil -
while adding version numbers in the metadata export would be good, what I am referring to is the ability to create citations _to_ the dataset that are machine-actionable.
Imagine me writing an article in R Markdown using data saved on a Dataverse. I add a citation to a dataset using the DOI and then execute some R code on the dataset once it's downloaded [1]. For this to work, it's crucial that the DOI leads to _exactly_ the data that I used, say version 1.5.

Gustavo -
For the questions:
- as Zenodo explains convincingly, DOIs should not be semantic, i.e. not include v.1 etc. The linking of versions is in the metadata
- my view is that this should be similar to how DV already handles versioning, i.e. optional for metadata changes, mandatory for file changes
- not sure what you refer to as template identifiers
- for existing datasets, I think the right thing is to treat the current DOI as a generic dataset DOI and then create new ones for the versions.
 
Best,
Sebastian

[1] For now this requires going through the DV API, but I think it's easy to imagine a future version where the availability of the actual data files is advertised in the response header so that this could work without API calls specific to DV.
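
As a rough illustration of the "going through the DV API" route in footnote [1]: with a single DOI per dataset, a script has to carry the DOI and the version number together. The Python sketch below only builds the request URL; it assumes the Dataverse native API's `/api/datasets/:persistentId/versions/{version}/files` endpoint, and the server name and DOI are placeholders rather than a real dataset. The helper name `dataset_version_files_url` is hypothetical.

```python
from urllib.parse import quote, urlencode


def dataset_version_files_url(server, doi, version):
    """Build a URL that lists the files in one specific dataset version.

    Assumes the native API path /api/datasets/:persistentId/versions/<v>/files.
    """
    path = f"/api/datasets/:persistentId/versions/{quote(version)}/files"
    # Keep ":" and "/" readable in the DOI; other characters are escaped.
    query = urlencode({"persistentId": f"doi:{doi}"}, safe=":/")
    return f"{server.rstrip('/')}{path}?{query}"


# Placeholder server and DOI; "1.5" is the version from the example above.
url = dataset_version_files_url(
    "https://dataverse.example.edu", "10.5072/FK2/EXAMPLE", "1.5"
)
print(url)
```

With a DOI per version, the `version` argument would become unnecessary, which is essentially the machine-actionability point being made here.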


Philip Durbin

Jun 1, 2017, 7:21:49 PM
to dataverse...@googlegroups.com
That makes sense, but isn't your R code going to operate on a specific file? You'd want to operate on the CSV file in your dataset rather than your README. In that case you're targeting a file, which in Dataverse is only known by its database id.

Here's an R script I wrote a while back that downloads a specific file within a dataset based on the file's database id: https://github.com/IQSS/dataverse/blob/v4.6.1/scripts/issues/2438/download.R

In Dataverse, files are immutable so that database id for a file means the file has a specific checksum that will never change out from under you.
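
To make the file-level idea concrete, here is a minimal Python sketch (a stand-in for the linked R script, not a port of it). It assumes the Data Access API path `/api/access/datafile/<id>` and that the recorded checksum is an MD5 hex digest; the server name, file id, and file contents are placeholders, and both helper names are hypothetical.

```python
import hashlib


def datafile_url(server, file_id):
    """URL for downloading one file by its database id (Data Access API)."""
    return f"{server.rstrip('/')}/api/access/datafile/{int(file_id)}"


def matches_md5(content, expected_md5):
    """Return True if the bytes hash to the expected MD5 hex digest."""
    return hashlib.md5(content).hexdigest() == expected_md5.lower()


# Placeholder server and file id; the bytes stand in for a downloaded CSV.
url = datafile_url("https://dataverse.example.edu", 42)
content = b"id,value\n1,3.14\n"
# Stand-in for the checksum noted down when the analysis was first run.
recorded = hashlib.md5(content).hexdigest()
print(url, matches_md5(content, recorded))
```

Checking the digest before running the analysis is what turns "the file is immutable" from a promise into something a script can verify.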

I'm not trying to be contrary. I'm just trying to think through the R Markdown user story.

It sounds like people might be interested in a DOI per dataset version? If so, we need a brave soul to create a GitHub issue. :)

Thanks,

Phil







Gautier, Julian

Jun 2, 2017, 7:19:07 AM
to dataverse...@googlegroups.com
Another use case that might be obvious or not, or maybe just a different angle: If I'm a researcher trying to verify findings from an article I read, and look for the data using this citation -

Davey, Rohan, 2017, "Replication Data for: 'Title'", doi:10.7910/DVN/YP1HS9, Harvard Dataverse, V2

Will I know that that DOI points to the most current dataset version, which might not be version 2, and that I should find version 2 and download that data instead?
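
A sketch of what a tool (or a careful reader) has to do today to get from that citation to the cited version: pull out both the DOI and the trailing version, then ask for that version explicitly. The Python below assumes the dataset page accepts a `version` query parameter alongside `persistentId`, and that "V2" in a citation corresponds to version 2.0; both are assumptions, and `cited_version_url` is a hypothetical helper.

```python
import re
from urllib.parse import urlencode

# DOI after "doi:", then a trailing "V<major>" or "V<major>.<minor>".
CITATION_RE = re.compile(
    r"doi:(?P<doi>10\.\d+/[^\s,]+).*?\bV(?P<version>\d+(?:\.\d+)?)\s*$"
)


def cited_version_url(server, citation):
    """Return a dataset page URL pinned to the version named in a citation."""
    match = CITATION_RE.search(citation)
    if match is None:
        raise ValueError("no DOI and trailing version found in citation")
    version = match.group("version")
    if "." not in version:
        # Assumption: a bare major version such as "V2" means 2.0.
        version += ".0"
    query = urlencode(
        {"persistentId": "doi:" + match.group("doi"), "version": version},
        safe=":/",
    )
    return f"{server.rstrip('/')}/dataset.xhtml?{query}"


citation = 'Davey, Rohan, 2017, "Replication Data for: Title", doi:10.7910/DVN/YP1HS9, Harvard Dataverse, V2'
print(cited_version_url("https://dataverse.harvard.edu", citation))
```

The fragile part is the version suffix: if it is dropped when the citation is copied, the DOI alone leads to the latest version, which is the concern raised above.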

Also, the discussion's been mostly about the pros of versioned, non-semantic DOIs. Would one possible con be, as Steven said, that DOIs might be created for versions with changes that are insignificant? Could addressing that problem involve giving depositors better control over when a new version and DOI are created, and more information to help them decide?

Are there other cons?

And finally, if Gustavo clones himself, should each version get a different ORCID?




--
Julian Gautier
Product Research Specialist, IQSS
versionsPID.jpg

Philip Durbin

Jun 2, 2017, 7:55:44 AM
to dataverse...@googlegroups.com
Heh. Oh, there's another quote I wanted to share about per-version DOIs at Zenodo from the "software citations" issue at https://github.com/IQSS/social_science_software_toolkit/issues/5 . I hope it's good food for thought. Here it is:

^^^
To kick this thread off, an email @mercecrosas received recently about per version DOIs over at Zenodo:

Dear Julie,

Please don’t feel that you shouldn’t single out Zenodo for things that you think we should do better. We’re happy to take critical feedback!

We’re fully aware that the current method, which uses just related identifiers, is fully insufficient for doing versioning nicely. That’s also why we will, hopefully within the next two weeks, launch a completely revamped versioning system which will:

1) Mint a) concept DOI and b) per version DOI
2) Versioning will be done solely via metadata (concept DOI will link all versions in metadata, and version DOIs will link to their concept DOI - initially we use hasPart/isPartOf, but will move to isVersionOf/hasVersion once DataCite v4.1 is out). All DOIs will follow the usual pattern 10.5281/zenodo.<integer> (i.e. no versioning information in the identifier)
3) Concept DOI resolves to the latest minted version DOI.
4) Records clearly show a) if it’s not the latest version and b) all available versions of a resource (and you can easily jump between them)
5) Our “Cite as” will use the version-specific resource for citation.

I’ve included a couple of screenshots.

For impact measures we’re pretty far behind so we currently have no measures to “roll-up”, but clearly researchers don’t want to dilute their citations just because they use versioning so this is very important in my opinion as well.

Best regards,
Lars
---
Lars Holm Nielsen
CERN, IT Department
Tel: +41 22 76 79182 | Cel: +41 76 672 8927 | Twitter: @larshankat | Skype: larsholm-hankat
^^^
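
The scheme described in points 1) to 4) of the quoted email can be sketched as a small data model. This is only an illustration of the concept-DOI / version-DOI idea, not Zenodo's implementation: the class and method names are hypothetical, the DOIs are made-up placeholders that merely follow the `10.5281/zenodo.<integer>` pattern, and the relation name `isVersionOf` is the one the email says will be used once DataCite v4.1 is available.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept DOI plus the ordered version DOIs minted under it."""

    doi: str
    versions: list = field(default_factory=list)

    def mint_version(self, version_doi):
        """Register a new version; return its (minimal) metadata record."""
        # The identifier itself carries no version information;
        # the link to the concept lives only in metadata.
        self.versions.append(version_doi)
        return {"doi": version_doi, "isVersionOf": self.doi}

    def resolve(self):
        """The concept DOI resolves to the latest minted version DOI."""
        if not self.versions:
            raise LookupError("no versions minted yet")
        return self.versions[-1]

    def is_latest(self, version_doi):
        """True if the given version DOI is the most recent one."""
        return self.resolve() == version_doi


# Placeholder identifiers, not real records.
concept = Concept("10.5281/zenodo.1000")
v1 = concept.mint_version("10.5281/zenodo.1001")
v2 = concept.mint_version("10.5281/zenodo.1002")
print(concept.resolve(), concept.is_latest(v1["doi"]))
```

Citing `v1["doi"]` pins an exact version, while citing `concept.doi` follows the latest one, which covers both levels of granularity discussed earlier in the thread.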

Hope this helps,

Phil




zenodo-version.png
versionsPID.jpg

James Turitto

Mar 7, 2018, 11:59:35 AM
to Dataverse Users Community
Hi all,

I'm picking up on this thread from last summer. 

Sebastian mentioned to me last week that this is something the Dataverse is working on/considering. The roadmap indicates there will be a release for file-level PIDs and nothing else so far. I just wanted to check in because we are working out ways to implement persistent identifiers for versions on the AEA Registry. 

Thanks,
James

Philip Durbin

Mar 7, 2018, 8:56:22 PM
to dataverse...@googlegroups.com
Hi James,

I'm not sure if this helps answer your question or not but (as always) files are immutable in Dataverse so a persistent identifier (DOI or Handle) at the file level* will always identify a particular file with a particular MD5. Right now files in Dataverse are identified by their database ID but once https://github.com/IQSS/dataverse/pull/4350 is merged (and we cut a release) files will have persistent identifiers.

If this isn't what you're getting at, please let us know. I just re-read through this thread but it's getting a bit long and hard (for me) to follow. Please do feel free to start a new thread if you feel like it will help explain your point of view and use case.

Thanks,

Phil

