Feb 6 community call


Philip Durbin

Feb 1, 2024, 2:12:39 PM
to dataverse...@googlegroups.com
The Feb 6 community call will feature a presentation by Jorrit Poelen on how Preston, a biodiversity dataset tracker, was recently integrated with Dataverse.

In addition, we'll very likely talk about a patch we released for Dataverse 6.1, the upcoming community meeting in Mexico, and whatever topics you bring!

You can find information on how to join at https://dataverse.org/community-calls

Hope you can make it!

Phil

Philip Durbin

Feb 5, 2024, 4:04:24 PM
to dataverse...@googlegroups.com
Just a quick reminder about the community call tomorrow. See you there! https://dataverse.org/community-calls

Philip Durbin

Feb 6, 2024, 12:05:30 PM
to dataverse...@googlegroups.com
Hi all,

Wow, very interesting community call today. Some philosophy and not just tech. Thank you to Jorrit Poelen for presenting "A Dataverse Beyond The Internet" and to all of you for the stimulating discussion afterwards. It's now on https://dataverse.org/dataversetv and here are the direct links:


I wasn't very good about keeping time so we didn't discuss Mexico or patches for 6.1. For now, links are in the notes: https://docs.google.com/document/d/1t0eY4mh2f2aH6yhnzfyXF9J05yUgr8A5aMDIMyuae80/edit?usp=sharing

I also neglected to mention that we won't have a call in March because of the community meeting in Mexico.

If you have an idea for April, please get in touch!

Thanks,

Phil


Jorrit Poelen

Feb 6, 2024, 5:13:02 PM
to Dataverse Users Community
Hey Phil and friends,

Thanks again for having me at the DataVerse Community call today.

In an effort to make our exchange citable, I've packaged and signed the various slide format versions (e.g., pptx, pdf, md, html) and talk recording (mp4) in:

> Poelen, J. H. (2024, February 6). A DataVerse Beyond the Internet hash://md5/e34b50213fc407892d0810dabd742b1f. Zenodo. https://doi.org/10.5281/zenodo.10626561

Please feel free to contact me if you'd like to continue the conversation, collaborate, or ask questions.

-jorrit

PS Also, I noticed that our "Harvard Kitty" with recommended citation of:
 
Joshua Carp, 2014, “cat.jpg”, CarpTest, https://doi.org/10.7910/DVN/24358/N4FCVS, Harvard Dataverse, V1

appears to no longer live on the dataverse: the DOI no longer produces a landing page but a 404 not found (see below) . . . Luckily it lives on elsewhere, on GitHub, Zenodo, etc., and can be retrieved via the signed citation:

Joshua Carp, 2014, “cat.jpg”, CarpTest, https://doi.org/10.7910/DVN/24358/N4FCVS, Harvard Dataverse, V1 hash://md5/7d62417b5b689ed91dcd25f10c9c2132


I wonder why the cat got ejected from the DataVerse? Perhaps the kitty misbehaved?
 
$ curl --silent -IL https://doi.org/10.7910/DVN/24358/N4FCVS | tail -n8
HTTP/2 404
date: Tue, 06 Feb 2024 21:21:43 GMT
content-type: application/xhtml+xml;charset=UTF-8
set-cookie: AWSALB=tG97jRfUaEkkDTHKZqyMLWbQN1Js7iUSjlppef1qwLVFm4FiIaj36h8BRs4hgnq4BNnaajPeutTxo8lQWddRNqzz7t16Bg0KlfHwrqC266iJ1rce2MRxEGPXuSo9; Expires=Tue, 13 Feb 2024 21:21:43 GMT; Path=/
set-cookie: AWSALBCORS=tG97jRfUaEkkDTHKZqyMLWbQN1Js7iUSjlppef1qwLVFm4FiIaj36h8BRs4hgnq4BNnaajPeutTxo8lQWddRNqzz7t16Bg0KlfHwrqC266iJ1rce2MRxEGPXuSo9; Expires=Tue, 13 Feb 2024 21:21:43 GMT; Path=/; SameSite=None; Secure
server: Apache
set-cookie: JSESSIONID=04c5855be33793403560d563dac6; Path=/; Secure
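
For anyone who wants to repeat this check in a script, here is a minimal sketch in Python. It only parses a saved `curl -IL` header dump like the one above and makes no network calls; the function names `status_chain` and `resolves` are made up for illustration and are not part of curl or Dataverse.

```python
import re

# Matches HTTP status lines such as "HTTP/2 404" or "HTTP/1.1 302 Found".
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})", re.MULTILINE)


def status_chain(header_dump: str) -> list:
    """Return every HTTP status code found in a `curl -IL` header dump, in order."""
    return [int(code) for code in STATUS_LINE.findall(header_dump)]


def resolves(header_dump: str) -> bool:
    """True if the last response in the redirect chain is a 2xx status."""
    chain = status_chain(header_dump)
    return bool(chain) and 200 <= chain[-1] < 300
```

Feeding it the headers shown above would give a chain ending in 404, so `resolves` would return False for the kitty DOI.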

 

Julian Gautier

Feb 6, 2024, 5:18:25 PM
to Dataverse Users Community
It was a misplaced kitty lol.

Specifically I deleted it since it was a dataset someone had published while testing Dataverse features years ago. Should have been deleted sooner 😬 We were having a laugh about it in the Zoom chat during the call.

In any case, thanks for encouraging a thought-provoking community discussion!

Julian Gautier (he/him)
Product Research Specialist, IQSS
Interested in helping test Dataverse? Sign up for usability testing

Jorrit Poelen

Feb 7, 2024, 11:07:01 AM
to Dataverse Users Community
Thanks for clarifying, Julian. I now understand that the kitty is missing. And . . . I now have a citation in a published presentation containing a DOI that no longer resolves. While this supports some of the claims I made in the talk, I was wondering: is there any way to replace the 404 with a more informative message like "Hi! There used to be a kitty here, but they got ejected/retracted by Julian because of ..."?

thx,
-jorrit

Julian Gautier

Feb 8, 2024, 9:29:21 AM
to Dataverse Users Community
I'm not sure if there's a way to replace the 404 with something that says that the dataset was removed and why. We had an interesting conversation about this with folks from DataCite a few months ago. I think we, namely the repository's curation manager, clarified that it's Harvard Dataverse policy to leave no trace of deposits that don't meet the requirements for deposit. That is different from leaving a page like the one I think you're describing, which is what happens when we "deaccession" a dataset, as at https://doi.org/10.7910/DVN/MO1MXQ.

I'll ask the repository's curation manager.

You wrote that this supports some of the claims you made in your talk. Could you write more about this?

Thanks,
Julian

Sebastian Karcher

Feb 8, 2024, 9:50:53 AM
to dataverse...@googlegroups.com
I think you could have an internal curation rule to not delete datasets that have been up for longer than x days. Keeping tombstone pages for obvious spam that's quickly removed is silly, so I'm generally in favor of allowing blanket deletions in a self-publish repository. Dataverse kitty is a borderline case, and the fact that kitty had been available for so long would arguably have favored keeping a lasting record (though presumably hash search would still break :P).
Sebastian


--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Philip Durbin

Feb 8, 2024, 10:58:44 AM
to dataverse...@googlegroups.com
Hmm, I dunno. Yes, Dataverse Kitty was up for nearly a decade but it was very clearly a test dataset in a test collection. I know the guy who created it. He was working on the OSF/Dataverse integration at the time. (I'm not even sure if we had a demo server back then.) We obviously missed it, but I'm glad we did a little early spring cleaning. :)

Sonia B.

Feb 8, 2024, 3:03:00 PM
to Dataverse Users Community
Hi Everyone,
I had a conversation with Kelly Stathis from DataCite about DOIs and deleted content not long ago, and we agreed that a 404 is quite useless for tracking content that has been deleted but is still worth tracking. We've had valid datasets that were destroyed as required by the legal agreement; their DOIs were used in publications, and if users returned to the HDV today, they'd find no tombstone record available. This "Kitty" dataset could very well have been treated the same way. Spam will always be deleted and no record is required, as we all agree. I can't see any world where best practices for deleting valid content should ever result in a 404.

Jorrit Poelen

Feb 12, 2024, 10:09:25 AM
to Dataverse Users Community
Hi Data-nauts, Dataversians, (What do you call folks inhabiting the DataVerse?)

Julian asked:

> You wrote that this supports some of the claims you made in your talk. Could you write more about this?

In my published slides and recorded talk of the 6 Feb 2024 dataverse community call:

Poelen, J. H. (2024, February 6). A DataVerse Beyond the Internet hash://md5/e34b50213fc407892d0810dabd742b1f. Zenodo. https://doi.org/10.5281/zenodo.10626561

I asked:


> How do you cite data?
> How do you look up cited data now?
> How do you look up cited data 40 years from now?

and proceeded to take the Harvard Kitty citation as suggested by Harvard Dataverse (HDV):

> Joshua Carp, 2014, “cat.jpg”, CarpTest, https://doi.org/10.7910/DVN/24358/N4FCVS, Harvard Dataverse, V1

And less than a week later (not 40/50 years later), the (aspirationally) "Persistent Identifier" (aPID) doi:10.7910/DVN/24358/N4FCVS minted by the HDV no longer resolves (see attached screenshot), as if the kitty never existed.

The DOI redirected to a landing page URL, which returned a 404.

I know that this is a sample size of N=1, but it does support my claim made later in the presentation (also see https://jhpoelen.nl/dataverse-talk-2024-02-06/#/how-to-retrieve-this-cat-picture-50-years-from-now):

> How To Retrieve This Cat Picture 50 Years From Now?
>
> Joshua Carp, 2014, “cat.jpg”, CarpTest, https://doi.org/10.7910/DVN/24358/N4FCVS, Harvard Dataverse, V1
>
> Likely will not work due to intricate network of dependencies.

Also, note that the signed citation (as proposed in my presentation):

Joshua Carp, 2014, “cat.jpg”, CarpTest, https://doi.org/10.7910/DVN/24358/N4FCVS, Harvard Dataverse, V1 hash://md5/7d62417b5b689ed91dcd25f10c9c2132

allows for retrieving the cat picture via its digital fingerprint hash://md5/7d62417b5b689ed91dcd25f10c9c2132:


> preston cat --remote https://linker.bio,https://dataverse.org hash://md5/7d62417b5b689ed91dcd25f10c9c2132

while leaving open other known, or as of yet unknown, methods to retrieve published digital data via their signature.
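
As one illustration of such a method, here is a minimal Python sketch that checks whether a copy you already hold matches the fingerprint in a signed citation. It is not how Preston works internally; `extract_md5` and `matches_citation` are hypothetical helper names, and the sketch assumes the citation carries a `hash://md5/` URI followed by 32 lowercase hex digits.

```python
import hashlib
import re
from typing import Optional

# Assumed shape of the fingerprint inside a signed citation: hash://md5/<32 hex digits>.
HASH_URI = re.compile(r"hash://md5/([0-9a-f]{32})")


def extract_md5(citation: str) -> Optional[str]:
    """Return the md5 hex digest embedded in a signed citation, or None if absent."""
    match = HASH_URI.search(citation)
    return match.group(1) if match else None


def matches_citation(content: bytes, citation: str) -> bool:
    """True if the md5 of `content` equals the fingerprint in the citation."""
    expected = extract_md5(citation)
    return expected is not None and hashlib.md5(content).hexdigest() == expected
```

Given the bytes of any copy of the cat picture and the signed citation above, a True result would mean the copy carries the same md5 fingerprint as the cited content, wherever the copy came from.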

I hope this message shows that the case of the lost Harvard Kitty provides evidence for my claim that our current way of citing (and resolving) digital datasets may need a little work, beyond including aPIDs, to help carry our digital knowledge into the future.

Curious to hear your thoughts,

-jorrit

PS. I've attached a copy of the Harvard Kitty just to have another place to be able to retrieve the cute 4.5MB cat picture.
harvard-kitty.jpg

Julian Gautier

Feb 12, 2024, 11:31:58 AM
to Dataverse Users Community
Thanks jorrit!

Sounds like this also hinges on how repositories define in their collection development policies what "data" and "valid content" are.

Is it possible to create a deaccession "tombstone" page in Harvard Dataverse for this "Dataverse Kitty" deposit that I deleted? I'm not sure how much effort it would take, which of course would help us determine if it's worth the effort.

Jorrit Poelen

Mar 7, 2024, 10:39:53 AM
to Dataverse Users Community
Hey y'all Dataversians, 

Related to our Feb 6 community call [1], a recent Nature magazine news article [2] mentioned that: 

"[...] A study identified more than two million articles that did not appear in a major digital archive, despite having an active DOI. [...]"

This recent study appears to be getting some high-profile coverage (it even appeared on Slashdot [3]), and is consistent with a statement I made in my 6 Feb 2024 talk at the DataVerse community call: blind trust that DOIs will continue to resolve to their original content is likely not enough to ensure long-term access to the digital knowledge we keep and reference.

I've proposed a method to have a DataVerse item be preserved independent of DOI resolution mechanisms, and I am still interested to hear about your long-term strategy to preserve today's DataVerse for future use.

-jorrit

PS Interestingly, the referenced study [4] was published in the Journal of Librarianship and Scholarly Communication. And it so happens that they included the md5 signature of the pdf in their html landing page for the article.
pdf-md5-example.png

References

[1] Poelen, J. H. (2024, February 6). A DataVerse Beyond the Internet hash://md5/e34b50213fc407892d0810dabd742b1f. Zenodo. https://doi.org/10.5281/zenodo.10626561

[2] Hart, S. 2024. Millions of research papers at risk of disappearing from the Internet. Nature. https://doi.org/10.1038/d41586-024-00616-5


[4] Eve, M. P. (2024). “Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles”. Journal of Librarianship and Scholarly Communication 12(1). https://doi.org/10.31274/jlsc.16288