Deaccessioning a dataset

Dario Basset

Dec 13, 2023, 9:38:35 AM

to Dataverse Users Community

We still have a problem with a deaccessioned dataset.

We already had a similar problem, as described here: (https://groups.google.com/g/dataverse-community/c/rTMkdCfYzg8/m/1NzV40DQAwAJ)

We solved that problem by upgrading the version.

But now the problem is back: we cannot list datasets through the API, because the Dataverse platform does not return any answer when we issue an API request of the following form:

https://dataverse.unimi.it/api/search?q=*&type=dataset&key=<key>

The dataset itself is here: (https://dataverse.unimi.it/dataset.xhtml?persistentId=doi:10.13130/RD_UNIMI/KI5KBY). The Dataverse platform loops without answering.
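For anyone trying to reproduce this, here is a minimal Python sketch of the same Search API request with an explicit timeout, so that a hanging server shows up as an error instead of an endless wait. The function names, the example host and the 30-second timeout are illustrative assumptions, not part of Dataverse. The token is sent in the `X-Dataverse-key` header rather than in the `key` query parameter.

```python
import json
import urllib.parse
import urllib.request


def build_search_request(base_url, api_token, query="*", per_page=10, start=0):
    """Build a Search API request for datasets without sending it."""
    params = urllib.parse.urlencode(
        {"q": query, "type": "dataset", "per_page": per_page, "start": start}
    )
    url = f"{base_url.rstrip('/')}/api/search?{params}"
    # Passing the token as a header keeps it out of the URL.
    return urllib.request.Request(url, headers={"X-Dataverse-key": api_token})


def search_datasets(base_url, api_token, timeout=30, **kwargs):
    """Send the request; an exception is raised if no answer arrives in time."""
    request = build_search_request(base_url, api_token, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Search results are wrapped in a top-level "data" object.
        return json.load(response)["data"]
```

Paging through the results with small `per_page` values and increasing `start` may help narrow down whether one particular dataset makes the request hang.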

Finally, could someone clarify the following questions for us:

- In general, what is the purpose of deaccessioning a dataset, and what is the maximum time we should keep a deaccessioned dataset?
- Our deaccessioned dataset means we can no longer list all the datasets. Is this known behaviour or a bug?

Thanks a lot

Barbosa, Sonia

Dec 13, 2023, 12:22:39 PM

to dataverse...@googlegroups.com

Hi Dario:

Deaccessioning a dataset serves the purpose of leaving a DOI record for a dataset that was published and used. We don't remove Deaccessioned (legitimate) datasets. Deaccessioning takes place for numerous reasons including:

Moving a dataset to a new repository (the DOI in the Deaccessioning note will point to the new repository and the new repository should have a pointer to the Deaccessioned dataset).

Sometimes someone deposits content they shouldn't have. To replace the content without losing their DOI, they can Deaccession the old version of the data. So you'll end up with Deaccessioned versions instead of Deaccessioning an entire dataset and starting over with a new DOI.

Just a couple of examples.

The devs can chime in on the API issue itself.

Sonia

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/38e24a67-c5f1-42e5-b6e9-779ae3f2c753n%40googlegroups.com.

Philip Durbin

Dec 13, 2023, 3:31:00 PM

to dataverse...@googlegroups.com

Hi Dario,

I believe the behavior you're describing is by design if I'm understanding your scenario correctly. I looked up our design doc for deaccessioning* and we say this:

"The overall goal is to always have a way for users to search for or browse to any dataset that's related to them. If they have deaccessioned a dataset, they (and other people who have been granted access) should still be able to find it! That said, we don't want to overwhelm them with deaccessioned datasets. If they take a deaccessioned dataset and make a new draft or publish it, we won't show a deaccessioned card anymore.

If there are no published versions but there is a deaccessioned version, index the deaccessioned dataset version with the same permissions/discoverability as drafts (i.e. hidden from the public). Show a "Deaccessioned" label on the dataset card but don't show a facet."

Of course, it's certainly possible the behavior of the app has changed since we wrote that design doc years ago!

We're actually working on a deaccessioning issue now so I'm suggesting we do some testing around the Search API while we're in there: https://github.com/IQSS/dataverse/issues/10164#issuecomment-1854151124

When you mentioned the key, whose key is that? Does it belong to the author of the deaccessioned dataset?

Also, can you find the dataset in Solr if you look there directly? I usually use something like this to get a dump of all the documents in Solr:

curl -s 'http://localhost:8983/solr/collection1/select?rows=1000000&wt=json&indent=true&q=*%3A*'
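The same query can be sketched in Python, which makes it easier to compare the number of indexed documents with what the Search API returns. The helper names are made up for illustration, and the Solr base URL is simply the one from the curl command; it may differ on other installations.

```python
import urllib.parse


def build_solr_select_url(solr_base="http://localhost:8983/solr/collection1",
                          query="*:*", rows=1000000):
    """Build the same select URL as the curl command, with encoded parameters."""
    params = urllib.parse.urlencode(
        {"rows": rows, "wt": "json", "indent": "true", "q": query}
    )
    return f"{solr_base.rstrip('/')}/select?{params}"


def summarize_solr_response(payload):
    """Return (total matches, documents included) from a parsed Solr JSON response."""
    response = payload["response"]
    return response["numFound"], len(response["docs"])
```

The URL can be fetched with `urllib.request.urlopen` and parsed with `json.load` before passing the result to `summarize_solr_response`.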

Finally, it sounds like we should better document the expected behavior. Please feel free to open a GitHub issue about this!

Thanks,

Phil

* deaccessioning design doc: https://docs.google.com/document/d/1lFcwjtdGIqqQLYYnwtTI2GZNzmzYlrnMk2omGcXO9oY/edit?usp=sharing

--

Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin

Sherry Lake

Dec 14, 2023, 11:05:28 AM

to dataverse...@googlegroups.com

Hello Dario,

Deaccessioning is important, as Sonia said, because it provides landing pages for datasets with published DOIs. These are called "tombstone pages" (if the deaccessioned version is the only version of a previously published dataset).

I have two datasets that were originally on Harvard's dataverse repo that I "moved" to our local dataverse repository (after our repository went live). Here is one of them, showing how the page describes the reason for deaccessioning and points to the new location:

https://doi.org/10.7910/DVN/29213

That dataset is "findable" (in the results that show after a minute?) in an API search when I use my personal Harvard API token (<myKey>) in the following command:

https://dataverse.harvard.edu/api/search?q=authorName:Lake&type=dataset&per_page=100&key=<myKey>

--

Sherry

Dario Basset

Dec 18, 2023, 2:05:04 AM

to Dataverse Users Community

Thank you Sherry, Sonia and Philip.

Now we have a more concrete idea of what a de-accessioned dataset is for.

What I figured out in our case is the following: since authors cannot discard a dataset after its publication, they seem to de-accession it in order to get rid of it.

And that is not the intended use of the de-accessioning operation.

@Philip:

- the key I mentioned is the API key. I tried with my "superuser" key, but the error is still there.

- unfortunately I do not have access to the Solr database; the system is managed off our premises.

Thank you again to all.

Philip Durbin

Dec 18, 2023, 10:27:16 AM

to dataverse...@googlegroups.com

It is possible to "destroy" a dataset if you are a superuser. Because Harvard Dataverse allows self-publishing, we sometimes destroy datasets that are spam: https://guides.dataverse.org/en/6.1/api/native-api.html#delete-published-dataset

That said, for non-spam, it sounds like deaccessioning is the best option. That way, there is a tombstone page with an explanation.
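
For reference, the superuser-only "destroy" call from the linked guide can be sketched in Python as follows. This only builds the request and does not send it, because destroying a dataset cannot be undone. The function name, the host and the DOI in the usage line are placeholders, not real values.

```python
import urllib.parse
import urllib.request


def build_destroy_request(base_url, superuser_token, persistent_id):
    """Build (but do not send) a DELETE request that destroys a published dataset."""
    params = urllib.parse.urlencode({"persistentId": persistent_id})
    url = f"{base_url.rstrip('/')}/api/datasets/:persistentId/destroy/?{params}"
    # The token must belong to a superuser account.
    return urllib.request.Request(
        url, headers={"X-Dataverse-key": superuser_token}, method="DELETE"
    )


# Hypothetical usage with a placeholder host and DOI:
request = build_destroy_request(
    "https://dataverse.example.org", "superuser-token", "doi:10.5072/FK2/EXAMPLE"
)
```

Passing `request` to `urllib.request.urlopen` would actually perform the deletion, so that step is left out here on purpose.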
