Handling deleted and deaccessioned datasets when using Dataverse API to harvest dataset metadata from Harvard Dataverse

Julian Gautier

Jun 16, 2025, 12:24:01 PM
to Dataverse Users Community
Hi everyone!

I'm forwarding a request from Alvin Stockdale, who helps manage a dataset catalog that is harvesting metadata from repositories that use Dataverse. Alvin's asking for advice on handling datasets that have been deleted from the source repository.

In later emails Alvin mentioned that they're using Dataverse's API, not OAI-PMH, to harvest the metadata. We also talked about how "deleted" datasets could include both datasets that have been what we call "destroyed" and datasets whose versions have all been deaccessioned.

Could anyone recommend how best to know if a dataset has been deleted "without having to diff our entire catalog each time we grab new datasets and modified datasets"?
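
For anyone unfamiliar with what "diffing the entire catalog" means here, this is a minimal sketch of that baseline approach, which is the one Alvin hopes to avoid. It assumes the harvester already has a full list of the persistent identifiers the source repository currently exposes; how that list is obtained is not shown. The function name and the DOIs are hypothetical placeholders, not real datasets or part of any Dataverse API.

```python
def find_removed_ids(catalog_ids, source_ids):
    """Return identifiers that are in our catalog but no longer listed by the source."""
    # Set difference: everything we hold that the source no longer exposes.
    return sorted(set(catalog_ids) - set(source_ids))


# Hypothetical placeholder identifiers, for illustration only.
catalog_ids = ["doi:10.1234/AAA", "doi:10.1234/BBB", "doi:10.1234/CCC"]
source_ids = ["doi:10.1234/AAA", "doi:10.1234/CCC", "doi:10.1234/DDD"]

removed = find_removed_ids(catalog_ids, source_ids)
print(removed)  # ['doi:10.1234/BBB']
```

The drawback is that the full source listing has to be fetched on every run, and an identifier missing from that listing does not by itself say whether the dataset was destroyed or deaccessioned.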

---------- Forwarded message ---------
From: Stockdale, Alvin (NIH/NLM)
Date: Wed, Jun 11, 2025 at 8:10 AM
Subject: Harvard Dataverse deletes

Hi Julian,

Things are progressing with NLM’s Dataset Catalog and we’re starting to figure out how we will handle datasets that have been deleted from the source repository...

We are trying to figure out how we will know if a dataset has been deleted without having to diff our entire catalog each time we grab new datasets and modified datasets. Any information you can provide would be greatly appreciated!

Thanks,

Alvin Stockdale
Senior Serials Specialist
Metadata Management Program
National Library of Medicine

Julian Gautier

Jun 25, 2025, 11:46:44 AM
to Dataverse Users Community
I very vaguely remember hearing that folks were working on a Dataverse API endpoint that would let others see what has changed within a Dataverse installation, or maybe within a collection in an installation, since a given point in time: which datasets were added, removed, and updated. Or at least I think some work was done on an API endpoint like this.

But I can't find anything in the Dataverse Guides or in open and closed Dataverse GitHub issues. I might be misremembering, might have misinterpreted what I heard, or might not know enough to search well. Maybe someone else watching this forum knows what I'm talking about, or knows enough to say that I'm hallucinating?

Also, I've been thinking that another approach might be to reach out directly to other groups that are using the Dataverse APIs to harvest metadata from one or more Dataverse installations. I think the folks who manage the ODISSEI Portal are doing this somehow. Alvin, could you email them at in...@odissei-data.nl to learn more?

And if anyone else knows of other groups that are using the Dataverse APIs to harvest the metadata from one or more Dataverse installations, could you let us know here?