Deleting LARGE Dataset - Timing Out


Sherry Lake

Feb 28, 2025, 9:01:46 AM
to Dataverse Users Community
I have a draft dataset with over 14,000 files.

Our repository (V6.2) is slow to display that dataset, and deleting it from the UI doesn't work: I click the "Delete Dataset" button, it spins for quite a while, and the dataset is still there.

I've tried deleting via the command line and get various timeout errors (503 or 500):
curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE "$SERVER_URL/api/datasets/$ID"

Right after my deletion attempts, the whole server is slow to respond (or responds with "Server Temporarily Unavailable"), and the dataset still isn't deleted.

Is there a way to remove it directly in the database? But then there would still be the 14,000+ files to remove from S3, and lots of table dependencies.

Any advice?

Thanks,

Sherry Lake
UVA Dataverse http://dataverse.lib.virginia.edu


James Myers

Feb 28, 2025, 6:51:10 PM
to dataverse...@googlegroups.com

Sherry,

It looks like you were discussing this in Zulip, but I can't tell whether you've solved your issue. As you noted there, the 503 doesn't mean that Dataverse has failed, just that the timeout you have set for responses is shorter than the time the delete takes. If the delete did not finish for some reason, I suspect you could delete the datafile and filemetadata entries associated with that dataset/version in the db. It would be messier if there are categories/tags associated with the files, but if they were bulk uploaded, hopefully there aren't many. Cleaning up the physical files would be a pain, except that we now have the https://guides.dataverse.org/en/6.2/api/native-api.html#cleanup-storage-of-a-dataset API call, which removes physical files that are not associated with the dataset in the database.
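For the database route, a rough sketch of what that might look like, assuming the standard table names (filemetadata, datafile, datasetversion, dvobject) and the dataset's numeric database id; other tables (file tags/categories, access requests, guestbook responses, ingest reports) may also hold references, so check your schema and take a backup before running anything like this:

# HYPOTHETICAL sketch only - verify table/column names against your own schema first
psql -d dvndb -v DATASET_ID=12345 <<'SQL'   # replace 12345 with the dataset's database id
BEGIN;
-- drop the file metadata rows for every version of the dataset (a draft has just one)
DELETE FROM filemetadata
 WHERE datasetversion_id IN (SELECT id FROM datasetversion WHERE dataset_id = :DATASET_ID);
-- drop the datafile rows for files owned by the dataset
DELETE FROM datafile
 WHERE id IN (SELECT id FROM dvobject WHERE owner_id = :DATASET_ID AND dtype = 'DataFile');
-- drop the matching dvobject rows
DELETE FROM dvobject
 WHERE owner_id = :DATASET_ID AND dtype = 'DataFile';
COMMIT;
SQL

And the storage-cleanup call is along these lines (check the linked guide page for the exact endpoint and parameters; with dryrun=true it only lists what it would remove):

curl -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/cleanStorage?persistentId=$PERSISTENT_ID&dryrun=true"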


There's good news coming too (shameless plug): a new delete-files API call that will let you delete the whole list of files at once. (It looks like the delete dataset / delete draft version calls are iterating through n calls to DeleteDatafileCommand right now, so they will still be slow unless/until we adapt them to delete all the files at once like the new API.) There have also been other improvements since 6.2 for dealing with large numbers of files, so, while scaling isn't a solved problem, we're making progress, and hearing about specific issues like this one is useful in planning further work.
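For reference, the two existing calls mentioned above are the delete-dataset call used earlier in this thread and the delete-draft-version variant for published datasets; if I have the paths right, they look like this, and both currently loop over the files one at a time, which is why they are slow on a 14,000-file dataset:

curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE "$SERVER_URL/api/datasets/$ID"
curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE "$SERVER_URL/api/datasets/$ID/versions/:draft"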


Hope that helps,

-- Jim

