How to delete a collection with a large number of items?

ku bun

Jan 2, 2022, 6:18:21 AM
to DSpace Community
I use DSpace 6 with the XMLUI interface, and I want to delete a collection containing about 150,000 items. I tried deleting it through the web interface with an admin account, but without success: after waiting dozens of hours the website reported a timeout, and nothing changed. The collection is still intact. Is there any way I can delete it directly in the database? Thank you very much!

Tim Donohue

Jan 6, 2022, 12:31:38 PM
to DSpace Community
Hi,

Yikes, that's definitely a very large number of items to delete at one time.  When trying to delete them all at once, it's likely that DSpace is waiting for them *all* to succeed before saving the changes. That's why the timeout is occurring.

You might try making the collection smaller by deleting the items in batches.  One way to do this from the website would be to use the Batch Metadata Editing (CSV) feature, which provides an "expunge" action to delete items in batches.   So, you might use that to delete the items in batches of 1,000 or so (you may need to experiment to find the batch size that works best for you, as too large a batch could also time out).   https://wiki.lyrasis.org/display/DSDOC6x/Batch+Metadata+Editing#BatchMetadataEditing-Performing'actions'onitems
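
As a rough sketch of preparing those batches, the Python below takes a CSV exported from the collection and splits it into CSV texts of at most 1,000 items each, marking every item with the expunge action. It assumes the export has "id" and "collection" columns and that the importer accepts an added "action" column with the value "expunge", as the linked page describes; check that page, and whether your installation allows expunging through batch editing, before relying on it.

```python
import csv
import io
from typing import Iterator, List


def batched(rows: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def expunge_batches(export_csv: str, size: int = 1000) -> List[str]:
    """Turn a collection's metadata-export CSV into expunge-only CSV texts.

    Assumption: the export has `id` and `collection` columns, and the
    batch importer accepts an extra `action` column set to `expunge`.
    Only those three columns are kept, so no other metadata is touched.
    """
    rows = list(csv.DictReader(io.StringIO(export_csv)))
    batches: List[str] = []
    for chunk in batched(rows, size):
        out = io.StringIO()
        writer = csv.DictWriter(
            out, fieldnames=["id", "collection", "action"], lineterminator="\n"
        )
        writer.writeheader()
        for row in chunk:
            writer.writerow(
                {"id": row["id"], "collection": row["collection"], "action": "expunge"}
            )
        batches.append(out.getvalue())
    return batches
```

For example, reading collection-export.csv (a hypothetical file name) into a string and writing each value returned by expunge_batches() to its own file gives a set of CSVs that could be uploaded one at a time through the Batch Metadata Editing import page, so a failure or timeout only affects one batch.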

Once the Collection is smaller, a deletion from the UI should no longer time out.  I'm honestly not sure what size you'd need to get it down to, though, as that's very dependent on your local setup, bandwidth, etc.

The only other batch deletion option I can think of would be to write a small script (in the programming language of your choice) that uses the DSpace REST API ("DELETE /items/{item_id}"). But that would require first figuring out the list of Item IDs you need to delete (by querying the database, Solr, or similar). https://wiki.lyrasis.org/display/DSDOC6x/REST+API#RESTAPI-Items
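
A minimal sketch of such a script in Python, using only the standard library. It assumes a DSpace 6 REST base URL such as https://repo.example.org/rest (hypothetical), that POST /login with "email" and "password" form fields sets a session cookie, that GET /collections/{id}/items accepts "limit" and "offset" and returns JSON objects with a "uuid" field, and that DELETE /items/{id} removes one item. Verify each of these against the REST API page above for your version before running it.

```python
import http.cookiejar
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import List, Tuple


def delete_urls(rest_base: str, item_ids: List[str]) -> List[str]:
    """Build one DELETE /items/{id} URL per item ID."""
    base = rest_base.rstrip("/")
    return [f"{base}/items/{urllib.parse.quote(item_id)}" for item_id in item_ids]


def make_session(rest_base: str, email: str, password: str) -> urllib.request.OpenerDirector:
    """Log in and return an opener that keeps the (assumed) session cookie."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    form = urllib.parse.urlencode({"email": email, "password": password}).encode()
    # Passing data makes this a POST with a form-encoded body.
    opener.open(f"{rest_base.rstrip('/')}/login", data=form).close()
    return opener


def list_item_ids(opener, rest_base: str, collection_id: str, page: int = 100) -> List[str]:
    """Collect every item UUID in a collection, page by page, before deleting anything."""
    ids: List[str] = []
    offset = 0
    while True:
        url = (f"{rest_base.rstrip('/')}/collections/{urllib.parse.quote(collection_id)}"
               f"/items?limit={page}&offset={offset}")
        req = urllib.request.Request(url, headers={"Accept": "application/json"})
        with opener.open(req) as resp:
            items = json.load(resp)
        ids.extend(item["uuid"] for item in items)
        if len(items) < page:  # a short (or empty) page means we reached the end
            return ids
        offset += len(items)


def delete_items(opener, rest_base: str, item_ids: List[str]) -> List[Tuple[str, int]]:
    """Delete items one request at a time; return (url, HTTP status) for each failure."""
    failed: List[Tuple[str, int]] = []
    for url in delete_urls(rest_base, item_ids):
        req = urllib.request.Request(url, method="DELETE")
        try:
            opener.open(req).close()
        except urllib.error.HTTPError as err:
            failed.append((url, err.code))
    return failed
```

Usage would be roughly: opener = make_session(base, email, password), then ids = list_item_ids(opener, base, collection_uuid), then failed = delete_items(opener, base, ids). The IDs are all collected before any deletion so that removing items does not shift the paging offsets, and because each item is deleted in its own request, a failure only affects that item and the returned list can simply be retried.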

Hopefully that gives you some ideas to start with,

Tim

ku bun

Jan 6, 2022, 11:21:32 PM
to DSpace Community
Thanks for the reply. I solved it by deleting, directly in the database, the rows in the item table whose owning_collection column held that collection's ID, along with the related rows in other tables such as collection2item, item2bundle, etc.
Afterwards I ran the commands dspace index-discovery -c and dspace index-discovery -bf.
It worked.