Remove Entries from a Collection Workflow


Keith Jones

Mar 9, 2020, 9:42:49 AM
to DSpace Technical Support
Hi All,

I'm trying to remove 1,000-plus entries that are sitting in a collection workflow. There was a mishap with a vendor who deposited items into my repository using SWORD, and now I have more than 1,000 entries I need to remove from my workflow. These items have not been finalized and do not reside in the collection.

I'm working in DSpace version 5.3. It looks like I can get a workflow_id from the DSpace UI and then use it to remove the entry from the workflowitem table. Do I also need to consider removing entries from the item table and the metadatavalue table? Do I also need to remove entries from the bitstream table? I'm not sure how much orphaned data would be left around if I only remove the entries from the workflowitem table.

Thanks
Keith

Tim Donohue

Mar 10, 2020, 12:10:29 PM
to Keith Jones, DSpace Technical Support
Hi Keith,

I hate to say this, but I'd recommend *AGAINST* deleting anything in the database directly. The workflowitem table links to the item table, the metadatavalue table, and the resourcepolicy table, and those items are already linked to whatever collection they are being submitted into (via the item table), and so on. In other words, it's really hard to craft a query that would clean everything up safely. More than likely something would get forgotten, and you could hit really odd errors later on (DSpace doesn't always handle orphaned data perfectly; it expects you not to touch the database directly).
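The orphaned-data risk described above can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module and a toy schema loosely modeled on DSpace 5.x. The table names come from this thread, but the column names (`in_archive`, `workspaceitem.item_id`, `resource_id`, `resource_type_id`) and the item type code 2 are assumptions to verify against a real database. The sketch deletes one workflowitem row directly, then runs a read-only query for items that are neither archived nor referenced by any workspace or workflow row, and counts the metadata rows still pointing at them.

```python
import sqlite3

# Toy stand-ins for a few DSpace 5.x tables. Real tables have many more
# columns; these names are assumptions to check against your own database.
SCHEMA = """
CREATE TABLE item (item_id INTEGER PRIMARY KEY, in_archive BOOLEAN);
CREATE TABLE workspaceitem (workspace_item_id INTEGER PRIMARY KEY, item_id INTEGER);
CREATE TABLE workflowitem (workflow_id INTEGER PRIMARY KEY, item_id INTEGER);
CREATE TABLE metadatavalue (metadata_value_id INTEGER PRIMARY KEY,
                            resource_id INTEGER, resource_type_id INTEGER);
"""

# Read-only: items that are not archived and not referenced by any
# workspace or workflow row, i.e. what a direct DELETE would leave behind.
ORPHANED_ITEMS = """
SELECT i.item_id
FROM item i
LEFT JOIN workspaceitem ws ON ws.item_id = i.item_id
LEFT JOIN workflowitem wf ON wf.item_id = i.item_id
WHERE NOT i.in_archive
  AND ws.item_id IS NULL
  AND wf.item_id IS NULL
ORDER BY i.item_id
"""

ITEM_TYPE = 2  # assumed resource_type_id for items (DSpace Constants.ITEM)


def orphaned_items(conn):
    return [row[0] for row in conn.execute(ORPHANED_ITEMS)]


def leftover_metadata(conn, item_ids):
    """Count metadatavalue rows that still point at the given items."""
    if not item_ids:
        return 0
    placeholders = ",".join("?" for _ in item_ids)
    sql = ("SELECT COUNT(*) FROM metadatavalue "
           f"WHERE resource_type_id = ? AND resource_id IN ({placeholders})")
    return conn.execute(sql, [ITEM_TYPE, *item_ids]).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Three in-progress submissions sitting in workflow, each with one metadata row.
for item_id in (1, 2, 3):
    conn.execute("INSERT INTO item VALUES (?, 0)", (item_id,))
    conn.execute("INSERT INTO workflowitem VALUES (?, ?)", (100 + item_id, item_id))
    conn.execute("INSERT INTO metadatavalue (resource_id, resource_type_id) "
                 "VALUES (?, ?)", (item_id, ITEM_TYPE))

before = orphaned_items(conn)
conn.execute("DELETE FROM workflowitem WHERE workflow_id = 103")  # the risky shortcut
after = orphaned_items(conn)
print(before, after, leftover_metadata(conn, after))  # [] [3] 1
```

The `ORPHANED_ITEMS` query itself only reads data, so something like it could serve as a sanity check on a test copy. It says nothing about bundles, bitstreams, or resource policies, which would be orphaned as well.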

A few options exist:
  • You could see if it's possible to "rollback" (or restore) an older backup of your database from before the 1,000 workflow items were deposited. However, if other new content was added since then, it would also be lost (unless you could simply resubmit those items).
  • You could go in and reject these items (one by one, or, if you can install the JSPUI, there's an "Administer -> Content -> Workflow" page which lets you reject as an Admin). This would send them back to the submitter's "workspace". The submitter can then log in and bulk delete them (using a checkbox).
  • You could craft a SQL query to move all those (in progress) items into a new, temporary Collection and then delete that entire temporary Collection (which should delete all associated items).
  • (There may be other workarounds here that others on this list have done to fix this issue)
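The third option (moving the in-progress items into a temporary collection) could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the column names `workflowitem.collection_id` and `item.submitter_id`, and all IDs (vendor eperson 500, collections 10 and 99), are hypothetical, and sqlite3 stands in for the real PostgreSQL database. It only re-points workflow rows submitted by one account; deleting the temporary collection afterward is left to DSpace so that it cleans up the related rows itself. Whether other columns (such as any collection reference on the item table) also need updating is not something this sketch establishes.

```python
import sqlite3

# Toy stand-ins for two DSpace 5.x tables. The column names
# (workflowitem.collection_id, item.submitter_id) are assumptions; verify them.
SCHEMA = """
CREATE TABLE item (item_id INTEGER PRIMARY KEY, submitter_id INTEGER);
CREATE TABLE workflowitem (workflow_id INTEGER PRIMARY KEY, item_id INTEGER,
                           collection_id INTEGER);
"""

# Re-point only one submitter's workflow items at a temporary collection.
# Nothing is deleted here; deleting the temporary collection is left to DSpace.
MOVE_TO_TEMP = """
UPDATE workflowitem
SET collection_id = :temp
WHERE collection_id = :source
  AND item_id IN (SELECT item_id FROM item WHERE submitter_id = :submitter)
"""


def move_to_temp_collection(conn, source, temp, submitter):
    """Return how many workflow items were moved; rolls back on any error."""
    with conn:  # sqlite3: commit on success, rollback on exception
        cur = conn.execute(MOVE_TO_TEMP,
                           {"source": source, "temp": temp, "submitter": submitter})
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical IDs: eperson 500 is the vendor's SWORD account, 42 a regular user;
# collection 10 is the real collection, 99 the temporary one.
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, 500), (2, 500), (3, 42)])
conn.executemany("INSERT INTO workflowitem VALUES (?, ?, ?)",
                 [(101, 1, 10), (102, 2, 10), (103, 3, 10)])

moved = move_to_temp_collection(conn, source=10, temp=99, submitter=500)
rows = conn.execute(
    "SELECT workflow_id, collection_id FROM workflowitem ORDER BY workflow_id"
).fetchall()
print(moved, rows)  # 2 [(101, 99), (102, 99), (103, 10)]
```

With psycopg2 against PostgreSQL, the named placeholders would be written `%(temp)s` rather than `:temp`. Filtering on the submitter keeps other people's in-progress submissions in the real collection; comparing the returned count with the number of vendor items you expect is a cheap check before deleting anything.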
I'd highly recommend trying these on a subset of data in your test environment *first* to verify it works overall.  This is especially true if you decide to go the database modification route.

I know it's frustratingly difficult to bulk delete in-progress submissions in DSpace right now. This will become quite a bit easier in DSpace 7 (once released later this year), as it comes with a full-featured REST API which allows for easier scripting of bulk changes (without needing to touch the database).

If others on this list have figured out a better way to clean up workspace or workflow items in bulk, hopefully they will share their hints/tips.

Tim



