Workflow issue when bitstreams are in S3 storage

Edmund Balnaves

Aug 21, 2024, 3:26:45 AM
to DSpace Technical Support
We are running DSpace 7.6.2 on a well-configured AWS Linux server with 16 GB of memory and 8 CPUs.

In addition to the known performance problems in the workflow component, we found a blockage where the bitstreams are in S3 storage and items have multiple (30-40) bitstreams. Claiming, editing and approving these items entailed delays of 10 minutes or more per action, usually ending in a browser timeout, so editors were unable to progress these items. Tomcat CPU sits at 100% for the thread, and numerous REST, database, Solr and S3 interactions are evident.

We have implemented a workaround by:

* making the inbound asset store a local volume (config sketch below)
* writing a curation task to migrate specific items to the S3 object store after archiving (task skeleton further below).
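For the first step, the change is essentially just pointing the incoming store at a local directory while leaving S3 configured. A rough sketch of the relevant local.cfg keys, assuming the default DSpace 7.x store numbering (0 = local, 1 = S3) from config/spring/api/bitstore.xml - exact key names may vary by version, and the directory/bucket/region values here are placeholders:

  # local.cfg (sketch) - write new/incoming bitstreams to the local store
  assetstore.dir = /opt/dspace/assetstore
  assetstore.index.primary = 0

  # keep the S3 store configured so archived content stays readable and the
  # curation task has somewhere to migrate to
  assetstore.s3.enabled = true
  assetstore.s3.bucketName = example-dspace-bucket
  assetstore.s3.awsRegionName = ap-southeast-2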

This is not ideal, since S3 is the target environment (the repository holds 1 TB+ of items), but it at least allows users to proceed.
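The curation task itself is just a wrapper that walks an item's bundles and hands each bitstream to the storage layer. A heavily simplified skeleton along these lines - the migrateToStore() helper is a placeholder, since the exact storage-layer call (the same mechanism the bitstore-migrate launcher command uses) differs between versions, and the package/class names are made up:

  // S3MigrationTask.java (sketch only)
  package org.example.curate;                 // hypothetical package

  import java.io.IOException;

  import org.dspace.content.Bitstream;
  import org.dspace.content.Bundle;
  import org.dspace.content.DSpaceObject;
  import org.dspace.content.Item;
  import org.dspace.curate.AbstractCurationTask;
  import org.dspace.curate.Curator;

  public class S3MigrationTask extends AbstractCurationTask {

      // store number of the S3 store as defined in bitstore.xml
      private static final int S3_STORE = 1;

      @Override
      public int perform(DSpaceObject dso) throws IOException {
          if (!(dso instanceof Item)) {
              return Curator.CURATE_SKIP;
          }
          Item item = (Item) dso;
          if (!item.isArchived()) {
              // only move content once the item has left the workflow
              return Curator.CURATE_SKIP;
          }
          for (Bundle bundle : item.getBundles()) {
              for (Bitstream bitstream : bundle.getBitstreams()) {
                  migrateToStore(bitstream, S3_STORE);
              }
          }
          setResult("Migrated bitstreams of " + item.getHandle() + " to S3");
          return Curator.CURATE_SUCCESS;
      }

      // Placeholder helper: copy the bitstream content to the S3 store and
      // update its store number via the storage layer; the exact call to
      // verify against your DSpace version.
      private void migrateToStore(Bitstream bitstream, int storeNumber) throws IOException {
          // implementation omitted
      }
  }

The task is registered as a named curation task in config/modules/curate.cfg in the usual way and run against items after they are archived.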

This reduced the claim/edit/approval time to a "workable" range of 30-60 seconds per action - albeit still pretty poor.

We have other DSpace 7 instances with an S3 object store performing OK, but mainly with a 1:1 item-to-bitstream ratio.

Has anyone else experienced this issue? Is there configuration I am missing? IMHO the whole workflow process needs a rewrite - it only really works well when there are only a few simple items in the workflow.


Edmund Balnaves
