We are running DSpace 7.6.2 on a well-configured AWS Linux server with 16 GB memory and 8 CPUs.
In addition to the known performance problems in the workflow component, we found a blockage where bitstreams are in S3 storage and items have many (30-40) bitstreams. Claiming, editing and approving these items each incurred delays of 10 minutes or more, usually ending in a browser timeout, so editors were unable to progress these items. Tomcat CPU sits at 100% for the thread, and multiple REST, database, Solr and S3 interactions are evident.
We have implemented a workaround by:
* making the inbound asset store a local volume
* writing a curation task to migrate specific items to the S3 object store after archiving.
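The first part of the workaround can be expressed in configuration. This is a sketch only, assuming the DSpace 7.4+ property-based S3 setup where store 0 is the default local assetstore and store 1 is the S3 store; bucket name and credentials are placeholders:

```
# local.cfg sketch -- store numbering assumes the default bitstore wiring
# (0 = local filesystem store, 1 = S3 store)

# Store 0: local filesystem assetstore used for incoming/workflow bitstreams
assetstore.dir = ${dspace.dir}/assetstore

# Store 1: S3 assetstore, available but not primary
assetstore.s3.enabled = true
assetstore.s3.bucketName = REPLACE-WITH-BUCKET
assetstore.s3.awsAccessKey = REPLACE-WITH-KEY
assetstore.s3.awsSecretKey = REPLACE-WITH-SECRET

# New bitstreams are written to the local store
assetstore.index.primary = 0
```

With the primary index pointed at the local store, workflow actions read and write bitstreams from local disk; content only moves to S3 once the post-archive migration runs.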
This is not ideal, since S3 is the target environment (the system holds 1 TB+ of items), but it at least allows the user to proceed.
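For the migration step, DSpace also ships a stock CLI that moves bitstreams between stores. It operates store-wide rather than per-item, so it is not a drop-in replacement for a selective curation task, but it may suit a nightly batch. A sketch, assuming the default store numbering (0 = local, 1 = S3):

```
# Migrate bitstreams from the local store (0) to the S3 store (1)
[dspace]/bin/dspace bitstore-migrate -a 0 -b 1
```

Run against a test instance first, as moving large volumes of content to S3 can take considerable time.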
This reduced the claim/edit/approve time to a "workable" range of 30-60 seconds per action, albeit still pretty poor.
We have other DSpace 7 instances with the S3 object store performing OK, but mainly with a 1:1 item-to-bitstream ratio.
Has anyone else experienced this issue? Is there configuration I am missing? IMHO the whole workflow process needs a rewrite; it only really works well when there are only a few simple items in the workflow.
Edmund Balnaves