Bill,
I just saw your request on discussing bitstream sizes in DSpace and would like to join the conversation.
We also have some reservations about uploading large files to DSpace via the web UI. For us, the reasonable limit for uploads done this way is about 5 GB. If someone wants to publish data larger than that (which has already happened a couple of times, and we expect it to happen more often in the future), we offer to let them upload the files to our server via WebDAV. Once we have the files, we build a Simple Archive Format (SAF) import package from them and ingest it into DSpace on behalf of the user, routing it through the normal approval workflow. There we reject the item, so that the user has a chance to review and change the metadata before it is eventually approved. Another scenario using a similar approach is to let the user create a new item in DSpace, but tell them not to upload the file within that process and instead use the WebDAV option. If we know the item number the user created and have the files, we can add the file with the dspace command (however, this is a new function that The Library Code implemented for us, so it is not available in generic DSpace yet).
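For anyone unfamiliar with that route, here is a minimal sketch of building an SAF package and ingesting it on the command line. The title, collection handle, eperson, and file names are placeholders, and the exact ItemImport options may differ between DSpace versions, so check the documentation for yours:

```shell
#!/bin/sh
set -eu

# Build a minimal SAF package: one item directory containing the
# metadata, a contents manifest, and the (large) bitstream itself.
mkdir -p saf/item_000

# Placeholder for the file the user uploaded via WebDAV.
touch saf/item_000/bigdata.zip

cat > saf/item_000/dublin_core.xml <<'EOF'
<dublin_core>
  <dcvalue element="title" qualifier="none">Large dataset</dcvalue>
</dublin_core>
EOF

# contents: one bitstream per line, optionally with a target bundle.
printf 'bigdata.zip\tbundle:ORIGINAL\n' > saf/item_000/contents

# Ingest on behalf of the user; --workflow sends the item through the
# collection's workflow so it can be rejected there for metadata review.
# (Commented out: requires a DSpace installation.)
# [dspace]/bin/dspace import --add --workflow \
#     --eperson=admin@example.org \
#     --collection=123456789/2 \
#     --source=saf --mapfile=mapfile
```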
We do not use the upload limit in DSpace. As far as I remember, it was not working properly with the DSpace 5 JSPUI, though admittedly I haven't tried it for a long time now, so that issue may have been fixed in the meantime.
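If I remember correctly, the limit in question is the `upload.max` property in dspace.cfg, set in bytes (please double-check the property name for your version):

```
# dspace.cfg: maximum upload size in bytes; a negative value disables the limit.
# 5 GB would be roughly:
upload.max = 5368709120
```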
Let me add another aspect of large files, but in the other direction: not uploads, but downloads. We store all of our bitstreams on S3 storage, and as DSpace 5 does not support that natively, we use Cloudian HyperFile, which provides an NFS-mountable volume that is linked to our assetstore. That means all bitstreams (including thumbnails, licenses, and so on) go to the S3 storage. This basically works fine, but with large files we once had trouble with web crawlers harvesting them: when too many clients fetched too many large files in parallel, it caused cache problems on the HyperFile volume. To avoid this, we tuned some of the cache settings on HyperFile, and we excluded the big files from crawling via robots.txt (as we think crawling them would be rather useless anyway). That has solved the problem so far, but I guess it's worth noting.
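The robots.txt part looks roughly like this; the handle paths are of course made-up examples, and note that robots.txt only keeps out well-behaved crawlers:

```
User-agent: *
# Illustrative paths: the items holding our large bitstreams.
Disallow: /bitstream/handle/123456789/100
Disallow: /bitstream/handle/123456789/101
```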
If anyone has experience with other download regulators that prevent a single user from downloading too much in parallel, I would be eager to hear about it.
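One option we have been considering, in case it helps the discussion: if DSpace sits behind an nginx reverse proxy, its stock limit_conn/limit_rate directives can cap parallel downloads per client IP. A sketch, with all numbers and paths as assumptions to be tuned:

```nginx
# Shared zone keyed by client IP for counting open connections.
limit_conn_zone $binary_remote_addr zone=bitstream_conn:10m;

server {
    location /bitstream/ {
        # At most 2 parallel bitstream downloads per client IP.
        limit_conn bitstream_conn 2;
        # After the first 50 MB, throttle each connection to 10 MB/s.
        limit_rate_after 50m;
        limit_rate 10m;
        proxy_pass http://localhost:8080;
    }
}
```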
Best
Oliver