Large file upload problem


mark....@dartmouth.edu

Jul 23, 2025, 10:41:41 AM
to Dataverse Users Community
We are running into a problem with our Dataverse installation not processing large (>2 GB) uploaded zip files.  The files are uploaded to the "temp" directory under dataverse.files.file.directory and there seems to be no problem with their integrity.  The upload itself takes about 10 minutes, but the browser UI just spins and will continue to do so for hours until it is cancelled or the window is closed.  I'm not seeing any resource constraints (CPU, storage, or RAM use) that would be causing problems.  Unfortunately, I don't have any errors or warnings in the logs to help clarify the problem.  I'm not sure how to proceed with troubleshooting this, so any help would be greatly appreciated.

Dataverse 6.6; Payara 6.2025.2; RHEL 8.10 (4 cores, 16GB RAM)

PS. I did find messages like this in our server.log.  I can't say for sure that they are related, as they don't seem to always correlate with the upload attempts, so hopefully this isn't a distraction from the issue:
[2025-07-22T12:42:06.310-0400] [Payara 6.2025.2] [WARNING] [] [] [tid: _ThreadID=95 _ThreadName=http-thread-pool::jk-connector(4)] [timeMillis: 1753202526310] [levelValue: 900] [[
Response has already been committed, and further write operations are not permitted. This may result in an IllegalStateException being triggered by the underlying application. To avoid this situation, consider adding a Rule `.when(Direction.isInbound().and(Response.isCommitted())).perform(Lifecycle.abort())`, or figure out where the response is being incorrectly committed and correct the bug in the offending code.]]

James Myers

Jul 24, 2025, 4:35:46 PM
to dataverse...@googlegroups.com

Mark,

It’s hard to diagnose remotely, but I might suggest checking for timeouts. When you upload a zip file, the file is sent and unzipped, and the list of files inside is sent back to the browser. If Payara, Apache httpd, nginx, a load balancer, etc. times out, that list never gets back to the browser. Looking in the browser dev console might show whether you are indeed getting an error on that call, usually a 504. In the response headers you might see the name of the component that is sending the error, which would point to where you need to raise the timeout. If you can find that, people running similar systems might be able to point you to the right settings.
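
One way to take the browser out of the picture is to push the same zip through the native API, e.g. with curl. This is just a sketch, with a placeholder server URL, API token, and dataset DOI; it should exercise the same server-side unzip step as the UI. If the call eventually comes back with JSON, the unzip is completing and the hang is more likely a timeout in front of the application; if it comes back with a 504, the response headers often name the component that gave up.

  # Placeholders - substitute your own values
  SERVER_URL=https://dataverse.example.edu
  API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  PID=doi:10.5072/FK2/EXAMPLE

  # Upload the zip to an existing dataset, printing the response headers to the
  # terminal (-D -) and saving the body to a file; --max-time stops curl itself
  # from giving up for two hours.
  curl -sS -D - -o response.json --max-time 7200 \
    -H "X-Dataverse-key:$API_TOKEN" \
    -X POST -F "file=@large.zip" \
    "$SERVER_URL/api/datasets/:persistentId/add?persistentId=$PID"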


It sounds like you’ve checked for disk space. The only thing I can add there is that, with normal uploads (vs. direct uploads via S3, which don’t send files to the Dataverse server), there can be 2 copies of the file plus the unzipped files, so 3x or more of the file size can be needed (e.g., roughly 6 GB or more of headroom for a 2 GB zip).


In general, when files start to get large, we’d recommend looking into direct S3 upload (it can be done on top of a local file system using MinIO or another S3-compatible service). There are pros and cons to that, but I’d guess most places with GB+ files use it (we don’t have actual stats that I know of).
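
For reference, an S3 store with direct upload is configured through per-store JVM options on the Payara side. The sketch below is just the general shape for a hypothetical store id "s3" pointed at a MinIO endpoint; the bucket name and endpoint are made up, and the option names and colon escaping are worth checking against the Installation Guide for your version before applying anything.

  ./asadmin create-jvm-options "-Ddataverse.files.s3.type=s3"
  ./asadmin create-jvm-options "-Ddataverse.files.s3.label=S3"
  ./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=dataverse-files"
  ./asadmin create-jvm-options "-Ddataverse.files.s3.upload-redirect=true"
  # For MinIO or another non-AWS service (colons escaped for create-jvm-options):
  ./asadmin create-jvm-options "-Ddataverse.files.s3.custom-endpoint-url=https\://minio.example.edu\:9000"
  ./asadmin create-jvm-options "-Ddataverse.files.s3.path-style-access=true"
  # Credentials for the store go in the usual AWS credentials file (see the guide).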


Hope that helps,

-- Jim
