About the large .tmp file in /usr/local/payara5/glassfish/domains/domain1/generated/jsp/dataverse


Taki NakaMura

Aug 6, 2021, 5:52:08 AM8/6/21
to Dataverse Users Community
Hi everyone,

I am testing file upload over HTTP via the browser, and I am trying to upload a large file (~11 GB). For smaller files (~1 GB to 2 GB), the .tmp file in /usr/local/payara5/glassfish/domains/domain1/generated/jsp/dataverse is deleted after the files are successfully uploaded. However, when I try to upload the large file (~11 GB), the page keeps waiting for quite a long time; the upload seems to have failed, so I closed the browser directly.

[Attached screenshot: waiting.PNG]

An 11 GB .tmp file is left in the directory /usr/local/payara5/glassfish/domains/domain1/generated/jsp/dataverse/.

After I upload other files, the 11 GB .tmp file is still in that directory. May I know why it is not removed like the other .tmp files? Thanks.

Thanks,
Patrick

James Myers

Aug 6, 2021, 8:08:46 AM8/6/21
to dataverse...@googlegroups.com

Patrick,

Temp files should get deleted if/when an upload completes successfully – there’s no change in behavior due to size.

 

However, there are ways temporary files, of any size, can be left in place by unsuccessful uploads. One simple example is starting to upload files and then closing the browser window without hitting Save or Cancel. There has been a lot of work to minimize the circumstances where this happens, and there are open issues about improving file handling further, particularly when using a file store (versus sending files to an S3 store). In general, admins should monitor space and either manually delete older temp files or set up an automated mechanism to do so.
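As a rough illustration of that kind of automated cleanup, here is a minimal Python sketch. The directory comes from this thread, but the *.tmp pattern and the one-day cutoff are assumptions for the example; adjust them for your installation and only run something like this when no uploads are in flight.

#!/usr/bin/env python3
"""Remove leftover upload temp files older than a cutoff (sketch)."""
import time
from pathlib import Path

# Directory mentioned in this thread; adjust for your installation.
TMP_DIR = Path("/usr/local/payara5/glassfish/domains/domain1/generated/jsp/dataverse")
MAX_AGE_SECONDS = 24 * 60 * 60  # keep anything newer than one day (assumption)

now = time.time()
for tmp_file in TMP_DIR.glob("*.tmp"):  # pattern is an assumption
    if now - tmp_file.stat().st_mtime > MAX_AGE_SECONDS:
        print(f"removing {tmp_file}")
        tmp_file.unlink()

A cron entry or systemd timer can run a script like this nightly.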

 

In terms of why your 11 GB file failed, there are multiple possibilities:

- Lack of disk space – when using a file store (versus S3), Dataverse requires 2-3x the size of the file as temporary storage; if your temporary and final files go to different file systems, you need to check space on both (see the sketch after this list).

- Connection timeouts – several components in your setup, including Apache and Payara, and possibly things like a load balancer, can limit how long a connection may stay open; if uploading a file takes longer than that limit, the upload will fail.
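For the disk-space point, a quick way to sanity-check both file systems before a big upload is something like the sketch below. Both paths and the 3x multiplier are assumptions – substitute the temp and final storage locations of your own installation.

import shutil

# Hypothetical paths -- substitute your temp and final file-store locations.
TEMP_FS = "/usr/local/payara5/glassfish/domains/domain1/generated/jsp/dataverse"
FINAL_FS = "/usr/local/dvn/data"
FILE_SIZE_GB = 11
NEEDED_GB = FILE_SIZE_GB * 3  # rule of thumb from above: 2-3x the file size

for fs in (TEMP_FS, FINAL_FS):
    free_gb = shutil.disk_usage(fs).free / 1024**3
    status = "OK" if free_gb >= NEEDED_GB else "LOW"
    print(f"{fs}: {free_gb:.1f} GB free ({status}, need ~{NEEDED_GB} GB)")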

 

Handling a large file with a file store can also be slow enough to appear broken even when things are going correctly: Dataverse has to internally copy the file to its final location (it currently makes a second temporary copy and will also try to unzip zip files).

 

There are ways of debugging this by looking in the Dataverse server log and checking the browser console, and, with some work, uploads of ~11 GB to a file store can be handled. There are many threads on this list that provide more details.
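If it helps, a quick-and-dirty way to look through the server log for upload errors is something like this sketch. The path is the usual Payara domain1 log location, and the keywords are just guesses at what might be relevant – adjust both as needed.

from pathlib import Path

# Usual Payara domain1 log location; adjust if your domain differs.
LOG = Path("/usr/local/payara5/glassfish/domains/domain1/logs/server.log")
KEYWORDS = ("SEVERE", "Exception", "Timeout")

for line in LOG.read_text(errors="replace").splitlines():
    if any(keyword in line for keyword in KEYWORDS):
        print(line)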

 

However, these days I think the recommendation for handling files of this size and larger would be to use an S3 store and to configure ‘direct upload’ to it. With direct upload, Dataverse does not make temporary copies of the file at all; it breaks the file into multiple pieces and sends them independently/in parallel, retrying if there are network problems transferring specific parts. Further, the transfer goes directly between your browser and the S3 store, so it does not add any load to your Dataverse server. This method has worked for people with files up to a few hundred GB each. (There are other Big Data options for Dataverse, existing or in development, that can handle this size and even larger.)
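To illustrate the mechanism (this is not Dataverse’s own code): direct upload relies on standard S3 multipart upload, where the client splits the file into parts, sends them in parallel, and retries individual parts on failure. A minimal boto3 sketch of that idea, with a hypothetical bucket name and chunk sizes chosen only for illustration, looks like this:

import boto3
from boto3.s3.transfer import TransferConfig

# Thresholds and concurrency are illustrative, not Dataverse defaults.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts
    max_concurrency=8,                     # parts uploaded in parallel
)

s3 = boto3.client("s3")
# Hypothetical local file, bucket, and key.
s3.upload_file("bigfile.dat", "my-dataverse-bucket", "bigfile.dat", Config=config)

boto3 retries failed parts automatically; in Dataverse’s direct-upload flow the browser does the splitting and sends the parts straight to the S3 endpoint.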

 

Hope that helps,

-- Jim

