file upload not completing


Jamie Jamison

Jun 11, 2020, 9:25:00 PM
to Dataverse Users Community
I'm running into some problems uploading large files. My example file is a geodatabase (gdb) that's been zipped to around 2.9 GB.

From reading other people's postings, here's what I've done so far:
1) Raised the maximum file upload size (MaxFileUploadSizeInBytes).
2) Edited /etc/httpd/conf.d/ssl.conf, which looks like the following:
# pass everything else to Glassfish
ProxyPass / ajp://localhost:8009/ timeout=600
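For anyone hitting the same wall, the MaxFileUploadSizeInBytes setting mentioned above can be changed through the Dataverse admin API. A minimal sketch, assuming the admin API is reachable on localhost:8080 (the curl call is printed rather than executed, so you can review it before running it against your own installation):

```shell
# Compute a 3 GB limit in bytes, comfortably above a ~2.9 GB zip.
LIMIT=$((3 * 1024 * 1024 * 1024))
echo "$LIMIT"
# The actual call, shown as text; run it against your own installation:
echo "curl -X PUT -d $LIMIT http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes"
```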

I don't get any errors; most of the upload progress bar fills up and then the file disappears.

I'm wondering if I should set a larger timeout? Files under 2.7 GB upload with no problem, so I don't think I have a permissions problem.
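If a longer timeout is worth testing, the same line in /etc/httpd/conf.d/ssl.conf with a bigger value (1200 seconds here, purely as an example) would look like:

```
# pass everything else to Glassfish, with a 20-minute proxy timeout
ProxyPass / ajp://localhost:8009/ timeout=1200
```

The proxy timeout governs how long Apache waits on the backend, so it needs to cover not just the upload itself but any server-side processing (such as unzipping) before a response comes back.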

Thank you,

Jamie

James Myers

Jun 12, 2020, 9:40:27 AM
to Dataverse Users Community

Jamie – it’s hard to debug this sort of thing by email. My guess is that there will be clues to be found in the browser console and network display. If you know how to open those, I’d look in the list of network calls to see which one is failing and go from there – when I’ve had to look at this type of issue before, seeing the returned status code and the agent (Server) responding with that code was useful in figuring out which timeout settings were involved. If you haven’t already, it could also be useful to look in the Dataverse server.log as well. If there are issues in parsing that zip file, or insufficient disk space, etc. they’ll be logged there.
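As a concrete starting point for the server.log suggestion, something like the following filters out the interesting lines. The sample log is inlined purely for illustration; on a real install you would point the grep at Payara/Glassfish's domains/domain1/logs/server.log:

```shell
# Write a tiny sample log so the filter can be demonstrated end to end;
# substitute your real server.log path in practice.
cat > /tmp/sample_server.log <<'EOF'
[2020-06-11T21:20:01] INFO  upload of dataset file started
[2020-06-11T21:24:55] SEVERE java.io.IOException: No space left on device
[2020-06-11T21:24:56] INFO  upload aborted
EOF
# Keep only warnings, errors, and exception lines.
grep -E 'SEVERE|WARNING|Exception' /tmp/sample_server.log
```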

 

If it’s helpful, I’d be happy to do a zoom/screen share to walk through this – just email me directly.

 

-- Jim

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/a543031c-48f3-4651-b6da-41099b453eb3o%40googlegroups.com.

Jamie Jamison

Jun 12, 2020, 1:15:39 PM
to Dataverse Users Community
I'll go through your suggestions first, and if I'm still stuck I'll email you directly. Knowing where to look has often been my problem.

thank you,

jamie


Jamie Jamison

Jun 15, 2020, 2:44:02 PM
to Dataverse Users Community
It might save time to explain what I'm trying to do. UCLA has a collection of LiDAR geospatial data: large files, DEMs, etc. I'm trying to come up with a way to get these into our collection. I've been trying to do this with DVUploader, but would there be a better method?

Sorry, I would have emailed you directly, but I can't find your email address. Google Groups is pretty good about privacy.

- jamie

James Myers

Jun 15, 2020, 3:16:02 PM
to dataverse...@googlegroups.com

Jamie,

The DVUploader is good when you have many files or a whole tree of files, or when the data is large enough (and the upload long enough) that having something that logs the result is useful. It's also useful if you are doing multiple upload sessions, since DVUploader keeps track of which files are already uploaded. It can also upload all the files in a zip separately, which helps if you have more than the maximum number of files allowed per zip, or if you're hitting a timeout due to the time it takes Dataverse to unzip and store the individual files.

 

With a couple small differences, the DVUploader is doing the same thing as the Dataverse web UI does, so it’s mostly a matter of convenience. Differences to note:

 

- DVUploader uses the API rather than the internal calls the UI uses. The biggest difference is that the API only allows a single file per upload, so using DVUploader is analogous to repeatedly uploading one file at a time in the web UI.
- Whereas the UI just registers an error if an upload fails, DVUploader will retry. That's useful for a random network problem, but not so helpful if there's a timeout that will make the retries fail as well.
- For direct uploads, DVUploader will have support for uploading files > 5 GB to an AWS S3 bucket before that works in the web UI.
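For readers who haven't used it, a typical DVUploader invocation looks roughly like the sketch below. The jar version, server URL, dataset DOI, API token, and directory name are all placeholders, and the command is echoed as a dry run rather than executed:

```shell
SERVER=https://dataverse.example.edu           # your Dataverse URL (placeholder)
DOI=doi:10.5072/FK2/EXAMPLE                    # target dataset persistent ID (placeholder)
KEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx      # your API token (placeholder)
# Dry run: drop the echo to actually upload the lidar-data/ directory.
echo java -jar DVUploader-v1.1.0.jar -server="$SERVER" -did="$DOI" -key="$KEY" lidar-data/
```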


Jamie Jamison

Jun 23, 2020, 4:34:08 PM
to Dataverse Users Community
I tried raising the max number of files in the zip and raised the timeout. This time in the UI the bar went all the way to the end and then I got a 500 error.

For the command line with DVUploader I got:
  I/O exception (java.net.SocketException) caught when processing request to {s}->https://dataverse.ucla.edu:443: Connection reset by peer: socket write error

At least I have different errors now, and I'm still going through the server.log. One error which I don't quite understand yet, and am still googling, is:
   An exception or error occurred in the container during the request processing java.lang.Exception: Host is not set

Jamie 
500-error.JPG

Philip Durbin

Jun 23, 2020, 4:41:29 PM
to dataverse...@googlegroups.com
Sounds like you're still going through server.log for the 500 error you saw in the Dataverse UI (thanks for sending a screenshot). You're welcome to email it to sup...@dataverse.org when you're ready.

Paul Boon

Jun 23, 2020, 5:38:24 PM
to dataverse...@googlegroups.com

Hi Jamie,

 

You should check the filesystem (df -h); most of the time there is a partition without enough space, and /usr and /tmp in particular can be problematic.

Dataverse does a lot with temporary files for multi-part uploads and unzipping.
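Concretely, that check is just:

```shell
# Show free space on the filesystems backing the temp and install directories.
df -h /tmp /usr
```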

 

Paul


Jamie Jamison

Jun 23, 2020, 6:31:30 PM
to Dataverse Users Community
Does Dataverse write temporary files first to /usr or /tmp even if my storage is an S3 bucket?


James Myers

Jun 23, 2020, 7:29:17 PM
to dataverse...@googlegroups.com

Yes, unless you're using the direct upload option. You may need local storage that is a few times (3+) the size of your file; with a zip file, Dataverse also tries to unzip the file and doesn't delete the zip itself until it's done.
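That rule of thumb is worth making explicit. For the ~2.9 GB zip in this thread, a rough budget (a sketch, not an exact figure) is:

```shell
# Zip upload + unzipped copy + the original kept until processing finishes:
# budget roughly 3x the zip size in local scratch space.
ZIP_GB=2.9
awk -v z="$ZIP_GB" 'BEGIN { printf "plan for ~%.1f GB of free local storage\n", z * 3 }'
```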


Jamie Jamison

Jun 24, 2020, 5:34:18 PM
to Dataverse Users Community
I've created a new S3 bucket for direct upload only and raised the max file upload size, but I'm still getting the "Software caused connection abort: socket write error".
Full error:
Jun 24, 2020 2:21:11 PM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (java.net.SocketException) caught when processing request to {s}->https://dataverse.ucla.edu:443: Software caused connection abort: socket write error
Jun 24, 2020 2:21:11 PM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {s}->https://dataverse.ucla.edu:443

I admit I've probably missed something in the setup, but I'm not sure what to try next. The LiDAR data is too large to upload through the UI, so I'm going to have to find another way to get it uploaded.
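For others debugging a similar setup: direct S3 upload is enabled per store with JVM options along these lines. The store id "s3" and the values shown are assumptions; the authoritative list of options is in the Dataverse installation guide:

```
./asadmin create-jvm-options "-Ddataverse.files.s3.upload-redirect=true"
./asadmin create-jvm-options "-Ddataverse.files.s3.url-expiration-minutes=60"
```

With upload-redirect on, the browser or DVUploader sends the file bytes straight to S3 via a pre-signed URL instead of through Apache and the application server, which sidesteps the proxy timeouts and local temp-space issues discussed above.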

Jamie

