Large files won't upload


Jonathan Bohan

Dec 8, 2017, 11:42:09 AM
to Dataverse Users Community
Hi!

So I've come across an issue in v. 4.8.1 where larger files stall in upload when the progress bar fills. This may be related to this issue: https://groups.google.com/forum/#!msg/dataverse-community/mcji2ytn3QI/SgVSrcu5BAAJ;context-place=msg/dataverse-community/jRETXzdj8ac/4Mfc5TfpAwAJ 

Our tech team here has configured Dataverse to accept files up to 10 GB; however, anything over the prior 2 GB limit seems to stall out, even when I use the trick of double-zipping it. For example, one file just over the 2 GB limit stalled even after being double-zipped down to approximately 987 MB.

Any thoughts or advice would be appreciated.

Best regards,

Jonathan Bohan

Sebastian Karcher

Dec 8, 2017, 11:47:14 AM
to dataverse...@googlegroups.com
No help, but we're seeing exactly the same in our dev environment (running 4.6.2). Sometimes waiting and then uploading a single file works, but pretty soon it stalls again (also after seemingly uploading the file and filling the progress bar). When we try to upload multiple files at once, they all seem to upload and then vanish(?), for lack of a better word. We just started seeing this, so we don't have any error logging yet, but I wanted to flag that this doesn't seem to be an isolated issue.
Our files are only ~250-350MB.

Thanks,
Sebastian

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/76467e3a-39ac-41cb-bf69-f35d2f0afea8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Sebastian Karcher, PhD
www.sebastiankarcher.com

Jonathan Bohan

Dec 8, 2017, 11:56:54 AM
to Dataverse Users Community
Thanks, Sebastian. I was having a similar issue with files disappearing while trying to upload multiple files - it seems to have gone away since IT upgraded my computer. 

Sebastian Karcher

Dec 8, 2017, 12:05:13 PM
to dataverse...@googlegroups.com
We've tested this on three different computers (though all on the same network) and it happens consistently, though only with large files. I can't say what the limit is -- we don't really have anything between 10MB (which is fine) and 200MB (which isn't).
As I say, we haven't done serious troubleshooting on this and we're running an outdated DV release, but since the error you describe (stalling with full upload bar) is exactly what we're seeing, I thought it'd be useful to mention.


Pete Meyer

Dec 8, 2017, 2:22:58 PM
to Dataverse Users Community
It might be useful to know if this happens with API file uploads (in addition to upload through the web UI) - but that said, I'm not sure where the problem here is.



Philip Durbin

Dec 14, 2017, 8:08:19 PM
to dataverse...@googlegroups.com
I agree that testing large file upload via SWORD or the native API would be an interesting data point compared to file upload via the GUI. Please see the API Guide for details: http://guides.dataverse.org/en/4.8.4/api
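
For anyone wanting to try this comparison, here is a minimal sketch of how a native API upload test could be put together. It only builds the curl command for the "add a file to a dataset" call; the server URL, API token, DOI, and file name are placeholders I made up, and the endpoint and header should be checked against the API Guide for your Dataverse version before relying on them.

```python
import shlex
from urllib.parse import urlencode


def build_upload_command(server_url, api_token, persistent_id, file_path):
    """Build a curl argument list for a native API file upload.

    All four arguments are supplied by the caller; nothing here is
    specific to any real installation.
    """
    # The dataset is addressed by its persistent ID in the query string.
    query = urlencode({"persistentId": persistent_id})
    url = f"{server_url.rstrip('/')}/api/datasets/:persistentId/add?{query}"
    return [
        "curl",
        "-H", f"X-Dataverse-key:{api_token}",  # API token header
        "-X", "POST",
        "-F", f"file=@{file_path}",            # multipart file field
        url,
    ]


if __name__ == "__main__":
    # Placeholder values (assumptions for illustration only).
    command = build_upload_command(
        "https://dataverse.example.edu",
        "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
        "doi:10.5072/FK2/EXAMPLE",
        "large-file.zip",
    )
    # Print a shell-ready command to copy and run by hand.
    print(shlex.join(command))
```

Running the printed command against the same file that stalls in the GUI would show whether the failure follows the file or the upload path. If the request still passes through the same front-end proxy, both routes may fail in the same way.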

Also, Slava was saying recently that he was able to upload 5 GB files and, with some tricks, files as big as 30 GB: https://github.com/IQSS/dataverse/issues/4125#issuecomment-328994405

That said, where there's smoke there's fire. I feel like there is something that could be configured better or coded better because it seems like there may be a bottleneck somewhere.

I hope this helps,

Phil



Amber Leahey

Dec 15, 2017, 8:54:20 AM
to Dataverse Users Community
Hi all, we too have been experimenting a bit with large files, using the GUI and the SWORD API. Unfortunately, neither seems to support files larger than 2 GB, even with our limit set much higher (I believe 10 GB) in our demo, http://demodv.scholarsportal.info. The Harvard team has given us some ideas for backend file upload, but this isn't ideal, so we are really starting to ramp up our investigation of what it will take to have rsync work with other storage locations for large file transfer.

We've even had someone report several failed attempts to upload a 1 GB text file using the GUI. We are running 4.7.1 in both demo and production. Eventually zipping the file did the trick, but something definitely seems to be slower than usual!

Slava's solution on GitHub is interesting; I wonder what our systems admin will think of that. :)


Best,
Amber






Brandon Kowalski

Jan 24, 2018, 3:08:12 PM
to Dataverse Users Community
Hey folks. Bringing this topic back from the dead.

I had some free time to debug the issue Jonathan was seeing on Cornell's Dataverse. It appears that no proxy timeout was set in apache2. The network tab in the debugger reported a 500 error whose body matched the default apache2 500 error page.

After investigating the apache2 logs I saw that the AJP connection had timed out. Long story short, I changed the following in the apache2 config (in particular /etc/httpd/conf.d/ssl.conf).

Towards the bottom of this config the following appears:

# pass everything else to Glassfish
ProxyPass / ajp://localhost:8009/

I simply added timeout=600 to the end of the ProxyPass line to give a 10 minute timeout.

# pass everything else to Glassfish
ProxyPass / ajp://localhost:8009/ timeout=600

After making this change, the file uploaded without an issue. Perhaps a small update should be made to the Dataverse configuration docs to suggest also checking the apache2 settings, as a missing proxy timeout can cause uploads to fail even if the maximum upload size (the :MaxFileUploadSizeInBytes setting) is set properly in DV.
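
To check whether an installation has the same gap, something like the following could be used. This is only a rough sketch: it scans Apache config text for ProxyPass lines that carry no timeout= parameter, the function name is my own, and it does not follow Include directives or account for a global ProxyTimeout or Timeout setting elsewhere in the configuration.

```python
import re

# Matches "ProxyPass <path> <target> [key=value ...]" but not
# ProxyPassReverse or ProxyPassMatch, and not commented-out lines.
PROXYPASS = re.compile(r"^\s*ProxyPass\s+(\S+)\s+(\S+)(.*)$", re.IGNORECASE)

# A standalone timeout= parameter (so connectiontimeout= is not counted).
TIMEOUT_PARAM = re.compile(r"(?<!\S)timeout=\d+", re.IGNORECASE)


def proxypass_without_timeout(config_text):
    """Return (line_number, line) pairs for ProxyPass lines lacking timeout=."""
    missing = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = PROXYPASS.match(line)
        if not match:
            continue  # comment, blank line, or a different directive
        target, params = match.group(2), match.group(3)
        if target == "!":
            continue  # exclusion rule: nothing is proxied for this path
        if not TIMEOUT_PARAM.search(params):
            missing.append((number, line.strip()))
    return missing


if __name__ == "__main__":
    sample = (
        "# pass everything else to Glassfish\n"
        "ProxyPass / ajp://localhost:8009/\n"
    )
    for number, line in proxypass_without_timeout(sample):
        print(f"line {number}: no timeout set: {line}")
```

To run it against a real file, read the file contents into a string and pass that string to the function. A line it flags is not necessarily wrong, since a timeout may be inherited from elsewhere, but it is a reasonable place to start looking when uploads fail with a proxy-generated 500 page.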

~btk

Philip Durbin

Jan 24, 2018, 3:31:29 PM
to dataverse...@googlegroups.com
Wow, that seems like an easy fix. Thanks for researching that, Brandon. Would you mind creating an issue at https://github.com/IQSS/dataverse/issues about adding this to the docs?


Brandon Kowalski

Jan 24, 2018, 4:37:26 PM
to Dataverse Users Community

Philip Durbin

Jan 24, 2018, 9:32:39 PM
to dataverse...@googlegroups.com
Thanks, Brandon! I just left a comment in that issue you opened.


Bikramjit Singh

Jan 25, 2018, 10:31:48 AM
to Dataverse Users Community
We had a similar issue with timeouts, but we are using HAProxy instead of Apache and had the following timeouts:
 Client --2 min--> HAProxy --5 min--> Glassfish:8080

Setting both to 30 minutes solved the upload problems. We were still getting some issues at 10 minutes.

-
Bikram 
Scholars Portal