I think the only thing wget can do that browser downloads don’t is automatically restart after a failure to get the rest of the file. If that’s the case, I wonder if browser extensions like ‘Auto Resume’ in the Chrome store would help. (There are more sophisticated ones that download in parallel, like the S3 direct upload mechanism, which might also improve performance.) I’m not sure if the new Range header support in v5.9 is required for, or would help with, any of this, but I suspect that if wget succeeds, a browser plugin of this type might be able to do the same thing.
-- Jim
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/96ebd748-7500-4654-926d-4803af422510n%40googlegroups.com.
In general, when tools request a file over https they get info about the size, and then bytes start streaming and get written to disk. If the connection fails (for any reason – timeouts or cables getting unplugged, etc.) before the correct number of bytes is written, a tool can re-request the file starting from the point (byte offset) where things failed, so that it only has to retrieve the bytes it didn’t get the first time. This means that, even if each connection hits the timeout, one can keep getting new bytes on every retry and eventually get the whole file. Google Chrome and other browsers don’t do this automatically, but they do keep the partial download and, if the user requests it, they can restart the download to get just the new bytes. My understanding is that the plugins just automate that request to restart and continue downloading.
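To make the byte-offset idea concrete, here is a rough Python sketch of a resume loop. It is only an illustration, not how wget or any browser plugin is actually implemented: the function names (`range_header`, `expected_total`, `download_with_resume`) and the default values are made up for this example, and it assumes the server answers a `Range` request with `206 Partial Content`. If the server ignores the header and replies `200`, the sketch just starts the file over.

```python
import http.client
import os
import urllib.error
import urllib.request


def range_header(offset):
    """Range header value asking for everything from byte `offset` onward."""
    return f"bytes={offset}-"


def expected_total(status, offset, content_length):
    """Full file size implied by a response, or None if the server gave no length."""
    if content_length is None:
        return None
    # A 206 reply reports only the remaining bytes; a 200 reply reports the whole file.
    return offset + int(content_length) if status == 206 else int(content_length)


def download_with_resume(url, dest, max_attempts=5, chunk_size=65536, timeout=30):
    """Fetch `url` into `dest`, resuming from the bytes already on disk after a failure."""
    for _ in range(max_attempts):
        # Whatever is already on disk is the offset to resume from.
        offset = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url)
        if offset:
            request.add_header("Range", range_header(offset))
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                resumed = response.status == 206
                total = expected_total(
                    response.status, offset, response.headers.get("Content-Length")
                )
                # Append when the server honored the range, otherwise overwrite.
                with open(dest, "ab" if resumed else "wb") as out:
                    while chunk := response.read(chunk_size):
                        out.write(chunk)
            if total is None or os.path.getsize(dest) >= total:
                return True
        except urllib.error.HTTPError as err:
            # Assumption: 416 (range not satisfiable) means dest is already complete.
            if err.code == 416:
                return True
            raise
        except (OSError, http.client.HTTPException):
            # Timeout or dropped connection: loop again and resume from the new offset.
            pass
    return False
```

A call would look like `download_with_resume("https://example.org/big.zip", "big.zip")`, where the URL is a placeholder. Each retry only asks for the bytes that are still missing, so a per-connection timeout limits how much arrives per attempt rather than whether the download can ever finish.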
As for timeouts, I think the general reason for them is to be able to kill stuck connections and/or to keep people from maliciously tying up connections (denial of service), e.g. by asking for many files and reading them at 1 byte/second to keep all of the available connections busy. When you know there are valid long connections, it is usually reasonable to lengthen the timeout – you may just want to watch for any denial of service activity and be ready to shorten the timeout if/when needed.