dataset zip downloads partially


Alejandro Ayerdi

Feb 3, 2021, 3:58:50 AM
to Dataverse Users Community

Hello!

I'm trying to solve a problem in our Dataverse installation: we have a 64 GB dataset composed of ~50 files. When we try to download it as a .zip, it does not work properly: in two cases the .zip was 1.1 GB, and in the others it was 34 and 35 GB.
The :ZipDownloadLimit is set to 100 GB, so I don't know where the problem is.
Maybe some timeouts?
If someone could help me, I would appreciate it.

Thanks!

Alejandro

danny...@g.harvard.edu

Feb 3, 2021, 9:55:01 AM
to Dataverse Users Community
Hi Alejandro, 

There could certainly be some timeouts coming into play with zips of this size, even if the ZipDownloadLimit is set very high. I'll let more technical people on the group provide more information, but I don't know if this is something that we'd expect to generally succeed. Some silent failures may occur, or the manifest provided in the zip file may provide more information.
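To see what actually ended up in one of the partial downloads, a short Python sketch like the one below can help. It assumes the downloaded file is on local disk and that the manifest member is named `MANIFEST.TXT` (an assumption; check what your installation actually writes). `inspect_zip` is a hypothetical helper, not part of Dataverse.

```python
import zipfile
from typing import Optional

# Assumption: the name of the manifest member inside the zip; verify on your installation.
MANIFEST_NAME = "MANIFEST.TXT"


def inspect_zip(path: str) -> dict:
    """Summarize a downloaded dataset zip, or report that it cannot be read as a zip."""
    if not zipfile.is_zipfile(path):
        # No end-of-central-directory record found: typical of a cut-off download.
        return {"valid": False}
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        manifest: Optional[str] = None
        if MANIFEST_NAME in zf.namelist():
            manifest = zf.read(MANIFEST_NAME).decode("utf-8", errors="replace")
        return {
            "valid": True,
            "entries": len(infos),
            "uncompressed_bytes": sum(info.file_size for info in infos),
            # testzip() reads every member and returns the first one whose CRC fails, or None.
            "first_bad_member": zf.testzip(),
            "manifest": manifest,
        }
```

Comparing `entries` and `uncompressed_bytes` with the dataset's ~50 files and 64 GB shows how much is missing. Note that `testzip()` reads the whole archive, so it will take a while on a 35 GB file.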

Instead of trying to optimize this in the application itself, in 5.0 we introduced a way to move the zipping operations off the application server. You can read more about this implementation here:


If you expect to serve large zips, I'd suggest investigating that. We've been running it on the Harvard Dataverse Repository for a few months and have been happy with the performance. 

- Danny

James Myers

Feb 3, 2021, 10:53:49 AM
to dataverse...@googlegroups.com

Another possibility would be running out of disk space on the Dataverse server machine. To zip, I think the Dataverse Software brings a copy of the files to the local machine and then creates the zip which also takes space.
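A quick way to test this hypothesis is to compare the free space on the server with the dataset size. A minimal sketch using Python's `shutil.disk_usage`; `has_room_for` and its 10% safety margin are illustrative assumptions, and which directory to check depends on your setup.

```python
import shutil


def has_room_for(path: str, needed_bytes: int, margin: float = 1.1) -> bool:
    """Return True if the filesystem holding `path` has `needed_bytes` free, plus a safety margin."""
    free = shutil.disk_usage(path).free  # bytes available on that filesystem
    return free >= needed_bytes * margin
```

If the server did stage copies as described, a 64 GB dataset would need room for the copies plus the zip, e.g. `has_room_for("/tmp", 2 * 64 * 10**9)`; `/tmp` is only a placeholder for wherever your application server writes temporary files.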

 

-- Jim


Alejandro Ayerdi

Feb 4, 2021, 7:35:29 AM
to Dataverse Users Community
Thank you for the responses.
I'll check both with the team.

Alejandro Ayerdi

Feb 4, 2021, 8:28:54 AM
to Dataverse Users Community
Hello!

Danny, would it work with a 5.0 installation, or only from 5.3 onward?

Thank you

danny...@g.harvard.edu

Feb 4, 2021, 8:58:47 AM
to Dataverse Users Community
Hi, it was released in 5.0 and we've made some improvements and bug fixes since. I'd suggest 5.1 if possible, since it contained some fixes in this area of the code.

- Danny

Alejandro Ayerdi

Feb 19, 2021, 6:14:14 AM
to Dataverse Users Community
Hello Danny,

We installed the ZipDownload tool (https://github.com/IQSS/dataverse/tree/v5.3/scripts/zipdownload) in our Dataverse installation, but it does not work properly.
In some cases it downloads and lets you extract the files from the .zip, but in other cases the .zip downloads and cannot be unzipped; the archive tool says it is an "empty archive".
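An archive tool may say "empty archive" both when a download was cut off before the zip's central directory (written at the very end of a streamed zip) arrived, and when the archive is complete but lists no entries. The sketch below tells those cases apart by looking for the 22-byte end-of-central-directory (EOCD) record, which may be followed by a comment of up to 65,535 bytes. It assumes only the standard zip layout; `eocd_status` is a hypothetical helper.

```python
import os
import struct

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature
EOCD_LEN = 22             # fixed part of the EOCD record
MAX_COMMENT = 0xFFFF      # the zip comment that may follow the EOCD


def eocd_status(path: str) -> str:
    """Classify a zip file by its end-of-central-directory record."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # The EOCD must lie within the last EOCD_LEN + MAX_COMMENT bytes.
        f.seek(max(0, size - EOCD_LEN - MAX_COMMENT))
        tail = f.read()
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < EOCD_LEN:
        return "truncated: no end-of-central-directory record"
    # signature, disk no., CD start disk, entries on this disk, total entries,
    # CD size, CD offset, comment length
    fields = struct.unpack("<4s4H2LH", tail[pos:pos + EOCD_LEN])
    total = fields[4]
    if total == 0xFFFF:
        return "zip64: entry count is stored in the zip64 record"
    if total == 0:
        return "complete but empty: the archive lists 0 entries"
    return f"complete: {total} entries"
```

A "truncated" result points at the transfer (timeouts, dropped connections); "complete but empty" points at the server producing an archive with nothing in it.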

Do you know if it happens frequently or is happening to other communities? Do we need to fix something?

Thank you

Alejandro


[Attached screenshot: zip_error.png]

Maylein, Leonhard

Feb 19, 2021, 6:19:49 AM
to Dataverse Users Community
Same here:


Leonhard Maylein




From: dataverse...@googlegroups.com <dataverse...@googlegroups.com> on behalf of Alejandro Ayerdi <ayerdia...@gmail.com>
Sent: Friday, February 19, 2021 12:14
To: Dataverse Users Community
Subject: Re: [Dataverse-Users] Re: dataset zip downloads partially
 

danny...@g.harvard.edu

Feb 19, 2021, 9:49:29 AM
to Dataverse Users Community
Hi, thanks for the details here and in the GitHub issue. We can take a look. It is experimental, but as I mentioned, it's been working well for the Harvard Dataverse Repository. 

Thanks,

Danny

leo...@g.harvard.edu

Feb 19, 2021, 2:26:02 PM
to Dataverse Users Community
On Wednesday, February 3, 2021 at 10:53:49 AM UTC-5 Jim Myers wrote:

Another possibility would be running out of disk space on the Dataverse server machine. To zip, I think the Dataverse Software brings a copy of the files to the local machine and then creates the zip which also takes space.


Actually, no, the application reads the individual files from the storage locations directly, and streams the zipped output to the client directly; so no, it should not be using any temp space on the server. 
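The approach described here (read each file from storage, stream the zipped bytes straight to the client) can be sketched with Python's standard `zipfile` module, which is able to write to a non-seekable stream. This is not Dataverse's own code (the application is written in Java); `ResponseSink` is a hypothetical stand-in for an HTTP response body, and `stream_zip` is illustrative only. Nothing is staged on disk, and only one chunk per file is held in memory at a time.

```python
import io
import shutil
import zipfile
from typing import BinaryIO, Iterable, List, Tuple


class ResponseSink(io.RawIOBase):
    """Hypothetical non-seekable sink standing in for an HTTP response body."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[bytes] = []

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        # A real server would send these bytes to the client instead of keeping them.
        self.chunks.append(bytes(b))
        return len(b)


def stream_zip(sources: Iterable[Tuple[str, BinaryIO]], sink: BinaryIO,
               chunk_size: int = 1 << 20) -> None:
    """Zip each (name, readable file) pair straight into `sink`, one chunk at a time."""
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, src in sources:
            # Sizes are unknown until a member is fully written, so request zip64 up front.
            with zf.open(name, mode="w", force_zip64=True) as dest:
                shutil.copyfileobj(src, dest, chunk_size)
```

Because the sink cannot seek, `zipfile` writes each member's sizes in a trailing data descriptor and the central directory only at the very end; without `force_zip64=True` it raises `RuntimeError` when a member grows past roughly 2 GiB. One consequence of streaming is that a connection dropped mid-transfer leaves the client with a file that has no central directory.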

leo...@g.harvard.edu

Feb 19, 2021, 2:45:18 PM
to Dataverse Users Community
Hi Alejandro, 
Yes, please take a look at the GitHub issue that Leonhard has linked. (I added a couple of diagnostics tips there earlier). 
However, I suspect your issue may be different from what they were seeing at Heidelberg, because you mentioned earlier that "in some cases it downloads and let you extract the files from the .zip but in other cases ... it does not allow you to unzip it..."; for them, it wasn't working at all, for any files. 

How do you store your files? Are they all stored on the filesystem? or S3? (or a mix of both?)

As others have pointed out, this may be happening simply because of the size of the zip file; and something timing out... But we'll need to experiment some more to find out what and where exactly. 

One other thing is that some OSs and/or filesystems still won't allow you to save a 30+ GB file. Is there any failure pattern when downloading from different clients (macOS vs. Windows vs. ...)?
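To narrow down whether failures are client-side, one could fetch the same zip from different machines with a script that writes the stream to disk in chunks and reports how many bytes arrived. A minimal sketch using Python's `urllib.request`; the URL is whatever download link you are testing, and `download` is a hypothetical helper. For reference, FAT32 caps individual files at 4 GiB minus one byte, so a 30+ GB zip cannot be saved to such a volume.

```python
import urllib.request


def download(url: str, dest: str, timeout: float = 60.0,
             chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` in chunks and return the number of bytes written.

    `timeout` applies to each blocking socket operation, not to the whole transfer.
    """
    written = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            # An OSError raised here (e.g. file too large) points at the client's filesystem.
            out.write(chunk)
            written += len(chunk)
    return written
```

If the byte count stops at a similar size on every client, the cut-off is more likely on the server or network side; if it varies with the client OS or filesystem, the client is the better suspect.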

best,
-Leonid

