file extension missing from original file format downloads


aussda....@gmail.com

Mar 23, 2018, 3:32:53 AM
to Dataverse Users Community
We're experiencing a strange problem with file downloads from our production Dataverse (4.6.2). If you try to download the "original file format" version of a data file, the download succeeds but the file has no extension (e.g., .tab). This only happens with the original format option. For an example, check out this dataset:


Is there a problem with our ingest workflow or configuration settings, or is this a bug of some kind? Thanks very much for the help. Best, Frank


Philip Durbin

Mar 23, 2018, 6:37:30 AM
to dataverse...@googlegroups.com
Hi Frank,

I just downloaded https://data.aussda.at/file.xhtml?fileId=183&version=1.0 from that dataset and you're right, the file was saved without an extension. Because the dropdown said "Original File Format (Tab-Delimited)" I expected the file to be a tab-delimited file, but the Unix `file` command thinks it's an SPSS file:

$ file 10007_da_de_v1_0
10007_da_de_v1_0: SPSS System File TICS 64-bit MS Windows 24.0.0.1

The file seems to have been successfully ingested (I downloaded the "Tab-Delimited" version that doesn't say "Original File Format").
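The `file` check above works by reading the format's leading "magic" bytes. A minimal Python sketch of the same idea can restore a missing extension on a downloaded original. It assumes SPSS system files begin with `$FL2` (and compressed `.zsav` files with `$FL3`); the helper names are hypothetical and the signature table covers only SPSS.

```python
from pathlib import Path
from typing import Optional

# Leading bytes -> extension. Assumption: SPSS .sav files start with "$FL2"
# and compressed .zsav files with "$FL3"; other formats would need entries.
MAGIC_EXTENSIONS = {
    b"$FL2": ".sav",
    b"$FL3": ".zsav",
}


def sniff_extension(header: bytes) -> Optional[str]:
    """Return an extension for a file's first bytes, or None if unknown."""
    for magic, ext in MAGIC_EXTENSIONS.items():
        if header.startswith(magic):
            return ext
    return None


def add_missing_extension(path: Path) -> Path:
    """Rename an extension-less download based on its magic bytes."""
    if path.suffix:  # already has an extension; leave it alone
        return path
    with path.open("rb") as fh:
        ext = sniff_extension(fh.read(8))
    if ext is None:  # unrecognized format; keep the original name
        return path
    target = path.with_suffix(ext)
    path.rename(target)
    return target
```

Applied to the file Phil downloaded, `add_missing_extension(Path("10007_da_de_v1_0"))` would rename it to `10007_da_de_v1_0.sav`. Note that a name containing a dot, such as `v1.0`, already counts as having a suffix and is left untouched.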

Do you think the file was uploaded without an extension? That would explain why you don't see any extension when downloading the original file. If Dataverse guessed wrong about the file type, I'm surprised the file was successfully ingested, but I'm not super familiar with this part of the code.

Maybe you could try uploading the file with various extensions or no extension to https://demo.dataverse.org to see what the behavior is there.

Phil

p.s. This is related: "This is how the filename is generated in the application; once the file is ingested, the stored file name has the ".tab" extension. For the stored original that extension is modified on the fly based on the original type saved in the database." https://github.com/IQSS/dataverse/issues/2734#issuecomment-310751970
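To make the quoted mechanism concrete, here is a hypothetical Python illustration of swapping the stored `.tab` extension for one derived from the original MIME type. It is not Dataverse's actual code, and the MIME-to-extension table is an assumption. It shows one way a download could end up without an extension: if the lookup misses, `.tab` is stripped and nothing replaces it.

```python
from typing import Optional

# Hypothetical MIME type -> extension table; Dataverse keeps its own mapping.
ORIGINAL_EXTENSIONS = {
    "application/x-spss-sav": ".sav",
    "application/x-stata": ".dta",
    "text/csv": ".csv",
}


def original_download_name(stored_label: str, original_mime: Optional[str]) -> str:
    """Swap the ingested '.tab' extension for one matching the original type."""
    base = stored_label[:-4] if stored_label.endswith(".tab") else stored_label
    # If the original type is missing or not in the table, no extension is added.
    ext = ORIGINAL_EXTENSIONS.get(original_mime or "", "")
    return base + ext
```

For example, `original_download_name("10007_da_de_v1_0.tab", "application/x-spss-sav")` yields `10007_da_de_v1_0.sav`, while passing `None` as the type yields the bare `10007_da_de_v1_0`.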



--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-community+unsub...@googlegroups.com.
To post to this group, send email to dataverse-community@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/4d7ce772-4132-4ef5-8f1f-2daaa8124d70%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.




Don Sizemore

Mar 23, 2018, 8:57:58 AM
to dataverse...@googlegroups.com
Phil,

This is known behavior, if I'm remembering correctly. I wrote a Python script to download all files from a given dataset in original format, but Thu-Mai wound up manually fixing the extensions of certain file types. This turned into a discussion of renaming original files to avoid name collisions; I thought we had an issue open about it at some point, but a search doesn't turn it up.
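Don's script isn't shown in the thread. Here is a minimal Python sketch of the same workflow. It assumes the Data Access API's `format=original` parameter (`/api/access/datafile/{id}?format=original`) and that you already have the numeric file IDs (e.g., 183 from Phil's link). It also assumes the server sends `application/x-spss-sav` for SPSS originals, which the stdlib `mimetypes` table doesn't know, so that type is registered by hand. Function names are hypothetical.

```python
import mimetypes
import shutil
import urllib.request
from pathlib import Path
from typing import Iterable, List, Optional

# Assumption: SPSS originals are served as application/x-spss-sav. The stdlib
# table does not know this type, so register it before guessing extensions.
mimetypes.add_type("application/x-spss-sav", ".sav")


def original_url(server: str, file_id: int) -> str:
    """Data Access API URL for the file as originally uploaded (pre-ingest)."""
    return f"{server.rstrip('/')}/api/access/datafile/{file_id}?format=original"


def fix_name(name: str, content_type: Optional[str]) -> str:
    """Strip any path parts and append an extension if the name has none."""
    name = Path(name).name
    if Path(name).suffix or not content_type:
        return name
    mime = content_type.split(";")[0].strip().lower()
    return name + (mimetypes.guess_extension(mime) or "")


def download_originals(server: str, file_ids: Iterable[int], dest: Path) -> List[Path]:
    """Save each data file in its original format under dest."""
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for file_id in file_ids:
        with urllib.request.urlopen(original_url(server, file_id)) as resp:
            # resp.headers is an email.message.Message subclass, so get_filename()
            # reads the filename parameter of Content-Disposition, if present.
            name = resp.headers.get_filename() or f"datafile_{file_id}"
            target = dest / fix_name(name, resp.headers.get("Content-Type"))
            with target.open("wb") as out:
                shutil.copyfileobj(resp, out)
        saved.append(target)
    return saved
```

Deriving the extension from the response's `Content-Type` avoids renaming files by hand. It does not address the name-collision question Don raises, since two originals with the same label would still overwrite each other in `dest`.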

Don


Philip Durbin

Mar 23, 2018, 10:03:24 AM
to dataverse...@googlegroups.com
Interesting. Don, you might be in the best position to create a GitHub issue. Do you mind?

Message has been deleted
Message has been deleted

Philip Durbin

Mar 26, 2018, 8:37:23 AM
to dataverse...@googlegroups.com
Hi Frank,

Well, it's usually a good idea to upgrade. I'm not sure whether your message came through to everyone (the mailing list archive says some messages were deleted), so I've included it below. Certainly, if you are seeing odd behavior on the latest release of Dataverse, it is much easier for developers to troubleshoot than on older releases. This is why I often ask people to try to reproduce issues on https://demo.dataverse.org, which runs the latest release. If something (anything) doesn't work on the demo server, please open an issue and mention that you tested it there. It's helpful to note which version the demo server is running (it appears in the bottom right), but if one of us notices the issue quickly enough, we can often capture this information in a comment.

Sorry to ramble on. I hope this helps,

Phil

On Mon, Mar 26, 2018 at 7:51 AM, <aussda....@gmail.com> wrote:
Hi Phil,
Thanks very much for taking a look at this. The files were all uploaded with their native extensions. We added another file late Friday, a CSV list of variable descriptions, and encountered the same problem. I should note, though, that this does not happen on our development server, which runs 4.8.5. Thanks also for the link to the file label issue on GitHub. We looked at the filemetadata table, and all the labels include file extensions, so that seems okay. Our best option now is probably to upgrade our Dataverse installation and see if that helps.

If there is anything we can contribute to figuring this out, just let me know. Best, Frank




aussda....@gmail.com

Mar 27, 2018, 4:55:57 AM
to Dataverse Users Community
Hi Phil,
Thanks for the response. We're working on a Dataverse upgrade and migration workflow, which will hopefully take effect soon.

Since I have you on the line, can I ask another, related question about data downloads? Whenever we try to download a data file in RData format, Glassfish throws a 503 error. You can see this directly with the same dataset as the file extension problem: https://data.aussda.at/dataset.xhtml?persistentId=doi:10.11587/EHJHFJ. This seems like a configuration error on our end, but we can't quite figure out what we missed or did wrong during installation. Any suggestions?

Thanks for the help. Frank


Philip Durbin

Mar 27, 2018, 7:04:33 AM
to dataverse...@googlegroups.com
I see what you mean: a 503 error. Can you please email your Glassfish server.log file to sup...@dataverse.org? Please reproduce the problem first so that the error appears near the end of the log. Thanks!

Phil
