Upload aborted error


Gabriele Hayden

Oct 8, 2020, 7:22:25 PM
to Dataverse Users Community
Hi,
I'm working on a deposit for a research group, and when I try to upload a CSV from them I get the error message: "Upload Completed with Errors
Tabular data ingest failed. Ingest succeeded, but failed to save the ingested tabular data in the database: Transaction aborted"

It's unclear to me whether this is triggered by a problem with the spreadsheet data or by something else. When I open the CSV in Excel, I see name errors in one column, presumably because the strings in this column are preceded by a negative sign (for no reason???). I'm tempted to go back and ask them about this and ask them to fix it, but I wanted to check first whether others think this is the likely source of the error, or whether another issue is more likely. The CSV has many blank cells, but my understanding is that this should not be an issue.
Many thanks,
Gabriele


danny...@g.harvard.edu

Oct 8, 2020, 10:07:40 PM
to Dataverse Users Community
Hi Gabriele, can you let us know to which Dataverse installation you were trying to upload the file? Was it dataverse.harvard.edu? Also, is it possible for you to share the file here? 

Thanks,

Danny

Gabriele Hayden

Oct 9, 2020, 12:48:47 PM
to Dataverse Users Community
Hi Danny,
Yes, sorry, it was dataverse.harvard.edu. The file is attached. Thanks!
EIS_Database_JFSP16-3-01-10.csv

Philip Durbin

Oct 9, 2020, 1:29:47 PM
to dataverse...@googlegroups.com
Hi Gabriele,

The error I'm getting when I try to ingest this CSV into Dataverse is this...

Tabular data ingest failed. The header contains a duplicate name: "Threshold"

... followed by a long list of columns (please see the attached screenshot). It does seem like "Threshold" appears twice* so I think you need to rename one of them.

I hope this helps,

Phil

* There are also two variations on Threshold that are unique (so they're fine):

$ head -1 EIS_Database_JFSP16-3-01-10.csv | tr , "\n" | grep Threshold
Threshold
Threshold_SES
Threshold_text
Threshold
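A quick way to print only the names that occur more than once is to add `sort` and `uniq -d` to the same pipeline. (Sketch below; a small hypothetical header file stands in for the real CSV.)

```shell
# Hypothetical header for demonstration; the real file has many more columns.
printf 'Threshold,Threshold_SES,Threshold_text,Threshold\n' > header_demo.csv

# Print each header name that occurs more than once:
head -1 header_demo.csv | tr , '\n' | sort | uniq -d
# -> Threshold
```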

--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-commu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-community/57ebeeb7-c6c1-43bc-b371-e53ab2a840acn%40googlegroups.com.


--
Screen Shot 2020-10-09 at 1.21.30 PM.png

Gabriele Hayden

Oct 9, 2020, 2:24:22 PM
to Dataverse Users Community
Very helpful--thanks! I tried the ingest a number of times and only got the generic error, so I appreciate your taking the time,
Gabriele

Gabriele Hayden

Oct 14, 2020, 8:38:20 PM
to dataverse...@googlegroups.com
Hi Phil,
Many thanks for your assistance! We addressed the issue you helped with, but appear to have run into another: the CSV no longer has "Threshold" as a duplicate header, but presumably has some other problem or is not tidy data. It's not my data set; I'm just handling ingest, as an administrator of the UOregon Scholars' Bank: Data Dataverse within Harvard Dataverse.

As last time, I'm only seeing the very general error message, whereas in the last instance you were able to see a more informative one. Is there any way I can get access to the better error messages? Or do you have a comprehensive list of what might cause an ingest failure? I don't want to keep bothering you to pass on the more detailed error messages, and I also don't want to ask the research group that owns this CSV file to clean up their data to my standard of tidy, database-ready data, since who knows whether my standards are similar enough to what Dataverse has in place.

Below is a screenshot of the new error message, and the new CSV is attached. Any advice would be greatly appreciated,
Gabriele
Librarian for Research Data Management and Reproducibility
University of Oregon Libraries
image.png


EIS_Database_JFSP16-3-01-10.csv

Philip Durbin

Oct 15, 2020, 10:31:25 AM
to dataverse...@googlegroups.com
Hi Gabriele,

This time it's saying that "Region" is duplicated. Screenshot attached.

I don't *think* there's anything magic on my laptop (where I'm testing your files) compared to Harvard Dataverse, where you're uploading your files. They're both running more or less Dataverse 5.1.1. I'm not aware of extra debugging in certain circumstances. I even tried looking at the dataset as a lowly contributor rather than a superuser but it looks the same, showing the duplicate fields.

Here's what I'd suggest (in order):

- Try it on https://demo.dataverse.org to see if it's different.
- Email sup...@dataverse.harvard.edu to see if they have any suggestions.
- Open an issue at https://github.com/IQSS/dataverse/issues explaining how you are seeing a less informative error message.

Meanwhile, hopefully someone on this list can solve this mystery!

Phil


Screen Shot 2020-10-15 at 10.19.01 AM.png

leo...@g.harvard.edu

Oct 15, 2020, 10:54:51 AM
to Dataverse Users Community

Could you please post a link to the actual dataset on our server? (Sorry if it's already posted, but I'm missing it...)
As Phil pointed out, the CSV file above fails to ingest right away because of an inconsistency in the header, which suggests to me that it was not the exact same file. (The message "... succeeded, but failed to save" suggests that it went much further.)
Best,
-Leonid

Gabriele Hayden

Oct 15, 2020, 9:08:31 PM
to dataverse...@googlegroups.com
Hi Leonid,
Thanks for looking into this! Here's the link to the private URL: https://dataverse.harvard.edu/privateurl.xhtml?token=d199ee3a-4c5a-4f96-be6d-b65c15fbb709 I haven't wanted to publish until the issues with it were resolved.
Many thanks,
Gabriele

Sebastian Karcher

Oct 15, 2020, 9:17:11 PM
to dataverse...@googlegroups.com
I don't want to derail this thread, but we're also seeing this on occasion and the lack of useful feedback in the GUI is a bit of a hassle.
We have some experience now of what typically causes issues and can always check the server log, but that's... rather complicated. Would it be possible (and would people consider it worthwhile) to work on somewhat better user-facing errors?



--
Sebastian Karcher, PhD
www.sebastiankarcher.com

danny...@g.harvard.edu

Oct 16, 2020, 10:39:16 AM
to Dataverse Users Community
Hi Gabriele, can you send a ticket to sup...@dataverse.harvard.edu so we can investigate this in a bit more detail?

Sebastian, there's been some interest in providing more robust messaging in regard to ingest errors, but we haven't been able to prioritize it: https://github.com/IQSS/dataverse/issues/3769

Thanks both,

Danny

Gabriele Hayden

Oct 16, 2020, 12:52:59 PM
to dataverse...@googlegroups.com
Hi Danny,
Thanks--I will send a ticket in. I started by sending a support ticket, but got an undeliverable message that made me assume--falsely--that something was up with the support email (reason: 553 Exception during HTTP GET to '/v4/domains/dataverse.harvard.edu/internal' - HTTP GET to '/v4/domains/dataverse.harvard.edu/internal' returned '500' body='Internal Server Error''). Only after that did I post here. This time I'll send my first message without the CSV, in case that's the problem.

I would agree with Sebastian that more informative error messages would be fantastic. Evidently some work was done on my specific error message in 2017 (https://github.com/IQSS/dataverse/issues/4250). Ultimately, I understand the need to prioritize, and now that I know I can use the support ticket system it seems less urgent.

Thanks again,
Gabriele

pigins...@gmail.com

Oct 16, 2020, 5:36:40 PM
to Dataverse Users Community
Hi Gabriele,
The CSV file currently in the dataset is very different from the one posted above, but both have non-unique column names in the header. That alone would prevent the file from being ingested as tabular data, so I would try fixing it (for example, by replacing the second "Region" with "Region2"); you would need to delete the current version of the file from the draft, then upload a fixed version. I just checked: this appears to be enough to make the copy currently in your dataset ingestable. As for the version posted here, it has another duplicate variable in the header, "Threshold", that needs to be fixed in the same way.
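If editing the header by hand is tedious, here is one way to automate the rename (a sketch only, with a hypothetical small CSV; the awk script appends a numeric suffix to any header name that repeats):

```shell
# Hypothetical small CSV with a duplicate "Region" column, for demonstration:
printf 'Region,Threshold,Region\na,b,c\n' > data_demo.csv

# Rewrite the header line, appending a numeric suffix to repeated names;
# data rows are passed through unchanged.
awk 'BEGIN { FS = OFS = "," }
     NR == 1 { for (i = 1; i <= NF; i++) { n = ++seen[$i]; if (n > 1) $i = $i "_" n } }
     { print }' data_demo.csv > fixed_demo.csv

head -1 fixed_demo.csv
# -> Region,Threshold,Region_2
```

This naive version does not handle quoted header fields that themselves contain commas; for a header like this one, that should not matter.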

Having said all that, just to make sure: ingest errors like these are not fatal, and you can still publish the dataset. The error message and the red failure icons are visible only to you, the author, not to end users looking at the published dataset. When we fail to process a CSV (or Stata, SPSS, etc.) file as "tabular data", it just means that Dataverse failed to produce the extra metadata that describes the variables in the file, so we can't offer the extra "data explore" view on that file. End users would still be able to download it, open it in Excel, etc. (There's a good chance you know this; just making sure! Generally it's a good idea to try to get these extra metadata entries produced.)

Hope this helps,
-Leonid

pigins...@gmail.com

Oct 16, 2020, 5:45:41 PM
to Dataverse Users Community
Hi Sebastian,
This is definitely a valid request, and there is definitely room for improvement there. 
We do try to provide as much information as we can, and we add meaningful messages where possible, such as when the file fails to parse for some obvious reason: an inconsistency in the CSV header, the wrong number of CSV fields in a row, etc. A message like "... was successful, but failed to save" usually means just that: WE (the Dataverse code) did not find anything wrong with the file and were able to parse it, but then the database threw an error when saving the resulting metadata. It's a little more difficult to extract the exact cause in such a case (in my experience this often means that some value we are trying to save contains an invalid, non-UTF-8 sequence of bytes). But I agree, we should revisit this and try to improve it. As Danny mentioned, we have an open issue, and we'll hopefully get to it eventually.
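For the invalid-byte case, a quick local check is to round-trip the file through iconv, which exits non-zero on input that is not valid UTF-8. (A sketch with a hypothetical demo file; this is just a generic check, not anything Dataverse-specific.)

```shell
# Hypothetical file containing one invalid UTF-8 byte (0xFF), for demonstration:
printf 'name,value\nok,1\nbad,\377\n' > utf_demo.csv

# iconv exits non-zero when the input contains byte sequences
# that are not valid UTF-8:
if iconv -f UTF-8 -t UTF-8 utf_demo.csv > /dev/null 2>&1; then
    echo "file is valid UTF-8"
else
    echo "file contains invalid UTF-8"
fi
```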

(But I will admit that I DON'T understand how the file in this thread ended up with that error message, because, once again, it appears to fail right away, well before we attempt to save anything in the db... Well, another weird thing to investigate.)

-Leonid

leo...@g.harvard.edu

Oct 16, 2020, 5:59:25 PM
to Dataverse Users Community

P.S. Hi Gabriele, I obviously didn't read the entire thread super carefully; I'm seeing now that the copy of the file with the "Threshold" in it was an older version that you have since fixed. But yes, the only remaining problem with the file, as currently uploaded, is the extra "Region" in the header line. Why you were not getting a clear error message explaining that when uploading the file, I still don't know; I will need to investigate.

Gabriele Hayden

Oct 20, 2020, 7:42:11 PM
to dataverse...@googlegroups.com
Belated thanks Leonid for both of your messages! I will pass on to the user the issue with the second duplicate header.

And thanks for looking into the issue of the vague error message. I'll note that Phil Durbin, earlier in this thread, got a much more helpful error message explaining the duplicate header when he tried uploading as a test. I'm not sure which installation of Dataverse he was using, but if it wasn't Harvard's production version, perhaps that's a clue?

Thanks again for all your help,
Gabriele
