Dear all,
I recently ran into a problem when uploading CSV files to Dataverse and was redirected to this forum for discussion. Background: I tried to upload a CSV file whose columns contain natural language (text, tabs, HTML tags, any character really). My intention in uploading it to Dataverse was to make the files publicly available.
Unfortunately, the ingestion process seemed to destroy my files: columns got broken up, possibly due to the tab format conversion. The online document viewer was unable to display the data correctly, and when I downloaded the ingested data, they were scrambled. The only way I found to upload and preserve my data was to use the proprietary Excel format, which triggered a failure of the ingestion process and therefore left the file unchanged.
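To illustrate what I suspect is going on, here is a minimal Python sketch. It is only my guess at the cause, not a description of what Dataverse actually does internally, and the sample row is made up for illustration. It shows how a field containing a tab gets split into an extra column when a CSV is converted to tab-separated text without quoting, while a quoted conversion keeps the field intact.

```python
import csv
import io

# Hypothetical sample data: the "text" field contains a tab, a comma and an HTML tag.
rows = [["id", "text"], ["1", "Hello\t<b>world</b>, she said"]]

# Write the rows as ordinary, properly quoted CSV.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_text = buf.getvalue()

# Naive conversion (assumed failure mode): join fields with tabs, no quoting.
naive_tsv = "\n".join(
    "\t".join(fields) for fields in csv.reader(io.StringIO(csv_text))
)

# Reading the naive output back by splitting on tabs breaks the text column.
naive_rows = [line.split("\t") for line in naive_tsv.split("\n")]
print(len(rows[1]), "columns originally,", len(naive_rows[1]), "after naive conversion")

# Quoted conversion: the csv module quotes any field that contains the delimiter.
safe_buf = io.StringIO()
csv.writer(safe_buf, delimiter="\t").writerows(rows)
safe_rows = list(csv.reader(io.StringIO(safe_buf.getvalue()), delimiter="\t"))
print("quoted round trip preserved the data:", safe_rows == rows)
```

If something like the naive case is what happens during ingestion, it would explain why my columns came back broken up.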
Is it possible at all to keep CSV files in CSV format? I am asking not only for myself, but also for everyone doing natural language processing research who wants to share datasets similar to mine.
All the best,
Lena