Ingest of Stata Files Fails on UVa Dataverse (V5.11.1)


Sherry Lake

Aug 14, 2023, 10:27:07 AM
to Dataverse Users Community
Hello,

Someone brought this to my attention. Stata files uploaded to our Dataverse V5.11.1 are not being ingested. What is our repository missing?

I have these tabular settings:

":TabularIngestSizeLimit:xlsx": "0",   
":TabularIngestSizeLimit": "500000000",


I have attached the stata file. Here is what I get on our repository:

Screenshot 2023-08-14 at 10.10.28 AM.png

On Demo dataverse the file is ingested and I see this:
Screenshot 2023-08-14 at 10.09.11 AM.png

Thanks,
Sherry Lake
UVA Dataverse 


TCPA Class Settlement Data.dta

Leonid Andreev

Aug 16, 2023, 2:11:14 PM
to Dataverse Users Community
Interesting. Having looked at the file (Phil gave me a link to the dataset), it is a Stata 14 file. That error message that is showing in your screenshot, however, is from the old Stata ingest plugin (for the pre-13 Stata format). Which in turn means that it's failing because Dataverse is trying to read it as the wrong format. Which is very very strange...
Was this file uploaded via direct upload by any chance? 

Leonid Andreev

Aug 16, 2023, 2:36:32 PM
to Dataverse Users Community
... As a quick hack/workaround, I'm pretty sure that if you change the mime type from "application/x-stata" to "application/x-stata-14" for the file in the database, and then reingest it, it'll just work. 

Leonid Andreev

Aug 16, 2023, 2:51:46 PM
to Dataverse Users Community
Oh, for some reason I assumed these were all recent uploads. But the file I was looking at ("FCRA Class Settlement Data.dta") was uploaded back in 2015, long before direct upload. Also, I'm pretty sure we simply did not support Stata 14 back then, hence the error message.
But reingesting a file like the above should work, now that you are running V5.11.1.

Sherry Lake

Aug 16, 2023, 3:03:18 PM
to dataverse...@googlegroups.com
Thanks Leonid, 

I just checked in Harvard Dataverse for Stata Binary files and found this one, w/ date May 2023, did not get ingested:
https://dataverse.harvard.edu/file.xhtml?fileId=7102778&version=1.0

I downloaded that file "stata.dta" from Harvard Dataverse, and uploaded it to demo.dataverse and it was ingested:
https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/HUUQPF

Just a strange problem.

I won't worry too much, but the Stata tabular ingest docs make it seem that Stata is the best-supported format for ingest: https://guides.dataverse.org/en/latest/user/tabulardataingest/stata.html

--
Sherry.




Leonid Andreev

Aug 16, 2023, 3:28:08 PM
to Dataverse Users Community
This one (survey.dta on our site) IS in fact a result/side effect of direct upload. During direct upload, Dataverse never sees the contents of the file, so it doesn't get a chance to identify the format using our normal tests. All it has to work with is the filename extension (.dta), which unfortunately is the same for the pre-13 and 13+ Stata formats, even though the two formats are completely different. It appears that it defaults to the "old" format in this situation, and then the ingest fails. This is a bug of course; we can be smarter about it. I will open an issue.
(To reiterate, this is only an issue for instances using S3 for storage, and only in collections and/or datasets where direct upload is enabled). 
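
A rough way to tell the two format families apart by content rather than by extension is to look at the first bytes of a local copy of the file. This is only a sketch and not Dataverse's actual detection code. It assumes that Stata 13 and later files open with the ASCII tag "<stata_dta>", while older files start with a binary header. The function name guess_dta_family and its output labels are made up for illustration.

```shell
#!/bin/sh
# Sketch: guess the Stata format family from the start of a .dta file.
# Assumption: Stata 13+ files begin with the ASCII tag "<stata_dta>";
# older files begin with a binary header instead.
guess_dta_family() {
  # Read the first 11 bytes; drop NUL bytes so binary headers compare cleanly.
  magic=$(head -c 11 "$1" | tr -d '\000')
  if [ "$magic" = "<stata_dta>" ]; then
    echo "stata-13-or-later"
  else
    echo "pre-13"
  fi
}

# Usage: guess_dta_family "survey.dta"
```

A depositor could run this before a direct upload to learn which format a .dta file is in, since the extension alone does not say.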


Sherry Lake

Aug 17, 2023, 7:59:21 AM
to Dataverse Users Community
Thanks, Leonid.

Totally understand!

For those who want to correctly ingest Stata (.dta) files, here's what I did:

As Leonid said, when Stata files go through direct upload (S3), the mime type is not detected correctly, so the files are not ingested.

So to make a Stata file "ingestable", I had to get the Dataverse software to re-detect the mime type and then re-ingest the file.

The commands come from the API guide for V5.11.1, the version of the Dataverse software that UVA runs:
https://guides.dataverse.org/en/5.11.1/api/native-api.html#redetect-file-type
https://guides.dataverse.org/en/5.11.1/api/native-api.html#reingest-a-file

-------------------

## update with your API TOKEN, Server URL, and the file ID
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export FILE_ID=24

## Just to make sure the mime type is detected, in my case it did:  "application/x-stata-14"
## The dryRun parameter just tells you what it would do

curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/redetect?dryRun=true"


## since the mime type was correctly detected, re-ran the command (without "dryRun") to make the mime change permanent

curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/redetect"


## now that the mime type is correct on the file - in the database, do a re-ingest

curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/reingest"
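
For anyone with more than one file to fix, the three commands above could be wrapped in small shell functions along these lines. This is a minimal sketch, not an official script: it assumes API_TOKEN and SERVER_URL are exported as shown above, and the function names and the DRY_RUN switch are my own inventions. DRY_RUN=1 only prints the requests and is unrelated to the API's own dryRun parameter.

```shell
#!/bin/sh
# Sketch: redetect the mime type of a file, then reingest it.
# Assumes API_TOKEN and SERVER_URL are exported as in the commands above.
# Function names and DRY_RUN are illustrative, not part of Dataverse.

# Build the URL for a file-level API action ("redetect" or "reingest").
file_api_url() {
  printf '%s/api/files/%s/%s' "${SERVER_URL%/}" "$1" "$2"
}

# POST to a URL with the API token; with DRY_RUN=1 only print the request.
dv_post() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST $1"
  else
    curl --fail --silent --show-error \
      -H "X-Dataverse-key:$API_TOKEN" -X POST "$1"
  fi
}

# Ask the server which mime type it would detect, without saving anything.
preview_redetect() {
  dv_post "$(file_api_url "$1" redetect)?dryRun=true"
}

# Save the redetected mime type, then reingest; stop if redetect fails.
apply_redetect_and_reingest() {
  dv_post "$(file_api_url "$1" redetect)" &&
    dv_post "$(file_api_url "$1" reingest)"
}

# Usage:
#   preview_redetect 24             # check the reported type first
#   apply_redetect_and_reingest 24  # then make the change and reingest
```

The preview is kept as a separate step on purpose, so the reported mime type (in my case "application/x-stata-14") can be checked by eye before anything is changed.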
