Hi Pedro,
https://github.com/IQSS/dataverse/issues/8954 was originally about slow ingest of SPSS files, but more recently a comment was added about slow CSV (and Stata) ingest as well. You're welcome to check out those observations.
I think a dedicated issue about CSV ingest speed would be nice. If you feel like creating one, please go ahead.
Off the top of my head, I don't have any suggestions for how to speed it up, apart from maybe throwing more hardware at the problem (more CPUs and more memory), though I don't know whether that would make a difference. For Harvard Dataverse we've considered setting up a dedicated server for ingest:
https://github.com/IQSS/dataverse.harvard.edu/issues/111

Certainly disabling ingest, as you say, is an option. This is what I sometimes do when I want to quickly load up sample data from
https://github.com/IQSS/dataverse-sample-data in a development environment. As you may know, you can always force ingest for a particular file later via API:
https://guides.dataverse.org/en/6.7.1/api/native-api.html#reingest-a-file
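
For example, here's a rough sketch in Python (using the requests library) of skipping ingest at upload time and then forcing it later. The server URL, API token, DOI, and file name are all placeholders, and I'm going from memory on the optional "tabIngest" flag in the add-file jsonData, so double-check the Native API guide before relying on this:

    import requests

    SERVER = "https://demo.dataverse.org"                      # placeholder
    HEADERS = {"X-Dataverse-key": "xxxx-your-api-token-xxxx"}  # placeholder

    # Upload a CSV but ask Dataverse to skip tabular ingest for now.
    with open("data.csv", "rb") as f:
        r = requests.post(
            f"{SERVER}/api/datasets/:persistentId/add",
            params={"persistentId": "doi:10.5072/FK2/EXAMPLE"},  # placeholder DOI
            headers=HEADERS,
            files={"file": ("data.csv", f)},
            data={"jsonData": '{"tabIngest": "false"}'},
        )
    r.raise_for_status()
    file_id = r.json()["data"]["files"][0]["dataFile"]["id"]

    # Later, force ingest of just that one file via the reingest API.
    r = requests.post(f"{SERVER}/api/files/{file_id}/reingest", headers=HEADERS)
    print(r.json())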
The docs at https://guides.dataverse.org/en/6.7.1/installation/config.html#tabularingestsizelimit are not particularly clear*, but out of the box there is no limit. That is, Dataverse will always try to ingest a file, no matter how large.
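
If you ever do want to cap ingest (or effectively turn it off for everything), that setting can be changed through the admin API. A minimal sketch, assuming you run it on the server itself since the admin endpoint is normally only reachable from localhost:

    import requests

    ADMIN = "http://localhost:8080/api/admin/settings"

    # Only ingest tabular files up to ~50 MB (the value is in bytes);
    # larger files are stored as-is, like any other binary file.
    requests.put(f"{ADMIN}/:TabularIngestSizeLimit", data="52428800")

    # Setting it to 0 should prevent ingest of any file:
    # requests.put(f"{ADMIN}/:TabularIngestSizeLimit", data="0")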
If you can somehow process your CSV files in the same way outside of Dataverse and can construct a DDI (XML) file to feed into Dataverse, you can upload it with this API:
https://guides.dataverse.org/en/6.7.1/api/native-api.html#editing-variable-level-metadata. This is a bit theoretical, but if we were to split off the ingest functionality of Dataverse into a separate service, I believe that service would call this API to send the summary statistics, etc. to Dataverse.
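
In case it's useful, here's roughly what calling that API could look like in Python. The file ID and the DDI file name are placeholders, and I'd double-check the exact endpoint (/api/edit/$ID, if I'm reading the docs right) against the guide above:

    import requests

    SERVER = "https://demo.dataverse.org"                      # placeholder
    HEADERS = {"X-Dataverse-key": "xxxx-your-api-token-xxxx"}  # placeholder
    FILE_ID = 42                       # database ID of the tabular file

    # PUT a DDI XML fragment containing the variable-level metadata
    # (labels, categories, summary statistics, ...) for the file.
    with open("variables.xml", "rb") as f:
        r = requests.put(f"{SERVER}/api/edit/{FILE_ID}",
                         headers=HEADERS, data=f.read())
    print(r.status_code, r.text)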
I hope this helps! Please keep the questions coming!
Thanks,
Phil
* Don't worry, the docs are being clarified in
https://github.com/IQSS/dataverse/pull/11654