Upgrading to Dataverse 4.20 : Potential Data integrity issue on tabular file #6510


Michel Bamouni

Jul 22, 2020, 5:06:55 AM
to Dataverse Users Community


Before migrating to Dataverse v4.20, we checked our production database
with the script "check_datafiles_6522_6510.sh" and got the result below:
$ ./check_datafiles_6522_6510.sh
Checking the number of non-harvested datafiles in the database...
8494 total.

Let's check if any storage identifiers are referenced more than once within
the same dataset:

Good news - it appears that there are NO duplicate DataFile objects in your
Your installation is ready to be upgraded to Dataverse 4.20.

Checking the number of ingested ("tabular") datafiles in the database...
1058 total.

Let's check if any of these ingested files have MORE THAN ONE linked
datatable objects:
The following 1 DataFile ids appear to be referenced from multiple
(output saved in /tmp/datafileids.tmp)
Looking up details for the affected tabular files:
10.15454 FOXFFG s3://prod-datainra:16b45dbf2cc-073b1d184465 86468 540 2019-06-11 11:38:47.408 text/tab-separated-values text/tsv
10.15454 FOXFFG s3://prod-datainra:16b45dbf2cc-073b1d184465 86468 546 2019-06-11 11:38:47.408 text/tab-separated-values text/tab-separated-values
(output saved in /tmp/multiple_ingests_info.tmp)

Please send the output above to Dataverse support .
We will assist you in fixing this issue in your Dataverse database.
We apologize for any inconvenience.

As you can see, one datafile has duplicate data: it is referenced from two
datatable objects. We are contacting you as the script recommends. The 2 files
produced by the script are attached.
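
For anyone reading the output above, here is a minimal sketch in Python of the kind of check the second part of the script appears to perform: group datatable links by datafile and report any datafile linked to more than one datatable. This is not the script's actual implementation. The function name `find_multiply_ingested` is hypothetical, and reading the columns `86468` and `540`/`546` as the datafile id and datatable ids is an assumption based on the output, not something the script states.

```python
from collections import defaultdict


def find_multiply_ingested(rows):
    """Return {datafile_id: [datatable_ids]} for datafiles linked to
    more than one datatable.

    `rows` is an iterable of (datafile_id, datatable_id) pairs.
    Hypothetical helper for illustration, not part of Dataverse.
    """
    tables_by_file = defaultdict(set)
    for datafile_id, datatable_id in rows:
        tables_by_file[datafile_id].add(datatable_id)
    # Keep only the datafiles that have two or more distinct datatables.
    return {
        datafile_id: sorted(table_ids)
        for datafile_id, table_ids in tables_by_file.items()
        if len(table_ids) > 1
    }


if __name__ == "__main__":
    # The first two pairs are taken from the output above, under the
    # assumed column meaning; the third pair is invented for contrast.
    rows = [(86468, 540), (86468, 546), (1, 2)]
    print(find_multiply_ingested(rows))
```

With the two rows from our output, only datafile 86468 would be reported, with both datatable ids listed.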

Best regards,

Philip Durbin

Jul 22, 2020, 10:01:01 AM
to dataverse...@googlegroups.com
Hi Michel,

It looks like a colleague of yours opened a ticket about this at https://help.hmdc.harvard.edu/Ticket/Display.html?id=292301

I just added your email to it so that you'll get the reply.





Michel Bamouni

Jul 23, 2020, 4:51:03 AM
to Dataverse Users Community
Hi Phil,

You are right, my colleague has sent a message to Dataverse support.

I will wait for the support team's response.

Best regards