File integrity validation with md5sum?


shihch...@gmail.com

Jul 12, 2021, 8:26:16 PM
to plink2-users
Hi Christopher, 

For UKB-WGS data, the files are very large. When we transfer the PLINK-format files (.bed/.bim/.fam), what's the best way to make sure the downloaded data are exactly the same as the source files? Is there a better way than md5sum validation?

Thanks. 

Shicheng
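For reference, the checksum approach being asked about can be sketched in Python as follows. This is a minimal illustration, not a PLINK feature: `md5_of_file` and `verify_against_md5sum_file` are hypothetical helper names, and the sketch assumes the sender published a checksum file in the usual `md5sum` output format (`<hex digest>  <filename>`, one file per line, filenames relative to the checksum file). Reading in fixed-size chunks keeps memory use constant even for very large .bed files.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hypothetical helper: hex MD5 digest of a file, read in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_against_md5sum_file(checksum_path):
    """Hypothetical helper: compare files against md5sum-style lines ("<hex>  <name>").

    Returns {filename: True/False}. Assumes names are relative to the
    checksum file's directory and contain no md5sum backslash escapes.
    """
    base = Path(checksum_path).parent
    results = {}
    with open(checksum_path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, name = line.split(None, 1)
            name = name.lstrip("*")  # md5sum marks binary mode with a leading '*'
            results[name] = md5_of_file(base / name) == expected.lower()
    return results
```

On the command line, running `md5sum -c` on the same checksum file performs the equivalent comparison.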

Christopher Chang

Jul 12, 2021, 8:33:16 PM
to plink2-users
Truncation will generally be obvious, as will corruption of the .bim and .fam text files. However, something like md5sum is your best choice for detecting non-truncating .bed corruption that occurs after the first 3 bytes; unfortunately, a .bed file corrupted in that way is still a valid .bed file, especially if the number of samples is a multiple of 4.
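To illustrate why truncation is obvious but later corruption is not, here is a minimal structural check, written as a sketch rather than PLINK's own validation code. It relies on the standard PLINK 1 .bed layout: the 3-byte header `0x6c 0x1b 0x01` (magic number plus SNP-major mode byte), followed by one block of ceil(n_samples / 4) bytes per variant, so the expected size is 3 + n_variants * ceil(n_samples / 4). The function names are hypothetical, and it assumes the .bim and .fam files contain exactly one line per variant and per sample (no blank or header lines). A file that passes can still have flipped genotype bits, which is exactly the case a checksum is needed for.

```python
import os

# PLINK 1 .bed magic number (0x6c, 0x1b) followed by the SNP-major mode byte (0x01).
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def count_lines(path):
    """Hypothetical helper: number of lines in a text file (.bim or .fam)."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_bed_structure(bed_path, bim_path, fam_path):
    """Hypothetical helper: list structural problems; [] means none were detected.

    Detects a wrong header or a wrong file size (e.g. truncation). It cannot
    detect corrupted genotype bytes after the header.
    """
    problems = []
    n_variants = count_lines(bim_path)
    n_samples = count_lines(fam_path)
    bytes_per_variant = (n_samples + 3) // 4  # 2 bits per sample, 4 samples per byte
    expected_size = 3 + n_variants * bytes_per_variant
    with open(bed_path, "rb") as f:
        header = f.read(3)
    if header != BED_MAGIC:
        problems.append(f"unexpected header bytes {header.hex()}")
    actual_size = os.path.getsize(bed_path)
    if actual_size != expected_size:
        problems.append(f"size {actual_size} != expected {expected_size}")
    return problems
```

When the sample count is not a multiple of 4, each variant block ends with unused padding bits, which is one place where some corruption could in principle show up; with a multiple of 4 there are no such bits, so every byte pattern decodes to some genotype.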

(Non-truncating .pgen corruption is overwhelmingly likely to make the file invalid.)