I'm trying to help a new user get started with PLINK* on a Windows machine. We downloaded the 1KG fileset listed in the "Resources" section. That section describes how PLINK2 provides a --zst-decompress convenience feature for those who don't have a zst decompressor. We used that, and it appeared to work. But when we tried to actually use the file, PLINK2 reported that the pgen is not valid (see log below).
Upon inspection, I noticed that the decompressed pgen was approximately double the size of the file produced by the same decompression on a Linux box. So I moved the pgen aside (added a '.plink2' suffix), downloaded the standalone zstd.exe tool, and redid the decompression with that. Now the decompressed filesize is as expected, and the subsequent command works.
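The size and header comparison described above can be scripted. The following is a minimal Python sketch, not something taken from PLINK itself: it reports each file's size and whether its first bytes look like a .pgen header or a zstd frame. It assumes the .pgen magic bytes are 0x6c 0x1b (my understanding of the format) and uses the zstd frame magic 0x28 0xb5 0x2f 0xfd. The `describe` function name is made up for illustration.

```python
import os
import sys

# Assumption: a .pgen file starts with these two magic bytes.
PGEN_MAGIC = b"\x6c\x1b"
# zstd frame magic number 0xFD2FB528, stored little-endian on disk.
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def describe(path):
    """Return (size_in_bytes, first_four_bytes, verdict) for a file."""
    size = os.path.getsize(path)
    # Binary mode matters on Windows: text mode would translate line endings.
    with open(path, "rb") as f:
        head = f.read(4)
    if head[:2] == PGEN_MAGIC:
        verdict = "looks like a .pgen"
    elif head == ZSTD_MAGIC:
        verdict = "still zstd-compressed"
    else:
        verdict = "unrecognized header"
    return size, head, verdict


if __name__ == "__main__":
    # Each command-line argument is treated as a file to inspect.
    for path in sys.argv[1:]:
        size, head, verdict = describe(path)
        print(f"{path}: {size} bytes, header {head.hex()}, {verdict}")
```

Saved under a hypothetical name such as check_pgen.py, it could be run as `python check_pgen.py all_hg38.pgen all_hg38.pgen.plink2` to compare the two decompressed files side by side.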
Maybe there's some Windows-specific requirement I'm missing (it's a foreign land to me), or maybe there's a Windows-specific issue with --zst-decompress?
Any info much appreciated. And thanks for PLINK* - great tools.
Here are the filesizes of the decompressed pgen via zstd and plink2 --zst-decompress:
-a---- 6/20/2024 2:09 PM 9530881334 all_hg38.pgen
-a---- 6/20/2024 4:27 PM 19309023026 all_hg38.pgen.plink2
PLINK v2.00a5.11 64-bit (26 May 2024) www.cog-genomics.org/plink/2.0/
(C) 2005-2024 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink2.log.
Options in effect:
--allow-extra-chr
--chr 1-22,X,Y
--freq
--memory 6000
--missing
--pfile all_hg38
Start time: Thu Jun 20 18:10:33 2024
8065 MiB RAM detected; reserving 6000 MiB for main workspace.
Using up to 4 compute threads.
3202 samples (1603 females, 1599 males; 2583 founders) loaded from
all_hg38.psam.
73627150 out of 75193455 variants loaded from all_hg38.pvar.
Error: all_hg38.pgen is not a .pgen file (first two bytes don't match the magic
number).
End time: Thu Jun 20 18:27:58 2024