Error: .pgen file read failure: File appears to be corrupted.
Please see the full log below. Note that the log correctly reports an all-male cohort being loaded. I also tried --impute-sex for good measure and got the same error.
4. Verified that the fileset can be processed by other commands. I ran --freq and a MAF filter (--maf 0.1) for good measure. Both ran successfully alongside --make-pgen; please see the log below.
Any insights into the potential cause of the error would be appreciated!
Step 2 log (file paths redacted):
PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)
Options in effect:
--chr X,Y
--make-pgen
--out ZZPRGPLA-ZZPRGPDM_sex-check
--split-par b38
--update-sex ZZPRGPLA-ZZPRGPDM_sex-check_labeled-sex.update-sex.txt
--vcf /path/to/vcf
--vcf: ZZPRGPLA-ZZPRGPDM_sex-check-temporary.pgen +
ZZPRGPLA-ZZPRGPDM_sex-check-temporary.pvar.zst +
ZZPRGPLA-ZZPRGPDM_sex-check-temporary.psam written.
961 samples (0 females, 0 males, 961 ambiguous; 961 founders) loaded from
ZZPRGPLA-ZZPRGPDM_sex-check-temporary.psam.
--split-par: 4428 chromosome codes changed.
334810 out of 339238 variants loaded from
ZZPRGPLA-ZZPRGPDM_sex-check-temporary.pvar.zst.
Note: No phenotype data present.
--update-sex: 961 samples updated.
Writing ZZPRGPLA-ZZPRGPDM_sex-check.psam ... done.
Writing ZZPRGPLA-ZZPRGPDM_sex-check.pvar ... done.
Writing ZZPRGPLA-ZZPRGPDM_sex-check.pgen ... done.
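
A quick consistency check on the counts in this log: the number of variants not loaded matches the number of chromosome codes changed by --split-par, which would be expected if the recoded PAR variants are the ones excluded by --chr X,Y. That reading is an inference from the numbers, not something the log states. The variable names below are illustrative only.

```python
# Counts copied from the step 2 log above.
variants_in_temp_pvar = 339238   # total in the temporary .pvar.zst
variants_loaded = 334810         # "334810 out of 339238 variants loaded"
par_recoded = 4428               # "--split-par: 4428 chromosome codes changed"

# Variants that were present in the temporary fileset but not loaded.
dropped = variants_in_temp_pvar - variants_loaded

print(dropped)                 # 4428
print(dropped == par_recoded)  # True
```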
PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)
Options in effect:
--check-sex
--out ZZPRGPLA-ZZPRGPDM_sex-check_sex-stats
--pfile ZZPRGPLA-ZZPRGPDM_sex-check
Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and
max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you
should look at the distributions of xf and yrate in the .checksex output file,
and then rerun --check-sex with data-derived thresholds.
Random number seed: 1734338117
3923 MiB RAM detected, ~1627 available; reserving 1563 MiB for main workspace.
Using up to 2 compute threads.
961 samples (0 females, 961 males; 961 founders) loaded from
ZZPRGPLA-ZZPRGPDM_sex-check.psam.
334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.
Note: No phenotype data present.
Calculating allele frequencies... done.
--check-sex chrX:
Error: .pgen file read failure: File appears to be corrupted.
PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)
Options in effect:
--maf 0.1
--make-pgen
--out test-maf
--pfile ZZPRGPLA-ZZPRGPDM_sex-check
Random number seed: 1734338241
3923 MiB RAM detected, ~1588 available; reserving 1524 MiB for main workspace.
Using up to 2 compute threads.
961 samples (0 females, 961 males; 961 founders) loaded from
ZZPRGPLA-ZZPRGPDM_sex-check.psam.
334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.
Note: No phenotype data present.
Calculating allele frequencies... done.
334065 variants removed due to allele frequency threshold(s)
(--maf/--max-maf/--mac/--max-mac).
745 variants remaining after main filters.
Writing test-maf.psam ... done.
Writing test-maf.pvar ... done.
Writing test-maf.pgen ... done.
PLINK v2.0.0-a.6.4.aLM 64-bit Intel (16 Dec 2024)
Options in effect:
--check-sex
--debug
--out debug-01-check-sex
--pfile ZZPRGPLA-ZZPRGPDM_sex-check
Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and
max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you
should look at the distributions of xf and yrate in the .checksex output file,
and then rerun --check-sex with data-derived thresholds.
Random number seed: 1734406407
3923 MiB RAM detected, ~1612 available; reserving 1548 MiB for main workspace.
Using up to 2 compute threads.
961 samples (0 females, 961 males; 961 founders) loaded from
ZZPRGPLA-ZZPRGPDM_sex-check.psam.
334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.
Note: No phenotype data present.
Calculating allele frequencies... done.
--check-sex chrX:
Error: .pgen file read failure: File appears to be corrupted.
[pgl] PgfiMultiread() called with variant_uidx_start=0, variant_uidx_end=65536, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 449192 byte(s) from 1004490.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=65536, variant_uidx_end=131072, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 423196 byte(s) from 1453682.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=131072, variant_uidx_end=196608, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 507291 byte(s) from 1876878.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=196608, variant_uidx_end=262144, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 460534 byte(s) from 2384169.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=262144, variant_uidx_end=327680, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 390753 byte(s) from 2844703.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=327680, variant_uidx_end=334810, load_variant_ct=7130
[pgl] PgfiMultiread() attempting to read 92609 byte(s) from 3235456.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=0, variant_uidx_end=65536, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 449192 byte(s) from 1004490.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=65536, variant_uidx_end=131072, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 423196 byte(s) from 1453682.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=131072, variant_uidx_end=196608, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 507291 byte(s) from 1876878.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=196608, variant_uidx_end=262144, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 460534 byte(s) from 2384169.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=262144, variant_uidx_end=327680, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 390753 byte(s) from 2844703.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=327680, variant_uidx_end=334810, load_variant_ct=25
[pgl] PgfiMultiread() attempting to read 301 byte(s) from 3235456.
[pgl] PgfiMultiread() attempting to read 18446744073706223551 byte(s) from 3328065.
[pgl] PgfiMultiread() fread(3328065, 18446744073706223551) failed, block_offset=3235456.
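
The failing read size in the last two lines is consistent with an unsigned 64-bit wraparound: 18446744073706223551 is exactly 2^64 - 3328065, which is what subtracting the read start offset 3328065 from an end offset of 0 would give in uint64 arithmetic. The sketch below checks that arithmetic, and also that the successful block reads are contiguous and end at 3328065. The helper names are made up for illustration and are not taken from the PLINK source.

```python
# (start offset, byte count) pairs copied from the successful [pgl] reads above.
reads = [
    (1004490, 449192),
    (1453682, 423196),
    (1876878, 507291),
    (2384169, 460534),
    (2844703, 390753),
    (3235456, 92609),
]

def contiguous(blocks):
    """True if each block starts exactly where the previous one ended."""
    return all(s0 + n0 == s1 for (s0, n0), (s1, _) in zip(blocks, blocks[1:]))

def wrapped_u64_diff(end, start):
    """end - start as an unsigned 64-bit integer would store it."""
    return (end - start) % 2**64

last_start, last_len = reads[-1]
end_of_last_block = last_start + last_len

print(contiguous(reads))             # True
print(end_of_last_block)             # 3328065
print(wrapped_u64_diff(0, 3328065))  # 18446744073706223551
```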
PLINK v2.0.0-a.6.4.bLM 64-bit Intel (17 Dec 2024)
Options in effect:
--check-sex
--debug
--out debug-02-check-sex
--pfile ZZPRGPLA-ZZPRGPDM_sex-check
Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and
max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you
should look at the distributions of xf and yrate in the .checksex output file,
and then rerun --check-sex with data-derived thresholds.
Random number seed: 1734457784
3923 MiB RAM detected, ~330 available; reserving 640 MiB for main workspace.
Using up to 2 compute threads.
961 samples (0 females, 961 males; 961 founders) loaded from
ZZPRGPLA-ZZPRGPDM_sex-check.psam.
334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.
Note: No phenotype data present.
Calculating allele frequencies... done.
CheckOrImputeSex(): x_start=0 x_end=327694 used_variant_ct_x=327694 PopcountBitRange: 327694
--check-sex chrX:
Error: .pgen file read failure: File appears to be corrupted.
[pgl] PgfiMultiread() called with variant_uidx_start=0, variant_uidx_end=65536,
load_variant_ct=65536
[pgl] PgfiMultiread(): PopcountBitRange(0, 65536)=65536
[pgl] PgfiMultiread() advanced variant_uidx_start to 0
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=65536,
variant_uidx_start=0, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 449192 byte(s) from 1004490.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=65536,
variant_uidx_end=131072, load_variant_ct=65536
[pgl] PgfiMultiread(): PopcountBitRange(65536, 131072)=65536
[pgl] PgfiMultiread() advanced variant_uidx_start to 65536
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=131072,
variant_uidx_start=65536, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 423196 byte(s) from 1453682.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=131072,
variant_uidx_end=196608, load_variant_ct=65536
[pgl] PgfiMultiread(): PopcountBitRange(131072, 196608)=65536
[pgl] PgfiMultiread() advanced variant_uidx_start to 131072
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=196608,
variant_uidx_start=131072, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 507291 byte(s) from 1876878.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=196608,
variant_uidx_end=262144, load_variant_ct=65536
[pgl] PgfiMultiread(): PopcountBitRange(196608, 262144)=65536
[pgl] PgfiMultiread() advanced variant_uidx_start to 196608
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=262144,
variant_uidx_start=196608, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 460534 byte(s) from 2384169.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=262144,
variant_uidx_end=327680, load_variant_ct=65536
[pgl] PgfiMultiread(): PopcountBitRange(262144, 327680)=65536
[pgl] PgfiMultiread() advanced variant_uidx_start to 262144
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=327680,
variant_uidx_start=262144, load_variant_ct=65536
[pgl] PgfiMultiread() attempting to read 390753 byte(s) from 2844703.
[pgl] PgfiMultiread() returned success
[pgl] PgfiMultiread() called with variant_uidx_start=327680,
variant_uidx_end=334810, load_variant_ct=25
[pgl] PgfiMultiread(): PopcountBitRange(327680, 334810)=14
[pgl] PgfiMultiread() advanced variant_uidx_start to 327680
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=334810,
variant_uidx_start=327680, load_variant_ct=25
[pgl] PgfiMultiread(): cur_read_uidx_end updated to 327694,
cur_read_end_fpos=3235757, load_variant_ct updated to 11
[pgl] PgfiMultiread(): variant_uidx_start updated to 334810,
next_read_start_fpos=3328065
[pgl] PgfiMultiread() attempting to read 301 byte(s) from 3235456.
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=334810,
variant_uidx_start=334810, load_variant_ct=11
[pgl] PgfiMultiread(): cur_read_uidx_end updated to 334811,
cur_read_end_fpos=0, load_variant_ct updated to 10
[pgl] PgfiMultiread(): variant_uidx_start updated to 334817,
next_read_start_fpos=4607182418800017408
[pgl] PgfiMultiread() attempting to read 18446744073706223551 byte(s) from
3328065.
[pgl] PgfiMultiread() fread(3328065, 18446744073706223551) failed,
block_offset=3235456.
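
Two details in this trace can be checked by hand. First, the final call asks for load_variant_ct=25 while PopcountBitRange(327680, 334810) reports only 14 variants, so 11 are still expected after the real ones are consumed and the loop continues past the last variant index. Second, next_read_start_fpos=4607182418800017408 is 0x3FF0000000000000, the IEEE 754 bit pattern of the double 1.0. That would be consistent with an offset being read from beyond the end of its array, but this is a guess from the log alone. The names below are illustrative.

```python
import struct

# Values copied from the final PgfiMultiread() call in the trace above.
requested = 25   # load_variant_ct passed in
available = 14   # PopcountBitRange(327680, 334810)

# Variants the loop still expects once the real ones are used up.
leftover = requested - available
print(leftover)  # 11, matching "load_variant_ct updated to 11"

# Reinterpret the suspicious file offset as a little-endian double.
bits = 4607182418800017408
as_double = struct.unpack("<d", struct.pack("<Q", bits))[0]
print(hex(bits))   # 0x3ff0000000000000
print(as_double)   # 1.0
```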
PLINK v2.0.0-a.6.4.cLM 64-bit Intel (17 Dec 2024)
Options in effect:
--check-sex
--debug
--out debug-03-check-sex
--pfile ZZPRGPLA-ZZPRGPDM_sex-check
Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and
max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you
should look at the distributions of xf and yrate in the .checksex output file,
and then rerun --check-sex with data-derived thresholds.
Random number seed: 1734470208
3923 MiB RAM detected, ~504 available; reserving 640 MiB for main workspace.
Using up to 2 compute threads.
961 samples (0 females, 961 males; 961 founders) loaded from
ZZPRGPLA-ZZPRGPDM_sex-check.psam.
334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.
Note: No phenotype data present.
Calculating allele frequencies... done.
CheckOrImputeSex(): x_start=0 x_end=327694 used_variant_ct_x=327694 PopcountBitRange: 327694
--check-sex chrX: done.
Warning: 323287 variants skipped because they were monomorphic. You may want to
use --read-freq to provide more accurate allele frequency estimates.
variant_uidx_end=334810, load_variant_ct=14
[pgl] PgfiMultiread(): PopcountBitRange(327680, 334810)=14
[pgl] PgfiMultiread() advanced variant_uidx_start to 327680
[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=334810,
variant_uidx_start=327680, load_variant_ct=14
[pgl] PgfiMultiread(): cur_read_uidx_end updated to 327694,
cur_read_end_fpos=3235757, load_variant_ct updated to 0
[pgl] PgfiMultiread() attempting to read 301 byte(s) from 3235456.
[pgl] PgfiMultiread() returned success
--check-sex: Calculating chrY valid genotype call rates... done.
--check-sex: 327694 chrX variants and 7116 variants scanned, 961 problems
detected. Report written to debug-03-check-sex.sexcheck .
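
The counts in this successful run are internally consistent, assuming the 7116 variants in the summary line are the non-chrX (chrY) ones. The sketch below reproduces the arithmetic: the two scanned counts sum to the total variant count, the final 65536-variant block holds 14 chrX variants (matching load_variant_ct=14), and the 301-byte read ends at cur_read_end_fpos=3235757. Variable names are illustrative.

```python
# Counts copied from the debug-03 log above.
total_variants = 334810
chrx_scanned = 327694
other_scanned = 7116          # assumed to be the chrY variants
chrx_monomorphic = 323287
last_block_start = 327680     # variant_uidx_start of the final block

# chrX variants that fall in the final block.
chrx_in_last_block = chrx_scanned - last_block_start
# chrX variants not skipped as monomorphic.
chrx_polymorphic = chrx_scanned - chrx_monomorphic
# End offset of the final 301-byte read.
last_read_end = 3235456 + 301

print(chrx_scanned + other_scanned == total_variants)  # True
print(chrx_in_last_block)                              # 14
print(chrx_polymorphic)                                # 4407
print(last_read_end)                                   # 3235757
```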