Error: .pgen file read failure when using --check-sex

nzeltser

Dec 16, 2024, 4:16:47 AM
to plink2-users
Hello,

I'm encountering a puzzling .pgen file read failure when running --check-sex as a sanity check.

I am using the Dec 6 plink2 version: PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)

I have performed the following steps:

1. Attempted to import my VCF, which had no prior sex annotations. The import failed with an error asking for sex information, but it did write a temporary .psam. I then used that temporary .psam to build a new file with sex annotations for a later --update-sex call.
Note: this VCF was created by merging two regenotyped cohorts of NGS data with bcftools, and it retains homozygous-reference (non-variant) sites.

2. Re-imported my VCF with --update-sex, providing the new sex annotations (along with --split-par b38 and --chr X,Y). This successfully wrote an updated fileset; see the Step 2 log below.

3. Ran --check-sex on the updated fileset. This results in an error:

Error: .pgen file read failure: File appears to be corrupted.

Please see the full log below. Note that the log correctly reports an all-male cohort. I also tried --impute-sex for good measure and got the same error.

4. Verified that other commands can process the fileset. I ran --freq and a MAF filter (--maf 0.1), each alongside --make-pgen, and both ran successfully; please see the log below.

Any insights into the potential cause of the error would be appreciated!
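For readers repeating step 1, here is a minimal sketch of turning the temporary .psam into a file for --update-sex. It assumes a tab-delimited .psam whose first line is the column header (#IID, or #FID followed by IID), and it assumes plink2 accepts a headered ID + SEX table with 1 = male, 2 = female, NA = unknown; check both against the plink2 documentation before relying on it. The function name `build_update_sex` and the `sex_by_iid` mapping are hypothetical.

```python
def build_update_sex(psam_text, sex_by_iid):
    """Return an ID + SEX table covering every sample listed in a .psam.

    psam_text  -- contents of the temporary .psam (tab-delimited, header first)
    sex_by_iid -- hypothetical mapping of sample ID to "1" (male) or "2" (female)
    """
    lines = psam_text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    iid_col = header.index("IID")
    # Carry over only the ID columns; every other .psam column is dropped.
    id_cols = [i for i, name in enumerate(header) if name in ("FID", "IID")]
    out = ["#" + "\t".join(header[i] for i in id_cols) + "\tSEX"]
    for line in lines[1:]:
        if not line.strip():
            continue
        fields = line.split("\t")
        # Samples missing from the mapping stay ambiguous (NA).
        sex = sex_by_iid.get(fields[iid_col], "NA")
        out.append("\t".join(fields[i] for i in id_cols) + "\t" + sex)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example_psam = "#IID\tSEX\nS1\tNA\nS2\tNA\n"
    print(build_update_sex(example_psam, {"S1": "1", "S2": "1"}), end="")
```

Writing the returned text to a file and passing that file to --update-sex corresponds to step 2 above.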

Step 2 log:

(file paths redacted from the log):

PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)

Options in effect:

  --chr X,Y

  --make-pgen

  --out ZZPRGPLA-ZZPRGPDM_sex-check

  --split-par b38

  --update-sex ZZPRGPLA-ZZPRGPDM_sex-check_labeled-sex.update-sex.txt

  --vcf /path/to/vcf


--vcf: ZZPRGPLA-ZZPRGPDM_sex-check-temporary.pgen +

ZZPRGPLA-ZZPRGPDM_sex-check-temporary.pvar.zst +

ZZPRGPLA-ZZPRGPDM_sex-check-temporary.psam written.

961 samples (0 females, 0 males, 961 ambiguous; 961 founders) loaded from

ZZPRGPLA-ZZPRGPDM_sex-check-temporary.psam.

--split-par: 4428 chromosome codes changed.

334810 out of 339238 variants loaded from

ZZPRGPLA-ZZPRGPDM_sex-check-temporary.pvar.zst.

Note: No phenotype data present.

--update-sex: 961 samples updated.

Writing ZZPRGPLA-ZZPRGPDM_sex-check.psam ... done.

Writing ZZPRGPLA-ZZPRGPDM_sex-check.pvar ... done.

Writing ZZPRGPLA-ZZPRGPDM_sex-check.pgen ... done.


Step 3 log:

PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)

Options in effect:

  --check-sex

  --out ZZPRGPLA-ZZPRGPDM_sex-check_sex-stats

  --pfile ZZPRGPLA-ZZPRGPDM_sex-check

Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and

max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you

should look at the distributions of xf and yrate in the .checksex output file,

and then rerun --check-sex with data-derived thresholds.

Random number seed: 1734338117

3923 MiB RAM detected, ~1627 available; reserving 1563 MiB for main workspace.

Using up to 2 compute threads.

961 samples (0 females, 961 males; 961 founders) loaded from

ZZPRGPLA-ZZPRGPDM_sex-check.psam.

334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.

Note: No phenotype data present.

Calculating allele frequencies... done.

--check-sex chrX: 

Error: .pgen file read failure: File appears to be corrupted.

Step 4 log:

PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)

Options in effect:

  --maf 0.1

  --make-pgen

  --out test-maf

  --pfile ZZPRGPLA-ZZPRGPDM_sex-check

Random number seed: 1734338241

3923 MiB RAM detected, ~1588 available; reserving 1524 MiB for main workspace.

Using up to 2 compute threads.

961 samples (0 females, 961 males; 961 founders) loaded from

ZZPRGPLA-ZZPRGPDM_sex-check.psam.

334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.

Note: No phenotype data present.

Calculating allele frequencies... done.

334065 variants removed due to allele frequency threshold(s)

(--maf/--max-maf/--mac/--max-mac).

745 variants remaining after main filters.

Writing test-maf.psam ... done.

Writing test-maf.pvar ... done.

Writing test-maf.pgen ... done.
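The command lines behind the logs above can be recovered from their "Options in effect:" blocks. A minimal sketch follows, assuming option lines are indented, the block ends at the first unindented non-blank line, and no option value contains whitespace; `command_from_log` is a hypothetical helper, not part of plink2. Flags come out in the order the log lists them, which is assumed not to matter to plink2.

```python
import shlex


def command_from_log(log_text, program="plink2"):
    """Rebuild a command line from the 'Options in effect:' block of a log."""
    args = []
    in_options = False
    for line in log_text.splitlines():
        if line.strip() == "Options in effect:":
            in_options = True
            continue
        if not in_options or not line.strip():
            continue  # skip everything before the block, and blank lines
        if not line.startswith(" "):
            break  # first unindented line after the options ends the block
        args.extend(line.split())
    return shlex.join([program] + args)


if __name__ == "__main__":
    step3_log = """PLINK v2.0.0-a.6.4LM 64-bit Intel (6 Dec 2024)

Options in effect:

  --check-sex

  --out ZZPRGPLA-ZZPRGPDM_sex-check_sex-stats

  --pfile ZZPRGPLA-ZZPRGPDM_sex-check

Warning: No --check-sex thresholds specified.
"""
    print(command_from_log(step3_log))
```

Applied to the Step 3 log, this yields the failing --check-sex invocation.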


Chris Chang

Dec 16, 2024, 7:03:08 AM
to nzeltser, plink2-users
Hi, is it possible for you to provide a set of commands and files (could have fewer samples/variants, etc.) that I can use to replicate the failure?  If not, I can send you a sequence of debug builds.

nzeltser

Dec 16, 2024, 1:16:19 PM
to plink2-users
Thank you for your quick response!
Unfortunately, I am not able to share any of our data; however, I'm happy to help with debugging using the debug builds.

-Nicole

Chris Chang

Dec 16, 2024, 2:51:42 PM
to nzeltser, plink2-users
Ok, first debug build has been posted to https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20241216a.zip (or can be built from source from GitHub); try running a failing command with it, with the --debug flag added.

nzeltser

Dec 16, 2024, 10:36:17 PM
to plink2-users
Got it, here is the log:

PLINK v2.0.0-a.6.4.aLM 64-bit Intel (16 Dec 2024)

Options in effect:

  --check-sex

  --debug

  --out debug-01-check-sex

  --pfile ZZPRGPLA-ZZPRGPDM_sex-check

Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and

max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you

should look at the distributions of xf and yrate in the .checksex output file,

and then rerun --check-sex with data-derived thresholds.

Random number seed: 1734406407

3923 MiB RAM detected, ~1612 available; reserving 1548 MiB for main workspace.

Using up to 2 compute threads.

961 samples (0 females, 961 males; 961 founders) loaded from

ZZPRGPLA-ZZPRGPDM_sex-check.psam.

334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.

Note: No phenotype data present.

Calculating allele frequencies... done.

--check-sex chrX: 

Error: .pgen file read failure: File appears to be corrupted.

[pgl] PgfiMultiread() called with variant_uidx_start=0, variant_uidx_end=65536, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 449192 byte(s) from 1004490.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=65536, variant_uidx_end=131072, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 423196 byte(s) from 1453682.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=131072, variant_uidx_end=196608, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 507291 byte(s) from 1876878.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=196608, variant_uidx_end=262144, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 460534 byte(s) from 2384169.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=262144, variant_uidx_end=327680, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 390753 byte(s) from 2844703.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=327680, variant_uidx_end=334810, load_variant_ct=7130

[pgl] PgfiMultiread() attempting to read 92609 byte(s) from 3235456.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=0, variant_uidx_end=65536, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 449192 byte(s) from 1004490.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=65536, variant_uidx_end=131072, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 423196 byte(s) from 1453682.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=131072, variant_uidx_end=196608, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 507291 byte(s) from 1876878.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=196608, variant_uidx_end=262144, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 460534 byte(s) from 2384169.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=262144, variant_uidx_end=327680, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 390753 byte(s) from 2844703.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=327680, variant_uidx_end=334810, load_variant_ct=25

[pgl] PgfiMultiread() attempting to read 301 byte(s) from 3235456.

[pgl] PgfiMultiread() attempting to read 18446744073706223551 byte(s) from 3328065.

[pgl] PgfiMultiread() fread(3328065, 18446744073706223551) failed, block_offset=3235456.
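The numbers in this log can be checked by hand. Each successful read starts where the previous one ended, and the impossible final read length is exactly what 0 minus the start offset 3328065 becomes when it wraps around as an unsigned 64-bit value. The sketch below verifies both. That the length is computed as "end offset minus start offset" is an assumption about the code, not something the thread states.

```python
U64 = 1 << 64  # modulus of unsigned 64-bit arithmetic

# (file offset, byte count) pairs copied from the first pass in the log above.
reads = [
    (1004490, 449192),
    (1453682, 423196),
    (1876878, 507291),
    (2384169, 460534),
    (2844703, 390753),
    (3235456, 92609),
]

# Every read begins exactly where the previous one ended.
contiguous = all(
    start + length == next_start
    for (start, length), (next_start, _) in zip(reads, reads[1:])
)
end_of_last_read = reads[-1][0] + reads[-1][1]

# The failing call: fread(3328065, 18446744073706223551).
fail_start = 3328065
fail_length = 18446744073706223551
implied_end = (fail_start + fail_length) % U64

print(contiguous)          # True
print(end_of_last_read)    # 3328065, the offset of the failing read
print(implied_end)         # 0
print((0 - fail_start) % U64 == fail_length)  # True
```

So the failing read starts at the byte just past the last successful read of the first pass, with an end offset that appears to be 0.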


Christopher Chang

Dec 17, 2024, 12:00:59 PM
to plink2-users
Thanks, second 64-bit debug build has been posted to https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20241217a.zip .

nzeltser

Dec 17, 2024, 12:54:00 PM
to plink2-users
Great, here is the next log:

PLINK v2.0.0-a.6.4.bLM 64-bit Intel (17 Dec 2024)

Options in effect:

  --check-sex

  --debug

  --out debug-02-check-sex

  --pfile ZZPRGPLA-ZZPRGPDM_sex-check


Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and

max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you

should look at the distributions of xf and yrate in the .checksex output file,

and then rerun --check-sex with data-derived thresholds.

Random number seed: 1734457784

3923 MiB RAM detected, ~330 available; reserving 640 MiB for main workspace.

Using up to 2 compute threads.

961 samples (0 females, 961 males; 961 founders) loaded from

ZZPRGPLA-ZZPRGPDM_sex-check.psam.

334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.

Note: No phenotype data present.

Calculating allele frequencies... done.

CheckOrImputeSex(): x_start=0  x_end=327694  used_variant_ct_x=327694  PopcountBitRange: 327694

--check-sex chrX: 

Error: .pgen file read failure: File appears to be corrupted.

[pgl] PgfiMultiread() called with variant_uidx_start=0, variant_uidx_end=65536,

load_variant_ct=65536

[pgl] PgfiMultiread(): PopcountBitRange(0, 65536)=65536

[pgl] PgfiMultiread() advanced variant_uidx_start to 0

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=65536,

variant_uidx_start=0, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 449192 byte(s) from 1004490.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=65536,

variant_uidx_end=131072, load_variant_ct=65536

[pgl] PgfiMultiread(): PopcountBitRange(65536, 131072)=65536

[pgl] PgfiMultiread() advanced variant_uidx_start to 65536

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=131072,

variant_uidx_start=65536, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 423196 byte(s) from 1453682.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=131072,

variant_uidx_end=196608, load_variant_ct=65536

[pgl] PgfiMultiread(): PopcountBitRange(131072, 196608)=65536

[pgl] PgfiMultiread() advanced variant_uidx_start to 131072

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=196608,

variant_uidx_start=131072, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 507291 byte(s) from 1876878.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=196608,

variant_uidx_end=262144, load_variant_ct=65536

[pgl] PgfiMultiread(): PopcountBitRange(196608, 262144)=65536

[pgl] PgfiMultiread() advanced variant_uidx_start to 196608

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=262144,

variant_uidx_start=196608, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 460534 byte(s) from 2384169.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=262144,

variant_uidx_end=327680, load_variant_ct=65536

[pgl] PgfiMultiread(): PopcountBitRange(262144, 327680)=65536

[pgl] PgfiMultiread() advanced variant_uidx_start to 262144

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=327680,

variant_uidx_start=262144, load_variant_ct=65536

[pgl] PgfiMultiread() attempting to read 390753 byte(s) from 2844703.

[pgl] PgfiMultiread() returned success

[pgl] PgfiMultiread() called with variant_uidx_start=327680,

variant_uidx_end=334810, load_variant_ct=25

[pgl] PgfiMultiread(): PopcountBitRange(327680, 334810)=14

[pgl] PgfiMultiread() advanced variant_uidx_start to 327680

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=334810,

variant_uidx_start=327680, load_variant_ct=25

[pgl] PgfiMultiread(): cur_read_uidx_end updated to 327694,

cur_read_end_fpos=3235757, load_variant_ct updated to 11

[pgl] PgfiMultiread(): variant_uidx_start updated to 334810,

next_read_start_fpos=3328065

[pgl] PgfiMultiread() attempting to read 301 byte(s) from 3235456.

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=334810,

variant_uidx_start=334810, load_variant_ct=11

[pgl] PgfiMultiread(): cur_read_uidx_end updated to 334811,

cur_read_end_fpos=0, load_variant_ct updated to 10

[pgl] PgfiMultiread(): variant_uidx_start updated to 334817,

next_read_start_fpos=4607182418800017408

[pgl] PgfiMultiread() attempting to read 18446744073706223551 byte(s) from

3328065.

[pgl] PgfiMultiread() fread(3328065, 18446744073706223551) failed,

block_offset=3235456.
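One way to read the last block of this log, offered as an interpretation and not as the confirmed root cause (the thread does not say what the fix changed): the final call asks for 25 variants, but only 14 are selected in the range 327680 to 334810. After those 14 are consumed, 11 are still "owed", so the loop continues at variant index 334810, which equals the total variant count and is therefore past the last valid index. The offsets it then picks up look like unrelated memory; the reported next_read_start_fpos happens to be the bit pattern of the double-precision value 1.0. The sketch below only re-does that arithmetic with values copied from the log.

```python
import struct

requested = 25        # load_variant_ct passed to the final PgfiMultiread() call
selected = 14         # PopcountBitRange(327680, 334810)
variant_ct = 334810   # variants loaded from the .pvar

# Variants still "owed" after every selected variant has been consumed.
leftover = requested - selected

# The loop resumes at this index; valid indices are 0 .. variant_ct - 1.
resume_uidx = 334810
past_end = resume_uidx >= variant_ct

# next_read_start_fpos reported after stepping past the end.
suspicious_fpos = 4607182418800017408
as_double = struct.unpack("<d", struct.pack("<Q", suspicious_fpos))[0]

print(leftover)              # 11, matching "load_variant_ct updated to 11"
print(past_end)              # True
print(hex(suspicious_fpos))  # 0x3ff0000000000000
print(as_double)             # 1.0
```

For comparison, the log from the fixed build later in the thread shows load_variant_ct=14 for the same range, dropping to 0 after the 301-byte read.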



Christopher Chang

Dec 17, 2024, 3:27:43 PM
to plink2-users
Ok, I think I've nailed down the bug; see if the problem is gone with https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20241217b.zip .

nzeltser

Dec 17, 2024, 4:25:03 PM
to plink2-users
Perfect! That worked. I confirmed `--impute-sex` as well.

PLINK v2.0.0-a.6.4.cLM 64-bit Intel (17 Dec 2024)

Options in effect:

  --check-sex

  --debug

  --out debug-03-check-sex

  --pfile ZZPRGPLA-ZZPRGPDM_sex-check

Warning: No --check-sex thresholds specified. Setting min-male-xf=1 and

max-female-yrate=0; if you aren't just sanity-checking pre-cleaned data, you

should look at the distributions of xf and yrate in the .checksex output file,

and then rerun --check-sex with data-derived thresholds.

Random number seed: 1734470208

3923 MiB RAM detected, ~504 available; reserving 640 MiB for main workspace.

Using up to 2 compute threads.

961 samples (0 females, 961 males; 961 founders) loaded from

ZZPRGPLA-ZZPRGPDM_sex-check.psam.

334810 variants loaded from ZZPRGPLA-ZZPRGPDM_sex-check.pvar.

Note: No phenotype data present.

Calculating allele frequencies... done.

CheckOrImputeSex(): x_start=0  x_end=327694  used_variant_ct_x=327694  PopcountBitRange: 327694

--check-sex chrX: done.

Warning: 323287 variants skipped because they were monomorphic. You may want to

use --read-freq to provide more accurate allele frequency estimates.

variant_uidx_end=334810, load_variant_ct=14

[pgl] PgfiMultiread(): PopcountBitRange(327680, 334810)=14

[pgl] PgfiMultiread() advanced variant_uidx_start to 327680

[pgl] PgfiMultiread(): restarting inner loop, cur_read_uidx_end=334810,

variant_uidx_start=327680, load_variant_ct=14

[pgl] PgfiMultiread(): cur_read_uidx_end updated to 327694,

cur_read_end_fpos=3235757, load_variant_ct updated to 0

[pgl] PgfiMultiread() attempting to read 301 byte(s) from 3235456.

[pgl] PgfiMultiread() returned success

--check-sex: Calculating chrY valid genotype call rates... done.

--check-sex: 327694 chrX variants and 7116 variants scanned, 961 problems

detected. Report written to debug-03-check-sex.sexcheck .

