(0-based) variant in .pgen file


Damian Ulrich

Jan 21, 2026, 4:24:21 PM
to plink2-users
Hi, 

I'm working with sets of samples (studies), and PLINK2 has worked fine so far. For this one group of samples, however, it fails with an error that I cannot fix whatever I try. I should note that I ran everything per chromosome and then merged the per-chromosome PLINK files with a pmerge list. That approach has worked for every other set of samples so far, so I don't think it is the issue.

This is the log:

PLINK v2.0.0-a.7LM 64-bit Intel (26 Oct 2025)
Options in effect:
  --freq
  --missing
  --out output/Studies/Wirka_et_al_2019/Wirka_et_al_2019_unfiltered
  --pfile output/Studies/Wirka_et_al_2019/Wirka_et_al_2019

Hostname: 
Working directory: 
Start time: Wed Jan 21 21:46:42 2026

Random number seed: 1769028402
385050 MiB RAM detected, ~288607 available; reserving 192525 MiB for main
workspace.
Using up to 2 compute threads.
8 samples (0 females, 0 males, 8 ambiguous; 8 founders) loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.psam.
606524 variants loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.pvar.
Note: No phenotype data present.
Calculating sample missingness rates...
Error: Failed to unpack (0-based) variant #65553 in .pgen file.
You can use --validate to check whether it is malformed.
* If it is malformed, you probably need to either re-download the file, or
  address an error in the command that generated the input .pgen.
* If it appears to be valid, you have probably encountered a plink2 bug.  If
  you report the error on GitHub or the plink2-users Google group (make sure to
  include the full .log file in your report), we'll try to address it.

End time: Wed Jan 21 21:46:42 2026

I have used --validate, but it just gives the same message as the log. I have regenerated the PLINK binary files multiple times and tried to exclude that variant, but the moment I run --pfile on the Wirka_et_al_2019 binary files I get the same error message as in the log. I don't think I can exclude the variant without PLINK, since the complaint is specifically about the .pgen file.

Does anyone have any ideas on how to fix this?
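
Editorial sketch (not something suggested in the thread itself): since the merged fileset was concatenated from 22 per-chromosome filesets, one way to narrow the problem down might be to run --validate on each input fileset separately. The Python below only builds the command lines. It assumes each non-empty line of the merge list holds a single fileset prefix, and the example prefixes and the `_validate` output suffix are hypothetical.

```python
import shlex


def validate_commands(merge_list_text, plink2="plink2"):
    """Build one `plink2 --pfile <prefix> --validate` command per merge-list line.

    Assumption: each non-empty line is a single fileset prefix.
    """
    commands = []
    for line in merge_list_text.splitlines():
        prefix = line.strip()
        if not prefix:
            continue  # skip blank lines
        commands.append([
            plink2, "--pfile", prefix, "--validate",
            "--out", prefix + "_validate",  # hypothetical per-fileset log prefix
        ])
    return commands


# Hypothetical merge-list contents; the real prefixes are not shown in the thread.
example_list = "chr1_fileset\nchr2_fileset\n"
for cmd in validate_commands(example_list):
    print(shlex.join(cmd))
```

Each printed line can be run by hand or passed to `subprocess.run`. If every per-chromosome fileset validates cleanly, that would point at the merge step or at the reader rather than at the inputs.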

Chris Chang

Jan 21, 2026, 4:25:44 PM
to Damian Ulrich, plink2-users
Do you have a .log from a command used to create the problematic .pgen file?


Chris Chang

Jan 21, 2026, 6:33:49 PM
to Damian Ulrich, plink2-users
And are you able to post a set of input files which I can replicate the buggy merge with?  (Could be just the first few chromosomes, if that's enough to reproduce the problem.)  If not, will you be able to run a sequence of debug builds?

Damian Ulrich

Jan 22, 2026, 4:31:05 AM
to plink2-users
I think the 'corrupt' .pgen file is produced by the merge (so Wirka_et_al_2019.pgen, but I could be wrong), so I assume you mean the log of the PLINK binary file merge. I have shared that log below:


PLINK v2.0.0-a.7LM 64-bit Intel (26 Oct 2025)
Options in effect:
  --make-pgen
  --out output/Studies/Wirka_et_al_2019/Wirka_et_al_2019
  --pmerge-list output/Studies/Wirka_et_al_2019/chromosomeMergeList.lst

Hostname: 
Working directory: 
Start time: Thu Jan 22 10:24:55 2026

Random number seed: 1769073895
385050 MiB RAM detected, ~252007 available; reserving 192525 MiB for main
workspace.
Using up to 2 compute threads.
--pmerge-list: 22 filesets specified.
--pmerge-list: 8 samples present.
--pmerge-list: Merged .psam written to
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.psam .
--pmerge-list: 22 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 606524/606524 variants complete.
Results written to output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.pgen
+ output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.pvar .

8 samples (0 females, 0 males, 8 ambiguous; 8 founders) loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.psam.
606524 variants loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.pvar.

Note: No phenotype data present.
Writing output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.psam ... done.
Writing output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.pvar ... done.
Writing output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.pgen ... done.

End time: Thu Jan 22 10:24:58 2026


On Wednesday, January 21, 2026 at 22:25:44 UTC+1, chrch...@gmail.com wrote:

Damian Ulrich

Jan 22, 2026, 4:32:49 AM
to plink2-users
The PLINK binary files are computed from patient VCFs, so I don't think I can share them. I'm fairly new to PLINK, so could you explain what you mean by "running a sequence of debug builds"?

On Thursday, January 22, 2026 at 00:33:49 UTC+1, chrch...@gmail.com wrote:

Chris Chang

Jan 22, 2026, 5:36:24 AM
to Damian Ulrich, plink2-users
I will post a debug build of PLINK2 which produces extra logging information when run with --debug.  Then you can use it to run the merge and validate commands (with --debug added to your command line), and post the resulting .log files.  I will look at those .log files, and maybe send you another debug build of PLINK2 to run, etc. until the problem has been nailed down.

Damian Ulrich

Jan 22, 2026, 5:55:14 AM
to plink2-users
Sounds good. I don't care much about the variant causing the issue, so if I could identify it and remove it from the actual VCF, that would also suffice. The thing is, every time I look up the variant in the .pvar file using #62464 as the line number and then exclude it from the VCF with bcftools, the corrupt variant seems to shift by one position, which suggests I'm somehow not deleting the right one. I also noticed that when I reran this morning, the corrupt variant changed from #65553 to #62464.
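
Editorial sketch: the error message reports a 0-based variant index, which is not the same as a line number in the .pvar file, because a .pvar can begin with header lines starting with '#'. The Python below shows the mapping on made-up data, assuming the index counts variant records in file order in an uncompressed text .pvar. The function name and the toy records are hypothetical.

```python
def variant_at_index(pvar_lines, zero_based_idx):
    """Return (1-based file line number, record text) for a 0-based variant index.

    Lines starting with '#' are treated as header lines and are not counted.
    """
    seen = -1
    for line_no, line in enumerate(pvar_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # header or blank line, not a variant record
        seen += 1
        if seen == zero_based_idx:
            return line_no, line.rstrip("\n")
    raise IndexError(f"fewer than {zero_based_idx + 1} variant records")


# Made-up .pvar content: one meta line, one column header, three variants.
toy_pvar = [
    "##source=example\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t100\tvarA\tA\tG\n",
    "1\t200\tvarB\tC\tT\n",
    "1\t300\tvarC\tG\tA\n",
]
line_no, record = variant_at_index(toy_pvar, 1)
print(line_no, record)
```

In this toy file, 0-based variant #1 sits on file line 4, so using the reported index directly as a line number would select a different record. Removing any variant from the VCF and regenerating the files also renumbers every later variant.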

On Thursday, January 22, 2026 at 11:36:24 UTC+1, chrch...@gmail.com wrote:

Chris Chang

Jan 22, 2026, 10:43:46 AM
to Damian Ulrich, plink2-users
What is the merge .log if you add "--memory 8000 --randmem --seed 1" to your command line?

Chris Chang

Jan 22, 2026, 10:44:29 AM
to Damian Ulrich, plink2-users
(correction, merge and validate logs)

Rui Zhang

Feb 28, 2026, 7:41:26 PM
to plink2-users
Hi! I encountered the same issue while trying to estimate PCA loadings with the Jan 10 build of plink2. Here is my log file:

PLINK v2.0.0-a.7LM 64-bit Intel (10 Jan 2026)
Options in effect:
  --extract /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA/02_result/Pakistan_n16842_unrel_QCed_LD3.prune.in
  --freq counts
  --out /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA/02_result/Pakistan_n16842_unrel_PCA_loadings
  --pca allele-wts 10 vcols=chrom,ref,alt
  --pfile /mnt/rzhang-disk-1/Pakistan_n16842/02_QC/02_QCed_data/09_af_comp/Pakistan_n16842_QCed
  --remove /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA/02_result/Pakistan_n16842_related_sample.txt
  --validate

Hostname: rzhang-vm
Working directory: /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA/01_scripts
Start time: Sat Feb 28 19:31:44 2026

Random number seed: 1772325104
32103 MiB RAM detected, ~31065 available; reserving 16051 MiB for main
workspace.
Using up to 16 threads (change this with --threads).
16665 samples (5484 females, 11181 males; 16665 founders) loaded from
/mnt/rzhang-disk-1/Pakistan_n16842/02_QC/02_QCed_data/09_af_comp/Pakistan_n16842_QCed.psam.
13857821 variants loaded from
/mnt/rzhang-disk-1/Pakistan_n16842/02_QC/02_QCed_data/09_af_comp/Pakistan_n16842_QCed.pvar.
Validating
/mnt/rzhang-disk-1/Pakistan_n16842/02_QC/02_QCed_data/09_af_comp/Pakistan_n16842_QCed.pgen...
done.
1 binary phenotype loaded (8289 cases, 8376 controls).
--extract: 59205 variants remaining.
--remove: 16407 samples remaining.
16407 samples (5370 females, 11037 males; 16407 founders) remaining after main
filters.
8190 cases and 8217 controls remaining after main filters.
Calculating allele frequencies... done.
--freq counts: Allele counts (founders only) written to
/mnt/rzhang-disk-1/Pakistan_n16842/03_PCA/02_result/Pakistan_n16842_unrel_PCA_loadings.acount
.
59205 variants remaining after main filters.
Constructing GRM: done.
Correcting for missingness...
Error: Failed to unpack (0-based) variant #16390 in .pgen file.
You can use --validate to check whether it is malformed.
* If it is malformed, you probably need to either re-download the file, or
  address an error in the command that generated the input .pgen.
* If it appears to be valid, you have probably encountered a plink2 bug.  If
  you report the error on GitHub or the plink2-users Google group (make sure to
  include the full .log file in your report), we'll try to address it.

End time: Sat Feb 28 19:37:30 2026

Chris Chang

Feb 28, 2026, 7:48:09 PM
to Rui Zhang, plink2-users
Thanks for reporting this.
1. If you rerun that command with the alpha 6.32 build, does it still fail?
2. Are you able to post a set of files which I can replicate this error with?  If not, will you be able to run a sequence of debug builds?

Rui Zhang

Feb 28, 2026, 7:58:54 PM
to plink2-users
Thank you for your quick response!

1. I just submitted the job, and got the same error message: Error: Failed to unpack (0-based) variant #16390 in .pgen file.
2. I'm not able to share the files since the data isn't public, but I can run a sequence of debug builds to help resolve this problem.

I was wondering whether this could be a bug related to sample size, since it works fine on a smaller cohort?

For this larger cohort, I was able to run this successfully with a previous version (I can't remember the exact version number), but that version has a bug with --extract-if-info, so I'm now using the latest version.

By the way, is there a place where I could download the previous version?

Chris Chang

Feb 28, 2026, 8:00:56 PM
to Rui Zhang, plink2-users

Rui Zhang

Feb 28, 2026, 8:15:46 PM
to plink2-users
Yes! I tested it using alpha 6.32 and got the same error message.

Chris Chang

Feb 28, 2026, 8:18:51 PM
to Rui Zhang, plink2-users
Ok.  If you add "--threads 1" to your command line, does the error still occur?

Rui Zhang

Feb 28, 2026, 8:52:09 PM
to plink2-users
I used "--threads 1" and alpha 6.32 to generate PCA loadings, but still got the same error message.
Do I need to start over by regenerating the pgen file? I ask because, with the same pgen file, LD pruning worked well but PCA loadings generation failed.

Chris Chang

Feb 28, 2026, 8:59:42 PM
to Rui Zhang, plink2-users
Since --validate didn't complain, you probably don't need to regenerate the pgen.

First debug build has been posted to https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20260228a.zip (or you could build from source on GitHub); try running that with --debug added to your command line (you can omit "--threads 1") and report the .log output.

Rui Zhang

Feb 28, 2026, 9:23:54 PM
to plink2-users

Thank you! Here is the log using the first debug build:

PLINK v2.0.0-a.7LM 64-bit Intel (28 Feb 2026)
Options in effect:
  --debug
  --extract /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA_debug/02_result/Pakistan_n16842_unrel_QCed_LD3.prune>
  --freq counts
  --out /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA_debug/02_result/Pakistan_n16842_unrel_PCA_loadings
  --pca allele-wts 10 vcols=chrom,ref,alt
  --pfile /mnt/rzhang-disk-1/Pakistan_n16842/02_QC/02_QCed_data/09_af_comp/Pakistan_n16842_QCed
  --remove /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA_debug/02_result/Pakistan_n16842_related_sample.txt

Hostname: rzhang-vm
Working directory: /mnt/rzhang-disk-1/Pakistan_n16842/03_PCA_debug/01_scripts
Start time: Sat Feb 28 21:15:16 2026

Random number seed: 1772331316
15999 MiB RAM detected, ~6802 available; reserving 6738 MiB for main workspace.
Using up to 16 threads (change this with --threads).
16665 samples (5484 females, 11181 males; 16665 founders) loaded from
/mnt/rzhang-disk-1/Pakistan_n16842/02_QC/02_QCed_data/09_af_comp/Pakistan_n16842_QCed.psam.
13857821 variants loaded from
/mnt/rzhang-disk-1/Pakistan_n16842/02_QC/02_QCed_data/09_af_comp/Pakistan_n16842_QCed.pvar.
1 binary phenotype loaded (8289 cases, 8376 controls).
--extract: 59205 variants remaining.
--remove: 16407 samples remaining.
16407 samples (5370 females, 11037 males; 16407 founders) remaining after main
filters.
8190 cases and 8217 controls remaining after main filters.
Calculating allele frequencies... done.
--freq counts: Allele counts (founders only) written to
/mnt/rzhang-disk-1/Pakistan_n16842/03_PCA_debug/02_result/Pakistan_n16842_unrel_PCA_loadings.acount
.
59205 variants remaining after main filters.
Constructing GRM: done.
Correcting for missingness... vrtype: 176  dosage_is_relevant: 1
branch 5: ParseAndSaveDeltalistAsBitarr failure

Error: Failed to unpack (0-based) variant #16390 in .pgen file.
You can use --validate to check whether it is malformed.
* If it is malformed, you probably need to either re-download the file, or
  address an error in the command that generated the input .pgen.
* If it appears to be valid, you have probably encountered a plink2 bug.  If
  you report the error on GitHub or the plink2-users Google group (make sure to
  include the full .log file in your report), we'll try to address it.

End time: Sat Feb 28 21:21:22 2026

Chris Chang

Feb 28, 2026, 9:47:38 PM
to Rui Zhang, plink2-users
Thanks.  Second debug build has been posted to https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20260228b.zip and GitHub; run the same command (with --debug) and report the .log.

Rui Zhang

Feb 28, 2026, 10:10:17 PM
to plink2-users
Thank you! Here is the second debug log (I only copied this part, please let me know if more details are needed):

Constructing GRM: done.
Correcting for missingness... 0%record size: 9043 bytes
entering ParseAndSaveDeltalistAsBitarr: 3906 bytes remaining
deltalist_len=22
raw_sample_idx=44801  raw_sample_ct=16665  group_idx=0  raw_deltalist_idx_lowbits=0
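
Editorial sketch: in the last debug line above, raw_sample_idx (44801) is larger than raw_sample_ct (16665). Assuming a sample index is meant to be smaller than the sample count, that is the kind of inconsistency this debug output exposes. The thread does not say this is the root cause. The Python below simply parses the key=value fields of that line and performs the comparison. The helper name is hypothetical.

```python
import re


def parse_debug_fields(line):
    """Parse whitespace-separated key=integer fields from a debug log line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


# Line copied from the second debug log in this thread.
debug_line = ("raw_sample_idx=44801  raw_sample_ct=16665  "
              "group_idx=0  raw_deltalist_idx_lowbits=0")
fields = parse_debug_fields(debug_line)

# Assumption: a valid sample index lies in [0, raw_sample_ct).
index_in_range = 0 <= fields["raw_sample_idx"] < fields["raw_sample_ct"]
print(fields)
print("index in range:", index_in_range)
```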

Chris Chang

Feb 28, 2026, 10:44:14 PM
to Rui Zhang, plink2-users
Third debug build has been posted to https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20260228c.zip and GitHub; run the same command (with --debug) and report the .log.

Rui Zhang

Feb 28, 2026, 10:51:47 PM
to plink2-users
Here is the log from the third debug build:

Constructing GRM: done.
Correcting for missingness... 0%after ReadRawGenovec, 4876 bytes remaining
aux2_first_part_byte_ct: 970
after ParseDifflistHeader, 3903 bytes remaining

Chris Chang

Feb 28, 2026, 11:01:43 PM
to Rui Zhang, plink2-users
Thanks, I think I identified the bug.  Try your original command line (no --debug needed) with https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20260228d.zip .

Rui Zhang

Feb 28, 2026, 11:23:19 PM
to plink2-users
It works! Thank you so much!
I just want to double-check: the bug is not related to the generation of the pgen file, correct? I'm asking because I recently completed QC on about 20 cohorts using the Jan 10 version of plink2, and I want to make sure I don't need to redo that work. Thank you for your help and time!

Chris Chang

Feb 28, 2026, 11:24:30 PM
to Rui Zhang, plink2-users
Correct, the pgen file looks fine.

Rui Zhang

Feb 28, 2026, 11:25:02 PM
to plink2-users
Got it! Thank you!