(0-based) variant in .pgen file


Damian Ulrich

Jan 21, 2026, 4:24:21 PM
to plink2-users
Hi, 

I'm working with sets of samples (studies), and PLINK2 has worked fine so far, but for this one group of samples it just errors out and I cannot fix it whatever I try. I should note that I ran everything per chromosome and then merged the per-chromosome PLINK files with --pmerge-list. But this has worked for every other set of samples so far, so I don't think that is the issue.

This is the log:

PLINK v2.0.0-a.7LM 64-bit Intel (26 Oct 2025)
Options in effect:
  --freq
  --missing
  --out output/Studies/Wirka_et_al_2019/Wirka_et_al_2019_unfiltered
  --pfile output/Studies/Wirka_et_al_2019/Wirka_et_al_2019

Hostname: 
Working directory: 
Start time: Wed Jan 21 21:46:42 2026

Random number seed: 1769028402
385050 MiB RAM detected, ~288607 available; reserving 192525 MiB for main
workspace.
Using up to 2 compute threads.
8 samples (0 females, 0 males, 8 ambiguous; 8 founders) loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.psam.
606524 variants loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.pvar.
Note: No phenotype data present.
Calculating sample missingness rates...
Error: Failed to unpack (0-based) variant #65553 in .pgen file.
You can use --validate to check whether it is malformed.
* If it is malformed, you probably need to either re-download the file, or
  address an error in the command that generated the input .pgen.
* If it appears to be valid, you have probably encountered a plink2 bug.  If
  you report the error on GitHub or the plink2-users Google group (make sure to
  include the full .log file in your report), we'll try to address it.

End time: Wed Jan 21 21:46:42 2026

I have used --validate, but it just gives the same message as the log. I have regenerated the PLINK binary files multiple times and tried to exclude that variant, but the moment I pass the Wirka_et_al_2019 binary files to --pfile, I get the same error message as in the log. I don't think I can exclude the variant without PLINK, since the complaint is specifically about the .pgen file.

Does anyone have any ideas on how to fix this?

Chris Chang

Jan 21, 2026, 4:25:44 PM
to Damian Ulrich, plink2-users
Do you have a .log from a command used to create the problematic .pgen file?


Chris Chang

Jan 21, 2026, 6:33:49 PM
to Damian Ulrich, plink2-users
And are you able to post a set of input files which I can replicate the buggy merge with?  (Could be just the first few chromosomes, if that's enough to reproduce the problem.)  If not, will you be able to run a sequence of debug builds?

Damian Ulrich

Jan 22, 2026, 4:31:05 AM
to plink2-users
I think the 'corrupt' .pgen file arises during merging (so Wirka_et_al_2019.pgen, but I could be wrong), so I assume you mean the log of the PLINK binary file merge. I have shared that log below:


PLINK v2.0.0-a.7LM 64-bit Intel (26 Oct 2025)
Options in effect:
  --make-pgen
  --out output/Studies/Wirka_et_al_2019/Wirka_et_al_2019
  --pmerge-list output/Studies/Wirka_et_al_2019/chromosomeMergeList.lst

Hostname: 
Working directory: 
Start time: Thu Jan 22 10:24:55 2026

Random number seed: 1769073895
385050 MiB RAM detected, ~252007 available; reserving 192525 MiB for main
workspace.
Using up to 2 compute threads.
--pmerge-list: 22 filesets specified.
--pmerge-list: 8 samples present.
--pmerge-list: Merged .psam written to
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.psam .
--pmerge-list: 22 .pvar files scanned, headers merged.
Concatenation job detected.
Concatenating... 606524/606524 variants complete.
Results written to output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.pgen
+ output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.pvar .

8 samples (0 females, 0 males, 8 ambiguous; 8 founders) loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.psam.
606524 variants loaded from
output/Studies/Wirka_et_al_2019/Wirka_et_al_2019-merge.pvar.

Note: No phenotype data present.
Writing output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.psam ... done.
Writing output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.pvar ... done.
Writing output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.pgen ... done.

End time: Thu Jan 22 10:24:58 2026


On Wednesday, January 21, 2026 at 22:25:44 UTC+1, chrch...@gmail.com wrote:

Damian Ulrich

Jan 22, 2026, 4:32:49 AM
to plink2-users
The PLINK binary files are derived from patient VCFs, so I don't think I can share them. I'm fairly new to PLINK, so could you clarify what you mean by "running a sequence of debug builds"?

On Thursday, January 22, 2026 at 00:33:49 UTC+1, chrch...@gmail.com wrote:

Chris Chang

Jan 22, 2026, 5:36:24 AM
to Damian Ulrich, plink2-users
I will post a debug build of PLINK2 which produces extra logging information when run with --debug.  Then you can use it to run the merge and validate commands (with --debug added to your command line), and post the resulting .log files.  I will look at those .log files, and maybe send you another debug build of PLINK2 to run, etc. until the problem has been nailed down.

Damian Ulrich

Jan 22, 2026, 5:55:14 AM
to plink2-users
Sounds good. I don't care much about the variant causing the issue, so if I could identify it and remove it from the actual VCF, that would also suffice. The thing is, every time I look up the variant in the .pvar file using #62464 as the line number and then exclude it from the VCF with bcftools, the corrupt variant seems to shift by one position, which suggests I'm not deleting the right one. I also noticed that on rerunning this morning, the corrupt variant changed from #65553 to #62464.
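
For reference, a minimal sketch of how the index from the error message might be mapped to a .pvar record. The log labels the index as 0-based, and .pvar header lines (those starting with '#') are not variants, so the record for a given index does not sit on the file line with the same number. The function name `pvar_record` and the toy records below are made up for illustration, and this has not been run against the real files.

```python
import io


def pvar_record(lines, variant_idx):
    """Return the .pvar line for a 0-based variant index.

    Header lines (starting with '#') and blank lines are skipped,
    so they do not count toward the index.
    """
    seen = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        if seen == variant_idx:
            return line.rstrip("\n")
        seen += 1
    raise IndexError(f"only {seen} variants, index {variant_idx} out of range")


# Toy stand-in for a .pvar file; these records are invented.
demo = io.StringIO(
    "##hypothetical header line\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\tvarA\tA\tG\n"
    "1\t200\tvarB\tC\tT\n"
    "1\t300\tvarC\tG\tA\n"
)
print(pvar_record(demo, 1))  # the second variant record (varB)

# With the real file, something like:
# with open("output/Studies/Wirka_et_al_2019/Wirka_et_al_2019.pvar") as f:
#     print(pvar_record(f, 62464))
```

In the toy input, 0-based index 1 is on file line 4, because two header lines precede the variants and file lines are usually counted from 1.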

On Thursday, January 22, 2026 at 11:36:24 UTC+1, chrch...@gmail.com wrote:

Chris Chang

Jan 22, 2026, 10:43:46 AM
to Damian Ulrich, plink2-users
What is the merge .log if you add "--memory 8000 --randmem --seed 1" to your command line?

Chris Chang

Jan 22, 2026, 10:44:29 AM
to Damian Ulrich, plink2-users
(correction, merge and validate logs)
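
A rough sketch of what those two invocations might look like, assembled from the options shown in the logs earlier in the thread. The executable name `plink2` and the separate `_validate` output prefix (so the validate log gets its own file) are assumptions, not something stated in the thread.

```python
import shlex

STUDY_DIR = "output/Studies/Wirka_et_al_2019"
PREFIX = f"{STUDY_DIR}/Wirka_et_al_2019"

# Extra flags requested above; note the double hyphens.
EXTRA = ["--memory", "8000", "--randmem", "--seed", "1"]

# Merge command, using the options listed in the merge log.
merge_cmd = [
    "plink2",  # assumed executable name
    "--pmerge-list", f"{STUDY_DIR}/chromosomeMergeList.lst",
    "--make-pgen",
    "--out", PREFIX,
] + EXTRA

# Validate command on the merged fileset.
validate_cmd = [
    "plink2",
    "--pfile", PREFIX,
    "--validate",
    "--out", PREFIX + "_validate",  # hypothetical prefix for the log
] + EXTRA

for cmd in (merge_cmd, validate_cmd):
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell. The resulting .log files are written next to the chosen --out prefixes.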