Error when converting VCF to BED: Error: Failed to unpack (0-based) variant #536 in .pgen file.


Oliver Ruebenacker

Mar 28, 2023, 3:08:25 PM
to plink2-users

     Hello,

  I got the following error when converting VCF files to BED. I can't share the data. Note that I have tens of thousands of such files, and the conversion worked for all but this one. The log is below. Thanks!

     Best, Oliver


PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)
Options in effect:
  --debug
  --maf 0.05
  --make-bed
  --max-alleles 2
  --memory 12000
  --out bed_file_53
  --vcf input_53.vcf.gz
Hostname: job-GQVXQX8JjVFPFGXYvXQzp0jJ
Working directory: /home/dnanexus
Start time: Tue Mar 28 18:54:53 2023
Random number seed: 1680029693
15713 MiB RAM detected; reserving 12000 MiB for main workspace.
Using up to 4 compute threads.
--vcf: 650 variants scanned.
--vcf: bed_file_53-temporary.pgen + bed_file_53-temporary.pvar.zst +
bed_file_53-temporary.psam written.
150119 samples (0 females, 0 males, 150119 ambiguous; 150119 founders) loaded
from bed_file_53-temporary.psam.
626 out of 650 variants loaded from bed_file_53-temporary.pvar.zst.
Note: No phenotype data present.
Calculating allele frequencies...
Error: Failed to unpack (0-based) variant #536 in .pgen file.
You can use --validate to check whether it is malformed.
* If it is malformed, you probably need to either re-download the file, or
  address an error in the command that generated the input .pgen.
* If it appears to be valid, you have probably encountered a plink2 bug.  If
  you report the error on GitHub or the plink2-users Google group (make sure to
  include the full .log file in your report), we'll try to address it.
End time: Tue Mar 28 18:54:57 2023

Christopher Chang

Mar 28, 2023, 3:25:22 PM
to plink2-users
This is almost certainly a plink2 bug.  Since you cannot share the data, I will post a sequence of debug builds for you to run; the first one should be available tonight.

In the meantime, if you run "plink2 --vcf input_53.vcf.gz --memory 12000 --out pgen_file_53" followed by "plink2 --pfile pgen_file_53 --validate", does --validate complain?
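
The two-step check suggested above can be scripted. This is a minimal sketch that uses only the flags quoted in this message; the helper name `build_validate_commands` is hypothetical, and the snippet only builds and prints the command lines rather than running plink2.

```python
import shlex


def build_validate_commands(vcf_path, out_prefix, memory_mib):
    """Return the two plink2 invocations quoted above as argv lists.

    The function name and parameters are illustrative, not part of plink2.
    """
    # Step 1: import the VCF to .pgen/.pvar/.psam with no filters applied.
    convert = [
        "plink2", "--vcf", vcf_path,
        "--memory", str(memory_mib),
        "--out", out_prefix,
    ]
    # Step 2: validate the fileset written by step 1.
    validate = ["plink2", "--pfile", out_prefix, "--validate"]
    return [convert, validate]


commands = build_validate_commands("input_53.vcf.gz", "pgen_file_53", 12000)
for argv in commands:
    print(shlex.join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)` on a machine where plink2 is on the PATH.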

Oliver Ruebenacker

Mar 28, 2023, 4:25:52 PM
to Christopher Chang, plink2-users

     Hello Christopher,

  Thank you for the quick response! So I got:

  Stderr:

Error: .pgen header indicates that file size should be 5658618 bytes, but
actual file size is 5724154 bytes.

  Stdout:

PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to pgen_file.log.
Options in effect:
  --memory 12000
  --out pgen_file
  --vcf input_53.vcf.gz
Start time: Tue Mar 28 20:13:58 2023

15713 MiB RAM detected; reserving 12000 MiB for main workspace.
Using up to 4 compute threads.

--vcf: 650 variants scanned.

--vcf: 0k variants converted.    
--vcf: pgen_file.pgen + pgen_file.pvar + pgen_file.psam written.
End time: Tue Mar 28 20:14:01 2023
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pfile pgen_file
  --validate
Start time: Tue Mar 28 20:14:01 2023
15713 MiB RAM detected; reserving 7856 MiB for main workspace.

Using up to 4 compute threads.
150119 samples (0 females, 0 males, 150119 ambiguous; 150119 founders) loaded
from pgen_file.psam.
650 variants loaded from pgen_file.pvar.
Validating pgen_file.pgen...
End time: Tue Mar 28 20:14:01 2023

  Thanks!

     Best, Oliver

--
Oliver Ruebenacker, Ph.D. (he)
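
A quick arithmetic check on the two sizes in the --validate error may be useful. The numbers are copied from the message above; the interpretation is only a guess, since the error message itself does not say why the sizes differ.

```python
# Sizes copied from the --validate error message quoted above.
expected_size = 5658618  # bytes, according to the .pgen header
actual_size = 5724154    # bytes, actual file size on disk

excess = actual_size - expected_size
print(excess)            # 65536
print(excess == 2**16)   # True
```

The file is exactly 2^16 bytes larger than the header implies. That would be consistent with a length being truncated to two bytes somewhere, but this is an assumption and not something the error output confirms.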

Christopher Chang

Mar 29, 2023, 12:00:48 AM
to plink2-users
Thanks.  First debug build is posted to https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20230328.zip and GitHub.  Try running just "plink2 --debug --memory 12000 --out pgen_file --vcf input_53.vcf.gz" with it, and post/send the resulting log.

Oliver Ruebenacker

Mar 29, 2023, 2:55:58 PM
to Christopher Chang, plink2-users

     Hello,

  The output is now:

PLINK v2.00a4LM 64-bit Intel (28 Mar 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3

Logging to pgen_file.log.
Options in effect:
  --memory 12000
  --out pgen_file
  --vcf input_53.vcf.gz
Start time: Wed Mar 29 15:37:03 2023
15713 MiB RAM detected, ~14426 available; reserving 12000 MiB for main
workspace.
Using up to 4 compute threads.

--vcf: 650 variants scanned.

--vcf: 0k variants converted.    
--vcf: pgen_file.pgen + pgen_file.pvar + pgen_file.psam written.
End time: Wed Mar 29 15:37:07 2023

  Thanks!

     Best, Oliver

Christopher Chang

Mar 29, 2023, 3:06:27 PM
to plink2-users
I need the output with the --debug flag included.

Christopher Chang

Mar 29, 2023, 4:11:48 PM
to plink2-users
Correction: I should have specified "console output" rather than "log"; these debug prints weren't going into the log.  I will try to change this in any subsequent debug builds.

Oliver Ruebenacker

Mar 29, 2023, 4:16:42 PM
to Christopher Chang, plink2-users

     Hello Christopher,

  Ah, sorry, here is the output with --debug:

PLINK v2.00a4LM 64-bit Intel (28 Mar 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to pgen_file.log.
Options in effect:
  --debug
  --memory 12000
  --out pgen_file
  --vcf input_53.vcf.gz
Start time: Wed Mar 29 19:49:36 2023
15713 MiB RAM detected, ~14438 available; reserving 12000 MiB for main
workspace.
Using up to 4 compute threads.

--vcf: 650 variants scanned.
max_vrec_len: 56298  vrec_len_byte_ct: 2

--vcf: 0k variants converted.
PwcAppendBiallelicGenovec[470]: vrec_len=336  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[471]: vrec_len=365  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[472]: vrec_len=367  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[473]: vrec_len=377  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[474]: vrec_len=395  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[475]: vrec_len=6079  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[476]: vrec_len=5193  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[477]: vrec_len=2390  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[478]: vrec_len=4474  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[479]: vrec_len=1242  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[480]: vrec_len=572  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[481]: vrec_len=182  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[482]: vrec_len=289  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[483]: vrec_len=177  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[484]: vrec_len=6537  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[485]: vrec_len=1666  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[486]: vrec_len=594  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[487]: vrec_len=651  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[488]: vrec_len=37530  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[489]: vrec_len=27871  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[490]: vrec_len=2052  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[491]: vrec_len=1435  vrec_len_byte_ct: 2
PwcAppendMultiallelicSparse[492]: vrec_len=1573  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[493]: vrec_len=1559  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[494]: vrec_len=4517  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[495]: vrec_len=3554  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[496]: vrec_len=37530  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[497]: vrec_len=2346  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[498]: vrec_len=785  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[499]: vrec_len=384  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[500]: vrec_len=299  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[501]: vrec_len=149  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[502]: vrec_len=18374  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[503]: vrec_len=815  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[504]: vrec_len=775  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[505]: vrec_len=8672  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[506]: vrec_len=3485  vrec_len_byte_ct: 2
PwcAppendMultiallelicSparse[507]: vrec_len=170  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[508]: vrec_len=8271  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[509]: vrec_len=4575  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[510]: vrec_len=9171  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[511]: vrec_len=640  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[512]: vrec_len=619  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[513]: vrec_len=334  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[514]: vrec_len=219  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[515]: vrec_len=346  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[516]: vrec_len=431  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[517]: vrec_len=6587  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[518]: vrec_len=22074  vrec_len_byte_ct: 2
PwcAppendMultiallelicSparse[519]: vrec_len=26374  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[520]: vrec_len=1178  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[521]: vrec_len=891  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[522]: vrec_len=844  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[523]: vrec_len=3445  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[524]: vrec_len=1185  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[525]: vrec_len=21608  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[526]: vrec_len=20757  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[527]: vrec_len=3597  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[528]: vrec_len=780  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[529]: vrec_len=62  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[530]: vrec_len=58  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[531]: vrec_len=544  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[532]: vrec_len=714  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[533]: vrec_len=1125  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[534]: vrec_len=14486  vrec_len_byte_ct: 2
PwcAppendMultiallelicSparse[535]: vrec_len=69640  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[536]: vrec_len=20184  vrec_len_byte_ct: 2
len[470]: 336
len[471]: 365
len[472]: 367
len[473]: 377
len[474]: 395
len[475]: 6079
len[476]: 5193
len[477]: 2390
len[478]: 4474
len[479]: 1242
len[480]: 572
len[481]: 182
len[482]: 289
len[483]: 177
len[484]: 6537
len[485]: 1666
len[486]: 594
len[487]: 651
len[488]: 37530
len[489]: 27871
len[490]: 2052
len[491]: 1435
len[492]: 1573
len[493]: 1559
len[494]: 4517
len[495]: 3554
len[496]: 37530
len[497]: 2346
len[498]: 785
len[499]: 384
len[500]: 299
len[501]: 149
len[502]: 18374
len[503]: 815
len[504]: 775
len[505]: 8672
len[506]: 3485
len[507]: 170
len[508]: 8271
len[509]: 4575
len[510]: 9171
len[511]: 640
len[512]: 619
len[513]: 334
len[514]: 219
len[515]: 346
len[516]: 431
len[517]: 6587
len[518]: 22074
len[519]: 26374
len[520]: 1178
len[521]: 891
len[522]: 844
len[523]: 3445
len[524]: 1185
len[525]: 21608
len[526]: 20757
len[527]: 3597
len[528]: 780
len[529]: 62
len[530]: 58
len[531]: 544
len[532]: 714
len[533]: 1125
len[534]: 14486
len[535]: 4104
len[536]: 20184

--vcf: pgen_file.pgen + pgen_file.pvar + pgen_file.psam written.
End time: Wed Mar 29 19:49:40 2023

  Thanks!

     Best, Oliver
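
The two lists in the debug output above can be compared mechanically. The sketch below parses a three-variant excerpt copied from that output and flags any variant whose `vrec_len` differs from the later `len[...]` value. Reading the first list as the length at append time and the second as the length recorded afterwards is an assumption; the thread does not spell out what each line means.

```python
import re

# Excerpt copied from the console output above (variants 534-536 only).
debug_output = """\
PwcAppendBiallelicGenovec[534]: vrec_len=14486  vrec_len_byte_ct: 2
PwcAppendMultiallelicSparse[535]: vrec_len=69640  vrec_len_byte_ct: 2
PwcAppendBiallelicGenovec[536]: vrec_len=20184  vrec_len_byte_ct: 2
len[534]: 14486
len[535]: 4104
len[536]: 20184
"""

# variant index -> (vrec_len, vrec_len_byte_ct) from the PwcAppend* lines
appended = {
    int(idx): (int(length), int(byte_ct))
    for idx, length, byte_ct in re.findall(
        r"\[(\d+)\]: vrec_len=(\d+)\s+vrec_len_byte_ct: (\d+)", debug_output
    )
}

# variant index -> value from the len[...] lines
recorded = {
    int(idx): int(length)
    for idx, length in re.findall(r"^len\[(\d+)\]: (\d+)$", debug_output, re.M)
}

# Keep only variants where the two values disagree, and show what the
# appended length would become if truncated to vrec_len_byte_ct bytes.
mismatches = {
    idx: (length, recorded[idx], length % 256**byte_ct)
    for idx, (length, byte_ct) in appended.items()
    if recorded[idx] != length
}
print(mismatches)  # {535: (69640, 4104, 4104)}
```

Only variant 535 disagrees, and 69640 modulo 2^16 is exactly 4104. The scan reported `max_vrec_len: 56298`, which fits in two bytes, while 69640 does not; so one plausible reading is that the multiallelic record at index 535 grew past the two-byte length field.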

Christopher Chang

Mar 29, 2023, 4:52:13 PM
to plink2-users
Thanks; second debug build posted to https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20230329a.zip and GitHub.  You can run the same command.

Oliver Ruebenacker

Mar 29, 2023, 6:47:41 PM
to Christopher Chang, plink2-users

     Hello Christopher,

  Thanks for looking into this. Here is the output:

PLINK v2.00a4LM 64-bit Intel (29 Mar 2023)     www.cog-genomics.org/plink/2.0/

(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to pgen_file.log.
Options in effect:
  --debug
  --memory 12000
  --out pgen_file
  --vcf input_53.vcf.gz
Start time: Wed Mar 29 21:55:45 2023
15545 MiB RAM detected, ~14262 available; reserving 12000 MiB for main
workspace.
Using up to 4 compute threads.
alt_ct[535]: 2

--vcf: 650 variants scanned.

--vcf: 0k variants converted.
branch 1
allele_ct[535]: 3
patch_01_ct[535]: 18867
popcount(patch_01_set)[535]: 18867
patch_10_ct[535]: 118631
popcount(patch_10_set)[535]: 118631


--vcf: pgen_file.pgen + pgen_file.pvar + pgen_file.psam written.
End time: Wed Mar 29 21:55:48 2023

     Best, Oliver

Christopher Chang

Mar 29, 2023, 8:55:50 PM
to plink2-users
Okay, I think I figured out the issue.  Let me know if today's build does not solve the problem.

Oliver Ruebenacker

Mar 30, 2023, 9:07:59 AM
to Christopher Chang, plink2-users

     Hello Christopher,

  Thank you so much! The latest dev version of plink does indeed complete successfully.

     Best, Oliver
