Malformed fileset when importing vcf with make-pgen multiallelics=- and --set-all-var-ids @_#_$r_$a

Gabriel Doctor

Nov 22, 2025, 5:30:13 PM
to plink2-users
Hi Chris,
I am working in UKB with bulk VCF data.

Running the command:

plink2 --bcf test1.bcf \
 --make-pgen multiallelics=- \
 --set-all-var-ids @_#_\$r_\$a \
 --import-max-alleles 4 \
 --new-id-max-allele-len 2 missing \
 --out test1

This results in misnamed loci at some indels (see the excerpt below), and the resulting fileset is unusable. When I try to run simple commands on the resulting binaries, I get an error:
line xxx has fewer tokens than expected.

22      10562860        22_10562860_C_T C       T       99.3889 PASS
22      10562862        .^@     CTT     C       99.7007 PASS
22      10562862        .^@     CTT     GTT     99.7007 PASS
22      10562862        .^@     CTT     TTT     99.7007 PASS
22      10562889        22_10562889_G_T G       T       99.2429 PASS
22      10562892        22_10562892_T_C T       C       91.7276 PASS
22      10562892        22_10562892_T_TA        T       TA      91.7276 PASS
22      10562894        22_10562894_A_C A       C       99.2429 PASS
22      10562895        22_10562895_A_C A       C       99.8792 PASS
22      10562895        22_10562895_A_T A       T       99.8792 PASS
22      10562910        22_10562910_TC_GC       TC      GC      99.1101 PASS
22      10562910        22_10562910_TC_T        TC      T       99.1101 PASS
22      10562911        22_10562911_CT_AT       CT      AT      98.5931 PASS
22      10562911        22_10562911_CT_C        CT      C       98.5931 PASS

Note that not all multiallelic indels are affected: some towards the bottom of the excerpt are fine.
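The `^@` after the `.` in the ID column of the excerpt is the caret notation that many viewers use for a NUL byte, which would be consistent with the "fewer tokens than expected" error. A minimal sketch for checking a text .pvar for such bytes follows; the function names are illustrative, and it assumes an uncompressed file that contains no 0x01 bytes of its own.

```shell
# Sketch: flag NUL bytes in a text .pvar file. Function names are illustrative.

# Print how many NUL bytes the file contains (0 means none were found).
count_nul_bytes() {
  total=$(wc -c < "$1")                   # size of the file in bytes
  kept=$(tr -d '\000' < "$1" | wc -c)     # size after deleting every NUL
  echo $((total - kept))
}

# Print the 1-based line number of every line that contains a NUL byte.
# NULs are first mapped to byte 0x01 so that any awk can search for them;
# this assumes the file has no 0x01 bytes of its own.
nul_line_numbers() {
  tr '\000' '\001' < "$1" | awk 'index($0, "\001") { print NR }'
}
```

For example, `count_nul_bytes test1.pvar` gives a quick yes/no check, and `nul_line_numbers test1.pvar` lists the lines to inspect.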

If I split multiallelics and rename without including the allele names in the variant ID template (e.g. --set-all-var-ids @_#), the fileset is well-formed.
Just thought I would flag this to you.

Thanks as ever for an amazing piece of software. 

Chris Chang

Nov 22, 2025, 5:32:14 PM
to Gabriel Doctor, plink2-users
Please post full .log file(s) when reporting a bug.

Gabriel Doctor

Nov 22, 2025, 5:45:38 PM
to plink2-users
Sorry, below is the logfile.
I have realised that the problem arises when combining these three flags in one step:
 --make-pgen multiallelics=- \
 --set-all-var-ids @_#_\$r_\$a \
 --new-id-max-allele-len 2 missing \
 
PLINK v2.0.0-a.7LM AVX2 Intel (18 Nov 2025)
Options in effect:
  --bcf test1.bcf
  --make-pgen multiallelics=-
  --memory 16000

  --new-id-max-allele-len 2 missing
  --out test1
  --set-all-var-ids @_#_$r_$a
  --threads 4

Hostname: job-J4V28gQJfk45B07y140Gyg13
Working directory: /home/dnanexus
Start time: Sat Nov 22 22:37:48 2025

Random number seed: 1763851068
15615 MiB RAM detected, ~14246 available; reserving 14182 MiB for main
workspace.
Using up to 4 compute threads.
--bcf: 657 variants scanned.
--bcf: test1-temporary.pgen + test1-temporary.pvar.zst + test1-temporary.psam
written.
490541 samples (0 females, 0 males, 490541 ambiguous; 490541 founders) loaded
from test1-temporary.psam.
Warning: 10 variant IDs erased by --set-all-var-ids due to allele code length.
657 variants loaded from test1-temporary.pvar.zst.
Note: No phenotype data present.
Writing test1.psam ... done.
Writing test1.pvar ... done.
Writing test1.pgen ... done.
Multiallelic split: 891 variants written.

End time: Sat Nov 22 22:37:49 2025


If I import and split in one step, and then rename and truncate in the next step, I get the expected output (as below) and the fileset is fine:
22      10562862        .       CTT     C       99.7007 PASS
22      10562862        .       CTT     GTT     99.7007 PASS
22      10562862        .       CTT     TTT     99.7007 PASS

Chris Chang

Nov 22, 2025, 7:27:00 PM
to Gabriel Doctor, plink2-users
Thanks, I've replicated this and will post a fix tomorrow.

Gabriel Doctor

Nov 23, 2025, 1:14:52 PM
to plink2-users
Many thanks, this works well.