Hello,
We're using plink2 to assign SNP IDs, with the alleles in alphabetical order, to the variants in a VCF file.
An example command:
# create plink format files
plink2 \
--set-all-var-ids @:#:\$1:\$2 \
--vcf samples.vcf.gz \
--out samples_alphabetical \
--make-bed \
--new-id-max-allele-len 70 missing
We're running into an issue: at multiallelic sites, plink2 creates the SNP ID from only one ALT allele.
For example, given a multiallelic site in the VCF file with two ALT alleles:
1 1234 . TA TAA,T
The resulting .pvar file has an SNP ID based on only one of the ALT alleles (here it keeps the TAA allele and ignores the T allele):
1 1234 1:1234:TA:TAA TA TAA,T
However, we would like the .pvar file to have an SNP ID based on the other ALT allele:
1:1234:T:TA
Is there a way to make plink2 generate two lines in the .pvar file for a multiallelic site, one for each ALT allele?
In that case, the .pvar file would have two lines for this position:
1 1234 1:1234:TA:TAA …
1 1234 1:1234:T:TA …
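To make the desired result concrete, here is a minimal sketch of how we would derive one ID per ALT allele. The function per_alt_ids is a hypothetical helper written only for illustration, not a plink2 option. It assumes that the alphabetical ordering we want is plain byte-wise (ASCII) sorting of each REF/ALT pair.

```shell
#!/bin/sh
# Hypothetical helper, not a plink2 feature: prints one ID per ALT allele.
# Arguments: chromosome, position, REF allele, comma-separated ALT alleles.
per_alt_ids() (
  chrom=$1; pos=$2; ref=$3; alts=$4
  # Put one ALT allele per line so that each one is handled separately.
  printf '%s\n' "$alts" | tr ',' '\n' | while IFS= read -r alt; do
    # Sort the REF/ALT pair byte-wise (ASCII order) and join it with ':'.
    pair=$(printf '%s\n%s\n' "$ref" "$alt" | LC_ALL=C sort | paste -sd: -)
    printf '%s:%s:%s\n' "$chrom" "$pos" "$pair"
  done
)

# The multiallelic site from the example above.
per_alt_ids 1 1234 TA TAA,T
```

For the site above this prints 1:1234:TA:TAA and 1:1234:T:TA, which are the two IDs we would like to see in the .pvar file.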
Alternatively, can we somehow control which ALT allele is used to create the SNP IDs?
Thank you!
Noe