Hi,
I'm not sure whether this issue has been reported previously. Because of occasional errors in the human reference sequence, some SNPs that are effectively biallelic have two alleles listed in the ALT field of a VCF file, while the allele in the REF field is never observed in any sample. In those cases, the genotype fields in the VCF are always one of 1/1, 1/2, or 2/2, and allele 0 never appears. When converting a VCF to PLINK format, PLINK always sets the A1 allele for these SNPs to the REF allele and A2 to the major allele, and sets everything else to missing. Is there a way to change this behavior? This affects hundreds to thousands of variants in large whole-genome sequencing datasets.
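To make clear which sites I mean, here is a minimal Python sketch that flags them. It parses tab-separated VCF data lines directly and does not call PLINK; the function name `ref_never_observed` and the example records are made up for illustration, and it assumes a standard layout with a GT entry in the FORMAT column.

```python
def ref_never_observed(vcf_line):
    """Return True if a VCF data line lists exactly two ALT alleles and
    no sample genotype contains allele 0 (the REF allele)."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT/sample columns to inspect
    if len(fields[4].split(",")) != 2:
        return False  # not a two-ALT record
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return False
    gt_idx = fmt.index("GT")
    seen = set()
    for sample in fields[9:]:
        parts = sample.split(":")
        if gt_idx >= len(parts):
            continue
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in parts[gt_idx].replace("|", "/").split("/"):
            if allele != ".":
                seen.add(allele)
    # Require at least one called allele, and none of them equal to REF.
    return bool(seen) and "0" not in seen


# Hypothetical example records (not from a real dataset).
affected = "1\t1000\trs1\tA\tC,T\t.\tPASS\t.\tGT\t1/1\t1/2\t2/2"
ordinary = "1\t2000\trs2\tG\tA\t.\tPASS\t.\tGT\t0/1\t1/1"
multiallelic = "1\t3000\trs3\tG\tA,C\t.\tPASS\t.\tGT:DP\t0/1:10\t1|2:8"

for record in (affected, ordinary, multiallelic):
    print(record.split("\t")[2], ref_never_observed(record))
```

Only the first record is flagged: it has two ALT alleles and every genotype is 1/1, 1/2, or 2/2.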
Thanks!
Jeroen