Why are ALT alleles converted to "." in the output VCF file?

Hiro Ama

Apr 6, 2021, 6:57:15 PM
to plink2-users
Hi, I tried to convert a genotyping output file from .ped to .vcf format. In the VCF made by plink, some variants have "." in the ALT column. Why were these alleles converted to a period, which I take to mean a deletion? I can understand a variant being judged a deletion when it has many '0' codes in the original .ped file. But some variants that have no '0' in the .ped file are still converted to '.' in the VCF. So I'd like to know the criterion for judging a position to be a deletion.

Christopher Chang

Apr 6, 2021, 7:41:46 PM
to plink2-users
The issue is that, for some variants in a .ped file, only one allele code is present.  In this case, the .ped does not tell us what the other possible allele(s) are for the variant.

You should also be aware that .ped files don't track which alleles are REF vs. ALT.  If you don't want those to be incorrectly swapped in the VCF file, you'll need to use a command like plink 2.0's --ref-from-fa.
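
The single-allele case described above can be illustrated with a short Python sketch. It assumes the usual .ped layout (six sample-metadata columns followed by two allele codes per variant, with '0' as the missing code); the rows are made-up example data and the function name `observed_alleles` is hypothetical, not part of plink.

```python
# Sketch: find variants for which a .ped file shows only one allele code.
# The rows below are made-up example data, not taken from this thread.

MISSING = "0"  # assumed missing-genotype code


def observed_alleles(ped_lines):
    """Return one set of non-missing allele codes per variant."""
    per_variant = []
    for line in ped_lines:
        fields = line.split()
        calls = fields[6:]  # the first six columns are sample metadata
        n_variants = len(calls) // 2
        if not per_variant:
            per_variant = [set() for _ in range(n_variants)]
        for i in range(n_variants):
            for code in calls[2 * i : 2 * i + 2]:
                if code != MISSING:
                    per_variant[i].add(code)
    return per_variant


ped_lines = [
    "FAM1 S1 0 0 1 -9  A A  G G  C T",
    "FAM1 S2 0 0 2 -9  A A  G G  C C",
    "FAM1 S3 0 0 1 -9  A A  0 0  T T",
]

alleles = observed_alleles(ped_lines)
for idx, codes in enumerate(alleles, start=1):
    if len(codes) == 1:
        status = "only one allele seen, so the other allele is unknown"
    else:
        status = "two alleles seen"
    print(f"variant {idx}: {sorted(codes)} -> {status}")
```

In this toy data the first two variants show a single allele code, so a converter has nothing to put in the ALT column for them, whether or not any '0' codes are present.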

Hiro Ama

Apr 6, 2021, 10:09:33 PM
to plink2-users
Sorry, I'm worried about whether I correctly understood what you told me. Could you answer the questions below?
1. Does "for some variants in a .ped file, only one allele code is present" mean that when only one allele exists for a variant ID in the dataset (every sample in the dataset has the same allele at that location), the ALT allele in the output VCF is shown as "."?

2. Does ".ped files don't track which alleles are REF vs. ALT" mean that we must not rely on a .ped file to track the alleles?

3. If I use the "--ref-from-fa" option with a reference FASTA, will I get a VCF that does not contain '0', meaning deletion?

Christopher Chang

Apr 7, 2021, 11:55:32 AM
to plink2-users
1. Yes.
2. Correct.  You shouldn't be using .ped files at all in 2021 if you can help it; the format manages to be bloated, inefficient to parse even for its size, and information-losing, all at the same time.  Try to use .pgen+.pvar+.psam when you need to keep track of REF vs. ALT.
3. It is not clear what you're asking here.  But I will say two things which might be related:
  (a) --ref-from-fa is only reliable for SNPs.  It may not be able to determine which allele is REF vs. ALT when an insertion or deletion is involved.
  (b) --ref-from-fa only tries to fix the REF allele column.  It will not fill in missing ALT alleles.  If you need those, you'll need to track them down from another data source, such as dbSNP.
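
Points (a) and (b) can be sketched for the SNP case in Python. This illustrates the idea only and is not plink's implementation; the function `orient_snp` and its inputs are hypothetical.

```python
# Sketch of the idea only; this is NOT how plink implements --ref-from-fa.
def orient_snp(ref_base, alleles):
    """Pick (REF, ALT) for a SNP given the reference-genome base.

    `alleles` holds the allele codes observed in the data (one or two).
    ALT stays "." when no second allele is known, because the reference
    sequence says nothing about which alternate alleles exist.
    """
    ref_base = ref_base.upper()
    if ref_base not in alleles:
        raise ValueError("reference base does not match any observed allele")
    others = [a for a in alleles if a != ref_base]
    alt = others[0] if others else "."
    return ref_base, alt


print(orient_snp("G", ["A", "G"]))  # ('G', 'A')
print(orient_snp("C", ["C"]))       # ('C', '.')
```

The second call shows why a FASTA file cannot fill in a missing ALT: it confirms that the one observed allele is REF and nothing more.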

Hiro Ama

Apr 7, 2021, 8:52:02 PM
to plink2-users
I have another question about the first one. Does the period in the ALT column of the output VCF mean that the allele is the same as REF, rather than a deletion at that position? Sorry, this is basic knowledge, but for a long time I have assumed that a period in the ALT column of a VCF file means a deletion, just as in the sample allele columns after it.

In addition, I made a mistake in my message yesterday. In the third question, I should have written "the vcf that does not contain '.' on the ALT". I apologize for the confusion.

Christopher Chang

Apr 7, 2021, 8:57:48 PM
to plink2-users
'.' in the ALT column does not mean "deletion".  It means that there are no ALT alleles defined at all.  See the example at the top of the VCF specification, where the entry with ALT=. is described as "a site that is called monomorphic reference (i.e. with no alternate alleles)".
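
The difference between ALT=. and an actual deletion can be seen in a small Python sketch. The two records are abbreviated, illustrative REF/ALT pairs modeled on the kind of example the VCF specification gives; the function `describe_alt` is hypothetical, and symbolic alleles such as `<DEL>` or `*` are deliberately not handled.

```python
def describe_alt(ref, alt_field):
    """Classify each ALT entry of a VCF record relative to REF.

    Simplified: only plain base-string alleles are handled.
    """
    if alt_field == ".":
        return ["no ALT allele defined"]
    labels = []
    for alt in alt_field.split(","):
        if len(alt) < len(ref):
            labels.append(f"{alt}: deletion relative to {ref}")
        elif len(alt) > len(ref):
            labels.append(f"{alt}: insertion relative to {ref}")
        else:
            labels.append(f"{alt}: substitution")
    return labels


# (REF, ALT) pairs from two illustrative records
print(describe_alt("T", "."))         # monomorphic reference site
print(describe_alt("GTC", "G,GTCT"))  # one deletion and one insertion
```

A deletion is written as a shorter ALT string next to a longer REF string, so it never appears as a bare period.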

Hiro Ama

Apr 7, 2021, 9:53:07 PM
to plink2-users
OK, I got it. Thank you for your time.

Hiro Ama

Apr 11, 2021, 10:55:53 PM
to plink2-users
Sorry, I have one more question. Is there any way to output the ALT allele as the same character as the REF allele, instead of '.', when the ALT and REF alleles are the same?

Christopher Chang

Apr 12, 2021, 11:25:53 AM
to plink2-users
That's a violation of the VCF specification, so it's not supported by plink.