VCF polarization


Miles Anderson

Feb 13, 2024, 8:41:29 AM
to dadi-user
Hi, Ryan

I have a large VCF that includes an outgroup sample. I have a Python script that checks that the outgroup sample has a base call at each position and, if the call is non-missing, assigns it as the ancestral allele. Samples carrying the matching allele are then coded 0, and samples carrying a different allele are coded 1. The issue is that this ancestral/derived coding isn't recognized when making a spectrum from the VCF, I assume because there is no explicit outgroup information in the VCF fields. The fs is always populated by zeroes.

Is there a way to circumvent this and use the 0 as an ancestral call and 1 as derived? Or is there other software you would recommend for polarizing a VCF based on the base call in one of the samples? We hadn't had any luck finding a way to get a reliable ancestral call into a VCF, which is why we wrote the Python script.

Thanks,
Miles

Ryan Gutenkunst

Feb 13, 2024, 11:49:37 AM
to dadi-user
Hi Miles,

If you’re already modifying the VCF file, the easiest way is to set the “AA=” INFO field in each row of the VCF file to match the outgroup sample.
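
For illustration, here is a minimal sketch of that first option in plain Python. It is not part of dadi: the function name `add_aa_from_outgroup` is hypothetical, and it assumes an uncompressed, tab-delimited VCF with a GT entry in FORMAT and no existing AA annotation. Sites where the outgroup is missing or heterozygous are left without an AA value rather than guessed.

```python
import re

# Standard header line declaring the AA (ancestral allele) INFO key.
AA_HEADER = '##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">'


def add_aa_from_outgroup(lines, outgroup):
    """Yield VCF lines with AA=<base> added to INFO, using the outgroup's genotype.

    lines: iterable of VCF lines (strings); outgroup: sample name in the header.
    """
    col = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("##"):
            yield line
            continue
        if line.startswith("#CHROM"):
            # Locate the outgroup's column and declare AA just before the header row.
            col = line.split("\t").index(outgroup)
            yield AA_HEADER
            yield line
            continue
        fields = line.split("\t")
        alleles = [fields[3]] + fields[4].split(",")  # REF followed by ALT allele(s)
        fmt = fields[8].split(":")
        sample = fields[col].split(":")
        gt = sample[fmt.index("GT")] if "GT" in fmt else "."
        called = set(re.split(r"[/|]", gt))
        # Polarize only when the outgroup carries exactly one non-missing allele.
        if len(called) == 1 and "." not in called:
            base = alleles[int(called.pop())]
            info = fields[7]
            fields[7] = f"AA={base}" if info in (".", "") else f"{info};AA={base}"
        yield "\t".join(fields)
```

To use it, one would read the original file line by line, pass the lines and the outgroup's sample name to the function, and write the yielded lines to a new VCF.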

If you don’t want to modify the VCF file, and you’re using dadi in Python, you can manually add the ‘outgroup_allele’ key to each element of the data dictionary derived from parsing the VCF file.
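
A rough sketch of that second option follows. The helper `set_outgroup_alleles` and the `ancestral` lookup are hypothetical names, not dadi functions. `ancestral` is assumed to map each SNP id to its ancestral base, using the same ids that appear as keys in the data dictionary; it is worth printing a few keys to confirm their format before building the lookup. SNPs missing from the lookup are left untouched.

```python
def set_outgroup_alleles(dd, ancestral):
    """Fill in 'outgroup_allele' for entries of a dadi-style data dictionary, in place.

    dd: dict mapping SNP id -> per-SNP dict, as produced by parsing a VCF with dadi.
    ancestral: dict mapping the same SNP ids -> ancestral base (e.g. "A").
    Returns the number of SNPs that were updated.
    """
    updated = 0
    for snp_id, entry in dd.items():
        base = ancestral.get(snp_id)
        if base is not None:
            entry["outgroup_allele"] = base
            updated += 1
    return updated


# Hypothetical usage; file names, population label, and projection are placeholders:
# import dadi
# dd = dadi.Misc.make_data_dict_vcf("data.vcf", "popfile.txt")
# set_outgroup_alleles(dd, ancestral)
# fs = dadi.Spectrum.from_data_dict(dd, ["pop1"], [20], polarized=True)
```

If the VCF has already been recoded so that REF is the ancestral allele, the lookup could presumably be built from the reference allele at each site.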

Best,
Ryan
