no info for non variable sites in populations.all.vcf


Zjon Coleman

Oct 10, 2023, 12:32:15 AM
to Stacks
Hi,

I am trying to use the --vcf-all option of the populations program in Stacks 2.65. It outputs a VCF file containing all sites; however, the non-variable sites carry no information other than the genotype (GT). The DP, AD, GQ, and GL fields are all absent for non-variable sites. Is the --vcf-all output supposed to look like this, and is there any way to obtain this information in the output?
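
One way to confirm the pattern described above is to tally which FORMAT keys appear on variable versus invariant records. The following is a minimal sketch, assuming a plain tab-delimited VCF in which invariant sites have `.` in the ALT column; the function name `format_keys_by_site_type` and the example records are hypothetical and are not real Stacks output.

```python
from collections import Counter

def format_keys_by_site_type(vcf_lines):
    """Tally FORMAT keys separately for variable and invariant records."""
    tallies = {"variable": Counter(), "invariant": Counter()}
    for line in vcf_lines:
        # Skip blank lines and all header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # record has no FORMAT column
        alt, fmt = cols[4], cols[8]
        # Assumption: an invariant site is written with ALT set to ".".
        site_type = "invariant" if alt == "." else "variable"
        tallies[site_type].update(fmt.split(":"))
    return tallies

# Made-up records shaped like the situation described in this post.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2",
    "1\t10\t.\tA\t.\t.\tPASS\t.\tGT\t0/0\t0/0",
    "1\t11\t.\tC\tT\t.\tPASS\t.\tGT:DP:AD:GQ:GL\t0/1:20:11,9:40:-30,-0,-28\t0/0:18:18,0:40:-0,-5,-40",
]
tallies = format_keys_by_site_type(example)
print(tallies)
```

For a real file, the same function can be given an open file handle, for example `format_keys_by_site_type(open("populations.all.vcf"))`.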

Many thanks, 
Zjon 


Catchen, Julian

Oct 10, 2023, 3:25:18 PM
to stacks...@googlegroups.com

Hi Zjon,

You are correct that detailed per-site information is recorded only for sites found to be variable in the metapopulation. We do record all site depths internally, but recording every detail of the model for every nucleotide would require storing a very large amount of data in the catalog. In principle, the depth could be written to the --vcf-all output, but what is your use case that requires this type of data?

Best,

julian

Zjon Coleman

Oct 11, 2023, 8:25:35 AM
to Stacks

Hi Julian,

Thanks for getting back to me. We want to apply further filtering to the resulting VCF file (including minimum and maximum depth cutoffs and genotype quality filtering) and then use that file to calculate nucleotide diversity and heterozygosity across all sites. Our concern is that if we filter the variable sites but not the linked invariant sites in each locus, we will get a biased estimate of nucleotide diversity. We also want to include invariant stacks in these calculations, which currently have no depth information at all.
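
The bias concern can be illustrated with a small sketch. It assumes diploid GT strings and uses the standard pairwise-difference estimator of per-site nucleotide diversity, averaged over every retained site; it is not necessarily how populations computes its own statistics. The function names `site_pi` and `mean_pi` and the genotypes are hypothetical.

```python
from collections import Counter

def site_pi(genotypes):
    """Per-site pi from GT strings such as '0/1'; None if fewer than 2 alleles are called."""
    alleles = [a for gt in genotypes
               for a in gt.replace("|", "/").split("/") if a != "."]
    n = len(alleles)
    if n < 2:
        return None
    counts = Counter(alleles)
    # Fraction of ordered allele pairs that are identical, subtracted from 1.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))

def mean_pi(sites):
    """Average per-site pi over all sites with enough called alleles."""
    values = [p for p in (site_pi(g) for g in sites) if p is not None]
    return sum(values) / len(values) if values else float("nan")

# Made-up data: two invariant sites and two variable sites, two individuals each.
invariant = [["0/0", "0/0"], ["0/0", "0/0"]]
variable = [["0/1", "0/0"], ["0/1", "0/1"]]

all_sites = mean_pi(invariant + variable)
# Suppose one variable site fails a depth filter, but the invariant sites
# cannot be filtered because they carry no DP: the denominator stays inflated.
variable_only_filtered = mean_pi(invariant + variable[:1])
print(all_sites, variable_only_filtered)
```

In this toy case, dropping a variable site while keeping every invariant site lowers the average, which is the direction of bias described above.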

Many thanks,

Zjon 

Tom Schmidt

Nov 19, 2023, 11:53:21 PM
to Stacks
Hi all,

I would also greatly appreciate an option to have the DP and other fields included in the --vcf-all output, for the same reasons as stated by Zjon. This would be extremely useful for using this output in estimating individual-level heterozygosity.

Best regards
Tom
