Hi Julian and other users,
I think there may be some bugs in the VCF output of the latest version of STACKS (v1.29). Here is a snippet of my VCF file for illustration:
1 un 28 1 C T . PASS NS=50;AF=0.970,0.030 GT:DP:AD:GL ./.:0:.,.:.,.,. ./.:0:.,.:.,.,. 0/0:16:8,8:.,22.18,. 0/0:11:5,6:.,15.25,.
2 un 77 2 A T . PASS NS=37;AF=0.973,0.027 GT:DP:AD:GL 0/0:13:6,7:.,18.02,. 0/0:9:4,5:.,12.48,. 0/0:7:3,4:.,9.7,. 0/0:12:6,6:.,16.64,.
4 un 201 4 C T . PASS NS=51;AF=0.902,0.098 GT:DP:AD:GL 0/1:26:11,15:.,36.04,. 0/0:14:7,7:.,19.41,. 0/0:11:5,6:.,15.25,. 0/0:25:12,13:.,34.66,.
5 un 415 7 C T . PASS NS=20;AF=0.925,0.075 GT:DP:AD:GL ./.:0:.,.:.,.,. ./.:0:.,.:.,.,. 0/0:5:2,3:.,6.93,. 0/0:5:2,3:.,6.93,.
6 un 479 8 G T . PASS NS=46;AF=0.935,0.065 GT:DP:AD:GL 0/0:11:5,6:.,15.25,. 0/0:11:5,6:.,15.25,. 0/0:4:2,2:.,5.55,. 0/0:17:8,9:.,23.57,.
ISSUE 1: Why do the two allele depths (AD) within a sample always appear in increasing order, with the first value never larger than the second? I expected the order to be random, which makes me suspect something is amiss.
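To check whether this ordering holds across the whole file and not just in the snippet, something like the following could be used. It is only a rough sketch: `ad_order_counts` is a name I made up, and it assumes a standard tab-delimited VCF with FORMAT in column 9 and samples from column 10 onward.

```shell
# Hypothetical helper: reads a VCF on stdin and tallies how the two AD values
# compare in every sample field where both depths are present.
# Assumes a tab-delimited VCF: FORMAT in column 9, samples from column 10.
ad_order_counts() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      # find the position of AD within the FORMAT column
      n = split($9, fmt, ":"); ad = 0
      for (i = 1; i <= n; i++) if (fmt[i] == "AD") ad = i
      if (!ad) next
      for (s = 10; s <= NF; s++) {
        split($s, f, ":")
        # ignore samples with missing or non-biallelic depths
        if (split(f[ad], d, ",") != 2 || d[1] == "." || d[2] == ".") continue
        if (d[1] + 0 < d[2] + 0) lt++
        else if (d[1] + 0 == d[2] + 0) eq++
        else gt++
      }
    }
    END { printf "first<second=%d equal=%d first>second=%d\n", lt, eq, gt }
  '
}
```

Running `ad_order_counts < batch1.vcf` prints the three tallies. If the last count is zero over thousands of sample fields, the ordering is clearly not random.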
ISSUE 2: The reported allele depths do not seem consistent with the reported genotypes, whether homozygous (0/0, 1/1) or heterozygous (0/1). For example, all of the called genotypes above except one are 0/0, even though every one of them has coverage for both alleles.
I only get a heterozygote called once, where allele coverage was very high. Surely the genotype-calling model does not report a depth of 7,7 as a homozygote?
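To list every case of this kind, a sketch along these lines could pull out homozygous calls where both AD values are non-zero. Again, `hom_with_two_alleles` is a made-up name, and the column layout (FORMAT in column 9, samples from column 10, tab-delimited) is assumed to follow the standard VCF layout.

```shell
# Hypothetical helper: reads a VCF on stdin and prints chromosome, position,
# sample column number, GT and AD for each homozygous call in which both
# AD values are greater than zero.
hom_with_two_alleles() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                          # skip header lines
    {
      # find the positions of GT and AD within the FORMAT column
      n = split($9, fmt, ":"); gt = 0; ad = 0
      for (i = 1; i <= n; i++) {
        if (fmt[i] == "GT") gt = i
        if (fmt[i] == "AD") ad = i
      }
      if (!gt || !ad) next
      for (s = 10; s <= NF; s++) {
        split($s, f, ":")
        # keep only called, homozygous, diploid genotypes
        if (split(f[gt], a, "[/|]") != 2 || a[1] == "." || a[1] != a[2]) continue
        # skip samples with missing depths
        if (split(f[ad], d, ",") != 2 || d[1] == "." || d[2] == ".") continue
        if (d[1] + 0 > 0 && d[2] + 0 > 0) print $1, $2, s, f[gt], f[ad]
      }
    }
  '
}
```

For example, `hom_with_two_alleles < batch1.vcf | head` shows the first few suspicious calls, and piping to `wc -l` instead gives the total.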
ISSUE 3: Why do all my loci appear to be heterozygotes based on the allele depth (AD) profiles? That is impossible!
Perhaps I am misunderstanding the VCF output here, or perhaps there are some serious BUGS in the VCF output.
Best wishes,
Jason
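# Count the locus IDs (column 3) in the VCF, starting from line 10 (i.e. assuming a 9-line header).
# Note that uniq only collapses adjacent duplicate IDs.
tail -n +10 batch1.vcf | cut -f 3 | uniq | wc -l
# Count the lines in the haplotypes file, skipping its first (header) line.
tail -n +2 batch1.haplotypes.tsv | wc -l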