Dear all,
I have a question about the VCF output I obtained from populations.
It looks like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BB04
un 106 2 C T . PASS NS=64;AF=0.961:0.039; GT:DP:GL 0/1:25:.,-14.2639,.
un 107 2 A C . PASS NS=62;AF=0.032:0.968; GT:DP:GL 1/1:25:.,.,.
un 111 2 A G . PASS NS=65;AF=0.008:0.992; GT:DP:GL 1/1:25:.,.,.
un 115 2 A G . PASS NS=65;AF=0.008:0.992; GT:DP:GL 1/1:25:.,.,.
un 117 2 C T . PASS NS=60;AF=0.700:0.300; GT:DP:GL 0/0:25:.,.,.
un 128 2 A G . PASS NS=65;AF=0.023:0.977; GT:DP:GL 1/1:25:.,.,.
un 134 2 A G . PASS NS=65;AF=0.992:0.008; GT:DP:GL 0/0:25:.,.,.
un 148 2 A T . PASS NS=65;AF=0.008:0.992; GT:DP:GL 1/1:25:.,.,.
un 149 2 C T . PASS NS=60;AF=0.025:0.975; GT:DP:GL 1/1:25:.,.,.
un 172 2 G T . PASS NS=61;AF=0.295:0.705; GT:DP:GL 1/1:25:.,.,.
un 173 2 A G . PASS NS=57;AF=0.509:0.491; GT:DP:GL 0/1:25:.,-20.9655,.
un 180 2 A T . PASS NS=65;AF=0.992:0.008; GT:DP:GL 0/0:25:.,.,.
un 198 3 C T . PASS NS=27;AF=0.611:0.389; GT:DP:GL .:0:.,.,.
un 202 3 C T . PASS NS=27;AF=0.963:0.037; GT:DP:GL .:0:.,.,.
un 212 3 A C . PASS NS=26;AF=0.962:0.038; GT:DP:GL .:0:.,.,.
un 213 3 C T . PASS NS=27;AF=0.981:0.019; GT:DP:GL .:0:.,.,.
I am struggling to understand a few things about it.
Why do I have the same ID repeated so many times? And what should I make of all the dots in the QUAL column and in the last (individual) column?
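A minimal sketch for inspecting the two things asked about above. It assumes a standard VCF layout and splits on any whitespace, because the excerpt above lost its tabs when pasted. It turns the `.` placeholders, which the VCF specification uses for a missing value, into `None`, and it groups SNP positions by the ID column. My assumption (not confirmed here) is that in this Stacks output the ID column holds the catalog locus number, so several SNPs from the same RAD locus share one ID. The function names are hypothetical.

```python
from collections import defaultdict

# A few data lines copied from the populations output above (whitespace-separated).
RECORDS = """\
un 106 2 C T . PASS NS=64;AF=0.961:0.039; GT:DP:GL 0/1:25:.,-14.2639,.
un 107 2 A C . PASS NS=62;AF=0.032:0.968; GT:DP:GL 1/1:25:.,.,.
un 198 3 C T . PASS NS=27;AF=0.611:0.389; GT:DP:GL .:0:.,.,.
"""

def missing_to_none(value):
    """VCF uses '.' for a missing value; map it to None."""
    return None if value == "." else value

def parse_record(line):
    """Split one data line into the fixed VCF columns plus the single sample column."""
    chrom, pos, vid, ref, alt, qual, filt, info, fmt, sample = line.split()
    # Pair the FORMAT keys (GT:DP:GL) with the sample's colon-separated values.
    sample_fields = dict(zip(fmt.split(":"), sample.split(":")))
    # GL holds comma-separated likelihoods; '.' entries were not reported.
    gl = [missing_to_none(x) for x in sample_fields.get("GL", ".").split(",")]
    return {
        "id": vid,
        "pos": int(pos),
        "qual": missing_to_none(qual),
        "gt": missing_to_none(sample_fields.get("GT", ".")),
        "dp": int(sample_fields["DP"]),
        "gl": gl,
    }

def snps_per_id(lines):
    """Group SNP positions by the value in the ID column."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip() and not line.startswith("#"):
            rec = parse_record(line)
            groups[rec["id"]].append(rec["pos"])
    return dict(groups)

if __name__ == "__main__":
    for line in RECORDS.splitlines():
        print(parse_record(line))
    print(snps_per_id(RECORDS.splitlines()))
```

On these three lines, `snps_per_id` reports positions 106 and 107 under ID `2` and position 198 under ID `3`, which is the same pattern of repeated IDs seen in the full excerpt.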
One last question: I have 192 individuals in total, and from my reading I thought I could retain SNPs present in at least 80% of individuals, but that doesn't seem to work for my data, since the NS values are all so low. Am I interpreting this correctly?
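The 80% rule can be written as a threshold on NS, the INFO key the VCF specification defines as the number of samples with data. With 192 individuals, 80% is 153.6, so a site would need NS of at least 154; every site in the excerpt above has NS of 65 or less. A minimal sketch of that filter, assuming NS counts samples across all 192 individuals and appears in every INFO field as in the excerpt. `min_samples`, `ns_from_info` and `passes_missingness` are hypothetical names, and rounding up (in integer arithmetic, to avoid floating-point surprises) is my choice.

```python
TOTAL_INDIVIDUALS = 192  # from the post
MIN_PERCENT = 80         # the "present in at least 80%" rule

def min_samples(total, percent):
    """Smallest whole number of individuals that is >= percent% of total (integer ceiling)."""
    return -(-total * percent // 100)

def ns_from_info(info):
    """Read NS (number of samples with data) from an INFO string like 'NS=64;AF=...;'."""
    for item in info.split(";"):
        if item.startswith("NS="):
            return int(item[3:])
    raise ValueError("no NS in INFO: " + info)

def passes_missingness(info, total=TOTAL_INDIVIDUALS, percent=MIN_PERCENT):
    """True if the site has data for at least percent% of the individuals."""
    return ns_from_info(info) >= min_samples(total, percent)

if __name__ == "__main__":
    print(min_samples(TOTAL_INDIVIDUALS, MIN_PERCENT))
    print(passes_missingness("NS=65;AF=0.008:0.992;"))
```

Run as a script, this prints `154` and then `False` for the NS=65 site, i.e. none of the sites shown would survive an 80% filter.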
I have read the VCFtools pages and the 1000 Genomes ones, but I am still lost.
The commands I ran are:
denovo_map.pl -m 5 -M 5 -n 2 -T 15 -t \
-S -b 1 \
-o ~/stacks_denovo \
-s ~/stacks/lane1/BB04.fq \
[…]
populations -S -m 5 -r 0.02 -b 1 -s \
-P ./ -k -M ./popmap --vcf --phylip --genepop --structure
I hope someone will be able to help me!
Thanks a lot
Ilaria.