VCF output


IlaC

Apr 23, 2013, 11:15:57 AM
to stacks...@googlegroups.com
Dear all,

I have a question about the VCF output I obtained from populations.

It looks like this:

#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    BB04
un    106    2    C    T    .    PASS    NS=64;AF=0.961:0.039;    GT:DP:GL    0/1:25:.,-14.2639,.
un    107    2    A    C    .    PASS    NS=62;AF=0.032:0.968;    GT:DP:GL    1/1:25:.,.,.
un    111    2    A    G    .    PASS    NS=65;AF=0.008:0.992;    GT:DP:GL    1/1:25:.,.,.
un    115    2    A    G    .    PASS    NS=65;AF=0.008:0.992;    GT:DP:GL    1/1:25:.,.,.
un    117    2    C    T    .    PASS    NS=60;AF=0.700:0.300;    GT:DP:GL    0/0:25:.,.,.
un    128    2    A    G    .    PASS    NS=65;AF=0.023:0.977;    GT:DP:GL    1/1:25:.,.,.
un    134    2    A    G    .    PASS    NS=65;AF=0.992:0.008;    GT:DP:GL    0/0:25:.,.,.
un    148    2    A    T    .    PASS    NS=65;AF=0.008:0.992;    GT:DP:GL    1/1:25:.,.,.
un    149    2    C    T    .    PASS    NS=60;AF=0.025:0.975;    GT:DP:GL    1/1:25:.,.,.
un    172    2    G    T    .    PASS    NS=61;AF=0.295:0.705;    GT:DP:GL    1/1:25:.,.,.
un    173    2    A    G    .    PASS    NS=57;AF=0.509:0.491;    GT:DP:GL    0/1:25:.,-20.9655,.
un    180    2    A    T    .    PASS    NS=65;AF=0.992:0.008;    GT:DP:GL    0/0:25:.,.,.
un    198    3    C    T    .    PASS    NS=27;AF=0.611:0.389;    GT:DP:GL    .:0:.,.,.
un    202    3    C    T    .    PASS    NS=27;AF=0.963:0.037;    GT:DP:GL    .:0:.,.,.
un    212    3    A    C    .    PASS    NS=26;AF=0.962:0.038;    GT:DP:GL    .:0:.,.,.
un    213    3    C    T    .    PASS    NS=27;AF=0.981:0.019;    GT:DP:GL    .:0:.,.,.

I am struggling to understand a few things about it.
Why do I have the same ID repeated so many times? What should I make of all these dots in the QUAL column and in the last (individual) column?
Last question: I have 192 individuals in total, and from my reading I thought I could retain SNPs that were present in at least 80% of them, but this doesn't seem to apply to my data, since the NS values are all so low. Am I interpreting it right?

I have read the VCFtools pages and the 1000 Genomes ones, but I am still lost.

The scripts I used are:

denovo_map.pl -m 5 -M 5 -n 2 -T 15 -t \
-S -b 1 \
-o ~/stacks_denovo \
-s ~/stacks/lane1/BB04.fq \
[…]




populations -S -m 5 -r 0.02 -b 1 -s -P ./ -k -M ./popmap --vcf --phylip --genepop --structure


I hope someone will be able to help me!

Thanks a lot

Ilaria.


pierre-alexandre gagnaire

Apr 23, 2013, 12:50:51 PM
to stacks...@googlegroups.com
Dear Ilaria,

For a given locus ID, the second column gives the position of each variable nucleotide. So here it appears that for locus ID #2 you have 12 SNPs, that is, 1 SNP every 6 bp on average. That's a lot, considering that about 65 individuals were sequenced for that locus. However, perhaps your data contain highly polymorphic or highly structured populations? If you consider this level of polymorphism too high for your species, you could try increasing the minimum number of identical reads required to create a stack (-m) and decreasing the number of mismatches allowed between alleles (-M). Applying a MAF threshold (e.g. 0.05) will also remove many SNPs whose rare allele segregates at a low frequency.
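To check these counts on the excerpt above, here is a minimal sketch (not part of Stacks) that groups the VCF body lines by the ID column, which holds the Stacks locus ID in this output, and applies a 0.05 MAF threshold. It assumes AF is written as REF and ALT frequencies separated by ':', as in Ilaria's file; the names RECORDS, parse_info, minor_allele_freq, snps_per_locus and maf_filter are hypothetical.

```python
from collections import defaultdict

# The 16 body lines of the VCF excerpt above; in the real file the
# columns are tab-separated, and str.split() handles either.
RECORDS = [
    "un 106 2 C T . PASS NS=64;AF=0.961:0.039; GT:DP:GL 0/1:25:.,-14.2639,.",
    "un 107 2 A C . PASS NS=62;AF=0.032:0.968; GT:DP:GL 1/1:25:.,.,.",
    "un 111 2 A G . PASS NS=65;AF=0.008:0.992; GT:DP:GL 1/1:25:.,.,.",
    "un 115 2 A G . PASS NS=65;AF=0.008:0.992; GT:DP:GL 1/1:25:.,.,.",
    "un 117 2 C T . PASS NS=60;AF=0.700:0.300; GT:DP:GL 0/0:25:.,.,.",
    "un 128 2 A G . PASS NS=65;AF=0.023:0.977; GT:DP:GL 1/1:25:.,.,.",
    "un 134 2 A G . PASS NS=65;AF=0.992:0.008; GT:DP:GL 0/0:25:.,.,.",
    "un 148 2 A T . PASS NS=65;AF=0.008:0.992; GT:DP:GL 1/1:25:.,.,.",
    "un 149 2 C T . PASS NS=60;AF=0.025:0.975; GT:DP:GL 1/1:25:.,.,.",
    "un 172 2 G T . PASS NS=61;AF=0.295:0.705; GT:DP:GL 1/1:25:.,.,.",
    "un 173 2 A G . PASS NS=57;AF=0.509:0.491; GT:DP:GL 0/1:25:.,-20.9655,.",
    "un 180 2 A T . PASS NS=65;AF=0.992:0.008; GT:DP:GL 0/0:25:.,.,.",
    "un 198 3 C T . PASS NS=27;AF=0.611:0.389; GT:DP:GL .:0:.,.,.",
    "un 202 3 C T . PASS NS=27;AF=0.963:0.037; GT:DP:GL .:0:.,.,.",
    "un 212 3 A C . PASS NS=26;AF=0.962:0.038; GT:DP:GL .:0:.,.,.",
    "un 213 3 C T . PASS NS=27;AF=0.981:0.019; GT:DP:GL .:0:.,.,.",
]


def parse_info(info):
    """Turn 'NS=64;AF=0.961:0.039;' into {'NS': '64', 'AF': '0.961:0.039'}."""
    fields = {}
    for item in info.split(";"):
        if item:  # skip the empty piece left by the trailing ';'
            key, _, value = item.partition("=")
            fields[key] = value
    return fields


def minor_allele_freq(info_fields):
    # Assumes AF lists the REF and ALT frequencies separated by ':',
    # as in this Stacks output (standard VCF AF lists only ALT frequencies).
    return min(float(f) for f in info_fields["AF"].split(":"))


def snps_per_locus(records):
    """Group SNP positions (POS) by the ID column, i.e. the Stacks locus ID."""
    loci = defaultdict(list)
    for line in records:
        cols = line.split()
        loci[cols[2]].append(int(cols[1]))
    return dict(loci)


def maf_filter(records, threshold=0.05):
    """Keep records whose minor allele frequency is at least `threshold`."""
    return [line for line in records
            if minor_allele_freq(parse_info(line.split()[7])) >= threshold]


if __name__ == "__main__":
    for locus, positions in snps_per_locus(RECORDS).items():
        print(f"locus {locus}: {len(positions)} SNPs")
    kept = maf_filter(RECORDS, 0.05)
    print(f"{len(kept)} of {len(RECORDS)} SNPs pass MAF >= 0.05")
```

On the excerpt this reports 12 SNPs for locus 2 and 4 for locus 3, and only 4 of the 16 SNPs (positions 117, 172, 173 and 198) survive a 0.05 MAF threshold.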

The first 8 columns are mandatory in a VCF file. However, Stacks does not provide a phred-scaled quality score for the alternate allele, so it writes no value in the QUAL column; the dot is the VCF placeholder for a missing value.
The dots in the genotype (individual) fields are missing genotype likelihoods (GL): Stacks only provides a likelihood for the heterozygous genotype, and only in heterozygous individuals.
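A minimal sketch of reading the QUAL column and one individual column against the FORMAT keys GT:DP:GL, treating '.' as missing. Under the VCF convention, a GT of '.' means no genotype call, so the last four lines of the excerpt (".:0:.,.,.", with a depth of 0) indicate that BB04 was not genotyped at those SNPs. The function names parse_qual and parse_sample are hypothetical.

```python
def parse_qual(qual):
    """QUAL is '.' throughout this Stacks output: no quality score was written."""
    return None if qual == "." else float(qual)


def parse_sample(format_col, sample_col):
    """Read one sample column using the FORMAT keys; '.' (missing) becomes None."""
    fields = dict(zip(format_col.split(":"), sample_col.split(":")))
    return {
        # GT '.' means no genotype call for this individual.
        "GT": None if fields["GT"] == "." else fields["GT"],
        "DP": None if fields["DP"] == "." else int(fields["DP"]),
        # For a biallelic SNP, GL holds three values ordered 0/0, 0/1, 1/1.
        "GL": [None if g == "." else float(g) for g in fields["GL"].split(",")],
    }


if __name__ == "__main__":
    print(parse_qual("."))                                   # None
    print(parse_sample("GT:DP:GL", "0/1:25:.,-14.2639,."))
    # {'GT': '0/1', 'DP': 25, 'GL': [None, -14.2639, None]}
    print(parse_sample("GT:DP:GL", ".:0:.,.,."))
    # {'GT': None, 'DP': 0, 'GL': [None, None, None]}
```

Only the middle (0/1) likelihood is filled in for the heterozygous call, which matches the pattern described above.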

If you want to retain loci that are genotyped in at least 80% of your individuals, you have to write -r 0.8 (instead of -r 0.02) in your populations command line.
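If NS is compared with the total number of samples, -r 0.8 with 192 individuals means 0.8 × 192 = 153.6, i.e. at least 154 genotyped individuals. A minimal sketch of that post-hoc check on the NS field of the rerun output; it checks the output file and does not reproduce how populations applies -r internally (Pierre suggests later in the thread that it may be applied per population). The function names min_ns, ns_value and passes_total_threshold are hypothetical.

```python
import math
from fractions import Fraction


def min_ns(r, n_samples):
    """Smallest whole number of samples that is at least r * n_samples.

    Fraction(str(r)) avoids float round-off that could push an exact
    product just above a whole number before rounding up.
    """
    return math.ceil(Fraction(str(r)) * n_samples)


def ns_value(info):
    """Extract NS from an INFO column such as 'NS=64;AF=0.961:0.039;'."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "NS":
            return int(value)
    raise ValueError("no NS field in INFO column")


def passes_total_threshold(info, r, n_samples):
    """True if NS reaches the fraction r of all samples (hypothetical check)."""
    return ns_value(info) >= min_ns(r, n_samples)


if __name__ == "__main__":
    print(min_ns(0.8, 192))   # 154
    print(min_ns(0.02, 192))  # 4
    print(passes_total_threshold("NS=64;AF=0.961:0.039;", 0.8, 192))  # False
```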

Hope this helps,
Pierre
 






--
--
For more options or to unsubscribe: http://groups.google.com/group/stacks-users
Stacks website: http://creskolab.uoregon.edu/stacks/
 
---
You received this message because you are subscribed to the Google Groups "Stacks" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stacks-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
Pierre-Alexandre GAGNAIRE Ph.D
ANR Post-doctoral Research Fellow

Institut des Sciences de l’Évolution (ISEM)
Université Montpellier 2
Station Méditerranéenne de l’Environnement Littoral
2, rue des Chantiers
34200 SETE
Tel: +33(0)4 67 46 33 75
Fax: +33(0)4 67 46 33 99
e-mail: paga...@univ-montp2.fr
Web page: https://sites.google.com/site/kaugfqfdg/home

IlaC

Apr 24, 2013, 4:08:36 AM
to stacks...@googlegroups.com
Dear Pierre,

It most definitely helps!
One more thing that is not very clear to me is the relationship between -r used in populations and NS in the VCF file. If I set -r to 0.8, does it mean that the NS values will all be above 80% of the total number of my samples?

Thanks a lot

Ilaria.

pierre-alexandre gagnaire

Apr 24, 2013, 4:36:00 AM
to stacks...@googlegroups.com
Yes, at least 80% of the individuals in each population should be genotyped (I think this filter is applied to each population rather than to the total number of samples). So in the end the NS value should be at least 0.8 × 192 = 153.6, i.e. 154 individuals in your case.
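The per-population reading of -r described above can be sketched as follows: a locus is kept only if, in every population of the popmap, the fraction of genotyped individuals is at least r. This is a hypothetical reimplementation for illustration, not the Stacks source; it assumes every population must pass, and the function names and the two-population split of 192 individuals are made up.

```python
from fractions import Fraction


def passes_r_per_population(genotyped, pop_sizes, r):
    """Keep a locus only if, in every population, the share of genotyped
    individuals is at least r (hypothetical reimplementation, not Stacks code).

    genotyped: population name -> number of individuals genotyped at the locus
    pop_sizes: population name -> number of individuals in the popmap
    """
    r = Fraction(str(r))
    return all(Fraction(genotyped.get(pop, 0), size) >= r
               for pop, size in pop_sizes.items())


def total_ns(genotyped):
    """NS as the total number of genotyped individuals across populations."""
    return sum(genotyped.values())


if __name__ == "__main__":
    # Hypothetical split of 192 individuals into two populations of 96.
    pop_sizes = {"popA": 96, "popB": 96}
    even = {"popA": 80, "popB": 77}
    uneven = {"popA": 96, "popB": 60}
    print(passes_r_per_population(even, pop_sizes, 0.8), total_ns(even))      # True 157
    print(passes_r_per_population(uneven, pop_sizes, 0.8), total_ns(uneven))  # False 156
```

The second case shows how the two readings differ: 156 genotyped individuals exceeds 0.8 × 192, yet one population is below 80%, so a per-population rule would drop the locus. Conversely, under this sketch's assumption that every population must pass, the total NS is necessarily at least 0.8 × 192.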

Pierre



Ilaria

Apr 24, 2013, 5:14:56 AM
to stacks...@googlegroups.com
Thank you very much. I'll run it again with the new settings and see how the results change.

Ilaria.

