Fst differences between Populations and Genepop


Manel Vera

Feb 5, 2014, 1:13:04 PM
to stacks...@googlegroups.com
Dear all;
We have run Populations with the following command line to estimate pairwise Fst values among three populations:

populations -b 1 -P /home/geneticafc/gambusia/stacks/ -M /home/geneticafc/gambusia/amerpop -r 0.9 -p 3  -m 10 -a 0.05  --genepop

According to the batch_X.fst_Y-Z.tsv files, the Fst values obtained were based on approximately 5,000-13,000 SNPs. However, the genepop file generated by Populations contained up to 50,000 SNPs, and the Fst values estimated by the Genepop software were, unsurprisingly, different and much higher. What is the reason for the different number of SNPs? Does anybody know how the SNPs are exported to the Genepop file? Do the -r, -p, -m and -a options also filter the SNPs exported to the Genepop file?

Thanks a lot for your help,
All the best
Manel

Alicia Mastretta

Feb 6, 2014, 7:07:18 AM
to stacks...@googlegroups.com
I have also wondered about this. It sounds to me as if the Fst values are being calculated only with the loci that pass the -a flag you are using (minimum allele frequency 0.05), while Genepop used all available loci. But I am not sure, so please let us know if you figure it out.

Manel Vera

Feb 6, 2014, 10:51:21 AM
to stacks...@googlegroups.com
Dear Alicia;
Thanks for your reply. We are sure that when a SNP has a MAF lower than 0.05, Populations removes it from the pairwise Fst calculation whenever that allele is involved in the comparison. However, all the genotypes for these SNPs are included in the Genepop file. We are not sure that this is the reason for the large differences between the pairwise Fst values estimated with Populations and with Genepop. Can somebody help us with this?
Thanks for your help
Manel


Thierry Gosselin

Feb 6, 2014, 11:27:09 AM
to stacks...@googlegroups.com
Dear Alicia and Manel,

I think the reason for these differences is that Stacks uses only the first SNP of each locus for the Fst estimates,
whereas the Genepop file generated by populations contains all the SNPs found in your sumstats.tsv file.
If you need to work around this, use a custom script or VCFtools with the VCF files exported from populations, and/or PGDSpider to convert back to Genepop.

Hope this helps,
Thierry

Julian Catchen

Feb 23, 2014, 6:07:24 PM
to stacks...@googlegroups.com, thierry...@me.com, manuel...@udg.edu, tic...@gmail.com
Hi All,

Stacks will calculate Fst values from every polymorphic site between every pair of populations. The "write_single_snp" option does not apply to the Fst statistics (at least, we haven't applied it yet), as we implemented that option only to support exports to file formats (like Structure) that do not want linked SNPs.

The filters in populations that require a locus to be present in X populations or Y individuals in a population are applied generally to the dataset immediately after all the data is read from disk. This means they will be applied to all downstream statistics including Fst. However, these filters apply to a RAD locus itself, not to each SNP in a RAD locus.
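To illustrate what "applied to a RAD locus, not to each SNP" means, here is a minimal Python sketch. It is not Stacks code: the data layout, the function name `passes_locus_filters`, and the example counts are all hypothetical, and the thresholds simply mirror the -r 0.9 and -p 3 values from the command line above.

```python
# Hypothetical population sizes (number of individuals sampled per population).
pop_sizes = {"pop1": 10, "pop2": 10, "pop3": 10}

# Hypothetical loci: for each locus, how many individuals per population
# have that locus. Every SNP inside a locus shares the fate of the locus.
loci = {
    "locus_A": {"pop1": 10, "pop2": 9, "pop3": 9},
    "locus_B": {"pop1": 10, "pop2": 9, "pop3": 5},
}

def passes_locus_filters(genotyped, pop_sizes, min_frac, min_pops):
    """Keep a locus if at least `min_pops` populations each have at least
    `min_frac` of their individuals carrying the locus (assumed reading
    of the -r and -p filters)."""
    pops_ok = sum(
        1 for pop, n in genotyped.items()
        if n / pop_sizes[pop] >= min_frac
    )
    return pops_ok >= min_pops

kept = [name for name, g in loci.items()
        if passes_locus_filters(g, pop_sizes, min_frac=0.9, min_pops=3)]
print(kept)
```

In this toy case locus_B is dropped as a whole because only 5 of 10 individuals in pop3 have it, regardless of how many SNPs it contains.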

The Genepop data will be output for your entire dataset. So, if a site is polymorphic between populations 1 and 2, but not between 2 and 3, it will appear in Stacks' Fst calculation for 1 and 2, but not in the Fst calculation for 2 and 3. However, it will be output for 1, 2, and 3 in the Genepop output, which does not make any distinctions (the site just has to be polymorphic in one of the populations).
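The counting difference described here can be sketched in a few lines of Python. The allele counts below are made up for illustration, and `is_polymorphic` is a hypothetical helper, not part of Stacks; the point is only that the number of sites used per pairwise comparison can be smaller than the number of sites exported for the whole dataset.

```python
from itertools import combinations

# Hypothetical allele counts per site and population (not real data).
sites = {
    "site_1": {"pop1": {"A": 12, "G": 8}, "pop2": {"A": 20}, "pop3": {"A": 20}},
    "site_2": {"pop1": {"C": 20}, "pop2": {"C": 20}, "pop3": {"C": 14, "T": 6}},
    "site_3": {"pop1": {"G": 10, "T": 10}, "pop2": {"G": 10, "T": 10},
               "pop3": {"G": 10, "T": 10}},
    "site_4": {"pop1": {"A": 20}, "pop2": {"A": 20}, "pop3": {"A": 20}},
}
pops = ["pop1", "pop2", "pop3"]

def is_polymorphic(site, pop_names):
    """True if more than one allele is observed across the given populations."""
    alleles = {a for p in pop_names for a, n in site[p].items() if n > 0}
    return len(alleles) > 1

# Sites entering each pairwise comparison: polymorphic within that pair.
pairwise = {
    pair: sum(is_polymorphic(s, pair) for s in sites.values())
    for pair in combinations(pops, 2)
}
# Sites exported for the whole dataset: polymorphic anywhere.
exported = sum(is_polymorphic(s, pops) for s in sites.values())

print(pairwise)
print(exported)
```

Here site_1 counts toward the pop1-pop2 and pop1-pop3 comparisons but not pop2-pop3, while still being one of the three sites in the dataset-wide export.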

Finally, as you know, there are many ways to calculate Fst. Stacks provides two methods: one described in Hohenlohe, Bassham, et al. 2010, PLoS Genetics, that involves sums of binomial coefficients and Pi, and a second method described in Weir's Genetic Data Analysis II, based on AMOVA (which is currently the preferred method). I don't know what Genepop uses, but I would guess it is different, so you wouldn't expect the Fst values to match exactly.

But this is an empirical question. The allele frequencies can be observed directly in the web interface for a particular locus, and the various Fst calculations can be made by hand in, say, R and checked against the values in Stacks' Fst output. Stacks' populations also includes a diagnostic flag, --log_fst_comp, which will cause the intermediate values for the Fst calculation to be output to a file, so they can be verified.
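A hand check of this kind could look like the following sketch, written in Python rather than R. It assumes the Pi-based formulation is Fst = 1 - (sum_j C(n_j,2) * pi_j) / (pi_all * sum_j C(n_j,2)), with pi = 1 - sum_i C(n_i,2) / C(n,2) computed from allele counts; that formula is my assumption about the method and should be confirmed against the Hohenlohe et al. 2010 paper and the --log_fst_comp output. The function names and allele counts are hypothetical.

```python
from math import comb

def pi(counts):
    """Nucleotide diversity at one site from allele counts {allele: count}."""
    n = sum(counts.values())
    return 1 - sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

def fst_pi(pop_counts):
    """Pi-based Fst for one site (assumed formula, see the note above).
    `pop_counts` is a list with one allele-count dict per population."""
    pooled = {}
    for counts in pop_counts:
        for allele, c in counts.items():
            pooled[allele] = pooled.get(allele, 0) + c
    pi_all = pi(pooled)
    if pi_all == 0:
        raise ValueError("site is monomorphic across these populations")
    # Weight each population by the number of allele pairs it contributes.
    weights = [comb(sum(c.values()), 2) for c in pop_counts]
    within = sum(w * pi(c) for w, c in zip(weights, pop_counts))
    return 1 - within / (pi_all * sum(weights))

# Hypothetical allele counts for two populations at one site.
fixed_difference = fst_pi([{"A": 20}, {"G": 20}])
identical_freqs = fst_pi([{"A": 10, "G": 10}, {"A": 10, "G": 10}])
print(fixed_difference, identical_freqs)
```

With these made-up counts, a fixed difference gives an Fst of 1, and two populations with identical allele frequencies give a value slightly below zero, which is expected for this kind of estimator on small samples.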

I think if you compare one or two loci from your data set that give different Fst measures in Stacks and Genepop in this way it will become clear why they are different.

Best,

julian
