Fst calculation


Carol Buitrago

Sep 15, 2016, 4:36:20 AM
to Stacks
Hello stacks users,

I was wondering if you could help me with another "basic" question. After running the pipeline for SNP calling, I also ran the populations program (this, of course, after using the correction module, rxstacks, and rerunning cstacks and sstacks).

These were the parameters I used:

# m=5 n=9 r=0.5 

/home/carol/RADseq/tools/stacks-1.39/populations -b 1 -P ./ -M /home/carol/RADseq/Pilot2_NexSeq/Demultiplexed_data/Trimmed/Stylo-alignments/stacks/rstacks/stylo_popmap_n9 -e pstI -k -r 0.5 -p 2 -m 5 -t 30 --fstats -f p_value --bootstrap --bootstrap_reps 1000 --verbose -s --vcf --vcf_haplotypes --genepop --structure --plink --fasta --fasta_strict --genomic
  
The summary results for my two populations were:

#324 loci retained. Removing 33 additional loci for which all variant sites were filtered... retained 291 loci.
#overall Fst=0.021047
#Pooled populations 'maq' and 'far' contained: 0 incompatible loci; 0 nucleotides covered by more than one RAD locus.

I then checked the table for the AMOVA Fst values and the other inter-population statistics, and found that some loci show high differentiation (Fst = 1), while for others differentiation is non-existent or very low (Fst = 0).

I was then curious about how the overall Fst had been calculated. I found in one of the Stacks group questions that you stated that it is the average of the Fst values in batch_1.fst_pop1-pop2.tsv. However, when I tried to verify that myself, I got different values: neither the average AMOVA Fst (0.161577502) nor the average Fst (0.100752859) matches the overall Fst calculated by the program (0.021047) (files are attached). Could you help me figure out how this overall Fst is calculated?

Also, could you provide the equation used to calculate the AMOVA Fst? I tried to consult the reference you give in the Stacks manual (Genetic Data Analysis II, chapter 5), but the book is not freely accessible, so unless I buy it I cannot see how this AMOVA Fst is calculated.

Thanks in advance,

Carol
batch_1.fst_maq-far(bootstrapped).tsv
batch_1.fst_summary.tsv

Carol Buitrago

Oct 3, 2016, 6:38:50 AM
to stacks...@googlegroups.com
Please!!!! Could someone help me with this? I'm really confused about the overall Fst calculation and, in particular, how the AMOVA Fst is being calculated!!!

Thanks in advance, 

Carol


Julian Catchen

Oct 11, 2016, 10:16:10 PM
to stacks...@googlegroups.com, carol.b...@gmail.com
Hi Carol,

When you ran populations, you specified a p-value correction for the AMOVA Fst calculation. Therefore, if you average the values from the column "Corrected AMOVA Fst", you will see that they match the value reported as the average for the whole genome (0.021047). If you disable the correction, you should see the value match the "AMOVA Fst" column. Stacks does not use the binomial Fst value ("Fst") because it is biased when sample sizes differ; it remains in the software only for historical reasons. We should probably rename the column so it seems less useful.
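The check described above (averaging one column of the per-locus Fst file and comparing it with the reported overall value) could be sketched roughly as follows. This is only a minimal sketch: the function name `column_mean` is hypothetical, and it assumes the file is tab-separated with a single header row that may begin with a `#` marker. The column names passed in ("Corrected AMOVA Fst", "AMOVA Fst", "Fst") are the ones mentioned in this thread; the exact layout of a given Stacks version's output should be checked against the actual file.

```python
import csv
import io
import math


def column_mean(tsv_text, column):
    """Average one named column of a tab-separated table.

    Assumption: the first row is a header, possibly prefixed with '#'.
    Cells that are not numeric (or are NaN) are skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = [h.lstrip("# ").strip() for h in next(reader)]
    idx = header.index(column)  # raises ValueError if the column is absent

    values = []
    for row in reader:
        if len(row) <= idx:
            continue  # short or blank line
        try:
            value = float(row[idx])
        except ValueError:
            continue  # non-numeric cell
        if not math.isnan(value):
            values.append(value)

    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return sum(values) / len(values)
```

Reading the attached file's contents into a string and calling `column_mean(text, "Corrected AMOVA Fst")` would then give the number to compare against the overall Fst printed by populations.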

Here is how Fst is calculated:

[Equation image not preserved.]

Here i iterates over populations 1 and 2, p_i is the allele frequency in population i, and p^bar is the average allele frequency across both populations. n_i and n^bar are the sample sizes, which account for different numbers of alleles in the two populations.
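For readers without access to the book, the sketch below implements the allele-frequency estimator of Fst (often written as theta) as it is usually stated from Weir's Genetic Data Analysis II, chapter 5. It is an assumption that this is the same expression as the equation referred to in this post or the one used in the Stacks source code, so treat it as an illustration rather than a description of Stacks. The function name `weir_theta` is hypothetical; `freqs` holds p_i, `sizes` holds n_i (taken here to be the number of sampled alleles per population), and `r` is the number of populations (2 in the case discussed here).

```python
import math


def weir_theta(freqs, sizes):
    """Sample-size-corrected Fst estimate for one biallelic site.

    freqs: allele frequency p_i in each population
    sizes: sample size n_i of each population (assumed: number of alleles)
    Returns NaN when the site is monomorphic across all populations.
    """
    r = len(freqs)  # number of populations
    if r < 2 or r != len(sizes):
        raise ValueError("need matching frequencies and sizes for >= 2 populations")

    n_total = sum(sizes)
    n_bar = n_total / r  # average sample size
    # Sample-size-weighted average allele frequency.
    p_bar = sum(n * p for n, p in zip(sizes, freqs)) / n_total
    # Weighted variance of allele frequencies among populations.
    s2 = sum(n * (p - p_bar) ** 2 for n, p in zip(sizes, freqs)) / ((r - 1) * n_bar)
    # Correction term for unequal sample sizes (equals n_bar when sizes match).
    n_c = (n_total - sum(n * n for n in sizes) / n_total) / (r - 1)

    het = p_bar * (1 - p_bar)
    t1 = s2 - (het - (r - 1) / r * s2) / (n_bar - 1)
    t2 = ((n_c - 1) / (n_bar - 1)) * het \
        + (1 + (r - 1) * (n_bar - n_c) / (n_bar - 1)) * s2 / r
    return t1 / t2 if t2 != 0 else math.nan
```

With this form, a fixed difference between two populations (frequencies 1 and 0) gives an estimate of 1, and identical frequencies give a slightly negative value, which is expected of an estimator that corrects for sampling.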

julian

Carol Buitrago

Oct 13, 2016, 10:56:52 AM
to Julian Catchen, stacks...@googlegroups.com
Dear Julian,

Thank you very much for the explanation. Now the average Fst value makes a lot more sense.

Sincerely, 

Carol 

Yasuo

Aug 11, 2017, 5:44:11 AM
to Stacks, carol.b...@gmail.com, jcat...@illinois.edu
Hello Julian,

I'd like to use Stacks for calculating Fst. Could you tell me what r in the formula stands for?

yasu
