Hi Julian, hi Stacks community,
My understanding is that the Fst p-value is calculated with Fisher's exact test, comparing the observed Fst values in each window against 'an empirical null distribution' that is obtained by resampling with replacement n times (bootstrapping).
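For reference, a common way to turn a bootstrap null into a p-value does not need a contingency table at all. You count how often a resampled window is at least as extreme as the observed one. The minimal sketch below does this with a plain (unweighted) window mean and reuses the bootstrap values from the example further down as a stand-in genome-wide pool. The function names and the +1 correction are my own illustrative choices. They are not Stacks internals, and whether Stacks does anything like this is part of what I am asking.

```python
import random
from statistics import mean

def bootstrap_null(pool, window_size, n_boot, seed=0):
    """Resample `window_size` values with replacement from `pool`, n_boot times,
    and return the mean of each resampled window (the empirical null)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [mean(rng.choices(pool, k=window_size)) for _ in range(n_boot)]

def empirical_p(observed, null):
    """One-sided empirical p-value: share of null windows >= observed,
    with a +1 correction so the p-value is never exactly 0."""
    extreme = sum(1 for x in null if x >= observed)
    return (extreme + 1) / (len(null) + 1)

# Stand-in pool: the bootstrap Fst values B1_1..B2_3 from the example (illustrative only).
pool = [0.065826331, 0.209597523, 0.038699690, 0.097135741, 0.074468085, 0.031250000]
# Observed window: mean of S1..S3 from the example.
observed = mean([0.895457823, 1.000000000, 0.723214286])

null = bootstrap_null(pool, window_size=3, n_boot=1000)
print(empirical_p(observed, null))  # equals 1/1001: no resampled window reaches the observed mean
```

With more bootstrap replicates, the smallest attainable p-value shrinks as 1/(n+1).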
First question, just to clarify:
The p-value I am getting does not actually describe the significance of a single SNP but of the window centered on that SNP, right?
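To make "the window centered on that SNP" concrete, here is a minimal sketch of a kernel-smoothed Fst value. It assumes a Gaussian kernel whose window extends 3σ on either side of the SNP, which is how I read the `--sigma` option of populations. The SNP positions are hypothetical, and only the Fst values come from the example below.

```python
import math

def smoothed_fst(center, positions, fst_values, sigma):
    """Gaussian-weighted mean Fst around `center` (a SNP position, in bp).

    Only SNPs within 3 * sigma of `center` contribute (assumed window).
    Because `center` is itself a SNP, the weight sum is never zero.
    """
    num = den = 0.0
    for pos, fst in zip(positions, fst_values):
        d = abs(pos - center)
        if d > 3 * sigma:
            continue  # outside the window, ignored
        w = math.exp(-(d * d) / (2 * sigma * sigma))  # Gaussian kernel weight
        num += w * fst
        den += w
    return num / den

# Hypothetical positions (bp) for S1..S3; the Fst values are from the example.
positions = [100_000, 150_000, 200_000]
fst = [0.895457823, 1.000000000, 0.723214286]

# The smoothed value "at" S2 mixes in S1 and S3, so any p-value attached to it
# describes the window around S2, not S2 alone.
print(smoothed_fst(150_000, positions, fst, sigma=150_000))
```

With a very small sigma only the center SNP falls inside the window, and the smoothed value collapses to that SNP's own Fst.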
Next, could you please give me some more details on how the Fisher's exact test is done in this case? Please excuse my ignorance, but I understand that the test requires a contingency table, and I can't figure out for certain what that table would look like here. I do have a hypothesis (see below), but I would like to know whether it is correct.
Let's say I have a window with 3 SNPs with the following observed Fst values:
S1 = 0.895457823
S2 = 1.000000000
S3 = 0.723214286
Now, let's say I run only 2 bootstrap replicates and get:
B1_1 = 0.065826331
B1_2 = 0.209597523
B1_3 = 0.038699690
...
B2_1 = 0.097135741
B2_2 = 0.074468085
B2_3 = 0.031250000
What would my table look like?
I was thinking I could round all Fst values to, e.g., 1 decimal place:
S1 = 0.9
S2 = 1.0
S3 = 0.7
B1_1 = 0.1
B1_2 = 0.2
B1_3 = 0.0
...
B2_1 = 0.1
B2_2 = 0.1
B2_3 = 0.0
then bin them and build a count table like this:
Fst bin  Sample  Bootstrap
0.0      0       2
0.1      0       3
0.2      0       1
0.7      1       0
0.9      1       0
1.0      1       0
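The rounding and binning steps of this hypothesis can be sketched as follows. The bin resolution is the `ndigits` parameter, and the function name `count_table` is my own. This only reproduces my proposed procedure and is not taken from Stacks.

```python
from collections import Counter

observed = [0.895457823, 1.000000000, 0.723214286]       # S1..S3
bootstrap = [0.065826331, 0.209597523, 0.038699690,      # B1_1..B1_3
             0.097135741, 0.074468085, 0.031250000]      # B2_1..B2_3

def count_table(sample, boot, ndigits=1):
    """Round each Fst to `ndigits` decimals, use the rounded value as the bin,
    and count per group. Returns sorted (bin, sample_count, bootstrap_count)."""
    s = Counter(round(x, ndigits) for x in sample)
    b = Counter(round(x, ndigits) for x in boot)
    return [(k, s[k], b[k]) for k in sorted(set(s) | set(b))]

for fst_bin, n_sample, n_boot in count_table(observed, bootstrap):
    print(f"{fst_bin:.1f} {n_sample} {n_boot}")
```

This prints the same six rows as the table above. Calling `count_table(observed, bootstrap, 2)` instead splits the values into eight bins.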
Fisher's exact test in R gives me p-value = 0.1071 for this table.
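As a cross-check of that number, here is a minimal sketch of the exact test for an r x 2 table (the Freeman-Halton extension of Fisher's test; for r = 2 it reduces to the familiar 2x2 Fisher test). It enumerates every table with the same row and column totals and sums the probabilities of those no more likely than the observed one. It is written from the textbook definition, not from R's or Stacks' code.

```python
from math import comb

def fisher_exact_rx2(rows):
    """Two-sided exact test for an r x 2 table given as (sample, bootstrap) counts."""
    row_totals = [a + b for a, b in rows]
    col1 = sum(a for a, _ in rows)          # total in the first (Sample) column
    denom = comb(sum(row_totals), col1)

    def prob(first_col):
        # Multivariate hypergeometric probability of one table with these margins
        num = 1
        for r, a in zip(row_totals, first_col):
            num *= comb(r, a)
        return num / denom

    p_obs = prob([a for a, _ in rows])
    p_value = 0.0

    def walk(i, remaining, chosen):
        # Enumerate every first-column filling whose sum equals col1
        nonlocal p_value
        if i == len(row_totals):
            if remaining == 0:
                p = prob(chosen)
                if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
                    p_value += p
            return
        for a in range(min(row_totals[i], remaining) + 1):
            walk(i + 1, remaining - a, chosen + [a])

    walk(0, col1, [])
    return p_value

table_1dp = [(0, 2), (0, 3), (0, 1), (1, 0), (1, 0), (1, 0)]                  # bins 0.0 .. 1.0
table_2dp = [(0, 1), (0, 1), (0, 2), (0, 1), (0, 1), (1, 0), (1, 0), (1, 0)]  # same data, 2 decimals
print(round(fisher_exact_rx2(table_1dp), 4))  # 0.1071
print(round(fisher_exact_rx2(table_2dp), 4))  # 0.5
```

The first result is exactly 9/84, matching R. Rounding the same Fst values to 2 decimals instead gives p = 0.5. Under this approach the p-value depends strongly on the bin width.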
Is this how Stacks does it?
If yes, at what resolution is the binning done? In the example above I rounded to 1 decimal place.
If not, could anybody enlighten me? Any information would be much appreciated!
Thanks in advance for your time!
cheers,
Christoph