Hi A!
> Thank you so much for your reply, and your explanations and
> suggestions. I think what Gregor suggested was that weighted average
> of FST across chromosomes does not equate to the genome-wide
> calculation of FST as FST is a ratio statistic. In any case, I will
> try to work with the options available.
Ah, yes, I missed that, thanks! So the code snippet I posted is indeed
incorrect, I guess – for those reading this thread, watch out!
You can see the Eidos source code for calcFST() by doing:
functionSource("calcFST")
in the Eidos console. So if you want to get into it, you could
copy/paste calcFST()'s code to get mean(H_s) and mean(H_t) separately,
for each chromosome, add them together to get the whole-genome numerator
and denominator, and then calculate the final FST as 1 - H_s/H_t as
SLiM's code does, but with the whole-genome values. But Gregor did also
write "I don't know how critical this is however!" I'm not math-y
enough to have a sense of that either – whether this is a tiny detail,
or whether it might actually make an important difference. Perhaps
someone more math-y like Peter could comment on that. :-> If it does
potentially make an important difference, then I suppose there is a bug
here – SLiM ought to provide a way to calculate the correct statistic
across multiple chromosomes. (And likewise, then, for other popgen
metrics that SLiM offers, I suppose.)
> What I was trying to do in SLiM was to alter gene flow to yield
> similar genome-wide population pairwise FST to the observed statistics
> in my species, essentially creating populations with the same
> population structure as the observation. Alternatively, I can estimate
> the geneflow first (e.g., through demographic inference softwares like
> fastsimcoal2) before I run SLiM, but as the species has some specific
> biology that does not necessarily follow Wright-Fisher (e.g.,
> monogamy, potentially selection), I was hoping to do it in SLiM to
> take that into account (and also nice to play around in SLiM).
Yes, this makes sense to me as an approach.
> I was doing this by altering gene flow in SLiM depending on how the
> population pairwise FST compares to the observed genome-wide
> population pairwise FST (e.g., increase gene flow if simulated
> population pairwise FST is significantly higher than the observed
> population pairwise FST).
>
> What I think I have observed with respect to this (in SLiM 4.3) is
> that, population pairwise FST calculated from multiple linkage groups
> seem much more stable, and therefore, is easier to find the optimal
> value. Due to that, I am reluctant to use a value from just one
> chromosome.
OK, fair enough. Are you modeling full-length chromosomes for your
species of interest, or just subsets of the full chromosomes? And are
your population sizes very small? Probably FST will become more "jumpy"
the shorter the chromosomes get, the smaller the population sizes, and
the smaller the migration rates, I would imagine...? Probably also for
a low mutation rate (less mutational data for it to be based upon), and
perhaps other factors. So yes, I can see needing to average across
multiple chromosomes in some cases.
It sounds like you have an empirical genome-wide FST that you're
comparing against? In that case, I'd recommend calculating FST in SLiM
in the same way that that empirical estimate was calculated, so that
you're truly matching the empirical value. I wonder whether they
followed the procedure recommended by Gregor, because FST is a ratio, or
whether they just did a weighted average or even an unweighted average?
Do you know? I'd be curious to know what people do in "the real world",
i.e., with empirical data.
Cheers,
-B.