Understanding Fst

65 views
Skip to first unread message

Richard Kerr

unread,
Oct 6, 2020, 8:29:53 PM10/6/20
to msprime-users
Hi,

I am using msprime to better understand provenance structure in a eucalyptus species that has been sampled widely in its native range.

I am deliberating simulating provenances by defining an array of msprime.PopulationConfigurations. Amongst these populations there will be merges (looking back in time) and migration etc.

ts = msprime(simulate
....,  population_configurations=pop_configs, ...)

Using the tskit help pages I was able to work out how to obtain Fst between populations. Suppose I have three populations (provenances)

A = ts.samples(0)
B = ts.samples(1)
C = ts.samples(2)

ts.Fst( sample_sets=[A, B, C], indexes=[(0,1), (0,2), (1,2)] )

I get back pairwise FST values that look similar to what you get if were to get Fst values from STRUCTURE-defined genetic clusters (if you were to dump your samples (diploidised) into a vcf file and analyse it with STRUCTURE or ADMIXTURE).

However Jacquard talks about a statistic which he calls the "inbreeding coefficient of a population", and Cockerham speaks of "the coancestry of the line with itself". So how would you use ts.Fst to get a Fst value within a population, which would be a measure of homozygosity within a population? 

Georgia Tsambos

unread,
Oct 6, 2020, 10:41:16 PM10/6/20
to Richard Kerr, msprime-users
Hi Richard,

Thanks for your message! I think you should be able to specify that by comparing each sample set with itself, ie. by adding (0,0), (1,1) and (2,2) into your list of indexes for populations A, B and C respectively.

Cheers,
Georgia 

--
You received this message because you are subscribed to the Google Groups "msprime-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to msprime-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/msprime-users/ef89e806-82f6-4a9c-a1a2-79aa9beb9231n%40googlegroups.com.

Georgia Tsambos

unread,
Oct 6, 2020, 10:42:25 PM10/6/20
to Richard Kerr, msprime-users
Ps. Nice to see msprime getting some use on native Australian species!

Peter Ralph

unread,
Oct 7, 2020, 1:07:17 AM10/7/20
to Georgia Tsambos, Richard Kerr, msprime-users
Yes, thanks, and good question! A correction, though: you *can*
compare a sample set to itself, but doing so you get 0.0, so that's no
good.

I think you're asking about F_IS? Unfortunately, this requires
computing individual heterozygosity, which is something we haven't
worked out an efficient way to do, yet. It's a very good question! One
could do it by (a) computing the mean heterozygosity of each
individual using ts.diversity( ), and then (b) using this, along with
ts.Fst, to get F_IS. This should work fine, just be a little slow.
Perhaps we should open an issue on msprime's github to figure out how
to do this?

-- peter
> To view this discussion on the web, visit https://groups.google.com/d/msgid/msprime-users/CAEZ2ae2tBWaps36vgWb6J14M_mbnjnPAqegfQQL_veF0z3xovw%40mail.gmail.com.

Richard Kerr

unread,
Oct 7, 2020, 7:09:10 PM10/7/20
to msprime-users
Thanks Peter and Georgia,

I did try with indexes (0,0), (1,1), (2,2). Upon getting 0.0 as the answer, I realised that wasn't going to work.

The idea about F_IS sounds good, though I am a little sketchy on the details. Are you able to elaborate on your step (b) using the estimates of individual diversity along with ts.Fst to get F_IS?

I support opening an issue. I can do this if you like, but I very much have my training wheels on (unsure of the processes of opening issues and assigning it to milestones, people etc? ). I certainly would like to get more involved because I think this is great software and seems well designed.

Richard

Peter Ralph

unread,
Oct 7, 2020, 7:33:35 PM10/7/20
to Richard Kerr, msprime-users
I've written out thoughts over here:
https://github.com/tskit-dev/tskit/issues/166#issuecomment-705246942
... please feel free to weigh in there.

* peter
> To view this discussion on the web, visit https://groups.google.com/d/msgid/msprime-users/3399aad3-0aea-4a0f-ad64-fd72e0a7051an%40googlegroups.com.

Richard Kerr

unread,
Oct 12, 2020, 1:34:20 AM10/12/20
to msprime-users

Hi Peter,

I am still very much a msprime novice and your suggestion to compute mean individual heterozygosity has confused me somewhat. I gather from reading issue #166 that you originally proposed a method for allowing people to deal with (diploid) individuals.

Thus in your post to https://github.com/tskit-dev/tskit/issues/166 you have a code snippet that says

def mean_heterozygosity(ts):

  ind_nodes = []

  for ind in ts.individuals():

    ind_nodes.append(ind.nodes)

 

If I get a ts object returned from msprime.simulate, then ts.individuals is basically null. So, I guess I must be missing a step, in which I will create some diploid individuals from my haploid samples. Is this correct? I probably have missed the discussion proper about getting this feature implemented. Any help will be kindly received.

 

Cheers,

Richard

Reply all
Reply to author
Forward
0 new messages