Two-population demographic models: snm and bottlegrowth


Li

Apr 20, 2016, 10:10:19 AM
to dadi-user

Dear Ryan and dadi users,


I'm trying to understand what these models mean.
Does the two-population demographic model snm mean that we have two standard neutral populations that never diverged but are completely separate? Or are these "two" populations actually one population?


Same question for the two-population demographic model bottlegrowth: do we have two populations that each experienced a bottleneck and then exponential growth separately, or are we again giving the model samples from two "populations" that are actually the same population?
And why does this model not return fs, but instead "bottlegrowth_split_mig((nuB,nuF,0,T,0), ns, pts)"?


I'm asking because in my case I'm not sure whether I'm seeing two groups that are actually from the same population, meaning complete panmixia in the total population, or whether one population is a subgroup of the other. In the latter case we could be looking at a founder effect.
To test the second hypothesis, I can use the two-population IM model and add a bottleneck to one population, followed by exponential growth in the population that had the bottleneck.
To test whether the first hypothesis is right, I'm not sure whether to take the two "pops" I have, create an input file for the data dictionary that counts the two pops as one, and then put them into the one-population models, or to leave them as two pops and test the two-population models snm or bottlegrowth.

I hope I was clear enough and that you can give me some advice.


Thanks

Li 

Gutenkunst, Ryan N - (rgutenk)

Apr 21, 2016, 5:43:04 PM
to dadi...@googlegroups.com
Hello Li,

On Apr 20, 2016, at 7:10 AM, Li <b12l...@gmail.com> wrote:

Dear Ryan and dadi users,

I'm trying to understand what these models mean.
Does the two-population demographic model snm mean that we have two standard neutral populations that never diverged but are completely separate? Or are these "two" populations actually one population?

It's a model of one snm population being treated as two.
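To make "one snm population treated as two" concrete: in a single panmictic standard neutral population, the pooled sample of n1 + n2 chromosomes has the familiar expected spectrum θ/k, and splitting the sample into two groups just distributes each SNP's k derived copies between the groups hypergeometrically. The expression below is a sketch of that expectation under those assumptions (θ is the population-scaled mutation rate, n1 and n2 are the two sample sizes in chromosomes), not a transcription of dadi's code:

```latex
E[F_{ij}] = \frac{\theta}{i+j}\,
            \frac{\binom{n_1}{i}\binom{n_2}{j}}{\binom{n_1+n_2}{i+j}},
\qquad 1 \le i + j \le n_1 + n_2 - 1
```

Summing over j gives θ/i for 1 ≤ i ≤ n1 - 1, the ordinary one-population snm spectrum for the first sample on its own.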

Same question for the two-population demographic model bottlegrowth: do we have two populations that each experienced a bottleneck and then exponential growth separately, or are we again giving the model samples from two "populations" that are actually the same population?
And why does this model not return fs, but instead "bottlegrowth_split_mig((nuB,nuF,0,T,0), ns, pts)"?


Yes, it's the same as the 1D bottlegrowth model. It's implemented that way because it's simply a special case of the bottlegrowth_split model.

This is really a holdover from early versions of dadi when it wasn't easy to hold variables constant during optimization. Now I'd probably just use the fixed_params argument in the optimizers.
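The "special case" idea amounts to holding some entries of a larger parameter tuple constant. The sketch below shows that bookkeeping in plain Python, using the convention dadi documents for fixed_params (None marks a free parameter, any other value is held fixed). It assumes the parameter order (nuB, nuF, m, T, Ts) of bottlegrowth_split_mig, where m = 0 removes migration and Ts = 0 places the split at the present, so a single bottlegrowth population is sampled as two. expand_params is a hypothetical helper for illustration, not dadi's internal code.

```python
def expand_params(free, fixed):
    """Fill the None slots of `fixed` with the values in `free`, in order.

    Follows the fixed_params convention: None = free parameter,
    any other entry = value held constant during optimization.
    """
    n_free = sum(1 for f in fixed if f is None)
    if len(free) != n_free:
        raise ValueError(f"expected {n_free} free values, got {len(free)}")
    it = iter(free)
    return tuple(next(it) if f is None else f for f in fixed)


# Assumed order for bottlegrowth_split_mig: (nuB, nuF, m, T, Ts).
# Fixing m = 0 and Ts = 0 leaves only the bottlegrowth parameters (nuB, nuF, T) free.
BOTTLEGROWTH_AS_SPLIT = (None, None, 0, None, 0)

print(expand_params((0.1, 2.0, 0.05), BOTTLEGROWTH_AS_SPLIT))
# prints (0.1, 2.0, 0, 0.05, 0)
```

Passing a list like [None, None, 0, None, 0] as fixed_params to one of dadi's optimizers would hold m and Ts at zero in the same way, without needing a separate wrapper model.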

I'm asking because in my case I'm not sure whether I'm seeing two groups that are actually from the same population, meaning complete panmixia in the total population, or whether one population is a subgroup of the other. In the latter case we could be looking at a founder effect.

To test the second hypothesis, I can use the two-population IM model and add a bottleneck to one population, followed by exponential growth in the population that had the bottleneck.
To test whether the first hypothesis is right, I'm not sure whether to take the two "pops" I have, create an input file for the data dictionary that counts the two pops as one, and then put them into the one-population models, or to leave them as two pops and test the two-population models snm or bottlegrowth.

For the first hypothesis, it's probably better to do a permutation test directly. Essentially, if you permute the population labels among your sampled individuals, do you get frequency spectra that are significantly different from the observed spectrum? There's a built-in method in dadi (fs.scramble_pop_ids) that can help with this. It generates the average spectrum expected over all permutations of the individuals in your data, so you would be asking whether your observed spectrum is statistically different from that spectrum. This is a better approach than modeling the history explicitly, because it's robust to errors you might make in that modeling.
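What "the average spectrum expected over all permutations" means can be illustrated without dadi. The sketch below assumes that chromosomes (not diploid individuals) are relabelled at random and that the spectrum is polarized; under those assumptions the average has a closed form: pool each SNP's derived count k = i + j, then split it hypergeometrically between samples of n1 and n2 chromosomes. scramble_expectation is a hypothetical name, and dadi's fs.scramble_pop_ids may handle details such as individual-level permutation, masking and folding differently.

```python
from math import comb


def scramble_expectation(fs):
    """Average a polarized 2D spectrum over random relabelings of pooled chromosomes.

    fs is a list of lists of shape (n1 + 1) x (n2 + 1); fs[i][j] counts SNPs
    with i derived copies in sample 1 and j derived copies in sample 2.
    """
    n1 = len(fs) - 1
    n2 = len(fs[0]) - 1
    n = n1 + n2
    # Pool: after relabeling, a SNP only "remembers" its total derived count k = i + j.
    pooled = [0.0] * (n + 1)
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            pooled[i + j] += fs[i][j]
    # Redistribute: with random labels, i of the k derived copies land in sample 1
    # with hypergeometric probability C(n1, i) * C(n2, k - i) / C(n, k).
    out = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            k = i + j
            out[i][j] = pooled[k] * comb(n1, i) * comb(n2, j) / comb(n, k)
    return out


# One SNP fixed for the derived allele in sample 2 only (n1 = n2 = 2):
for row in scramble_expectation([[0, 0, 1], [0, 0, 0], [0, 0, 0]]):
    print(row)
```

Comparing the observed spectrum with this average is the comparison Ryan describes; a spectrum from truly undiverged samples should differ from it only by sampling noise.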

Best,
Ryan




--
Ryan Gutenkunst
Assistant Professor of Molecular and Cellular Biology, University of Arizona
phone: (520) 626-0569, office: LSS 325, web: http://gutengroup.mcb.arizona.edu

Latest papers: "Triallelic population genomics for inferring correlated fitness effects of same site nonsynonymous mutations"
"Whole genome sequence analyses of Western Central African Pygmy hunter-gatherers reveal a complex demographic history and identify candidate genes under positive natural selection"

Li

Apr 27, 2016, 2:05:24 PM
to dadi-user
Hello Ryan,
Thank you for your reply.

I don't understand how to compare the "scramble" output with the real data.
I tried to do it with the command dadi.Plotting.plot_2d_comp_multinom,
but I'm not sure how to interpret the result. I also get a warning, so I suspect I'm going about it the wrong way.

The warning:
warnings.warn("Warning: converting a masked element to nan.")

The script:

import dadi
import numpy,scipy,pylab
dd = dadi.Misc.make_data_dict('dadi/lirandata/datasnpr75.txt')
data = dadi.Spectrum.from_data_dict(dd, ['GR','KN','SA'], [18,14,14], polarized=False)
fsGRKN = data.marginalize([2])
scramble=dadi.Spectrum.scramble_pop_ids(fsGRs, mask_corners=True)
dadi.Plotting.plot_2d_comp_multinom(scramble, fsGR, vmin=0.1, resid_range=3,pop_ids = ('GR','KN'))

Thanks
Li

Gutenkunst, Ryan N - (rgutenk)

Apr 27, 2016, 5:23:51 PM
to dadi...@googlegroups.com
Hello Li,

I’m assuming that there are some typos in the code above, because the fs variable name is inconsistent. In any case, what output are you getting from your plotting command? The code above is a reasonable way of going about it.

Best,
Ryan

Li

Apr 28, 2016, 7:03:20 AM
to dadi-user
Hi Ryan,

I fixed the earlier mistakes with the fs variable name, but I still get the same warning.
I've attached the output. It seems like a good fit, but I'm not sure whether it's good enough.

Background: I previously did this kind of permutation test by building a distribution of the number of differences between all pairs of individuals. I found that pop1 isn't different from the total population (pop1+pop2), but pop2 is significantly different from the total population. From that point I came to dadi to try to better unravel the relationship between the two populations.

Many thanks
Li
scramble.png

Gutenkunst, Ryan N - (rgutenk)

May 2, 2016, 7:42:59 PM
to dadi...@googlegroups.com
Hello Li,

There are some systematic residuals indicating some differentiation between your populations beyond what would be expected if they were well mixed, but it’s hard to judge by eye how significant those are. My approach would be to use permutations to create a null distribution, then check whether the fit of your real data to the permuted model is unlikely under that null distribution.

In more detail:
1) I would measure the deviation between your data and the permutation model using a chi^2 statistic, like we did in the PLoS paper. In code, that would be chi2 = numpy.sum((data - model)**2/((data + model)/2))
2) To create your null distribution of chi2, I would permute individuals between GR and KN many times (in the real data), calculating the frequency spectrum each time. Then I would calculate chi2 for each of those permuted data sets. This measures what the distribution of chi2 should be under your null, accounting for all the linkage, etc. that might be in your data.
3) Then I would compare your real chi2 (from 1) with the null distribution (from 2). If the real chi2 is in the high tail of the distribution, then you can reject the null model that the two populations haven’t diverged.
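The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not dadi code: genotypes are polarized diploid calls (0, 1 or 2 derived copies per SNP), whereas Li's spectrum was folded (polarized=False), so real use would need folding; the expected spectrum is approximated by the mean of the permuted spectra rather than taken from scramble_pop_ids; and the helper names joint_sfs, chi2 and permutation_test are hypothetical. The statistic is the symmetric chi^2 form sum((data - model)^2 / ((data + model)/2)), with the two monomorphic corners skipped.

```python
import random


def joint_sfs(genotypes, in_pop1):
    """Polarized 2D spectrum; genotypes[ind][snp] is 0, 1 or 2 derived copies."""
    m1 = sum(in_pop1)                      # individuals labelled as population 1
    m2 = len(in_pop1) - m1                 # individuals labelled as population 2
    fs = [[0.0] * (2 * m2 + 1) for _ in range(2 * m1 + 1)]
    for s in range(len(genotypes[0])):
        i = sum(g[s] for g, p1 in zip(genotypes, in_pop1) if p1)
        j = sum(g[s] for g, p1 in zip(genotypes, in_pop1) if not p1)
        fs[i][j] += 1
    return fs


def chi2(data, model):
    """Sum of (d - m)^2 / ((d + m) / 2) over cells, skipping the two corners."""
    n1, n2 = len(data) - 1, len(data[0]) - 1
    total = 0.0
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if (i, j) in ((0, 0), (n1, n2)):
                continue                   # monomorphic cells, as with mask_corners=True
            d, m = data[i][j], model[i][j]
            if d + m > 0:                  # cells empty in both contribute nothing
                total += (d - m) ** 2 / ((d + m) / 2)
    return total


def permutation_test(genotypes, in_pop1, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = joint_sfs(genotypes, in_pop1)
    labels = list(in_pop1)
    permuted = []
    for _ in range(n_perm):
        rng.shuffle(labels)                # reassign whole individuals; sample sizes unchanged
        permuted.append(joint_sfs(genotypes, labels))
    rows, cols = len(observed), len(observed[0])
    # Expected spectrum under "no divergence": mean over the permuted spectra.
    model = [[sum(fs[i][j] for fs in permuted) / n_perm for j in range(cols)]
             for i in range(rows)]
    obs_stat = chi2(observed, model)                  # step 1
    null = [chi2(fs, model) for fs in permuted]       # step 2
    hits = sum(stat >= obs_stat for stat in null)     # step 3: upper tail
    p_value = (1 + hits) / (1 + n_perm)
    return obs_stat, null, p_value
```

Because whole individuals are shuffled, each genome stays intact, which is why this null carries along the linkage in the data as Ryan notes. Every permuted data set is compared with the same expected spectrum, since permuting labels never changes the pooled data. The (1 + hits)/(1 + n_perm) convention keeps the p-value from ever being exactly zero.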

Best,
Ryan
