Hi Ryan,
Thanks for your help and time, I greatly appreciate it.
I hadn't actually ran single population models, which was an oversight. Also, I hadn't noticed the similar ancestral pop sizes! Yeah, nref is on a similar order of magnitude.
After running single-population models and a coalescent model (smc++) it looks like the "mountain" population has more likely had a decrease in population size over time, and a bottleneck and recovery is less appropriate. The coast and south populations (which better match the old paper) both have best models with an ancestral shrink in population size ~30kga, followed by a population size increase over time. I may edit the models for that one in place, but to focus on the simpler case, in case I am making errors in the usage of dadi in general
In general, it looks like numerically, the two-pop model has similar parameters, but lower theta by an order of magnitude, and larger pop sizes at bottleneck.
The present-day effective population sizes look more realistic for single-population models, and better match previous dadi results. For instance, current south population size is estimated at 23.8k in the 2018 paper, 26.3k with the single population model, as opposed to 2.2k in the two-population model.
I'm not sure if there's something in the method that's way off, or if I just need to have better parameterization of migration rate? perhaps I should code in the population size changes explicitly (with independent bottlenecks and parameters set), and just define free parameters for the timeline and rate of migration?
Any advice is appreciated, thank you again,
Isaac
Details for "Coast" and "Mountain" populations:
I ran single-population models with a bottleneck or period of bottleneck followed by exponential growth, linear growth, or instantaneous growth. Both had a best model of instantaneous bottleneck followed by linear population growth :
def instant_linear_change(params, ns, pts):
#instant change from nref into nuA, which changes to nuB across TB generations until present
nuA, nuB, TB = params
xx = Numerics.default_grid(pts)
phi = PhiManip.phi_1D(xx)
def nu_func(t):
return nuA + (nuB - nuA) * (t / TB)
phi = Integration.one_pop(phi, xx, TB, nu_func)
fs = Spectrum.from_phi(phi, ns, (xx,))
return fs
I also ran two-population models with a split followed by migration ( or not) with constant population size, two epochs, exponential size change and linear size change. For the Coast-Mountain comparison, the best model was a split followed by a linear change in population size over time, with continuous migration:
def split_linear_mig(params, ns, pts):
#split from nref into nu1A and nuu2A, which change to nu1B and nu2B across TB generations until present
nu1A, nu1B, nu2A, nu2B, TB, m12, m21= params
xx = Numerics.default_grid(pts)
phi = PhiManip.phi_1D(xx)
phi = PhiManip.phi_1D_to_2D(xx, phi)
def nu1_func(t):
return nu1A + (nu1B - nu1A) * (t / TB)
def nu2_func(t):
return nu2A + (nu2B - nu2A) * (t / TB)
phi = Integration.two_pops(phi, xx, tB,nu1_func, nu2_func,m12=m12, m21=m21)
fs = Spectrum.from_phi(phi, ns, (xx, xx))
return fs
And here are those model outputs:
Single population:
pop loglik theta nuB
Coast -222.7535 1798824 0.56951649
South -3503.3312 1461253 0.28979192
TB nuA nref
0.13469882 0.005873020 111808.69
0.13221309 0.01044610 90826.44
nB nA Tgen
63676.894 6.566547e+02 30120.997
26320.768 9.487821e+02 24016.887
Two population (Coast=1, South=2)
loglik theta nu1A nu1B
-16220.55 136357.2 0.04929965 0.5186923
nu2A nu2B TB m12 m21
0.020855868 0.2542678 0.13902554 0.28489234 0.76129817
nref n1B n2B n1A n2A
8475.4964 4396.175 2155.046 417.839 176.76383
Tgen
2356.6209
Looking at these, the models actually have some similar parameters before scaling.
nuB-Coast: 0.570
nu1B: 0.519
nuB-South: 0.290
nu2B: 0.254
TB-Coast: 0.135
TB-South: 0.132
TB-Two-pop: 0.139
but the bottleneck sizes and theta are different:
nuA-Coast: 0.0059
nu1A: 0.0492
nuA-South: 0.010
nu2A: 0.021
Coast Theta & nref: 1798824 111808.69
South Theta & nref: 1461253 90826.44
Two pop theta & nref: 136357.2 8475.4964