Confirming results 1D model growth vs two-epochs vs bottlegrowth

61 views
Skip to first unread message

Mathilde SALAMON

unread,
Dec 18, 2024, 10:19:14 PM12/18/24
to dadi-user
Hello Ryan,

I ran 1D demographic models for a population which we suspect has undergone population contraction at a known time T in the past; T was thus fixed in the model (and theta inferred). I optimized four possible models: neutral, growth, two-epochs and bottlegrowth.

After the optimization, out of 4 models ranked by log likelihood (growth -43614 > bottleneck + growth -79543 > two-epochs -98970 > neutral -45699816), the best model by far is the model with growth, with a lot of difference in log likelihood between models (i.e. well differentiated). However, the growth parameter inferred nu (ratio of contemporary to ancient effective population size), is very small (0.00003). Am I understanding well that this translates into a contraction of the effective population size at the known time T ? It's a bit surprising that the best model isn't the two epochs then.

Here are the final plots for each model:

Growth
LUM_REC_recovery_growth_opt18.png
BottlegrowthLUM_REC_recovery_bottlegrowth_opt21.png
Two-epochs
LUM_RECrecovery_twoepochs_opt6.png

Neutral
LUM_REC.png
And here is the growth function that I used:

def growth(params, ns, pts):
    """
    Exponential growth beginning some time ago.
    Tg is a fixed parameter and theta is inferred with the other parameters using multinom = False during optimization

    params = (nu,theta)
    ns = (n1,)

    nu: Ratio of contemporary to ancient population size
    Tg: Time in the past at which growth began (in units of 2*Na
       generations)
    n1: Number of samples in resulting Spectrum
    pts: Number of grid points to use in integration.
    """
    nu,theta = params

    xx = dadi.Numerics.default_grid(pts)
    phi = dadi.PhiManip.phi_1D(xx, theta0=theta)

    Tg = (168*2.64e-09*319822866)/theta
    nu_func = lambda t: np.exp(np.log(nu) * t/Tg)
    phi = dadi.Integration.one_pop(phi, xx, T = Tg,  nu = nu_func, theta0 = theta)

    fs = dadi.Spectrum.from_phi(phi, ns, (xx,))
    return fs

These results were obtained with ns = 400 (large size because of pool-seq) and a minimum grid size of 510.

Thank you for your help,
Best wishes,
Mathilde Salamon

Ryan Gutenkunst

unread,
Dec 19, 2024, 12:56:38 PM12/19/24
to dadi-user
Hello Mathilde,

Yes, that very small nu parameter in the growth model would imply an exponential decline to a very small final population size.

Note that all of these fits are terrible. What is the theta estimate for the snm model? (Can’t use a time to fix theta for that case…) That should help you see what’s plausible with your other modeling.

I suspect something is erroneous in how you’re fixing T and the interaction with theta. Since T_gen = 2*Na*T_model and theta = 4*Na*mu*L, we have T_model = T_gen/(2 *Na) = 2*mu*L*T_gen/theta. So from your expression I guess the population size is suspected to have changed 84 generations ago. Are you getting reasonable estimates of Na from this procedure?

Best,
Ryan

On Dec 18, 2024, at 8:19 PM, Mathilde SALAMON <mathilde...@gmail.com> wrote:

Hello Ryan,

I ran 1D demographic models for a population which we suspect has undergone population contraction at a known time T in the past; T was thus fixed in the model (and theta inferred). I optimized four possible models: neutral, growth, two-epochs and bottlegrowth.

After the optimization, out of 4 models ranked by log likelihood (growth -43614 > bottleneck + growth -79543 > two-epochs -98970 > neutral -45699816), the best model by far is the model with growth, with a lot of difference in log likelihood between models (i.e. well differentiated). However, the growth parameter inferred nu (ratio of contemporary to ancient effective population size), is very small (0.00003). Am I understanding well that this translates into a contraction of the effective population size at the known time T ? It's a bit surprising that the best model isn't the two epochs then.

Here are the final plots for each model:

Growth
<LUM_REC_recovery_growth_opt18.png>
Bottlegrowth<LUM_REC_recovery_bottlegrowth_opt21.png>
Two-epochs
<LUM_RECrecovery_twoepochs_opt6.png>

Neutral
<LUM_REC.png>
And here is the growth function that I used:

def growth(params, ns, pts):
    """
    Exponential growth beginning some time ago.
    Tg is a fixed parameter and theta is inferred with the other parameters using multinom = False during optimization

    params = (nu,theta)
    ns = (n1,)

    nu: Ratio of contemporary to ancient population size
    Tg: Time in the past at which growth began (in units of 2*Na
       generations)
    n1: Number of samples in resulting Spectrum
    pts: Number of grid points to use in integration.
    """
    nu,theta = params

    xx = dadi.Numerics.default_grid(pts)
    phi = dadi.PhiManip.phi_1D(xx, theta0=theta)

    Tg = (168*2.64e-09*319822866)/theta
    nu_func = lambda t: np.exp(np.log(nu) * t/Tg)
    phi = dadi.Integration.one_pop(phi, xx, T = Tg,  nu = nu_func, theta0 = theta)

    fs = dadi.Spectrum.from_phi(phi, ns, (xx,))
    return fs

These results were obtained with ns = 400 (large size because of pool-seq) and a minimum grid size of 510.

Thank you for your help,
Best wishes,
Mathilde Salamon


--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dadi-user/fa6c7c4a-4786-4e09-94b5-3108568e1ef4n%40googlegroups.com.
<LUM_REC_recovery_bottlegrowth_opt21.png><LUM_REC_recovery_growth_opt18.png><LUM_RECrecovery_twoepochs_opt6.png><LUM_REC.png>

Mathilde SALAMON

unread,
Dec 19, 2024, 9:32:13 PM12/19/24
to dadi-user
Hello Ryan,

thank you for your fast answer and help, I really appreciate it.

The optimal value that I get for theta in the neutral model is 551,856.

I believe your formulation and mine match: The change in population size is suspected to have happened ~42 years from when the sample originated, with 2 generations per year (0.5 year generation time) for the species, so indeed 84 generations ago. I started from T_gen = 2*Na*T_model*generation time, then got to T_model = (42*4*mu*L)/theta. Did I get something wrong ?

Based on theta = 551,856, it gives me Na = 84,562, which doesn't seem completely unrealistic. We are studying a lake copepods (zooplankton) population. This recent study from marine copepods (likely much larger population sizes) found an Na value of ~400K, also using dadi. We used different mutation rates: I used mu = 2.64 × 10 −9 based on a study on snapping shrimps, whereas they used a mutation rate of 1 × 10 −9 based on another study on arthropods, although the rate they used doesn't match the rate reported in the study cited. An older study on lake copepods also found small effective population sizes.

Is it possible that the models have poor fits because there might be better alternative models ? Or could it have something to do with the observed SFS ? This data was generated with pool-seq, which I know hasn't been thoroughly validated for dadi.

Happy holidays,
Mathilde

Ryan Gutenkunst

unread,
Dec 20, 2024, 5:23:43 PM12/20/24
to dadi-user
Hi Mathilde,

Qualitatively, your data have a lot of intermediate-frequency variants (almost flat SFS, or maybe even increasing past frequency 50/200) and a relative deficit of low-frequency variants. To try and match that, the model wants to infer a population size reduction. But with that best-fit theta, your T_model ends up being 0.00025, so the population decline is very, very, very recent in terms of population genetic units. To generate a substantial change in allele frequencies in that very short interval, the population size decline needs to be similarly severe.

I’m nervous about the increase in intermediate frequency variants compared to low. That’s very unusual and it might be the result of some data processing step.

I would also see what you get if you fit a two-epoch model without fixing the time of the decline. I suspect you’ll infer a much older decline that is much less severe. It’s possible that dominates the signal and the recent decline is just two recent to be seen in SFS-based analyses.

Best,
Ryan

Mathilde SALAMON

unread,
Jan 5, 2025, 3:42:54 PMJan 5
to dadi-user
Hi Ryan,

Happy new year ! Thank you for your insights, they are very helpful.

Our study species produce resting eggs in non-overwintering individuals which hatch in spring, whereas wintering individuals reproduce then die in spring, thus populations are a mix of current and previous generations (possibly > 10 generations in the past). Another factor is that the genome was potentially affected by balancing selection, as we believe that the event which caused the possible population reduction in the past also caused the selection of resistant phenotypes, after which selection was inverted when the environment recovered. Is it possible that these processes could lead to the excess of intermediate allele frequency variants ? Resting stages seem to reduced intermediate MAF in this paper, although our study species does not reproduce asexually in a cyclical fashion as Daphnia.

I will run the two epoch model without a fixed time of decline as you suggested.

Best wishes,
Mathilde Salamon

Ryan Gutenkunst

unread,
Jan 15, 2025, 10:47:10 AMJan 15
to dadi-user
Hello Mathilde,

Sorry for the slow reply. I don’t know about the complex generation structure; that would require some careful modeling to think through the implications. Certainly widespread balancing selection (or strong linkage to balanced loci) could generate a strong excess of intermediate frequency alleles. If you know enough about your genome to identify regions that are less likely to be under strong selection (far from genes for example), those might give a better picture of the actual demographic history.

Best,
Ryan

Reply all
Reply to author
Forward
0 new messages