Nested models 1D LRT test

131 views

Skip to first unread message

Mathilde SALAMON

unread,

Jan 3, 2023, 4:52:41 PM1/3/23

to dadi-user

Hello Ryan,

I am trying to do a LRT test on two 1D models that I thought were nested: the more complex model is a bottlegrowth with a known time for the bottleneck (fixed parameter and inferred theta) and the nested model is a two epochs. Both models give very similar results for the log likelihood (more complex = -3979, nested = -3972) and spectrums/residuals.

I thought I would be able to do the LRT test with nuF the recovery population size as the nested parameter but I am running into errors related to this parameter being set as 0:

Traceback (most recent call last):
File "lrt_test_GEO.py", line 108, in <module>
adj = dadi.Godambe.LRT_adjust(func_ex, pts_l, all_boot, p_lrt, data,
File "/home/msalomon/.local/lib/python3.8/site-packages/dadi/Godambe.py", line 377, in LRT_adjust
GIM, H, J, cU = get_godambe(diff_func, grid_pts, all_boot, p_nested, data,
File "/home/msalomon/.local/lib/python3.8/site-packages/dadi/Godambe.py", line 221, in get_godambe
hess = -get_hess(func, p0, eps, args=[data])
File "/home/msalomon/.local/lib/python3.8/site-packages/dadi/Godambe.py", line 119, in get_hess
f0 = func(p0, *args)
File "/home/msalomon/.local/lib/python3.8/site-packages/dadi/Godambe.py", line 211, in func
cache[key] = func_ex(params, ns, grid_pts)
File "/home/msalomon/.local/lib/python3.8/site-packages/dadi/Godambe.py", line 374, in diff_func
return func_ex(full_params, ns, grid_pts)
File "/home/msalomon/.local/lib/python3.8/site-packages/dadi/Numerics.py", line 374, in extrap_func
result_l = list(map(partial_func, pts_l))
File "lrt_test_GEO.py", line 41, in bottlegrowth_1d
phi = dadi.Integration.one_pop(phi, xx, T = Tb, nu = nu_func, theta0 = theta)
File "/home/msalomon/.local/lib/python3.8/site-packages/dadi/Integration.py", line 243, in one_pop
raise ValueError('A population size is 0. Has the model been '
ValueError: A population size is 0. Has the model been mis-specified?

Is there a way to write these models so that they are nested ?

I am sending the code for the LRT test below, and the SFS figures in attachment.

Thank you,

Mathilde

####### Setup #######

# Numpy is the numerical library dadi is built upon

import numpy as np

import dadi

import dadi.Godambe

####### Import data and generate fs #######

dd = dadi.Misc.make_data_dict("dadi_input_killarney_GEO_REC")

pop_ids, ns = ['GEO'], [400]

fs = dadi.Spectrum.from_data_dict(dd, pop_ids, ns, polarized = False) # fs folded because the ancestral state of SNPs is unknown

# We can save our extracted spectrum to disk

fs.mask[0:20] = True # mask the first 20 alleles because low confidence in frequency calling

fs.to_file('GEO.fs')

##### Likelihood ratio test #####

# Redefine the two custom demography models

def bottlegrowth_1d(params, ns, pts):

"""

Instantanous size change followed by exponential growth.

Tb is a fixed parameter and theta is inferred with the other parameters using multinom = False during optimization

params = (nuB,nuF,T)

ns = (n1,)

nuB: Ratio of population size after instantanous change to ancient

population size

nuF: Ratio of contemporary to ancient population size

Tb: Time in the past at which instantaneous change happened and growth began

(in units of 2*Na generations) = is a fixed parameter

n1: Number of samples in resulting Spectrum

pts: Number of grid points to use in integration.

"""

nuB,nuF,theta = params

xx = dadi.Numerics.default_grid(pts)

phi = dadi.PhiManip.phi_1D(xx, theta0=theta)

Tb = (48*2*2.64e-09*617993816)/theta

nu_func = lambda t: nuB*np.exp(np.log(nuF/nuB) * t/Tb)

phi = dadi.Integration.one_pop(phi, xx, T = Tb, nu = nu_func, theta0 = theta)

fs = dadi.Spectrum.from_phi(phi, ns, (xx,))

return fs

def two_epoch(params, ns, pts):

"""

Instantaneous size change some time ago.

Tb is a fixed parameter and theta is inferred with the other parameters using multinom = False during optimization

params = (nu,theta)

ns = (n1,)

nu: Ratio of contemporary to ancient population size

T: Time in the past at which size change happened (in units of 2*Na

generations)

n1: Number of samples in resulting Spectrum

pts: Number of grid points to use in integration.

"""

nu,theta = params

xx = dadi.Numerics.default_grid(pts)

phi = dadi.PhiManip.phi_1D(xx, theta0=theta)

Tb = (48*2.64e-09*617993816)/theta

phi = dadi.Integration.one_pop(phi, xx, T = Tb, nu = nu, theta0=theta)

fs = dadi.Spectrum.from_phi(phi, ns, (xx,))

return fs

# These are the grid point settings we will use for extrapolation.

pts_l = [410,420,430] # good results are obtained by setting the smallest grid size slightly larger than the largest population sample size

# Select the more complex demographic function

func = bottlegrowth_1d

# Make the extrapolating version of our demographic model function.

func_ex = dadi.Numerics.make_extrap_log_func(func)

# These are the best-fit parameters (nuB, nuF, Theta), which we found by multiple optimizations

popt = np.array([8.44E-06, 29.941, 1.31E+06])

# Calculate the best-fit model AFS.

model = func_ex(popt, ns, pts_l)

# Likelihood of the data given the model AFS.

ll_model = dadi.Inference.ll(model, fs)

# Load the data

data = fs

ns = data.sample_sizes

all_boot = [dadi.Spectrum.from_file('/home/msalomon/temporary/dadi_killarney_recovery/killarney_1D_GEO_fixedtime/bootstrapping/minGEO.subset.boot_{0}.fs'.format(ii))

for ii in range(100)]

### Let's do a likelihood-ratio test comparing models with and without recovery

# The no recovery model is implemented as two_epoch

func_two_epoch = two_epoch

func_ex_two_epoch = dadi.Numerics.make_extrap_log_func(func_two_epoch)

# These are the best-fit parameters (nu, Theta), which we found by multiple optimizations

popt_two_epoch = np.array([0.00013,1.32E+06])

model_two_epoch = func_ex_two_epoch(popt_two_epoch, ns, pts_l)

ll_two_epoch = dadi.Inference.ll(model_two_epoch, data)

# Since LRT evaluates the complex model using the best-fit parameters from the

# simple model, we need to create list of parameters for the complex model (nuB, nuF, Theta)

# using the simple (no recovery) best-fit params. Since evalution is done with more

# complex model, need to insert zero values at corresponding

# parameters index in complex model. And we need to tell the LRT adjust function

# that the 1st parameter (counting from 0) is the nested one.

p_lrt = [8.44E-06, 0, 1.31E+06]

adj = dadi.Godambe.LRT_adjust(func_ex, pts_l, all_boot, p_lrt, data,

nested_indices=[1], multinom=False)

D_adj = adj*2*(ll_model - ll_two_epoch)

print('Adjusted D statistic: {0:.4f}'.format(D_adj))

# Because this is test of a parameter on the boundary of parameter space

# (nuF cannot be less than zero), our null distribution is an even proportion

# of chi^2 distributions with 0 and 1 d.o.f. To evaluate the p-value, we use the

# point percent function for a weighted sum of chi^2 dists.

pval = dadi.Godambe.sum_chi2_ppf(D_adj, weights=(0.5,0.5))

print('p-value for rejecting no-migration model: {0:.4f}'.format(pval))

recovery_twoepochs_opt13_GEO_REC.png

recovery_bottlegrowth_opt20_GEO_REC.png

Ryan Gutenkunst

unread,

Jan 9, 2023, 4:39:35 PM1/9/23

to dadi-user

Hello Mathilde,

There are some subtleties here…

First, we want to run the complex model with parameter values that make it the same model as the best-fit simple model. In your case, that would be
p_lrt = [0.00013,0.00013,1.32E+06]).

But there’s another subtlety in defining the nested indices. As written, the first two parameters in your bottlegrowth_1d function need to be manipulated simultaneously to create the simple two_epoch model. To fix that, I suggest rewriting your bottlegrowth_1d function, so the second parameter is the degree of recovery…

def bottlegrowth_1d_RNG(params, ns, pts):

"""
Instantanous size change followed by exponential growth.
Tb is a fixed parameter and theta is inferred with the other parameters using multinom = False during optimization

params = (nuB,recovery_ratio,T)

ns = (n1,)

nuB: Ratio of population size after instantanous change to ancient
population size

recovery_ratio: Ratio of contemporary to bottleneck population size

Tb: Time in the past at which instantaneous change happened and growth began
(in units of 2*Na generations) = is a fixed parameter
n1: Number of samples in resulting Spectrum
pts: Number of grid points to use in integration.
"""

nuB,recovery_ratio,theta = params

xx = dadi.Numerics.default_grid(pts)
phi = dadi.PhiManip.phi_1D(xx, theta0=theta)

Tb = (48*2*2.64e-09*617993816)/theta

nu_func = lambda t: nuB*np.exp(np.log(recovery_ratio) * t/Tb)

phi = dadi.Integration.one_pop(phi, xx, T = Tb, nu = nu_func, theta0 = theta)

fs = dadi.Spectrum.from_phi(phi, ns, (xx,))
return fs

So your best-fit parameters for this version of the model should be np.array([8.44E-06, 29.941/8.44E-06, 1.31E+06]).

And to reproduce your simple model, you would have p_lrt = [0.00013,1,1.32E+06]). And nested_indices = [1].

And in this case you’re testing a parameter on the interior of the domain (whether recovery_ratio=1), so you’d use a chi-squared with 1 d.o.f.
dadi.Godambe.sum_chi2_ppf(D_adj, weights=(0, 1))

The key here is thinking about the relationship between the two models in the simplest way possible, in terms of parameter values.

Best,
Ryan

> --
> You received this message because you are subscribed to the Google Groups "dadi-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/49fd9af1-a257-4aea-941e-3a72f31d0523n%40googlegroups.com.
> <recovery_twoepochs_opt13_GEO_REC.png><recovery_bottlegrowth_opt20_GEO_REC.png>

Reply all

Reply to author

Forward

0 new messages