WARNING:Numerics:Extrapolation may have failed. Check resulting frequency spectrum for unexpected results & WARNING:Inference:Model is masked in some entries where data is not.


jlblanc...@gmail.com

May 23, 2017, 3:05:55 AM
to dadi-user
Hi all, 

I am getting warning messages for ALL steps during optimization.

The first warning:
WARNING:Numerics:Extrapolation may have failed. Check resulting frequency spectrum for unexpected results.

This warning happens for this model:

import dadi

def Model_A(params, ns, pts):
    # Parameters: (Na2, N6, N1_5, N7, M15_6, M7_15, T0, T1, T2)
    Na2, N6, N1_5, N7, M15_6, M7_15, T0, T1, T2 = params

    # Define the grid we'll use
    xx = dadi.Numerics.default_grid(pts)

    # phi for the equilibrium ancestral population
    phi = dadi.PhiManip.phi_1D(xx)

    # Na1 population split
    phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)

    # Time delay before the second population split
    # (assign the result: two_pops returns the integrated phi)
    phi = dadi.Integration.two_pops(phi, xx, T0, nu1=N6, nu2=Na2, m12=0, m21=0)

    # Na2 population split
    phi = dadi.PhiManip.phi_2D_to_3D_split_2(xx, phi)

    # Time delay before admixture events
    phi = dadi.Integration.three_pops(phi, xx, T1,
                                      nu1=N6, nu2=N1_5, nu3=N7,
                                      m12=0, m13=0, m21=0,
                                      m23=0, m31=0, m32=0)

    # Admixture events
    phi = dadi.Integration.three_pops(phi, xx, T2,
                                      nu1=N6, nu2=N1_5, nu3=N7,
                                      m12=0, m13=0, m21=M15_6,
                                      m23=0, m31=0, m32=M7_15)

    fs = dadi.Spectrum.from_phi(phi, ns, (xx, xx, xx))
    return fs

The second warning (example):

WARNING:Numerics:Extrapolation may have failed. Check resulting frequency spectrum for unexpected results.
8.0 134929.0
WARNING:Inference:Model is masked in some entries where data is not.
WARNING:Inference:Number of affected entries is 47248. Sum of data in those entries is 8:
154     , -268318     , array([ 2.56353    ,  0.29348    ,  1.39267    ,  2.2643     ,  0.564593   ,  0.413592   ,  0.0205958  ,  0.0157483  ,  0.00499629 ])

Here is my second model:

def Model_B(params, ns, pts):
    # Parameters: (Na2, N6, N1_5T1, N1_5T2, N1_5, N7, M15_6, M7_15, T0, T1, T2)
    Na2, N6, N1_5T1, N1_5T2, N1_5, N7, M15_6, M7_15, T0, T1, T2 = params

    # Define the grid we'll use
    xx = dadi.Numerics.default_grid(pts)

    # phi for the equilibrium ancestral population
    phi = dadi.PhiManip.phi_1D(xx)

    # Na1 population split
    phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)

    # Time delay before the second population split
    # (assign the result: two_pops returns the integrated phi)
    phi = dadi.Integration.two_pops(phi, xx, T0, nu1=N6, nu2=Na2, m12=0, m21=0)

    # Na2 population split
    phi = dadi.PhiManip.phi_2D_to_3D_split_2(xx, phi)

    # Time delay before admixture events:
    # size of population 2 changes linearly from N1_5T1 to N1_5T2 over T1
    N1_5_func1 = lambda t: N1_5T1 + (N1_5T2 - N1_5T1)*t/T1
    phi = dadi.Integration.three_pops(phi, xx, T1,
                                      nu1=N6, nu2=N1_5_func1, nu3=N7,
                                      m12=0, m13=0, m21=0,
                                      m23=0, m31=0, m32=0)

    # Admixture events:
    # size of population 2 changes linearly from N1_5T2 to N1_5 over T2
    N1_5_func2 = lambda t: N1_5T2 + (N1_5 - N1_5T2)*t/T2
    phi = dadi.Integration.three_pops(phi, xx, T2,
                                      nu1=N6, nu2=N1_5_func2, nu3=N7,
                                      m12=0, m13=0, m21=M15_6,
                                      m23=0, m31=0, m32=M7_15)

    fs = dadi.Spectrum.from_phi(phi, ns, (xx, xx, xx))
    return fs
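Model_B's two lambdas describe a piecewise-linear trajectory for the size of population 2: N1_5T1 to N1_5T2 over T1, then N1_5T2 to N1_5 over T2. A quick pure-Python check (no dadi needed; `make_ramps` is a hypothetical helper and the parameter values are arbitrary placeholders, not fitted values) confirms that the two segments meet at N1_5T2, so the size is continuous across the epoch boundary.

```python
def make_ramps(N1_5T1, N1_5T2, N1_5, T1, T2):
    """Return the two size functions of Model_B (same formulas as above).

    Hypothetical helper for illustration only.
    """
    # First epoch: linear change from N1_5T1 (t = 0) to N1_5T2 (t = T1)
    f1 = lambda t: N1_5T1 + (N1_5T2 - N1_5T1) * t / T1
    # Second epoch: linear change from N1_5T2 (t = 0) to N1_5 (t = T2)
    f2 = lambda t: N1_5T2 + (N1_5 - N1_5T2) * t / T2
    return f1, f2


# Placeholder values, chosen only for illustration
f1, f2 = make_ramps(N1_5T1=2.0, N1_5T2=0.5, N1_5=1.5, T1=0.1, T2=0.05)

print(f1(0.0), f1(0.1))    # start and end of the first epoch
print(f2(0.0), f2(0.05))   # start and end of the second epoch
```

Because each segment interpolates linearly between two endpoints, the size never drops below the smaller endpoint. With the lower bound of 1e-2 on N1_5T1, N1_5T2 and N1_5, 1e-2 is therefore also the smallest relative size population 2 can reach in Model_B.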


The parameters of the analyses:

import dadi
import dadi_models  # the module containing Model_A and Model_B above

dd = dadi.Misc.make_data_dict('dadi_input_100.txt')

count_dict = dadi.Misc.count_data_dict(dd, pop_ids=['clade1_5', 'cluster6', 'cluster7'])
pts_l = [300, 320, 330]

# Note: `data` (the observed spectrum passed to optimize below) is not defined
# in this excerpt; presumably it is built from dd, e.g. with
# dadi.Spectrum.from_data_dict(dd, pop_ids, projections).

funcA = dadi_models.Model_A
funcB = dadi_models.Model_B

# Parameters model A are: (Na2, N6, N1_5, N7, M15_6, M7_15, T0, T1, T2)
upper_boundA = [100, 100, 100, 100, 2, 2, 0.18, 0.18, 0.18]
lower_boundA = [1e-2, 1e-2, 1e-2, 1e-2, 0, 0, 5.3e-4, 5.3e-4, 5.3e-4]

# Parameters model B are: (Na2, N6, N1_5T1, N1_5T2, N1_5, N7, M15_6, M7_15, T0, T1, T2)
upper_boundB = [100, 100, 100, 100, 100, 100, 2, 2, 0.18, 0.18, 0.18]
lower_boundB = [1e-2, 1e-2, 1e-2, 1e-2, 1e-2, 1e-2, 0, 0, 5.3e-4, 5.3e-4, 5.3e-4]

# This is our initial guess for the parameters, which is somewhat arbitrary.
p0A = [1, 1, 1, 1, 1, 1, 1e-2, 1e-2, 1e-2]
p0B = [1, 1, 1, 1, 1, 1, 1, 1, 1e-2, 1e-2, 1e-2]

# Make the extrapolating version of our demographic model function.
funcA_ex = dadi.Numerics.make_extrap_log_func(funcA)
funcB_ex = dadi.Numerics.make_extrap_log_func(funcB)

# Perturb our parameters before optimization. This does so by moving each
# parameter by up to a factor of two up or down.
p0A1 = dadi.Misc.perturb_params(p0A, fold=1, upper_bound=upper_boundA,
                                lower_bound=lower_boundA)
p0B1 = dadi.Misc.perturb_params(p0B, fold=1, upper_bound=upper_boundB,
                                lower_bound=lower_boundB)

poptA1 = dadi.Inference.optimize(p0A1, data, funcA_ex, pts_l,
                                 lower_bound=lower_boundA,
                                 upper_bound=upper_boundA,
                                 verbose=1, maxiter=20)

poptB1 = dadi.Inference.optimize(p0B1, data, funcB_ex, pts_l,
                                 lower_bound=lower_boundB,
                                 upper_bound=upper_boundB,
                                 verbose=1, maxiter=20)
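The comment on perturb_params says each parameter moves by up to a factor of two. A minimal pure-Python sketch of that idea (an illustration, not dadi's actual code; `perturb_sketch` is a hypothetical name) multiplies each value by 2**(fold*u), with u drawn uniformly from [-1, 1], and then clips to the bounds. It also checks that the start vector and both bound lists have the same length, a cheap way to catch a parameter-list mismatch before a multi-day run.

```python
import random


def perturb_sketch(params, fold=1.0, lower_bound=None, upper_bound=None, rng=random):
    """Scale each parameter by 2**(fold*u), u ~ Uniform(-1, 1), then clip to bounds.

    Hypothetical helper illustrating the behaviour described in the script's
    comment; it is not dadi.Misc.perturb_params itself.
    """
    n = len(params)
    for bound in (lower_bound, upper_bound):
        if bound is not None and len(bound) != n:
            raise ValueError(f"expected {n} bounds, got {len(bound)}")

    new = [p * 2 ** (fold * rng.uniform(-1, 1)) for p in params]
    if lower_bound is not None:
        new = [max(v, lo) for v, lo in zip(new, lower_bound)]
    if upper_bound is not None:
        new = [min(v, hi) for v, hi in zip(new, upper_bound)]
    return new


# Model A start values and bounds, copied from the script above
p0A = [1, 1, 1, 1, 1, 1, 1e-2, 1e-2, 1e-2]
upper_boundA = [100, 100, 100, 100, 2, 2, 0.18, 0.18, 0.18]
lower_boundA = [1e-2, 1e-2, 1e-2, 1e-2, 0, 0, 5.3e-4, 5.3e-4, 5.3e-4]

rng = random.Random(0)  # fixed seed so the example is repeatable
p0A1 = perturb_sketch(p0A, fold=1, lower_bound=lower_boundA,
                      upper_bound=upper_boundA, rng=rng)
print(p0A1)
```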


I have tried the solutions suggested in other posts (narrowing the parameter bounds, increasing the number of grid points), but nothing seems to work. Any ideas?

Thanks!

José

jlblanc...@gmail.com

May 23, 2017, 3:08:29 AM
to dadi-user, jlblanc...@gmail.com

I forgot to mention: I have 3 populations with 50 individuals each (100 gene copies each).

José.

Gutenkunst, Ryan N - (rgutenk)

May 24, 2017, 4:40:26 PM
to dadi...@googlegroups.com
Hello Jose,

Try linear rather than logarithmic extrapolation. That can be more stable.

Best,
Ryan

--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To post to this group, send email to dadi...@googlegroups.com.
Visit this group at https://groups.google.com/group/dadi-user.
For more options, visit https://groups.google.com/d/optout.

--
Ryan Gutenkunst
Associate Professor of Molecular and Cellular Biology, University of Arizona
phone: (520) 626-0569, office: LSS 325, web: http://gutengroup.mcb.arizona.edu

Latest papers: 
“Selection on network dynamics drives differential rates of protein domain evolution”
PLoS Genetics; http://dx.doi.org/10.1371/journal.pgen.1006132
"Inferring demographic history using two-locus statistics"
Genetics; http://doi.org/10.1534/genetics.117.201251
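Some background on why linear extrapolation can be the more stable choice. With make_extrap_log_func, dadi runs the model on each grid in pts_l and extrapolates the spectrum toward an infinitely fine grid on a log scale; make_extrap_func does the same on a linear scale. The pure-Python sketch below is a simplified illustration, not dadi's code: `x` is a toy stand-in for grid spacing, the input values are invented, and a quadratic is fitted through three points and evaluated at x = 0. It shows the trade-off seen in this thread: the log-scale fit has no answer for an entry that is zero or negative on some grid, while the linear fit always returns a number but can return a negative one for a tiny entry.

```python
import math


def quad_extrap_to_zero(xs, ys):
    """Evaluate at x = 0 the quadratic through three points (Lagrange form)."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    w0 = x1 * x2 / ((x0 - x1) * (x0 - x2))
    w1 = x0 * x2 / ((x1 - x0) * (x1 - x2))
    w2 = x0 * x1 / ((x2 - x0) * (x2 - x1))
    return w0 * y0 + w1 * y1 + w2 * y2


def log_extrap_to_zero(xs, ys):
    """Same fit on log(y); undefined if any value is <= 0 (NaN in this sketch)."""
    if any(y <= 0 for y in ys):
        return float("nan")
    return math.exp(quad_extrap_to_zero(xs, [math.log(y) for y in ys]))


xs = [0.1, 0.2, 0.3]  # toy stand-ins for the grid spacing on three grids

# A well-behaved entry: y = 2 + 3x + 5x^2 is recovered at x = 0
smooth = [2 + 3 * x + 5 * x ** 2 for x in xs]
print(quad_extrap_to_zero(xs, smooth))

# A tiny entry, as in a low-probability corner of the spectrum
tiny = [1e-3, 3e-3, 4e-3]
print(quad_extrap_to_zero(xs, tiny))  # linear: negative
print(log_extrap_to_zero(xs, tiny))   # log: positive

# An entry that is exactly zero on one grid
with_zero = [0.0, 1e-3, 2e-3]
print(quad_extrap_to_zero(xs, with_zero))  # linear: still a finite number
print(log_extrap_to_zero(xs, with_zero))   # log: nan
```

In the analysis script, the switch is one line per model: funcA_ex = dadi.Numerics.make_extrap_func(funcA) in place of dadi.Numerics.make_extrap_log_func(funcA), and likewise for funcB.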

jlblanc...@gmail.com

May 29, 2017, 12:12:36 PM
to dadi-user, rgu...@email.arizona.edu
Hi Ryan,
Thanks for the response.

After changing to linear extrapolation, the first set of warnings disappeared, but not the second:

WARNING:Inference:Model is < 0 where data is not masked.
WARNING:Inference:Number of affected entries is 36736. Sum of data in those entries is 1:


The warning still persists.

Thanks again,
José



Gutenkunst, Ryan N - (rgutenk)

May 31, 2017, 7:23:33 PM
to dadi...@googlegroups.com
Hello Jose,

You’re probably getting very small negative entries for low-probability regions in the frequency spectrum (where there’s not much data). If this is happening occasionally, it’s fine. If it’s happening on many steps, then you will want to manually run models for the problematic parameter sets and examine the resulting spectra.

Best,
Ryan
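Running the model by hand for a problematic parameter set and inspecting the spectrum, as suggested above, largely comes down to counting the entries that the two warnings describe. The sketch below does this on plain Python lists so it runs without dadi. It assumes `model` and `data` have already been flattened, with `None` marking a masked entry; `diagnose` is a hypothetical helper name. dadi spectra are numpy masked arrays, so with real output the same tests can be written against their `.mask` attributes.

```python
def diagnose(model, data):
    """Count entries behind the two dadi warnings quoted in this thread.

    model, data: flat lists of equal length; None marks a masked entry.
    Returns, for each condition, (number of affected entries, sum of data there).
    """
    if len(model) != len(data):
        raise ValueError("model and data must have the same length")

    # Model masked where the data is not
    masked_model = [(m, d) for m, d in zip(model, data)
                    if m is None and d is not None]
    # Model negative where the data is not masked
    negative_model = [(m, d) for m, d in zip(model, data)
                      if m is not None and d is not None and m < 0]

    return {
        "model masked, data not": (len(masked_model),
                                   sum(d for _, d in masked_model)),
        "model < 0, data not masked": (len(negative_model),
                                       sum(d for _, d in negative_model)),
    }


# Toy example: a 6-entry "spectrum" with one masked data corner
model = [None, 0.5, -1e-9, 2.0, None, 0.1]
data = [None, 1, 0, 3, 2, 0]
for condition, (count, total) in diagnose(model, data).items():
    print(f"{condition}: {count} entries, sum of data {total}")
```

A large count paired with a near-zero data sum, like the 36736 entries holding a total of 1 in the output quoted earlier, matches the explanation above: tiny negative values in regions of the spectrum that contain essentially no data.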

jlblanc...@gmail.com

Jun 1, 2017, 3:01:49 AM
to dadi-user, rgu...@email.arizona.edu

Hi Ryan, 

Thanks for your response.

This was happening on all steps, but if I let the analysis run for a few days, the warning messages suddenly stop at some point, as shown in the picture below.


 

It seems that the optimizer eventually finds a region of parameter space with non-problematic values, but it takes too long to get there. Is there a way to make the optimizer find these stable parameter values faster?

 

José 

Gutenkunst, Ryan N - (rgutenk)

Jun 1, 2017, 7:06:34 PM
to dadi...@googlegroups.com
Hi Jose,

The issue appears to be caused by the 4th parameter in your model, which is changing from almost exactly 1 to something non-1. Is that some sort of fractional split of population size? If so, you’re probably seeing issues in which the population size is getting very small, leading to very fast drift and poor integration results. I suggest restricting your parameter space to exclude those extreme values, if you don’t think they’re biologically plausible.

Best,
Ryan
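Why a very small relative size makes integration hard: the neutral, single-population part of the diffusion equation that dadi integrates (as written in Gutenkunst et al. 2009; migration and selection terms omitted here) is shown below, with time τ in units of 2N_ref generations, φ(x, τ) the density of derived alleles at frequency x, and ν(τ) the relative population size.

```latex
\frac{\partial \phi}{\partial \tau}
  = \frac{1}{2}\,\frac{\partial^{2}}{\partial x^{2}}
    \left[\frac{x(1-x)}{\nu(\tau)}\,\phi\right],
\qquad
\left.\frac{1}{\nu}\right|_{\nu = 10^{-2}} = 100\,\left.\frac{1}{\nu}\right|_{\nu = 1}
```

The drift coefficient scales as 1/ν, so at the lower bound of 1e-2 used for the size parameters in this thread, drift is 100 times faster than at ν = 1. φ then changes far more steeply within each epoch, which is harder to integrate accurately on the grids in pts_l. Raising the lower bound on the offending size parameter caps that speed-up.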
