WARNING:Inference:Model is nan in some entries where data is not masked.


Giordano

Apr 28, 2015, 1:18:07 PM
to dadi...@googlegroups.com
Hi Ryan,

I'm trying to model very simple demographic scenarios between two populations (e.g. split, split with gene flow, split with gene flow plus changes in Ne) from genomic data. Unfortunately I keep getting this:
WARNING:Inference:Model is nan in some entries where data is not masked.
WARNING:Inference:Number of affected entries is 6888. Sum of data in those entries is 2.22286e+06:

I've tried what you suggested in a previous post: increasing the number of grid points and tightening the range of the upper/lower parameter bounds, but with no success.
I've also tried models of increased complexity with more parameters, and reducing the number of variants.
As long as only one population at a time was modeled, everything worked fine.

Please find below the script I used; the SFS plot is attached.

Thanks a lot for your support! I'm happy to provide any further details that could help resolve the issue.

Best regards,
Giordano   

import dadi
from dadi import Numerics, PhiManip, Integration, Spectrum
from numpy import array
import pylab

def split_mig(params, ns, pts):
    nu1, nu2, T, m = params
    xx = Numerics.default_grid(pts)
    phi = PhiManip.phi_1D(xx)
    phi = PhiManip.phi_1D_to_2D(xx, phi)
    phi = Integration.two_pops(phi, xx, T, nu1, nu2, m12=m, m21=m)
    fs = Spectrum.from_phi(phi, ns, (xx, xx))
    return fs

fs = fs_3L + fs_3R
fig = pylab.figure(1)
dadi.Plotting.plot_single_2d_sfs(fs, vmin=1)
ns = fs.sample_sizes

pts = [300, 300]

params = array([1, 1, 1, 1])
lower_bound = [0.01, 0.01, 0, 0]
upper_bound = [100, 100, 5, 10]

func_ex = dadi.Numerics.make_extrap_log_func(split_mig)

results = []
for i in range(50):
    p0 = dadi.Misc.perturb_params(params, fold=1.5, lower_bound=lower_bound, upper_bound=upper_bound)
    popt = dadi.Inference.optimize_log(p0, fs, func_ex, pts, lower_bound=lower_bound, upper_bound=upper_bound,
                                       verbose=len(params), maxiter=300, multinom=True)
    results.append(popt)
[Attachment: Screen Shot 2015-04-28 at 18.17.03.png]

Gutenkunst, Ryan N - (rgutenk)

May 7, 2015, 1:20:49 PM
to dadi...@googlegroups.com
Hello Giordano,

Apologies for my slow reply. I've been traveling recently, and this slipped through the cracks.

The issue is in how you define your pts list for extrapolation. It should be an increasing sequence of grid sizes, whereas you just have [300, 300]. We typically recommend quadratic extrapolation, so a list of 3 pts settings. Something like [200, 220, 240] should work well for your problem. You could probably get by with coarser grids like [150, 160, 170] if you need faster speed.
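To illustrate why three distinct grid sizes are needed, here is a pure-Python sketch (illustrative only, not dadi's actual code) of the quadratic-extrapolation idea behind make_extrap_log_func, assuming the numerical result behaves roughly like a quadratic in 1/pts:

```python
def extrapolate_quadratic(pts_list, values):
    """Fit a quadratic in 1/pts through three (grid size, value) pairs
    and evaluate it at 1/pts = 0, i.e. the infinite-grid limit."""
    x0, x1, x2 = [1.0 / p for p in pts_list]
    y0, y1, y2 = values
    # Lagrange interpolation evaluated at x = 0
    return (y0 * x1 * x2 / ((x0 - x1) * (x0 - x2))
            + y1 * x0 * x2 / ((x1 - x0) * (x1 - x2))
            + y2 * x0 * x1 / ((x2 - x0) * (x2 - x1)))

# Suppose a solver's output behaves like L + a/pts + b/pts**2:
L_true, a, b = 2.5, 1.3, -0.7
f = lambda dx: L_true + a * dx + b * dx ** 2

pts = [200, 220, 240]          # three *distinct* grid sizes
est = extrapolate_quadratic(pts, [f(1.0 / p) for p in pts])
# est recovers L_true; with [300, 300] the fit is singular
# (division by x0 - x1 = 0), so extrapolation cannot work.
```

This is why a repeated grid size like [300, 300] breaks the extrapolation step even though each individual solve is fine.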

Best,
Ryan

--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To post to this group, send email to dadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/dadi-user.
For more options, visit https://groups.google.com/d/optout.

--
Ryan Gutenkunst
Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520) 626-0569, office LSS 325

Giordano

May 11, 2015, 1:18:50 PM
to dadi...@googlegroups.com
Hi Ryan,

thanks a lot for your time and support! 

Unfortunately the issue is still present even when defining the pts list with an increasing number of points, e.g. [200, 220, 240]. But if I project the spectra down to a smaller number of samples, say 20 chromosomes per population, and use the grid [40, 50, 60], the optimisation procedure works properly. The issue appears again at 60 chromosomes per population with the grid [100, 110, 120].
Should I consider using a smaller number of samples, or is there a way to work around this?

Thanks a lot again.

Cheers,
Giordano 

Gutenkunst, Ryan N - (rgutenk)

May 12, 2015, 6:06:00 AM
to dadi...@googlegroups.com
Hi Giordano,

Is this happening for all parameter values during the optimization, or just a few? Are your fits going to extreme parameter values (such as migration rates > 10)? That can cause numerical problems even with relatively large numbers of grid points. In any case, going to even larger grids should fix it, at the cost of speed.
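One quick way to check for extreme fits is to flag optimized parameters that land at or near their bounds. A small hypothetical helper (not part of dadi) might look like:

```python
def near_bounds(popt, lower_bound, upper_bound, rtol=0.01):
    """Return one flag per parameter: True if the optimized value sits
    within rtol (relative) of its lower or upper bound."""
    return [p <= lo * (1 + rtol) or p >= hi * (1 - rtol)
            for p, lo, hi in zip(popt, lower_bound, upper_bound)]

# e.g. a migration rate pinned near its upper bound gets flagged:
flags = near_bounds([1.2, 9.96], [0.01, 0], [100, 10])
# flags -> [False, True]
```

Parameters flagged this way are the ones worth inspecting for numerical trouble or model misspecification.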

Best,
Ryan

Giordano

May 14, 2015, 2:50:30 PM
to dadi...@googlegroups.com
Hi Ryan,

Is this happening for all parameter values during the optimization, or just a few?

- Sorry, could you give me a clue on how to figure out which parameter values are causing the issue during the optimization?

Are your fits going to extreme parameter values? (Such as migration rates > 10.)

- No, they are not going to extreme values; migration rates, for example, stay between 0.1 and 2.

I'll try using a much larger pts grid.

Thanks a lot again!  

Cheers,
Giordano

Gutenkunst, Ryan N - (rgutenk)

May 14, 2015, 3:54:56 PM
to dadi...@googlegroups.com
Hi Giordano,

On May 14, 2015, at 11:50 AM, Giordano <botta.g...@gmail.com> wrote:
Is this happening for all parameter values during the optimization, or just a few?

- Sorry, could you give me a clue on how to figure out which parameter values are causing the issue during the optimization?

When calling the optimization function, add the argument "verbose=1". This will print out every parameter set that is tried. Then the question is: Are you getting warnings for all or most of your parameter sets, or is it only occasionally?

Best,
Ryan

Giordano

May 15, 2015, 7:52:17 AM
to dadi...@googlegroups.com
Hi Ryan,

please find attached a notebook with an example analysis showing the warnings, together with the spectra used; maybe that's the best way to figure out what is happening.

Thanks a lot for your support!

Best,
Giordano
[Attachments: Screen Shot 2015-05-15 at 13.46.58.png, 20150513_dadi_polarized_example.html]

Gutenkunst, Ryan N - (rgutenk)

May 25, 2015, 5:40:21 PM
to dadi...@googlegroups.com
Hi Giordano,

It's not clear to me what's causing the problem, because the model is very complex. Have you seen similar problems with simpler models (without so many parameters)?

Best,
Ryan


Giordano

Jun 16, 2015, 7:04:02 PM
to dadi...@googlegroups.com
Hi Ryan,

sorry for the very slow reply; I finally found the time to run simple models like split and asymmetric migration. There are still occasional warnings, but they are very sporadic, and the optimisation procedures were always successful.

Thanks a lot!

Cheers,
Giordano

daniqu...@gmail.com

Sep 15, 2016, 11:09:08 AM
to dadi-user
Hello, Ryan!
My problem is exactly like Giordano's. However, my model is not so complex: it is the IM model, and I have also tried versions with no migration or symmetric migration. It is just not working! Even with a bigger grid, I get a lot of warning messages. What could it be?
Thanks!

Gutenkunst, Ryan N - (rgutenk)

Sep 16, 2016, 7:32:18 PM
to dadi...@googlegroups.com
Hello Dani,

It’s not obvious to me. If you have a small working example that exhibits the problem, I could look into it.

Best,
Ryan



daniqu...@gmail.com

Sep 19, 2016, 2:04:03 PM
to dadi-user
Here is my script:

import dadi
import numpy

dd = dadi.Misc.make_data_dict(r"c:\Python27\input_dadi.txt")
fs2 = dadi.Spectrum.from_data_dict(dd, pop_ids=['ama1', 'ama2'],
                                   projections=[16, 14], polarized=False)

def IM(params, ns, pts):
    s, nu1, nu2, T, m = params

    xx = dadi.Numerics.default_grid(pts)

    phi = dadi.PhiManip.phi_1D(xx)
    phi = dadi.PhiManip.phi_1D_to_2D(xx, phi)

    # Exponential size changes starting from the split fractions s and 1-s
    nu1_func = lambda t: s * (nu1/s)**(t/T)
    nu2_func = lambda t: (1-s) * (nu2/(1-s))**(t/T)
    phi = dadi.Integration.two_pops(phi, xx, T, nu1_func, nu2_func,
                                    m12=m, m21=m)

    fs = dadi.Spectrum.from_phi(phi, ns, (xx, xx))
    return fs

def IM_nomig(params, ns, pts):
    s, nu1, nu2, T = params
    return IM((s, nu1, nu2, T, 0), ns, pts)

my_extrap_func = dadi.Numerics.make_extrap_log_func(IM_nomig)

# Parameters: s, nu1, nu2, T
ns = [16, 14]
pts_1 = [30, 40, 50]
upper_bound = [100, 100, 100, 5]
lower_bound = [1e-2, 1e-2, 1e-2, 0]
p0 = [1, 1, 2, 0.3]
p0 = dadi.Misc.perturb_params(p0, fold=1, upper_bound=upper_bound,
                              lower_bound=lower_bound)
popt = dadi.Inference.optimize_log(p0, fs2, my_extrap_func, pts_1,
                                   lower_bound=lower_bound,
                                   upper_bound=upper_bound,
                                   verbose=len(p0), maxiter=20)
print('Best-fit parameters: {0}'.format(popt))
model = my_extrap_func(popt, ns, pts_1)
ll_model = dadi.Inference.ll_multinom(model, fs2)
print('Maximum log composite likelihood: {0}'.format(ll_model))
theta = dadi.Inference.optimal_sfs_scaling(model, fs2)
print('Optimal value of theta: {0}'.format(theta))

And here is the end of my result:

WARNING:Inference:Model is nan in some entries where data is not masked.
WARNING:Inference:Number of affected entries is 134. Sum of data in those entries is 1048.37:

[the same pair of warnings repeats several more times]
196     , nan         , array([ 1.32184    ,  1.89026    ,  1.19633    ,  0.172601   ])
Best-fit parameters: [ 1.3218422   1.890255    1.19633365  0.17242872]
WARNING:Inference:Model is nan in some entries where data is not masked.
WARNING:Inference:Number of affected entries is 134. Sum of data in those entries is 1048.37:

Maximum log composite likelihood: --
Optimal value of theta: nan

I have tried different bounds and pts values and I always get this. I'm not seeing my mistake here.
Thank you!! :)

Dani





Gutenkunst, Ryan N - (rgutenk)

Sep 19, 2016, 6:05:30 PM
to dadi...@googlegroups.com
Note that the first parameter “s” is a fraction of a population size, so it must be between zero and one. That’s not how you have set your bounds.
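As a sketch, bounds consistent with s in (0, 1) could look like the following (the particular values are illustrative, not a recommendation):

```python
# Hypothetical corrected bounds for the IM model above. The first
# parameter s is the fraction of the ancestral population that founds
# population 1, so its bounds must stay strictly inside (0, 1).
#               s,    nu1,  nu2,  T
lower_bound = [0.01, 1e-2, 1e-2, 0]
upper_bound = [0.99, 100,  100,  5]
p0          = [0.5,  1,    2,    0.3]

# Sanity check: every starting parameter lies inside its bounds
in_bounds = all(lo <= p <= hi
                for p, lo, hi in zip(p0, lower_bound, upper_bound))
```

The same p0, lower_bound, and upper_bound would then be passed to perturb_params and optimize_log as before.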

daniqu...@gmail.com

Sep 20, 2016, 7:20:34 PM
to dadi-user
Unfortunately, it didn't solve the problem. :(( I still get the warning messages, and I got this weird optimization output:

7190    , -68.4258    , array([ 0.0586315  ,  6.45198    ,  8.86061    ,  3.48912    ,  2.92685    ])

712.353290862 1558.53641517
712.353290862 1558.53641517
712.353290862 1558.53641517
712.353290862 1558.53641517
712.353290862 1558.53641517
7195    , -68.4322    , array([ 0.0586315  ,  6.45198    ,  8.86061    ,  3.49261    ,  2.92685    ])

712.353290862 1558.53641517
712.353290862 1558.53641517
712.353290862 1558.53641517
712.353290862 1558.53641517
712.353290862 1558.53641517
7200    , -68.4076    , array([ 0.0586338  ,  6.45866    ,  8.84925    ,  3.48913    ,  2.92524    ])

712.353290862 1558.53641517
712.353290862 1558.53641517
712.353290862 1558.53641517
Best-fit parameters: [ 0.05857313  6.44624362  9.16115863  3.48900291  2.96894307]
712.353290862 1558.53641517
Maximum log composite likelihood: -69.0765847091
Is there anything else I can do?
And what if just one parameter reaches its upper bound? This is for another model, the bottleneck: only Nuf always reaches the upper bound, even if I set it to 200, while the other parameters remain low. How should I proceed?
Thanks!
Dani

Gutenkunst, Ryan N - (rgutenk)

Sep 28, 2016, 3:56:15 PM
to dadi...@googlegroups.com
Hi Dani,

It’s not clear to me what you find to be weird about this optimization.

If a parameter is reaching upper bounds, it may be a sign that that model isn’t appropriate for your data. Looking at the residuals is often informative.

Best,
Ryan

