Re: BioGeoBEARS error

852 views
Skip to first unread message

Nick Matzke

unread,
Feb 14, 2014, 1:35:12 PM2/14/14
to samuel...@utulsa.edu, bioge...@googlegroups.com
Hi! So, it's hard for me to diagnose without the input files.  Typically this is some kind of precision issue where a probability which should be zero gets calculated as negative, or something like that.

If it's really true that you "change those two randomly resolved branches' lengths from 0 to 1.0e-16 (approximately twice as small as the smallest real branch in my tree)"...a branch length of 1.0e-16, or even twice that, is super-super short, and some probability calculation on that short branch may be producing an error.  I would put your tree in a timescale of millions of years, and make the shortest branch only, say, 0.01 million years, and try that.

The next step is to modify the "min" and "max" for your free parameters, especially to move the minimum further away from zero...

Cheers!
Nick



On Fri, Feb 14, 2014 at 1:10 PM, <samuel...@utulsa.edu> wrote:
Hello Nick;

I love the idea of your program BioGeoBEARS and was able to run the Psychotria example script without any problems. When trying to use my own dataset, however, I did have some issues with two polytomous notes in my phylogeny; ape's multi2di function can only resolve polytomies to 0-length branches... If there's a better way to get trees with neither polytomies nor zero-length branches, I'm sure many people would benefit from your mentioning it somewhere (based on reading other users' issues). On your input check function/page (http://127.0.0.1:11003/library/BioGeoBEARS/html/check_BioGeoBEARS_run.html) you link to ape's multi2di function but by itself this function will not resolve the issue. My workaround was to manually edit the .newick tree file and change those two randomly resolved branches' lengths from 0 to 1.0e-16 (approximately twice as small as the smallest real branch in my tree).

Now the analysis will proceed to log quite a few generations, before coughing up a new error (one I can't find any information on):
> check_BioGeoBEARS_run(BioGeoBEARS_run_object)
Read 5 items
Read 570 items
[1] TRUE
>
> runslow = TRUE
> resfn = "Cyprinid_DEC_M0_unconstrained_v1.Rdata"
> if (runslow)
+     {
+     res = bears_optim_run(BioGeoBEARS_run_object)
+     res
+
+     save(res, file=resfn)
+     resDEC = res
+     } else {
+     # Loads to "res"
+     load(resfn)
+     resDEC = res
+     }
Read 5 items
Read 570 items
     d    e a b x u j ysv ys y s v mx01 mx01j mx01y mx01s mx01v mx01r  mf dp fdp
1 0.01 0.01 0 1 0 0 0   3  2 1 1 1    0     0     0     0     0   0.5 0.1  1   0
       LnL
1 -674.655
Maximizing -- use negfn and neggr
     d    e a b x u j ysv ys y s v mx01 mx01j mx01y mx01s mx01v mx01r  mf dp fdp
1 0.01 0.01 0 1 0 0 0   3  2 1 1 1    0     0     0     0     0   0.5 0.1  1   0
       LnL

....(many lines not copied here)...

  d e a b x u j ysv ys y s v mx01 mx01j mx01y mx01s mx01v mx01r  mf dp fdp
1 5 0 0 1 0 0 0   3  2 1 1 1    0     0     0     0     0   0.5 0.1  1   0
       LnL
1 -438.927
Error in params_into_BioGeoBEARS_model_object(BioGeoBEARS_model_object,  :
  replacement has length zero
In addition: Warning messages:
1: In optimx.setup(par, fn, gr, hess, lower, upper, method, itnmax,  :
  optimx: No match to available methods
2: In optimx.setup(par, fn, gr, hess, lower, upper, method, itnmax,  :
  Default method when bounds specified is L-BFGS-B to match optim()
3: In optim(par = par, fn = ufn, gr = ugr, lower = lower, upper = upper,  :
  method L-BFGS-B uses 'factr' (and 'pgtol') instead of 'reltol' and 'abstol'

My tree contains 285 taxa, and there are only three possible ancestral areas. Please let me know if you need any other information, and thanks so much for your time!

Sam Martin
University of Tulsa

Nick Matzke

unread,
Feb 19, 2014, 6:15:09 PM2/19/14
to bioge...@googlegroups.com
Hints for Sam's questions in this email.  The next email contains solutions, which hopefully will help others...

---------- Forwarded message ----------
From: Nick Matzke <mat...@nimbios.org>
Date: Wed, Feb 19, 2014 at 4:24 PM
Subject: Re: BioGeoBEARS error
To: Samuel Martin <samuel...@utulsa.edu>


Hi, I'm happy to look at it.  I think you should email me your data, successfully modifying trees to be valid newick objects is nontrivial.

Sorry about the complexity of the script -- but the flexibility of BioGeoBEARS comes along with the complexity of the setup.  The more familiar you get with R, the easier it gets.  In the long run, it's better for people to be taking the example script and modifying it for their purposes, rather than having everything "black-boxed" in one function where no one but me knows what is going on.

I can look at it if you email me the inputs, but some hints:


On Wed, Feb 19, 2014 at 3:50 PM, Samuel Martin <samuel...@utulsa.edu> wrote:
Nick;

I've tried my analysis again using a different tree without any polytomous nodes (to avoid any error I might have introduced by editing the lengths of arbitrarily resolved branches) and am getting the same error (Error in params_into_BioGeoBEARS_model_object"...BioGeoBEARS_model_object,:  replacement has length zero..."). You suggested that I should 'modify the "min" and "max" for the free parameters,' but I can't seem to find any reference to this in the manual, other than mentions of the number of parameters in other models (e.g. 2 in Lagrange). Could you elaborate on this a little bit?


==============
Just as you can access the variable type for "j", the 
initial starting value of "j", etc., like this: 

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","type"] 
= "free" 

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","init"] 
= 0.01 


...you can access the "min" and "max" for "j" (or "d", "e", 
whatever), like this: 

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","min"] 

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","max"] 

Often, narrowing the limits a teeny bit fixes NaN/Inf 
problems during the ML search. 

You can access the full parameters table with: 

BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table 
==============


So I would try things like:

# Change j's min from 0.00001 to 0.001

# Initial parameter inputs
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

# Make the change
BioGeoBEARS_run_object$
BioGeoBEARS_model_object@params_table["j","min"] = 0.00001
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j","min"] = 0.001

# Resulting parameter inputs
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table



BUT, this may not be the issue at all, it could be some weirdness with your tree or something.



I'm also a bit confused with the length of the script; if I change the references for my input tree and data file in the R script, and also put my files in the same working directory as the tutorial files, the R script still seems to be calling references to other Psychotria files for some reason. Are there other necessary files for a BioGeoBEARS analysis besides the tree and geographic data file?


Probably you are just looking at the title of the plots.  These are written as "Psychotria" in the default script. Do a search/replace with your own taxon name to fix this...


 
I'd be happy to provide my input files if that would help diagnose my issue, but I'm having difficulty troubleshooting the tutorial due to its lengths and somewhat sparse explanations (I am quite the R novice). Sorry for the trouble and thanks so much for your time!

Sam Martin

It's hard to cover everybody, some people want just-the-code, other people want every line explained...if you're learning, run each line one-at-a-time on the default script, (or each group between { } for for-loops and if /else statements), and then look at what changed.  E.g.

# Look at overall table
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

# Look at what you are changing
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j", "type"]

# Change something
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j", "type"] = "fixed"

# Look at result
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j", "type"]

# Change again
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j", "type"] = "free"

# Look at result again
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table["j", "type"]

# Look at overall table
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

...etc...

 
--
Sam Martin
PhD. Candidate
Department of Biological Sciences
The University of Tulsa
Office: Oliphant Hall 311 N


Nick Matzke

unread,
Feb 19, 2014, 6:19:09 PM2/19/14
to bioge...@googlegroups.com
The main issues in this case were (1) branches too short and perhaps (2) making sure a 3-area dataset has max_range_size = 3

We also had a temporary issue with the tip labels in the tree not matching those in the geography file, but Sam had already fixed that.

I've deleted the attachments etc...


---------- Forwarded message ----------
From: Nick Matzke <mat...@nimbios.org>
Date: Wed, Feb 19, 2014 at 5:49 PM
Subject: Re: BioGeoBEARS error
To: Samuel Martin <samuel...@utulsa.edu>


Hi Sam,

OK I found 2 issues. One was the tree thing, I pruned the tree myself but I could have used your pruned tree.

The other problem was that the branches on your tree are all extremely short.  BioGeoBEARS defaults assume that branches are in millions of years.  Your tree has many branches of length, say, 1e-6, which BioGeoBEARS thinks of as one calendar year.

This can cause 2 problems:

1. Precision issues and underflow problems

2. BioGeoBEARS assumes by default that branches of length shorter than 0.000001 are "direct ancestors" rather than side-branches, and changes the calculations accordingly.  This was done so that I could experiment with putting in fossil ranges in rare cases they are known (this could not be done with e.g. LAGRANGE).  The cladogenesis model is "turned off" for these branching points. 

This default can be changed, but it's easier to multiply all your branches by 1,000,000, which I believe I suggested before.  I've done it here, saved to a new Newick file, and it all works...

In the attached script, search on "NJM_EDIT" to see the parts where I changed things or fixed your tree.

Also, you have 3 areas, so make sure that:

max_range_size = 3

...in the script.

Cheers!

Nick

PS Model test results. Congratulations, you've got one of the fairly rare datasets where DEC is virtually as good as DEC+J! (although, technically, it looks like the null hypothesis that they are equally good is still rejected with p=0.03).

Cheers,
Nick

> restable
     LnL numparams           d            e           j
1 -204.3         2 0.004174491 1.000000e-15 0.000000000
2 -202.1         3 0.003808284 1.000000e-15 0.004492989
3 -214.0         2 0.004919439 1.000000e-15 0.000000000
4 -211.8         3 0.004277351 1.000000e-15 0.006272245
5 -268.5         2 0.003848836 5.189778e-03 0.000000000
6 -222.7         3 0.003241068 1.000000e-15 0.015554738
> teststable
   alt null LnLalt LnLnull DFalt DFnull DF Dstatistic    pval        test       tail  AIC1  AIC2 AICwt1  AICwt2
11          -202.1  -204.3     3      2  1       4.26   0.039 chi-squared one-tailed 410.2 412.5   0.76    0.24
12          -211.8    -214     3      2  1       4.52   0.034 chi-squared one-tailed 429.6 432.1   0.78    0.22
13          -222.7  -268.5     3      2  1      91.58 1.1e-21 chi-squared one-tailed 451.5 541.1      1 3.5e-20
   AICweight_ratio_model1 AICweight_ratio_model2
11                   3.09                   0.32
12                   3.52                   0.28
13               2.83e+19                3.5e-20



On Wed, Feb 19, 2014 at 5:22 PM, Samuel Martin <samuel...@utulsa.edu> wrote:
Nick;

Which tree file was that from? I might've sent you the wrong one by mistake; those are my outgroups from the phylogenetic analysis which I trimmed in R before trying to run BioGeoBEARS. I did do a taxon check with my trees and didn't get any errors. This is the correct tree. Sorry about that!


On Wed, Feb 19, 2014 at 4:13 PM, Nick Matzke <mat...@nimbios.org> wrote:
Hmm, I get this line of results in the script, did you get this?

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

======================
check_BioGeoBEARS_run(BioGeoBEARS_run_object)
Read 5 items
Read 570 items

FATAL ERROR in inputs: 15 tree tips are not in the geographic ranges file.  These are:
Blicca_bjoerkna, Catostomus_commersonii, Danio_rerio, Hypentelium_nigricans, Myxocyprinus_asiaticus, Notemigonus_crysoleucas, Pelecus_cultratus, Phoxinus_phoxinus, Rutilus_rutilus, Squaliobarbus_curriculus, Tanakia_himantegus, Tanakia_lanceolata, Tanichthys_albonubes, Tinca_tinca, Tribolodon_hakonensis

======================


Reply all
Reply to author
Forward
0 new messages