error in dispersal multipliers and areas allowed matrix

2,389 views
Skip to first unread message

Mariana Vasconcellos

unread,
Mar 25, 2014, 2:43:21 PM3/25/14
to bioge...@googlegroups.com
Hi Nick:
I successfully adapted your BioGeoBEARS script to run my dataset and I ran the unconstrained model fine. However, I would like to run a constrained model with dispersal multipliers matrix and removing non-adjacent areas. I tried to set up one file for each matrix as you advise in your earlier post: https://groups.google.com/forum/#!topic/biogeobears/XXaJqmI232o
But, I still get the following error:

Error in BioGeoBEARS_run_object$list_of_areas_allowed_mats[[1]] : 
  subscript out of bounds
In addition: Warning messages:
1: In readLines(dispersal_multipliers_fn) :
  incomplete final line found on '/Users/marianavasconcellos/Documents/MARIANA/DOUTORADO/Hypsiboas_gr_pulchellus/ANALYSES/BioGeoBEARS/input_files/INPUTdispersal_multipliers_M1.txt'
2: In readLines(areas_allowed_fn) :
  incomplete final line found on '/Users/marianavasconcellos/Documents/MARIANA/DOUTORADO/Hypsiboas_gr_pulchellus/ANALYSES/BioGeoBEARS/input_files/INPUTareas_allowed_M1.txt'

Could you tell me what I am doing wrong?
Thank you,
Mariana

Here is how my file for areas allowed looks like: ( I want to remove AF, BC, BD, CF, DF)

A B C D E F

1 1 1 1 1 0

1 1 0 0 1 1

1 0 1 1 1 0

1 0 1 1 1 0

1 1 1 1 1 1

0 1 0 0 1 1

END

And here is how my file for dispersal multipliers looks like:

A B C D E F

1 1 1 1 1 0.5

0 1 0 0 0 1

1 0.5 1 1 1 0.5

1 0.5 1 1 1 0.5

1 1 1 1 1 1

0 1 0 0 0 1

END

And here is how I include them in the analyses:


# Input dispersal multipleirs matrix (only for M1)

dispersal_multipliers_fn = np(paste(addslash(extdata_dir), "INPUTdispersal_multipliers_M1.txt", sep=""))

BioGeoBEARS_run_object$dispersal_multipliers_fn = dispersal_multipliers_fn

# Input areas allowed (only for M1)

areas_allowed_fn = np(paste(addslash(extdata_dir), "INPUTareas_allowed_M1.txt", sep=""))

BioGeoBEARS_run_object$areas_allowed_fn = areas_allowed_fn

Francisco

unread,
Mar 25, 2014, 8:47:58 PM3/25/14
to bioge...@googlegroups.com
Hi there, did you check your matrices? a good way to do this is show hidden characters, this way you can compare your dispersal matrices with the ones provided by Nick, so you can figure out what the differences are and fix your matrices, this may solve your problem.

Nick Matzke

unread,
Mar 26, 2014, 11:43:49 AM3/26/14
to bioge...@googlegroups.com
Hi Mariana,

I think you need to:

1. Remove the blank lines between the rows in the dispersal
multiplier & areas-allowed matrices

2. Add tabs between the columns in the areas allowed matrix.

Cheers!
Nick
> --
> You received this message because you are subscribed to the
> Google Groups "BioGeoBEARS" group.
> To unsubscribe from this group and stop receiving emails
> from it, send an email to
> biogeobears...@googlegroups.com
> <mailto:biogeobears...@googlegroups.com>.
> To post to this group, send email to
> bioge...@googlegroups.com
> <mailto:bioge...@googlegroups.com>.
> Visit this group at http://groups.google.com/group/biogeobears.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/biogeobears/b0a685bd-e9d5-4d6f-b67d-e23289e881fc%40googlegroups.com
> <https://groups.google.com/d/msgid/biogeobears/b0a685bd-e9d5-4d6f-b67d-e23289e881fc%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

--
====================================================
Nicholas J. Matzke, Ph.D.
NIMBioS Postdoctoral Fellow in Mathematical Biology
National Institute for Mathematical and Biological Synthesis
(NIMBioS, www.nimbios.org)
Cell: 510-301-0179
Email: mat...@nimbios.org
Links to CV, R packages, etc.:
http://phylo.wikidot.com/nicholas-j-matzke

Also: Brian O'Meara Lab
Postdoc office: 425a Hesler
Department of Ecology and Evolutionary Biology
University of Tennessee, Knoxville
http://www.brianomeara.info/

NIMBioS Office:
Claxton Bldg. #110B
Office phone: 865-974-4873

NIMBioS:
1122 Volunteer Blvd., Suite 106
University of Tennessee
Knoxville, TN 37996-3410
Phone: (865) 974-9334
Fax: (865) 974-9300

-----------------------------------------------------
"[W]hen people thought the earth was flat, they were wrong.
When people thought the earth was spherical, they were
wrong. But if you think that thinking the earth is spherical
is just as wrong as thinking the earth is flat, then your
view is wronger than both of them put together."

Isaac Asimov (1989). "The Relativity of Wrong." The
Skeptical Inquirer, 14(1), 35-44. Fall 1989.
http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
====================================================

Nick Matzke

unread,
Mar 26, 2014, 11:45:32 AM3/26/14
to bioge...@googlegroups.com
Update: I got your files, it looks like they have the
spacing/tabs correct, I will take a look...

Nick

Nick Matzke

unread,
Mar 26, 2014, 1:23:06 PM3/26/14
to bioge...@googlegroups.com
Hi again,

So, the basic issue with the files was just that they need a
blank line after the END line.

After I fixed that, there was still a crash, as the
areas_allowed option was really designed for time-stratified
analysis, rather than non-stratified analysis. (I admit I
probably did not document this anywhere.)

- The way I programmed non-stratified analysis, if certain
combinations of areas are disallowed, you would manually
remove them from the BioGeoBEARS_run_object$states_list
before analysis, rather than use the areas_allowed matrix.

- In a time-stratified analysis, the allowed ranges are
changing in different time periods, according to the
areas-allowed matrix.

This means that, to use the areas-allowed matrix file for a
single timeperiod, all you have to do is set up a "fake"
time-stratified analysis, with a timeperiods.txt file, and
with the same matrix repeated twice in the areas_allowed
file as well as any other files (e.g. the dispersal
multipliers file).

I'll send the exact files to Mariana.

Cheers!
Nick

Nick Matzke

unread,
Sep 4, 2014, 3:47:20 PM9/4/14
to bioge...@googlegroups.com
Hi all,

If you want to replicate the "adjacent areas" feature of Python LAGRANGE DEC -- that is, only allow ranges in the state space that consist of adjacent areas which are specified in an adjacent areas matrix, which (as Mariana figured out) isn't quite the same thing as the areas_allowed feature -- here is how. 

I have confirmed that this gets the same likelihoods and parameter inferences as a Python Lagrange DEC analysis. 

For now, this only works for non-time-stratified analysis.  If anyone has a pressing need for the time-stratified version of this I can push it higher in my list of revisions.

Modification to example script to exclude non-adjacent ranges from the state space, non-stratified analysis:
=================
# Let's modify the states list to match the adjacency matrix 
# from Mariana's Python LAGRANGE analysis


# area_adjacency': [[1, 1, 1, 0, 1, 0],
#                     [1, 1, 1, 0, 1, 1],
#                     [1, 1, 1, 1, 1, 0],
#                     [0, 0, 1, 1, 1, 0],
#                     [1, 1, 1, 1, 1, 1],
#                     [0, 1, 0, 0, 1, 1]]
#                     

area_adjacency_matrix = matrix(data=
 c(1, 1, 1, 0, 1, 0,
1, 1, 1, 0, 1, 1,
1, 1, 1, 1, 1, 0,
0, 0, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1,
0, 1, 0, 0, 1, 1), nrow=6, ncol=6, byrow=TRUE)

# BioGeoBEARS_run_object$states_list
areas = names(tipranges@df)
states_list = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=TRUE)
states_list

# Set up list of TRUEs
numstates = length(states_list)
states_to_keep = rep(TRUE, times=numstates)
states_to_keep

# Delete states that are not adjacent in the adjacency matrix
# Skip the first state, if it's null
if ( is.na(states_list[[1]]) || states_list[[1]] == "_" )
{
start_i = 2
} else {
start_i = 1
}

for (i in start_i:length(states_list))
{
areas_in_this_state_range_0based = states_list[[i]]
areas_in_this_state_range_1based = areas_in_this_state_range_0based + 1
adjacent_if_1s = area_adjacency_matrix[areas_in_this_state_range_1based, areas_in_this_state_range_1based]
print(adjacent_if_1s)
if ( length(adjacent_if_1s) == sum(adjacent_if_1s) )
{
states_to_keep[i] = TRUE
} else {
states_to_keep[i] = FALSE
}
}

# Modify the states_list
states_list
states_list[states_to_keep]

states_list = states_list[states_to_keep]

# Put the new states_list into the BioGeoBEARS_run_object
BioGeoBEARS_run_object$states_list = states_list

# For a slow analysis, run once, then set runslow=FALSE to just 
# load the saved result.
runslow = TRUE
resfn = "Mariana_DEC_M1_unconstrained_v1.Rdata"
if (runslow)
    {
    res = bears_optim_run(BioGeoBEARS_run_object)
    res    

    save(res, file=resfn)
    resDEC = res
    } else {
    # Loads to "res"
    load(resfn)
    resDEC = res
    }

=================



Python Lagrange results:
===============
Global ML at root node:
  -lnL = 62.44
  dispersal = 0.01835
  extinction = 2.496e-09
===============

BioGeoBEARS DEC results:
===============
This is the output from optim or optimx. Check the help on those functions to
interpret this output and check for convergence issues:

               p1          p2     value fevals gevals niter convcode  kkt1 kkt2 xtimes
bobyqa 0.01834769 5.66894e-09 -62.43986     32     NA    NA        0 FALSE TRUE  2.021

===============

Cheers!
Nick


PS: Discussion with Mariana below, sans data -- note that Mariana's "M1" model isn't the same idea as the Ree & Smith (2007) or Matzke (2014) M1 models -- Mariana's "M1" is DEC, plus manual dispersal multipliers, plus area_adjacency matrix.




---------- Forwarded message ----------
From: Nick Matzke <mat...@nimbios.org>
Date: Thu, Sep 4, 2014 at 3:38 PM
Subject: Re: [BioGeoBEARS] error in dispersal multipliers and areas allowed matrix
To: Mariana Vasconcellos <mari...@utexas.edu>


Hi Mariana,

I was able to get the same likelihoods and parameter inferences in DEC by manually modifying the states_list to remove non-adjacent states. This strategy won't work for a time-stratified analysis unless I update the code (I've now got it on the list to add an area_adjacency file to replicate this aspect of LAGRANGE, which you've shown isn't quite the same as areas_allowed).  But it works fine for the six standard models with these changes to the script...see zipfile. I'll post the basics to the google group as well.

Cheers!
Nick



On Wed, Sep 3, 2014 at 11:04 AM, Mariana Vasconcellos <mari...@utexas.edu> wrote:
Hi Nick:
Thanks for your reply. I haven’t figure out this yet and I would like to use the DEC M1 in my paper. I was wondering if there is another way to implement the M1 model.
Thank you very much! And, congratulations on your paper!

----------------------
Mariana Mira Vasconcellos
PhD candidate
Ecology, Evolution & Behavior
Integrative Biology
The University of Texas at Austin




On Sep 2, 2014, at 10:52 PM, Nick Matzke <mat...@nimbios.org> wrote:

Hi,

I'm finally getting back to Knoxville tomorrow. Briefly, it might be an optimization issue rather than a difference in the calculations -- the way to check is to force the same parameters in BioGeoBEARS DEC and see if that likelihood matches LAGRANGE DEC.  In other tests I've run the M1 models are the same. 

If you haven't figured it out I can take a look on Thurs. or Friday...


On Mon, Aug 25, 2014 at 8:07 PM, Mariana Vasconcellos <mari...@utexas.edu> wrote:
Hi Nick:

I posted this question in March and I thought I had it solved until recently when I realized that the likelihood values of the dispersal constrained model (M1) is different between LAGRANGE and BioGeoBEARS (DEC). 

Just to remind you of my problem: I wanted to run a model excluding some non-adjacent ranges and also including a dispersal multipliers matrix. You suggested to run a “fake” time-stratified model where the two time periods would have the same dispersal matrix and areas allowed. The problem is that running the model this way, the likelihood value differs from the LAGRANGE original run, so I think the model is not the same. This doesn’t happen in the unconstrained model where both LAGRANGE and BioGeoBEARS have the same likelihood values.

I am sending my input files from the BioGeoBEARS run and also my output with the same model using LAGRANGE non-stratified model. The likelihood using two maximum ranges is -63.83 for BioGeoBEARS and -62.44 for LAGRANGE. This makes me think that the “fake” time-stratified model is not the same as a non-stratified one. Since I am using the likelihood values to test hypothesis, I would like to get it correctly. 

Could you help me to figure out how to run a constrained dispersal model without using a stratified model since this is biasing the results somehow?

Feel free to post your answer in the Google groups. I am sending you the e-mail here because I want to give you my input files so you understand my constrained model.

Thank you very much for you help!

 









----------------------
Mariana Mira Vasconcellos
PhD candidate
Ecology, Evolution & Behavior
Integrative Biology
The University of Texas at Austin




On Mar 26, 2014, at 12:25 PM, Nick Matzke <mat...@nimbios.org> wrote:

Hi Mariana,

I got it to work with the following.  Let me know if it "looks right" as you work with it, I haven't tested the areas_allowed stuff extensively beyond "sinking" single islands back in time.

Cheers!
Nick
<Hypsiboas_DEC_vs_DECj_M1_constrained_v1.pdf><Hypsiboas_DEC_M1_constrained_v1.Rdata><Hypsiboas_DECJ_M1_constrained_v1.Rdata><BioGeoBEARS_H_pulchellus_NM1.R><INPUTdispersal_multipliers_M1_NMv1.txt><INPUTareas_allowed_M1_NM1_v1.txt><timeperiods.txt>






Mariana Vasconcellos

unread,
Sep 9, 2014, 3:10:48 PM9/9/14
to bioge...@googlegroups.com, mat...@nimbios.org
Hi Nick:
This script for adjacency areas matrix worked well when the max_range_size = 2, however when I ran with 3 max_range_size the states_list is not accurate. It is removing more areas than it should. For example, areas AF are non-adjacent, but AB and BF are. So, in the 3 max_range scenario the new range ABF should not be removed, because considering the 3 areas together they are adjacent. But, currently the script eliminates this range possibility. I feel like this part of the code should be modified to include possibilities with 3 or more max_range_size. But, I don't know how this could be done.

if ( length(adjacent_if_1s) == sum(adjacent_if_1s) )

{

states_to_keep[i] = TRUE

} else {

states_to_keep[i] = FALSE

}

Thank you for all your help! And, sorry to bother again!
Mariana

Nick Matzke

unread,
Sep 9, 2014, 11:29:42 PM9/9/14
to bioge...@googlegroups.com
Hi Mariana, 

You ask an interesting question, but you are right that a solution isn't necessarily simple. I suppose if we declare that the adjacency matrix represents only the pairwise relationships between areas and not the joint areas that are allowed we can write an algorithm that works as you suggest. I'll think about it a bit.

However, for the moment you can of course just manually edit states_list as you like.  E.g. if you have 10 states, to remove state 5 do this:

TF = rep(TRUE, 10)
TF[5] = FALSE
new_states_list = states_list[TF]

Then put the new_states_list into BioGeoBEARS_run_object$states_list

...this strategy lets users customize the list of possible ranges however they see fit.

Cheers!
Nick



To unsubscribe from this group and stop receiving emails from it, send an email to biogeobears...@googlegroups.com.
To post to this group, send email to bioge...@googlegroups.com.

Robin van Velzen

unread,
Oct 13, 2016, 6:44:27 PM10/13/16
to BioGeoBEARS, mat...@nimbios.org

Dear Nick, 

I am trying to setup an analysis excluding ranges of non-adjacent regions, following your various suggestions and example scripts. But I am getting an error or R is crashing and I am not sure how to solve the issues. Using my own data I get the following error:

Your geography file has a tip range which is not allowed in your specified state space (states_list).
The tip is: XXXXXX
The range is: 0

Please find below this message a minimal script based on the example data (this does not generate the error message but simply crashes R...).

Any help or suggestion would be very much appreciated! 

Thanks, Robin


scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))
trfn = "Psychotria_5.2.newick"
geogfn = "Psychotria_geog.data"

# Look at your geographic range data:
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
areas = getareas_from_tipranges_object(tipranges)

max_range_size = 2

#get states (ranges) list 
states_list_0based = rcpp_areas_list_to_states_list(areas=areas, maxareas=max_range_size, include_null_range=TRUE)
ranges_list = NULL
for (i in 1:length(states_list_0based))
{    
  if ( (length(states_list_0based[[i]]) == 1) && (is.na(states_list_0based[[i]])) )
  {
    tmprange = "_"
  } else {
    tmprange = paste(areas[states_list_0based[[i]]+1], collapse="")
  }
  ranges_list = c(ranges_list, tmprange)
}
ranges_list

#exclude non ranges of non-adjacent regions
nonadjacent=c("KM","OM")
ranges_list<-ranges_list[ranges_list %in% nonadjacent ==F]

# Initiate analysis run object
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$min_branchlength = 0.000001    # Min to treat tip as a direct ancestor (no speciation event)
BioGeoBEARS_run_object$include_null_range = TRUE    # set to FALSE for e.g. DEC* model, DEC*+J, etc.
BioGeoBEARS_run_object$speedup = TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE     # if FALSE, use optim() instead of optimx()
BioGeoBEARS_run_object$num_cores_to_use = 1
BioGeoBEARS_run_object$force_sparse = FALSE    # force_sparse=TRUE causes pathology & isn't much faster at this scale
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE    # get ancestral states from optim run

# Apply custom ranges list THIS CRASHES R
BioGeoBEARS_run_object$states_list = ranges_list 

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

res = bears_optim_run(BioGeoBEARS_run_object) 
Reply all
Reply to author
Forward
0 new messages