"Areas allowed", "distances matrix" and "adjacency matrix"

1,438 views
Skip to first unread message

Sara Ceccarelli

unread,
Aug 5, 2014, 1:23:30 PM8/5/14
to bioge...@googlegroups.com
Hello Nick,

A quick question just to clarify the options for geography input files in the stratified analyses. As I understand it, the "areas allowed" file for BGB tells the program whether an area existed or not during distinct time periods and is therefore not the same as Lagrange's "area adjacency matrix", correct? Would the "area adjacency" in this case be inferred through the "distances matrix" file by BGB (i.e. areas bordering each other would have a distance of 0)?

Thanks!

Nick Matzke

unread,
Aug 6, 2014, 12:22:31 AM8/6/14
to bioge...@googlegroups.com
Hi Sara,

I believe, although I don't remember for sure, that the Lagrange area adjacency matrix defines allowed combinations of areas. Ah yes, from Ree's Lagrange Configurator:

===============
In the "adjacency matrix" below, for each pair of areas ask whether a single species could plausibly inhabit those two areas and no others. In "maximum range size" set the maximum number of areas permitted by ancestors. Then, click "Recalculate possible ranges" to generate a list of valid states in the model. This list can be further customized by excluding particular ranges. 

Why reduce the set of allowed ranges? First, some ranges may not make emprirical sense, based on biology and the spatial configuration of areas. Second, it makes the analysis run faster. If your analysis includes >400 or so possible ranges, expect to wait days or weeks for it to finish! 

Caution: reducing the list of allowed ranges by these three means (adjacency, maximum size, and manual selection) may result in some ranges becoming "disconnected" and unreachable from any other range in the model by discrete events of disperal and local extinction. This is often manifested in Lagrange runs as convergence errors.
===============


The admittedly sparse help in BioGeoBEARS says:

=========================
?read_areas_allowed_fn

Read in the area areas by time

Description

areas_allowed file is just a list of 1/0 matrices, separated by blank lines, from youngest to oldest. Ones represent allowed combinations of areas.
=========================


So the "ones" specify allowed which areas are allowed to be in the same range as the respective diagonal. E.g., 

======================
K O M H
1 1 0 0
1 1 1 0
0 1 1 1
0 0 1 1
======================

...would mean that the allowed ranges in the state space are K, O, M, H, KO, OM, MH.  This is close to the situation where max_range_size=2, but as you can see, ranges like KH are disallowed.

So, areas_allowed works a lot like the adjacency matrix in Lagrange, except that you can also totally remove areas by setting the diagonal to zero.  E.g., for a time-stratified Hawaii analysis, areas_allowed.txt could be:

======================
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1

K O M H
1 1 1 0
1 1 1 0
1 1 1 0
0 0 0 0

K O M H
1 1 0 0
1 1 0 0
0 0 0 0
0 0 0 0

K O M H
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

K O M H
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0

END
======================

...representing the sinking of islands as we travel back in time.  As far as I can tell, one cannot do this in Lagrange.  Restricting dispersal isn't quite the same thing, because it does not disallow the possibility of long-term persistence on an island.

Nick

PS: I have been meaning to revisit the areas_allowed calculation. At the moment, it works in a stratified analysis by zeroing out the likelihood for disallowed states, when doing the calculations at nodes. Note that this is not quite the same thing as disallowing those states in transitions along the branches. You can disallow those transitions by setting the relevant dispersal events to probability zero with the dispersal_multipliers matrix.  

The Hawaii example was the context in which I thought this up in the first place, when I notice that just having the dispersal_multipliers matrix didn't actually exclude the possibility of the ancestor of Psychotria living on the Big Island (age: 0.5 Ma) back at 5.2 mya.  So, I wouldn't use areas_allowed without also specifying the dispersal_multipliers matrix.  

For non-stratified analyses, it is probably preferable to limit the state space directly, either by cutting out areas you aren't using, setting the maximum range size, and/or manually editing the states_list_0based to exclude certain ranges from the state space. Although it may make little difference.

Ideally, rather than the multiplying-the-likelihood-of-disallowed-ranges-by-0-at-the-nodes strategy, I should just literally have different state spaces and different transition matrices in each different time period of a time-stratified analysis. However, this was rather more complex to program, when it was an afterthought in the first place.  It's also a bit complex as the downpass and uppass may have to be treated differently, I think. I did one test to compare the downpass likelihood back when I programmed this, and the two strategies came out the same, but thinking about it now this might not have covered all cases. Hopefully I will be able to add the change-the-state-space-directly strategy at some point this fall, but it may be a bit given everything else that is going on. I suspect the difference it will make would be minimal anyway...

Cheers!
Nick















--
You received this message because you are subscribed to the Google Groups "BioGeoBEARS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biogeobears...@googlegroups.com.
To post to this group, send email to bioge...@googlegroups.com.
Visit this group at http://groups.google.com/group/biogeobears.
To view this discussion on the web visit https://groups.google.com/d/msgid/biogeobears/fe1c190d-9347-4f94-8337-71887bf9fc35%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Message has been deleted

Sara Ceccarelli

unread,
Aug 6, 2014, 5:04:49 PM8/6/14
to bioge...@googlegroups.com, mat...@nimbios.org
Hi Nick,

Thanks for your reply. OK, so the areas_allowed file is more informative than I thought, that's great! So in the example you give:



======================
K O M H
1 1 0 0
1 1 1 0
0 1 1 1
0 0 1 1
======================

If the max_range_size is set to 3, would that allow for a range KOM even if KM is disallowed?

Cheers,

Sara

Nick Matzke

unread,
Aug 13, 2014, 1:31:03 PM8/13/14
to bioge...@googlegroups.com, mat...@nimbios.org


On Wednesday, August 6, 2014 2:04:49 PM UTC-7, Sara Ceccarelli wrote:
Hi Nick,

Thanks for your reply. OK, so the areas_allowed file is more informative than I thought, that's great! So in the example you give:


======================
K O M H
1 1 0 0
1 1 1 0
0 1 1 1
0 0 1 1
======================

If the max_range_size is set to 3, would that allow for a range KOM even if KM is disallowed?

Cheers,

Sara


It looks like the answer is...no -- see test code below.  It's always possible to modify it of course, this areas allowed stuff was initally set up for the emergence-of-islands-in-a-nice-regular-order-like-in-Hawaii case, which might not cover everything.  

Although, it's difficult to think of a situation where KOM should be allowed but KM should not be, since all that is required to go from KOM -> KM is an extirpation in O...

========================
library(BioGeoBEARS)

areas_allowed_fn = "/Dropbox/_njm/__packages/BioGeoBEARS_setup/inst/extdata/examples/Psychotria_M3strat/BGB/areas_allowed.txt"
moref(areas_allowed_fn)
tmpinputs = NULL
tmpinputs$areas_allowed_fn = areas_allowed_fn
list_of_areas_allowed_mats = read_areas_allowed_fn(inputs=tmpinputs)
list_of_areas_allowed_mats
areas_allowed_mat = list_of_areas_allowed_mats[[1]]
areas_allowed_mat
areas = c("K", "O", "M", "H")
states_list_0based_index = rcpp_areas_list_to_states_list(areas)
states_list_0based_index
prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[1]])
prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[2]])
prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[3]])
prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[4]])

areas_allowed_Sara = matrix(data=c(1,1,0,0,1,1,1,0,0,1,1,1,0,0,1,1), ncol=4, byrow=TRUE)
areas_allowed_Sara
prune_states_list(states_list_0based_index, areas_allowed_Sara)
========================

output:
=============

> areas_allowed_fn = "/Dropbox/_njm/__packages/BioGeoBEARS_setup/inst/extdata/examples/Psychotria_M3strat/BGB/areas_allowed.txt"
> moref(areas_allowed_fn)
Read 26 items
> tmpinputs = NULL
> tmpinputs$areas_allowed_fn = areas_allowed_fn
> list_of_areas_allowed_mats = read_areas_allowed_fn(inputs=tmpinputs)
> list_of_areas_allowed_mats
[[1]]
  K O M H
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 1 1 1

[[2]]
  K O M H
1 1 1 1 0
2 1 1 1 0
3 1 1 1 0
4 0 0 0 0

[[3]]
  K O M H
1 1 1 0 0
2 1 1 0 0
3 0 0 0 0
4 0 0 0 0

[[4]]
  K O M H
1 1 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0

[[5]]
  K O M H
1 1 0 0 0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0

> areas_allowed_mat = list_of_areas_allowed_mats[[1]]
> areas_allowed_mat
  K O M H
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 1 1 1
>
> areas = c("K", "O", "M", "H")
> states_list_0based_index = rcpp_areas_list_to_states_list(areas)
> states_list_0based_index
[[1]]
[1] NA

[[2]]
[1] 0

[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 3

[[6]]
[1] 0 1

[[7]]
[1] 0 2

[[8]]
[1] 0 3

[[9]]
[1] 1 2

[[10]]
[1] 1 3

[[11]]
[1] 2 3

[[12]]
[1] 0 1 2

[[13]]
[1] 0 1 3

[[14]]
[1] 0 2 3

[[15]]
[1] 1 2 3

[[16]]
[1] 0 1 2 3

>
> prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[1]])
[[1]]
[1] NA

[[2]]
[1] 0

[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 3

[[6]]
[1] 0 1

[[7]]
[1] 0 2

[[8]]
[1] 0 3

[[9]]
[1] 1 2

[[10]]
[1] 1 3

[[11]]
[1] 2 3

[[12]]
[1] 0 1 2

[[13]]
[1] 0 1 3

[[14]]
[1] 0 2 3

[[15]]
[1] 1 2 3

[[16]]
[1] 0 1 2 3

> prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[2]])
[[1]]
[1] NA

[[2]]
[1] 0

[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 0 1

[[6]]
[1] 0 2

[[7]]
[1] 1 2

[[8]]
[1] 0 1 2

> prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[3]])
[[1]]
[1] NA

[[2]]
[1] 0

[[3]]
[1] 1

[[4]]
[1] 0 1

> prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[4]])
[[1]]
[1] NA

[[2]]
[1] 0

>
> areas_allowed_Sara = matrix(data=c(1,1,0,0,1,1,1,0,0,1,1,1,0,0,1,1), ncol=4, byrow=TRUE)
> areas_allowed_Sara
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    1    1    1    0
[3,]    0    1    1    1
[4,]    0    0    1    1
>
> prune_states_list(states_list_0based_index, areas_allowed_Sara)
[[1]]
[1] NA

[[2]]
[1] 0

[[3]]
[1] 1

[[4]]
[1] 2

[[5]]
[1] 3

[[6]]
[1] 0 1

[[7]]
[1] 1 2

[[8]]
[1] 2 3


=============

Sara Ceccarelli

unread,
Aug 15, 2014, 8:18:28 AM8/15/14
to bioge...@googlegroups.com, mat...@nimbios.org
Thanks for the explanation Nick!

Meanwhile I came to the same conclusion that three-area-ranges are disallowed if even only one of the two-area-ranges is not allowed, after running startified analyses on my own data. As you say, the areas_allowed option was designed for island models, so it is slightly different to Lagrange's "adjacency matrix". In fact Lagrange does allow for a three-area-range even if one of the two-areas option is disallowed. You have a point in that "...all that is required to go from KOM -> KM is an extirpation in O", but wouldn't it also be true in a historical setting that a "linear" range of continuous, bordering (non-island) areas (K-O-M), a widespread ancestor is possible, but the moment the populations become disjunct, only existing in the two extremities of the range (areas K and M), it is allopatric speciation waiting to happen (for example of O is a mountain range that started rising later than the age of the mrca)? In any case I'm very happy with the results :-) And congratulations on the Syst Biol paper!

Cheers,

Sara

Nick Matzke

unread,
Aug 15, 2014, 1:15:03 PM8/15/14
to bioge...@googlegroups.com
Hi Sara -- thanks for digging into this.  What you say about "allopatric speciation waiting to happen" makes sense conceptually, at least assuming no gene flow etc. Getting our conceptual models fully into the computer models is what is sometimes hard!

Probably, in some future update, the ideal setup would be to allow users to specify the complete list of allowed ranges at each stratum...additional suggestions welcome.

Cheers!
Nick



Sara Ceccarelli

unread,
Aug 15, 2014, 2:57:27 PM8/15/14
to bioge...@googlegroups.com, mat...@nimbios.org
Excellent, thanks for all the feedback! Yes, an additional input file as a list of allowed ranges might be a good idea. I realise it isn't as simple as it sounds, so I appreciate you considering it.

Best,

Sara

torsten...@gmail.com

unread,
Dec 7, 2015, 4:13:51 AM12/7/15
to BioGeoBEARS, mat...@nimbios.org
Hi Nick,

hopefully you are doing good in Australia!


I think this topic fits good in this old thread:

In a stratified BioGeoBEARS run I wanted to use an rather complex coding of allowed areas, which I could not express in matrix-form for areas_allowed. Based on the information of your discussion with Sara, I created a list of allowed states for the lists_of_states_lists_0based option. 

Unfortunately, the BioGeoBEARS run ignored this constraint in the stratified analysis.  However, in the function calc_loglike_sp_stratified I could fix this (see the attached script line 260 - 273). 

My question is whether it has been a design-decision not to allow lists_of_states_lists_0based in a stratified analysis? Are there any potential pathologies of this approach? In the discussion with Sara it simply sounds like that in 2014 you were not done yet with the implementation of lists_of_states_lists_0based.

Cheers,
Torsten


# Example code:

library(optimx) 
library(FD)    
library(snow)     
library(parallel)
library(BioGeoBEARS)

calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)
calc_independent_likelihoods_on_each_branch = compiler::cmpfun(calc_independent_likelihoods_on_each_branch_prebyte)

wd = np("~")
setwd(wd)

scriptdir <- np(system.file("extdata/a_scripts", package = "BioGeoBEARS"))
extdata_dir = np(system.file("extdata", package="BioGeoBEARS"))
trfn = np(paste(addslash(extdata_dir), "Psychotria_5.2.newick", sep=""))
geogfn = np(paste(addslash(extdata_dir), "Psychotria_geog.data", sep=""))

# Example of BioGeoBEARS homepage
#################################
BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$force_sparse=FALSE
BioGeoBEARS_run_object$speedup=TRUE
BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$calc_ancprobs=TRUE
BioGeoBEARS_run_object$timesfn = "timeperiods.txt" # Should be in wd
BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt" # Should be in wd
BioGeoBEARS_run_object$max_range_size = 4
BioGeoBEARS_run_object$num_cores_to_use=1
BioGeoBEARS_run_object$force_sparse=FALSE
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$trfn = trfn 
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object$print_optim = FALSE
res = bears_optim_run(BioGeoBEARS_run_object)
Res1Dec <- plot_BioGeoBEARS_results(res, analysis_titletxt = NULL, addl_params=list("j"), 
                                    plotwhat="pie", label.offset=0.7, tipcex=0.6, 
                                    statecex=0.4, splitcex=0.2, titlecex=0.6, plotlegend = FALSE,
                                    plotsplits=TRUE, cornercoords_loc=scriptdir, 
                                    include_null_range=TRUE, 
                                    tr = NULL, tipranges = NULL)

# Using lists_of_states_lists_0based should give the same result but doesn't
################################################################  
areas_allowed_fn = paste0(getwd(), "/areas_allowed.txt")
tmpinputs = NULL
tmpinputs$areas_allowed_fn = areas_allowed_fn
list_of_areas_allowed_mats = read_areas_allowed_fn(inputs=tmpinputs)
areas = c("K", "O", "M", "H")
states_list_0based_index = rcpp_areas_list_to_states_list(areas, maxareas = 4)

L <- list()
L[[1]] <- prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[1]])
L[[2]] <- prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[2]])
L[[3]] <- prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[3]])
L[[4]] <- prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[4]])
L[[5]] <- prune_states_list(states_list_0based_index, list_of_areas_allowed_mats[[5]])

BioGeoBEARS_run_object2 = define_BioGeoBEARS_run()
BioGeoBEARS_run_object2$force_sparse=FALSE
BioGeoBEARS_run_object2$speedup=TRUE
BioGeoBEARS_run_object2$use_optimx = TRUE
BioGeoBEARS_run_object2$calc_ancprobs=TRUE 
BioGeoBEARS_run_object2$timesfn = "timeperiods.txt" # Should be in wd
BioGeoBEARS_run_object2$lists_of_states_lists_0based = L
BioGeoBEARS_run_object2$max_range_size = 4
BioGeoBEARS_run_object2$num_cores_to_use=1
BioGeoBEARS_run_object2$force_sparse=FALSE
BioGeoBEARS_run_object2$geogfn = geogfn
BioGeoBEARS_run_object2$trfn = trfn 
BioGeoBEARS_run_object2 = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object2)
BioGeoBEARS_run_object2 = section_the_tree(inputs=BioGeoBEARS_run_object2, make_master_table=TRUE, plot_pieces=FALSE)
BioGeoBEARS_run_object2$return_condlikes_table = TRUE
BioGeoBEARS_run_object2$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object2$calc_ancprobs = TRUE
BioGeoBEARS_run_object2$print_optim = FALSE # Avoid lnl printing
res2 = bears_optim_run(BioGeoBEARS_run_object2)

Res2Dec <- plot_BioGeoBEARS_results(res2, analysis_titletxt = NULL, addl_params=list("j"), 
                                    plotwhat="pie", label.offset=0.7, tipcex=0.6, 
                                    statecex=0.4, splitcex=0.2, titlecex=0.6, plotlegend = FALSE,
                                    plotsplits=TRUE, cornercoords_loc=scriptdir, 
                                    include_null_range=TRUE, 
                                    tr = NULL, tipranges = NULL)

# Using lists_of_states_lists_0based with modified calc_loglike_sp_stratified gives 
# now the same result than areas_allowed
################################################################
source("calc_loglike_sp_stratified.R") # Should be in wd
res3 = bears_optim_run(BioGeoBEARS_run_object2)

Res3Dec <- plot_BioGeoBEARS_results(res3, analysis_titletxt = NULL, addl_params=list("j"), 
                                    plotwhat="pie", label.offset=0.7, tipcex=0.6, 
                                    statecex=0.4, splitcex=0.2, titlecex=0.6, plotlegend = FALSE,
                                    plotsplits=TRUE, cornercoords_loc=scriptdir, 
                                    include_null_range=TRUE, 
                                    tr = NULL, tipranges = NULL)


On Wednesday, August 6, 2014 at 6:22:31 AM UTC+2, Nick Matzke wrote:
Hi Sara,
.....

PS: I have been meaning to revisit the areas_allowed calculation. At the moment, it works in a stratified analysis by zeroing out the likelihood for disallowed states, when doing the calculations at nodes. Note that this is not quite the same thing as disallowing those states in transitions along the branches. You can disallow those transitions by setting the relevant dispersal events to probability zero with the dispersal_multipliers matrix.  
....
For non-stratified analyses, it is probably preferable to limit the state space directly, either by cutting out areas you aren't using, setting the maximum range size, and/or manually editing the states_list_0based to exclude certain ranges from the state space. Although it may make little difference.

Ideally, rather than the multiplying-the-likelihood-of-disallowed-ranges-by-0-at-the-nodes strategy, I should just literally have different state spaces and different transition matrices in each different time period of a time-stratified analysis. However, this was rather more complex to program, when it was an afterthought in the first place....

Cheers!
Nick




 
calc_loglike_sp_stratified.R

Nick Matzke

unread,
Dec 7, 2015, 8:55:00 AM12/7/15
to bioge...@googlegroups.com
Hi Torsten,

It looks like you've fixed a bug, and/or I never finished that list-of-list-of-states option.  I will update the code when I get a chance (currently traveling), and credit you. Cheers!
Nick


--
You received this message because you are subscribed to the Google Groups "BioGeoBEARS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biogeobears...@googlegroups.com.
To post to this group, send email to bioge...@googlegroups.com.
Visit this group at http://groups.google.com/group/biogeobears.

Pierre Arnal

unread,
May 15, 2018, 3:13:42 AM5/15/18
to BioGeoBEARS
Hi Nick, Torsten and all,

I wish to use an areas allowed matrix and I found this topic which helped me a lot.

However, I could not successfully run the analyse : I still get estimations for the ranges I disallowed.

Using your function on my dataset, Torsten, I get the following error :
Error in calc_loglike_sp(tip_condlikes_of_data_on_each_state = subtree_tip_relative_probs_of_each_state,  : 
  object 'printlevel' not found

Did you get this kind of errors ?

@Nick, do you have implemented the code to your package or should I try to understand what is wrong with my data ?

All the best,

Pierre

Nick Matzke

unread,
May 15, 2018, 4:05:53 AM5/15/18
to bioge...@googlegroups.com
Hi, 

That one is easy, just add this to the code somewhere:

printlevel = 1

Cheers!
Nick


To unsubscribe from this group and stop receiving emails from it, send an email to biogeobears+unsubscribe@googlegroups.com.

To post to this group, send email to bioge...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages