Hi all,
Here I answer a question about the "areas_allowed" and
"dispersal_multipliers" options.
On 11/7/13 4:18 PM, Francisco Velásquez wrote:
> Hi, great, feel free to add me, thank you!
>
>
> On Thu, Nov 7, 2013 at 4:03 PM, Nick Matzke
> <mat...@berkeley.edu <mailto:mat...@berkeley.edu>> wrote:
>
> Hi! Do you mind if I add you to the BioGeoBEARS google
> group and post an answer there? Cheers!
> Nick
>
>
>
> On 11/7/13 4:01 PM, Francisco Velásquez wrote:
>
> Dear Dr. Matzke
>
> I have some further questions regarding the BioGeoBears
> package. These questions are related to the
> "areas_allowed", and "dispersal_multipliers" options.
> From what I have understood the first one refers to a
> file containing connections between areas, however I am
> not quite sure to what does the second option refers to,
> I would really appreciate if you could give a brief
> explanation.
>
> Regarding my last issue, I could finally run the DEC-J
> analysis with the help of your previous advice.
>
> Best regards.
>
> Francisco
>
>
>
>
> --
> Francisco Velásquez
> M.Sc. Ecology, Evolution and Systematics
> Research assistant
> Antonelli Lab
> http://www.antonelli-lab.net/people.php
I. DISPERSAL MULTIPLIERS
The "dispersal_multipliers" input replicates a feature of
LAGRANGE: users can specify the relative probability of
dispersal between each pair of areas, in the form of a
matrix. E.g., perhaps you think that the probability of
dispersal between continents is 1/10th the probability of
dispersal within a continent. You would then put a 1 for
within-continent dispersal events, and a 0.1 for
between-continent dispersal events.
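As a minimal sketch of the continent example, the following base R
code builds such a matrix. The four areas (A, B, C, D) and their
continent assignments are hypothetical, invented only for
illustration; nothing here calls BioGeoBEARS itself.

```r
# Hypothetical areas: A and B on one continent, C and D on another.
areas <- c("A", "B", "C", "D")
continent <- c(A = "west", B = "west", C = "east", D = "east")

# TRUE where the row area and the column area share a continent.
same_continent <- outer(continent, continent, FUN = "==")

# 1 for within-continent dispersal, 0.1 for between-continent dispersal.
multipliers <- ifelse(same_continent, 1, 0.1)
dimnames(multipliers) <- list(areas, areas)

print(multipliers)
```

The values are relative weights, so only their ratios matter here:
between-continent dispersal is weighted at one tenth of
within-continent dispersal.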
The dispersal multipliers files are just text files. Here
is the format for replicating the "M2" model of Ree & Smith
(2008). This was the Hawaiian Psychotria analysis in which
only eastward dispersal was allowed.
(Everything between the ====== lines is the text file; don't
put the ====== lines in the text file!)
manual_dispersal_multipliers_eastward_only_wZeros.txt
==============================
K O M H
1 1 0 0
0 1 1 0
0 0 1 1
0 0 0 1
END
==============================
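If you would rather generate such a file from R than type it by
hand, a sketch along these lines should work. The helper
write_matrices_file() is hypothetical (it is not a BioGeoBEARS
function), and the single-space delimiter is an assumption copied
from the listing above; switch to tabs if your files use them.

```r
# Hypothetical helper (not part of BioGeoBEARS): write one or more
# square matrices in the plain-text layout shown above, each preceded
# by a header line of area names, with a final END line.
write_matrices_file <- function(matrices, areas, path, sep = " ") {
  lines <- character(0)
  for (m in matrices) {
    stopifnot(nrow(m) == length(areas), ncol(m) == length(areas))
    lines <- c(lines, paste(areas, collapse = sep))
    lines <- c(lines, apply(m, 1, paste, collapse = sep))
  }
  writeLines(c(lines, "END"), path)
}

areas <- c("K", "O", "M", "H")

# The eastward-only matrix from the listing above, typed row by row.
eastward <- matrix(c(1, 1, 0, 0,
                     0, 1, 1, 0,
                     0, 0, 1, 1,
                     0, 0, 0, 1),
                   nrow = 4, byrow = TRUE)

# Written to a temporary directory here; use your own path in practice.
path <- file.path(tempdir(),
                  "manual_dispersal_multipliers_eastward_only_wZeros.txt")
write_matrices_file(list(eastward), areas, path)
cat(readLines(path), sep = "\n")
```

Passing a list of several matrices writes them one after another,
which is the layout used for time-stratified files.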
If you run the default example Psychotria dataset with this
dispersal multipliers file, you will reproduce the parameter
estimates and data log-likelihood that you get from running
the M2 model in LAGRANGE (Python or C++).
NOTE #1: When you do a DEC+J version of the model, the SAME
dispersal multipliers get applied to "j" events
(founder-events at speciation) as to "d" events
(range-expansion events along branches). This is not the
only way that reality could work, but it seemed "fairest"
for the initial comparisons of DEC and DEC+J.
II. TIME-STRATIFIED DISPERSAL
You can use the same strategy to replicate the
time-stratified analyses allowed by LAGRANGE. Here, you
just have a series of dispersal matrices in the same file:
manual_dispersal_multipliers_with_0s.txt
==================
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
K O M H
1 1 1 0
1 1 1 0
1 1 1 0
0 0 0 1
K O M H
1 1 0 0
1 1 0 0
0 0 1 0
0 0 0 1
K O M H
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
K O M H
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
END
==================
...and you also have to have a times file:
timeperiods.txt
==================
0.5
1.9
3.7
5.1
10
==================
...these are the time-points at which each of the main
Hawaiian Islands emerged (one time per dispersal matrix,
listed from youngest to oldest). The 10 my is arbitrary; the
last age just has to be older than the bottom of the tree
(5.2 my in the case of Psychotria).
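A quick sanity check on the times can be sketched in base R. The
function check_times() is a hypothetical helper, and the rule that
there must be exactly one time per matrix is an assumption based on
the example above, which pairs five times with five matrices.

```r
# Hypothetical helper: basic consistency checks for a times file.
check_times <- function(times, n_matrices, root_age) {
  c(strictly_increasing = all(diff(times) > 0),
    one_time_per_matrix = length(times) == n_matrices,
    oldest_exceeds_root = max(times) > root_age)
}

times <- c(0.5, 1.9, 3.7, 5.1, 10)  # contents of timeperiods.txt
n_matrices <- 5                     # matrices in the multipliers file
root_age <- 5.2                     # Psychotria tree depth quoted above

checks <- check_times(times, n_matrices, root_age)
print(checks)
```

Any FALSE entry points at a file worth re-checking before starting
the analysis.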
Adding these two files will replicate the "time-stratified"
analysis of Hawaiian Psychotria in Ree & Smith (2008).
(However, as people who have read my SysBio submission know,
I have noticed that there are slight differences in the
results of time-stratified analyses, between C++ LAGRANGE,
2012 Python LAGRANGE, and 2013 Python LAGRANGE (which
contained a bug fix). BioGeoBEARS agrees with 2013 Python
LAGRANGE for the above analysis.)
NOTE #2: To make these analyses work, you have to make sure
you give the BioGeoBEARS_run_object the correct
path+filename of each file you want it to see. You also
have to uncomment some/all of these lines in the example
script, and edit the file names to match whatever your
filenames are:
http://phylo.wikidot.com/biogeobears#toc7
================================
# Set up the stratified part
#BioGeoBEARS_run_object$timesfn = "timeperiods.txt"
#BioGeoBEARS_run_object$dispersal_multipliers_fn = "manual_dispersal_multipliers.txt"
#BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
# Divide the tree up by strata
#BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
================================
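Since a wrong path is an easy mistake to make, it may help to
confirm that R can see each file before assigning it to the run
object. This is a sketch in base R only; the file names are the
placeholders from the script excerpt above and should be replaced
with your own.

```r
# Placeholder file names taken from the script excerpt above.
input_files <- c(timesfn = "timeperiods.txt",
                 dispersal_multipliers_fn = "manual_dispersal_multipliers.txt",
                 areas_allowed_fn = "areas_allowed.txt")

# Relative paths are resolved against the current working directory.
missing_files <- input_files[!file.exists(input_files)]

if (length(missing_files) > 0) {
  message("Not found relative to ", getwd(), ": ",
          paste(missing_files, collapse = ", "))
} else {
  message("All input files found.")
}
```

Giving full paths (or calling setwd() first) removes any dependence
on the working directory.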
NOTE #3: In some cases, I have found that having zeros in
dispersal matrices causes crashes in the ML search, perhaps
because of numerical underflow, or because with certain
data, all possible histories have probability 0.
In these cases, the solution is to replace the zeros with
some extremely low value, e.g.:
manual_dispersal_multipliers_without_0s.txt
=============================
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
K O M H
1 1 1 0.0000001
1 1 1 0.0000001
1 1 1 0.0000001
0.0000001 0.0000001 0.0000001 1
K O M H
1 1 0.0000001 0.0000001
1 1 0.0000001 0.0000001
0.0000001 0.0000001 1 0.0000001
0.0000001 0.0000001 0.0000001 1
K O M H
1 0.0000001 0.0000001 0.0000001
0.0000001 1 0.0000001 0.0000001
0.0000001 0.0000001 1 0.0000001
0.0000001 0.0000001 0.0000001 1
K O M H
1 0.0000001 0.0000001 0.0000001
0.0000001 1 0.0000001 0.0000001
0.0000001 0.0000001 1 0.0000001
0.0000001 0.0000001 0.0000001 1
END
=============================
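The substitution can be automated. The sketch below works on the
text of the file rather than on numbers, so the replacement value is
written exactly as typed. replace_zeros() is a hypothetical helper,
not a BioGeoBEARS function, and it normalizes the delimiter to a
single space (an assumption based on the listings above).

```r
# Hypothetical helper: replace stand-alone "0" entries with a tiny
# value. Header lines (area names) and the END line contain no "0"
# token, so they pass through unchanged.
replace_zeros <- function(lines, tiny = "0.0000001") {
  vapply(lines, function(line) {
    tokens <- strsplit(trimws(line), "[ \t]+")[[1]]
    tokens[tokens == "0"] <- tiny
    paste(tokens, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# One stratum from the file with zeros, as lines of text.
with_zeros <- c("K O M H",
                "1 1 1 0",
                "1 1 1 0",
                "1 1 1 0",
                "0 0 0 1",
                "END")

without_zeros <- replace_zeros(with_zeros)
cat(without_zeros, sep = "\n")

# For a real file, something like:
# writeLines(replace_zeros(readLines("with_0s.txt")), "without_0s.txt")
```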
III. AREAS ALLOWED MATRICES
LAGRANGE (Ree & Smith 2008) addresses the question of
changing geography by changing the dispersal multipliers at
different time points. I.e., it is impossible to disperse to
the Big Island before it emerges at 0.5 Ma.
However, I don't think this fully represents the mental
model that researchers have. Simply disallowing dispersal
before 0.5 Ma does not rule out the possibility, for
instance, that the ancestor of a Hawaiian clade lived on ALL
the islands, even though we know that those islands didn't
exist at all at that time. Since likelihood methods are
calculating the probability of all possible histories, it is
important that we rule out these absurd/impossible histories.
This is most simply done by deleting areas as you go
back in time, and re-calculating the probabilities only
between ranges that are allowed.
You can do this with an areas_allowed file, which is
formatted like this:
areas_allowed.txt
==================
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
K O M H
1 1 1 0
1 1 1 0
1 1 1 0
0 0 0 0
K O M H
1 1 0 0
1 1 0 0
0 0 0 0
0 0 0 0
K O M H
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
K O M H
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
END
==================
(this assumes the same time-points as in the other analyses,
above)
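Each matrix in the areas_allowed.txt listing above has a 1 exactly
where both the row area and the column area exist in that time
period. As a sketch, the same file content can therefore be built in
base R from one presence/absence vector per time period. The
presence vectors below are simply read off the listing; the
single-space delimiter is an assumption.

```r
areas <- c("K", "O", "M", "H")

# One presence (1) / absence (0) vector per time period, youngest first,
# read off the areas_allowed.txt listing above.
present <- list(c(1, 1, 1, 1),
                c(1, 1, 1, 0),
                c(1, 1, 0, 0),
                c(1, 0, 0, 0),
                c(1, 0, 0, 0))

# outer(p, p) is 1 only where both the row and column areas are present.
allowed <- lapply(present, function(p) outer(p, p))

# Assemble the text of the file: header + rows per matrix, then END.
file_lines <- unlist(lapply(allowed, function(m) {
  c(paste(areas, collapse = " "), apply(m, 1, paste, collapse = " "))
}))
file_lines <- c(file_lines, "END")

cat(file_lines, sep = "\n")
# writeLines(file_lines, "areas_allowed.txt")  # uncomment to save
```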
If you wanted to compare directly to LAGRANGE, you would
encode LAGRANGE's implicit assumption, which is that all
areas exist in every time period, even when dispersal to
them is disallowed:
areas_allowed_all1s.txt
=======================
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
K O M H
1 1 1 1
1 1 1 1
1 1 1 1
1 1 1 1
END
=======================
IV. OTHER STUFF
BioGeoBEARS also allows:
(a) dispersal as a function of distance. Procedure: input a
distances file (similar format to above), and make the
distance exponent parameter "free".
Format: same as dispersal multipliers, just put in distances
instead.
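To give a feel for what a distance exponent does, here is a small
numeric sketch. It assumes a power-law form, multiplier =
distance^x, which is consistent with an exponent fixed to 0 having
no effect (every multiplier becomes 1) but is an assumption and not
taken from the text above. The distances are made up, in arbitrary
units.

```r
areas <- c("K", "O", "M", "H")

# Made-up distances in arbitrary units (hypothetical values only).
distances <- matrix(c(1, 2, 3, 4,
                      2, 1, 2, 3,
                      3, 2, 1, 2,
                      4, 3, 2, 1),
                    nrow = 4, byrow = TRUE,
                    dimnames = list(areas, areas))

# Assumed power-law form: multiplier = distance^x.
distance_multipliers <- function(distances, x) distances^x

print(distance_multipliers(distances, 0))   # x = 0: no distance effect
print(distance_multipliers(distances, -1))  # x = -1: inverse distance
```

Under this assumed form, a negative exponent makes dispersal less
likely between more distant areas.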
(b) extinction (really range contraction) as a function of
area size. Procedure: input an areas file, make the relevant
parameter "free".
area_of_areas file format:
area_of_areas.txt
======================
K O M H
10 15 20 25
END
======================
...or you could do time-stratified:
area_of_areas_time.txt
======================
K O M H
10 15 20 25
K O M H
15 15 20 25
K O M H
20 15 20 25
END
======================
I have not extensively tested these last two models. Whether
or not they are statistically significantly better than null
models (where the parameters are fixed to 0) is an empirical
question, which should be tested for any given dataset.
Inferring extinction/range contraction is, in general, very
difficult, and so (b) will probably be difficult unless
you've got lots of fossils in your tree.
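One standard way to run that test is a likelihood-ratio test between
the null model (parameter fixed) and the model with one extra free
parameter. This is a generic sketch with made-up log-likelihoods,
not output from any real analysis, and choosing the LRT here is an
illustration rather than a prescribed procedure.

```r
# Made-up log-likelihoods, for illustration only.
lnL_null <- -35.0  # null model: extra parameter fixed
lnL_alt  <- -32.5  # alternative model: extra parameter free

# Likelihood-ratio statistic, compared to a chi-squared distribution
# with 1 degree of freedom (one additional free parameter).
lrt_stat <- 2 * (lnL_alt - lnL_null)
p_value <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

cat("LRT statistic:", lrt_stat, "\n")
cat("p-value:", signif(p_value, 3), "\n")
```

The test only applies to nested models, which holds when the null
model is the same model with the extra parameter fixed.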
Cheers!
Nick