Answers to some questions:
On 1/22/14 11:04 PM, Alex Pyron wrote:
> Hi Nick,
>
>
> If you'd prefer me to move this to the group instead of
> bugging you directly, please let me know. I tried to find
> the answers in the current documentation; apologies if I
> overlooked something. A few questions:
>
> 1) It seems that you *must* have the max number of areas
> equal to that observed in the extant taxa. Is this the case?
> For me it fails when the max is greater.
Theoretically, if your geography data is e.g.:
3309 8 (A B C D E F G H)
sp1 11000000
sp2 00000010
...etc...
It should work for anything from 2 to 8 areas (or,
hopefully, 1 to 8 areas, but I haven't extensively tested 1
area, which might require re-parameterization).
The number of states in the transition matrix is much bigger
for max_range_size = 8:
numstates_from_numareas(8,2)
36
numstates_from_numareas(8,8)
255
...so perhaps you are hitting some memory issue with the
larger matrix? I wouldn't necessarily think so, (3309
species + 3308 internal nodes) * 255 possible states isn't
THAT big a data table (1687335 cells).
In the Psychotria example, you can see that the default
allows species to live in up to 4 areas, even though each
individual species is observed in only 1 area.
Sometimes, particular configurations of allowed areas and
constraints can result in 0 likelihood (-Inf log-likelihood)
for certain parameter values, which can crash the ML search.
The solution is usually to tweak the min/max values for the
parameters, and/or change 0 to e.g. 0.0000001 in e.g. the
probability multiplier constraint matrix.
> 2) How exactly do we interpret the node/branch labels? At
> first I thought that the node simply showed the two states
> descended from that node, and that the label on the base of
> the branch was the state for that branch. However, I see
> some cases in which the label on a node is not simply the
> upper and lower states, and is different from the label at
> the base of the branch. What changes are possible along
> branches and at nodes?
So, the labels/pie charts AT the nodes represent the states
(states = geographic ranges) at the branch TOPS,
infinitesimally before the "instantaneous" speciation event.
The labels/pie charts at the "corners" on the phylogeny,
i.e. the branch bottoms, represent the states at the branch
BOTTOMS, i.e. infinitessimally AFTER the "instantaneous"
speciation event.
These are distinguished, because in the
LAGRANGE/BioGeoBEARS-type biogeography models, unlike most
evolutionary models, special processes happen
instantaneously at speciation.
Some versions/settings of LAGRANGE seem to report the joint
probabilities of (ancestor, left descendant, right
descendant) range inheritance scenarios. I didn't do this,
because it is harder to keep track of, and harder to parse
afterwards, and I'm not sure what it accomplishes, since
most people care mostly about the states.
BioGeoBEARS by default reports ancestral state probabilities
under the globally optimum model. This is different than
what LAGRANGE does (local optimization). Local optimization
can be done in BioGeoBEARS, but you have to write a for-loop
and use the fixnode option.
These are both different than finding the "single best joint
history". For the different types of ancestral states, see
the Matzke 2013 Ph.D. thesis and Felsenstein (2004).
The upshot is that the best states at each node in
BioGeoBEARS or LAGRANGE cannot always be read exactly as a
consistent joint history. However, finding the single best
joint history is nontrivial, and even if found, it might
have a tiny probability compared to all other possible
histories.
For more, see:
1. BIOGEOBEARS VS LAGRANGE ANCESTRAL STATES
https://groups.google.com/forum/#!searchin/biogeobears/ancestral$20state/biogeobears/LlbN1jW8w5E/TxqZZuX6d3sJ
> 3) Is there a simple way to get a LAGRANGE-like output,
> where the potential states are listed for every node, ranked
> by relative probability (or likelihood, or AIC weight,
> whatever)? Presumably this will be needed for publication,
> to report exact values.
Yes, see:
2. NODE NUMBERS IN R / BIOGEOBEARS FOR ANCESTRAL STATES
https://groups.google.com/forum/#!searchin/biogeobears/ancestral$20state/biogeobears/LlbN1jW8w5E/TxqZZuX6d3sJ
E.g.:
resDEC$ML_marginal_prob_each_state_at_branch_top_AT_node
In this table, the rows correspond to the APE node numbers,
and the columns correspond to each possible ancestral
state=each possible geographic range.
The plot_BioGeoBEARS_results() function just extracts these
probabilities from the resDEC object (or resDECj or
whatever) to plot either the best single state, or the pie
charts of all states.
IMPORTANT NOTES:
A. Phylogeny node numbers are ALL DIFFERENT (sigh!) in the
output of:
APE/BioGeoBEARS
DIVA
Python LAGRANGE
C++ LAGRANGE.
See get_lagrange_nodenums() and
postorder_nodes_phylo4_return_table().
B. WHEN THERE ARE TIES FOR THE MOST PROBABLE STATE, THE
*FIRST* STATE (leftmost column in e.g.
resDEC$ML_marginal_prob_each_state_at_branch_top_AT_node)
IS CHOSEN FOR PLOTTING. This can make some history plots
look odd.
This decision was made to avoid crashes that were occuring
when there were ties. Look at the pie charts to see when
this is the case.
> 4) Where are the control parameters for plotting the
> phylogeny? I need to reduce the tip and branch labels, and
> pie charts. I also need to make a longer PDF file, but it
> fails when I set height over 200.
As you can see from the example:
res1 = plot_BioGeoBEARS_results(results_object,
analysis_titletxt, addl_params=list("j"), plotwhat="text",
label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6,
titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir,
include_null_range=TRUE, tr=tr, tipranges=tipranges)
...the control paramters are the "...cex" parameters in
plot_BioGeoBEARS_results(). cex refers character expansion,
as with other R plotting (I use it for the pie charts etc. too).
If I didn't cover everything, you might have to make a
custom version of plot_BioGeoBEARS_results(), call it
plot_BioGeoBEARS_results2(), and source it.
plot_BioGeoBEARS_results() mostly just accesses the standard
APE plotting functions, so it shouldn't be too hard to
figure out.
> I also need to make a longer PDF file, but it
> fails when I set height over 200.
You have a huge tree, so there might not be an easy solution
to getting it all to look nice. With some R skillz the tree
could be split up, as well as the
resDEC$ML_marginal_prob_each_state_at_branch_top_AT_node;
this would have to track the original versus new node numbers.
Re: max number of inches in the PDF. Did you increase the
width as well? There might be some absolute limit
regardless? I hear there are alternatives to pdf() for
plotting, but I can't remember what they are. You can also
comment out #pdf() and thus plot to screen, and then save to
whatever format.
> 5) How would you philosophically interpret the
> reconstruction of single areas from time periods where they
> were accreted? For instance, I'm getting firm "Nearctic"
> reconstructions with multiple "dispersals" into the
> Palearctic prior to the breakup of Laurasia.
The interpretation would depend on the other constraints, if
any...BioGeoBEARS by default just thinks of each individual
area as a 1 or a 0, occupied or not, with all areas
"equidistant"/equally connected, unless you've given it
information about connectivity or distance.
I'm happy to take a look if you like.
> All of these questions pertain to the settings as given in
> the out-of-the-box /Psychotria/ example, adapted to my data.
> I've got it running for 8 areas (2 max) with 3309 taxa.
> Thanks!!! I love BioGeoBEARS!!!
Awesome! Glad it's basically working!!
Cheers, Nick
>
> Alex
>
>
> R. Alexander Pyron, Ph.D.
> Robert F. Griggs Assistant Professor of Biology
> Department of Biological Sciences
> The George Washington University
> 2023 G St. NW, Lisner Hall 345
> Washington, D.C. 20052
> Phone:
202-994-6616
>
http://www.colubroid.org/
>
>
> On Mon, Jan 20, 2014 at 8:34 PM, Alex Pyron
> <
rpy...@colubroid.org <mailto:
rpy...@colubroid.org>> wrote:
>
> Hi Nick,
>
>
> I had pretty much surmised what you state is the case,
> re: merging areas. I will definitely experiment.
>
> These datasets I have (all amphibians and all squamates)
> definitely cry out for a full-scale analysis with many
> areas, and a revised version of how we conceive of
> biogeographic events. I will run my amphibian analyses
> within the current constraints of BioGeoBEARS, but I'd
> certainly be interested in a collaboration to do
> something better, if you'd like to use these datasets.
> Certainly, if we developed a new model with
> time-stratified area-merging and differential vicariance
> probabilities, that would be a worthwhile project.
>
> Good luck with jobs! I've noticed that the market has
> actually not been too bad for people with sophisticated
> research programs in ecology and evolution lately.
>
>
> Alex
>
> R. Alexander Pyron, Ph.D.
> Robert F. Griggs Assistant Professor of Biology
> Department of Biological Sciences
> The George Washington University
> 2023 G St. NW, Lisner Hall 345
> Washington, D.C. 20052
> Phone:
202-994-6616 <tel:
202-994-6616>
>
http://www.colubroid.org/
>
>
> On Mon, Jan 20, 2014 at 8:24 PM, Nick Matzke
> <
mat...@berkeley.edu <mailto:
mat...@berkeley.edu>> wrote:
>
>
>
> On 1/20/14 7:56 PM, Alex Pyron wrote:
>
> Nick,
>
>
> Also, have you devised a cogent way to define
> area-connectivity matrices in the past? In the
> /Psychotria
> /example, it's clear we want to eliminate
> islands from
>
> consideration before they emerged, but what
> about the
> opposite? I'd like to account for the
> re-accretion of
> Gondwanaland as we move back in time, so that at
> some point
> in the past, South America as an area is
> connected to Africa
> as an area, for example. Ideally, it would seem
> like we
> would want this to be a new area (with a reduced
> matrix)
> rather than just SA+AF. Even if we just do the
> latter, how
> do we account for their connection? Just by
> giving them a
> higher dispersal multiplier in the relevant
> time-slice?
> Another good question for Google Groups!
>
> Alex
>
>
>
>
>
> Hi again!
>
> So, this Google Groups link:
>
>
https://groups.google.com/__forum/#!topic/biogeobears/__XXaJqmI232o
> <
https://groups.google.com/forum/#!topic/biogeobears/XXaJqmI232o>
>
> ...describes what currently exists in terms of
> specifying dispersal multipliers, distance-based
> dispersal exponent, area connectivity, and areas
> allowed; all of these can be time-stratified.
>
> Re: merging areas back in time -- a number of people
> have talked about this; however, officially merging
> areas back in time would require some fundamental
> re-writes of the logic of the calculations.
> Basically, you would have different state-spaces
> in different time-slices, and there would have to be
> some machinery to split them, and to link the
> likelihoods between the different state spaces.
> Doing such a re-write is imaginable but it would
> be a paper's worth of work!
>
> Thinking about it now, a much easier solution would
> just be to e.g. pretend that Madagascar didn't exist
> before the time when it splits from Africa, and then
> just remember in our heads that "Africa" means
> "Africa+Madagascar" in previous time slices, and
> re-specify distances and connectivity accordingly.
> Not sure if this is "perfect", it would take some
> carefuly thinking. Probably it would work best if
> there was an intermediate time-slice with high
> connectivity, to make sure the Madgascar guys can
> get over to Africa before Madagascar "disappears".
> This may lose the point of merging the areas, though.
>
> One could also imagine doing different analyses with
> different time slices, and manually feeding the
> results down. This is kind of suboptimal though as
> presumably one would prefer to have a joint analysis.
>
> More important than all of the above, I would think,
> would be to come up with a better model for
> vicariance. Currently, the BioGeoBEARS vicariance
> model is just copied from LAGRANGE and "DIVALIKE"
> (likelihood implementation of DIVA's processes, not
> the parsimony part of DIVA though). These assume
> that all vicariance events that are possible under
> the model have equal conditional probability. But,
> that's a pretty silly model, especially when the
> dispersal and jump dispersal models allow an
> influence of distance, connectivity, etc.
>
> Programming an improved model could probably get
> done in a day or two, but I've got a stack of things
> to do as I'm jamming to get stuff published and
> apply for jobs this summer/fall! But I'm happy to
> give advice on the current BioGeoBEARS, or if we
> really get into it, especially with new models,
> perhaps we could consider collaboration.
>
> Cheers!
> Nick
>
> PS: Here's the semi-private link to the
> articles/submissions describing DEC+J etc.:
>
>
https://googledrive.com/host/__0B6HJElw5txrwZlhjQ0ZUcEdzUG8/__2013-08-10_matzke_PhD.html
> <
https://googledrive.com/host/0B6HJElw5txrwZlhjQ0ZUcEdzUG8/2013-08-10_matzke_PhD.html>
>
> Cheers!
> Nick
>
>
>
>
>
> R. Alexander Pyron, Ph.D.
> Robert F. Griggs Assistant Professor of Biology
> Department of Biological Sciences
> The George Washington University
> 2023 G St. NW, Lisner Hall 345
> Washington, D.C. 20052
> Phone:
202-994-6616 <tel:
202-994-6616>
>
http://www.colubroid.org/
>
>
> On Mon, Jan 20, 2014 at 7:35 PM, Alex Pyron
> <
rpy...@colubroid.org
> <mailto:
rpy...@colubroid.org>
> <mailto:
rpy...@colubroid.org
> <mailto:
rpy...@colubroid.org>>> wrote:
>
> Hi Nick,
>
>
> Yes, please, let's share the result on
> Google! That
> didn't occur to me before emailing you or I
> would have
> done it straight away. I will experiment
> with these
> suggestions, as well as with fewer areas.
> However, it's
> the fine subdivision of areas that I hope
> will yield the
> interesting results (e.g., separating
> Madagascar from
> Africa). If nothing else, I could run
> multiple models
> with different areas combined to address
> different parts
> of the tree.
>
>
> Alex
>
> R. Alexander Pyron, Ph.D.
> Robert F. Griggs Assistant Professor of Biology
> Department of Biological Sciences
> The George Washington University
> 2023 G St. NW, Lisner Hall 345
> Washington, D.C. 20052
> Phone:
202-994-6616 <tel:
202-994-6616>
> <tel:
202-994-6616 <tel:
202-994-6616>>
>
>
http://www.colubroid.org/
>
>
> On Mon, Jan 20, 2014 at 7:28 PM, Nick Matzke
> <
mat...@berkeley.edu
> <mailto:
mat...@berkeley.edu>
> <mailto:
mat...@berkeley.edu
> On 1/20/14 7:07 PM, Alex Pyron wrote:
>
> Hi Nicholas,
>
>
> I would really like to use
> BioGeoBEARS for a
> large-scale
> dataset, and evaluate the
> alternative models
> (especially
> founder-event speciation). My
> phylogeny (all
> amphibian
> lineages) has 3309 taxa, divided
> R. Alexander Pyron, Ph.D.
> Robert F. Griggs Assistant
> Professor of Biology
> Department of Biological Sciences
> The George Washington University
> 2023 G St. NW, Lisner Hall 345
> Washington, D.C. 20052
> Phone:
202-994-6616
> <tel:
202-994-6616> <tel:
202-994-6616
> <tel:
202-994-6616>>
>
http://www.colubroid.org/
>
>
>
>
>
>
>