BEAST for phylogeny estimation


Emily Gillespie

Mar 18, 2010, 5:25:28 PM
to beast-users
Hi all,
A general question: Has anyone run into issues using BEAST simply for
phylogeny estimation? I have compared BEAST runs to MrBayes runs I've
done repeatedly. Results in BEAST, in terms of topology and posterior
support, are virtually identical to MrBayes, but in a fraction of the
time. I can't always use exactly the same model, but assuming that the
data aren't terribly sensitive, can anyone think of a reason to not
use BEAST for this more general task?

Tod Reeder

Mar 18, 2010, 7:10:35 PM
to emilylg...@gmail.com, beast-users
Though the idea has been slow to catch on, Drummond et al. (2006) and Pybus (2006) made compelling arguments for using the relaxed clock approach for phylogenetic inference in general (i.e., not just for estimating molecular divergence times).


Please consider the health of our earth before printing this e-mail.

Tod W. Reeder
Department of Biology
San Diego State University
San Diego, CA  92182-4614

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To post to this group, send email to beast...@googlegroups.com.
To unsubscribe from this group, send email to beast-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beast-users?hl=en.



Gerald Schneeweiss

Mar 19, 2010, 4:27:16 AM
to beast-users
Hi Emily,

in this context, you may want to have a look at: Wertheim et al. 2010:
Relaxed molecular clocks, the bias-variance trade-offs and the quality of
phylogenetic inference. Systematic Biology 59: 1-8.

Cheers,
Gerald

Marcelo Gehara

Mar 19, 2010, 6:51:15 AM
to emilylg...@gmail.com, beast-users
Good point Emily
I already thought about using BEAST to simply estimate phylogeny, and
I often check the tree when I'm using BEAST for dating estimation. But
I've never compared with MrBayes.
Why not use BEAST for phylogeny?
A matter of fashion maybe.

Paul Lewis

Mar 19, 2010, 11:04:46 AM
to beast-users
While BEAST may indeed be faster at calculating likelihoods than
MrBayes, the speed difference could also be due to the fact that, by
default, MrBayes does 2 independent runs and each run employs 4 Markov
chains. Thus, MrBayes can be expected to take about 8 times longer
than BEAST unless you use "mcmcp nruns=1 nchains=1" in MrBayes.

Chris

Mar 19, 2010, 12:09:01 PM
to beast-users
Hi Emily,

You should also consider the underlying models in BEAST... i.e., not
just nucleotide substitution rates/patterns, but also the tree prior
and its relationship to your sampling design. If you are interested in
estimating a phylogeny of species (or reproductively isolated
populations) using multiple individuals and/or multiple loci, then
maybe the *BEAST coalescent model would be most appropriate for your
question?

Best,
Chris

julien

Apr 2, 2010, 1:25:41 PM
to beast-users
Hi Emily,

Beast is obviously more flexible and powerful than MrBayes because of
the many priors available. However, setting priors without a good
knowledge of their effect on the posterior may be risky. That's why I
usually perform first runs without data, to check it out ...
Afterwards, several runs are often still needed to further optimize
operators. Then come the final analyses of several runs.

MrBayes is much simpler to use. Very often you get convergence in
first runs and I would suggest people not willing to further develop
their knowledge of Bayesian analyses to use MrBayes. Simpler is not
better however...

Best

Julien

Brisa

Nov 27, 2013, 10:30:40 AM
to beast...@googlegroups.com
A question:
MrBayes uses MCMCMC, whereas BEAST uses MCMC. Which would you use, and why?

Santiago Sanchez

Nov 27, 2013, 12:10:30 PM
to beast...@googlegroups.com
Hi,

You can also use MCMCMC in BEAST. As for your second question, I guess it depends on what you want. In BEAST, (I think) you have more options in terms of models, distributions, and prior specifications for phylogeny-based evolutionary analyses, including divergence times, molecular clocks, discrete/continuous state reconstruction, population/coalescent models, incomplete lineage sorting (*BEAST), etc.

If you are just interested in topology, probably MrBayes will get you there faster.

Cheers,
Santiago

Santiago Sanchez-Ramirez
Ecology and Evolutionary Biology, University of Toronto
Natural History (Mycology), Royal Ontario Museum
100 Queen's Park
Toronto, ON
M5S 2C6
Canada

skylamb

Nov 28, 2013, 4:48:19 AM
to beast...@googlegroups.com
I also have a question about this. I am running MrBayes and *BEAST, using several mtDNA sequences for around three hundred sequences. My aim is to use *BEAST to calibrate divergence times, so I used the calibrated Yule tree prior. However, when I compare the topology produced by *BEAST with the one from MrBayes, they are quite different, not in the deeply diverged branches but in the shallow ones. Because mtDNA is regarded as a single locus, I don't think incomplete lineage sorting is a problem here. Could anyone please give me a hint? I don't know whether the problem lies in my settings. I set a fossil calibration of 8 million years with a normal distribution, but when I check the annotated tree, the node age is 239. I really hope someone can offer some help with this.


Thanks a lot!

Wang

Andrew Rambaut

Nov 28, 2013, 4:51:49 AM
to beast...@googlegroups.com
A few points - 

If you only have mtDNA then don’t use *BEAST - this is intended for multi locus (not just linked genes). 
You just have one gene tree so that is your best estimate of the species tree.

A normal distribution has a non-zero probability down to zero. If your fossil provides a hard upper (in the 
geological sense) bound then use a lognormal with an offset to this bound.

Andrew



Andrew Rambaut 
Institute for Evolutionary Biology | Centre for Infection, Immunity & Evolution 
Ashworth Laboratories, University of Edinburgh, Edinburgh, EH3 9JT, UK
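
The calibration point above can be illustrated with a short numerical sketch. It assumes the fossil is read as a minimum age of 8 (in millions of years) for the node, and it uses a hypothetical standard deviation of 1 for the normal prior and hypothetical lognormal parameters (mu=0, sigma=1); none of these values come from the thread apart from the 8. This is plain Python rather than BEAST's own prior implementation.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def offset_lognormal_cdf(x, offset, mu, sigma):
    """CDF of offset + LogNormal(mu, sigma); no mass at or below the offset."""
    if x <= offset:
        return 0.0
    return normal_cdf(math.log(x - offset), mu, sigma)

FOSSIL_AGE = 8.0  # from the thread (million years)
NORMAL_SD = 1.0   # hypothetical, for illustration only

# Prior probability that the node is YOUNGER than the fossil.
p_normal = normal_cdf(FOSSIL_AGE, mean=FOSSIL_AGE, sd=NORMAL_SD)
p_lognormal = offset_lognormal_cdf(FOSSIL_AGE, offset=FOSSIL_AGE, mu=0.0, sigma=1.0)

print(f"normal prior:           P(age < fossil) = {p_normal:.2f}")
print(f"offset lognormal prior: P(age < fossil) = {p_lognormal:.2f}")
```

Under these assumptions a normal prior centred on the fossil age puts half of its mass on ages younger than the fossil, while the offset lognormal puts none there, which is why the offset form suits a hard bound.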


skylamb

Nov 28, 2013, 5:06:59 AM
to beast...@googlegroups.com
Hi Andrew,

Thank you for the explanation. I don't intend to get the species tree; I just want the divergence times. I have several species, and one major species has many different lineages, so I think it might be inappropriate to use the Yule prior in BEAST. I am looking for a method that can incorporate both the coalescent process between populations and the speciation process between species. I guess *BEAST can do these two things at the same time, so I am trying to use it. Isn't it applicable for this purpose?


Thanks,
Wang

Alexei Drummond

Nov 28, 2013, 5:07:07 AM
to beast...@googlegroups.com
Hey Andrew,

From a theoretical standpoint, I would beg to differ with you on your first point. In terms of accurate posterior probabilities of clades (i.e. avoiding over-confidence), it can be better to use *BEAST even with only a single gene (assuming the multispecies coalescent model is correct!). I think we even say this in the second BEAST paper (Drummond et al, 2012)? It depends on the ratio of species age to population size and prior settings, but it is even possible for the *BEAST tree to be more accurate than a single gene tree (if you include divergence times and not just topology in your error measure). We have a manuscript soon to be submitted that demonstrates these points as an aside to a larger simulation and performance study.

Cheers
Alexei

Andrew Rambaut

Nov 28, 2013, 5:10:54 AM
to beast...@googlegroups.com
Yes, I regretted saying that as soon as I sent it - I haven’t had enough coffee. I will correct.

A.

Andrew Rambaut

Nov 28, 2013, 5:13:01 AM
to beast...@googlegroups.com
I am going to retract my statement that you shouldn’t use *BEAST for 1 gene locus and blame it on insufficient coffee this morning. If you have lots of individuals from each species then this is an appropriate model to use. But my point about using normal calibration priors still holds. 

A.

Krzysztof M. Kozak

Nov 28, 2013, 9:43:36 AM
to beast...@googlegroups.com
Dear Brisa,

As Paul Lewis describes above, MrBayes runs multiple analyses at once. The default is to run 2 parallel analyses (just like running two instances of BEAST independently), which are then compared for convergence. Critically, each of these has four chains exploring the tree space at varying degrees of conservatism ("heat"). This is called Metropolis-coupling, hence MCMCMC or MC3. In contrast, a single BEAST instance has only one chain. The heating parameter corresponds to one of the operators in BEAST. If one of the four chains stumbles upon a more optimal tree, that tree is chosen. The claim is that having these multiple parallel chains increases the chance of finding a more credible tree, as three chains run through the parameter space like crazy, and the most conservative one doesn't deviate much from the current optimal tree. I vaguely recall one of the authors of BEAST stating that there is little empirical evidence to support this claim, but MC3 was added to BEAST because of user requests. I suspect that running multiple analyses in BEAST and playing with operators is just as good as using MC3 in MrBayes.

Cheers,
Chris
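
The Metropolis-coupling idea described above can be sketched on a toy problem. This is not the MrBayes or BEAST implementation: the target is a hypothetical one-dimensional bimodal density standing in for tree space, the temperature ladder beta_i = 1 / (1 + delta_t * i) is one common incremental heating scheme, and the values of delta_t, the step size, and the swap schedule (one random pair per generation) are assumptions chosen for illustration.

```python
import math
import random

def log_target(x):
    """Unnormalized log density of a toy bimodal target (unit normals at -4 and +4)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def accept(rng, log_ratio):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))

def mc3(n_steps, n_chains=4, delta_t=0.5, step=1.0, seed=1):
    """Run Metropolis-coupled MCMC and return the cold chain's samples."""
    rng = random.Random(seed)
    # beta = 1 is the cold chain; smaller beta flattens the target ("heat").
    betas = [1.0 / (1.0 + delta_t * i) for i in range(n_chains)]
    states = [-4.0] * n_chains  # all chains start in the left mode
    cold_samples = []
    for _ in range(n_steps):
        # Ordinary Metropolis update of each chain on its heated target.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(rng, beta * (log_target(proposal) - log_target(states[i]))):
                states[i] = proposal
        # Propose swapping the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_swap = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(rng, log_swap):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])  # only the cold chain is sampled
    return cold_samples

samples = mc3(20000)
print("fraction of cold-chain samples in the right mode:",
      sum(x > 0 for x in samples) / len(samples))
```

Note that a swap is itself a Metropolis proposal that can be rejected, so a better state found by a heated chain reaches the cold chain only when a swap is accepted.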

Andrew Rambaut

Nov 28, 2013, 10:11:40 AM
to beast...@googlegroups.com
Dear All,

Chris describes MC^3 well. If you are running 4 chains (1 cold chain, 3 heated) you are doing 4 times the computational work but only the cold chain is being sampled. So for it to be worth doing, the hot chains have to help the cold chain sample 4 times more efficiently than a single chain (MCMC). You could measure efficiency as ESS per cpu hour or ESS per Watt of electricity. There may be some optimum combination of number of chains and heating scheme that achieves greater sampling efficiency than a single chain but I haven’t seen anyone demonstrate this.

Andrew
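
The efficiency comparison above comes down to a small calculation, sketched here. The ESS values and run times are hypothetical numbers invented for illustration, not measurements from any real BEAST or MrBayes run, and the sketch assumes every chain costs the same CPU time per generation.

```python
def ess_per_cpu_hour(ess, wall_hours, n_chains):
    """ESS per CPU hour, charging every chain (cold and heated) for its compute."""
    return ess / (wall_hours * n_chains)

# Hypothetical results for the same data set and the same wall-clock time,
# with each chain of the MC^3 run on its own core.
single = ess_per_cpu_hour(ess=300.0, wall_hours=2.0, n_chains=1)
coupled = ess_per_cpu_hour(ess=900.0, wall_hours=2.0, n_chains=4)

print(f"single chain: {single:.1f} ESS per CPU hour")
print(f"MC^3 (1 cold + 3 heated): {coupled:.1f} ESS per CPU hour")
print("MC^3 worthwhile on this measure:", coupled > single)
```

With these made-up numbers the coupled run triples the ESS but still loses per CPU hour, because it would need more than four times the single-chain ESS to break even.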

