Substitution model obtained in jModeltest - is correct to use in BEAST?

3,227 views
Skip to first unread message

Berenice

unread,
Jul 29, 2014, 2:10:20 PM7/29/14
to beast...@googlegroups.com
Hi everyone, my question is related to a comment of a reviewer. 

In my paper I used jModeltest to estimated the best substitution model for my data. The selected model was GTR + I + G with AIC and with BIC criterion. Then I estimated the tree and the divergence time using BEAST. In BEAST I selected as clock model the relax option (because selecting this option I know the same if is correct the relax or the strict - If the ucld.stdev parameter estimate is close to 0.0 then the data is quite clock-like (please tell me if I´m wrong, because I´m just learning) - and the yule process as tree model because I have different species in my dataset.

The reviewer told me this:

Model selection: you can't select models for BEAST in jModeltest, they use different models (same

substitution models but different clock and tree models). For BEAST there are other methods (ea.
Bayes factors). If you needed ultrametric tree for dating you could have also used jModeltest +
phyml + r8s as an alternative (the use of jModeltest would be appropriate in this context).

I thought that use jModeltest before BEAST was ok , I know that jModeltest use PhyML for the calculation of the tree topology but I don´t see the conflict.

What do you think? I asked this to my profesor and tell me that maybe if I run the analisis with at least two different option (e.g. strict vs relax and yule vs birth-death) and compare in tracer with bayes factor would be better and I would approach what the reviewer ask me.

I found a table with rule´s of decision for bayes factor in the MrBayes manual, it is ok?

I would greatly appreciate if you can help me,

Best regards, 

Berenice







Guy Baele

unread,
Jul 29, 2014, 2:48:28 PM7/29/14
to beast...@googlegroups.com
Hi Berenice,

You are indeed making a few assumptions, as the reviewer points out, when using jModelTest to determine your substitution model and then using BEAST for your further analyses. I personally like to decide first whether to use maximum likelihood or Bayesian inference and then stick with my choice (if possible).

You can indeed use BEAST to calculate Bayes factors, but please do not do this in Tracer, because your results will be unreliable.
Path sampling and/or stepping-stone sampling will allow you to estimate the necessary log marginal likelihoods to compute your Bayes factors.
I've recently put the tutorial back online (for BEAST users): http://rega.kuleuven.be/cev/ecv/tutorials/model-selection-tutorial
The papers dedicated to performing model selection using BEAST can also be found there (all of them providing evidence that model selection via Tracer is not a good idea, when it's feasible to use path sampling and/or stepping-stone sampling).

The cutoffs for Bayes factors can be found in a paper by Kass & Raftery (1995).
While the ones in the MrBayes tutorial are bound to be the same, I would give credit where credit's due and cite Kass & Raftery (1995).

Best regards,
Guy


Op dinsdag 29 juli 2014 11:10:20 UTC-7 schreef Berenice:

Berenice

unread,
Jul 30, 2014, 9:37:37 AM7/30/14
to beast...@googlegroups.com
Hi Guy, thank you so much for your answer. 

Can you help me to understand which assumptions I´m doing when I use jModelTest to determine the substitution model before BEAST?

I´m sorry, but it´s difficult to me understand how to follow the tutorial that you told me. First, the path sampling and the stepping stone are only to compare clock model? o also for substitution and tree models? Second, if I have two xml files one with strict clock and one with relax clock I have to copy the block in the two of them?

Thanks again,

Best,

Bere

Guy Baele

unread,
Jul 30, 2014, 3:09:31 PM7/30/14
to beast...@googlegroups.com
Hi Berenice,

Well, you're not simply comparing substitution models, you're comparing collections of models of which only the substitution model is different (i.e. the underlying clock model, demographic model, ... is the same). What you are doing is assuming that while the underlying models change, this will have no influence on the outcome of the model selection process for the substitution model. It's something that can be discussed endlessly, which is why it's better to avoid the problem altogether. This is something I have not explored myself (nor am I aware of anyone else who has done this), I'm more worried about using different inference techniques (i.e. maximum likelihood and Bayesian inference, or one that does not explicitly model uncertainty and another that does). A look on this forum will tell you that even comparing results between MrBayes and BEAST (while both being Bayesian inference software packages) is not easy.

No, path sampling and stepping-stone sampling are not only for comparing clock models. If you look at (or better: into) the papers mentioned in the tutorial, you'll see that these techniques are being used to compared demographic models, clock models, substitution models and even different priors. So please make an effort to read those papers and have a look at the examples provided with them.

Yes, you can copy the same blocks of XML code (remember to copy all 3: marginalLikelihoodEstimator, pathSamplingAnalysis and steppingStoneSamplingAnalysis) into the 2 different BEAST XML files. In order to avoid any problems, you'd better already double-check if you have defined proper priors for all your parameters (which is what the second paper listed is about). I know this all seems a bit daunting, but once you get the hang of it, it's well worth it.

Best regards,
Guy


Op woensdag 30 juli 2014 06:37:37 UTC-7 schreef Berenice:

Matt Buys

unread,
Jul 30, 2014, 8:32:07 PM7/30/14
to beast...@googlegroups.com
Blimey - I have made the same "error" as Berenice.  I find this all disconcerting and long for the days when you only had two things to contend with: plesio- or apomorphic!  I would have thought that with technology and time (wisdom) things would have gotten easier. Now you need to be an "elitist" technocrat to survive in this game.

Berenice

unread,
Jul 31, 2014, 3:08:39 PM7/31/14
to beast...@googlegroups.com
Hi Guy, thanks a lot for your quickly answer. Like Matt says, I feel that is disconcerting while interesting at the same time, and that just makes me think that most times I´m far from reality with my results.

I found this site in internet http://blog.beast2.org/2014/07/14/path-sampling-with-a-gui/ I think that is from your colleague (Remco Bouckaert), is right? I´m trying to doing the analysis via GUI and compare the different models and at the same time I´m trying to modify the xml following the tutorial like you told me. Looks like the two ways are doing the same, do you agree?

But, I don’t know if I’m reasoning well o I’m confuse about this. When I want to compare the substitution models, for example HKY and GTR, and use Beauty to select the options, at the same time I have to choose a tree model and a clock model. Independently that there are the same for both, is difficult to think about the independence of this choice.

For example, if I choose HKY (yule process + random clock) and GTR (yule process + random clock) and compare with Bayes Factor with the approach path sampling

Suppose that the resulting model with the highest likelihood is GTR 

And then I going to compare the tree model,

Yule (GTR + random) vs. Birth-Death (GTR + random)

Suppose the result is B-D, 

And then I want to compare the clock model?

Strict (GTR + B-D) relax (GTR + B-D)

Is not circular the conclusion?

I read in the tutorial that if I choose random clock is like a test of the strict clock, I would be like one less comparison, is ok?

Thanks again for your support, 

Berenice

pd: You are helping me more than the people that has to guide me, do you like alfajores?

El martes, 29 de julio de 2014 15:10:20 UTC-3, Berenice escribió:

Guy Baele

unread,
Jul 31, 2014, 3:51:29 PM7/31/14
to beast...@googlegroups.com
Hi Berenice,

You have to be a bit careful here. My tutorial and the implementation it discusses was written for use in BEAST 1 (all the results from the papers mentioned at the bottom of the tutorial were obtained in BEAST 1).
The other tutorial you mention discusses how to set things up in BEAST 2.
Ideally, this shouldn't matter however as they should do the same thing. Just don't mix things up as this surely will not work.

Each implementation, when applied to a phylogenetic problem, should ask you to specify a tree model, clock model and substitution model (and possibly also a trait model if you are providing trait data).

A stepwise approach, like you suggest, is probably what most users would do given that these methods tend to take a while to run.
If you'd like more confirmation in terms of the model you end up with, you could always try different combinations of models.
This takes us back to how different choices for one type of model might influence the choice of another type of model.

I don't see how the reasoning is circular. You select one model each step of the way and stick with that model for the rest of your comparisons.
Adding a random local clock on top of a strict and relaxed clock will increase the number of comparisons.

Best regards,
Guy



Op donderdag 31 juli 2014 12:08:39 UTC-7 schreef Berenice:

Nick Matzke

unread,
Jul 31, 2014, 4:28:13 PM7/31/14
to beast...@googlegroups.com
Hi all,

It's possible I'm wrong (please point it out if so), but I think the reviewer's comment is a little bit weird. Since when has it been recommended procedure to do a full path-sampling analysis and Bayesian model choice just to select a substitution model for a BEAST analysis?

Presumably (again, correct me if I'm wrong!), Berenice just has the typical goal of estimating a dated phylogeny.  The substitution model is not of primary interest -- all that she really wants is substitution model(s) that Aren't Too Bad And Don't Screw Things Up. It seems to me that jModelTest is reasonably adequate for that goal, in that at least it will eliminate really inappropriate models (e.g., Jukes-Cantor), and give some indication of how complex a model the available data will support. 

If one is going to be totally, thoroughly Bayesian, there ought to be NO substitution model selection step at the beginning at all, Bayesian or otherwise, and instead the choice of the model is co-estimated along with the phylogeny and everything else during the inference.  This is great in theory, but in practice, with limited data, having models that are too complex can cause identifiability issues, lack of convergence, low ESS values, etc. If the models and priors are perfect, even small data shouldn't be a problem, but in real life, they usually won't be.

So, presumably jModeltest is a reasonable way to compromise between (a) just always picking the most complex model that contains all the others (e.g. GTG+G) and (b) fixing a model a priori using prior experience or gut feeling.

Typically, the bigger problem is that people run jModeltest on each gene/partition independently, and give BEAST these partitions, never testing the partitioning scheme itself, and end up with an over-partitioned dataset, where each partition doesn't have enough data to estimate some of the parameters, resulting in low ESS values etc. which then gets the reviewers yelling at them for that.  I would recommend Lanfear's PartitionFinder program to help deal with this.

(And, actually, googling it, I see Lanfear has basically the same opinions that I have here:


)

Cheers,
Nick





--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at http://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/d/optout.

Alexei Drummond

unread,
Jul 31, 2014, 6:14:15 PM7/31/14
to beast...@googlegroups.com
Dear Nick,

I think you express the pragmatic view quite well. At some point we method developers will get smart enough to implement a fast way to automatically average over all substitution models, so that model selection is no longer an issue for parameters you don't care about. Wu, Suchard and Drummond (2013) is a first attempt in BEAST 2 at doing that...


But it is still a bit hard to set up (no BEAUti support). But we are working on it!

Cheers
Alexei

Christopher Blair

unread,
Jul 31, 2014, 7:27:28 PM7/31/14
to beast-users
HI all, 

Great comments. I agree that the reviewer is being difficult. We need to remember that these are models and they will all be incorrect to some extent. Our goal is (or should be) to select models that best explain our data. In my opinion jModeltest and PartitionFinder do a better job than randomly selecting a model from the clouds and calling it a day. 

What about the new reversible jump model selection in BEAST2 (https://code.google.com/p/rb-beast/)? I was under the impression that this was developed specifically to deal with the issue of model selection. 


I have not read the MBE paper yet so perhaps these new BEAST methods are described there.

Chris
Christopher Blair, Ph.D.
Assistant Professor
Department of Biological Sciences
New York City College of Technology
The City University of New York
300 Jay Street

Berenice

unread,
Aug 1, 2014, 11:29:46 PM8/1/14
to beast...@googlegroups.com
I want to thank everyone for the support, the debate was very enriching for me.


El martes, 29 de julio de 2014 15:10:20 UTC-3, Berenice escribió:

Berenice

unread,
Aug 16, 2014, 12:06:06 PM8/16/14
to beast...@googlegroups.com
Sorry to bother you again, but I wanted to ask you a question regarding the comparison of tree models (Yule and Birth-death).

Can you use Bayes factor to compare which is the best "prior", being that is supposedly the a priori information that you have, or simply are you checking the sensitivity of your data to the selected prior?

Best, 

Berenice





El martes, 29 de julio de 2014 15:10:20 UTC-3, Berenice escribió:

Berenice

unread,
Aug 16, 2014, 10:31:26 PM8/16/14
to beast...@googlegroups.com
 I was reading about bayes factor and I found this text related to Christopher Blair´s comments. 

" An alternative to running a full analysis on each model and then choosing among them using the estimated model likelihoods and Bayes’ factors is to let a single Bayesian analysis explore the models in a predefined model space (using reversible-jump MCMC). In this case, all parameter estimates will be based on an average across models, each model weighted according to its posterior probability. For instance, MrBayes 3 uses this approach to explore a range of common fixedrate matrices for amino acid data (Lemey et al., 2009)

The reversible-jump based model (Bouckaert et al. 2013) is a substitution model that jumps between models in a hierarchy of models, and is available through the RBS add-on as the RB substitution model. It can also automatically partition the alignment, but unlike subst-BMA assumes that partitions consist of consecutive sites. One can average over clock models (Li & Drummond 2012) in an MCMC analysis as well (beast v 2 Book).

Maybe is not of help, but just in case

Best,

Berenice



El martes, 29 de julio de 2014 15:10:20 UTC-3, Berenice escribió:

石勇

unread,
Sep 14, 2014, 10:42:49 PM9/14/14
to beast...@googlegroups.com
Hi, Berenice
    I have downloaded the RBS-addon and installed it in beauti. However, I can not found a manual to use it in detail.
    In the paper, I found "The RBS package contains a reversible-jump based substitution model for nucleotide data. With this substitution model there is no need to choose a substitution model for a given partition, since the RBS model samples the appropriate (mixture of) models given the sequence data. "
    Does that mean that I do not have to choose site model (leaving it as default?) and generate a xml file, and then load it in beast and run it?
    I feel very confused and looking forward for your reply.
Cheers,
Sandy

在 2014年8月17日星期日UTC+8上午10时31分26秒,Berenice写道:

Donavan Jackson

unread,
Feb 25, 2016, 2:13:16 PM2/25/16
to beast-users


On Sunday, September 14, 2014 at 8:42:49 PM UTC-6, sandy wrote:
Hi, Berenice
    I have downloaded the RBS-addon and installed it in beauti. However, I can not found a manual to use it in detail.
    In the paper, I found "The RBS package contains a reversible-jump based substitution model for nucleotide data. With this substitution model there is no need to choose a substitution model for a given partition, since the RBS model samples the appropriate (mixture of) models given the sequence data. "
    Does that mean that I do not have to choose site model (leaving it as default?) and generate a xml file, and then load it in beast and run it?
    I feel very confused and looking forward for your reply.
Cheers,
Sandy

Did you ever find the answer to your question Sandy? I am currently trying to set up a Beast run as well using RB and I am confused just as you were about using this package.

Remco Bouckaert

unread,
Feb 28, 2016, 1:48:02 PM2/28/16
to beast...@googlegroups.com
Hi Donovan,

The RBS package only averages over substitution models, not site models, so you still have to choose whether to have gamma rate heterogeneity and/or proportion invariant sites. 

The bModelTest package averages over site models (including substitution models), which means you do not need to choose whether to have rate heterogeneity and/or a proportion invariant sites — during the MCMC run, it will use reversible jump over these four states.

Cheers,

Remco





--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages