Two questions on BEAST2 run: model comparison and eBSP

Taehyung Kwon

Aug 21, 2018, 1:26:01 AM
to beast-users
Dear all

I ran BEAST2 multiple times and got converged results for clock rates, divergence times, and most tree topologies.
Since I wasn't 100% sure about my molecular clock/population model, I used combinations of the following models:

3 molecular clock models: strict clock, relaxed lognormal, and relaxed exponential
2 population models: coalescent constant population and exponential growth

Even though their mean substitution rates and divergence times were similar, I wanted to check which combination gives me the best result.
So I used Tracer 1.6 to perform model comparison using Bayes factors (BF), taken as the difference between the models' harmonic mean estimates of the bootstrapped likelihood, following http://www.beast2.org/model-comparison.
Here are the questions I had.

1) Some strict clock models have high ESS values for the posterior (>>200), while the relaxed clock models do not (20~130). But in the model comparison in Tracer, one of the relaxed clock models is favored over the strict clock model by a BF of 60~100.
In this case, do I choose the model based on log BF, or am I supposed to discard models with low posterior ESS?

2) I would like to perform Bayesian skyline analyses. But what exactly are the differences between eBSP and BEAST runs with other coalescent population models?
As I understand it, eBSP mainly tracks effective population size through time, so it may just follow where the MCMC goes. Other models focus on estimating substitution rates and divergence times under some fixed population dynamics. Is that right?

Any answers will be appreciated.
Thanks.

Best regards,

Taehyung

Remco Bouckaert

Aug 21, 2018, 4:35:39 PM
to beast...@googlegroups.com
Hi Taehyung,

Unfortunately, the page you link to has information that is out of date: using the harmonic mean to compare models is strongly discouraged, since the estimator can have infinite variance and is generally not reliable. I updated the web page.
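
A minimal sketch of why the harmonic mean estimator is fragile, assuming you have a list of sampled log-likelihoods from a trace log. The function name and the numbers are hypothetical, chosen only for illustration. Because the estimator averages 1/L, a single poorly fitting sample can dominate the sum and shift the estimate by tens of log units:

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of the likelihoods, computed in log space.

    HM = n / sum(1 / L_i), so log HM = log(n) - logsumexp(-log L_i).
    """
    n = len(log_likelihoods)
    neg = [-x for x in log_likelihoods]
    m = max(neg)  # subtract the maximum for numerical stability
    lse = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(n) - lse

# Hypothetical trace: 1000 samples with log-likelihood -1000.
base = log_harmonic_mean([-1000.0] * 1000)

# Same trace, but one sample is 30 log units worse.
with_outlier = log_harmonic_mean([-1000.0] * 999 + [-1030.0])

print(base, with_outlier, base - with_outlier)
```

In this toy case one sample out of 1000 moves the estimate by more than 20 log units, which is larger than many of the Bayes factors people try to interpret.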

If the relaxed clock model has much better likelihoods as observed in Tracer, you probably want to use that. However, you still want to confirm this by proper model comparison. Also, with ESSs in the 20s you need to run the analysis a lot longer to be confident the analysis can be relied upon (http://www.beast2.org/increasing-esss/index.html).
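
To illustrate what a low ESS means, here is a rough sketch of a generic autocorrelation-based ESS estimate. It is an assumption-laden illustration and not Tracer's exact algorithm: it truncates the autocorrelation sum at the first non-positive lag, and the two chains are simulated rather than taken from a BEAST log.

```python
import random

def effective_sample_size(trace):
    """Generic ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is <= 0.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0.0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

rng = random.Random(1)

# Independent samples: ESS should be close to the chain length.
iid = [rng.gauss(0, 1) for _ in range(2000)]

# Strongly autocorrelated chain (AR(1) with coefficient 0.95), mimicking poor mixing.
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.95 * sticky[-1] + rng.gauss(0, 1))

ess_iid = effective_sample_size(iid)
ess_sticky = effective_sample_size(sticky)
print(round(ess_iid), round(ess_sticky))
```

Both chains have 2000 samples, but the autocorrelated one carries far fewer effectively independent draws, which is why a longer run (or better operators) is needed before its estimates can be trusted.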

You are right about the Bayesian skyline plot: it is a non-parametric tree prior, so it influences the topology and timing of the tree, but if you have enough data, the tree will be more influenced by the site and clock models through the tree likelihood. Note that it is a balance between prior and likelihood that determines the posterior.

Cheers,

Remco

Taehyung Kwon

Aug 22, 2018, 1:37:53 AM
to beast-users
Dear Remco

Thanks for taking the time.
It helped me a lot.
I mostly understand what I have to do, but if you don't mind, I'd like to make sure with a couple of further questions.

Further Q1) So, I checked other posts on the mailing list and found some saying that the PS/SS (path sampling/stepping-stone) method is currently the most suitable way to select the best model. Is this right?

Further Q2) What I was wondering is: when I reconstruct evolutionary history, am I supposed to choose either the eBSP method or one of the other coalescent methods? Or can I use a coalescent method for divergence dating and eBSP for tracking effective population size?
It seems a bit odd to me to choose a certain clock/tree model (such as the coalescent constant population model) to estimate divergence times and then also use eBSP to estimate effective population size changes.

Again, thanks for your kindness. I really appreciate it.

Best regards,

Taehyung

Remco Bouckaert

Aug 22, 2018, 11:21:56 PM
to beast...@googlegroups.com
Hi Taehyung,

Further Q1) So, I checked other posts on the mailing list and found some saying that the PS/SS (path sampling/stepping-stone) method is currently the most suitable way to select the best model. Is this right?

This was true before nested sampling was available (https://github.com/BEAST2-Dev/nested-sampling); you can use that as an alternative to PS/SS. The paper came out in June (https://academic.oup.com/sysbio/advance-article-abstract/doi/10.1093/sysbio/syy050/5046926?redirectedFrom=fulltext).
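
For readers unfamiliar with the SS part of PS/SS, here is a sketch of the arithmetic behind the stepping-stone estimator, assuming you already have log-likelihood samples from a series of power posteriors. In practice the BEAST2 packages run the chains and do this computation for you; the function names and inputs below are hypothetical.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def stepping_stone_log_ml(betas, log_likelihoods_per_step):
    """Stepping-stone estimate of the log marginal likelihood.

    betas: increasing powers from 0.0 (prior) to 1.0 (posterior), length K + 1.
    log_likelihoods_per_step[k]: log-likelihoods of samples drawn from the
        power posterior with power betas[k], for k = 0 .. K - 1.
    """
    if len(log_likelihoods_per_step) != len(betas) - 1:
        raise ValueError("need one sample set per step between consecutive betas")
    total = 0.0
    for k, samples in enumerate(log_likelihoods_per_step):
        delta = betas[k + 1] - betas[k]
        # Each step estimates the log ratio of adjacent normalising constants.
        total += log_mean_exp([delta * ll for ll in samples])
    return total

# Sanity check on a degenerate case: if every sample has log-likelihood -100,
# the marginal likelihood must also be exp(-100).
betas = [0.0, 0.25, 0.5, 1.0]
constant_samples = [[-100.0] * 10 for _ in range(3)]
estimate = stepping_stone_log_ml(betas, constant_samples)
print(estimate)
```

Unlike the harmonic mean, each step only bridges a small change in the power, which keeps the individual ratio estimates well behaved.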

Further Q2) What I was wondering is: when I reconstruct evolutionary history, am I supposed to choose either the eBSP method or one of the other coalescent methods? Or can I use a coalescent method for divergence dating and eBSP for tracking effective population size?
It seems a bit odd to me to choose a certain clock/tree model (such as the coalescent constant population model) to estimate divergence times and then also use eBSP to estimate effective population size changes.

You probably want to compare the models using a model comparison method, and use the best model for both the evolutionary history and the population history. Regardless, it is always a good idea to see how much they differ: if the trees and population history change a lot while the models have similar marginal likelihoods, you know that your analysis is sensitive to the prior, so any conclusion you draw from the analysis should take this into account. If both models produce very similar results, you can be more confident in your conclusions.
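
A small sketch of how two marginal likelihood estimates could be compared once you have them (for example from nested sampling, which also reports a standard deviation). The numbers are made up for illustration, and the labels follow the commonly used Kass and Raftery scale on 2 ln BF; treat both as assumptions rather than output from a real analysis.

```python
import math

def log_bayes_factor(log_ml_a, log_ml_b):
    """ln BF of model A over model B from log marginal likelihoods."""
    return log_ml_a - log_ml_b

def kass_raftery_label(log_bf):
    """Evidence category on the Kass and Raftery scale, using 2 * |ln BF|."""
    two_ln_bf = 2.0 * abs(log_bf)
    if two_ln_bf < 2.0:
        return "not worth more than a bare mention"
    if two_ln_bf < 6.0:
        return "positive"
    if two_ln_bf < 10.0:
        return "strong"
    return "very strong"

# Hypothetical estimates (log marginal likelihood, standard deviation).
log_ml_relaxed, sd_relaxed = -12340.0, 1.5
log_ml_strict, sd_strict = -12352.0, 1.2

log_bf = log_bayes_factor(log_ml_relaxed, log_ml_strict)
combined_sd = math.sqrt(sd_relaxed ** 2 + sd_strict ** 2)

print("ln BF (relaxed vs strict):", log_bf)
print("evidence:", kass_raftery_label(log_bf))
# Only trust the ranking if the difference clearly exceeds the estimation error.
print("exceeds 2 combined SDs:", abs(log_bf) > 2.0 * combined_sd)
```

If the log Bayes factor is within the estimation error, the two models are effectively tied, and comparing their trees and population histories tells you how sensitive the conclusions are to the prior.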

Cheers,

Remco

Taehyung Kwon

Aug 22, 2018, 11:31:06 PM
to beast...@googlegroups.com
That really clarified my doubts.
Thank you so much. I appreciate it.

Sincerely,

Taehyung