Independent BEAST runs not converging


jithinj...@gmail.com

Aug 25, 2021, 2:52:52 PM
to beast-users
Hello BEAST community,

I have a dataset with 69 terminals and 1989 characters (3 gene partitions). I ran two independent BEAST runs with different starting seeds and with fossil calibrations:

Substitution models: GTR model on COI and 28S, HKY on HIS 
Tree Prior: birth-death model
Chain length: 500,000,000 logged every 10,000 states
The trace files had ESS values >200 for all parameters; however, the two runs did not converge (photos of the trace files attached: 1.JPG, 2.JPG, 3.JPG).

Can anyone please suggest any plausible solutions to obtain convergence?

Thanks in advance!

Jithin Johnson

Patrice Showers Corneli

Aug 25, 2021, 4:38:07 PM
to beast-users
Maybe someone else has a better idea, but 1989 characters in three partitions sounds too small to accommodate 69 terminal taxa. In other words, the number of patterns needed to fit the complex model that 69 taxa require is unlikely to appear in your data. You need more data. Your model requires estimating a total of 2 x 8 GTR parameters plus five for the HKY model. So right there is a complex model with 21 parameters to estimate, as well as the many, many branch-length parameters you must estimate. There could not possibly be enough different patterns in the data to estimate such a tree.
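
The arithmetic in the reply above can be written out as a rough sketch. The per-model counts (8 per GTR partition, 5 for HKY) are taken from the message as stated and are not re-derived here; the count of node ages assumes a fully resolved rooted time tree, and clock and tree-prior parameters are ignored, so the total is a lower bound.

```python
# Per-model counts as quoted in the reply (assumed, not re-derived).
GTR_PARAMS = 8
HKY_PARAMS = 5

n_taxa = 69
n_sites = 1989

# Two GTR partitions (COI, 28S) plus one HKY partition (HIS).
substitution_params = 2 * GTR_PARAMS + HKY_PARAMS

# A rooted binary tree with n tips has n - 1 internal nodes,
# each with an age that has to be estimated.
node_ages = n_taxa - 1

total_params = substitution_params + node_ages
sites_per_param = n_sites / total_params

print(f"substitution parameters: {substitution_params}")
print(f"internal node ages:      {node_ages}")
print(f"alignment columns per parameter: {sites_per_param:.1f}")
```

Columns per parameter is only a crude indicator; what matters more is how many of those columns are actually variable and informative.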


Patrice Showers Corneli

Aug 25, 2021, 5:18:42 PM
to beast-users
Also, the standard error of the mean of the two runs is way too big at >0.3, where it should be <0.01. See the box plot, where the two means do not overlap with one another.

This is a good indication that your data set has too little phylogenetic information (too few parameter patterns) to resolve the tree.

Patrice Showers Corneli,(Retired), Phd. Biology, M.S. Mathematical Statistics.
Associate Research Professor
Department of Biology
University of Utah, 
Salt Lake City, UT 84112
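
The between-run comparison described above can also be checked numerically. The following is a minimal sketch of the Gelman-Rubin potential scale reduction factor (PSRF), a standard diagnostic that is swapped in here and is not the statistic quoted in the message. It assumes the log files are tab-delimited with `#` comment lines; the file names in the usage note are hypothetical.

```python
import csv
import math
import statistics


def read_trace(path, column, burnin_fraction=0.1):
    """Read one column from a tab-delimited trace log, dropping burn-in."""
    with open(path, newline="") as handle:
        # Skip blank lines and '#' comment lines before the header row.
        rows = (line for line in handle
                if line.strip() and not line.startswith("#"))
        reader = csv.DictReader(rows, delimiter="\t")
        values = [float(row[column]) for row in reader]
    return values[int(len(values) * burnin_fraction):]


def psrf(chains):
    """Gelman-Rubin PSRF for one parameter sampled by several runs.

    `chains` is a list of equal-length lists of post-burn-in samples.
    Values close to 1 suggest the runs sample the same distribution.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 equal-length chains with >= 2 samples")
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0:
        raise ValueError("chains have zero within-run variance")
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Usage would look like `psrf([read_trace("run1.log", "posterior"), read_trace("run2.log", "posterior")])` after trimming both traces to the same length. Two runs can each report ESS >200 and still give a PSRF well above 1 if they are stuck in different regions of the posterior.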

jithinj...@gmail.com

Aug 26, 2021, 3:55:32 AM
to beast-users
Thanks for your prompt reply. So I understand that I might have to reduce the complexity of the models that I use, reduce the number of taxa, or increase the number of gene partitions (which is not possible in this case)? Are there any other troubleshooting guidelines to tackle this convergence issue?

Remco Bouckaert

Aug 26, 2021, 4:00:13 AM
to beast...@googlegroups.com
Hi Jithin,

Apart from trying simpler models, you might consider using MC3 instead of MCMC (https://github.com/nicfel/CoupledMCMC) or nested sampling (https://github.com/BEAST2-Dev/nested-sampling/wiki) to improve convergence.

Cheers,
Remco

Patrice Showers Corneli

Aug 26, 2021, 5:23:47 AM
to beast...@googlegroups.com
You might try a simpler model, such as Kimura 2-parameter, just to see if there is any phylogenetic information at all in your data. However, a model that is too simple is not ideal because the analysis may be biased.
I assume you have used some kind K

jithinj...@gmail.com

Aug 26, 2021, 2:21:38 PM
to beast-users
Thanks for the suggestion.
I will try to run with the coupled MCMC and also try a simpler substitution model and get back to you soon.

Patrice Showers Corneli

Aug 26, 2021, 3:14:50 PM
to beast-users, jithinj...@gmail.com
I sent the last reply before asking whether you had used any software to determine the best model. If not, you should, because any model that is too simple could give you the wrong tree.

Also, do you know how many informative sites there are across all sequences?

Patrice



jithinj...@gmail.com

Aug 27, 2021, 2:52:57 PM
to beast-users
Hello Patrice,

1. I selected the model for each partition as specified by jModelTest: GTR+G+I for both COI and 28S, and HKY+G for HIS.
2. I understand that I have 1073 parsimony-informative sites (1262 variable sites) across the 1989 nucleotides (3 genes)

Jithin
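
For readers who want to reproduce counts like the ones above, here is a small sketch that tallies variable and parsimony-informative columns in an alignment. It assumes the sequences are already aligned to equal length, treats gaps, `?` and `N` as missing, and counts IUPAC ambiguity codes as ordinary states, which is a simplification.

```python
from collections import Counter


def classify_sites(sequences, missing="-?N"):
    """Return (variable, parsimony_informative) column counts.

    A column is variable if it shows at least two states, and
    parsimony-informative if at least two states each occur in
    at least two sequences.
    """
    variable = 0
    informative = 0
    for column in zip(*sequences):
        counts = Counter(
            base.upper() for base in column if base.upper() not in missing
        )
        if len(counts) >= 2:
            variable += 1
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return variable, informative


# Toy alignment: four taxa, four columns.
toy = ["AAGT", "AAGC", "ACTA", "ACTG"]
print(classify_sites(toy))
```

In the toy alignment the first column is constant, the second and third are parsimony-informative, and the fourth is variable but uninformative because every state is a singleton.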

jithinj...@gmail.com

Aug 31, 2021, 4:50:46 PM
to beast-users
Another query: in those standard BEAST runs where I do get convergence, the ESS values are high except for the gene tree likelihoods (3 genetreelikelihoods in my case). Can someone explain why only these get low ESS values? And can I still use the result if the two runs are converging and the combined ESS values turn out to be >200, even though those of the individual runs are not?
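
As background to the question above, ESS is roughly the number of samples divided by the autocorrelation time of the trace. The sketch below uses a simplified estimator that sums positive autocorrelations until the first non-positive lag; it is an assumption-laden illustration and not the algorithm Tracer implements, so the numbers will not match Tracer exactly.

```python
import statistics


def effective_sample_size(samples, max_lag=None):
    """Simplified ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        # A constant trace carries no autocorrelation information.
        return float(n)
    if max_lag is None:
        max_lag = n // 2
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        covariance = sum(
            centered[i] * centered[i + lag] for i in range(n - lag)
        ) / n
        rho = covariance / variance
        if rho <= 0:
            break  # stop at the first non-positive autocorrelation
        rho_sum += rho
    return n / (1 + 2 * rho_sum)
```

A slowly mixing statistic has large autocorrelations and therefore a low ESS even when other parameters in the same log look fine. Pooling two runs raises the sample count, but the pooled ESS is only meaningful if both runs are sampling the same distribution.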

jithinj...@gmail.com

Aug 31, 2021, 4:50:46 PM
to beast-users
Hello Remco and Patrice,

I tried both simplifying the model to HKY (with the frequencies set to "estimated") and running a coupled MCMC analysis in BEAST 2.6.3, but both were unsuccessful.
1. Simplifying the models (without coupled MCMC) gave the same convergence problem (with high ESS values).
2. I was using BEAST 1.8.4 for all my previous runs, so I am not sure whether I set up the right parameters for the coupled MCMC in BEAST 2.6.3. I am confused about the parameters to set in the "Parallel tempering" tab. Everything was left at the defaults except the chain length (500 million) and the logging frequency (every 10,000 states in tracelog, screenlog and treelog). The number of chains was 2.

Problem: the runs were more or less converging, but the ESS values were too low (<100) and the swapProbability was 0.157 (which I presume should be between 0.25 and 0.6?).
Is there something specific that I need to be careful about while setting up "Parallel tempering", e.g. the delta temperature or the target acceptance probability?

Your suggestions would be really helpful!

Jithin
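
To illustrate how the temperature spacing relates to the swap probability mentioned above, here is a sketch of textbook Metropolis-coupled MCMC. It assumes an incremental heating ladder of the form 1 / (1 + i * delta), as used in MrBayes-style MC3; whether the CoupledMCMC package uses exactly this ladder or these parameter names is an assumption, and the function names are hypothetical.

```python
import math


def heats(n_chains, delta):
    """Inverse temperatures; chain 0 is the cold chain (heat 1.0)."""
    return [1.0 / (1.0 + i * delta) for i in range(n_chains)]


def swap_probability(beta_i, beta_j, logp_i, logp_j):
    """Acceptance probability for swapping the states of chains i and j.

    logp_i and logp_j are the unheated log posteriors of the states
    currently held by chain i and chain j.
    """
    log_ratio = (beta_i - beta_j) * (logp_j - logp_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


# Same log-posterior gap between chains, two different spacings.
for delta in (0.1, 1.0):
    cold, hot = heats(2, delta)
    p = swap_probability(cold, hot, -100.0, -104.0)
    print(f"delta={delta}: swap probability {p:.3f}")
```

Under these assumptions a smaller delta puts the chains closer in temperature and makes swaps easier to accept, while a larger delta does the opposite. A low observed swap rate is therefore one hint that the spacing may be too wide for the posterior in question.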


Patrice Showers Corneli

Sep 1, 2021, 10:00:07 AM
to beast-users
Do you have the ML parameters optimized for each gene? Are the genes partial or complete sequences? I am guessing that they are partial, since COI by itself should have on the order of 1000 bp. If so, then I am guessing, once again, that the individual genes are too short or in some other way lack phylogenetic signal (variable sites where at least two taxa share the same nucleotide), or have conflicting (noisy) sites.

If you would like to send the alignments, I could look into that and then tell you how to examine your data to determine their phylogenetic utility for 69 taxa.

Patrice
