BEAST - treelikelihood is not converging


Asutu

Jan 27, 2014, 8:43:22 AM
to beast...@googlegroups.com
Dear BEAST users,

I am running BEAST on 13 partitions (~400,000 sites) using a constant-size coalescent tree prior with a strict clock. The 13 partitions come from an initial set of 320 loci that were analysed with PartitionFinder and then concatenated according to its results. Substitution models are unlinked; clock and tree models are linked.

I ran the analysis 7 independent times, each for 25,000,000 MCMC generations sampling every 1,000, and combined the results in LogCombiner, discarding the first 5,000,001 states as burn-in. All parameter estimates show high ESS values, including the posterior, likelihood and coalescent, but the treelikelihood ESS for most of the partitions is still below 100, and actually near 50 for most of them. Why is this happening? Is the per-partition treelikelihood such an important parameter that I should run BEAST even longer until it converges (I don't think it will converge anyway...)? Since I am analysing a large number of sites and know a priori that the data include several admixed individuals (so I expect sampling of the trees to be harder), can I still use this analysis for relative divergence dating?

Any help would be much appreciated. All the best.
Asutu

Santiago Sánchez

Jan 27, 2014, 10:17:07 AM
to beast...@googlegroups.com
Hi Asutu,

This generally means that each of your partitions is converging on a different likelihood in each run, possibly because of different tree topologies and/or branch lengths. I don't think that adding more runs/chains will solve the issue, nor will letting them run for longer. I'm not sure, but operator tuning might help.

Cheers,
Santiago
--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at http://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/groups/opt_out.


--
Santiago Sánchez-Ramírez
Department of Ecology and Evolutionary Biology, University of Toronto
Department of Natural History (Mycology), Royal Ontario Museum
100 Queen's Park
Toronto, ON
M5S 2C6
Canada

Asutu

Jan 27, 2014, 11:14:54 AM
to beast...@googlegroups.com
Great, many thanks for the advice. Do you have any suggestions on how to tune these operators? I really do not know how to go about this.

Best,
Pedro



Natalia

Jan 28, 2014, 6:15:03 PM
to beast...@googlegroups.com
Hello Pedro,
Before combining the logs from your runs, you should open them all in Tracer and look at the traces. You should combine only runs (and portions of runs) that have converged. In Tracer you can also adjust the burn-in for each run. It could be that one of them reached stationarity quickly while others were slower and require a longer burn-in. Combining runs that have not converged will result in low ESS values. This could solve your problem, but most probably something else is going on too.
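As a rough illustration of why pooling runs that sit at different levels drags the ESS down, here is a minimal Python sketch. It is not Tracer's estimator: `effective_sample_size` and `drop_burnin` are hypothetical helpers, the estimator simply truncates the autocorrelation sum at the first non-positive lag, and the two "runs" are synthetic uncorrelated traces centred on different made-up values (standing in for a statistic that settled at different levels in two runs).

```python
import numpy as np


def effective_sample_size(trace):
    """Crude ESS estimate: N / (1 + 2 * sum of leading positive autocorrelations)."""
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    variance = np.dot(x, x) / n
    if variance == 0.0:
        return float(n)  # constant trace: autocorrelation undefined, return N by convention
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * variance)
        if rho <= 0.0:
            break  # stop at the first non-positive autocorrelation
        tau += 2.0 * rho
    return n / tau


def drop_burnin(trace, n_burnin):
    """Discard the first n_burnin samples of a single run."""
    return np.asarray(trace, dtype=float)[n_burnin:]


# Synthetic example (made-up numbers): two runs whose traces sit at different levels.
rng = np.random.default_rng(1)
run_a = drop_burnin(rng.normal(-1000.0, 1.0, size=600), 100)
run_b = drop_burnin(rng.normal(-1005.0, 1.0, size=600), 100)
combined = np.concatenate([run_a, run_b])

print("ESS run A:   ", round(effective_sample_size(run_a)))
print("ESS run B:   ", round(effective_sample_size(run_b)))
print("ESS combined:", round(effective_sample_size(combined)))
```

Each run on its own looks well mixed, while the pooled trace has a much lower ESS, because the level shift between runs looks like extreme autocorrelation to the estimator. With real logs the same per-run check would be done before using LogCombiner.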
You say that your tree likelihoods have very low ESS values while everything else has converged. The likelihood for each partition is computed from all the parameters for that partition, so it is not possible that all of them have converged and the tree likelihood hasn't. I suggest that you look closely at the traces of all the parameters to see if some of them haven't converged (again, before combining the logs). The only parameter that participates in the tree likelihood and is not visible in Tracer is the tree topology. To verify that the topology has converged across your runs, you can use AWTY (Are We There Yet?).
Another reason why things still don't converge could be that your models are too complex for some of the partitions. Are you using a lot of GTRs? Do they have estimated rates close to zero with strange-looking traces and low ESS values? You can try simplifying these models to TrN or HKY and see if you get better results.
best,
Natalia

Asutu

Jan 28, 2014, 7:51:38 PM
to beast...@googlegroups.com
Hello Natalia,

Thank you so much for your helpful suggestions. I have checked the first three runs both individually and combined using LogCombiner (after discarding the initial 5,000,000 states, which ensured convergence in every single run) and the problem remains. All individual log files have posterior, prior, likelihood, treeModel.rootHeight, constant.popSize and coalescent ESS values well above 1000 (4000 for the combined data). Only two partitions have GTR+G models, and these also mix well (above 1000 ESS for all parameters). The other partitions have TrN+G and HKY+G, again with good mixing. Only the per-partition treelikelihoods have low ESS values, which reflects poor convergence: I can see several jumps/steps in these traces. I am guessing that this is an intrinsic characteristic of the data and that there is nothing left to do but run the chains longer... I am now trying two runs of 100,000,000 generations. But if the ESS values for treelikelihood still remain below 100 (I am not confident that they will improve...), are the results invalid? How does the treelikelihood of each partition influence the final tree topology and dating?

Sincerely,
Pedro

Natalia Chousou-Polydouri

Jan 28, 2014, 8:14:23 PM
to beast...@googlegroups.com
Hello Pedro,
My guess from your description is that the tree topology hasn't converged across all your runs. Since your model parameters are mixing well and converging, topology could be the one causing problems. My understanding is that you can have two different topologies with the same total likelihood. Or there could be a trade-off between partitions: one partition has a low likelihood and the other a high one, and then they "swap" places while the overall likelihood remains the same. In this case (two or more highly supported topologies), the result is valid in the sense that both are highly supported, but conventional methods of combining them and building consensus trees are not the best way to interpret them (most probably the resulting tree will have low support).
An indication of two distinct topologies of equal likelihood is a trace that oscillates between two levels for some of the partitions.
I suggest that you give AWTY a try and check which of your runs are converging on the same topology or topologies. Then you can combine only the ones with a common topology and discuss the implications of all topologies for your taxa/characters.
best,
Natalia



Julian W Tang

May 25, 2014, 10:55:38 AM
to beast...@googlegroups.com
Hi Pedro, Natalia,

I've just been reading this thread as I have been having the same problem with these HIV subtype C pol sequences, run with a typical virus set-up (uncorrelated lognormal clock, SRD06 site substitution model and the time-aware GMRF Bayesian Skyride tree prior).

I've been checking the .log files during this 50-million-state run, and noticed that the likelihood and posterior are not converging at all, though most of the other parameters are.

However, lower down in the Tracer window, it looks like the codon position parameters are also clearly not converging, so maybe this might be giving a clue as to what is going on?

This is an immigrant subtype C HIV population and maybe the origins of the different viruses are too diverse to be suitable for this BEAST analysis?

Any advice would be helpful ;)

Many thanks,

Julian




Beast likelihood not converging query.pptx

Alexei Drummond

May 25, 2014, 9:15:51 PM
to beast...@googlegroups.com
Dear Julian,

A screenshot of the trace for the two offending codon position parameters would be helpful, as would each of their joint-marginals with the likelihood. Do they correlate with the likelihood in an obvious way?
The odd thing is that the mean mutation rate across the two codon position partitions doesn't seem to be 1.0 (it seems to be 2.0). If they were relative rates I would expect the mean to be 1.0. Is there a good reason for it being 2.0? It is hard to tell what the precise problem is with the mixing without seeing the XML and log files.

Cheers
Alexei


Julian W Tang

May 25, 2014, 10:10:09 PM
to beast...@googlegroups.com
Thanks Alexei - yes, this was one of the default settings I was trying.

I've attached some extra slides showing the initial value, as well as the Beauti and Beast files.

It's difficult to know how to set these initial values for different pathogens. Admittedly, I'm also not sure how different the ucld.mean values would be between different HIV subtypes, or how to set these initial values if there are no accurate published estimates.

Julian




Beast likelihood not converging query 2.pptx
HIV_GART_2007-2013_subtypeC_GMRF_skyride_Beauti
HIV_GART_2007-2013_subtypeC_GMRF_skyride.xml

Alexei Drummond

May 26, 2014, 12:09:15 AM
to beast...@googlegroups.com
Dear Julian,

This will not solve your mixing problem, but I believe you should be setting the mu parameters to an initial value of 1.0; otherwise your estimates of ucld.mean and meanRate will be incorrect by a factor of 2. This is because the operator on the mu parameters maintains their mean (i.e. there are two parameters but only one degree of freedom), so the initial values cannot be set to arbitrary values. The mu parameters are relative rates and their mean must be 1.0, or else the absolute rate (meanRate) will not be reported in the correct units.
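To make the factor of 2 concrete, here is a small Python sketch of the arithmetic. It assumes that each partition evolves at the clock rate multiplied by its relative rate (mu) and that the overall rate is the site-weighted average of those products. The function name, the clock rate, the relative rates and the site counts are all made-up for illustration; none of them come from the attached XML.

```python
def mean_rate_per_site(clock_rate, relative_rates, site_counts):
    """Site-weighted mean rate, assuming partition rate = clock_rate * relative rate."""
    total_sites = sum(site_counts)
    weighted_mu = sum(mu * n for mu, n in zip(relative_rates, site_counts)) / total_sites
    return clock_rate * weighted_mu


clock_rate = 0.002        # hypothetical ucld.mean (substitutions/site/year)
site_counts = [800, 400]  # hypothetical numbers of CP1+2 and CP3 sites

# Relative rates whose weighted mean is 1.0 versus 2.0 (same ratio between partitions).
rate_mean_one = mean_rate_per_site(clock_rate, [0.5, 2.0], site_counts)
rate_mean_two = mean_rate_per_site(clock_rate, [1.0, 4.0], site_counts)

print(rate_mean_one, rate_mean_two, rate_mean_two / rate_mean_one)
```

Under these assumptions, the same clock parameter value describes sequences evolving twice as fast when the relative rates average 2.0, so a clock rate estimated with that setting would come out at half the value it would have with relative rates averaging 1.0.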

If this was the default setting for the initial values then I think this is a (rather serious) bug. What version of BEAST are you using?

Cheers
Alexei


Julian W Tang

May 26, 2014, 2:46:42 AM
to beast...@googlegroups.com
Hi Alexei,

I used v1.7.5, but what I meant was that I set these parameters myself - the .mu parameters are ones that Beast prompts you to set yourself.

Can you advise on which settings I should use for the various HIV subtypes for these parameters then? Exponential, lognormal or uniform, etc. initial, mean and stdev, etc.? And how do we know these are appropriate for this virus? Also, are the default values for the other prior and operator parameters appropriate for this pathogen - how can we know or check this?

Setting these parameters appropriately for different pathogens seems difficult as they are not always easily found in the literature - and the terminology may differ between what was published prior to the development of Beast.

Thanks for your help with this.

Julian





Alexei Drummond

May 26, 2014, 3:00:35 AM
to beast...@googlegroups.com
The initial values for relative rates of partitions (e.g. *.mu parameters) should be 1.0 when you are estimating the absolute rate with a separate parameter (e.g. ucld.mean).

Alexei


Julian W Tang

May 26, 2014, 9:54:42 AM
to beast...@googlegroups.com
OK will reset these - thanks again Alexei ;)

Julian





Andrew Rambaut

May 26, 2014, 1:39:33 PM
to beast...@googlegroups.com
Hi Julian,

I think you are confusing the initial values for a parameter (which, as Alexei says, should be 1, or have a mean of 1 in the case of the relative rates) with the operator size values (which determine how big a change to a parameter the MCMC will make). The latter are optimized automatically, but at the end of the run BEAST may report some suggestions that might help improve the mixing. With the optimization, it will rarely be worth acting on these, but they may indicate that an operator is not mixing very well. In later versions (v1.8) these are reported not as suggestions but just as information, to try to avoid the confusion.

Andrew

Julian W Tang

May 26, 2014, 3:03:59 PM
to beast...@googlegroups.com
Thanks Andrew - I will use BEAST v1.8 from now on ;)

Julian



Julian W Tang

May 27, 2014, 2:45:36 AM
to beast...@googlegroups.com
Hi Andrew,

I've attached a slide to show you what I am seeing.

Pretty much all the sequence data sets (HIV and influenza) I have been trying with the more recent versions of Beast (v1.7 upwards) seem to give me these queries for the setting of these 3 specific priors: CP1+2.mu, CP3.mu and ucld.mean - the default settings (presumably based on my imported data) seem to be 'improper'.

I can usually find a reasonable value for ucld.mean in the literature, but I'm not sure how to set the .mu parameters to reasonable values. Given Alexei's earlier answer, I can set the initial values to 1.0 for now, but I'm not sure what the upper and lower bounds should be, though I guess a lower bound of 0 and an upper bound of 2.0 might be OK?

And if the CP1+2.kappa, CP3.kappa parameters above are given a lognormal distribution, would it be reasonable to use this same distribution for CP1+2.mu, CP3.mu parameters, also?

Sorry about the nitty-gritty details, but it helps a lot for me to try to understand the rationale for these parameter settings better.

Thanks,

Julian




Beast_relative rate parameter settings.pptx

Alexei Drummond

May 27, 2014, 2:51:50 AM
to beast...@googlegroups.com
Hey Julian,

Set the initial value of the mu parameters to 1.0. Because the only operator on these parameters is a deltaExchange operator (double-check that in the XML; this is a crucial assumption), there is a constraint on the mean and you can safely leave the prior as Uniform from 0 to infinity. An upper bound of 2 would also be fine, but will not change the result.
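The constraint on the mean can be seen in a stripped-down Python sketch of a delta-exchange style move. This is an illustration only, not BEAST's implementation: `delta_exchange` and `run_chain` are hypothetical functions, the sketch ignores the Metropolis-Hastings acceptance step and any parameter weighting, and it simply rejects proposals that cross the lower bound. The delta of 0.75 and the seed are arbitrary choices.

```python
import random


def delta_exchange(values, delta, rng, lower=0.0):
    """Move a random amount (at most delta) from one randomly chosen value to another."""
    i, j = rng.sample(range(len(values)), 2)
    d = rng.random() * delta
    proposal = list(values)
    proposal[i] -= d
    proposal[j] += d
    if proposal[i] < lower:
        return list(values)  # out of bounds: keep the current state
    return proposal


def run_chain(initial, n_steps=1000, delta=0.75, seed=42):
    """Apply the move repeatedly and return the final state."""
    rng = random.Random(seed)
    state = list(initial)
    for _ in range(n_steps):
        state = delta_exchange(state, delta, rng)
    return state


for start in ([1.0, 1.0], [2.0, 2.0]):
    end = run_chain(start)
    print(start, "->", end, "mean:", sum(end) / len(end))
```

Whatever the chain does, the mean stays at its starting value (up to floating-point rounding), so starting both parameters at 1.0 fixes the mean at 1.0 and starting them at 2.0 fixes it at 2.0. In this unweighted two-parameter sketch with a lower bound of 0, neither value can exceed 2 when the mean is 1.0, which is one way to see why an upper bound of 2 adds nothing. If the operator in your XML weights the parameters (for example by partition size), the preserved quantity would be the weighted mean instead.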

Alexei


Julian W Tang

May 27, 2014, 9:30:18 AM
to beast...@googlegroups.com
Great - will try these settings.

Thanks again, Alexei ;)

Julian



