StarBeast2 convergence problems (possibly not depending on generations)


Paolo Caputo

Jan 5, 2022, 12:50:18 PM
to beast-users

Hello everybody.

I will freely confess that I am just beginning with StarBEAST2 and am therefore somewhat confused (to say the least).

I have a small dataset with 25 terminals representing 9 taxa in a small angiosperm genus: 2 repeated nuclear sequences and 6 chloroplast sequences, for a total of little more than 4000 nucleotides. Some missing sequences are replaced with N's, which seems to be allowed if kept to a minimum, as in my case. The signal, which is not strong overall, is incongruent across genes. For this reason, after the usual concatenation approach, which does not seem to fit my case (but gave me perfect convergence with ESS >> 200 for all parameters), I decided on a species-tree approach with StarBEAST2, hoping that the incongruence is due to incomplete lineage sorting (ILS).

As I did not get convergence of the likelihood and gene trees (I ran chains of up to one billion generations, often with positive likelihoods in the second half-billion generations), I simplified my model and now I have:

Two partitions (one per genome), two simple unlinked site models (HKY + I and F81, the latter obtained by unchecking the GTR rate operators; empirical frequencies; rates of the first partition estimated), two unlinked strict clocks (rate of the second estimated), and two unlinked gene trees. No prior is improper (and the proper priors have upper and lower bounds, which are not "hit" by the estimates). Yule prior, no calibration. Taxon and gene names are different. No singletons.

As I am only interested in the phylogeny, I am integrating out population sizes. The speciation rate, whose prior was formerly in the hundreds or thousands, is now more tightly constrained with an exponential prior.

I still get no convergence and poor mixing for the likelihood and tree likelihoods (all other traces have ESS << 200, some with weird "spikes" or "randomly distributed very long hairs" on the caterpillars). A run currently in progress dropped 50 ESS units for the likelihood at 250 million generations a few minutes ago. Even StarBEAST2 should probably not require so many generations to converge for such a small, simply modeled dataset.

Am I making some very stupid mistake, are topologies too different to find any sensible signal, is my taxon set completely off the wall or what?

I would be very grateful for some feedback.

Cheers

Paolo


Remco Bouckaert

Jan 11, 2022, 4:34:37 PM
to beast...@googlegroups.com
Hi Paolo,

You mentioned observing positive likelihoods. Log priors can become positive, since priors are not always normalised, and thus the log posterior can become positive. However, the (log) likelihoods are normalised and should never become positive. A positive likelihood usually indicates numerical instability: substitution model parameters can take extreme values, or the tree height times the clock rate can be extremely large. Adjusting the priors on these parameters can reduce this problem.

Have a look at the trace log and see whether there are any parameters that get extreme values (between 0 and, say, 1e-10, or larger than 1e10), and check the prior on these parameters.
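
For anyone who prefers to script this check rather than eyeball it in Tracer, here is a minimal sketch. It assumes the trace log is a tab-separated text file with optional `#` comment lines, a header row, and the sample number in the first column. The function names, the 10% burn-in default, the use of absolute values, and the matching of likelihood columns by name are illustrative assumptions, not part of BEAST.

```python
import csv


def _read_columns(log_text, burnin_fraction):
    """Parse a tab-separated trace log into {column name: [float values]}."""
    # Skip blank lines and '#' comment lines that precede the header row.
    lines = [ln for ln in log_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Drop incomplete rows (e.g. the last line of a run still in progress).
    rows = [r for r in reader if len(r) == len(header)]
    # Discard the first part of the chain as burn-in (assumed fraction).
    rows = rows[int(len(rows) * burnin_fraction):]
    columns = {}
    for i, name in enumerate(header):
        # Skip the sample-number column and empty trailing fields.
        if i == 0 or not name:
            continue
        values = []
        for row in rows:
            try:
                values.append(float(row[i]))
            except ValueError:
                pass  # ignore non-numeric cells
        if values:
            columns[name] = values
    return columns


def find_extreme_columns(log_text, low=1e-10, high=1e10, burnin_fraction=0.1):
    """Return {column: (smallest, largest)} nonzero magnitudes outside [low, high]."""
    flagged = {}
    for name, values in _read_columns(log_text, burnin_fraction).items():
        magnitudes = [abs(v) for v in values if v != 0]
        if not magnitudes:
            continue
        smallest, largest = min(magnitudes), max(magnitudes)
        if smallest < low or largest > high:
            flagged[name] = (smallest, largest)
    return flagged


def positive_likelihood_columns(log_text, burnin_fraction=0.1):
    """Return columns whose name contains 'likelihood' and that ever exceed 0."""
    return [name
            for name, values in _read_columns(log_text, burnin_fraction).items()
            if "likelihood" in name.lower() and max(values) > 0]
```

To use it, read the log into a string (for example `text = open("starbeast.log").read()`, where the file name is hypothetical) and call `find_extreme_columns(text)` and `positive_likelihood_columns(text)`. Any column reported is a candidate for a tighter prior.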

Hope this helps,

Remco

