Low ESS for Likelihood

Bharadwaj Vemparala

Sep 7, 2023, 3:47:06 PM

to beast-users

Hello BEAST community,

I have an alignment of about 900 protein-coding viral sequences from a single gene, each 1,260 nucleotides long. I am using BEAST v1.10.4. Almost all parameters have converged very well, but 'joint' and 'likelihood' have not, as shown in the attached image (not preserved here):

It looks as though the chain is moving between two very close solutions.

Following is the configuration:

Chain length: 50 million states

Burn-in: 5 million

Goal: Recover the topology as accurately as possible and compare the topology inferred from this single gene with the topology inferred from the complete genome sequences.

Substitution model: GTR+I+G4

Base frequencies: Empirical

Partitioning: 2 partitions [(1+2), 3]

Clock: Strict clock

Tree prior: Coalescent - Constant size
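For reference, the ESS reported for a trace such as 'likelihood' can be approximated by hand. The sketch below is a minimal pure-Python version and not Tracer's exact algorithm: it drops burn-in by state number, then divides the number of samples by an autocorrelation time estimated with Geyer's initial positive sequence rule. The function names, the (state, value) input layout, and the logging interval in the demo are assumptions for illustration, not anything BEAST provides.

```python
import random


def drop_burnin(trace, burnin_state):
    """Keep values logged at or after burnin_state; trace is a list of (state, value)."""
    return [value for state, value in trace if state >= burnin_state]


def effective_sample_size(values, max_lag=2000):
    """Estimate ESS as n / tau, with tau the integrated autocorrelation time."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 samples")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float(n)  # constant trace: no autocorrelation can be estimated

    def rho(lag):
        # Sample autocorrelation at the given lag.
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        return cov / variance

    # Geyer's rule: add autocorrelations in adjacent pairs and stop at the
    # first pair whose sum is not positive (beyond that it is mostly noise).
    pair_total = 0.0
    lag = 0
    while lag + 1 < min(n, max_lag):
        pair = rho(lag) + rho(lag + 1)
        if pair <= 0.0:
            break
        pair_total += pair
        lag += 2
    tau = max(2.0 * pair_total - 1.0, 1.0)  # clamp so that ESS never exceeds n
    return n / tau


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical run: 50 million states logged every 10,000 states (assumed),
    # simulated here as a sticky AR(1) series standing in for 'likelihood'.
    trace, value = [], 0.0
    for state in range(0, 50_000_001, 10_000):
        value = 0.95 * value + rng.gauss(0.0, 1.0)
        trace.append((state, value))
    kept = drop_burnin(trace, burnin_state=5_000_000)
    print(len(kept), "samples after burn-in, ESS ~", round(effective_sample_size(kept)))
```

A trace that switches between two nearby modes stays correlated over long lags, so its ESS can remain low even when every other statistic looks well mixed.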

Could someone please help me out?

HS

Sep 8, 2023, 4:14:54 AM

Hello,

Until the experts or developers answer, I suggest changing two things. First, don't use I and G4 together; I would use GTR+G4 instead. As I understand it, G4 already accounts for invariant sites (its slowest rate category can be close to zero), so using I in addition to G4 may cause convergence problems. Second, use Estimated rather than Empirical for the base frequencies; Empirical can also cause convergence problems.

Best,

Hovhannes

Bharadwaj Vemparala

Sep 8, 2023, 2:19:22 PM

Hello Hovhannes,

Thank you for the suggestions. I chose the model based on ModelFinder in IQ-TREE; a screenshot is attached:

The best model, according to ModelFinder, was GTR+F+I+G4. I understand why you recommend not using I along with G4, and I shall follow that.

However, F means that I need to use empirical base frequencies (Ref: Taming the Beast). Could you point me to a source or reference on when to use (or not to use) each of them? It would be of great help.

Thanks,

Bharadwaj

shanshan du

Sep 8, 2023, 2:19:22 PM

Hello, I often run into the same situation: when I use JMODEL, the best nucleotide model comes out as GTR+I+G. What should I do? Looking forward to your reply.

HS

Sep 9, 2023, 6:30:16 AM

Hi,

Of course, it always helps to find research that points to the best model for your analyses. Nevertheless, there are so many sources nowadays that it is easy to be misled. I don't know exactly what ModelFinder, IQ-TREE, and, as Shanshan mentions, JMODEL do. I assume they are ML-based approaches that suggest the best model without extensive tree reconstruction. If so, and as I have read in this group from more knowledgeable people, they can't reliably suggest the best model for Bayesian analyses. That is why my approach is to take the models from earlier studies. For human mitochondrial DNA, for example, I used HKY, as it has been shown many times to fit mtDNA best. Then why use a more complex model?

I have also read in this group that GTR+I+G4 and Empirical frequencies can cause convergence problems. So, for Y chromosome analyses, I use GTR+G4, which has been shown many times to work well for Y chromosome data. You can search for the discussions about the problems with GTR+I+G4 and Empirical yourself, although it is not an easy task :) Sorry.

By the way, you can use bModelTest, which is available only in BEAST 2, to infer the site model within the same analysis.

Best,

Hovhannes

Bharadwaj Vemparala

Sep 9, 2023, 2:22:37 PM

Thank you very much, Hovhannes. It was of great help.

Regards,

Bharadwaj

Artem B

Sep 10, 2023, 9:50:20 AM

Hello Bharadwaj,

In my opinion, a GTR model combined with codon partitions is too much: with so many parameters, it is hard to reach convergence. The tree likelihood takes the site model as an input, so this may be the cause of your problem. SRD06, in turn, converges better and is still reasonably precise. In fact, I have never seen the Yang96 model in publications; the common practice is to compare, for example, GTR without codon partitions against SRD06 using a marginal likelihood estimator.
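To put a rough number on "so many parameters", here is a back-of-the-envelope count of free substitution-model parameters for a few setups mentioned in this thread. It is only a sketch, not something BEAST reports: SiteModel and total_free_parameters are hypothetical names, and the count assumes that every partition gets its own unlinked site model plus a relative rate, which may not match the actual XML. SRD06 is counted as HKY+G4 on the (1+2), 3 codon partitions.

```python
from dataclasses import dataclass

# Free exchangeability parameters: HKY has kappa; GTR has 6 rates with 1 fixed.
EXCHANGEABILITY_PARAMS = {"HKY": 1, "GTR": 5}


@dataclass(frozen=True)
class SiteModel:
    substitution: str            # "HKY" or "GTR"
    gamma: bool                  # +G adds the shape parameter alpha
    invariant: bool              # +I adds the proportion of invariant sites
    estimated_frequencies: bool  # 4 frequencies summing to 1 = 3 free values

    def free_parameters(self):
        count = EXCHANGEABILITY_PARAMS[self.substitution]
        count += 1 if self.gamma else 0
        count += 1 if self.invariant else 0
        count += 3 if self.estimated_frequencies else 0
        return count


def total_free_parameters(model, partitions):
    # Assumption: site models are unlinked across partitions, and relative
    # rates are constrained to average 1, leaving (partitions - 1) free.
    return model.free_parameters() * partitions + (partitions - 1)


if __name__ == "__main__":
    original = SiteModel("GTR", gamma=True, invariant=True, estimated_frequencies=False)
    srd06 = SiteModel("HKY", gamma=True, invariant=False, estimated_frequencies=False)
    plain_gtr = SiteModel("GTR", gamma=True, invariant=False, estimated_frequencies=False)
    print("GTR+I+G4, 2 codon partitions:", total_free_parameters(original, 2))
    print("SRD06 (HKY+G4, 2 partitions):", total_free_parameters(srd06, 2))
    print("GTR+G4, unpartitioned:       ", total_free_parameters(plain_gtr, 1))
```

Under these assumptions the original setup carries 15 substitution parameters against 5 for SRD06 and 6 for unpartitioned GTR+G4; switching to estimated frequencies would add 3 per unlinked site model.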

Best,
Artem.

Bharadwaj Vemparala

Sep 10, 2023, 3:34:20 PM

Sure, thank you, Artem. I shall try it out.

Regards,

Bharadwaj

shanshan du

Sep 10, 2023, 3:34:20 PM

Sorry to bother you again. May I ask what your tree prior is? Have you changed the tmrca? Thanks.

Bharadwaj Vemparala

Sep 11, 2023, 2:53:59 AM

Hello shanshan du,

I am using a UPGMA starting tree, and I have not changed the tmrca.

Regards,

Bharadwaj
