Low ESS for Likelihood

Bharadwaj Vemparala

Sep 7, 2023, 3:47:06 PM
to beast-users
Hello BEAST community,

I have an alignment of around 900 protein-coding viral sequences of a single gene, each 1,260 nucleotides long. Using BEAST v1.10.4, almost all parameters have converged very well, but the 'joint' and 'likelihood' traces have not, as shown:

[Attached trace plots: joint.png, likelihood.png]

It looks as if the chain is moving between two very close solutions.
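A low ESS for 'likelihood' and 'joint' means successive samples are strongly autocorrelated, which is what happens when a chain switches only slowly between two nearby modes. The sketch below is a simplified autocorrelation-based ESS estimator (not necessarily the exact algorithm Tracer uses), applied to synthetic AR(1) traces standing in for a well-mixing chain and a "sticky" one. The function names are hypothetical.

```python
import random


def effective_sample_size(trace):
    """Autocorrelation-based ESS estimate: N / tau, where
    tau = 1 + 2 * (sum of lag-k autocorrelations), truncated at the first
    non-positive autocorrelation (a simplified initial-positive-sequence rule).
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float("nan")  # a constant trace has no defined ESS
    tau = 1.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return n / tau


def ar1_trace(n, phi, seed):
    """Toy stand-in for an MCMC trace: an AR(1) series whose
    autocorrelation phi mimics how 'sticky' the chain is."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    for phi in (0.0, 0.95):
        trace = ar1_trace(2200, phi, seed=42)
        post_burnin = trace[len(trace) // 10:]  # drop the first 10%, as 5M of 50M does
        print(f"phi={phi}: ESS ~ {effective_sample_size(post_burnin):.0f} of {len(post_burnin)}")
```

The sticky trace has the same length as the independent one but a far smaller ESS, which is why simply running longer, or improving mixing, is usually what raises a low ESS.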

My configuration is as follows:
Chain length: 50 million states
Burn-in: 5 million states (10%)
Goal: obtain the topology as accurately as possible and compare the topology inferred from this single gene with the topology inferred from complete genome sequences.
Substitution model: GTR+I+G4
Base frequencies: Empirical
Partitioning: 2 partitions by codon position, (1+2) and 3
Clock: Strict clock
Tree prior: Coalescent: Constant Size
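In BEAUti the codon partitioning is a menu setting; the sketch below only illustrates what the (1+2), 3 scheme does to each sequence, assuming the sequences are in frame and start at codon position 1. The function name is hypothetical.

```python
def split_codon_partitions(seq):
    """Split an in-frame coding sequence into the (1+2) and 3 partitions."""
    if len(seq) % 3 != 0:
        raise ValueError("length is not a multiple of 3; check the reading frame")
    # Keep positions 1 and 2 of every codon (0-based indices 0 and 1 mod 3).
    first_and_second = "".join(base for i, base in enumerate(seq) if i % 3 != 2)
    # Every third base, starting at codon position 3.
    third = seq[2::3]
    return first_and_second, third


if __name__ == "__main__":
    pos12, pos3 = split_codon_partitions("ATG" * 420)  # 1,260-nt placeholder sequence
    print(len(pos12), len(pos3))
```

For a 1,260-nt gene this gives 840 sites in the (1+2) partition and 420 in the third-position partition.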

Could someone please help me out?

HS

Sep 8, 2023, 4:14:54 AM
to beast-users
Hello,

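Before the experts or developers answer, I suggest changing two things. First, don't use I and G4 together; I would use GTR+G4 instead. As I understand it, G4 (gamma-distributed rate variation across sites, with four rate categories) can already accommodate very slowly evolving sites, so adding I (a proportion of invariant sites) on top of it introduces two correlated parameters, which may cause convergence problems. Second, use Estimated rather than Empirical base frequencies; Empirical frequencies can also cause convergence problems.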
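For reference, the usual textbook form of the +I+Γ4 site likelihood (stated as the general model, not as BEAST's exact code) shows why the two components overlap. For site s with data D_s, proportion of invariant sites p_inv and discrete-gamma category rates r_k(α) with mean 1:

```latex
P(D_s) = p_{\mathrm{inv}}\, P(D_s \mid r = 0)
       + \frac{1 - p_{\mathrm{inv}}}{4} \sum_{k=1}^{4}
         P\!\left(D_s \,\middle|\, r = \frac{r_k(\alpha)}{1 - p_{\mathrm{inv}}}\right)
```

Here P(D_s | r = 0) equals π_x when site s is constant in state x and 0 otherwise. A small α already places one gamma category near rate zero, so a change in p_inv can be partly compensated by a change in α; that trade-off is the correlation that can slow mixing.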

Best,
Hovhannes

Bharadwaj Vemparala

Sep 8, 2023, 2:19:22 PM
to beast-users
Hello Hovhannes,

Thank you for the suggestions. I chose the model using ModelFinder in IQTREE; the screenshot is attached below:
[Attached screenshot: modelTest.png]
The best model, according to ModelFinder, was GTR+F+I+G4. I understand why you recommended not to use I along with G4, and I shall follow that.

However, the +F suffix means empirical base frequencies (ref: Taming the BEAST). Could you point me to a source or reference on when to use empirical versus estimated frequencies? It would be of great help.
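As I understand it, "Empirical" (BEAUti) or +F (IQ-TREE) fixes the four base frequencies at their observed proportions in the alignment, so they add no parameters to the MCMC, whereas "Estimated" samples them as three free parameters (four frequencies constrained to sum to 1). A minimal sketch of the empirical computation, assuming gaps and ambiguity codes are simply skipped; real programs may handle ambiguous characters differently, and the function name is hypothetical.

```python
from collections import Counter


def empirical_base_frequencies(sequences):
    """Observed A/C/G/T proportions pooled over all aligned sequences.

    Gaps, N and other ambiguity codes are skipped (a simplification).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in "ACGT")
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous nucleotides in the alignment")
    return {base: counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    print(empirical_base_frequencies(["ATGGCC-", "atgGCAN"]))
```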

Thanks,
Bharadwaj

On Friday, 8 September 2023 at 1:44:54 PM UTC+5:30, HS wrote:

shanshan du

Sep 8, 2023, 2:19:22 PM
to beast-users
Hello, I often run into the same situation: when I use JMODEL, the best nucleotide model is GTR+I+G. What should I do? Looking forward to your reply.
[Attached image: 1694162326420.jpg]

HS

Sep 9, 2023, 6:30:16 AM
to beast-users
Hi,

Of course, it is always good to look for research that helps you find the best model for your analyses. Nowadays, however, there are so many sources that it is easy to be misled. I am not familiar with what ModelFinder, IQTREE and, as Shanshan mentions below, JMODEL do. I assume they are ML-based approaches that suggest the best model without extensive tree reconstruction. If so, and as I have read in this group from more knowledgeable people, they cannot necessarily suggest the best model for a Bayesian analysis. That is why my approach is to take models from earlier studies. For example, for human mitochondrial DNA I used HKY, because it has been shown many times that this model fits mtDNA best. Why, then, use a more complex model?
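For context on what ML model selectors report: as far as I know, ModelFinder ranks candidate models by BIC by default. Each model is fitted by maximum likelihood and penalised for its number of free parameters k, BIC = -2 ln L + k ln n, with n the number of alignment sites. The sketch below only shows that arithmetic; the function names are hypothetical, and the log-likelihoods would come from the ML program itself.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion (lower is better); n_sites is the
    alignment length used as the sample size."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def rank_by_bic(fits, n_sites):
    """fits maps a model name to (maximised log-likelihood, number of free
    parameters including branch lengths). Returns names, best first."""
    return sorted(fits, key=lambda name: bic(*fits[name], n_sites))
```

Under BIC a more complex model wins only if its gain in ln L exceeds (Δk / 2) ln n; for a 1,260-site alignment, ln 1260 ≈ 7.14, so each extra parameter must raise the log-likelihood by about 3.6 units. Note that this scores ML fit only and says nothing about how well an MCMC will mix.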

Also, I have read in this group that GTR+I+G4 combined with Empirical base frequencies can cause convergence problems. So for Y-chromosome analyses I use GTR+G4, which has been shown many times to work well for Y-chromosome data. You can search the group yourself for the discussions about problems with GTR+I+G4 and Empirical frequencies. It is not an easy task :) Sorry.

By the way, you can use bModelTest, which is available only in BEAST 2, to estimate the substitution model within the same analysis.

Best,
Hovhannes

Bharadwaj Vemparala

Sep 9, 2023, 2:22:37 PM
to beast-users
Thank you very much, Hovhannes. It was of great help.

Regards,
Bharadwaj

On Saturday, 9 September 2023 at 4:00:16 PM UTC+5:30, HS wrote:

Artem B

Sep 10, 2023, 9:50:20 AM
to beast-users
Hello Bharadwaj,

In my view, a GTR model with codon partitions is too much: with so many parameters it is hard to reach convergence. Each partition's TreeLikelihood takes its own site model as input, so the substitution parameters multiply with the number of partitions, and that may be the reason for your problem. SRD06, in contrast, converges better and is still reasonably precise. In fact, I have never seen the Yang96 model used in publications; the common practice is to compare, for example, GTR without codon partitions against SRD06 using a marginal likelihood estimator.
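To make the parameter-count argument concrete, here is a rough tally of free substitution-model parameters. It assumes unlinked site models in every partition and P - 1 free relative rates among P partitions, and it leaves out the clock rate, tree prior and tree. My understanding is that SRD06 is HKY+Γ on the same (1+2), 3 codon partitioning; the comparison below relies on that assumption. A log Bayes factor from two log marginal likelihood estimates (for example from path sampling or stepping-stone sampling) is included for the model comparison step. All function names are hypothetical.

```python
EXCHANGEABILITY_PARAMS = {"HKY": 1, "GTR": 5}  # kappa vs. 5 free relative rates


def site_model_params(model, gamma, invariant, estimated_freqs):
    """Free parameters of one partition's substitution + site model."""
    k = EXCHANGEABILITY_PARAMS[model]
    if gamma:
        k += 1  # gamma shape (alpha)
    if invariant:
        k += 1  # proportion of invariant sites
    if estimated_freqs:
        k += 3  # four frequencies constrained to sum to 1
    return k


def total_params(n_partitions, **site_model):
    """Unlinked site model in every partition plus (n_partitions - 1) free
    relative rates; clock, tree prior and tree parameters are not counted."""
    return n_partitions * site_model_params(**site_model) + (n_partitions - 1)


def log_bayes_factor(log_ml_a, log_ml_b):
    """Log Bayes factor of model A over model B from log marginal
    likelihood estimates; a positive value favours A."""
    return log_ml_a - log_ml_b


if __name__ == "__main__":
    setups = {
        "GTR+I+G, (1+2),3, empirical": (2, "GTR", True, True, False),
        "HKY+G, (1+2),3, empirical": (2, "HKY", True, False, False),
        "GTR+G, unpartitioned, estimated": (1, "GTR", True, False, True),
    }
    for label, (p, model, g, i, f) in setups.items():
        print(label, total_params(p, model=model, gamma=g, invariant=i, estimated_freqs=f))
```

Under these assumptions the original set-up has 15 such parameters, against 5 for HKY+Γ on the same two partitions.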

Best,
Artem.

On Sunday, 10 September 2023 at 02:22:37 UTC+8, Bharadwaj Vemparala wrote:

Bharadwaj Vemparala

Sep 10, 2023, 3:34:20 PM
to beast-users
Sure, thank you, Artem. I shall try it out.

Regards,
Bharadwaj

On Sunday, 10 September 2023 at 7:20:20 PM UTC+5:30, Artem B wrote:

shanshan du

Sep 10, 2023, 3:34:20 PM
to beast-users
[Attached image: 1694315127750.jpg]
Sorry to bother you again. May I ask what your tree prior is? Have you changed the tMRCA? Thanks.
[Attached image: 1694315252703.jpg]

Bharadwaj Vemparala

Sep 11, 2023, 2:53:59 AM
to beast-users
Hello shanshan du,

I am using a UPGMA starting tree, and I have not changed the tMRCA.
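For readers unfamiliar with the option: a UPGMA starting tree is built by average-linkage clustering of pairwise distances between sequences. Below is a minimal sketch of the clustering step only, returning the topology as nested tuples. How BEAST computes the distances is not shown, classic UPGMA would also set each merged node's height to half the merge distance (omitted here), and the function names are hypothetical.

```python
def _pair(i, j):
    """Order a pair of cluster ids so each distance is stored once."""
    return (i, j) if i < j else (j, i)


def upgma(names, dist):
    """UPGMA clustering; dist is a symmetric matrix (list of lists)."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = {_pair(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of current clusters
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[_pair(a, b)]
        for c in clusters:
            # Size-weighted average distance from c to the merged cluster.
            d[_pair(next_id, c)] = (size_a * d.pop(_pair(a, c))
                                    + size_b * d.pop(_pair(b, c))) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


if __name__ == "__main__":
    toy = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
    print(upgma(["A", "B", "C", "D"], toy))
```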

Regards,
Bharadwaj

On Monday, 11 September 2023 at 1:04:20 AM UTC+5:30, shanshan du wrote: