TIM substitution model


Jo Rhodes

Oct 18, 2022, 1:42:24 PM
to beast-users
Hi

I am using BEAST v2.6.3, and running bModelTest to select my substitution model. The chain spent the most time in TIM (model 23, 123421), which had 69.81% posterior support.

I have since implemented the TIM model and re-run BEAST, but the resulting output is biologically incorrect, leading me to believe I've implemented the TIM model incorrectly.

I have the SSM package installed, but I realise it provides the models from jModelTest, which may not line up with the models bModelTest suggests.

How do I implement the TIM model?

Thanks

Carlo Pacioni

Oct 19, 2022, 12:26:04 AM
to beast...@googlegroups.com
Hi Jo,
the beauty of bModelTest is that you don't have to re-run the analysis with the model that has the most posterior support. That is reported FYI, but by using bModelTest you are averaging over all visited models, so your analysis also includes the uncertainty about which sub-model to use. This is arguably a more correct approach than picking one sub-model.

Cheers,
carlo



Jo Rhodes

Oct 19, 2022, 3:57:02 AM
to beast-users
Hi Carlo,

I appreciate that I could use the output from bModelTest, and I have done this before (especially when there's similar support for more than one model), but the output is still biologically incorrect (by 'incorrect', I mean the resulting tree is completely different from the tree of the input data, and the inferred dates are completely wrong, too). I re-ran it with a longer chain, but the output is still the same. I wanted to see if the TIM model was implemented incorrectly. I initially implemented it by using a GTR with all rates equal, but with the AG and CT estimate boxes unchecked.

Or is there something weird going on with my data? I'm completely thrown by why the phylogeny of the input data would be so different to the output, though.

Jo

Carlo Pacioni

Oct 19, 2022, 7:25:27 AM
to beast...@googlegroups.com
Hey Jo,
when you say that the output is incorrect, could you clarify whether you are referring to the phylogeny of the run where you used bModelTest? If you are referring to the one with TIM, I don't think there is any reason why you should look at it at all. That being said, I think that the way you set it up is wrong. I believe that what you have is four rate parameters, one for each rate except for AG and CT, which are fixed to their start value. The TIM model also has four rate parameters, but one shared by AC and GT, one shared by AT and CG, and one for each of the remaining substitutions. I am not sure whether you can set it up in BEAUti; I'd modify the XML file by, say, giving CG the name of AT, and fixing AC and GT to 1, but again, why bother?
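Before editing the XML, it may help to check which rates a given bModelTest code actually ties together. The sketch below is only an illustration: `rate_groups` is a hypothetical helper (not part of BEAST or bModelTest), and it assumes the six digits follow the rate order AC, AG, AT, CG, CT, GT, which is worth confirming against the bModelTest documentation.

```python
from __future__ import annotations

from collections import defaultdict

# ASSUMPTION: the six digits of the model code are listed in this rate order.
RATE_ORDER = ("AC", "AG", "AT", "CG", "CT", "GT")


def rate_groups(code: str) -> dict[str, list[str]]:
    """Map each digit of a six-digit model code to the rates that share it."""
    if len(code) != len(RATE_ORDER) or not code.isdigit():
        raise ValueError(f"expected six digits, got {code!r}")
    groups: dict[str, list[str]] = defaultdict(list)
    for digit, rate in zip(code, RATE_ORDER):
        groups[digit].append(rate)  # rates with the same digit share one parameter
    return dict(groups)


if __name__ == "__main__":
    # Print one line per shared parameter for the code quoted in this thread.
    for digit, rates in rate_groups("123421").items():
        print(digit, " = ".join(rates))
```

Under that assumed order, the code 123421 ties AC to GT and AG to CT, with AT and CG each on their own, so it is worth comparing that grouping with whatever you intend to set in the XML.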
If you have issues with the phylogeny in the run where you used bModelTest, I think you may have other problems (perhaps the calibrations, if you use any, since you mention the dates).

It sounds like you used a starting tree rather than a random one, because you mention the phylogeny of the input data. Is it possible that your starting tree is simply not supported by your data?

Cheers,
carlo


Jo Rhodes

Oct 19, 2022, 1:28:00 PM
to beast-users
Hi,

Sorry, I should've been more specific. So I removed homoplasies prior to running bModelTest, and checked the phylogeny to make sure it still looked ok (and that the removal of homoplasies hadn't been heavy-handed!). I then ran Phylostems to check for a clock-like signal, and then ran bModelTest to check for the best substitution model. The resulting tree (after I've run TreeAnnotator and removed burn-in etc) is completely different: isolates that were collected years ago are placed as most recent, and it seems to suggest some sort of split in the data. It sort of makes sense (if you ignore the dates), as isolates that you would expect to see together (i.e. they're from the same geographical location) are grouped together in the phylogeny. But the dates are messed up.

So your suggestion that the starting tree isn't supported by the data does make sense; I just have no idea why 😂 perhaps this split in the data that I'm observing means the data needs to be split up and analysed separately.....

Thanks!

Jo

Carlo Pacioni

Oct 19, 2022, 6:41:55 PM
to beast...@googlegroups.com
Hi Jo,
sampling should be random. I'm pretty sure that the problem you see is caused by the removal of homoplasies. My recommendation is to use all the data you have, even if this means that the runs take longer to converge. To speed up the process you could consider running multiple analyses (each long enough to pass the burn-in) and combining them. If you really need to thin your data (because the analysis takes months to run) then you should do so randomly rather than by selectively removing part of the data. What you have done artificially increases the diversity of your dataset.
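As a rough sketch of what random thinning could look like, the snippet below keeps a random subset of alignment columns. It assumes that "thinning" here means dropping sites and that the alignment is held as a plain dictionary of equal-length sequences; `thin_columns` and its parameters are hypothetical names for illustration, not part of any BEAST tool.

```python
from __future__ import annotations

import random


def thin_columns(alignment: dict[str, str], keep: int, seed: int = 1) -> dict[str, str]:
    """Keep `keep` randomly chosen alignment columns, preserving their order."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    n_sites = lengths.pop()
    if not 0 <= keep <= n_sites:
        raise ValueError(f"keep must be between 0 and {n_sites}")
    rng = random.Random(seed)  # fixed seed so the same subset can be reproduced
    chosen = sorted(rng.sample(range(n_sites), keep))
    # Apply the same column choice to every sequence so the alignment stays aligned.
    return {name: "".join(seq[i] for i in chosen) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"iso1": "ACGTACGTAC", "iso2": "ACGTTCGTAA", "iso3": "ACCTACGTAC"}
    print(thin_columns(toy, keep=5))
```

The columns are chosen without looking at their content, which is the point of doing it randomly; the same approach would apply to choosing a random subset of sequences instead of sites.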

Cheers,
carlo


Jo Rhodes

Oct 24, 2022, 1:09:52 PM
to beast-users
Sorry for the delay, I've ended up with some sort of flu :(

Ok I'll give it a go. I guess I'm worried that homoplasies derived from homologous recombination could also artificially bias the temporal analysis?
