Substitution models more general than GTR

547 views
Skip to first unread message

Marc Suchard

unread,
Sep 2, 2008, 12:51:41 PM9/2/08
to beast-users

Anyone interested in playing with nucleotide substitution models more
general than GTR; i.e., possibly irreversible or with complex
eigenvalues? I've recently added new algorithms in Beast to draw
inference from these substitution models. They could lead to a quick
substitution model selection paper.

best, Marc


Graham Jones

unread,
Sep 3, 2008, 5:18:45 AM9/3/08
to msuc...@gmail.com, beast-users
Marc,

I'm interested, but I am a mathematician/programmer, not a biologist, and I
guess you are looking for the latter?

It seems surprising to me that this hasn't been done before. With a Bayesian
approach and rooted trees there seems no reason not to. I have some ideas
for projects...

A particular data set that I think would be interesting is the primate
alignment from [1]. Because this is non-coding, the mutations should be
neutral, so they should not care which DNA strand they are on. (I'd like
confirmation or correction of that, but anyway it seems a good approximation
to try.) So the rates of fixed mutations should reflect the rates of new
base-pair substitutions, and so you should have a rate matrix with only 6
parameters (when unnormalised) although it is not in general time
reversible.

The artifical data from [2] is another possiblility. In particular he has
some empirical figures for new (not fixed) mutation rates (p794-5).
These do not give a time reversible matrix.

I also have some ideas on how to design a prior for rate matrices with a
'bias' towards simple models. I'll say more if you're interested.

In [3] they use a non time reversible rate matrix to root unrooted trees.
Surely it is possible to improve on that?

On the implementation...

I hope you read "Nineteen Dubious Ways to Compute the Exponential of a
Matrix, Twenty-Five Years Later" before coding ;-)

In [4] it says "In practice, numerical evaluation of matrix exponential
1 is a major computational bottleneck hindering maximum likelihood
estimation (Yang 1994)." Really?


Regards

Graham


1. Drummond AJ, Ho SYW, Phillips MJ & Rambaut A (2006) Relaxed Phylogenetics
and Dating with Confidence. PLoS Biology 4, e88


2. Comparison of the Accuracies of Several Phylogenetic Methods
Using Protein and DNA Sequences
Barry G. Hall
Mol. Biol. Evol. 22(3):792-802. 2005


3. Rooting a phylogenetic tree with nonreversible substitution models
Von Bing Yap and Terry Speed
BMC Evolutionary Biology 2005, 5:2 doi:10.1186/1471-2148-5-2

4. Computational Advances in Maximum
Likelihood Methods for Molecular Phylogeny
Eric E. Schadt,1 Janet S. Sinsheimer,2 and Kenneth Lange3,4

Roxana Capper

unread,
Sep 21, 2014, 9:40:49 PM9/21/14
to beast...@googlegroups.com
Just curious if these more general models were ever published or incorporated into Beast v1 or v2?  I'm very interested in using a non-reversible substitution model with my data.

Thanks!
Roxana

Julian W Tang

unread,
Sep 22, 2014, 2:46:52 AM9/22/14
to beast...@googlegroups.com
Marc, I'm having trouble getting a large dataset of HIV subtype B sequences to converge using the SRD06 model in BEAST.

If you think these more general models are still applicable to HIV and other viruses, I'm happy to give them a try - but you need to send me a link to the BEAST version that incorporates them so I can try this.

But at the moment I am running 200 million chains, which takes about 2-3 weeks though (815 seqs x 1017 bp, spanning 2007-2013 years).

Julian



Date: Sun, 21 Sep 2014 18:40:48 -0700
From: roxana...@gmail.com
To: beast...@googlegroups.com
Subject: Re: Substitution models more general than GTR
--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at http://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/d/optout.

Andrew Rambaut

unread,
Sep 22, 2014, 4:08:11 AM9/22/14
to beast...@googlegroups.com
They are implemented for discrete traits (the ‘asymmetric model’) and can be selected in BEAUti. It would also be possible to use them for nucleotide or amino acid data by editing the
XML (generate a example for the discrete traits and then apply the substitution model to the nucleotide data).

Andrew

Julian W Tang

unread,
Sep 22, 2014, 7:07:07 AM9/22/14
to beast...@googlegroups.com
Any particular version of Beauti, Andrew?
 
Can you give more specific details about how to edit the XML for this for non-programmers?
 
Thanks,
 
Julian



 

From: ram...@gmail.com
Subject: Re: Substitution models more general than GTR
Date: Mon, 22 Sep 2014 09:09:09 +0100
To: beast...@googlegroups.com

Bram

unread,
Sep 22, 2014, 1:10:29 PM9/22/14
to beast...@googlegroups.com
Hi Julian,

Have you tested whether or not the data contain some temporal signal? The time range (6 years) does not seem awfully large. To get a quick idea you could use path-o-gen, see Andrew's website.

Best,
Bram

Op maandag 22 september 2014 08:46:52 UTC+2 schreef Julian W Tang:

Julian W Tang

unread,
Sep 22, 2014, 1:41:38 PM9/22/14
to beast...@googlegroups.com
Thx Bram - I can try that (not used it before though).

The HIV subtype A and C viruses from the same population over the same time-frame do converge nicely - though these are likely to be mostly immigration related, whereas the subtype Bs are both endemic and imported to about an equal measure.

In fact, so far, I am seeing a small peak and a large peak in a bimodal distribution for the subtype Bs, so maybe that is telling me something ....

Julian


Date: Mon, 22 Sep 2014 10:10:28 -0700
From: bram.v...@gmail.com

Bram

unread,
Sep 25, 2014, 4:00:07 AM9/25/14
to beast...@googlegroups.com
Hello Julian,

"so maybe that is telling me something ...."

With the little information I have it is difficult to give a grounded reply. E.g. what does the path-o-gen analysis tells you? What parameter has a bimodal posterior distribution? At what level are you looking at the evolution: local, regional, country, continent, worldwide,...? 

Best,
Bram


Op maandag 22 september 2014 19:41:38 UTC+2 schreef Julian W Tang:

Roxana Capper

unread,
Sep 27, 2014, 7:15:20 PM9/27/14
to beast...@googlegroups.com
Great!   I am indeed trying to use non-reversible models with nucleotide data.  Thanks for getting me pointed in the right direction.  However, now I have another question --

I have gotten the following code block to run, but obviously as written it's not asymmetric in the least.  Which value index corresponds to which flavor of nucleotide mutation?  

<!-- asymmetric CTMC model for discrete state reconstructions                -->
<generalSubstitutionModel id="asym_nt.model">
<frequencies>
<frequencyModel id="frequencyModel" dataType="nucleotide" normalize="true">
<alignment idref="alignment"/>
<frequencies>
<parameter id="frequencies" dimension="4"/>
</frequencies>
</frequencyModel>
</frequencies>

<!-- rates and indicators                                                    -->
<rates>
<parameter id="all.rates" value="1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0"/>
</rates>
</generalSubstitutionModel>
<siteModel id="siteModel">
<substitutionModel>
<generalSubstitutionModel idref="asym_nt.model"/>
</substitutionModel>
<gammaShape gammaCategories="4">
<parameter id="alpha" value="0.034" lower="0.0"/>
</gammaShape>
</siteModel>

Thanks!
Roxana

Manish Kumar Sinha

unread,
Sep 28, 2014, 9:36:24 AM9/28/14
to beast...@googlegroups.com
Hello all , 
I am new to this tool . please help in running the tool . i want detailed introduction and how to go through this software. please help.

With regards 

Manish

Roxana Capper

unread,
Sep 28, 2014, 11:03:04 AM9/28/14
to beast...@googlegroups.com
Hi Manish,

There are lots of great tutorials online, plus a really nice book on the beast2 website.  I'm happy to answer questions, but you should spend some time figuring out what your questions are exactly.  It is not possible to teach you "how to go through the software" without knowing what you know and what you're working with; even then, BEAST is very complex and no one is going to do your work for you... 

Good luck!
Roxana

Roxana Capper

unread,
Sep 28, 2014, 11:18:26 AM9/28/14
to beast...@googlegroups.com
Bumping this -- I really want to make sure this question gets some attention.  
I'm trying to produce a non-time reversible nucleotide substitution model.  I can get it to run using the CTMC generalSubstitutionModel code block applied to my nt data (posted above); however, if I change this:

<rates>
<parameter id="all.rates" value="1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0"/>
</rates>

to this: 
<rates>
<parameter id="all.rates" dimension="12"/>
</rates> 

I get the following error:
  CompoundLikelihood(compoundModel)=(
    AncestralStateBeagleTreeLikelihood(treeLikelihood)=-Inf
    Total = -Infinity

My goal is to be able to set the specific mutation rates for each type of mutation, because for my data, the C-to-T mutation rate is extremely unequal to the T-to-C mutation rate.  GTR combines both types into a 'ct' category and I'm pretty sure that modeling the combined mutations that way really screws up my results.  I have all the true priors for each type of mutation; I just don't know which mutation type goes where in the <rates> value array!

Thanks for any advice or information about modeling time-irreversible nucleotide patterns...
Roxana

Andrew Rambaut

unread,
Sep 28, 2014, 1:35:28 PM9/28/14
to beast...@googlegroups.com
Parameters have a default value of 0.0.

Try this:

<parameter id="all.rates" dimension=“12” value=“1.0”/>


Andrew
Andrew Rambaut 
Institute for Evolutionary Biology | Centre for Infection, Immunity & Evolution 
Ashworth Laboratories, University of Edinburgh, Edinburgh, EH3 9JT, UK


Roxana Capper

unread,
Sep 28, 2014, 1:41:54 PM9/28/14
to beast...@googlegroups.com
Hi Andrew, thanks for getting back to me so quickly.  Indeed, that line works, but ideally I'd like to set a prior for the subst model that says that if 'ct' and 'ga' are both 1.0, then 'tc' and 'ag' are 0.08 and everything else is 0.02, all relative to 'ct'.  Or does this model behave differently than how one would set relative rates for GTR?

Andrew Rambaut

unread,
Sep 28, 2014, 6:40:34 PM9/28/14
to beast...@googlegroups.com
The order of rates is 

- 1 2 3
7 - 4 5
8 10 - 6
9 11 12 -

I.e., top-right triangle in left to right order followed by bottom-left triangle in top to bottom order.

Assuming you have listed the the nucleotide states in alphabetical order this would give:

A->C, A->G, A->T, C->G, C->T, G->T, C->A, G->A, T->A, G->C, T->C, T->G

So to fix your rates as you specified:

values=“0.02 0.08 0.02 0.02 1.0 0.02 0.02 1.0 0.02 0.02 0.08 0.02"

Would give the value you asked for (but check, I may have made a mistake).

Andrew

Roxana Capper

unread,
Sep 28, 2014, 6:42:22 PM9/28/14
to beast...@googlegroups.com
Thank you so much!!  That's exactly what I was looking for.  :)  
-Roxana

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

--
You received this message because you are subscribed to a topic in the Google Groups "beast-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beast-users/PeiwC6MA-Mc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beast-users...@googlegroups.com.

Guadalupe Bribiesca

unread,
Apr 6, 2016, 2:47:04 PM4/6/16
to beast-users
Hi,
I've been working on something similar for model comparison but this is as far as I got. I would really appreciate is someone can give me any information about fixing rates. I have a discrete character with three states, so my matrix is smaller. In one model I want to have different rates for everything, but force 3 and 4 to be equal. I don't know how to do this without setting them as priors, which I've been avoiding. Any ideas on this? 
Also, when I run the first, where rates are asymmetrical, I'm only getting one rate value. How do I get the rate for each transition?

Thank you so much.

Lupita
Reply all
Reply to author
Forward
0 new messages