appropriate model for indel coding?

1,567 views
Skip to first unread message

anna papadopoulou

unread,
Aug 24, 2012, 7:52:10 AM8/24/12
to ra...@googlegroups.com
Hi,

I am analysing an intergenic spacer region dataset and I am interested in distinguishing between some closely related species which only differ by an indel (so they appear identical when gaps are treated as missing data). I have coded the indels using 'simple indel coding' (binary characters) or 'modified complex indel coding' (multistate characters), and I would like to use them as an additional partition in my dataset.
I wonder: a) which would be the appropriate model to analyse this partition in RAxML? would it be appropriate to use " -m BINGAMMA" in the case of binary characters? In the case of the multistate characters would it be preferable to use "-K GTR" or "-K ORDERED" or "-K MK"? 
b) to my understanding, I should be able to combine DNA data partitions with multistate data partitions in a single analysis (after defining each partition as 'MULTI' or 'DNA' in the part.txt file) by using "-K ORDERED|MK|GTR -m GTRGAMMA', is that right? Is there also a way to combine DNA data with binary data partitions?

any general suggestions on whether it actually makes sense to use indel coding+RAxML would be very much appreciated too.

thank you very much for your help!

Anna Papadopoulou

Alexandros Stamatakis

unread,
Aug 24, 2012, 9:39:47 AM8/24/12
to ra...@googlegroups.com
Gia sou Anna,

> I am analysing an intergenic spacer region dataset and I am interested in
> distinguishing between some closely related species which only differ by an
> indel (so they appear identical when gaps are treated as missing data). I
> have coded the indels using 'simple indel coding' (binary characters) or
> 'modified complex indel coding' (multistate characters), and I would like
> to use them as an additional partition in my dataset.
> I wonder: a) which would be the appropriate model to analyse this partition
> in RAxML? would it be appropriate to use " -m BINGAMMA" in the case of
> binary characters?

Yes, I believe so. There are not that many models for binary character
evolution and we have found that this works quite well for modeling
indel patterns, albeit in a slightly different context (see this
technical report here:
http://sco.h-its.org/exelixis/pubs/Exelixis-RRDR-2012-5.pdf)


> In the case of the multistate characters would it be
> preferable to use "-K GTR" or "-K ORDERED" or "-K MK"?

No clue, you should actually do some model testing (I think you will
have to use the Aikaike Information Criterion) to determine the best-fit
model here.

> b) to my understanding, I should be able to combine DNA data partitions
> with multistate data partitions in a single analysis (after defining each
> partition as 'MULTI' or 'DNA' in the part.txt file) by using "-K
> ORDERED|MK|GTR -m GTRGAMMA', is that right?

yes :-)

> Is there also a way to combine
> DNA data with binary data partitions?

Yes, that's also feasible, I think that there is a pervious post
somewhere in here where we were discussing how this is done, your
partition file could look like this:

BIN, p1=1-100
DNA, P2=101-200


> any general suggestions on whether it actually makes sense to use indel
> coding+RAxML would be very much appreciated too.

Well I guess it's worth a try, we just don't know yet how to better
incorporate indels. Another thing to try is to treat indels as fifth
state, i.e., map your characters like this:
A -> 0
C -> 1
G -> 2
T -> 3
- -> 4

and then run this with the multi-state GTR model ...

Hope this helps,

Alexis

>
> thank you very much for your help!
>
> Anna Papadopoulou

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

anna papadopoulou

unread,
Aug 24, 2012, 10:27:21 AM8/24/12
to ra...@googlegroups.com
That's all very helpful, thank you very much Alexi!

just one (silly) clarification: in the case of combining DNA data with a binary character partition, how do I pass the binary model? using '-K' as in the case of multi-state characters? or I do not really need to specify anything in the command line (as there is a single model for binary characters, and I have specified the binary data in the partitions file), so it would simply be '-m GTRGAMMA' as in a DNA-only analysis?

s'efxaristw poly!

Anna

Alexandros Stamatakis

unread,
Aug 24, 2012, 10:34:44 AM8/24/12
to ra...@googlegroups.com

> That's all very helpful, thank you very much Alexi!

:-)

> just one (silly) clarification: in the case of combining DNA data with a
> binary character partition, how do I pass the binary model? using '-K' as
> in the case of multi-state characters? or I do not really need to specify
> anything in the command line (as there is a single model for binary
> characters, and I have specified the binary data in the partitions file),
> so it would simply be '-m GTRGAMMA' as in a DNA-only analysis?

yes that's exactly right, when you are using a partition file the only
thing raxml extracts from that -m GTRGAMMA command line string is that
the gamme model of rate heterogeneity shall be used, you can also use
-m BINGAMMA which weill have exactly the same effect.

καλο βραδυ,

Αλεξης

Akifumi S. Tanabe

unread,
Aug 24, 2012, 4:14:13 PM8/24/12
to ra...@googlegroups.com
Hi Anna,

Because modified complex indel coding by Muller (2006) assumes step
matrix, the coded indel characters should not be used for model-based
phylogenetic analysis without step matrix. The AIC and model selection
cannot detect violation of assumption.

As RAxML cannot correct coding bias (ascertainment bias), you should
use RAxML carefully for combined analysis of nucleotide and
bionary/multistate sequences. GARLI and MrBayes can correct coding bias.

If the gap-size of your data vary, fifth-state coding suggested by
Alexis might causes misestimation because long gaps caused by 1 event
are overvalued. This idea was confirmed in my study using simulated data,
but unfortunately it has not been published yet :-(.

Best regards,

On Fri, 24 Aug 2012 04:52:10 -0700 (PDT)
anna papadopoulou <anna.pap...@ibe.upf-csic.es> wrote in
<b487c8f6-f62c-46eb...@googlegroups.com>
--
Akifumi S. Tanabe <akifumi...@gmail.com>

Anna Papadopoulou

unread,
Aug 27, 2012, 5:58:15 AM8/27/12
to ra...@googlegroups.com
Dear Akifumi,

thank you very much for your input and your explanations, it is greatly appreciated!

- Yes, I realised the issue that you mention about the step matrix, when I tried to apply the 'modified complex indel coding' (Muller 2006), and I perfectly understand your point. I assume there is no way around it, right? So perhaps it would be less problematic to use the 'simple indel coding' (Simmons & Ochoterena, 2000) instead?

- Regarding the coding (ascertainment) bias, is it correct that it is less important for big datasets (as stated in the MrBayes manual)? In my case, I analyse datasets of a few hundreds of taxa (150-1600), do you think it might be less of a problem?

- Yes, I totally agree that I shouldn't code gaps as fifth characters for my dataset, as it causes long branches between closely related species that only differ by a single indel event.

Do you have any other suggestion on how to incorporate indel information in ML phylogenetic analysis?

thank you very much for your help,

best regards,

Anna

Akifumi S. Tanabe

unread,
Aug 27, 2012, 7:53:46 AM8/27/12
to ra...@googlegroups.com
Hi Anna,

On Mon, 27 Aug 2012 11:58:15 +0200
Anna Papadopoulou <a.papado...@gmail.com> wrote in
<CAF+19HpoLsM2jhLWt2ybhLtx...@mail.gmail.com>
> - Yes, I realised the issue that you mention about the step matrix, when I
> tried to apply the 'modified complex indel coding' (Muller 2006), and I
> perfectly understand your point. I assume there is no way around it, right?

No. You can specify substitution rate matrix in stead of step matrix on
the ML analysis with other software such as PAUP* (but I don't recommend
this method), or use "a variant of modified complex indel coding (vMCIC)"
developed by me. vMCIC is implemeted in "pgencodegap" in Phylogears2.
http://www.fifthdimension.jp/products/phylogears/
This method decompose a multistate character with step matrix by MCIC into
multiple binary characters. Note that this method requires the model
without among-site rate variation and without among-character state rate
variation like JC69 which cannot apply on RAxML analysis.
However, vMCIC did not outperform SIC in my study using simulated data.

> So perhaps it would be less problematic to use the 'simple indel coding'
> (Simmons & Ochoterena, 2000) instead?

Yes.

> - Regarding the coding (ascertainment) bias, is it correct that it is less
> important for big datasets (as stated in the MrBayes manual)? In my case, I
> analyse datasets of a few hundreds of taxa (150-1600), do you think it
> might be less of a problem?

I don't know. If you want to know it, you should compare the results with
correction and without correction.

> Do you have any other suggestion on how to incorporate indel information in
> ML phylogenetic analysis?

Briefly speaking, I recommend SIC + Mk model + coding bias correction,
and comparing the trees with gap information and without gap information.
Note that appropreate model and appropreate correction method for coding
bias are still unclear.

Best regards,

Ajith Harish

unread,
Jun 15, 2019, 2:48:52 PM6/15/19
to raxml
Hi everybody,

I am going back to an older issue, because I could not find an answer anywhere else ... I also have some questions related to RAxML-NG. 
On Friday, August 24, 2012 at 3:39:47 PM UTC+2, Alexis wrote

> In the case of the multistate characters would it be
> preferable to use "-K GTR" or "-K ORDERED" or "-K MK"?

No clue, you should actually do some model testing (I think you will
have to use the Aikaike Information Criterion) to determine the best-fit
model here. 

To run model selection tests for multistate (or binary) characters using AIC or BIC, how can I determine the number of parameters for each model? Is it possible to ask RAxML to output the number of parameters for each model? I see that RAxML-NG does this in a log file when we want to compare different models (explained in Using RAxML-NG in Practice - https://www.preprints.org/manuscript/201905.0056/v1). 

But, RAxML-NG does not have the ORDERED model (yet). Presumably it will be included in a future update? In any case, I guess it is possible to specify the ORDERED model using the "User-defined symmetries" option?

If anybody has come across these issues, please let me know. Thanks.

Best,
Ajith


Alexey Kozlov

unread,
Jun 17, 2019, 9:14:02 AM6/17/19
to ra...@googlegroups.com
Hi Ajith,

> To run model selection tests for multistate (or binary) characters using AIC or BIC, how can I
> determine the number of parameters for each model? Is it possible to ask RAxML to output the number
> of parameters for each model?

RAxML 8.x will print number of model parameters in tree evaluation mode ("-f e" ).

It will also print substitution rates, and if you look at them for the ORDERED model, you will
realize they are rather weird:

rate 0 <-> 1: 1.000000
rate 0 <-> 2: 2.000000
rate 0 <-> 3: 3.000000
rate 0 <-> 4: 4.000000
rate 1 <-> 2: 1.000000
rate 1 <-> 3: 2.000000
rate 1 <-> 4: 3.000000
rate 2 <-> 3: 1.000000
rate 2 <-> 4: 2.000000
rate 3 <-> 4: 1.000000

Not sure if it is a different model, or just a bug...

In any case, I would recommend moving to RAxML-NG.

> But, RAxML-NG does not have the ORDERED model (yet). Presumably it will be included in a future
> update?

Probably.

>In any case, I guess it is possible to specify the ORDERED model using the "User-defined
> symmetries" option?

If all substitution rates are equal - this is how "ORDERED" is defined in IQ-Tree - then you don't
need to define symmetries, but you can simply fix subst. rates to 1.0 for neighboring states, and to
0.0 otherwise.

For instance, 5-state model substitution matrix will be:

A B C D E
A * 1 0 0 0
B * 1 0 0
C * 1 0
D * 1
E *

an the corresponding model definition string for RAxML-NG is:

--model MULTI5_GTR{1/0/0/0/1/0/0/1/0/1}+FE+G

Hope this helps,
Alexey

Ajith Harish

unread,
Jun 17, 2019, 12:40:22 PM6/17/19
to ra...@googlegroups.com
Hi Alexey,

Thank you for your explanations.

>
> It will also print substitution rates, and if you look at them for the ORDERED model, you will realize they are rather weird:
>
> rate 0 <-> 1: 1.000000
> rate 0 <-> 2: 2.000000
> rate 0 <-> 3: 3.000000
> rate 0 <-> 4: 4.000000
> rate 1 <-> 2: 1.000000
> rate 1 <-> 3: 2.000000
> rate 1 <-> 4: 3.000000
> rate 2 <-> 3: 1.000000
> rate 2 <-> 4: 2.000000
> rate 3 <-> 4: 1.000000
>
> Not sure if it is a different model, or just a bug…

I suppose this is also one way of defining ORDERED state transformations? This is how it is in a parsimony setting where the character-state 'transformation cost’ is defined for a 'transformation series’ for a given set of states, since parsimony counts the number of steps. Not sure if these ‘costs’ can be directly translated to ‘rates’ in a ML setting. The idea behind ORDERED transformation is to specify that A to C changes are required to go through B - in a linear order. In that case, the rates cannot all be equal, but a series of decreasing (or increasing?) rates.

> If all substitution rates are equal - this is how "ORDERED" is defined in IQ-Tree - then you don't need to define symmetries, but you can simply fix subst. rates to 1.0 for neighboring states, and to 0.0 otherwise.

While we are on this: What is the difference b/w 'User-defined symmetries’ and 'User-defined rates’. Would it be correct to equate the User-defined rates = User-defined substitution matrix = User-defined model?

>
> For instance, 5-state model substitution matrix will be:
>
> A B C D E
> A * 1 0 0 0
> B * 1 0 0
> C * 1 0
> D * 1
> E *
>
> an the corresponding model definition string for RAxML-NG is:
>
> --model MULTI5_GTR{1/0/0/0/1/0/0/1/0/1}+FE+G
>
> Hope this helps,

It sure does, many thanks.


Alexey Kozlov

unread,
Jun 17, 2019, 5:04:49 PM6/17/19
to ra...@googlegroups.com
Hi Ajith,

> I suppose this is also one way of defining ORDERED state transformations? This is how it is in a parsimony setting where the character-state 'transformation cost’ is defined for a 'transformation series’ for a given set of states, since parsimony counts the number of steps. Not sure if these ‘costs’ can be directly translated to ‘rates’ in a ML setting. The idea behind ORDERED transformation is to specify that A to C changes are required to go through B - in a linear order. In that case, the rates cannot all be equal, but a series of decreasing (or increasing?) rates.
>

This "cost" logic will work for parsimony, but not for the probabilistic substitution models we use
in RAxML/RAxML-NG.

Do you know any publication where those ORDERED model(s) were formally described?

> While we are on this: What is the difference b/w 'User-defined symmetries’ and 'User-defined rates’.

With symmetries, we link some of the rates together, so we *decrease* the number of free parameters.
eg: DNA010010 (equivalent to HKY) -> one free parameter (rate1/rate0 ratio) will be estimated by ML

With user-defined rates, we set all rates to the fixed values, so there are *no free parameters* to
estimate, eg: GTR{5.1/4.2/3.3/2.4/1.0}

>Would it be correct to equate the User-defined rates = User-defined substitution matrix = User-defined model?

I would say 1=2, but "user-defined model" is too unspecific (could be rate symmetries, fixed
parameter values, or both).

Best,
Alexey

Ajith Harish

unread,
Jun 17, 2019, 5:47:07 PM6/17/19
to ra...@googlegroups.com
Hi Alexey,

Thanks for your answers.


This "cost" logic will work for parsimony, but not for the probabilistic substitution models we use in RAxML/RAxML-NG.

Of course, the cost logic might not be so straight forward, or not applicable with probabilistic methods.


Do you know any publication where those ORDERED model(s) were formally described?

Here’s one, but this was implemented in a Bayesian framework in BEAST. But I guess something similar could be implemented for ML.

Alekseyenko AV, Lee CJ, Suchard MA. Wagner and Dollo: a stochastic duet by composing two parsimonious solos. Syst Biol. 2008 Oct;57(5):772-84. doi:10.1080/10635150802434394.


Also, ordered (Wagner) type transformation is implemented in MrBayes, but it is limited to 10 states. Not sure if there is formal description, but may be something can be extracted from the source code (though I can’t myself) https://github.com/NBISweden/MrBayes/tree/v3.2.7a


I would say 1=2, but "user-defined model" is too unspecific (could be rate symmetries, fixed parameter values, or both).

Makes sense. Thanks again.

Best,
Ajith

Alexey Kozlov

unread,
Jun 17, 2019, 6:14:44 PM6/17/19
to ra...@googlegroups.com
Hi Ajith,

thanks for the references!

As far as I can tell from quickly checking the MrBayes code, it defines the ORDERED model in the
same way as IQTree:

rate_ij = if abs(i-j) == 1 then 1.0 else 0.0

Best,
Alexey
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> raxml+un...@googlegroups.com <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/724EF0EA-C518-40DE-8E06-706C37D24B47%40gmail.com
> <https://groups.google.com/d/msgid/raxml/724EF0EA-C518-40DE-8E06-706C37D24B47%40gmail.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

Alexey Kozlov

unread,
Jun 18, 2019, 8:44:31 AM6/18/19
to Ajith Harish, ra...@googlegroups.com
Hi Ajith,

hm seems like we run into numerical issues with those zero rates...

A practical workaround would be to use a very small but non-zero rate for non-adjacent states (see
attach).

Best,
Alexey

On 18.06.19 14:20, Ajith Harish wrote:
> Hi again Alexey,
>
>> please remove the dots in the "ordered.txt" file, it should contain upper triangle w/o the diagonal.
>>
>> Still have to check why raxml-ng hangs in this case, it should just print an error message :)
>
> Removing the dots fixed the hanging :) but gave an error
>
> Starting ML tree search with 20 distinct starting trees
>
> Assertion failed: (total_loglh < 0.), function treeinfo_compute_loglh, file
> /Users/sco/alexey/raxml-ng/libs/pll-modules/src/tree/treeinfo.c, line 1072.
> Assertion failed: (total_loglh < 0.), function treeinfo_compute_loglh, file
> /Users/sco/alexey/raxml-ng/libs/pll-modules/src/tree/treeinfo.c, line 1072.
> Abort trap: 6
>
> I have tried this matrix with IQ-Tree as well. It gives an error too (ERROR: WARNING: Numerical
> underflow for lh-derivative).
>
> So far, only RAxML-8.x ORDERED model has worked for this dataset :-)
>
> Best,
> Ajith
>
>
>
ordered2.txt

Grimm

unread,
Jun 18, 2019, 9:15:11 AM6/18/19
to raxml
Hi you two,

wouldn't it be the easiest solution, in case you want to have ordered multistate characters, to binarise the multistate matrix and then just run it with the binary model implemented also in RAxML-NG?

If I have three states, ordered 0 <> 1 <> 2, I can run the ordered model in (classic) RAxML which effectively just puts the probability for 0<>2 to zero and takes instead the sum-probability for 0<>1 and 1<>2 (right?) or recode as 
0 := 00
1 := 10
2 := 11
It's a simple adaptation of the parsimony/ordered characters extra step idea. Since I optimise a global rate for 0<>1 that must have p < 1, a direct transformation between 0<>2 will always be much less probable.

The optimised models may be different (quite so, because, GTR considers that 0<>1 may be different from 1<>2; but unless your 2, 3, 4 etc are biologically analogous, you would never run GTR), but the results is pretty much the same (tree-topology and BS-support wise) ... at least for all morphological data sets I run through MrBayes and RAxML (old MrBayes didn't had an option for multistate "standard" characters, so when I was still using it long time ago, before RAxML came up, binarising the multistate characters was the only choice).

It of course depends on what you multistate matrix codes for, but in generally, I don't think any model test can help you, because far the most models are designed for molecular evolution, and more often than not assume neutral evolution.

For example, if you have a matrix with a lot of 0,1, rare 2, and very rarely 3, 4 etc. (because you don't need many states for most of your characters, coded gaps, non-molecular data), any molecular-derived model must be oddly wrong since a 0,1,2 etc in one character has a completely different quality than in another, while an A, for example, is always an adenine and a G a guanine. When the model is non-grippable, stick with the most simple ones is not the worst choice, and binary (Mk for multistate) is probably the most simple one.

If you worried about overweighting the impact of many-state characters when binarising them (an ordered character with x states ends up with x-long binary sequences), you just down-weight the multistate ones (I suppose you still can read-in a character-weigth set into RAxML-NG like into RAxML 8)

Cheers, Guido

Ajith Harish

unread,
Jun 18, 2019, 2:57:53 PM6/18/19
to ra...@googlegroups.com
Hi Guido and Alexey,

wouldn't it be the easiest solution, in case you want to have ordered multistate characters, to binarise the multistate matrix and then just run it with the binary model implemented also in RAxML-NG?

If I have three states, ordered 0 <> 1 <> 2, I can run the ordered model in (classic) RAxML which effectively just puts the probability for 0<>2 to zero and takes instead the sum-probability for 0<>1 and 1<>2 (right?) or recode as 
0 := 00
1 := 10
2 := 11
It's a simple adaptation of the parsimony/ordered characters extra step idea. Since I optimise a global rate for 0<>1 that must have p < 1, a direct transformation between 0<>2 will always be much less probable.

I guess additive binary coding would work here, and the issues with multi-state transformations may well be taken care of.


The optimised models may be different (quite so, because, GTR considers that 0<>1 may be different from 1<>2; but unless your 2, 3, 4 etc are biologically analogous, you would never run GTR), but the results is pretty much the same (tree-topology and BS-support wise) ... at least for all morphological data sets I run through MrBayes and RAxML (old MrBayes didn't had an option for multistate "standard" characters, so when I was still using it long time ago, before RAxML came up, binarising the multistate characters was the only choice).

It of course depends on what you multistate matrix codes for, but in generally, I don't think any model test can help you, because far the most models are designed for molecular evolution, and more often than not assume neutral evolution.

For example, if you have a matrix with a lot of 0,1, rare 2, and very rarely 3, 4 etc. (because you don't need many states for most of your characters, coded gaps, non-molecular data), any molecular-derived model must be oddly wrong since a 0,1,2 etc in one character has a completely different quality than in another, while an A, for example, is always an adenine and a G a guanine. When the model is non-grippable, stick with the most simple ones is not the worst choice, and binary (Mk for multistate) is probably the most simple one.

I am working with (gene) copy number data coded as discrete character-states. Like the As and Gs, 5s and 10s will always be 5s and 10s as long as this coding is used consistently. Accounting for rate variation is one way to account for selection, isn’t it?


If you worried about overweighting the impact of many-state characters when binarising them (an ordered character with x states ends up with x-long binary sequences), you just down-weight the multistate ones (I suppose you still can read-in a character-weigth set into RAxML-NG like into RAxML 8)

Haven’t explored character-weighting in RAxML-NG, though, weighting itself can be sticky issue. But, thanks for your suggestions, its worth exploring.

Cheers,
Ajith


Ajith Harish

unread,
Jun 18, 2019, 3:21:02 PM6/18/19
to ra...@googlegroups.com
Hi Alexey,

A practical workaround would be to use a very small but non-zero rate for non-adjacent states (see attach).

This seems to work. Thanks!

Going back to evaluating trees and models, and in relation to this paper (Alekseyenko AV, Lee CJ, Suchard MA. Wagner and Dollo: a stochastic duet by composing two parsimonious solos. Syst Biol. 2008) - Is it possible to evaluate rooted trees in in RAxML-NG? This would of course require using asymmetric rates (Dollo type characters) and may be (also) non-stationary frequencies. 

My interested in evaluating rooted trees is w.r.t ancestral state reconstruction. Here as well, in trying to move to RAxML-NG from RAxML I noticed that RAxML-NG takes both rooted and unrooted trees for --ancestral command, unlike RAxML which requires a rooted tree. I can imagine that the ancestral states will be different between the two, but RAxML-NG does not output ancestral states for the ROOT even when a rooted tree is provided i.e. there is one less ancestral sequence in the output compared to RAxML. Any comments on rooted trees and ancestral states?

Best,
Ajith

Grimm

unread,
Jun 18, 2019, 3:24:43 PM6/18/19
to raxml

I am working with (gene) copy number data coded as discrete character-states. Like the As and Gs, 5s and 10s will always be 5s and 10s as long as this coding is used consistently.

So your "binning" is proportional. Maybe it works then to optimise in a multi-state framework. But is the duplication (copy) probably always following the same model? I have no experience with such data, but as geneticist I guess it can be quite different depending on the gene families' functions. I would just go for try and error. Run the multistate as-is (ordered) and binarised and just compare the results. Preliminary with RAxML 8 if not yet implemented in RAxML, using both the Mk and GTR implementations. And then compare the outcome.

If it converges, you're done. If not, you can still make a comparison of the optimised models, does it make sense or not. My experience with various binned data was that GTR always comes up with quite odd models. Running with Mk and GTR should suffice, because they are the endpoints of all possible models for multistate characters.

 
Accounting for rate variation is one way to account for selection, isn’t it?

Yes. I would run with the multistate and/or binary with the PSR approximation (CAT models) or the full Gamma (if feasible).
 
Happy inference, Guido

Ajith Harish

unread,
Jun 21, 2019, 8:03:47 AM6/21/19
to ra...@googlegroups.com, Alexey Kozlov
Hi Alexey,

A small update - to answer my questions about rooted trees and Dollo type characters I tried a few things,

Going back to evaluating trees and models, and in relation to this paper (Alekseyenko AV, Lee CJ, Suchard MA. Wagner and Dollo: a stochastic duet by composing two parsimonious solos. Syst Biol. 2008) - Is it possible to evaluate rooted trees in in RAxML-NG? This would of course require using asymmetric rates (Dollo type characters) and may be (also) non-stationary frequencies. 

Rooted trees can indeed be evaluated with the  —evaluate function in RAxML-NG. I tried a few random rootings, and interestingly all the scores (AIC, AICc and BIC) were better (smaller) for the rooted trees compared to the unrooted tree.

However, RAxML-NG did not take asymmetric rates for user-defined substitution matrix (file attached) - ERROR: Invalid number of substitution rates specified: 1022 (expected: 496)


My interested in evaluating rooted trees is w.r.t ancestral state reconstruction. Here as well, in trying to move to RAxML-NG from RAxML I noticed that RAxML-NG takes both rooted and unrooted trees for --ancestral command, unlike RAxML which requires a rooted tree. I can imagine that the ancestral states will be different between the two, but RAxML-NG does not output ancestral states for the ROOT even when a rooted tree is provided i.e. there is one less ancestral sequence in the output compared to RAxML. Any comments on rooted trees and ancestral states?

I could not find a way to output ancestral states for the root-node. I suppose I could put this under ‘feature request’?

In any case if you would like to look into asymmetric rates and non-stationary frequencies for BIN models, here is another reference: Klopfstein, S., Vilhelmsen, L., & Ronquist, F. (2015). A nonstationary Markov model detects directional evolution in hymenopteran morphology. Systematic biology, 64(6), 1089-1103.

Please suggest.

Best,
Ajith

OrdAsymm.txt

Alexey Kozlov

unread,
Jun 21, 2019, 9:11:51 AM6/21/19
to Ajith Harish, ra...@googlegroups.com, Alexandros Stamatakis
Hi Ajith,

>> Please note that, similar to the situation with ORDERED model, (old) RAxML implemented non-standard ancestral state reconstruction algorithm. In contrast, RAxML-NG implements classical marginal state reconstruction ("empirical Bayes") as describe here:
>
> Could you provide a reference for the non-standard implementation, and is ancestral state reconstruction (ASR) algorithm in (old) RAxML a joint reconstruction? Are you suggesting the (old) RAxML’s ASR algorithm is buggy? Hope not!

I hope Alexis can comment on that :)

> In any case, could you please suggest ML programs that can reconstruct ancestral states of non-standard (morphology) characters at the root node. I want to evaluate the differences in estimates resulting from alternative root positions.

I'm not really and expert in ASR, but maybe these references will help:

- Mesquite: http://www.mesquiteproject.org/Ancestral%20States.html

- https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.2837

Also as long as your model is reversible, you can simply look at the ancestral states for the
internal nodes, this will be equivalent to rooting the tree at the respective node and computing the
ancestrals. This will mean cutting the "rooting" branch 0/100 and not 50/50 though.

>Also, programs that would evaluate rooted trees. Bayesian programs take a Loooong time to run!

Maybe you should check IQTree. AFAIK it does support some non-reversible models.


Best,
Alexey

Alexandros Stamatakis

unread,
Jun 23, 2019, 11:05:20 PM6/23/19
to Ajith Harish, ra...@googlegroups.com


On 21.06.19 16:11, Alexey Kozlov wrote:
> Hi Ajith,
>
>>> Please note that, similar to the situation with ORDERED model, (old)
>>> RAxML implemented non-standard ancestral state reconstruction
>>> algorithm. In contrast, RAxML-NG implements classical marginal state
>>> reconstruction ("empirical Bayes") as describe here:
>>
>> Could you provide a reference for the non-standard implementation, and
>> is ancestral state reconstruction (ASR) algorithm in (old) RAxML a
>> joint reconstruction? Are you suggesting the (old) RAxML’s ASR
>> algorithm is buggy? Hope not!
>
> I hope Alexis can comment on that :)

It's not buggy, just different: essentially it prints the conditional
likelihood entries with respect to the given root if the input tree.

Alexis

>
>> In any case, could you please suggest ML programs that can reconstruct
>> ancestral states of non-standard (morphology) characters at the root
>> node. I want to evaluate the differences in estimates resulting from
>> alternative root positions.
>
> I'm not really and expert in ASR, but maybe these references will help:
>
> - Mesquite: http://www.mesquiteproject.org/Ancestral%20States.html
>
> - https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.2837
>
> Also as long as your model is reversible, you can simply look at the
> ancestral states for the internal nodes, this will be equivalent to
> rooting the tree at the respective node and computing the ancestrals.
> This will mean cutting the "rooting" branch 0/100 and not 50/50 though.
>
>> Also, programs that would evaluate rooted trees. Bayesian programs
>> take a Loooong time to run!
>
> Maybe you should check IQTree. AFAIK it does support some non-reversible
> models.
>
>
> Best,
> Alexey

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

Ajith Harish

unread,
Jun 24, 2019, 8:06:36 AM6/24/19
to Alexandros Stamatakis, ra...@googlegroups.com

> On 21.06.19 16:11, Alexey Kozlov wrote:
>> Hi Ajith,
>>>> Please note that, similar to the situation with ORDERED model, (old) RAxML implemented non-standard ancestral state reconstruction algorithm. In contrast, RAxML-NG implements classical marginal state reconstruction ("empirical Bayes") as describe here:
>>>
>>> Could you provide a reference for the non-standard implementation, and is ancestral state reconstruction (ASR) algorithm in (old) RAxML a joint reconstruction? Are you suggesting the (old) RAxML’s ASR algorithm is buggy? Hope not!
>> I hope Alexis can comment on that :)
>
> It's not buggy, just different: essentially it prints the conditional likelihood entries with respect to the given root if the input tree.

Thanks for the clarification Alexis. I didn’t suspect it to be buggy, in the context of the discussion there was pun intended :-) -- because our discussion started with reference to evaluating the ORDERED model, and Alexey noticed something odd with the ORDERED substitution matrix in RAxML. Anyhow, It could be useful if the same is possible in RAxML-NG also.

All the best,
Ajith

Ajith Harish

unread,
Jun 24, 2019, 8:40:14 AM6/24/19
to ra...@googlegroups.com
Hi Alexey,

>
> I'm not really and expert in ASR, but maybe these references will help:
>
> - Mesquite: http://www.mesquiteproject.org/Ancestral%20States.html
>
> - https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.2837
>
> Also as long as your model is reversible, you can simply look at the ancestral states for the internal nodes, this will be equivalent to rooting the tree at the respective node and computing the ancestrals. This will mean cutting the "rooting" branch 0/100 and not 50/50 though.
>
>> Also, programs that would evaluate rooted trees. Bayesian programs take a Loooong time to run!
>
> Maybe you should check IQTree. AFAIK it does support some non-reversible models.

Thanks for your suggestions. I think that mesquite is limited to DNA, AA and BIN type characters, and non-reversible models in IQTree is limited to DNA. The reference and the tool they implement in R seems like a more feasible option for now. Thanks for point it out.

All the best,
Ajith
Reply all
Reply to author
Forward
0 new messages