How does RAxML handle a variable number of character states for the Mk model?

Joseph O'Reilly

Jul 6, 2017, 11:23:17 AM

to raxml

Hi all,

I was wondering how RAxML deals with variable values of k (the number of character states) across a morphological dataset when the Mk model is used, and how this translates into the dimensions of the Mk transition matrix. I'm hoping someone here might have the answer.

For nucleotide data we know that the state space of the substitution model consists of 4 possible states at each site, but for morphological matrices the state space is variable and is defined at each character by the character state with the largest value.

Is a single transition matrix constructed using the maximum value of k as implied across the entire morphological matrix? Are characters automatically partitioned by the implied value of k they exhibit, and then an appropriately sized transition matrix applied to each partition of characters?

Thanks,

Joe O'Reilly

Grimm

Jul 7, 2017, 8:36:36 AM

to raxml

Hi Joe,

> Is a single transition matrix constructed using the maximum value of k as implied across the entire morphological matrix?

Yes. When you choose the basic setting (-K MK), the substitution model has only two parameters: a probability of change (whether the change is 0<->1, 0<->2, etc. does not matter) and the Gamma distribution parameter, which models variation in substitution rates across sites.
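
To make the "one probability of change for every pair of states" idea concrete, here is a minimal Python sketch of the textbook Mk transition probabilities for a k-state character. It is an illustration of the general model under the usual normalisation (one expected change per unit branch length), not RAxML's internal code; the function names are made up for this example.

```python
import math


def mk_transition_probs(k, t):
    """Return (P_same, P_diff) for a k-state Mk model on a branch of length t.

    Assumes equal state frequencies and a rate matrix scaled so that one
    change is expected per unit of branch length.
    """
    if k < 2:
        raise ValueError("the Mk model needs at least two states")
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay  # probability of staying in a state
    p_diff = 1.0 / k - decay / k            # probability of any one specific change
    return p_same, p_diff


def mk_transition_matrix(k, t):
    """Build the full k x k transition matrix as a list of rows."""
    p_same, p_diff = mk_transition_probs(k, t)
    return [[p_same if i == j else p_diff for j in range(k)] for i in range(k)]


# The matrix dimension is fixed by k alone: a binary character gives 2 x 2,
# while a matrix sized by a maximum of 4 states gives 4 x 4.
for k in (2, 4):
    matrix = mk_transition_matrix(k, 0.1)
    print(k, len(matrix), "x", len(matrix[0]))
```

With k = 4 this reduces to the Jukes-Cantor formula for nucleotides, which is a quick way to sanity-check the expressions.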

> Are characters automatically partitioned by the implied value of k they exhibit, and then an appropriately sized transition matrix applied to each partition of characters?

You can define several partitions, as with molecular data. In that case you will optimise an independent 2-parameter model for each partition. For instance, you can define one partition each for all binary, ternary, and multistate characters, so that the lower frequency of states > 1 does not have a disproportionate effect on the optimised substitution rate.
Or define partitions for different organs, assuming that their evolution is constrained by different mutation probabilities and hence benefits from decoupled models. Or define partitions that separate highly homoplasious morphological traits from slow-evolving, phylogenetically better-sorted ones (judged against a molecular genealogy of the group under study).
However, I'm not aware whether these effects have been tested for simulated or real-world data.
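
As a rough sketch of the first option (one partition per number of states), the Python below groups characters by their implied k, taken here as the highest state label plus one, and prints one line per group. The toy matrix, the function names and the partition names are hypothetical, and the `MULTI, name = ranges` line format is only assumed to match RAxML's partition-file syntax, so check it against the manual before use.

```python
from collections import defaultdict

MISSING = {"?", "-"}  # symbols treated as missing or inapplicable


def implied_k(matrix):
    """Implied number of states per character: highest state label + 1.

    Assumes every state is coded as a single digit 0-9. A character that is
    entirely missing gets k = 0.
    """
    rows = list(matrix.values())
    ks = []
    for col in range(len(rows[0])):
        observed = [int(row[col]) for row in rows if row[col] not in MISSING]
        ks.append(max(observed) + 1 if observed else 0)
    return ks


def to_ranges(indices):
    """Collapse sorted 1-based indices into text such as '1-3, 5'."""
    spans = []
    start = prev = indices[0]
    for idx in indices[1:]:
        if idx != prev + 1:
            spans.append((start, prev))
            start = idx
        prev = idx
    spans.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


def partition_lines(matrix):
    """One partition line per implied k (assumed line format, see above)."""
    groups = defaultdict(list)
    for position, k in enumerate(implied_k(matrix), start=1):
        groups[k].append(position)
    return [f"MULTI, k{k} = {to_ranges(cols)}" for k, cols in sorted(groups.items())]


# Hypothetical toy matrix: four taxa, five characters.
example = {
    "taxonA": "01020",
    "taxonB": "11010",
    "taxonC": "00121",
    "taxonD": "1?100",
}
for line in partition_lines(example):
    print(line)
# prints:
# MULTI, k2 = 1-3, 5
# MULTI, k3 = 4
```

Grouping by organ or by homoplasy level would work the same way, with a hand-made mapping from character to group in place of `implied_k`.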

In my experience of analysing morphological matrices with complex, non-trivial signals (i.e. matrices suffering from a lack of tree-likeness), using the GTR model will increase the overall support levels, but also the vulnerability of the analysis to imbalanced matrix coding. By imbalanced coding I mean that, e.g., most characters are binary and few are multistate. As the GTR model will optimise an i x i substitution matrix (with i being the highest number of states used in the matrix), it tends to over-parametrise. So increasing the number of partitions, or the number of substitution categories, is a double-edged sword. In most respects, however, the support patterns are usually not too different between the two models as implemented in RAxML. But ML-BS may differ strongly from the traditional MP-BS, even when using one partition and the Mk model (which effectively is the ML counterpart of a parsimony analysis).
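
The over-parametrisation point can be put in numbers with a small Python sketch. It counts only the exchange rates of a general time-reversible model with i states, assuming the usual convention that one rate is fixed as a reference; how RAxML treats state frequencies for multistate data is not covered here, and the function names are made up for this example.

```python
def gtr_exchange_rates(i):
    """Number of distinct exchange rates in a symmetric i x i GTR matrix."""
    return i * (i - 1) // 2


def gtr_free_rates(i):
    """Free exchange rates when one rate is fixed as the reference."""
    return max(gtr_exchange_rates(i) - 1, 0)


# Mk uses a single shared rate, so it has no free exchange rates at all.
print("states  GTR rates  GTR free  Mk free")
for i in range(2, 7):
    print(f"{i:>6}  {gtr_exchange_rates(i):>9}  {gtr_free_rates(i):>8}  {0:>7}")
# prints:
# states  GTR rates  GTR free  Mk free
#      2          1         0        0
#      3          3         2        0
#      4          6         5        0
#      5         10         9        0
#      6         15        14        0
```

If only a handful of characters ever reach the highest states, most of these extra rates have to be estimated from very few observed changes.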

You should nevertheless be wary regarding the tree's topology. Because of the complex, non-treelike signal in many morphological matrices, the optimised tree does not necessarily show the best-supported branches. See also my recent blog post on the issue: http://phylonetworks.blogspot.fr/2017/07/should-we-try-to-infer-trees-on.html
If you are aiming to do a total-evidence analysis, this is of less concern (although I would always run single-partition trees, no matter what data have been concatenated, to make sure not to overlook signal conflict in the concatenated data).

Cheers, and good luck.
Guido.

Alexandros Stamatakis

Jul 8, 2017, 12:41:41 AM

to ra...@googlegroups.com

many thanks guido,

that's it :-)

alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org
