Unphased diploid genotypes rates vector

31 views
Skip to first unread message

Michael Tassia

unread,
Apr 4, 2024, 11:20:23 AMApr 4
to raxml
Hello,

I'm trying to extract the character state transitions from the *.bestModel output from raxml-ng for some downstream analyses. For this analysis, I've used the `cellphy` adaptation of `raxml-ng`, but I'm hoping that the *bestModel output format for the GT10+F0 model is comparable to that of GTGTR+F0.

For the following *bestModel output:
GT10{0.001000/6.209183/6.457754/9.898749/1.000000/6.419445/6.196008}+FU{0.143473/0.339141/0.339543/0.144042/0.002217/0.009090/0.002213/0.008874/0.009174/0.002231}, noname = 1-313747

I want to confirm the following interpretation:

The GT10 vector contains the instantaneous transition rates where the first field captures the simultaneous transition of both alleles (set to ~0 for all states), and fields [2-7] are ordered identically to the GTR vector as {AC/CG/AT/GT/AG/CT} (at least according to their order specified here. My intuition behind this interpretation is that the rate for G<->T is fixed at 1.0 (as specified by the `cellphy` manuscript).

Following that, the FU vector captures the character state frequencies in the alignment, where the order is identical to the character state encoding order specified here. That is: `{A/A,C/C,G/G,T/T,A/C,A/G,A/T,C/G,C/T,G/T}`

The version of RAxML used to produce these models is v. 0.9.0git (CellPHY branch). 

I really appreciate the help,
Michael Tassia

Oleksiy Kozlov

unread,
Apr 8, 2024, 12:17:52 PMApr 8
to ra...@googlegroups.com
Hi Michael,

for the state frequencies, your interpretation is correct.

For the substitution rates, the order is different, namely:

/* A-C: 1, A-G: 2, A-T: 3, C-G: 4, C-T: 5, G-T: 6, others: 0 */

Upper triangle matrix looks as follows:

/* AA CC GG TT AC AG AT CG CT GT */
static int gt_sym_rate_dna4[] = { 0, 0, 0, 1, 2, 3, 0, 0, 0, /* AA */
0, 0, 1, 0, 0, 4, 5, 0, /* CC */
0, 0, 2, 0, 4, 0, 6, /* GG */
0, 0, 3, 0, 5, 6, /* TT */
4, 5, 2, 3, 0, /* AC */
6, 1, 0, 3, /* AG */
0, 1, 2, /* AT */
6, 5, /* CG */
4 }; /* CT */

We always normalize by the last rate in the triangle matrix. Here, it happens to be rate 4
(zero-based), which is GT<->CT in the genotype space, corresponding to G<->C in the nucleotide space.

So once again, here is the correct order for the substitution rates in the .bestModel file:

"impossible", A-C, A-G, A-T, C-G, C-T, G-T

This is the same order as for the DNA GTR model (see here:
https://github.com/amkozlov/raxml-ng/wiki/Input-data#single-model).


Hope this helps,
Oleksiy


On 04.04.24 16:52, Michael Tassia wrote:
> Hello,
>
> I'm trying to extract the character state transitions from the *.bestModeloutput from raxml-ng for
> some downstream analyses. For this analysis, I've used the `cellphy` adaptation of `raxml-ng`, but
> I'm hoping that the *bestModeloutput format for the GT10+F0 model is comparable to that of GTGTR+F0.
>
> For the following *bestModel output:
> GT10{0.001000/6.209183/6.457754/9.898749/1.000000/6.419445/6.196008}+FU{0.143473/0.339141/0.339543/0.144042/0.002217/0.009090/0.002213/0.008874/0.009174/0.002231}, noname = 1-313747
>
> I want to confirm the following interpretation:
>
> The GT10 vector contains the instantaneous transition rates where the first field captures the
> simultaneous transition of both alleles (set to ~0 for all states), and fields [2-7] are ordered
> identically to the GTR vector as {AC/CG/AT/GT/AG/CT} (at least according to their order specified
> here <https://github.com/ddarriba/modeltest/wiki/Models-of-Evolution>. My intuition behind this
> interpretation is that the rate for G<->T is fixed at 1.0 (as specified by the `cellphy` manuscript).
>
> Following that, the FU vector captures the character state frequencies in the alignment, where the
> order is identical to the character state encoding order specified here
> <https://github.com/amkozlov/raxml-ng/wiki/Input-data#state-encoding--order>. That is:
> `{A/A,C/C,G/G,T/T,A/C,A/G,A/T,C/G,C/T,G/T}`
>
> The version of RAxML used to produce these models is v. 0.9.0git (CellPHY branch).
>
> I really appreciate the help,
> Michael Tassia
>
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> raxml+un...@googlegroups.com <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/16f92bbc-20c6-4559-a42e-c949c35289d1n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/16f92bbc-20c6-4559-a42e-c949c35289d1n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Michael Tassia

unread,
Apr 9, 2024, 1:35:30 PMApr 9
to raxml
Thank you so much for your detailed response - this certainly helps!

Regarding the substitution rates, Supplementary Note 2 of the `CellPhy` manuscript reports:
"In this case, we need to estimate five nucleotide exchangeabilities (𝛼 = r(A↔C), 𝛽 = r(A↔G), 𝛾= r(A↔T), 𝜅 = r(C↔G), 𝜆 = r(C↔T); let 𝜇 = r(G↔T) = 1)"

This sentence suggests that the substitution rate from G<->T is always set to 1 for normalization, where as the DNA GTR model and your response above suggest the C<->G is the category against which the matrix is normalized (given the current structuring of the substitution matrix). Am I misinterpreting the manuscript?  

Thanks again,
Michael

Oleksiy Kozlov

unread,
Apr 10, 2024, 6:48:39 AMApr 10
to ra...@googlegroups.com
There seems to be a discrepancy between supplementary note and the actual implementation w.r.t. rate
normalization, sorry about this!

On 09.04.24 19:35, Michael Tassia wrote:
> Thank you so much for your detailed response - this certainly helps!
>
> Regarding the substitution rates,* Supplementary Note 2* of the `CellPhy` manuscript reports:
> "/In this case, we need to estimate five nucleotide exchangeabilities (𝛼 = r(A↔C), 𝛽 = r(A↔G), 𝛾=
> r(A↔T), 𝜅 = r(C↔G), 𝜆 = r(C↔T); let 𝜇 = r(G↔T) = 1)"/
> <https://github.com/amkozlov/raxml-ng/wiki/Input-data#single-model>).
> <https://groups.google.com/d/msgid/raxml/16f92bbc-20c6-4559-a42e-c949c35289d1n%40googlegroups.com?utm_medium=email&utm_source=footer <https://groups.google.com/d/msgid/raxml/16f92bbc-20c6-4559-a42e-c949c35289d1n%40googlegroups.com?utm_medium=email&utm_source=footer>>.
>
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> raxml+un...@googlegroups.com <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/bab13e35-2391-4217-ad97-fddbc1931d25n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/bab13e35-2391-4217-ad97-fddbc1931d25n%40googlegroups.com?utm_medium=email&utm_source=footer>.
Reply all
Reply to author
Forward
0 new messages