Diploid genotype rates for double substitutions

Van Nguyen

Jun 19, 2025, 10:06:56 PM
to raxml

Dear CellPhy and RAxML-NG development team,

I have recently started using CellPhy for genotype‐based tree inference. According to your paper, any “double” substitutions (i.e., simultaneous changes in both alleles) should have a rate of zero under the GT10 (and GT16) models. However, when I ran CellPhy on the provided ToySet data with the GT10+FC model (i.e., without the error model), I obtained the following ML‐estimated rates:

Substitution rates (ML): 0.605749 0.605749 0.605749 2.892161 3.482518 1.236435 0.605749 0.605749 0.605749 0.605749 0.605749 2.892161 0.605749 0.605749 1.000000 3.111827 0.605749 0.605749 0.605749 3.482518 0.605749 1.000000 0.605749 2.550716 0.605749 0.605749 1.236435 0.605749 3.111827 2.550716 1.000000 3.111827 3.482518 1.236435 0.605749 2.550716 2.892161 0.605749 1.236435 0.605749 2.892161 3.482518 2.550716 3.111827 1.000000

Here, every “impossible” double‐substitution rate shows up as 0.605749, rather than zero. Could you please help me understand:

  1. Why these rates are nonzero in the output, despite the paper’s statement?

  2. Whether there is a specific setting or detail I may have missed that would force true zeros for those entries?
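
The observation above can be checked mechanically with a short Python sketch. It pairs each of the 45 posted rates with the GT10 genotype pair it most likely belongs to and counts how many alleles must change. The state order (AA, CC, GG, TT, then AC, AG, AT, CG, CT, GT) is an assumption inferred from the posted vector, not taken from CellPhy's documentation, and `allele_changes` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Assumption: GT10 state order inferred from the rate vector above
# (4 homozygous, then 6 heterozygous genotypes); not from CellPhy docs.
GT10_STATES = ["AA", "CC", "GG", "TT", "AC", "AG", "AT", "CG", "CT", "GT"]

# ML rates posted above (GT10+FC, ToySet), upper triangle in row-major order.
REPORTED = [float(x) for x in (
    "0.605749 0.605749 0.605749 2.892161 3.482518 1.236435 0.605749 0.605749 "
    "0.605749 0.605749 0.605749 2.892161 0.605749 0.605749 1.000000 3.111827 "
    "0.605749 0.605749 0.605749 3.482518 0.605749 1.000000 0.605749 2.550716 "
    "0.605749 0.605749 1.236435 0.605749 3.111827 2.550716 1.000000 3.111827 "
    "3.482518 1.236435 0.605749 2.550716 2.892161 0.605749 1.236435 0.605749 "
    "2.892161 3.482518 2.550716 3.111827 1.000000"
).split()]


def allele_changes(g1, g2):
    """Minimum number of alleles that must change to turn genotype g1 into g2."""
    shared = sum((Counter(g1) & Counter(g2)).values())
    return 2 - shared


# combinations() yields (0,1), (0,2), ..., (8,9): the row-major upper triangle.
pairs = [((a, b), allele_changes(a, b)) for a, b in combinations(GT10_STATES, 2)]
assert len(pairs) == len(REPORTED) == 45

double_rates = {r for (_, n), r in zip(pairs, REPORTED) if n == 2}
single_rates = {r for (_, n), r in zip(pairs, REPORTED) if n == 1}

print(sum(n == 2 for _, n in pairs))  # 21 double-substitution entries
print(double_rates)                   # {0.605749}: a single shared value
print(0.605749 in single_rates)       # False
```

If the assumed state order were wrong, `double_rates` would very likely contain several distinct values, so the check also serves as a sanity test of that assumption.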

Thank you very much for your time and assistance.

Best regards,

Van


Oleksiy Kozlov

Jul 7, 2025, 5:32:50 AM
to ra...@googlegroups.com
Dear Van,

thanks for reporting, and sorry for the late reply: I had to find time to dig into this old code.

Indeed, the actual implementation differs from the model described in the paper in two ways:

1. Since true zero substitution rates often lead to numerical problems in optimization, we use a very small non-zero value instead (currently 0.001).

2. “Impossible” double‐substitution rates were not fixed to 0.0 or 0.001. Instead, all “impossible” rates were constrained to be equal, and this shared rate was optimized by ML. Although this differs from the theoretical model, in practice we observed that the shared "impossible" rate converged to the minimum value (0.001) for all simulated and empirical datasets we analyzed. This was also the case for the toy dataset with the GT16+FO+E model (see Tutorial Section 3):

Substitution rates (ML): 0.001000 0.001000 0.001000 27.354070 989.178276 93.810669 0.001000 0.001000 0.001000 27.354070 [...]

However, as you found out, the shared rate does not converge to this minimum under all models. So I have now changed the implementation such that "impossible" substitution rates are always fixed to 0.001.
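
The parameter tying described above can be illustrated with a short Python sketch. It is not CellPhy code: the helper name `gt10_rates`, the GT10 state order, and the idea that each single-substitution entry reuses one of six nucleotide exchangeabilities are assumptions inferred from the rate vector in Van's message, which this sketch reproduces exactly. Under these assumptions the 45 GT10 entries collapse to seven values: six nucleotide rates plus one shared "impossible" rate, which the old version estimated by ML and v0.9.3 pins to 0.001.

```python
from collections import Counter
from itertools import combinations

# Assumption: GT10 state order inferred from Van's posted output.
GT10_STATES = ["AA", "CC", "GG", "TT", "AC", "AG", "AT", "CG", "CT", "GT"]


def gt10_rates(nuc_rates, impossible):
    """Hypothetical helper: expand 6 nucleotide exchangeabilities plus one shared
    'impossible' rate into the 45 GT10 rates (upper triangle, row-major)."""
    rates = []
    for g1, g2 in combinations(GT10_STATES, 2):
        lost = Counter(g1) - Counter(g2)    # alleles present only in g1
        gained = Counter(g2) - Counter(g1)  # alleles present only in g2
        if sum(lost.values()) == 1:
            # Single substitution x -> y: reuse the nucleotide rate for {x, y}.
            key = "".join(sorted(next(iter(lost)) + next(iter(gained))))
            rates.append(nuc_rates[key])
        else:
            # Double substitution: every such entry gets the same shared value.
            rates.append(impossible)
    return rates


# Nucleotide exchangeabilities read off Van's GT10+FC output.
NUC_RATES = {"AC": 2.892161, "AG": 3.482518, "AT": 1.236435,
             "CG": 1.000000, "CT": 3.111827, "GT": 2.550716}

# The 45 rates Van posted, for comparison.
REPORTED = [float(x) for x in (
    "0.605749 0.605749 0.605749 2.892161 3.482518 1.236435 0.605749 0.605749 "
    "0.605749 0.605749 0.605749 2.892161 0.605749 0.605749 1.000000 3.111827 "
    "0.605749 0.605749 0.605749 3.482518 0.605749 1.000000 0.605749 2.550716 "
    "0.605749 0.605749 1.236435 0.605749 3.111827 2.550716 1.000000 3.111827 "
    "3.482518 1.236435 0.605749 2.550716 2.892161 0.605749 1.236435 0.605749 "
    "2.892161 3.482518 2.550716 3.111827 1.000000"
).split()]

# Old behaviour: the shared value is a free parameter, here at its ML estimate.
print(gt10_rates(NUC_RATES, impossible=0.605749) == REPORTED)  # True

# Behaviour described for v0.9.3: the shared value is pinned to 0.001.
fixed = gt10_rates(NUC_RATES, impossible=0.001)
print(fixed.count(0.001))  # 21
```

Reusing Van's nucleotide rates in the fixed case only illustrates the layout; an actual v0.9.3 run would re-estimate those six rates with the shared value held at 0.001.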

Please check the new version of CellPhy here:

https://github.com/amkozlov/cellphy/releases/tag/v0.9.3


Best,
Oleksiy


