Support for site-specific frequency profiles


Stepan Puhov

Apr 20, 2024, 12:15:36 AM
to raxml

Oleksiy Kozlov

Apr 29, 2024, 9:20:03 AM
to ra...@googlegroups.com
Dear Stepan,

thanks for your feedback!

> Do you have any plans for adding support for user-defined site-specific AA-frequency profiles (like
> the PMSF profiles pre-calculated with IQ-Tree)? These seem to be a fast and effective alternative to
> complex +C10-60 mixture models, which could significantly improve tree inference accuracy while
> requiring almost no extra computational resources compared to the usual models.
> Similar request seems to have been raised by other users
> <https://groups.google.com/g/raxml/c/nUf_97lwgLc/m/UNALm7v7AAAJ> and, as I can learn from your
> previous post here <https://groups.google.com/g/raxml/c/R2FHM3tYyvU/m/N38NO9bMBAAJ>, you were
> planning to add the model already in 2019.

I know, PMSF has been on our TODO list for a long time, and still is...

> Meanwhile, from my understanding it seems that one could already use site-specific profiles with the
> current program version by defining a per-site partition model representing the original
> site-specific profile. The only trouble would come here for defining rates globally for the whole
> alignment.
> Having tried it this way, however, I see that for some reason the computation time of such
> single-site partition analysis with fixed rates is exceedingly high.

I briefly looked at PMSF, and it should indeed be possible to simulate it with fixed per-site
frequencies (LG+FU{freqs_siteN.txt}). Of course, you would need to estimate those frequency profiles
externally.

The computation time should not be exceedingly high, but there are a couple of caveats.
Could you please post full log files from these runs?

Best.
Oleksiy

>
> I hope that you'll add support for the site-profile model soon. Thank you!
>
>
> Best regards,
> Stepan
>
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> raxml+un...@googlegroups.com <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/3c02d706-a4f9-4062-9574-83aee64f4430n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/3c02d706-a4f9-4062-9574-83aee64f4430n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Stepan Puhov

Apr 30, 2024, 12:02:48 PM
to raxml
Dear Oleksiy,

Thanks for your reply!

The --model LG+FU{freqs_siteN.txt} syntax proposed in your reply doesn't work: since there is no PMSF support yet, RAxML cannot read a freqs_siteN.txt file that specifies multiple frequency vectors.
One gets an error like:
ERROR: Invalid number of user frequencies specified: 5320
Number of frequencies must be equal to the number of states: 20
Context: LG+FU{freqs.txt}


What I tried was the --model partition.txt --brlen linked syntax, where the partition.txt file contains a per-site partition model (the line for the i-th site reads: LG+FU{f1/f2/.../f20}, parti=i-i).
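For reference, a partition file of the kind described above could be generated with a short Python script. This is a sketch under assumptions: the input is taken to be an IQ-TREE PMSF .sitefreq file with one line per site (a 1-based site index followed by 20 frequencies, in the same amino-acid order raxml-ng expects for LG), and the function names parse_sitefreq and partition_lines as well as the partition labels p1, p2, ... are hypothetical.

```python
"""Hypothetical helper: IQ-TREE .sitefreq profiles -> raxml-ng per-site partition file."""
import sys
from pathlib import Path

NUM_STATES = 20  # amino-acid states


def parse_sitefreq(lines):
    """Parse PMSF profiles; ASSUMES 'site f1 f2 ... f20' per line, whitespace-separated."""
    profiles = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        site = int(fields[0])
        freqs = [float(x) for x in fields[1:]]
        if len(freqs) != NUM_STATES:
            raise ValueError(f"site {site}: expected {NUM_STATES} frequencies, got {len(freqs)}")
        total = sum(freqs)
        profiles[site] = [f / total for f in freqs]  # rescale so each profile sums to 1
    return profiles


def partition_lines(profiles, base_model="LG"):
    """One single-site partition per profile: 'LG+FU{f1/.../f20}, p<i> = <i>-<i>'."""
    lines = []
    for site in sorted(profiles):
        freqs = "/".join(f"{f:.10g}" for f in profiles[site])
        lines.append(f"{base_model}+FU{{{freqs}}}, p{site} = {site}-{site}")
    return lines


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python make_pmsf_partitions.py input.sitefreq partition.txt
    profiles = parse_sitefreq(Path(sys.argv[1]).read_text().splitlines())
    Path(sys.argv[2]).write_text("\n".join(partition_lines(profiles)) + "\n")
```

The resulting file would then be passed as in this thread, e.g. raxml-ng --search --msa aln.fasta --model partition.txt --brlen linked (aln.fasta is a placeholder name).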
This approach worked well in terms of the final tree likelihood (which equals that of the IQ-Tree LG+PMSF run); however, it ran ca. 50 times longer than the IQ-Tree LG+PMSF run, the IQ-Tree LG+F run or the RAxML LG+F run. For an alignment of 53 sequences × 266 columns (190 parsimony-informative columns), runs using 3 threads took:
Run              Time
RAxML PMSFpart   1001 s
IQ-Tree LG+PMSF    23 s
RAxML LG+F         29 s
IQ-Tree LG+F       16 s
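To put the "ca. 50 times" figure in context, the slowdown of the per-site partition run relative to each of the other runs can be computed directly from the times reported above (the dictionary keys are just the run labels from the table):

```python
# Run times reported above (53 taxa x 266 sites, 3 threads), in seconds.
TIMES_S = {
    "RAxML PMSFpart": 1001,
    "IQ-Tree LG+PMSF": 23,
    "RAxML LG+F": 29,
    "IQ-Tree LG+F": 16,
}


def slowdowns(times, slow="RAxML PMSFpart"):
    """Ratio of the per-site-partition run time to each of the other runs."""
    return {name: times[slow] / t for name, t in times.items() if name != slow}


for name, ratio in slowdowns(TIMES_S).items():
    print(f"PMSFpart vs {name}: {ratio:.1f}x")
# PMSFpart vs IQ-Tree LG+PMSF: 43.5x
# PMSFpart vs RAxML LG+F: 34.5x
# PMSFpart vs IQ-Tree LG+F: 62.6x
```

That is, roughly 35x to 63x depending on the reference run.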

Below I attach the relevant benchmark.


Best regards,
Stepan
raxml_test_1.tar.gz

Oleksiy Kozlov

May 5, 2024, 3:11:48 PM
to ra...@googlegroups.com
Dear Stepan,

thanks for benchmarking!

> What I had tried to do is to use the --model partition.txt --brlen linked syntax, where the
> partition.txt file would contain a per-site partition model (with the line for the i-th site as
> following: LG+FU{f1/f2/.../f20}, parti=i-i).

Yes, this is exactly what I had in mind.

> This approach worked out well concerning the final tree likelihood (equals to that of the IQ-Tree
> LG+PMSF run), however, it was running ca. 50 times longer than the IQ-Tree LG+PMSF run, the IQ-Tree
> LG+F run or the RAxML LG+F run.

After looking into it, I realized it's a bit more complicated. In raxml-ng, or more specifically in
libpll, we use a slightly different representation of CLVs:

https://cme.h-its.org/exelixis/pubs/dissAlexey.pdf#subsection.2.6.2

For a typical analysis with hundreds or thousands of sites per partition, this should yield better (or at
least similar) performance. However, with per-site partitioning, explicit computation of the P-matrix at
every site and branch becomes prohibitively expensive. So unfortunately, no quick workaround is
possible, and substantial changes to the likelihood computation kernels would be needed to implement
PMSF efficiently. This might actually be the reason why we postponed PMSF implementation 5 years ago...
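A back-of-the-envelope count illustrates why per-site partitioning scales so badly. The sketch assumes, in simplified form, that one P-matrix (a 20 × 20 matrix exponential for amino acids) has to be computed per branch, per partition and per rate category; it ignores the CLV layout details described in the linked thesis, and the function names are made up for illustration. The numbers are those of the 53-taxon, 266-site alignment benchmarked earlier in the thread.

```python
def branch_count(n_taxa: int) -> int:
    """Branches in an unrooted, fully bifurcating tree with n_taxa leaves."""
    return 2 * n_taxa - 3


def pmatrix_updates(n_taxa: int, n_partitions: int, n_rate_cats: int = 1) -> int:
    """P-matrices recomputed when every branch is updated once (simplified cost model)."""
    return branch_count(n_taxa) * n_partitions * n_rate_cats


N_TAXA, N_SITES = 53, 266  # alignment size from the benchmark in this thread
single = pmatrix_updates(N_TAXA, 1)          # one LG+F partition
per_site = pmatrix_updates(N_TAXA, N_SITES)  # one partition per site
print(single, per_site, per_site // single)  # 103 27398 266
```

Under these assumptions, one pass over all branches needs 266 times as many matrix exponentials with per-site partitions as with a single partition.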

Best,
Oleksiy

Stepan Puhov

May 13, 2024, 8:50:31 PM
to raxml
Dear Oleksiy,

It was quite insightful to look through your thesis, thank you for the link.
It is a pity, though, that the PMSF model cannot currently be implemented efficiently in raxml-ng.

Thank you for your time and good luck in further developments!

Best,
Stepan

Stepan Puhov

Jun 12, 2024, 8:26:31 PM
to raxml
Dear Oleksiy,

I have a little follow-up question:

Since emulating PMSF profiles through partitioning seems to work properly overall (though it runs longer than expected), I would like to use the same scheme to also include precomputed posterior mean site-specific rates (PMSR).

To use PMSF and PMSR profiles jointly, I tried the --model partition.txt --brlen scaled syntax, with the partition file line for the i-th site reading LG+FU{f1_i/.../f20_i}+BU{r_i}, where r_i is the precomputed site rate.
Could you please tell me whether this is the correct way to include site-specific rates, or whether there are any caveats?
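A partition line of this form can be produced with a small helper like the one below. It is a hypothetical sketch: the +BU{...} syntax is taken from this thread, the optional rescaling to mean 1 only mirrors the normalization Stepan reports observing (not documented behaviour), and the function names and p<i> labels are made up.

```python
from statistics import fmean


def normalize_to_mean_one(rates):
    """Rescale site rates to mean 1 (mirrors the normalization observed in this thread)."""
    m = fmean(rates)
    return [r / m for r in rates]


def pmsf_pmsr_line(site, freqs, rate, base_model="LG"):
    """One single-site partition: fixed profile via +FU, fixed rate via a user brlen scaler +BU."""
    if len(freqs) != 20:
        raise ValueError(f"site {site}: expected 20 frequencies, got {len(freqs)}")
    f = "/".join(f"{x:.10g}" for x in freqs)
    return f"{base_model}+FU{{{f}}}+BU{{{rate:.10g}}}, p{site} = {site}-{site}"


if __name__ == "__main__":
    rates = normalize_to_mean_one([0.2, 1.0, 2.4])  # hypothetical PMSR values
    print(pmsf_pmsr_line(1, [0.05] * 20, rates[0]))
```

Each such line would go into partition.txt, which is then used with --model partition.txt --brlen scaled as described above.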

The test run went well, and the resulting likelihood was (as expected) even better than that of an LG+PMSF+R4 IQ-Tree run. One thing that still bothers me is the overparameterization warning I get, even though the only parameters RAxML has to estimate under this model are the tree branch lengths (the rates seem to be only normalized by their mean, not optimized in any way).


Best regards,
Stepan

Oleksiy Kozlov

Jun 13, 2024, 10:15:04 AM
to ra...@googlegroups.com
Dear Stepan,

> Since the implementation of PMSF profiles through partitioning, though running longer than expected,
> seems to work properly overall, I am interested in using the same scheme to also use precomputed
> posterior mean site-specific rates (PMSR).
>
> To jointly use PMSF and PMSR profiles I've tried the --model partition.txt --brlen scaled syntax,
> with the line of the partition file for the i-th site reading as LG+FU{f1_i/.../f20_i}+BU{r_i},
> where r_i stands for the precomputed site rate.
> Could you, please, tell me, does this seem to be the correct way to include site-specific rates? Or
> are there any caveats?

Yes: since in the likelihood computation the effective time is the product of branch length and per-site
rate, per-site branch length scalers can indeed be (mis-)used as per-site rates.


> The test run went well, with the resultant likelihood even better (expectedly) than for a LG+PMSF+R4
> IQ-Tree run. One thing yet bothering me is the warning that I get about overparameterization,
> though, in fact, the only parameters in the aforementioned model RAxML is to estimate are the tree
> branch lengths (the rates seem to get only normalized by their mean, but not optimized in any way).

In this particular case, please ignore the warning. When computing the number of free parameters,
raxml-ng assumes that per-partition branch length scalers are optimized, which is pretty much always
the case in "regular" analyses.

Best,
Oleksiy

Stepan Puhov

Jun 16, 2024, 4:34:33 PM
to raxml
Dear Oleksiy,

I am a little bit confused by the "can be (mis-)used" phrasing... Did you just mean that what I proposed is a very unusual use of raxml, or do you consider the whole approach with site-specific rates a bad idea? I understand that the approach is quite unorthodox, so any critique would be highly appreciated. Thank you!

Best,
Stepan

Oleksiy Kozlov

Jun 17, 2024, 5:17:02 AM
to ra...@googlegroups.com
Hi Stepan,

> I am a little bit confused by the "can be (mis-)used" phrasing... Did you just mean that what I
> proposed is a case of very unusual usage of raxml or do you consider the whole approach with
> site-specific rates to be a bad idea? I understand that the approach is quite unorthodox, so any
> critique would be highly appreciated. Thank you!

I only meant that this is indeed an unusual use of raxml, but it is valid as far as I can tell, so no
worries :)

Best,
Oleksiy

Stepan Puhov

Jun 17, 2024, 5:45:56 PM
to raxml
Hi Oleksiy,
OK, got it. That is quite a relief to read :)

Of note: the +BU{} option does not seem to be described in the program's manual or help output, so it might be worth adding there unless the omission is intentional.

Great thanks for the helpful discussion!

Best,
Stepan
