Support for site-specific frequency profiles


Stepan Puhov

Apr 20, 2024, 12:15:36 AM
to raxml

Oleksiy Kozlov

Apr 29, 2024, 9:20:03 AM
to ra...@googlegroups.com
Dear Stepan,

thanks for your feedback!

> Do you have any plans to add support for user-defined site-specific AA frequency profiles (like
> the PMSF profiles precomputed with IQ-Tree)? These seem to be a fast and effective alternative to
> complex +C10-60 mixture models, which could significantly improve tree inference accuracy while
> requiring almost no extra computational resources compared to the usual models.
> A similar request seems to have been raised by other users
> <https://groups.google.com/g/raxml/c/nUf_97lwgLc/m/UNALm7v7AAAJ> and, as I learned from your
> previous post here <https://groups.google.com/g/raxml/c/R2FHM3tYyvU/m/N38NO9bMBAAJ>, you were
> already planning to add the model in 2019.

I know, PMSF has been on our TODO list for a long time, and still is...

> Meanwhile, as I understand it, one could already use site-specific profiles with the
> current program version by defining a per-site partition model representing the original
> site-specific profile. The only difficulty would be defining rates globally for the whole
> alignment.
> Having tried it this way, however, I see that for some reason the computation time of such a
> single-site partition analysis with fixed rates is exceedingly high.

I briefly looked at PMSF, and it should indeed be possible to simulate it with fixed per-site
frequencies (LG+FU{freqs_siteN.txt}). Of course, you would need to estimate those frequency profiles
externally.

The computation time should not be exceedingly high, but there are a couple of caveats.
Could you please post full log files from these runs?

Best.
Oleksiy

>
> I hope that you'll add support for the site-profile model soon. Thank you!
>
>
> Best regards,
> Stepan
>
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to
> raxml+un...@googlegroups.com <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/3c02d706-a4f9-4062-9574-83aee64f4430n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/3c02d706-a4f9-4062-9574-83aee64f4430n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Stepan Puhov

Apr 30, 2024, 12:02:48 PM
to raxml
Dear Oleksiy,

Thanks for your reply!

The --model LG+FU{freqs_siteN.txt} syntax proposed in your reply doesn't work: RAxML (as there is no PMSF support yet) cannot read a freqs_siteN.txt file that specifies multiple frequency vectors.
One gets an error like:
ERROR: Invalid number of user frequencies specified: 5320
Number of frequencies must be equal to the number of states: 20
Context: LG+FU{freqs.txt}
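Incidentally, 5320 = 266 × 20, i.e. exactly one 20-state vector for each of the 266 alignment columns in the benchmark below, which is consistent with the whole per-site profile file having been read as a single flat frequency vector. If profiles ever arrive as one flat list, a small helper like the following (hypothetical, not part of raxml-ng) can split them back into per-site vectors and catch a length mismatch early:

```python
from typing import List, Sequence

NUM_STATES = 20  # amino acids


def split_profiles(flat: Sequence[float], num_states: int = NUM_STATES) -> List[List[float]]:
    """Split a flat list of frequencies into consecutive per-site vectors."""
    if len(flat) % num_states != 0:
        raise ValueError(f"{len(flat)} values is not a multiple of {num_states}")
    # consecutive chunks of num_states values, one chunk per site
    return [list(flat[i:i + num_states]) for i in range(0, len(flat), num_states)]
```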


What I had tried to do is to use the --model partition.txt --brlen linked syntax, where the partition.txt file would contain a per-site partition model (with the line for the i-th site as follows: LG+FU{f1/f2/.../f20}, parti=i-i).
This approach worked well in terms of the final tree likelihood (equal to that of the IQ-Tree LG+PMSF run); however, it ran ca. 50 times longer than the IQ-Tree LG+PMSF run, the IQ-Tree LG+F run or the RAxML LG+F run. For an alignment of 53 sequences x 266 columns (190 parsimony-informative columns), runs using 3 threads took:
Run               Time
RAxML PMSFpart    1001 s
IQ-Tree LG+PMSF     23 s
RAxML LG+F          29 s
IQ-Tree LG+F        16 s

The relevant benchmark is attached below.


Best regards,
Stepan
raxml_test_1.tar.gz

Oleksiy Kozlov

May 5, 2024, 3:11:48 PM
to ra...@googlegroups.com
Dear Stepan,

thanks for benchmarking!

> What I had tried to do is to use the --model partition.txt --brlen linked syntax, where the
> partition.txt file would contain a per-site partition model (with the line for the i-th site as
> follows: LG+FU{f1/f2/.../f20}, parti=i-i).

Yes, this is exactly what I had in mind.
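For readers who want to reproduce this setup, here is a minimal Python sketch that writes such a per-site partition file. It assumes the profiles come from a text file with one line per site, the site index followed by 20 frequencies in the same amino-acid order that raxml-ng uses for FU{} (IQ-TREE's site-frequency output is believed to look like this, but check your own file). The function names and the part<i> partition names are hypothetical.

```python
"""Hypothetical helper: write a raxml-ng per-site partition file from PMSF-style
amino-acid frequency profiles (one partition per alignment column)."""
from typing import List, Sequence

NUM_AA = 20


def read_sitefreq(path: str) -> List[List[float]]:
    """Parse a profile file ASSUMED to hold one line per site:
    <site index> <f1> ... <f20>, whitespace-separated."""
    profiles = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            freqs = [float(x) for x in fields[1:]]  # drop the site index
            if len(freqs) != NUM_AA:
                raise ValueError(f"expected {NUM_AA} frequencies, got {len(freqs)}")
            profiles.append(freqs)
    return profiles


def partition_lines(profiles: Sequence[Sequence[float]], base_model: str = "LG") -> List[str]:
    """One line per site i: MODEL+FU{f1/.../f20}, part<i>=<i>-<i>."""
    lines = []
    for i, freqs in enumerate(profiles, start=1):
        total = sum(freqs)
        # rescale so the printed vector sums to 1 (up to 6-digit rounding)
        fu = "/".join(f"{f / total:.6f}" for f in freqs)
        lines.append(f"{base_model}+FU{{{fu}}}, part{i}={i}-{i}")
    return lines


def write_partition_file(lines: Sequence[str], path: str) -> None:
    # plain ASCII with Unix line endings
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        fh.write("\n".join(lines) + "\n")
```

For example, `write_partition_file(partition_lines(read_sitefreq("aln.sitefreq")), "parts.txt")` followed by `raxml-ng --search --msa aln.phy --model parts.txt --brlen linked --threads 3` (file names are placeholders).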

> This approach worked well in terms of the final tree likelihood (equal to that of the IQ-Tree
> LG+PMSF run); however, it ran ca. 50 times longer than the IQ-Tree LG+PMSF run, the IQ-Tree
> LG+F run or the RAxML LG+F run.

After looking into it, I realized it's a bit more complicated. In raxml-ng, or more specifically in
libpll, we use a slightly different representation of CLVs (conditional likelihood vectors):

https://cme.h-its.org/exelixis/pubs/dissAlexey.pdf#subsection.2.6.2

For a typical analysis with hundreds or thousands of sites per partition, this should yield better (or
at least similar) performance. However, with per-site partitioning, explicit computation of the
P-matrix for every site and branch becomes prohibitively expensive. So unfortunately, no quick
workaround is possible, and substantial changes to the likelihood computation kernels would be needed
to implement PMSF efficiently. This might actually be the reason why we postponed the PMSF
implementation 5 years ago...
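To make the cost argument concrete, here is a back-of-the-envelope multiplication count for one inner-node CLV update, written in Python. It is an illustrative assumption, not libpll's actual cost model: it ignores tip lookup tables, SIMD, numerical scaling and branch-length optimisation, and it only contrasts computing the two P-matrices once per partition with computing them once per site. In this crude model, 266 single-site partitions come out at roughly 20 times the work of a single shared model; it is not meant to reproduce the measured timings.

```python
"""Rough multiplication count for one inner-node CLV update (two child branches).
Deliberately simplified; NOT libpll's real cost model."""


def clv_update_mults(n_sites: int, states: int = 20, rate_cats: int = 1,
                     per_site_partitions: bool = False) -> int:
    # P(t) = V diag(exp(lambda * t)) V^-1 costs roughly states^3 multiplications
    # per child branch and rate category; an inner node has two child branches.
    pmatrix = 2 * rate_cats * states ** 3
    if per_site_partitions:
        # every site has its own Q (its own frequencies), so both P-matrices
        # are recomputed once per site instead of once per partition
        pmatrix *= n_sites
    # per site and rate category: one states x states matrix-vector product
    # per child CLV
    clv = n_sites * rate_cats * 2 * states ** 2
    return pmatrix + clv


def slowdown(n_sites: int, **kwargs) -> float:
    shared = clv_update_mults(n_sites, per_site_partitions=False, **kwargs)
    split = clv_update_mults(n_sites, per_site_partitions=True, **kwargs)
    return split / shared


if __name__ == "__main__":
    # 266 columns, as in the benchmark alignment above
    print(f"{slowdown(266):.1f}x")  # prints 19.5x
```

The ratio does not depend on the number of rate categories here, because both terms scale with it; what changes is that the P-matrix term grows from a per-branch constant to a per-site cost.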

Best,
Oleksiy

Stepan Puhov

May 13, 2024, 8:50:31 PM
to raxml
Dear Oleksiy,

Looking through your thesis was quite insightful for me, thank you for the link.
It is a pity, though, that in the end the PMSF model cannot currently be implemented efficiently in raxml-ng.

Thank you for your time and good luck in further developments!

Best,
Stepan

Stepan Puhov

Jun 12, 2024, 8:26:31 PM
to raxml
Dear Oleksiy,

I have a little follow-up question:

Since the implementation of PMSF profiles through partitioning, though running longer than expected, seems to work properly overall, I am interested in using the same scheme to also include precomputed posterior mean site-specific rates (PMSR).

To jointly use PMSF and PMSR profiles, I've tried the --model partition.txt --brlen scaled syntax, with the i-th site's line in the partition file reading LG+FU{f1_i/.../f20_i}+BU{r_i}, where r_i stands for the precomputed site rate.
Could you please tell me whether this is the correct way to include site-specific rates, or whether there are any caveats?

The test run went well, with the resulting likelihood (as expected) even better than that of an IQ-Tree LG+PMSF+R4 run. One thing still bothering me is the warning I get about overparameterization, even though the only parameters RAxML has to estimate under this model are the tree branch lengths (the rates seem only to be normalized by their mean, not optimized in any way).


Best regards,
Stepan

Oleksiy Kozlov

Jun 13, 2024, 10:15:04 AM
to ra...@googlegroups.com
Dear Stepan,

> Since the implementation of PMSF profiles through partitioning, though running longer than expected,
> seems to work properly overall, I am interested in using the same scheme to also include precomputed
> posterior mean site-specific rates (PMSR).
>
> To jointly use PMSF and PMSR profiles, I've tried the --model partition.txt --brlen scaled syntax,
> with the i-th site's line in the partition file reading LG+FU{f1_i/.../f20_i}+BU{r_i},
> where r_i stands for the precomputed site rate.
> Could you please tell me whether this is the correct way to include site-specific rates, or
> whether there are any caveats?

Yes: since in the likelihood computation the time t in P(t) is the product of the branch length and
the per-site rate, per-site branch scalers can indeed be (mis-)used as per-site rates.
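A hedged Python sketch of how such a PMSF+PMSR partition file could be assembled, following the LG+FU{f1_i/.../f20_i}+BU{r_i} line format described above. It assumes (a) 20 frequencies per site in the amino-acid order raxml-ng expects for FU{}, and (b) a rate file laid out like IQ-TREE's .rate output, i.e. '#' comment lines, a header row, then rows starting with the site index followed by the rate. Both layouts are assumptions to check against your own files; the function names and part<i> names are hypothetical.

```python
from typing import List, Sequence


def read_site_rates(path: str) -> List[float]:
    """Read per-site rates from a file ASSUMED to look like IQ-TREE's .rate output:
    '#' comment lines, a 'Site Rate ...' header, then '<site> <rate> ...' rows."""
    rates = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            # skip blank lines, comments and the header row
            if not fields or not fields[0].isdigit():
                continue
            rates.append(float(fields[1]))
    return rates


def pmsf_pmsr_lines(profiles: Sequence[Sequence[float]], rates: Sequence[float],
                    base_model: str = "LG") -> List[str]:
    """One partition per site: MODEL+FU{f1/.../f20}+BU{r_i}, part<i>=<i>-<i>."""
    if len(profiles) != len(rates):
        raise ValueError(f"{len(profiles)} profiles but {len(rates)} rates")
    lines = []
    for i, (freqs, rate) in enumerate(zip(profiles, rates), start=1):
        fu = "/".join(f"{f:.6f}" for f in freqs)
        # the per-site rate is passed as the partition's branch-length scaler
        lines.append(f"{base_model}+FU{{{fu}}}+BU{{{rate:.6f}}}, part{i}={i}-{i}")
    return lines
```

The resulting file would be used with `--brlen scaled`, as in Stepan's run, so that each partition's scaler multiplies the shared branch lengths.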


> The test run went well, with the resulting likelihood (as expected) even better than that of an
> IQ-Tree LG+PMSF+R4 run. One thing still bothering me is the warning I get about
> overparameterization, even though the only parameters RAxML has to estimate under this model are
> the tree branch lengths (the rates seem only to be normalized by their mean, not optimized in any way).

In this particular case, please ignore the warning. When computing the number of free parameters,
raxml-ng assumes that per-partition branch length scalers are optimized, which is pretty much always
the case in "regular" analyses.

Best,
Oleksiy

Stepan Puhov

Jun 16, 2024, 4:34:33 PM
to raxml
Dear Oleksiy,

I am a little bit confused by the "can be (mis-)used" phrasing... Did you just mean that what I proposed is a very unusual use of raxml, or do you consider the whole approach with site-specific rates a bad idea? I understand that the approach is quite unorthodox, so any critique would be highly appreciated. Thank you!

Best,
Stepan

Oleksiy Kozlov

Jun 17, 2024, 5:17:02 AM
to ra...@googlegroups.com
Hi Stepan,

> I am a little bit confused by the "can be (mis-)used" phrasing... Did you just mean that what I
> proposed is a very unusual use of raxml, or do you consider the whole approach with
> site-specific rates a bad idea? I understand that the approach is quite unorthodox, so any
> critique would be highly appreciated. Thank you!

I only mean that this is indeed an unusual use of raxml, but as far as I can tell it is valid, so no
worries :)

Best,
Oleksiy

Stepan Puhov

Jun 17, 2024, 5:45:56 PM
to raxml
Hi Oleksiy,
OK, got it. That is quite a relief to read :)

Of note: the +BU{} option does not seem to be described in the program's manual or help output, so it might be worth adding there unless its absence is intentional.

Great thanks for the helpful discussion!

Best,
Stepan

Kenta Renard

Jan 1, 2026, 8:03:22 PM
to raxml
Hi Stepan,

I came across this thread because I really want to implement empirical profile mixture models (or rather PMSF) in RAxML-NG. We currently cannot do so directly, but seeing what you managed to achieve here, I want to see whether I can replicate it with my data. I managed to get it working without +BU. To implement PMSR, I obtained the per-site rates from the IQ-TREE .rate file and mapped them into the RAxML-NG partition file, but with that I could not get it working. I got the error message: "ERROR: Failed to read partition file:
ERROR model initialization |ÿþP".

Can I ask how you managed to get it working? For reference, my alignment has 581 sites and 1925 taxa.

Best,
Kenta
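A hedged guess at the error above: the characters `ÿþ` are exactly what the two bytes 0xFF 0xFE look like when decoded as Latin-1, and 0xFF 0xFE is the byte-order mark of a UTF-16 little-endian text file, a format some Windows tools produce by default (for example output redirection in Windows PowerShell 5.1). If that is the cause, re-saving the partition file as plain ASCII with Unix line endings may help. A minimal Python sketch (function names hypothetical):

```python
import codecs


def to_plain_ascii(raw: bytes) -> bytes:
    """Strip a UTF-8/UTF-16 byte-order mark, normalise CRLF to LF, return ASCII bytes.

    Raises UnicodeError if the text contains non-ASCII characters, so nothing
    is silently mangled.
    """
    if raw.startswith(codecs.BOM_UTF8):
        text = raw.decode("utf-8-sig")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")  # the BOM selects the byte order and is dropped
    else:
        text = raw.decode("ascii")
    return text.replace("\r\n", "\n").encode("ascii")


def fix_partition_file(path: str) -> None:
    with open(path, "rb") as fh:
        raw = fh.read()
    cleaned = to_plain_ascii(raw)  # convert fully before overwriting anything
    with open(path, "wb") as fh:
        fh.write(cleaned)
```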