RAxML - substitutions per variant site?

M Lawlor

Mar 14, 2016, 7:19:08 AM

to raxml

Hi all,

I have read through the documentation as well as a large chunk of posts in this group related to my query and couldn't find an answer to my question.

I ran a phylogenetic analysis in PhyML and then in RAxML using the same data set with the same settings (as far as this was possible). Output is attached. The topologies are comparable (agreeing almost exactly), but the scale bars are very different. From what I can tell, this is because the PhyML scale bar represents the number of substitutions per variant site, while the RAxML scale bar represents substitutions per site. Is this an expected difference? And in particular, is there an option in RAxML to scale branch lengths by the proportion of variant sites in the alignment?

Thank you for your time.

Best Regards,

M

raxml_group.pdf

Alexandros Stamatakis

Mar 14, 2016, 8:27:18 AM

to ra...@googlegroups.com

Hi,

> I have read through the documentation as well as a large chunk of posts
> in this group related to my query and couldn't find an answer to my
> question.
>
> I ran a phylogenetic analysis in PhyML and then RAxML using the same
> data set with the same settings (so far as this was possible). Output is
> attached. The topology is comparable (agreeing almost exactly) however
> the scale bars are very different. I think this is because (from what I
> can tell) the scale bar in PhyML represents the number of substitutions
> per variant site while the RAxML scale bar represents the substitutions
> per site.

Are you sure about the PHYML interpretation or is this just a guess?
I am pretty sure that RAxML shows substitutions per site. Can you
confirm (i) that you are running the analyses under GTR+GAMMA+P-Invar
and (ii) whether your dataset has missing data per gene?

> It is possible that this is already an expected difference, I
> was wondering if this was the case?

I am not sure, since I don't know what PHYML does. There could be many
other reasons for the observed difference.

> And in particular, if there is an
> option in RAxML so that branch length can be scaled by the proportion of
> variant sites in the alignment?

Unfortunately not.

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

M Lawlor

Mar 14, 2016, 11:25:37 AM

to raxml

Hi Alexis,

Thank you for getting back to me so quickly! No, I'm not sure about the PhyML interpretation. I know what the average number of substitutions along each branch should be for our data. RAxML is definitely showing substitutions per site: the branch lengths shown lead to the expected number of substitutions. The PhyML branch lengths, however, lead to the expected number of substitutions only when you account for the invariant sites. This is why I believe there is some scaling.
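To make the suspected relation concrete, here is a minimal Python sketch of the arithmetic. It only illustrates the hypothesis stated above (one scale in substitutions per site, the other in substitutions per variant site); it is not a description of what RAxML or PhyML actually compute. The function names and the example numbers are made up for illustration.

```python
def expected_substitutions(branch_length, n_sites):
    """Expected substitutions on a branch whose length is given
    in substitutions per site, for an alignment of n_sites columns."""
    return branch_length * n_sites


def per_variant_site(branch_length, n_sites, n_variant_sites):
    """Rescale a per-site branch length to a per-variant-site one
    by dividing by the proportion of variant sites (hypothetical scaling)."""
    if not 0 < n_variant_sites <= n_sites:
        raise ValueError("n_variant_sites must be in (0, n_sites]")
    return branch_length * n_sites / n_variant_sites


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the attached runs.
    n_sites, n_variant = 10_000, 500
    per_site = 0.002
    print(expected_substitutions(per_site, n_sites))
    print(per_variant_site(per_site, n_sites, n_variant))
```

Under this assumption, a per-variant-site branch length multiplied by the number of variant sites gives the same expected substitution count as the per-site length multiplied by the full alignment length.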

I am using the GTRGAMMA model in RAxML and yes, the GTR+GAMMA+P-Invar model in PhyML, and the dataset contains no missing data. Are there any further settings in RAxML I should be using in order to compare the trees more reliably?

Ah, it's a pity about the scaling but thank you anyway!

Best,

M

Alexandros Stamatakis

Mar 14, 2016, 5:13:23 PM

to ra...@googlegroups.com

Well, that makes it clear: you are comparing br-lens obtained under
GTR+GAMMA in RAxML with br-lens obtained under GTR+GAMMA+P-Invar in
PHYML. You can't just compare br-lens among different models and expect
them to be similar. Just re-run PHYML under GTR+GAMMA or RAxML under
GTR+GAMMA+P-Invar and I am sure you will see the differences vanish. It's
known that adding +P-Invar mainly affects (usually shrinks) the br-lens.

alexis

M Lawlor

Mar 16, 2016, 7:18:41 AM

to raxml

Hi Alexis,

Thank you again for your response and your suggestions!

Yes you are correct and it looks like I should indeed be using the GTRGAMMAI model. Thank you for this.

However, while the differences vanish in the 'test case' I was using (approx. 20 taxa, not depicted in the original attachment), the problem with the scaling still persists with the full set of taxa (approx. 1050). See attachment. In this case the PhyML tree was run using GTR+GAMMA+P-invar and the RAxML tree was created using GTRGAMMAI. While the likelihoods associated with each of the trees are much closer now, the scaling still appears to be different. I have attached the stats/run info for each of the trees if you're interested. Is it possible that there is still some model/parameter specification in RAxML that I am not using correctly?

I would really like to understand this scaling issue as it will be important for later validations using RAxML and PhyML.

Any input or feedback would be appreciated. Thanks again!

raxml_group_2.pdf

RAxML_info.T10.txt

phyml_info.T10.txt

Alexandros Stamatakis

Mar 17, 2016, 6:44:09 AM

to ra...@googlegroups.com

> Thank you again for your response and your suggestions!

:-)

> Yes you are correct and it looks like I should indeed be using the
> GTRGAMMAI model. Thank you for this.
>
> However, while the differences vanish in the 'test case' I was using
> (approx. 20 taxa, not depicted in the original attachment). The problem
> with the scaling still persists with the full set of taxa (approx.
> 1050). See attachment. In this case the PhyML tree was run using
> GTR+GAMMA+P-invar and the RAxML tree was created using GTRGAMMAI. While
> the likelihoods associated with each of the trees are much closer now,
> the scaling still appears to be different. I have attached the stats/run
> info for each of the trees if you're interested. It is possible that it
> is still some model/parameter specification in RAxML that I am not using
> correctly?

No, I believe the runs are comparable now, but the first thing that
needs to be done is to quantify the br-len differences. In fact, could
you maybe send me the alignment file and tree such that I can check what
is happening?
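One rough way to quantify the br-len differences is to compare total tree lengths, as in the Python sketch below. It assumes both trees are available as plain Newick strings and simply pulls out every number that follows a colon; the helper names and the toy trees are hypothetical, and quoted taxon labels containing colons are not handled. A ratio far from 1 would point to a global scaling difference rather than a few divergent branches.

```python
import re

# A branch length in Newick is the number that follows a ':'.
# Support values sit after ')' and before ':', so they are not matched.
_BRLEN = re.compile(r":\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def branch_lengths(newick):
    """Return all branch lengths found in a Newick string, in file order."""
    return [float(x) for x in _BRLEN.findall(newick)]


def tree_length(newick):
    """Sum of all branch lengths (the total tree length)."""
    return sum(branch_lengths(newick))


def length_ratio(newick_a, newick_b):
    """Total length of tree B divided by total length of tree A."""
    total_a = tree_length(newick_a)
    if total_a == 0:
        raise ValueError("tree A has zero total length")
    return tree_length(newick_b) / total_a


if __name__ == "__main__":
    # Toy trees for illustration only; replace with the two program outputs.
    tree_a = "((A:0.1,B:0.2)95:0.05,C:0.3);"
    tree_b = "((A:0.2,B:0.4)95:0.1,C:0.6);"
    print(tree_length(tree_a), tree_length(tree_b))
    print(length_ratio(tree_a, tree_b))
```

A per-branch comparison would need the branches of the two trees to be matched by their bipartitions, which this sketch does not attempt.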

Alexis

Alexandros Stamatakis

Mar 17, 2016, 11:50:38 AM

to ra...@googlegroups.com

So this issue is closed: it was apparently due to a bug in an older
version of PhyML that Maire was using.

alexis
