Three questions regarding the interpretation and meaning of branch lengths


Christian A

Jan 4, 2012, 5:50:26 PM
to raxml
Hi there,

In the past, I used Bayesian methods for tree inference, and I am now,
for the first time, using RAxML. I would appreciate it if anyone could
answer the following questions for me:

1) Does RAxML, similar to most other ML and Bayesian tree inference
programs, also provide branch lengths as expected substitutions per
site? For example, consider an alignment of length 100 for which RAxML
estimates a particular branch length to be 0.1. If I wanted to know
the total number of substitutions on that branch, it should be a valid
procedure to multiply 0.1 by 100, shouldn't it?

2) For a particular sequence, how are sites with a gap or missing
information handled? How do they influence the branch lengths? For
example, if 50 out of the 100 sites are missing information for a
particular sequence, will the missing sites be assumed to be identical
to the ancestral state, or how exactly does it work? I am asking
because in my case, the outgroup sequence has a 70-site gap at the
end of the alignment (and none of the other sequences have any gaps),
and I want to understand how that may influence the branch lengths.

3) I read in this forum that it may be better to not specify an
outgroup; instead, the trees can be rooted using a common tree drawer
program such as FigTree. Will that decision have any influence on the
branch lengths, in particular for the outgroup sequence (I read a post
where someone said this is or was a known bug)?


Thanks for your time; your help will be greatly appreciated!

Best
Christian

Alexis

Jan 5, 2012, 3:49:11 AM
to raxml
> In the past, I used Bayesian methods for tree inference, and I am now,
> for the first time, using RAxML. I would appreciate it if anyone could
> answer the following questions for me:
>
> 1) Does RAxML, similar to most other ML and Bayesian tree inference
> programs, also provide branch lengths as expected substitutions per
> site?

As the mean number of substitutions per site.

> For example, consider an alignment of length 100 for which RAxML
> estimates a particular branch length to be 0.1. If I wanted to know
> the total number of substitutions on that branch, it should be a valid
> procedure to multiply 0.1 by 100, shouldn't it?

Yes, that seems okay.
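The arithmetic from the question, written out as a tiny helper. This is only a sketch: the function name is hypothetical, and the result is an expected count under the fitted model, not an observed number of changes.

```python
def expected_substitutions(branch_length: float, n_sites: int) -> float:
    """Expected number of substitutions on one branch across the whole alignment.

    branch_length is in expected substitutions per site (as reported by RAxML),
    so multiplying by the number of alignment columns gives a per-alignment count.
    """
    return branch_length * n_sites


# Example from the question: a branch length of 0.1 on a 100-site alignment.
total = expected_substitutions(0.1, 100)
print(f"{total:.1f}")  # 10.0
```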

> 2) For a particular sequence, how are sites with a gap or missing
> information handled?

This has already been answered in a previous thread: all gaps and
missing data are treated as undetermined characters.
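To make "undetermined character" concrete: likelihood programs commonly represent each tip character as a vector with one entry per state, where 1 means "this state is compatible with the observation". An undetermined character sets every entry to 1, so a gap is neither a fifth state nor forced to equal the ancestral state. The sketch below assumes DNA in the state order A, C, G, T and treats `-`, `?`, `N` and `X` as undetermined; real programs also accept IUPAC ambiguity codes, which are omitted here.

```python
STATES = "ACGT"
# Characters treated as fully undetermined in this sketch (an assumption).
UNDETERMINED = {"-", "?", "N", "X"}


def tip_vector(char: str) -> list[int]:
    """Per-state indicator vector for one alignment character at a tip."""
    c = char.upper()
    if c in UNDETERMINED:
        return [1] * len(STATES)  # every state is compatible with "no data"
    if c in STATES:
        return [1 if s == c else 0 for s in STATES]
    raise ValueError(f"unsupported character: {char!r}")


for ch in "A-?":
    print(ch, tip_vector(ch))
# A [1, 0, 0, 0]
# - [1, 1, 1, 1]
# ? [1, 1, 1, 1]
```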

> How do they influence the branch lengths? For
> example, if 50 out of the 100 sites are missing information for a
> particular sequence, will the missing sites be assumed to be identical
> to the ancestral state, or how exactly does it work? I am asking
> because in my case, the outgroup sequence has a 70-site gap at the
> end of the alignment (and none of the other sequences have any gaps),
> and I want to understand how that may influence the branch lengths.

This will mainly increase the branch length of the outgroup, since the
missing part, which is treated as undetermined characters, will appear
to be very distant from everything else.

> 3) I read in this forum that it may be better to not specify an
> outgroup; instead, the trees can be rooted using a common tree drawer
> program such as FigTree. Will that decision have any influence on the
> branch lengths, in particular for the outgroup sequence (I read a post
> where someone said this is or was a known bug)?

There was a bug, all right, but in general and in principle, specifying
an outgroup is just a drawing option; hence, whether or not you specify
an outgroup has no influence on the computations or the model per se.

Alexis

Christian A

Jan 5, 2012, 4:05:33 AM
to raxml
Hi Alexis,

Thanks for the quick and helpful answer!

Two quick follow-up questions:

1) So the branch lengths will be increased due to the missing data in
the outgroup. Hmm, that is indeed a problem. Do you see any
workaround for this (other than deleting the last 70 sites of the
alignment altogether) that would still give a somewhat reliable estimate
of the total number of substitutions near the root of the tree, where I
think the branch lengths are most affected by this selective
missing-character bias?

2) In which version was the outgroup bug fixed? I didn't find any
version history file documenting the changes. Currently, the cluster
system where I run RAxML uses version 7.0.4, and I am not sure whether
there have been any important bug fixes or changes since then (i.e.,
up to version 7.2.8).


Thanks in advance, I appreciate your help! One more satisfied RAxML
user :-)

Christian

Alexis

Jan 5, 2012, 4:45:34 AM
to raxml


On 5 Jan., 11:05, Christian A <chrarnol...@googlemail.com> wrote:
> Hi Alexis,
>
> thanks for the quick and helpful answer!
>
> Two quick follow-up questions:
>
> 1) So the branch lengths will be increased due to the missing data in
> the outgroup. Hmm, that is indeed a problem. Do you see any
> workaround for this (other than deleting the last 70 sites of the
> alignment altogether) that would still give a somewhat reliable estimate
> of the total number of substitutions near the root of the tree, where I
> think the branch lengths are most affected by this selective
> missing-character bias?

Well, if the data are not available, they are just not available, so you
can't estimate anything reasonable for that part anyway. You may try to
estimate branch lengths on a fixed tree (e.g., the ML tree) using a
pruned alignment from which you have removed the region where the
outgroup has missing data, and then extrapolate from there.
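A minimal sketch of the pruning step: drop every column in which the outgroup has missing data, keeping all other taxa aligned. The alignment is held in a plain dict (reading and writing FASTA/PHYLIP is left out), and the taxon names and the set of "missing" characters are assumptions for illustration. Re-estimating branch lengths on a fixed tree is, in RAxML 8.x, typically done with `-f e` together with `-t <tree file>`; check `raxmlHPC -h` for the version you actually run, since this thread discusses 7.0.4 and 7.2.8.

```python
# Characters counted as missing in the outgroup (an assumption for this sketch).
MISSING = set("-?NX")


def prune_outgroup_missing(alignment: dict[str, str], outgroup: str) -> dict[str, str]:
    """Return a copy of the alignment without the columns where the outgroup is missing."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    keep = [i for i, c in enumerate(alignment[outgroup]) if c.upper() not in MISSING]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Hypothetical toy alignment: the outgroup lacks the last three columns.
aln = {
    "taxonA": "ACGTACGT",
    "taxonB": "ACGTTCGA",
    "outgroup": "ACGTA---",
}
for name, seq in prune_outgroup_missing(aln, "outgroup").items():
    print(name, seq)
# taxonA ACGTA
# taxonB ACGTT
# outgroup ACGTA
```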

> 2) In which version was the outgroup bug fixed? I didn't find any
> version history file documenting the changes. Currently, the cluster
> system where I run RAxML uses version 7.0.4, and I am not sure whether
> there have been any important bug fixes or changes since then (i.e.,
> up to version 7.2.8).

Yes, there have been many important bug fixes and performance
improvements, in particular the introduction of SSE3 vector
instructions. You should definitely use the latest Git version of RAxML
or ask your sysadmins to install it.

> Thanks in advance, I appreciate your help! One more satisfied RAxML
> user :-)

:-)

Alexis

Yiyuan Li

Apr 11, 2017, 3:26:04 PM
to raxml
Hi Alexis and all,
I also have a question about the meaning of branch lengths. I found this old post a long time ago, and it has been really helpful to me.

My issue is that I made a phylogenetic tree based on amino acid sequences, and the branch length from sppA to sppB is 2.19. Does that mean that between sppA and sppB the average number of expected substitutions per amino acid site is 2.19? That seems like a really high rate to me. Or does that value mean 2.19 substitutions per 100 amino acid sites?

Thank you so much for any advice.
 
YY

Alexandros Stamatakis

Apr 11, 2017, 3:59:58 PM
to ra...@googlegroups.com


On 11.04.2017 22:26, Yiyuan Li wrote:
> Hi Alexis and all
> I also have a question about the meaning of branch lengths. I found
> this old post a long time ago, and it has been really helpful to me.
>
> My issue is that I made a phylogenetic tree based on amino acid sequences,
> and the branch length from sppA to sppB is 2.19. Does that mean that
> between sppA and sppB the average number of expected substitutions per
> amino acid site is 2.19? That seems like a really high rate to me.

Yes, it does.

> Or does that value mean 2.19
> substitutions per 100 amino acid sites?

It's per site. Presumably this branch is very long; maybe you'll get
shorter branches if you use a different protein substitution model. Did
you test all available models?

alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson

www.exelixis-lab.org

Yiyuan Li

Apr 12, 2017, 11:14:32 AM
to raxml
Hi Alexis
My data consist of 13 concatenated mitochondrial genes from 32 holometabolous insect species from four orders. I'm using the RAxML branch lengths to test the amino acid substitution rates of these genes. To compare substitution rates between species, I calculate the branch length from each species to the outgroup (pea aphid). I used PROTGAMMAAUTO, which picked MTART as the best model. The median branch length for the Hymenoptera species is 2.12. I also tried the PROTGAMMAILGX model, which gave me similar and sometimes longer branch lengths.
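On an unrooted tree, the "branch length from a species to the outgroup" is the path length: the sum of all branch lengths between the two tips. A minimal sketch of that calculation on a Newick string is shown below. The tiny parser is hypothetical (not part of RAxML), handles labels, support values and branch lengths but not quoted names or comments, and assumes every tip is named; the tree and taxon names are made up, since the attached tree file is not reproduced here.

```python
import re
from statistics import median


def parse_newick(text):
    """Minimal Newick parser returning (parent, length, name) dicts keyed by node id."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", text)
    parent, length, name = {}, {}, {}
    stack, last, want_length, next_id = [], None, False, 0
    for tok in tokens:
        if tok == "(":                   # open a new internal node
            node, next_id = next_id, next_id + 1
            if stack:
                parent[node] = stack[-1]
            stack.append(node)
            last = None
        elif tok == ",":
            last = None
        elif tok == ")":                 # internal node is complete
            last = stack.pop()
        elif tok == ":":
            want_length = True
        elif tok == ";":
            break
        elif want_length:                # branch length of the node just read
            length[last] = float(tok)
            want_length = False
        elif last is None:               # a tip label
            node, next_id = next_id, next_id + 1
            parent[node], name[node], last = stack[-1], tok, node
        else:                            # label of a closed internal node (e.g. support)
            name[last] = tok
    return parent, length, name


def path_length(a, b, parent, length):
    """Sum of branch lengths on the path between nodes a and b."""
    up, node, dist = {}, a, 0.0
    while True:                          # record every ancestor of a with its distance
        up[node] = dist
        if node not in parent:
            break
        dist += length.get(node, 0.0)
        node = parent[node]
    node, dist = b, 0.0
    while node not in up:                # climb from b until reaching a's lineage
        dist += length.get(node, 0.0)
        node = parent[node]
    return dist + up[node]


def tip_ids(parent, name):
    internal = set(parent.values())
    return {name[n]: n for n in name if n not in internal}


# Hypothetical unrooted tree with a basal trifurcation, as RAxML writes them.
tree = "((hym_1:0.1,hym_2:0.2):0.3,(dip_1:0.4,dip_2:0.5):0.6,outgroup:1.0);"
parent, length, name = parse_newick(tree)
tips = tip_ids(parent, name)
to_out = {t: path_length(n, tips["outgroup"], parent, length)
          for t, n in tips.items() if t != "outgroup"}
for tip, dist in to_out.items():
    print(tip, f"{dist:.2f}")
print(f"median (hym): {median([to_out['hym_1'], to_out['hym_2']]):.3f}")
# hym_1 1.40
# hym_2 1.50
# dip_1 2.00
# dip_2 2.10
# median (hym): 1.450
```

Because rerooting only changes how the tree is drawn, these tip-to-tip sums are the same whether or not an outgroup was given with `-o`.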

Are there other models that I should try? 

Thank you

YY

Yiyuan Li

Apr 12, 2017, 11:20:25 AM
to raxml
I have attached the best tree I got from RAxML in case it helps. I used the following command:

raxmlHPC-PTHREADS -f a -m PROTGAMMAAUTO -p 12345 -x 12345 -\# 100 -s Concatenated.aa.fas -n MT_R1 -o Acyrthosiphon_pisum -T 8

YY
MT.170411.nwk.txt

Alexandros Stamatakis

Apr 13, 2017, 12:36:05 AM
to ra...@googlegroups.com


On 12.04.2017 18:14, Yiyuan Li wrote:
> Hi Alexis
> My data consist of 13 concatenated mitochondrial genes from 32 holometabolous
> insect species from four orders. I'm using the RAxML branch lengths
> to test the amino acid substitution rates of these genes. To compare
> substitution rates between species, I calculate the branch length from
> each species to the outgroup (pea aphid). I used PROTGAMMAAUTO,
> which picked MTART as the best model. The median branch length
> for the Hymenoptera species is 2.12. I also tried the PROTGAMMAILGX model,
> which gave me similar and sometimes longer branch lengths.
>
> Are there other models that I should try?

No; apparently you are already selecting among several substitution
models. Do you have a lot of missing data in your alignment? If so, see
here for a potential solution:
https://academic.oup.com/bioinformatics/article/32/9/1331/1744346/Prediction-of-missing-sequences-and-branch-lengths

alexis

Alexandros Stamatakis

Apr 13, 2017, 12:36:42 AM
to ra...@googlegroups.com


On 12.04.2017 18:20, Yiyuan Li wrote:
> I have attached the best tree I got from RAxML in case it helps. I
> used the following command:
>
> raxmlHPC-PTHREADS -f a -m PROTGAMMAAUTO -p 12345 -x 12345 -\# 100 -s
> Concatenated.aa.fas -n MT_R1 -o Acyrthosiphon_pisum -T 8


The command line looks okay.

alexis

Yiyuan Li

Apr 13, 2017, 12:51:41 AM
to raxml
Hi Alexis
Thank you for the reply. I just checked the proportion of gaps in the alignment: it varies from 0.3% to 40%, with an average of 12%. I'll check the paper you shared with me and see whether that method improves the long-branch issue.
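A minimal sketch of how such gap proportions can be computed per sequence. The post does not say whether the 0.3% to 40% range is per sequence or per site; this sketch assumes per sequence. Treating `-`, `?` and `X` as gaps or missing data, and the toy protein alignment, are assumptions for illustration.

```python
from statistics import mean

# Characters counted as gaps/missing in protein data (an assumption for this sketch).
GAP_CHARS = set("-?X")


def gap_proportions(alignment: dict[str, str]) -> dict[str, float]:
    """Fraction of gap/missing characters in each aligned sequence."""
    return {name: sum(c.upper() in GAP_CHARS for c in seq) / len(seq)
            for name, seq in alignment.items()}


# Hypothetical toy alignment.
aln = {"sp1": "MKV-LLA-", "sp2": "MKVTLLAQ", "sp3": "M---LL??"}
props = gap_proportions(aln)
print(f"min {min(props.values()):.1%}, max {max(props.values()):.1%}, "
      f"mean {mean(props.values()):.1%}")
# min 0.0%, max 62.5%, mean 29.2%
```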

Thank you

YY