Estimating branch lengths from specific substitution type

Adrian Baez-Ortega


Dec 8, 2017, 1:33:00 PM

to raxml

Hello,

I'd like to briefly ask a question about the possibility of estimating branch lengths from a specific substitution type.

My tree is made from CDS alignments containing the somatic SNVs found in ~500 whole-exome samples from a single contagious cancer lineage – slightly unusual.

Because the mutations are somatic, they accumulated largely independently of biological time, so the branch lengths in my tree are wrong, even if the topology is right. To solve this, I should re-estimate the branch lengths using a specific type of mutation that is known to be caused by a clock-like process: C>T transitions at NpCpG trinucleotides. I reckon that I may be able to do this by giving RAxML my tree and a modified alignment containing only mutations of this type.
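A minimal sketch of the filtering step described above. It assumes (hypothetically; the post does not say how the SNVs are stored) that each somatic SNV is available as a 0-based position in a reference sequence plus its reference and alternative bases, and it treats G>A calls as C>T on the opposite strand, a common strand-collapsing convention rather than anything stated in the thread.

```python
from typing import NamedTuple


class Snv(NamedTuple):
    pos: int  # 0-based position in the reference (hypothetical input layout)
    ref: str  # reference base
    alt: str  # alternative (somatic) base


def is_ct_at_cpg(reference: str, snv: Snv) -> bool:
    """True for C>T with a 3' G, or G>A with a 5' C (same CpG, other strand)."""
    ref, alt, pos = snv.ref.upper(), snv.alt.upper(), snv.pos
    if ref == "C" and alt == "T":
        # C followed by G on the same strand: NpCpG context
        return pos + 1 < len(reference) and reference[pos + 1].upper() == "G"
    if ref == "G" and alt == "A":
        # G preceded by C: the complementary view of the same CpG
        return pos - 1 >= 0 and reference[pos - 1].upper() == "C"
    return False


def filter_clock_like(reference: str, snvs: list[Snv]) -> list[Snv]:
    """Keep only the SNVs assumed to come from the clock-like process."""
    return [s for s in snvs if is_ct_at_cpg(reference, s)]
```

For example, with the reference `"ACGTTCGA"`, the calls `Snv(1, "C", "T")` and `Snv(6, "G", "A")` are kept, while `Snv(3, "T", "C")` and `Snv(5, "C", "A")` are dropped.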

However, this introduces some problems. First, since we are excluding a huge number of mutations from the alignment, the estimate of the proportion of invariant sites will be way off, but I'm not sure how this affects branch length estimation. Second, since we only have C>T changes, it's not possible to estimate the GTR parameters.

I think there would be two solutions to the second problem. One would be to provide RAxML with the GTR parameter estimates from when I originally inferred the tree. However, I don't think I have them. Another way would be to choose a model which doesn't distinguish between mutation types, like Jukes-Cantor (JC).
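To illustrate why Jukes-Cantor sidesteps the GTR problem: JC69 has a single substitution rate, so apart from the tree itself the only quantity it estimates is branch length (expected substitutions per site). The sketch below shows the standard pairwise JC69 relationship between the proportion p of differing sites and the distance d; it is only an illustration of the model, not what RAxML computes when it optimizes branch lengths on a whole tree.

```python
import math


def jc69_p_distance(d: float) -> float:
    """Expected proportion of differing sites after d substitutions/site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """JC69 distance (substitutions/site) from an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The two functions are inverses of each other on the valid range, and neither takes any rate or base-frequency parameter.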

In addition, it would be desirable to enforce the condition that all the tips of the tree should be roughly on the same level, since the samples were collected over a period of a few years, while the tree goes back some thousands of years. This is not what we are seeing now (since the lengths are wrong), but I wonder whether the approach described above will be enough to achieve this, or whether there is some way to constrain this manually in RAxML.
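Whether the tips end up "roughly on the same level" can at least be checked after the fact by comparing root-to-tip distances in the re-estimated tree. Below is a minimal sketch with a small hand-written Newick parser; it assumes plain Newick with unquoted tip labels and no bracketed comments, skips internal-node labels such as support values, and the function names are hypothetical. Note that RAxML trees are unrooted, so the distances depend on where the tree happens to be rooted in the file.

```python
import re


def root_to_tip_distances(newick: str) -> dict[str, float]:
    """Summed branch length from the root to each tip (missing lengths count as 0)."""
    text = "".join(newick.split())  # drop all whitespace
    tokens = re.findall(r"[(),;]|:[^(),;]+|[^(),;:]+", text)
    punctuation = {"(", ")", ",", ";"}
    pos = 0

    def parse_subtree() -> tuple[dict[str, float], float]:
        # Returns ({tip: distance below this node}, length of the branch above it).
        nonlocal pos
        tips: dict[str, float] = {}
        if tokens[pos] == "(":
            pos += 1  # consume "("
            while True:
                child_tips, child_length = parse_subtree()
                for name, dist in child_tips.items():
                    tips[name] = dist + child_length
                if tokens[pos] == ",":
                    pos += 1  # move on to the next sibling
                    continue
                pos += 1  # consume ")"
                break
            if pos < len(tokens) and tokens[pos] not in punctuation and not tokens[pos].startswith(":"):
                pos += 1  # skip an internal-node label (e.g. a support value)
        else:
            tips[tokens[pos]] = 0.0  # tip label
            pos += 1
        length = 0.0
        if pos < len(tokens) and tokens[pos].startswith(":"):
            length = float(tokens[pos][1:])
            pos += 1
        return tips, length

    tips, _root_length = parse_subtree()
    return tips


def relative_tip_spread(distances: dict[str, float]) -> float:
    """(longest - shortest) root-to-tip distance, relative to the longest."""
    longest = max(distances.values())
    return (longest - min(distances.values())) / longest
```

For `"((A:1,B:2):0.5,C:3);"` this gives A = 1.5, B = 2.5, C = 3.0 and a relative spread of 0.5; a perfectly ultrametric tree gives 0.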

This said, my questions are:
Is it possible at all to do this kind of length estimation on RAxML under these conditions?
Is it enough running "-f e" with a JC model, the original tree and the modified alignment?
Is it possible to enforce that all the tips should be (approximately) aligned in the tree?
What do you think of the problems introduced by the approach? Can you think of other obvious downsides?

Thank you very much in advance! :-)

Best,
Adrian

Alexandros Stamatakis


Dec 9, 2017, 8:25:59 AM

to ra...@googlegroups.com

Dear Adrian,

First of all, we are working on dedicated cancer cell evolution models
that should become available in RAxML-NG sometime in the first half of
2018, I hope.

> I'd like to briefly ask a question about the possibility of estimating
> branch lengths from a specific substitution type.
>
> My tree is made from CDS alignments containing the somatic SNVs found in
> ~500 whole-exome samples from a single contagious cancer lineage –
> slightly unusual.
>
> Because the mutations are somatic, they accumulated largely
> independently of biological time, so the branch lengths in my tree are
> wrong, even if the topology is right. To solve this, I should re-estimate
> the branch lengths using a specific type of mutation that is known to be
> caused by a clock-like process: C>T transitions at NpCpG
> trinucleotides. I reckon that I may be able to do this by giving RAxML
> my tree and a modified alignment containing only mutations of this type.
>
> However, this introduces some problems. First, since we are excluding a
> huge number of mutations from the alignment, the estimation of invariant
> sites proportion will be way off, but I'm not sure how this affects
> branch length estimation. Second, since we only have C>T changes, it's
> not possible to estimate the GTR parameters.

You could use a binary data matrix and correct for all the sites you
removed (that do affect branch lengths) via ascertainment bias
correction (see here: https://www.ncbi.nlm.nih.gov/pubmed/26227865).
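A minimal sketch of building such a binary matrix from the filtered alignment. It assumes (hypothetically) that the alignment is a dict of equal-length sequences restricted to the selected columns, with a matching reference string: each base is recoded as 0 (reference state), 1 (any other nucleotide) or ? (gap, N, ambiguity). Columns that are not variable among the observed 0/1 states are dropped, on the assumption that the ascertainment-bias correction expects the matrix to contain variable sites only. The output is relaxed PHYLIP.

```python
def to_binary_matrix(alignment: dict[str, str], reference: str) -> dict[str, str]:
    """Recode bases as 0/1/? relative to the reference and keep only variable columns."""
    nucleotides = {"A", "C", "G", "T"}

    def encode(base: str, ref: str) -> str:
        base = base.upper()
        if base not in nucleotides:  # gaps, N and ambiguity codes become missing
            return "?"
        return "0" if base == ref.upper() else "1"

    recoded = {name: "".join(encode(b, r) for b, r in zip(seq, reference))
               for name, seq in alignment.items()}
    # keep a column only if both states 0 and 1 are observed in it
    keep = [i for i in range(len(reference))
            if {seq[i] for seq in recoded.values()} >= {"0", "1"}]
    return {name: "".join(seq[i] for i in keep) for name, seq in recoded.items()}


def to_phylip(matrix: dict[str, str]) -> str:
    """Relaxed PHYLIP text: a header line, then 'name sequence' per taxon."""
    n_sites = len(next(iter(matrix.values())))
    rows = [f"{len(matrix)} {n_sites}"] + [f"{name} {seq}" for name, seq in matrix.items()]
    return "\n".join(rows) + "\n"
```

For example, `{"s1": "CGA", "s2": "TGA", "s3": "CAN"}` with reference `"CGA"` becomes `s1 00`, `s2 10`, `s3 01`; the third column is dropped because no sample carries the derived state there.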

> I think there would be two solutions to the second problem. One would be
> to provide RAxML with the GTR parameter estimates from when I originally
> inferred the tree. However, I don't think I have them.

RAxML does print them to the info file, or you can just re-estimate them
with the -f e option.

> Another way would
> be to choose a model which doesn't distinguish between mutation types,
> like Jukes-Cantor (JC).

Yes, but as you are only interested in C <-> T you might as well use a
binary data matrix.

> In addition, it would be desirable to enforce the condition that all the
> tips of the tree should be roughly on the same level, since the samples
> were collected over a period of a few years, while the tree goes back
> some thousands of years. This is not what we are seeing now (since the
> lengths are wrong), but I wonder if the approach described above will be
> enough to achieve this, or if there is some way to manually constrain
> this in RAxML.

No, this is done by divergence time estimation programs or tools that
take into account some sort of molecular clock; with the mathematical
model that RAxML implements, it is not possible to achieve this.

> This said, my questions are:
> Is it possible at all to do this kind of length estimation on RAxML
> under these conditions?

It's difficult, I think.

> Is it enough running "-f e" with a JC model, the original tree and the
> modified alignment?

Yes, if you think the tree topology is correct, that's sufficient.
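A sketch of the corresponding RAxML 8 command lines, built as an argument list without running anything. The flag names (`-f e`, `-t`, `-s`, `-n`, `-m GTRGAMMA` with `--JC69`, and `-m ASC_BINGAMMA` with `--asc-corr=lewis` for the binary variant) are given as we recall them from the RAxML 8.2 manual and should be checked against `raxmlHPC -h`; the executable name and file names are hypothetical.

```python
import shlex


def raxml_fe_command(tree: str, alignment: str, run_name: str,
                     binary: bool = False, executable: str = "raxmlHPC") -> list[str]:
    """Build (but do not run) a '-f e' call: optimize branch lengths on a fixed tree."""
    cmd = [executable, "-f", "e", "-t", tree, "-s", alignment, "-n", run_name]
    if binary:
        # 0/1 matrix with Lewis-type ascertainment-bias correction (assumed flags)
        cmd += ["-m", "ASC_BINGAMMA", "--asc-corr=lewis"]
    else:
        # DNA matrix; --JC69 replaces the GTR rates with Jukes-Cantor (assumed flag)
        cmd += ["-m", "GTRGAMMA", "--JC69"]
    return cmd


if __name__ == "__main__":
    # prints: raxmlHPC -f e -t original.tre -s ct_cpg.phy -n ct_cpg_jc -m GTRGAMMA --JC69
    print(shlex.join(raxml_fe_command("original.tre", "ct_cpg.phy", "ct_cpg_jc")))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` later without shell quoting issues.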

> Is it possible to enforce that all the tips should be (approximately)
> aligned in the tree?

No.

> What do you think of the problems introduced by the approach? Can you
> think of other obvious downsides?

See some of my answers above, hope that helps a bit.

Alexis


--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

Adrian Baez-Ortega


Dec 9, 2017, 9:45:03 AM

to raxml

Dear Alexis,

Thanks so much for your answers!

So the simplest way to go would be running "-f e" with a JC model, the original tree and the modified alignment (or a binary matrix with asc. bias correction), but it sounds like it may not be something we should use the models in RAxML for. I will consider whether to try this or to switch to BEAST, because we do want to estimate divergence times and confidence/credible intervals. I'll let you know if I have more questions about this.

Thanks again for your time.

Best,
Adrian

Alexandros Stamatakis


Dec 9, 2017, 2:04:57 PM

to ra...@googlegroups.com

Dear Adrian,

> Thanks so much for your answers!

:-)

> So the simplest way to go would be running "-f e" with a JC model, the
> original tree and the modified alignment (or a binary matrix with asc.
> bias correction),

Yes, but depending on how you set up the JC model matrix, you should also
use asc. bias correction.
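Why the correction matters: once the invariant columns are removed, an uncorrected likelihood still assumes constant sites could have been observed, and this is known to inflate branch lengths. The Lewis-type correction conditions each site's likelihood on the site being variable. A minimal sketch of that arithmetic, assuming the per-site log-likelihoods and the model probability of a constant site (for binary data, P(all 0) + P(all 1)) are already available; the function name is hypothetical.

```python
import math


def lewis_corrected_log_likelihood(site_log_likelihoods: list[float], p_constant: float) -> float:
    """Log-likelihood conditioned on every site being variable (Lewis-type correction)."""
    if not 0.0 <= p_constant < 1.0:
        raise ValueError("p_constant must be in [0, 1)")
    n_sites = len(site_log_likelihoods)
    # per site: log(L_i / (1 - p_constant)) = log L_i - log(1 - p_constant)
    return sum(site_log_likelihoods) - n_sites * math.log1p(-p_constant)
```

For two sites with likelihoods 0.1 and 0.2 and p_constant = 0.5, the corrected log-likelihood is log(0.02 / 0.25) = log(0.08). The linked paper also discusses corrections that use the number of removed invariant sites instead, which is closer to "correcting for all the sites you removed".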

> but it sounds like it may not be something we should
> use the models in RAxML for.

Not really.

> I will consider whether to try this or to
> switch to BEAST,

That would make sense; you would nonetheless have the RAxML tree to
cross-check whether the underlying tree topologies are similar.

> because we do want to estimate divergence times and
> confidence/credible intervals. I'll let you know if I have more
> questions about this.
>
> Thanks again for your time.

:-)

Alexis

