raxml/phylogeny with genotype likelihoods

benjamin v

Nov 28, 2016, 9:16:19 AM
to raxml
Hello all,

I'm trying to find a way to estimate a phylogeny given genotype likelihoods, rather than fixed base calls at the tips.  For example, at a site with 2 reads supporting A and 5 reads supporting G, you're less sure of the state than at a site with 10 reads supporting each.  I see that you can use RAxML with SNP data, but it doesn't seem that you could directly incorporate the genotype likelihoods, although maybe that could be done by modifying the code to set likelihoods at the tips?  The same goes for other pipelines, like SNPhylo.  Maybe I'm missing something; does anyone have advice?
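A minimal sketch of what "less sure" means in the example above. It assumes a simple error model that is not taken from RAxML or from any particular genotype caller: at a biallelic A/G site, each read comes from either chromosome copy with probability 1/2 and shows the other allele with error rate `eps`. The function name `genotype_likelihoods` is hypothetical.

```python
def genotype_likelihoods(n_a, n_g, eps=0.01):
    """Normalized likelihoods of the diploid genotypes AA, AG and GG at a
    biallelic A/G site, given n_a reads showing A and n_g reads showing G.

    Assumed model (illustrative only): each read comes from either
    chromosome copy with probability 1/2 and shows the other allele with
    error rate eps.
    """
    def read_prob(observed, genotype):
        # P(one read shows `observed` | genotype), averaged over the two copies
        return sum(0.5 * ((1 - eps) if allele == observed else eps)
                   for allele in genotype)

    liks = {gt: read_prob("A", gt) ** n_a * read_prob("G", gt) ** n_g
            for gt in ("AA", "AG", "GG")}
    total = sum(liks.values())
    # Dividing by the total gives posteriors under a flat genotype prior
    return {gt: lik / total for gt, lik in liks.items()}


uncertain = genotype_likelihoods(2, 5)    # 2 A reads, 5 G reads
confident = genotype_likelihoods(10, 10)  # 10 A reads, 10 G reads
print(uncertain)
print(confident)
```

With `eps=0.01`, the 2-versus-5 site leaves roughly 1% of the normalized likelihood on GG, while the 10-versus-10 site puts essentially all of it on AG. A hard genotype call would treat both sites as equally certain heterozygotes.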

Thank you,
Benjamin Vernot

Alexey Kozlov

Nov 28, 2016, 9:42:24 AM
to ra...@googlegroups.com
Hi Benjamin,

we are working on this right now, and this option will be available in the new RAxML, which will hopefully be released
in December (or early 2017). It will accept genotype likelihoods in VCF or in a simpler home-brew text format.

If you have any specific suggestions regarding input format or how you would model the likelihoods (e.g. given the read
counts), they are highly appreciated!

Best,
Alexey

benjamin v

Nov 28, 2016, 10:15:53 AM
to raxml
> we are working on this right now, and this option will be available in the new RAxML, which will be hopefully released
> in December (or early 2017). It will accept genotype likelihoods in VCF or in a simpler home-brew text format.

This is a feature I would send chocolate for!

> If you have any specific suggestions regarding input format or how you would model the likelihoods (e.g. given the read
> counts), they are highly appreciated!

For me, it would be great if you could input something other than phred-scaled likelihoods (PL), since I want to take into account the differences between the sites.  That said, I haven't thought about it much, so maybe PL would work.  I'm doing my own simple likelihood calculations, so I could really put them into any format.  What are you currently thinking?  I would be happy to test out code for you.
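For comparison, PL values can be turned back into linear likelihoods. PL stores -10 log10 of each genotype likelihood, rounded to an integer and usually rescaled so that the most likely genotype gets PL 0, so the values are relative within a site. For a biallelic site the three values are in 0/0, 0/1, 1/1 order. A minimal sketch; the function name `pl_to_likelihoods` is hypothetical:

```python
def pl_to_likelihoods(pl):
    """Convert phred-scaled genotype likelihoods (VCF PL) to linear
    likelihoods normalized to sum to 1 within the site."""
    # Undo the phred scaling: PL = -10 * log10(L)  =>  L = 10 ** (-PL / 10)
    raw = [10 ** (-p / 10) for p in pl]
    total = sum(raw)
    return [x / total for x in raw]


# Biallelic site, genotypes in 0/0, 0/1, 1/1 order
print(pl_to_likelihoods([0, 30, 300]))
```

Because PL is rounded to integers, likelihood differences smaller than about one phred unit are lost.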

Thanks,
Benjamin

Alexey Kozlov

Nov 28, 2016, 11:08:03 AM
to ra...@googlegroups.com
Hi Benjamin,

> we are working on this right now, and this option will be available in the new RAxML, which will be hopefully released
> in December (or early 2017). It will accept genotype likelihoods in VCF or in a simpler home-brew text format.

> This is a feature I would send chocolate for!

thanks, our postal address is easy to find :)

> For me, it would be great if you could input something other than phred-scaled likelihoods (PL), since I want to take
> into account the differences between the sites.

What kind of difference do you mean? If it's a difference in evolutionary rates, we can use the standard GAMMA/CAT models to
account for this.

>Although I haven't thought about it a lot, so maybe PL would work. I'm
> doing my own simple likelihood calculations, so I could really put it into any format. What are you currently thinking?

That's definitely possible; we currently use a simple format like this:

4 11
Sample1 Sample2 Sample3 Sample4
YCCC 10,0,10,0 20,0,0,0 20,0,0,0 20,0,0,0
GGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
CCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
AAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0
TTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
CCCC 20,0,0,0 20,0,0,0 20,0,0,0 20,0,0,0
TTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
TTTT 0,0,20,0 0,0,20,0 0,0,20,0 0,0,20,0
GGGG 0,0,0,20 0,0,0,20 0,0,0,20 0,0,0,20
AAAA 0,20,0,0 0,20,0,0 0,20,0,0 0,20,0,0

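(this is an example for DNA data with 4 states and integer frequencies/counts; the first line gives the number of samples and
sites, and each sample's values are listed in C,A,T,G order)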
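A minimal sketch of reading a table laid out like the example above into per-sample probability vectors. The layout is inferred from this one example only: a header with the sample and site counts, a line of sample names, then per site a consensus-looking string followed by one C,A,T,G vector per sample. For the authoritative definition see the CATG description on the raxml-ng wiki. Normalizing counts to frequencies, and mapping an all-zero vector to a uniform one, are assumptions of this sketch, and `parse_counts_table` is a hypothetical name.

```python
def parse_counts_table(text):
    """Parse a table laid out like the example above into normalized
    per-sample probability vectors (C, A, T, G order).

    Returns a dict mapping each sample name to a list with one
    4-element vector per site.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    n_samples, n_sites = int(rows[0][0]), int(rows[0][1])
    names = rows[1]
    if len(names) != n_samples or len(rows) - 2 != n_sites:
        raise ValueError("header does not match the table size")
    probs = {name: [] for name in names}
    for fields in rows[2:]:
        # fields[0] is the consensus-like string; the rest are per-sample vectors
        if len(fields) != n_samples + 1:
            raise ValueError(f"expected {n_samples + 1} fields, got {len(fields)}")
        for name, cell in zip(names, fields[1:]):
            counts = [float(x) for x in cell.split(",")]
            total = sum(counts)
            # Normalize to frequencies; an all-zero vector becomes uniform (assumption)
            probs[name].append([c / total if total else 1 / len(counts)
                                for c in counts])
    return probs
```

Applied to the first site of the example, `Sample1` gets `[0.5, 0.0, 0.5, 0.0]`, i.e. equal weight on C and T, which matches the `Y` in the first column.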

> I would be happy to test out code for you.

Great, thanks! I will contact you as soon as we have a working pre-release version, presumably in December.

Best,
Alexey

benjamin v

Nov 28, 2016, 2:08:38 PM
to raxml
So those counts are read counts?  It would be great to be able to directly input precomputed likelihoods, since some situations have non-binomial read count models or different error rates for different sequence contexts (e.g., RNA-seq or ancient DNA).

Regarding differences between sites, I simply mean that a site with 5 reads should have more ambiguity, and correspondingly less weight, than a site with 20 reads.  I don't know whether this can be achieved just by weighting the sites, but maybe.

benjamin v

Nov 28, 2016, 3:11:08 PM
to raxml
Another question: do you model heterozygous and homozygous sites at internal nodes?  So you can move from AA to AT, and from AT to TT?

Alexey Kozlov

Nov 28, 2016, 4:45:02 PM
to ra...@googlegroups.com
Hi Benjamin,

> Another question - you model heterozygous and homozygous sites at internal nodes? So you can move from AA to AT, and AT
> to TT?

So far we have been experimenting with an unphased genotype model with 10 states (i.e., AT = TA, etc.), which naturally allows
such transitions.
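A small sketch of where the 10 states come from and which moves change a single allele. It only enumerates the state space; it is not RAxML's rate matrix. Allowing nonzero instantaneous rates only between genotypes that differ by one allele is a common way to build such a model, but whether RAxML's model does exactly this is not stated in the thread, and the helper name `one_allele_apart` is hypothetical.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"

# Unphased diploid genotypes: AT and TA are one state, so 4 homozygous
# plus 6 heterozygous genotypes = 10 states.
GENOTYPES = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]


def one_allele_apart(g1, g2):
    """True if g2 differs from g1 by exactly one allele (e.g. AA -> AT)."""
    left, right = list(g1), list(g2)
    # Cancel alleles shared by both genotypes, counting multiplicity
    for allele in list(left):
        if allele in right:
            left.remove(allele)
            right.remove(allele)
    return len(left) == 1 and len(right) == 1


print(GENOTYPES)
print(one_allele_apart("AA", "AT"), one_allele_apart("AT", "TT"),
      one_allele_apart("AA", "TT"))
```

Under such a single-allele scheme, AA reaches TT only through an intermediate heterozygote such as AT, which is the path asked about above.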

> So those counts are read counts? It would be great to be able to directly input precomputed likelihoods, since some
> situations have non-binomial read count models, or different error rates for different sequence contexts. e.g.,
> RNAseq or ancient DNA.

Sure, it's also possible to specify the likelihoods instead of read counts.

> wrt differences between sites, I simply mean that a site with 5 reads should have more ambiguity, and less
> corresponding weight, than a site with 20 reads. I don't know that this can be achieved just by weighting the
> sites, but maybe.

Well, I'm still not sure what the best way to model it is. One way would be to account for this when computing the
likelihoods, i.e., a site with 5 reads would get lower confidence (a "flatter" likelihood distribution) than a site with
20 reads. Of course, we can also have explicit site weights, but the tricky part is how to compute those weights.
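One way to get the "flatter" distribution described above is to compute per-base probabilities from read counts with an explicit error model, instead of simply normalizing the counts. A minimal sketch, assuming a haploid model in which a read shows the true base with probability 1 - eps and each other base with probability eps/3; both the model and the name `base_probabilities` are illustrative, not RAxML's method. Vectors are in the C,A,T,G order of the example table earlier in the thread, and the computation is done in log space so deep sites do not underflow.

```python
import math


def base_probabilities(counts, eps=0.1):
    """Per-base probabilities for one sample at one site from read counts.

    `counts` and the result are in C, A, T, G order. Assumed haploid model:
    a read shows the true base with probability 1 - eps and each of the
    other three bases with probability eps / 3 (requires 0 < eps < 1).
    """
    log_liks = []
    for true_base in range(4):
        ll = 0.0
        for observed, n in enumerate(counts):
            p = (1 - eps) if observed == true_base else eps / 3
            ll += n * math.log(p)
        log_liks.append(ll)
    # Subtract the maximum before exponentiating (log-sum-exp style rescaling)
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]
    total = sum(weights)
    # Normalizing gives posteriors under a flat prior over the four bases
    return [w / total for w in weights]


low_depth = base_probabilities([0, 4, 0, 1])    # 5 reads: 4 A, 1 G
high_depth = base_probabilities([0, 16, 0, 4])  # 20 reads: 16 A, 4 G
print(low_depth)
print(high_depth)
```

Plain normalization gives A a frequency of 0.8 in both cases (4 of 5 reads, 16 of 20 reads), so it discards read depth. With the error model, the 5-read site keeps more probability mass on the other bases than the 20-read site.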

Best,
Alexey

benjamin v

Nov 28, 2016, 5:01:14 PM
to raxml
I think if the genotype likelihoods could be given directly, that would solve the problem of site weighting, at least in my case.  I don't actually want to downweight some sites; I just want the uncertainty to be taken into account.  I'm excited about this!  Definitely let me know when there's code to test.

Thanks,
Benjamin

Alexey Kozlov

Nov 28, 2016, 5:13:49 PM
to ra...@googlegroups.com
OK, sure!

George Pacheco

Dec 8, 2016, 7:59:29 AM
to raxml
Hello Alexey and Benjamin,

I am also very interested in this feature! Is there any news on it? I would also like to volunteer to test a beta version, if that would be helpful.

Your best, George.

P.S. -- Also keen to post Danish chocolate :)

Alexandros Stamatakis

Dec 8, 2016, 2:47:06 PM
to ra...@googlegroups.com
We hope to release it this year; there's basically just one component
missing.

alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

George Pacheco

Dec 9, 2016, 9:50:50 AM
to raxml
Hej Alexis,

I am looking forward to it -- thanks very much for working on that!!

Your best, George.

George Pacheco

Jan 2, 2017, 9:08:31 AM
to raxml
Hej all, 

First of all, Happy New Year to everybody -- I hope you guys have great celebrations!
Is there any news on this front? Sorry to ask, but I have to work on a tree soon and I am very curious to see how it would turn out with this method.

Many thanks in advance, George.

Alexey Kozlov

Jan 3, 2017, 8:19:04 AM
to ra...@googlegroups.com
Hi George,

sorry for the long silence and best New Year wishes from my side as well!

I will send the experimental code to you and Benjamin in the next few days (there is just one bug that has to be fixed).

Public release still needs some work and will become available a bit later, presumably in February.

Best,
Alexey

George Pacheco

Jan 3, 2017, 10:45:21 AM
to raxml
Hej Alexey, 

That is great news! Please, do send it to me and I can try to run any test that you may wanna see run ;) 

Thanks a lot, George.

AC

Mar 21, 2017, 1:02:14 PM
to raxml

Hi all,

I was just wondering whether there have been any updates on this feature. I would be very interested in using it as well!

Thanks,
AC

Alexandros Stamatakis

Mar 22, 2017, 4:17:41 PM
to ra...@googlegroups.com
The RAxML redesign, including this feature, was released this
Monday; please check the respective post in this group.

Alexis

Alexey Kozlov

Mar 22, 2017, 7:25:34 PM
to ra...@googlegroups.com
Hi Adam,

> the RAxML re-design including this feature has been released this Monday, please check the respective post in this group,

More precisely, with the new RAxML-NG you can read in a file with per-site nucleotide probabilities in CATG format (see the
description above), using a command line like:

raxml-ng --msa test.catg --prob-msa on --model GTR+G

(this feature isn't properly documented yet, sorry about this)
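If you compute the probabilities yourself, a small writer can produce a table in the layout shown earlier in the thread for use with the command above. This is a sketch that assumes the layout inferred from that single example: a header with sample and site counts, a line of sample names, then per site a string of per-sample consensus characters followed by one C,A,T,G vector per sample. Check it against the CATG description on the raxml-ng wiki before relying on it. The IUPAC codes for ties are standard, but using them for the consensus column is an assumption based on the `Y` in the example, and `write_counts_table` is a hypothetical name.

```python
# IUPAC codes for sets of bases, used for the per-sample consensus characters
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
ORDER = "CATG"  # column order inferred from the example table in this thread


def write_counts_table(names, sites):
    """Format per-sample probability vectors as text in the inferred layout.

    `sites` is a list of sites; each site is a list with one C,A,T,G vector
    per sample, in the same order as `names`.
    """
    out = [f"{len(names)} {len(sites)}", " ".join(names)]
    for site in sites:
        consensus = ""
        for vec in site:
            best = max(vec)
            # All bases tied for the highest value form the consensus code
            consensus += IUPAC[frozenset(b for b, v in zip(ORDER, vec) if v == best)]
        # The 'g' format keeps up to 6 significant digits and drops trailing zeros
        cells = [",".join(f"{v:g}" for v in vec) for vec in site]
        out.append(" ".join([consensus] + cells))
    return "\n".join(out) + "\n"


table = write_counts_table(
    ["Sample1", "Sample2"],
    [[[0.5, 0.0, 0.5, 0.0], [0.0, 0.0, 0.0, 1.0]]],  # one site, two samples
)
print(table, end="")
```

Saving the output as `test.catg` would match the file name in the raxml-ng command above.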

RAxML-NG is available here:

https://github.com/amkozlov/raxml-ng/releases/tag/0.1.0

Best,
Alexey

Camille Christe

Feb 11, 2022, 3:55:41 AM
to raxml
Hello,

I would like to use the genotype likelihood to build a tree.
Following this discussion and others, it seems possible to do this with raxml-ng.

From the list of raxml-ng options, either a VCF file or a CATG file could be used.

My questions are:
Does it really work with a VCF file? I could not get raxml-ng to accept this format.
How is the CATG format structured? From another post I see that it can include genotype likelihoods, but I cannot find any other description of this type of file. Is it correct that this option will only work with the latest version of raxml-ng, which has to be built from source?

Thank you very much for this very useful group.

camille

Alexey Kozlov

Feb 15, 2022, 8:34:34 AM
to ra...@googlegroups.com
Hello Camille,

> My questions are:
> Is it really working with a vcf file ? I could not enter this format.

VCF support has not yet been integrated into the master branch of raxml-ng. You can either use cellphy:

https://github.com/amkozlov/cellphy

or compile the respective raxml-ng branch from source:

https://github.com/amkozlov/raxml-ng/tree/cellphy


> How it formatted the CATG format ? From another post I see that it can include the genotype
> likelihood but I cannot find any other description of this type of file.

it was described in my thesis, which is of course not very visible :)

so I just copied the description to the raxml-ng wiki here:

https://github.com/amkozlov/raxml-ng/wiki/Input-data#catg-file-format

>Is it correct that this
> option will only work for the last version of raxml-ng, that should be build from source?

yes, please use the latest version


Best,
Alexey