strange site-likelihood results

38 views
Skip to first unread message

Christopher Desjardins

unread,
Nov 14, 2016, 12:52:39 PM11/14/16
to raxml
Hi,

I'm running into a problem generating site-likelihoods for 2 trees for input into CONSEL, where the likelihoods generated by -f G are very different even though the initial likelihoods when I generated the trees were very similar. To explain more carefully, first I generated an unconstrained tree:

raxmlHPC-PTHREADS-SSE3 -p 12345 -s fastas/CNAG_03465.fa -n CNAG_03465 -m GTRGAMMA

In the resulting RAxML info file it says:

Final GAMMA-based Score of best tree -3469.405072

Then a build a constrained tree like so:

raxmlHPC-PTHREADS-SSE3 -p 12345 -s fastas/CNAG_03465.fa -n CNAG_03465_contraint -m GTRGAMMA -g crypto_387_wref_constraints.nwk

The info file says:

Final GAMMA-based Score of best tree -3480.234513

However, when I cat the two trees together and run raxml like so:

cat RAxML_bestTree.CNAG_03465 RAxML_bestTree.CNAG_03465_contraint > CNAG_03465.tree_plus_constrained_tree.nwk

raxmlHPC-PTHREADS-SSE3 -p 12345 -f G -s fastas/CNAG_03465.fa -n CNAG_03465.tree_plus_contrained_tree_G -m GTRGAMMA -z CNAG_03465.tree_plus_constrained_tree.nwk

The info file says:

Found 2 trees in File CNAG_03465.tree_plus_constrained_tree.nwk

Tree 0: -3469.406815
Tree 1: -4020.782508

Can anyone tell me why tree1 -4020 and not -3480?

This was all done with version 8.2.4. I tried to replicate it with another version I had installed, 7.7.8, but -f G crashed with the following error while calculating the tree1 likelihood:

raxmlHPC-PTHREADS-SSE3: optimizeModel.c:2884: modOpt: Assertion `inputLikelihood == tr->likelihood' failed.
Aborted (core dumped)

One last note is that the dataset is for a single gene and does contain identical sequences, I don't know if that could affect things. However, I really need to leave them in because I know some of the conflict that I'm looking for is caused by these sequences being forced apart in the constrained tree.

Thanks for reading and any advice,
Chris

Alexey Kozlov

unread,
Nov 15, 2016, 4:22:37 AM11/15/16
to ra...@googlegroups.com
Hi Chris,

it could be that model/branch length optimization got stuck in a local optimum. In particular, model optimization for
"Tree 1" (constrained) will start from the values obtained for "Tree 0" - which might be biased.

In order test this, you could either flip the order of trees in the file, or try using "-f e" option on the constrained
tree alone:

raxmlHPC-PTHREADS-SSE3 -p 12345 -f e -s fastas/CNAG_03465.fa -n test -m GTRGAMMA -t RAxML_bestTree.CNAG_03465_contraint


Best,
Alexey
> --
> You received this message because you are subscribed to the Google Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com
> <mailto:raxml+un...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

Christopher Desjardins

unread,
Nov 15, 2016, 9:54:53 AM11/15/16
to raxml
Thanks Alexey,

I flipped the order of the trees, reran -f G, and it again optimized the first tree correctly and not the second:

Found 2 trees in File CNAG_03465.constrained_tree_plus_tree.nwk

Tree 0: -3480.223765
Tree 1: -4009.307557

I found I could replicate the same effect for both input orders using -f H and also with different input values of -p. As you said, this made me suspect the parameters for the two trees were very different, so I ran -f e on each tree:

unconstrained tree:

Final GAMMA  likelihood: -3469.406815

Number of free parameters for AIC-TEST(BR-LEN): 782
Number of free parameters for AIC-TEST(NO-BR-LEN): 9


Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 0.020000
Tree-Length: 0.055152
rate A <-> C: 2.689291
rate A <-> G: 5.842629
rate A <-> T: 0.596124
rate C <-> G: 1.770038
rate C <-> T: 8.026142
rate G <-> T: 1.000000

freq pi(A): 0.251200
freq pi(C): 0.251938
freq pi(G): 0.246768
freq pi(T): 0.250093

constrained tree:

Final GAMMA  likelihood: -3480.223765

Number of free parameters for AIC-TEST(BR-LEN): 782
Number of free parameters for AIC-TEST(NO-BR-LEN): 9


Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 0.020000
Tree-Length: 0.056412
rate A <-> C: 2.761557
rate A <-> G: 5.868927
rate A <-> T: 0.560134
rate C <-> G: 1.705717
rate C <-> T: 7.738993
rate G <-> T: 1.000000

freq pi(A): 0.251200
freq pi(C): 0.251938
freq pi(G): 0.246768
freq pi(T): 0.250093

Now to me those parameter sets look very similar and I'm surprised that there could be some untraversable gap between them. Looking at this, do you still think local optima is the problem?

Thanks so much for your help on this,
Chris

Alexey Kozlov

unread,
Nov 15, 2016, 10:17:02 AM11/15/16
to ra...@googlegroups.com
Hi Chris,

this does look pretty weird.

Could you please send your alignment and both trees to my e-mail?

Best,
Alexey
> <javascript:>
> > <mailto:raxml+un...@googlegroups.com <javascript:>>.
> > For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.

Alexandros Stamatakis

unread,
Nov 21, 2016, 3:03:47 PM11/21/16
to ra...@googlegroups.com
An update: Alexey found a solution for this, we shall post a fix rather
soonish,

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org
Reply all
Reply to author
Forward
0 new messages