newlik : log computation

Laurent Guéguen

Oct 23, 2015, 6:05:13 AM
to Bio++ Development Forum
Hi folks,

it is now possible to request that likelihood computations be performed in log space.
This gets rid of the numerical problems that arise with very large trees.
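As a rough illustration of why logs help (a minimal Python sketch, not Bio++ code; the 100 factors of 1e-5 are made-up numbers standing in for the many small terms that get multiplied on a big tree):

```python
import math

def product_likelihood(factors):
    """Multiply the factors directly, as in a plain likelihood computation."""
    result = 1.0
    for p in factors:
        result *= p
    return result

def log_likelihood(factors):
    """Sum the logs of the factors instead of multiplying the factors."""
    return sum(math.log(p) for p in factors)

# Hypothetical example: 100 factors of 1e-5 each.
factors = [1e-5] * 100

# 1e-500 is far below the smallest positive double (about 5e-324),
# so the running product underflows to exactly 0.0.
plain = product_likelihood(factors)

# The same quantity in log space is an ordinary number, 100 * log(1e-5).
logl = log_likelihood(factors)

print(plain, logl)
```

The plain product is lost to underflow, while the sum of logs stays well within the range of a double.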

For now, the default computation is unchanged, but with the option

useLog=yes

in the declaration of the phylolikelihoods (in BPPsuite syntax), the computation
on all nodes is done with log-likelihoods.

Actually, I have programmed it so that the kind of computation is chosen independently at each node of the tree,
so it is possible to mix regular and log computations (but I do not know if that could be of any use).

Cheers,
have fun with your big trees,
Laurent
 

jesse...@gmail.com

Feb 13, 2016, 12:23:10 AM
to Bio++ Development Forum
Hi Laurent,

Not sure if I am doing something wrong, but I keep getting likelihoods of nan when I use the useLog option. These are for trees that work just fine without this option.

--Jesse

Laurent Guéguen

Feb 13, 2016, 5:39:28 PM
to Bio++ Development Forum
Dear Jesse,

Yes, indeed, I saw this bug too, and I hope I have fixed it. You can try with the latest git version.

Tell me if you still have this problem.

Cheers,
Laurent

jesse...@gmail.com

Feb 14, 2016, 12:35:49 AM
to Bio++ Development Forum
Hi Laurent,

Thanks for looking into this so quickly. I am still getting the nan error with useLog, so I don't think the problem was fixed by the latest version of newlik that I pulled with git.

Thanks,
Jesse

Laurent Guéguen

Feb 14, 2016, 2:24:24 PM
to Bio++ Development Forum
Ok then,

could you send me your example, so that I can fix it?

Cheers,
L

jesse...@gmail.com

Feb 14, 2016, 3:48:04 PM
to Bio++ Development Forum
Hi Laurent,

I run the analysis on the files in the attached zip using: 
bppml param=newlik.bpp

On my computer, as soon as the optimization starts and output appears on the screen, it says: "Initial log likelihood.................: nan".

I am using bppml built from the latest pulls of the master branch of bpp-core and bpp-seq, and the latest pull of the newlik branch of bpp-phyl.

The output gives a valid numerical log likelihood when I remove the useLog=yes option.

Thanks,
Jesse

testbpp.zip

Laurent Guéguen

Feb 15, 2016, 6:35:09 AM
to Bio++ Development Forum
Ok thanks.

I forgot to ask whether you also updated bpp-core, because my fix required
changes in it.

L

jesse...@gmail.com

Feb 15, 2016, 10:03:40 AM
to Bio++ Development Forum
Yes, I pulled the most recent bpp-core. This is the one with the following commit message (from February 3):

470afc0 better Application String mgmt

Laurent Guéguen

Feb 15, 2016, 4:08:44 PM
to Bio++ Development Forum
Hi Jesse,

it is fixed (until the next bug). The cause was the numerical approximation of transition probabilities on very short branches.
Some came out negative (e.g. -2e-17), and the log of a negative number is not defined.
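To make the failure concrete, here is a minimal Python sketch of one way to guard a log against such rounding noise. It is only an illustration, not the fix that went into Bio++; `safe_log` and its `tolerance` parameter are hypothetical names. (In C++, `std::log` of a negative double returns nan, which is how the nan reaches the likelihood; Python's `math.log` raises `ValueError` instead.)

```python
import math

def safe_log(p, tolerance=1e-12):
    """Log of a probability, tolerating tiny negative rounding errors.

    Values in [-tolerance, 0] are treated as exactly zero (log = -inf).
    Anything more negative is considered a real error and is reported.
    The tolerance is an assumed value chosen for this sketch.
    """
    if p < -tolerance:
        raise ValueError(f"probability {p} is negative beyond rounding error")
    if p <= 0.0:
        return -math.inf
    return math.log(p)

# A transition probability that should be ~0 but came out slightly negative.
print(safe_log(-2e-17))   # -inf, instead of nan or an exception
print(safe_log(0.25))     # ordinary case, same as math.log(0.25)
```

Treating the value as zero gives a log of minus infinity, which downstream log-space sums can absorb, whereas a nan propagates to the final likelihood.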

Please note that this option makes the computations much longer (because of the many logs of sums of exponentials
instead of plain sums). In the code, it is possible to use log-likelihoods only on some branches
(such as the deeper ones, where likelihoods are too small), but this option is not directly available from bppsuite.
If needed, I can add it.

I admit that some computations with the log option could be sped up (for example by storing some logs instead of
recomputing them).
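For readers wondering where the extra cost comes from: a sum of likelihoods stored as logs needs one exponential per term plus one log, where the plain version needs only additions. A minimal Python sketch of the usual log-sum-exp trick (an illustration under that assumption, not the Bio++ implementation; `log_sum_exp` is a hypothetical helper name):

```python
import math

def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without underflow.

    Subtracting the maximum before exponentiating keeps every exp()
    argument <= 0, so the largest term is exactly 1.0 and the sum
    cannot underflow to zero.
    """
    m = max(log_values)
    if m == -math.inf:
        # All terms are log(0): the sum is 0 and its log is -inf.
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Plain space: 0.2 + 0.3 is a single addition.
# Log space: the same sum costs two exp() calls and one log() call.
print(log_sum_exp([math.log(0.2), math.log(0.3)]))  # close to log(0.5)

# Values far too small for plain doubles are still handled.
print(log_sum_exp([-1000.0, -1000.0]))  # close to -1000 + log(2)
```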

Cheers,
Laurent


jesse...@gmail.com

Feb 16, 2016, 11:06:46 AM
to Bio++ Development Forum
Hi Laurent,

Thanks so much -- this fixed the problem. And yes, it is a bit slower, but not dramatically so. At least for my tree, the run time increases only by about 30% with the useLog option.

Thanks,
Jesse

jesse...@gmail.com

Feb 17, 2016, 6:55:19 PM
to Bio++ Development Forum
Hi Laurent,

Sorry to keep bothering you. Your latest commit to newlik (which handles branches shorter than a small constant) fixes the problems with useLog, but it creates a new one: with the old likelihood methods, topology inference with NNI now fails to converge when the alignment contains highly similar sequences. I am not sure whether this is something to fix, or whether NNI tree inference will be removed or rewritten in the newlik branch, but I thought you should be aware of it. My guess is that different topologies with very short branches are now equivalent, which somehow causes problems for the NNI topology search...

--Jesse

Laurent Guéguen

Feb 20, 2016, 1:03:33 PM
to Bio++ Development Forum
Hi Jesse,

we cannot hide anything from you! Yes, indeed, as I said, a problem in the log computation was due to numerical issues, with negative transition probabilities
instead of near-zero ones. I tried to fix this by declaring that when a branch is shorter than 1e-6, the transition matrix is the identity. I thought that
it would not be a problem. I have now fixed it in a less radical way. Tell me if you still have the problem.
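To illustrate why the identity shortcut can confuse a topology search, here is a minimal Python sketch using the Jukes-Cantor (JC69) model as a stand-in; the model choice, the function name and the `min_branch_length` parameter are assumptions made for the example, not the Bio++ code:

```python
import math

def jc69_transition_matrix(t, min_branch_length=0.0):
    """JC69 transition probabilities for a branch of length t.

    If t is below min_branch_length, return the identity matrix,
    mimicking the shortcut described above (an assumed reconstruction).
    """
    n = 4
    if t < min_branch_length:
        return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e   # probability of no observed change
    diff = 0.25 - 0.25 * e   # probability of each specific change
    return [[same if i == j else diff for j in range(n)] for i in range(n)]

# Without the shortcut, two very short branches still give different matrices.
print(jc69_transition_matrix(1e-7) == jc69_transition_matrix(1e-9))  # False

# With the 1e-6 cutoff, both collapse to the identity, so the likelihood
# no longer changes with the branch length below the cutoff.
print(jc69_transition_matrix(1e-7, 1e-6) == jc69_transition_matrix(1e-9, 1e-6))  # True
```

Below the cutoff the likelihood becomes flat in the branch length, which fits the guess that alternative topologies separated only by very short branches become indistinguishable.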

As for NNI, it should be rewritten in newlik, but I really do not know when. I think (hope) this does not depend on me, since
personally I do not see any opening in my schedule for it.


Cheers,
L

jesse...@gmail.com

Mar 3, 2016, 9:56:30 PM
to Bio++ Development Forum
Thanks Laurent. This fixed the problem beautifully.