Discrepancy in likelihood values between PAML and Bio++ for M2 model

20 views
Skip to first unread message

Keren Halabi

unread,
Dec 2, 2018, 7:22:13 AM12/2/18
to Bio++ Usage Help Forum
Dear Bio++ team,

I tried to compare the likelihood computation in PAML and Bio++ with respect to the codon site model M2.  Since I don't know how to fix all the parameters in PAML, I let PAML infer the parameters and then set the values the execution gave in Bio++ parameters file.
The values of the parameters are as follows:

kappa=1
omega0=0.26662
omega2=3.12756
theta1=0.83970
theta2=0 (p1=0)
codon frequencies=F0
branch lengths=fixed

I used the small lysosome data available in your data examples in:
~./bppsuite/Examples/Data/lysozymeLarge.fasta
~./bppsuite/Examples/Data/lysozymeLarge.dnd

Please find attached the control file I used for PAML and the parameters file I used for BppML (a script to which I added printing of the initial likelihood computation before the optimization procedure).
The results I received are:

PAML log likelihood = -1046.930491
Bio++ log likelihood = -1071.4306767474

This means a 24.5 log likelihood units discrepancy. Do you have any idea what causes this?

I there is anything else I can provide, please let me know.

Many thanks!
Keren

codeml.ctl
param.bpp

Keren Halabi

unread,
Dec 3, 2018, 9:02:56 AM12/3/18
to Bio++ Usage Help Forum
Dear Bio++ team,

The issue was caused by an error on my side: I thought I fixed kappa and the branch lengths via PAML, but I actually didn't. After fixing these two issues (the fixed control file for PAML and parameters file for BppML.cpp are attached), I received similar results in both platforms:

LogL in PAML: -1069.823958
LogL in Bio++: -1069.82481049786

Thus, there is NO discrepancy between the two platforms with respect to M2 model likelihood computation.

I apologize for the mistake.
Keren
param.bpp
codeml.ctl

Julien Y. Dutheil

unread,
Dec 3, 2018, 4:20:13 PM12/3/18
to Bio++ Usage Help Forum
Dear Keren,

There is also a small difference regarding the way stop codon frequencies are distributed in Bio++ and PAML. To get the PAML behavior, you need to specify mgmtStopCodon=uniform in the frequencies specification (eg F3X4(mgmtStopCodon=uniform) ).

Cheers,

Julien.

Keren Halabi

unread,
Dec 5, 2018, 2:12:45 AM12/5/18
to Bio++ Usage Help Forum
Thank you, Julien!

I didn't notice that there was an option to determine the distribution of the frequencies of the stop codons in F3X4 estimation framework. Does this feature also allow correction for stop codons (CF3X4)?

Cheers,
Keren

Laurent Guéguen

unread,
Dec 6, 2018, 3:28:47 AM12/6/18
to Bio++ Usage Help Forum
Hi Keren,

we did not implement the CF3X4 correction, because in my point of view this correction is just a patch
to fix the bad estimates under the assumption of stationarity when this assumption is false. Since in bio++
we can infer non stationary modelling, this patch is useless.

By the way, after many data analysis, I have the experience that likelihood based tests always reject
stationarity, and that the estimates of parameters may be biased by this assumption when it is false.

Cheers,
Laurent


Reply all
Reply to author
Forward
0 new messages