Estimate codon frequencies (F3X4 or F1X4) from alignment

430 views
Skip to first unread message

Yifei Huang

unread,
Oct 26, 2012, 2:32:52 PM10/26/12
to biopp-he...@googlegroups.com
Hi everyone,

I am trying to implement a customized codon model, in which F3X4 (or F1X4) is used to model the frequencies of codons. In paml, it seems the nucleotide frequencies are counted directly from the alignment and fixed in the step of maximizing likelihood (please correct me if I'm wrong). I have no idea how to implement such an counting strategy based on Bio++. I can leave the nucleotide frequencies as free parameters and estimate them in the step of maximizing likelihood, but it may waste a lot of computation time so I want to use the method in paml. Any help will be highly appreciated.

Best,

Yifei

--
Yifei Huang
Department of Biology
McMaster University

Laurent Guéguen

unread,
Oct 28, 2012, 1:56:15 AM10/28/12
to biopp-he...@googlegroups.com
Hi Yifei,

in bpp-phyl, in class Model/FrequenciesSet/CodonFrequenciesSet, there is a function getFrequenciesSetForCodons that from a short int input and an alphabet
returns the ad hoc FrequenciesSet  metching with F0, F1X4, F3X4 and F61. Once the frequenciesSet is built, you can use the method setFrequencies that computes the FrequenciesSet from a vector of frequencies, or setFrequenciesFromMap that computes de FrequenciesSet from a map  codon:count.

A+,
L

Julien Yann Dutheil

unread,
Oct 28, 2012, 4:24:16 PM10/28/12
to biopp-he...@googlegroups.com
Hi Yifei,

More generally, the strategy in Bio++ to use observed frequencies if
to set the corresponding parameter values from the given data set
(some methods exist for that purpose, as Laurent mentioned), and then
tell the optimizer to not estimate those parameters.

Best,

Julien.
> --
> You received this message because you are subscribed to the Google Groups
> "Bio++ Usage Help Forum" group.
> To post to this group, send email to biopp-he...@googlegroups.com.
> To unsubscribe from this group, send email to
> biopp-help-for...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/biopp-help-forum/-/1CDSAvllQm8J.
>
> For more options, visit https://groups.google.com/groups/opt_out.
>
>



--
Julien Y. Dutheil, Ph-D
0 (+49) 6421 178 530

§ Max Planck Institute for Terrestrial Microbiology
Department of Organismic Interactions
Marburg -- GERMANY

§ Intitute of Evolutionary Sciences - Montpellier
University of Montpellier 2 -- FRANCE

Yifei Huang

unread,
Oct 29, 2012, 1:05:22 PM10/29/12
to biopp-he...@googlegroups.com
Hi Laurent and Julien,

Thanks a lot for your great help! I checked the function getFrequenciesSetForCodons. If I use the F3X4 model, it will return a CodonFromIndependentFrequenciesSet object. However, it seems the three sets of nucleotide frequencies (one for each nucleotide site in the codon) can only be changed in the constructor of CodonFromIndependentFrequenciesSet. I checked the class members of CodonFromIndependentFrequenciesSet but couldn't find anything relevant to nucleotide frequencies. The setFrequencies function mentioned by Laurent might not work properly for my propose because it accepts a vector of codon frequencies instead of three sets of nucleotide frequencies. Probably the only way to change codon frequencies after construction is to directly change the parameters (Codon1_Full.theta, Codon1_Full.theta1, Codon1_Full.theta2, et al).

Anway I solved my problem and got the same likelihood as paml. But I still have a few general very questions on optimization and, if you have time, can you please help me out? Thanks a lot. 

1) I am wondering how constraints are handled by newton-like optimizers. If my understanding is correct, in each step of optimization most of the newton-like algorithms firstly calculate a searching direction based on the gradient and Hessian and then perform a linear search. How is the searching direction determined if the constraint is reached?

2) Which algorithm is used to perform linear search in multiple dimensional optimizers?

3) I am not sure how to use MetaOptimizer. I think you mentioned somewhere that in MetaOptimizer we can use newtown-like optimizers firstly and then refine the result by derivative-free optimizers when we are close to the Maximum. This method might be faster because numerical instability of numerical differentiation might hurt the convergence. However, I don't know how to implement it. Could you please tell me where I can find an example?

Sorry about so many questions. I learnt a lot from you guys and the source code of Bio++. Without Bio++, it is almost impossible for me to develop my own models because of the very steep learning curve of numerical algorithms and the obscure code in most C based phylogenetic softwares. I appreciate your works a lot.

Best,

Yifei

Julien Yann Dutheil

unread,
Oct 29, 2012, 3:00:31 PM10/29/12
to biopp-he...@googlegroups.com
Hi Yifei,

I will let Laurent answer the codon part, I'll take the optimization one :)

On Mon, Oct 29, 2012 at 6:05 PM, Yifei Huang <huangy...@gmail.com> wrote:
> 1) I am wondering how constraints are handled by newton-like optimizers. If
> my understanding is correct, in each step of optimization most of the
> newton-like algorithms firstly calculate a searching direction based on the
> gradient and Hessian and then perform a linear search. How is the searching
> direction determined if the constraint is reached?

In all optimizers, there are two general way to handle constraints

1) the "push-it-to-the-bound" way (default): after each optimization
step, the optimizer proposes a new set of values. If some of these
values fall outside the allowed interval, the corresponding value is
set to the closest bound. The new set of parameters is then forwarded
to the function for evaluation.
2) the "parameter transform" way, which is the only
mathematically-correct way, yet slower in practice. This uses
parameter transforms to get a function defined on -inf,+inf.
>
> 2) Which algorithm is used to perform linear search in multiple dimensional
> optimizers?

The "direction" procedure is only used for Gradient functions (and
Powell's method). There it uses a Brent optimization procedure. The
newton method however does not use such linear search: it approximates
the local curve by a paraboloid, using the hessian matrix. Then it
gets the minimum of the paraboloid and uses it as a new point. Newton
algorithms are most efficient for function that are "paraboloid-like"
(it finds the minimum of a paraboloid function in one step only). In
practice, homogeneous likelihoods are very close to paraboloid curves,
hence the success of the method.

The "push-it-to-the-bound" method works well with newton. It has,
however, horrible results with direction functions, as the constraints
break the linearity of the transform.
>
> 3) I am not sure how to use MetaOptimizer. I think you mentioned somewhere
> that in MetaOptimizer we can use newtown-like optimizers firstly and then
> refine the result by derivative-free optimizers when we are close to the
> Maximum. This method might be faster because numerical instability of
> numerical differentiation might hurt the convergence. However, I don't know
> how to implement it. Could you please tell me where I can find an example?

A MetaOptimizer does not do this. What you suggest can be done by
calling two optimizers in a raw, the first one with a higher tolerance
(for instance 1e-4), and the second one with a finer one (for instance
1e-6).
What the MetaOptimizer does is to use different optimization procedure
for various set of parameters. Typically, a Newton optimizer for
parameters for which we have analytical derivatives, and a Brent
optimizer for parameters for which no analytical derivative is
available.
I don't think there is an example of use available. A MetaOptimizer is
used in OptimizationTools::optimizeNumericalParameters, maybe that can
help? The idea is that we have to tell the MetaOptimizer which
parameter should be optimized with what. This is achieved through a
class called "MetaOptimizerInfo", which has a addOptimizer method.
Once you have instanciated such an info class, you pass it to the
constructor of the MetaOptimizer together with the function.

Hope this helps a bit, do not hesitate if you have further questions.

>
> Sorry about so many questions. I learnt a lot from you guys and the source
> code of Bio++. Without Bio++, it is almost impossible for me to develop my
> own models because of the very steep learning curve of numerical algorithms
> and the obscure code in most C based phylogenetic softwares. I appreciate
> your works a lot.

This is good to know :)

Cheers,

Julien.

Yifei Huang

unread,
Oct 29, 2012, 4:25:18 PM10/29/12
to biopp-he...@googlegroups.com
Hi Julien,

Thank you so much for your very helpful reply! It makes much more sense to me now. So if we use conjugate-gradient instead of newton-like methods, it is better to transform parameters to remove constraints rather than push-it-to-the-bound, right? In addition, I think now I understand why Felsenstein-Churchill correction is used in the pseudoNewtonOptimizer. If the paraboloid approximation is not good, we may get a higher function value. In this case, the optimizer will simply be less aggressive and try half the step size until we get a lower function value. By the way, the idea of the Felsenstein-Churchill correction is from their HMM paper published in MBE in 1996, right?

It seems the Felsenstein-Churchill correction is only implemented in the pseudoNewtonOptimizer. Isn't it also useful for other newton-like optimizers, such as the BfgsMultiDimensions? I think this might be the most popular optimization algorithm if the second-order derivatives can not be analytically calculated.

Thanks again and happy Halloween in advance :)

Yifei

Julien Yann Dutheil

unread,
Oct 29, 2012, 4:44:33 PM10/29/12
to biopp-he...@googlegroups.com
Hi Yifei,

On Mon, Oct 29, 2012 at 9:25 PM, Yifei Huang <huangy...@gmail.com> wrote:
> Hi Julien,
>
> Thank you so much for your very helpful reply! It makes much more sense to
> me now. So if we use conjugate-gradient instead of newton-like methods, it
> is better to transform parameters to remove constraints rather than
> push-it-to-the-bound, right?

Yes!

In addition, I think now I understand why
> Felsenstein-Churchill correction is used in the pseudoNewtonOptimizer. If
> the paraboloid approximation is not good, we may get a higher function
> value. In this case, the optimizer will simply be less aggressive and try
> half the step size until we get a lower function value. By the way, the idea
> of the Felsenstein-Churchill correction is from their HMM paper published in
> MBE in 1996, right?

I have to admit I am not sure about that... I took that trick from
Nicolas Galtier's code in its NHML package, and at that time (almost
10 years ago :s), he told me (as far as I remember) that the original
idea was from Felsenstein and Churchill. But I never checked the exact
reference...

>
> It seems the Felsenstein-Churchill correction is only implemented in the
> pseudoNewtonOptimizer. Isn't it also useful for other newton-like
> optimizers, such as the BfgsMultiDimensions? I think this might be the most
> popular optimization algorithm if the second-order derivatives can not be
> analytically calculated.

I could be indeed. I am not sure, however, that it is required
there... The pseudo-Newton optimizer does not user the
cross-derivatives, therefore it might be that this type of issue does
not arise so often in real Newton(-like) optimization.
>
> Thanks again and happy Halloween in advance :)

Your welcome. Thanks to you for your continuous interest and patience
in using the libraries!

J.

Yifei Huang

unread,
Oct 29, 2012, 5:07:37 PM10/29/12
to biopp-he...@googlegroups.com
Hi Julien,

Thanks again. I just want to let you know that I rechecked the original paper by Felsenstein and Churchill (A Hidden Markov Model Approach to Variation Among Sites in Rate of Evolution) and found the following sentences.

"In DNAML from PHYLIP 4.0 these derivatives are used to estimate the branch length by use of the Newton-Raphson method. This is modified so that it always moves in an uphill direction; if it overshoots, points 1/2, 1/4, l/8 . . . of the way are tried successively until one finally results in an increase in the likelihood."

I guess this is the Felsenstein-Churchill correction used in Bio++. In addition, could you please let me know where the put-it-to-the-bound method for constraint optimization came from? I just want to learn more detail about constraint optimization in general.

Best,

Yifei

Julien Yann Dutheil

unread,
Oct 29, 2012, 5:21:57 PM10/29/12
to biopp-he...@googlegroups.com
Hi again Yifei,

On Mon, Oct 29, 2012 at 10:07 PM, Yifei Huang <huangy...@gmail.com> wrote:
> Hi Julien,
>
> Thanks again. I just want to let you know that I rechecked the original
> paper by Felsenstein and Churchill (A Hidden Markov Model Approach to
> Variation Among Sites in Rate of Evolution) and found the following
> sentences.
>
> "In DNAML from PHYLIP 4.0 these derivatives are used to estimate the branch
> length by use of the Newton-Raphson method. This is modified so that it
> always moves in an uphill direction; if it overshoots, points 1/2, 1/4, l/8
> . . . of the way are tried successively until one finally results in an
> increase in the likelihood."
>
Cool, that is very good to know!

> I guess this is the Felsenstein-Churchill correction used in Bio++. In
> addition, could you please let me know where the put-it-to-the-bound method
> for constraint optimization came from? I just want to learn more detail
> about constraint optimization in general.

It is something I just made up, most likely after discussing with
Nicolas Galtier... It probably also dates back to Felsenstein and
Phylip, but I am quite sure this is a rather common trick. I do not
know the computer science literature on optimization well enough to be
able to say more :s Maybe Laurent has more info on that?

Laurent Guéguen

unread,
Nov 5, 2012, 5:47:54 AM11/5/12
to biopp-he...@googlegroups.com
Hi Yifei,

yes, you are right about the way to set frequencies in CodonFromIndependentFS class. If you have only per phase frequencies, I think that using the parameters is the best way.

Cheers,
Laurent

Laurent Guéguen

unread,
Nov 5, 2012, 5:49:22 AM11/5/12
to biopp-he...@googlegroups.com
Hi again,

sorry, I do not know much more than this, I did not looked very deep into
this topic neither, I must admit.

A+,
L
Reply all
Reply to author
Forward
0 new messages