Dear Tae-Kun,
Thanks for your message! Please find below some suggestions that Ziheng has
given in previous instances:
* * *
The following code can be found in `tools.c`:
int H_end(double x0[], double x1[], double f0, double f1, double e1, double e2, int n)
/* Himmelblau termination rule.
return 1 for stop, 0 otherwise.
The stopping rule used in CODEML is apparently called
Himmelblau termination rule. Such a rule is based on the idea that changes in
the variable values should be small, and that the change in the objective
function should be small. To this end, the stopping rule used by default
in CODEML is probably too stringent (i.e., the small value used in the
algorithm is actually too small). If you want to change this setting, you can
open the file `codeml.c` and find the following lines:
if (com.method == 1)
j = minB(noisy > 2 ? frub : NULL, &lnL, x, xb, e, com.space);
else if (com.method == 3)
j = minB2(noisy > 2 ? frub : NULL, &lnL, x, xb, e, com.space);
else
j = ming2(noisy > 2 ? frub : NULL, &lnL, com.plfun, NULL, x, xb, com.space, e, np);
Once you locate the code above, please update it following the code below and recompile the program:
if (com.method == 1)
j = minB(noisy > 2 ? frub : NULL, &lnL, x, xb, e*10, com.space);
else if (com.method == 3)
j = minB2(noisy > 2 ? frub : NULL, &lnL, x, xb, e*10, com.space);
else
j = ming2(noisy > 2 ? frub : NULL, &lnL, com.plfun, NULL, x, xb, com.space, e, np);
In the code, `e` is assigned to `e=1e-8`, and so `e*10` will
be `1e-7`. Nevertheless, you may also want to try `e*100`, which could be good
as well. Please note that the stop rule uses both relative error and absolute
error at the same time. The idea follows something like
|x* - x| < ( |x| + 1) e
so that relative error is used when "x" is large
(much larger than 1) and absolute error is used if "x" is close to 0.
If you were to use site models such as M7 or M8 with many parameters to optimise (or in case other users here see this message and want to run M7 or M8 models with large datasets and the analyses are taking too long to finish), you can first estimate the branch lengths under M0 while fixing the tree topology of interest. Then use the same tree topology with fixed branch lengths (those estimated under M0) as initial values, but please make sure you add option `fix_blength = 1`in the control file to enable this property! Nevertheless, it seems that enabling `fix_blength = 1` can lead to long runs as the stopping rule is quite stringent, and so you can modify the source code as mentioned above, recompile the program, and run again. Some notes on options `method` and `fix_blength`:
- When using `method = 1`, the program optimises the branch lengths with other parameters (kappa, p, q, w) fixed, and then optimises the other parameters (using BFGS) with the branch lengths fixed. The iteration is very slow because the two sets of parameters are correlated, so that the algorithm zigzags forever. This process will however give you the MLEs under M7/M8. If you want to save time and get approximate values under M7/M8, you can use `fix_blength = 3`. When enabling this option, CODEML will assume that the M7/M8 branch lengths are proportional to the branch lengths in the tree (in other words, branch lengths under M0), so it estimates a scale factor c, together with parameters p, q, w, etc. in M7/M8. The iteration takes place under BFGS with 4 parameters for M7 (c, kappa, p, q) and 7 for M8 (c, kappa, prob, p, q, w_s). Following previous observations, the lnL may be higher when you set `fix_blength = 1` (i.e., the estimates are MLEs) than when you set `fix_blength = 3` (the estimates are approximated).
- In the PAML documentation, p. 26, you have a note regarding option `fix_blength`: "The value 3 (proportional) means that branch lengths will be proportional to those given in the tree file, and the proportionality factor is estimated by ML". Note that systematic tests never took place and it is hard to know when the approximation can be good or bad. Nevertheless, it is likely to be good when sequences are similar and/or when most sites are conserved with w < 1.
* * *
Hope you find these suggestions useful, and please let us know if updating and
recompiling the code following the suggestions above worked!
All the best,
Sandy