Hi Laurent,
Thank you for the clarification regarding the normalization!
I do have 2 follow-up questions regarding your explanation:
1) As I understand it, the normalization should make the sum of the diagonal of the rate matrix equal to 1. However, when I create a simple YN98(kappa=1,omega=0.1,frequencies=F0) model, the sum of the diagonal is -58.816397228637342, and it is not brought to 1 by a reciprocal scaling value in the data member rate_. So I am not sure what the normalization achieved in this case.
In practice, when I infer the parameters with YNGP_M2 based on data simulated in the above manner, I receive relatively poor results.
If I understood you correctly, by creating 3 independent YN98 models during the simulation, I only normalize but do not homogenize. So when I infer the parameters with YNGP_M2 (which homogenizes in addition to normalizing), it may come as no surprise that the inferred values differ from the simulated ones.
Could this explain why I see relatively poor inference of the omega parameters when fitting a mixture model to the simulated data?
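On the normalization in question 1: the following is a minimal Python sketch (not Bio++ code) of one common convention, under which the generator Q is scaled so that the expected substitution rate at equilibrium, -sum_i pi_i * Q_ii, equals 1. Whether YN98 in Bio++ follows exactly this convention is an assumption here, and the 4-state toy matrix, the frequencies and the helper names are hypothetical and only for illustration. Under this convention the plain sum of the diagonal is negative and is not expected to be 1.

```python
# Toy illustration only: a 4-state generator, NOT the 61-state YN98 matrix.
# Assumed convention: scale Q so that -sum_i pi[i] * Q[i][i] == 1.

def build_generator(exch, pi):
    """Build Q with Q[i][j] = exch[i][j] * pi[j] for i != j; rows sum to 0."""
    n = len(pi)
    q = [[exch[i][j] * pi[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q

def normalize_generator(q, pi):
    """Divide Q by its expected rate at equilibrium, -sum_i pi[i] * Q[i][i]."""
    rate = -sum(pi[i] * q[i][i] for i in range(len(pi)))
    return [[x / rate for x in row] for row in q]

# Hypothetical equilibrium frequencies and exchangeabilities
pi = [0.1, 0.2, 0.3, 0.4]
kappa = 2.0
exch = [
    [0.0, 1.0, kappa, 1.0],
    [1.0, 0.0, 1.0, kappa],
    [kappa, 1.0, 0.0, 1.0],
    [1.0, kappa, 1.0, 0.0],
]

q = normalize_generator(build_generator(exch, pi), pi)

trace = sum(q[i][i] for i in range(len(pi)))
expected_rate = -sum(pi[i] * q[i][i] for i in range(len(pi)))

print("plain sum of diagonal:", trace)
print("frequency-weighted rate:", expected_rate)
```

If this is the convention in use, the quantity to check on the YN98 matrix would be the frequency-weighted sum of the diagonal rather than the plain sum.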
As for the discrepancy issue between site models and branch-site models, I did as you suggested with a minor addition:
1) executed likelihood computation using F1X4 with 123_Full.theta=0.2, 123_Full.theta1=0.3, 123_Full.theta2=0.8
2) executed likelihood computation using F1X4 with 123_Full.theta=0.3333333333333, 123_Full.theta1=0.3333333333333, 123_Full.theta2=0.3333333333333
I executed the two computations on a simpler and smaller dataset, as you suggested.
In the first case I received the same log likelihood in both computations (that is, a branch-site model with two copies of the same model, and a single site model).
In the second case I received a discrepancy of 0.00043952825 log-likelihood units. This may seem minor, but on a larger, less simple dataset the discrepancy grows to 0.059210998. Naturally, the more parameters that require precision and the larger the dataset, the greater the discrepancy.
Please find attached the tester, the data and the results.
Many thanks again!
Keren