parameter setup in PAML

GLY

Aug 29, 2014, 4:24:02 PM
to pamlso...@googlegroups.com
Dear All,

I plan to use codeml in PAML to test for positive selection. Since PAML is sensitive to the model setup, I would like to ask about the settings of several parameters. I would be very grateful for any suggestions.

Here are the parameters in the codeml.ctl file that I find confusing.

1.  aaDist = 0 * 0: equal, +: geometric; -: linear, 1-6: G1974, Miyata, c, p, v, a

What does aaDist mean? Do we have to change the aaDist setting for different models? How do we know which value to use for aaDist, or is there a way to make a reasonable choice?

2. aaRatefile = dat/jones.dat * only used for aa seqs with model=empirical(_F) * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

Again, what does aaRatefile mean?

3. Mgene=0

I know that Mgene is used to specify partition models, for example when you have prior knowledge about the sites. So if I do not have prior knowledge about the sites and set Mgene=0, does that mean variable selective rates across all of the sites (that is, the random-sites model)?

4. fix_alpha

This parameter controls whether the shape parameter alpha of the gamma distribution is fixed or estimated. I noticed that many people use fix_alpha=1 and alpha=0, which means a single rate for all sites. I do not know how people decide whether to use one rate or variable rates. And if the (discrete) gamma model is used, how should the number of categories (ncatG) be chosen? How does this choice affect the detection of positive selection?
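To make the ncatG part of this question concrete, here is a minimal Python sketch of what "number of categories" means for a discrete gamma approximation in general: draw rates from a gamma distribution with mean 1, split them into equal-probability categories, and use each category's mean as its representative rate. This is an illustration by simulation under my own assumptions, not PAML's implementation; the function name `discrete_gamma_rates`, the sample size, and alpha=0.5 are all hypothetical choices.

```python
import random


def discrete_gamma_rates(alpha, ncat, n_samples=100_000, seed=1):
    """Approximate the mean rate of each of `ncat` equal-probability
    categories of a gamma distribution with shape `alpha` and mean 1.

    Hypothetical helper: uses simulation instead of exact integrals.
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); scale = 1/alpha gives mean 1.
    samples = sorted(rng.gammavariate(alpha, 1.0 / alpha)
                     for _ in range(n_samples))
    size = n_samples // ncat  # any leftover samples at the top are dropped
    rates = []
    for k in range(ncat):
        chunk = samples[k * size:(k + 1) * size]
        rates.append(sum(chunk) / len(chunk))
    return rates


# Same alpha, two different numbers of categories.
rates_4 = discrete_gamma_rates(alpha=0.5, ncat=4)
rates_8 = discrete_gamma_rates(alpha=0.5, ncat=8)
```

With a small alpha the category rates are very unequal (most sites slow, a few fast). More categories follow the continuous distribution more closely, at a higher computational cost.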


5. Malpha

And what does this parameter mean? How should its value be set?

6. ncatG

1) The setting of ncatG is really confusing. If I understand correctly, ncatG is the number of site categories in the gamma distribution. So only models that involve sites need to consider this parameter, right? That is, site models, branch-site models, and clade models, whereas branch models do not use it?

2) Page 32 of the PAML manual says that "the option variable ncatG is ignored when you specify branch-site models A and B, and clade model C, since the number of categories is fixed in the model". So in these cases my control file does not need to set ncatG, right?

3) And according to an answer from Professor Yang in this Google group, "for the site models, ncatG is set by the program, so you don't have to change it or worry about it."

So, all in all, if I want to test for positive selection using branch models, site models, branch-site models, and clade models, I can actually ignore the ncatG setting. Is that true?

In what situations do we actually need to consider the ncatG setting, and how do we decide its value? I noticed that different example files in PAML sometimes use different values of ncatG.

7. Small_Diff

I cannot find the meaning of Small_Diff in the manual. What does it mean, and how does it affect the results?

8. fix_blength

I plan to first run the M0 model on my data using the original tree from PAUP, which has no branch lengths (the old tree). I would then copy the branch lengths from the M0 result into the old tree file to make a new tree file, and use this new tree file for the other model analyses. In this case, should fix_blength be set to 2, which means the branch lengths are fixed at the values given in the tree file?
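The two-step plan I describe could be scripted roughly as follows. This is only a sketch of my intended setup, not a recommended configuration: the file names (`seqs.phy`, `tree_nolen.tre`, `tree_m0len.tre`) and the helper `render_ctl` are hypothetical, only a handful of codeml options are shown, and copying the M0 branch lengths into the second tree file would still be done by hand. Whether fix_blength=2 is the right choice in step 2 is exactly what I am asking.

```python
def render_ctl(options):
    """Render option -> value pairs as 'name = value' lines.

    Hypothetical helper for building a codeml control file as text.
    """
    return "\n".join(f"{name} = {value}" for name, value in options.items()) + "\n"


common = {
    "seqfile": "seqs.phy",  # hypothetical alignment file name
    "seqtype": 1,           # 1: codon sequences
    "runmode": 0,           # 0: user-supplied tree
}

# Step 1: M0 (one ratio) on the tree without branch lengths.
step1 = dict(common, treefile="tree_nolen.tre", outfile="m0.out",
             model=0, NSsites=0, fix_blength=0)

# Step 2: another model (site model M2a here, purely as an example),
# using the tree that carries the M0 branch lengths and holding them fixed.
step2 = dict(common, treefile="tree_m0len.tre", outfile="m2a.out",
             model=0, NSsites=2, fix_blength=2)

ctl_step1 = render_ctl(step1)
ctl_step2 = render_ctl(step2)
```

Each string could then be written to its own control file and passed to codeml in a separate run.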

9. method=0

Under the no-clock models, does the choice between method=0 and method=1 have a big effect on the results? From the manual, I found that method=0 means PAML updates all parameters, including branch lengths, simultaneously, while method=1 means it updates branch lengths one at a time. Does the setting of this parameter make a big difference?

Looking forward to your reply!

Best,

GLY
