M1a/M2a and M7/M8 yield different results: proper interpretation

Bin He

Jan 18, 2022, 5:10:48 PM

to PAML discussion group

Dear Ziheng and all,

I want to make sure that I interpret the PAML results properly, so any suggestions would be really appreciated!

I ran `codeml` on a codon alignment of 8 sequences and 975 sites. The alignment in general looks good. I ran the site models M1a, M2a, M7 and M8; the results are summarized below:

20220117 B8441 Hil1-8 PAML result summary.png

My main question is: the M7/M8 comparison suggests that a very small number of sites are under strong positive selection, while the M1a/M2a comparison suggests no positive selection (when CodonFreq=0/1). I've read some of the classic papers by Ziheng and others on how to conduct such analyses properly, but I'm still uncertain about this case. Relatedly, I'd like to understand what the dN/dS estimate means in the site model output: when there is more than one site class, is it the average dN/dS across all classes? In particular, is it concerning that the M8 model shows a much higher average dN/dS than the other models (again, for CodonFreq=0/1)? Lastly, CodonFreq=2 leads to results inconsistent with CodonFreq=0/1, which I suspect is due to the small size of the dataset, making it difficult to estimate the F3x4 frequencies accurately.

I attached the main result files from the CodonFreq=1 and CodonFreq=2 runs, since the results for CodonFreq=0 and 1 are very similar.

Many thanks in advance!

Bin He, Assistant Professor

Biology Department

University of Iowa

mlc-codonfreq2

mlc-codonfreq1

Bin He

Jan 18, 2022, 5:11:57 PM

to PAML discussion group

Sorry for the small image of the table. Here is the full-sized file. -- Bin

20220117 B8441 Hil1-8 PAML result summary.png

Ziheng

Feb 17, 2022, 12:15:36 PM

to PAML discussion group

I suggest that you use CodonFreq = 2 rather than 0 or 1. It is rare for a gene to have the same frequency for every codon, or for the three codon positions to have the same base composition. In your case lnL is much higher for CodonFreq = 2 anyway. You can discuss the effects of CodonFreq as a kind of sensitivity analysis if you like, but the results for CodonFreq = 2 are likely to be more reliable than those under CodonFreq = 0 or 1.

Second, some of the w2 values are very large, but they have large sampling errors, so the precise values may not be reliable. The test is more trustworthy. The M1a/M2a comparison is often more stringent than the M7/M8 comparison. I am not sure whether it is useful to look at the sites identified.
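
For readers following along, the test mentioned here is the likelihood ratio test between the nested site models. Below is a minimal sketch in Python, assuming the usual convention of comparing twice the log-likelihood difference to a chi-square distribution with 2 degrees of freedom for both M1a vs M2a and M7 vs M8. The function name and the lnL values are hypothetical placeholders, not numbers from the attached output.

```python
import math

def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for a nested pair, assuming df = 2.

    lnl_null: log-likelihood under the null model (e.g. M1a or M7)
    lnl_alt:  log-likelihood under the alternative (e.g. M2a or M8)
    Returns (test statistic, p-value).
    """
    # The statistic is 2 * (lnL_alt - lnL_null); clamp tiny negative
    # values that can arise from incomplete optimisation.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For a chi-square with exactly 2 degrees of freedom, the upper
    # tail probability has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Hypothetical lnL values, for illustration only.
stat, p = lrt_df2(lnl_null=-1000.0, lnl_alt=-997.0)
print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

The closed form is used only because df = 2 makes it exact; for other degrees of freedom a statistics library would be needed.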

Yes, the dN/dS value is indeed an average over all sites or site classes. Under the site models, it is not very important.
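
As an illustration of that averaging, here is a short Python sketch that treats the reported dN/dS as the proportion-weighted mean of the per-class omega values. The function name, class proportions and omega values are made up for illustration and are not taken from the attached output; the point is only that a small class with a very large omega can raise the average noticeably.

```python
def average_omega(proportions, omegas):
    """Proportion-weighted mean of omega over site classes.

    proportions: fraction of sites in each class (should sum to 1)
    omegas:      dN/dS value of each class
    """
    if len(proportions) != len(omegas):
        raise ValueError("need one omega per site class")
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("proportions should sum to 1")
    return sum(p * w for p, w in zip(proportions, omegas))

# Hypothetical M2a-like setup: conserved, neutral and positive classes.
props = [0.80, 0.15, 0.05]
omegas = [0.10, 1.00, 5.00]
print(round(average_omega(props, omegas), 4))

# Same proportions, but a much larger omega in the small third class.
print(round(average_omega(props, [0.10, 1.00, 50.00]), 4))
```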