Dear Professor Yang,
I hope this message finds you well.
I am currently using PAML (codeml) to detect positive selection with Clade Model C, following the specification of model = 3 and NSsites = 2. My goal is to compare different clades in a phylogeny of 13 species of African mole-rats (family Bathyergidae), grouped into three clades of interest.
For the Clade Model C analysis, I used the following control file settings (excerpt below):
seqfile = /data/01/p1/user157/Bathyergidae-sociaty/2024812.newcactus/03.proteincoding-selected-analysis/01.getgene/09.onlyB.13.fa2paml/evm.model.ptg000001l.10.paml
treefile = /data/01/p1/user157/Bathyergidae-sociaty/2024812.newcactus/03.proteincoding-selected-analysis/01.getgene/12.1.paml/onlyB/00.tree/unroot.tree.tri
outfile = clade.tri.mlc * main result file name
noisy = 9 * 0,1,2,3,9: how much rubbish on the screen
verbose = 0 * 0: concise; 1: detailed, 2: too much
runmode = 0 * 0: user tree
seqtype = 1 * 1:codons; 2:AAs; 3:codons-->AAs
CodonFreq = 2 * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
clock = 0 * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
* Model C: model = 3 NSsites = 2
* Model D: model = 3 NSsites = 3
model = 3 * 3 = clade models
NSsites = 2 * choose "2" or "3"
icode = 0 * 0:universal code; 1:mammalian mt; 2-10:see below
fix_kappa = 0 * 1: kappa fixed, 0: kappa to be estimated
kappa = 2.5 * initial or fixed kappa
fix_omega = 0 * 1: omega or omega_1 fixed, 0: estimate
omega = 0.13579 * initial or fixed omega, for codons or codon-based AAs
fix_alpha = 1 * 0: estimate gamma shape parameter; 1: fix it at alpha
alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
ncatG = 3 * # of categories in dG of NSsites models
getSE = 0 * 0: don't want them, 1: want S.E.s of estimates
RateAncestor = 0 * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)
Small_Diff = .2e-6
cleandata = 0 * remove sites with ambiguity data (1:yes, 0:no)?
The tree is unrooted, and I manually specified the three clades as follows in the tree file:
(Bsu #1, ((Cho #2, ((((Fan, Fmi), Fdm), Fda), Fme)), Hgl), Gca #1);
For the null model (M2a_rel), I used the same sequence file and an unlabeled unrooted tree, with the following config file:
seqfile = /data/01/p1/user157/Bathyergidae-sociaty/2024812.newcactus/03.proteincoding-selected-analysis/01.getgene/09.onlyB.13.fa2paml/evm.model.ptg000001l.10.paml
treefile = /data/01/p1/user157/Bathyergidae-sociaty/2024812.newcactus/03.proteincoding-selected-analysis/01.getgene/12.1.paml/onlyB/00.tree/unroot.tree.raw
outfile = M2a_rel.mlc * Path to the output file
noisy = 3 * How much rubbish on the screen
verbose = 1 * More or less detailed report
seqtype = 1 * Data type
ndata = 1 * Number of data sets or loci
icode = 0 * Genetic code
cleandata = 0 * Remove sites with ambiguity data?
model = 0 * Models for ω varying across lineages
NSsites = 22 * Models for ω varying across sites
CodonFreq = 7 * Codon frequencies
estFreq = 0 * Use observed freqs or estimate freqs by ML
clock = 0 * Clock model
fix_omega = 0 * Estimate or fix omega
omega = 0.5 * Initial or fixed omega
The the two model seems to run normally. However, for all genes analyzed under Clade Model C, the output file clade.tri.mlc does not show any positively selected sites detected by BEB (Bayes Empirical Bayes), even though I expected some sites to be identified in the foreground clades.
I have attached an example output file (clade.tri.mlc and clade.raw.mlc(M2a_rel)) from one of the orthologs for several species.
Thank you very much for your time and for developing such a powerful tool.
Best regards,
Na Wan