Thanks for taking the time to read my question.
I have some questions about the total number of sites (N + S) in the codeml output.
1)
For example, when I run codeml on 20 sequences, the output is:
    pairwise comparison (Goldman & Yang 1994)
    seq seq N S dN dS dN/dS Paras.
    2 1 22322.5 6885.5 0.0013 0.0006 2.1128 0.0033 5.5060 2.1128 -38647.267
Here N + S (22322.5 + 6885.5 = 29208.0) is roughly the length of the whole sequence.
But when I run codeml on around 150 sequences, the output looks like this:
    pairwise comparison (Goldman & Yang 1994)
    seq seq N S dN dS dN/dS Paras.
    2 1 9001.4 3139.6 0.0010 0.0003 3.1455 0.0025 19.8616 3.1455 -16013.324
    3 1 9127.5 3013.5 0.0010 0.0003 2.9457 0.0025 8.7926 2.9457 -16016.518
Here N + S drops to 12141.0 for both pairs (9001.4 + 3139.6 and 9127.5 + 3013.5).
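The comparison can be tabulated with a short Python sketch. It only re-reads the first four columns of the rows quoted above (`sites_per_pair` is a hypothetical helper, not a PAML tool) and assumes that N and S are counted in nucleotide sites, so that N + S = 3 × the number of codons compared.

```python
"""Sketch: recompute N + S from rows of the codeml pairwise table quoted above."""


def sites_per_pair(table: str) -> dict[tuple[int, int], float]:
    """Map (seq_i, seq_j) -> N + S, using the 3rd (N) and 4th (S) columns of each row."""
    totals = {}
    for row in table.strip().splitlines():
        fields = row.split()
        pair = (int(fields[0]), int(fields[1]))
        n_sites, s_sites = float(fields[2]), float(fields[3])
        totals[pair] = round(n_sites + s_sites, 1)
    return totals


twenty_seqs = "2 1 22322.5 6885.5 0.0013 0.0006 2.1128 0.0033 5.5060 2.1128 -38647.267"
many_seqs = """
2 1 9001.4 3139.6 0.0010 0.0003 3.1455 0.0025 19.8616 3.1455 -16013.324
3 1 9127.5 3013.5 0.0010 0.0003 2.9457 0.0025 8.7926 2.9457 -16016.518
"""

for label, table in [("20 sequences", twenty_seqs), ("~150 sequences", many_seqs)]:
    for pair, total in sites_per_pair(table).items():
        # Assumption: N + S = 3 x codons compared, so dividing by 3 gives codons.
        print(f"{label}, pair {pair}: N + S = {total}, i.e. {total / 3:.0f} codons")
```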
I suspect that the total number of sites decreases because, with more sequences, more of them contain missing data. Is this a reasonable explanation?
If so, what can I do to get a more precise result? Should I delete the sequences with a high percentage of missing data?
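One way to test the missing-data explanation is sketched below in Python. It assumes (not confirmed) that codeml drops every codon column with a gap or ambiguous base in any sequence, counts the codon columns that are clean in all sequences, and reports each sequence's fraction of missing codons, which is the number one would use to pick sequences to delete. `alignment_summary` is a hypothetical helper that takes the aligned sequences as a dict; reading test.phylip into that dict is left out, and the toy alignment is made-up data.

```python
"""Sketch: count codon columns that are free of gaps/ambiguities in every sequence.

Assumption (not confirmed): codeml discards any codon column containing a gap or an
ambiguous base in at least one sequence, so 3 x clean columns should be close to N + S.
"""

VALID_BASES = set("ACGTU")


def codon_is_clean(codon: str) -> bool:
    """True if the codon is three unambiguous nucleotides (no '-', '?', 'N', ...)."""
    return len(codon) == 3 and set(codon.upper()) <= VALID_BASES


def alignment_summary(seqs: dict[str, str]) -> tuple[int, int, dict[str, float]]:
    """Return (codons per sequence, codons clean in all sequences, missing fraction per sequence)."""
    length = len(next(iter(seqs.values())))
    if length % 3 or any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned and their length a multiple of 3")
    n_codons = length // 3
    clean_everywhere = 0
    missing = dict.fromkeys(seqs, 0)
    for start in range(0, length, 3):
        column_clean = True
        for name, seq in seqs.items():
            if not codon_is_clean(seq[start:start + 3]):
                missing[name] += 1
                column_clean = False
        if column_clean:
            clean_everywhere += 1
    fractions = {name: count / n_codons for name, count in missing.items()}
    return n_codons, clean_everywhere, fractions


# Toy alignment (hypothetical data): seq2 has a gap, seq3 an ambiguous codon.
toy = {
    "seq1": "ATGAAACCCGGGTTT",
    "seq2": "ATGAAA---GGGTTT",
    "seq3": "ATGNNNCCCGGGTTT",
}
n_codons, clean, fractions = alignment_summary(toy)
print(n_codons, clean, 3 * clean, fractions)

# Dropping a sequence with missing data can recover codon columns.
without_seq3 = {name: seq for name, seq in toy.items() if name != "seq3"}
print(alignment_summary(without_seq3)[1])
```

Sorting `fractions` in descending order lists the candidate sequences for deletion.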
2)
Below is my codeml control file. Could the decrease in the total number of sites be related to the model I used?
    seqfile = test.phylip
    outfile = results.txt
    noisy = 0
    verbose = 0
    runmode = -2
    seqtype = 1
    CodonFreq = 2
    model = 0
    NSsites = 0
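To double-check what these settings mean, here is a small hypothetical Python helper (not part of PAML) that reads a control file and annotates each option, using the descriptions from the comments in PAML's codeml.ctl template. It flags unrecognized keys as possible typos and notes when `cleandata`, the option for removing sites with ambiguity data, is not set; it does not claim what codeml's default for that option is.

```python
"""Hypothetical helper (not part of PAML): parse a codeml control file and annotate it."""

# Meanings copied from the comments in PAML's codeml.ctl template;
# only options appearing in this post (plus cleandata) are covered.
KNOWN_OPTIONS = {
    "seqfile": None,
    "outfile": None,
    "noisy": None,
    "verbose": None,
    "runmode": {"0": "user tree", "-2": "pairwise comparison"},
    "seqtype": {"1": "codons", "2": "amino acids", "3": "codons translated to amino acids"},
    "CodonFreq": {"0": "1/61 each", "1": "F1X4", "2": "F3X4", "3": "codon table"},
    "model": {"0": "one dN/dS ratio for all branches"},
    "NSsites": {"0": "one dN/dS ratio for all sites"},
    "cleandata": {"0": "keep sites with ambiguity data", "1": "remove sites with ambiguity data"},
}


def parse_ctl(text: str) -> dict[str, str]:
    """Return key -> value pairs; anything after '*' on a line is treated as a comment."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("*", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        options[key] = value
    return options


def describe(options: dict[str, str]) -> list[str]:
    """Annotate each option; unknown keys are flagged as possible typos."""
    report = []
    for key, value in options.items():
        if key not in KNOWN_OPTIONS:
            report.append(f"{key} = {value}: unrecognized option (typo?)")
            continue
        meanings = KNOWN_OPTIONS[key]
        meaning = meanings.get(value, "see codeml.ctl") if meanings else ""
        report.append(f"{key} = {value}: {meaning}" if meaning else f"{key} = {value}")
    if "cleandata" not in options:
        report.append("cleandata not set: codeml uses its default")
    return report


CTL = """
seqfile = test.phylip
outfile = results.txt
noisy = 0
verbose = 0
runmode = -2
seqtype = 1
CodonFreq = 2
model = 0
NSsites = 0
"""

for entry in describe(parse_ctl(CTL)):
    print(entry)
```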
Thanks again!
Chenyu Fan