dN/dS does not match dN divided by dS

242 views
Skip to first unread message

Nicholas Bailey

unread,
Jun 10, 2021, 4:56:01 PM6/10/21
to PAML discussion group
Hello all.

I have codeml output and am trying to extract dN and dS separately so I can exclude branches with low dS and calculate dN/dS myself. But in trying to do this I see that I sometimes have non-zero dN/dS for branches with 0 dN and/or dS. This is shown in the following output:

dN & dS for each branch

branch          t       N       S   dN/dS      dN      dS  N*dN  S*dS

  27..28     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  28..29     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  29..30     0.000  1116.2   452.8  0.2681  0.0000  0.0000   0.0   0.0

  30..31     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  31..2      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  31..14     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  30..32     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  32..1      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  32..15     0.004  1116.2   452.8  0.0487  0.0002  0.0040   0.2   1.8

  30..33     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  33..4      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  33..19     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  29..34     0.004  1116.2   452.8  0.0001  0.0000  0.0044   0.0   2.0

  34..35     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  35..5      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  35..17     0.004  1116.2   452.8  0.0487  0.0002  0.0040   0.2   1.8

  34..36     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  36..7      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  36..20     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  34..37     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  37..12     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  37..18     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  34..38     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  38..8      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  38..24     0.008  1116.2   452.8  0.0487  0.0004  0.0079   0.4   3.6

  28..39     0.000  1116.2   452.8  0.5418  0.0000  0.0000   0.0   0.0

  39..40     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  40..3      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  40..21     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  39..41     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  41..6      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  41..16     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  39..42     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  42..10     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  42..25     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  27..43     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  43..44     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  44..45     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  45..13     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  45..26     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  44..46     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9

  46..9      0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  46..23     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  44..47     0.004  1116.2   452.8  0.0487  0.0002  0.0040   0.2   1.8

  47..11     0.000  1116.2   452.8  0.0487  0.0000  0.0000   0.0   0.0

  47..22     0.002  1116.2   452.8  0.0487  0.0001  0.0020   0.1   0.9


Prior to analysis I labeled branches corresponding to 29..30, 29..34 and 28..39. I am particularly interested in branch 29..30. I realize the branches without special labeling all have the same dN/dS, which partially explains why they won't always match up to dN and dS individually. Regardless I am still confused about exactly what calculation is taking place for each branch such that individual dN/dS values don't match the corresponding dN/dS. I would greatly appreciate some help here as I can't imagine how I'll filter based on dS values when I don't know how well they correspond to the given dN/dS values.

Ziheng

unread,
Jun 25, 2021, 6:55:49 AM6/25/21
to PAML discussion group

the output in this block represents in effect predicted values from the model using the MLEs of parameter values.  they are not observed values.  we do not really observe those things.
t is branch length, expected number of nucleotide substitutions per codon
N: the number of nonsynonymous sites.
S: the number of synonymous sites.  N and S are calculated using sequence length, kappa, and codon frequencies.  see chapter 2 in yang 2014.
dN/dS: the w ratio.  this is the MLE of the parameter in the model.  i think you are using a branch model so a few ratios (w0, w1, w2, ...) are estimated and some branches have the same ratio.
dN: the expected number of nonsynonymous substitutions per nonsynonymous site
dS: the expected number of synonymous substitutions per synonymous site.  dN and dS are calculated using the branch length, kappa, w, codon frequencies.  agains see chapoter 2 in yang 2014.
N*dN: this gives the (predicted) number of nonsynonymous substitutions for the whole gene sequence along the branch
S*dS: : the (predicted) number of synonymous substitutions for the whole gene sequence along the branch

so along branch 47..22, we have 0.1 and 0.9, which means that most likely one change occurred alongthat branch and it is more likely to be a nonsynonymous change rather than synonymous change.  because we do not observe the ancestral sequences, there will be uncertainties in such kind of guesses or inference.

in theory if you know what the columns are, it should be easy to explain why for each branch dN divided by dS is not dN/dS.

"I have codeml output and am trying to extract dN and dS separately so I can exclude branches with low dS and calculate dN/dS myself."
the sequences are extremely similar, some apparently idnetical, so there is little information in the data about things like dN/dS.  you may be looking for trouble if you want to calculate dN/dS yourself.
ziheng

Nicholas Bailey

unread,
Jun 29, 2021, 11:15:31 AM6/29/21
to PAML discussion group
Thank you for this thorough explanation! I had read the chapter you reference but I see I clearly still had conceptual difficulties. I've been reading it over again and believe I'm starting to get a better sense of what codeml is doing. Regardless, I'll be sure to filter the data if there's little info and use the dN/dS values already provided.
Reply all
Reply to author
Forward
0 new messages