Hello,
I would like to use mapnh to estimate dN, dS, and dN/dS for a dataset of thousands of genes from a phylogeny of several sister species, and am particularly interested in the values at the tips of the phylogeny. I used bppml to fit a YN98 model (nullModelParams = YN98.omega*=1) for each gene, and then used mapnh to estimate dN and dS for each branch for each gene separately.
I have a question about the output. The manual states that the branch lengths in the output Newick trees are counts, but I am confused about what the units actually represent.
For example, I have a 681bp alignment of several species including species A and species B. Species A has 3 synonymous substitution that can be parsimoniously assigned to its branch, and species B has 1 synonymous substitution (neither has any nonsynonymous substitutions). mapnh gave species A dS=0.0454104, dN=9.07718e-07; and species B dS=0.0266222, dN=3.52644e-07. I am confused about why the dS estimate is less than twice as high in species A even though species A has three times more substitutions than species B, and that the units do not seem to be counts or number of synonymous changes per synonymous site. Am I interpreting the units incorrectly? Alternatively, is this approach inappropriate to estimate dS for very shallowly diverged species with only a few substitutions per branch? (I am planning to average dS across thousands of genes to get a more accurate genome-wide estimate, and am aware of biases with calculating dN/dS for very shallowly-diverged species)
Thank you for this program!