mapnh output

131 views
Skip to first unread message

elsemikk

unread,
Jun 23, 2021, 5:12:40 PM6/23/21
to Bio++ Usage Help Forum
Hello,
I would like to use mapnh to estimate dN, dS, and dN/dS for a dataset of thousands of genes from a phylogeny of several sister species, and am particularly interested in the values at the tips of the phylogeny. I used bppml to fit a YN98 model (nullModelParams = YN98.omega*=1) for each gene, and then used mapnh to estimate dN and dS for each branch for each gene separately.

I have a question about the output. The manual states that the branch lengths in the output Newick trees are counts, but I am confused about what the units actually represent. 
For example, I have a 681bp alignment of several species including species A and species B. Species A has 3 synonymous substitution that can be parsimoniously assigned to its branch, and species B has 1 synonymous substitution (neither has any nonsynonymous substitutions). mapnh gave species A dS=0.0454104, dN=9.07718e-07; and species B dS=0.0266222, dN=3.52644e-07. I am confused about why the dS estimate is less than twice as high in species A even though species A has three times more substitutions than species B, and that the units do not seem to be counts or number of synonymous changes per synonymous site. Am I interpreting the units incorrectly? Alternatively, is this approach inappropriate to estimate dS for very shallowly diverged species with only a few substitutions per branch? (I am planning to average dS across thousands of genes to get a more accurate genome-wide estimate, and am aware of biases with calculating dN/dS for very shallowly-diverged species)

Thank you for this program!

Laurent Guéguen

unread,
Jun 29, 2021, 3:27:18 AM6/29/21
to Bio++ Usage Help Forum

Hello,

sorry I thought I had already replied to you, but apparently not.

The normalization of dS and dN is done through the division of the counts  by the expected counts the same model, but neutral, would have done on the same history.
It means that, in comparison with the usual normalization way, which is division by the number of (non-)synonymous sites, those dN & dS are per time unit, and not
proportional to the branch length.
If you want to have counts comparable to the usual dN & dS, you have then to multiply these values by the branch lengths ( and I would guess that in your case branch to A is
roughly 3 times longer than branch to B).

Cheers,
Laurent

elsemikk

unread,
Jun 29, 2021, 1:40:21 PM6/29/21
to Bio++ Usage Help Forum
Ah ok thank you very much! Yes species A has its branch length (0.0260) longer than species B (0.0171), so that makes sense.
Best wishes,
Else

elsemikk

unread,
Sep 18, 2021, 2:42:29 PM9/18/21
to Bio++ Usage Help Forum
Hello,
I have a quick follow up question: 
my understanding was that the dN and dS output by mapnh are already standardized by the number of synonymous or nonsynonymous sites as in standard dN and dS, but I just read a paper where the authors divide the dN and dS output of mapnh by the number of synonymous or nonsynonymous sites before computing dN/dS. Are the dN and dS outputs of mapnh already standardized by the number of nonsynonymous and synonymous sites? Or is that a necessary step for the user to do before computing dN/dS from the output?
Thank you!

Laurent Guéguen

unread,
Sep 19, 2021, 3:00:37 PM9/19/21
to Bio++ Usage Help Forum
Hello,

it depends of the paper. In the first versions of mapnh, there was no normalization, so the outputs where the raw counts of (non-)synonymous substitutions, that were divided by the number of relevant sites, as in the usual way. Now, if you set
a normalization factor in mapnh, the output is the normalized number, without the need of division per the number of sites.
I agree that the change may be confusing, but if you do not set the normalization option on the parameter file of mapnh, the raw number of counts is still output.
I hope it is more clear now.

Cheers,
Laurent

elsemikk

unread,
Sep 20, 2021, 1:54:48 AM9/20/21
to Bio++ Usage Help Forum
Thank you for explaining, I'm afraid I'm still a bit confused. I could not find any parameter about normalization in the mapnh manual. I am using mapnh v1.1.1 (downloaded from testnh release version 2.3.2).
My output from mapnh seems like it is already in the format of substitutions per site, for example:
I have a large alignment where I observe 0.44% divergence between two sister sequences. I used bppml to optimize branch lengths of my phylogeny and got branchlengths of 0.00713467 and 0.00654979 for the two sisters (which seems to make sense, to expect ~1.4% divergence per codon). Then I estimate dN and dS with mapnh and get dS=0.00524192 and dS=0.00563235 for the two sister branches, added together that gives pairwise dS=0.011 between the two, which seems to make sense and matches an estimate from paml of dS=0.013. If I were to multiply the dS output values by the branchlengths of the input tree, then I would get a number that is a few orders of magnitude smaller than expected for dS. Maybe I am confused, but are the values output in *.counts_dS.dnd not already in substitutions per site? The numbers seem to be too small to be [(counts)/(expected counts)] unless I am misunderstanding.
Thank you for your help,
Else

Laurent Guéguen

unread,
Sep 22, 2021, 4:36:19 AM9/22/21
to Bio++ Usage Help Forum

Hi Else, could you show your config files, please?
Cheers,
Laurent
Reply all
Reply to author
Forward
0 new messages