branch and branch-sites tests yield different results

Lulu

Jan 24, 2024, 9:53:17 PM

to PAML discussion group

Hello!

I used codeml to test for positive selection using the sites, branch, and branch-sites tests. I noticed that the branch-sites test identified one branch as under selection (significant pchi log likelihood test, one site from BEB analysis was significant). However, when I did the branch test for this branch, the pchi log likelihood test was not significant. I was wondering why this could have occurred and how to interpret these conflicting outcomes. Conversely, one branch I tested with branch-sites test was NOT significant (via pchi log likelihood test) but was significant with the branch test (via pchi log likelihood test).

Thank you!!

Sandra AC

Feb 8, 2024, 5:45:55 AM

to PAML discussion group

Hi there!

Note that each model you have been running with CODEML assumes a different scenario regarding the omega ratio (ω). Below, you can find a short summary of what we explain in our latest protocol (Álvarez-Carretero et al., 2023):

Branch models assume different ω ratio parameters for different branches on the phylogeny (Yang 1998; Yang and Nielsen 1998). They may be used to detect positive selection acting on particular lineages, without averaging the ω ratio throughout the phylogenetic tree. For instance, they are useful for detecting positive selection after gene duplications, where one copy of the duplicates may have acquired a new function, and thus may have evolved at an accelerated rate.
Site models treat the ω ratio for any site (codon) in the gene as a random variable from a statistical distribution, thus allowing ω to vary among codons (Nielsen and Yang 1998; Yang et al. 2000). Positive selection is defined as the presence of some codons at which ω > 1.
Branch-site models aim to detect positive selection that affects only a few sites on prespecified lineages (Yang and Nielsen 2002). Branches under test for positive selection are called foreground branches, whereas all other branches on the tree are the background branches.
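
As a rough illustration of how these model families map onto a CODEML control file, the sketch below lists the `model`, `NSsites`, and `fix_omega` settings commonly used for each. It is only a sketch: the dictionary, the helper name `control_lines`, and the starting value `omega = 0.5` are assumptions made for illustration, and all other control-file options (sequence file, tree file, codon frequencies, etc.) are omitted.

```python
# Illustrative mapping from model family to codeml control-file options.
# The option names (model, NSsites, fix_omega, omega) are codeml variables;
# the dictionary layout and the initial omega of 0.5 are assumptions.
CODEML_SETTINGS = {
    "M0 (one ratio)": {"model": 0, "NSsites": "0", "fix_omega": 0, "omega": 0.5},
    "branch": {"model": 2, "NSsites": "0", "fix_omega": 0, "omega": 0.5},
    "site (M1a vs M2a)": {"model": 0, "NSsites": "1 2", "fix_omega": 0, "omega": 0.5},
    "branch-site A (alternative)": {"model": 2, "NSsites": "2", "fix_omega": 0, "omega": 0.5},
    # Null of the branch-site test: foreground omega fixed at 1.
    "branch-site A (null)": {"model": 2, "NSsites": "2", "fix_omega": 1, "omega": 1},
}


def control_lines(name):
    """Return the 'key = value' lines for one model family."""
    return [f"{key} = {value}" for key, value in CODEML_SETTINGS[name].items()]


if __name__ == "__main__":
    for family in CODEML_SETTINGS:
        print(f"* {family}")
        for line in control_lines(family):
            print(f"    {line}")
```

The branch and branch-site settings only make sense together with a tree file in which the foreground branches are labelled (e.g., with `#1`).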

For instance, the branch-site model assumes that ω varies both among lineages and across sites. From what you describe in your message, it seems that your analyses with CODEML identified one amino acid site along the foreground branch or branches that you specified (I am not sure whether you specified one or more branches as foreground) to be under positive selection. It is not that you have detected a branch under selection, which may be what confused you.
Under a branch model, the assumptions are different: ω is assumed to vary across the branches of the tree (not across sites!). If the null model (M0) fitted your data better than the branch model (i.e., according to the LRT), then the ω ratios for the lineages you labelled as foreground are not significantly different from the ω ratios of the background branches. Note that you always need to have an "a priori" hypothesis when you label branches as "foreground branches" (i.e., you specify the lineages you believe may have been under positive selection as "foreground branches" before you test for positive selection). In addition, remember that if multiple branches on the tree are tested for positive selection (i.e., as foreground branches) without any "a priori" hypothesis, then you need to correct for multiple testing.
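
To make the LRT and the multiple-testing correction mentioned above concrete, here is a minimal sketch using only the Python standard library. The function names and the log-likelihood values in the usage example are hypothetical. The chi-square tail probability is written in closed form for 1 and 2 degrees of freedom only, and Holm's step-down procedure is used as one possible correction (it is my choice for this sketch, not something prescribed by the protocol).

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df=1):
    """Return (2*delta_lnL, p-value) for a likelihood ratio test.

    Uses closed-form chi-square tail probabilities, so only df = 1 or 2
    is supported in this sketch.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), keep monotone.
        running_max = max(running_max, (m - rank) * pvalues[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    # Hypothetical log-likelihoods (null, alternative), one pair per tested branch.
    tests = [(-2105.31, -2101.02), (-2105.31, -2104.80), (-2105.31, -2103.10)]
    raw = [lrt_pvalue(null, alt, df=1)[1] for null, alt in tests]
    for p, p_adj in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.4f}, Holm-adjusted p = {p_adj:.4f}")
```

With a single foreground ω compared against M0, the branch test has one extra parameter, hence `df=1` in the example; the degrees of freedom should be checked for each pair of models being compared.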

You can go through our protocol to better understand how to run CODEML to test for positive selection alongside our GitHub "positive-selection" repository -- you can go through every example in the protocol and, at the same time, run them in a reproducible manner by following our repository.

Hope this helps!
S.

Lulu

Mar 11, 2024, 3:54:25 PM

to PAML discussion group

Hi! This was a very, very helpful response, thank you! I apologize for my late response.

In regard to your point about testing multiple branches on the tree: I am interested in a trait that has evolved in two species (from the same genus) but is not present in the other two species (from a different, yet closely related genus) in my gene tree. My hypothesis is that the branches for these two species are under selection. However, I have been testing each branch (branch test) for each species (and correcting for multiple tests) and was unsure whether this approach is correct. My reasoning for testing all branches is that I wouldn't want to conclude that the two lineages I expected to be under selection are unique if the other two species could just as well be under selection (making it a more general pattern) and I simply didn't test for it. What would you recommend?

Thank you for your help!!

Lulu

Mar 11, 2024, 4:35:49 PM

to PAML discussion group

Oh, I wanted to add that I have been referring to the paper by Anisimova & Yang 2007 (Figure 2 and Table 6 have been especially helpful to me conceptually). In that example, they found that foreground branches 2 (cow), 5 (cat), and 6 (the internal branch leading to the clade that contains cow and cat) were under selection. Does the finding that branch 6, a more ancestral branch, is under selection strengthen their confidence that 2 and 5 are under selection, given that it would indicate episodic or prolonged selection on the ancestral lineage as well?

Thank you!!

Sandra AC

Mar 18, 2024, 10:33:47 AM

to PAML discussion group

Hi Lulu,

There is no single answer to your questions, which is why they are part of the biological question you are trying to answer :)

With regard to your first message, you say that you have a hypothesis (i.e., two branches are under positive selection based on the protein-coding gene you are analysing, if I understood correctly), yet you are running a branch test analysis that does not match your hypothesis (i.e., multiple testing). If you have an a priori hypothesis (which you do), I would not test other hypotheses, mainly because what you would observe may not answer your initial question/s.
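
One way to encode such an a priori hypothesis is to label only the two focal species as foreground in the tree file given to CODEML. The sketch below does this for a topology-only Newick string (no branch lengths). The species names `sp1` to `sp4`, the topology, and the helper `label_foreground` are hypothetical, and giving both tips the same `#1` label (so that they share one foreground ω) is a modelling choice you would need to justify.

```python
import re


def label_foreground(newick, tips, mark="#1"):
    """Append a branch label after each named tip of a topology-only Newick tree."""
    for tip in tips:
        # Match the tip name as a whole token followed by ',' or ')'.
        pattern = r"(?<![\w.])" + re.escape(tip) + r"(?=\s*[,)])"
        newick, count = re.subn(pattern, lambda m: f"{m.group(0)} {mark}", newick)
        if count != 1:
            raise ValueError(f"tip {tip!r} matched {count} times, expected once")
    return newick


if __name__ == "__main__":
    tree = "((sp1,sp2),(sp3,sp4));"  # hypothetical four-species gene tree
    print(label_foreground(tree, ["sp1", "sp2"]))
```

The labelled tree can then be used with a branch or branch-site model, so that a single test addresses the hypothesis directly instead of testing every branch in turn.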

With regard to the results you highlight in Anisimova & Yang 2007, they wrote the following:

```
The factors potentially driving such lineage-specific positive selection in CD2 are not well understood. As discussed by Lynn et al. (2005), CD2 has different counterreceptors in different species (CD58 in humans, pigs, and cats but CD48 in rodents). As a result, there may be selective pressure to optimize the interaction of CD2 with its counterreceptor. Interactions with viral proteins could also be responsible for species-specific positive selection driving adaptive evolution in CD2, as different mammals act as hosts for different viruses.
```

Note that, once you evaluate the results obtained from a positive selection test, you need to go back to your initial hypothesis/es and think about the protein-coding gene/s you are analysing (e.g., function, differences across species, etc.). To my understanding, in the study cited above, they put their results in the context of the gene they were analysing and of how, biologically, the positive selection results could be explained. I would not say that one branch strengthened the confidence of another branch being under selection, as they were tested as independent foreground branches (but I may be wrong).

What I want to emphasise is that you need to always go back to your biological question/s to understand why the branch/es you tested may (or may not) be under positive selection -- that's why understanding the data you are analysing is key! Perhaps other people in this discussion group may have other comments/suggestions to add :)

All the best,
S.
