Detect positive selection at two branches connected to the root simultaneously?

Wenqiang

Dec 4, 2024, 3:34:25 AM
to PAML discussion group
Dear PAML authors,

We aim to identify rapidly evolving genes on two branches connected to the root node across 3,000 single-copy gene families. Our assumption is that some genes experienced rapid evolution on each ancestral branch following the divergence between groups A and B.

Here are the rooted tree structures we used:

  • Tree A: ((A1, A2)#1, (B1, B2))
  • Tree B: ((A1, A2), (B1, B2)#1)

We employed the two-ratio branch model to analyze these trees and calculated p-values using likelihood ratio tests (LRT). However, we observed that the rapidly evolving genes identified on branch A are exactly the same as those identified on branch B. Similarly, we obtained identical results using the branch-site model.
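For readers who want to reproduce this kind of comparison, here is a minimal sketch of how the two codeml control files could be generated: `model = 0` for the one-ratio null (M0) and `model = 2` for the two-ratio alternative that reads the `#1` label from the tree. The function name, file names, and the choices of `CodonFreq`, `cleandata`, and starting values are illustrative assumptions, not the settings used in the attached runs.

```python
def make_codeml_ctl(seqfile, treefile, outfile, model):
    """Return codeml control-file text for a branch-model run.

    model=0 -> one-ratio model (M0, the null)
    model=2 -> branch model using the #1 labels in the tree (the alternative)
    """
    settings = {
        "seqfile": seqfile,    # codon alignment (hypothetical file name)
        "treefile": treefile,  # Newick tree, labelled with #1 for model=2
        "outfile": outfile,
        "runmode": 0,          # user-supplied tree
        "seqtype": 1,          # codon sequences
        "CodonFreq": 2,        # F3x4 (assumed; not stated in the post)
        "clock": 0,            # no molecular clock
        "model": model,
        "NSsites": 0,          # one omega across sites (branch model, not branch-site)
        "icode": 0,            # universal genetic code
        "fix_kappa": 0,
        "kappa": 2,            # starting value only
        "fix_omega": 0,
        "omega": 0.5,          # starting value only
        "cleandata": 0,        # assumed
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in settings.items()) + "\n"


# Hypothetical file names for one gene family.
null_ctl = make_codeml_ctl("OG0001481.phy", "tree_unlabelled.nwk", "M0.out", model=0)
alt_ctl = make_codeml_ctl("OG0001481.phy", "tree_A.nwk", "b_free_A.out", model=2)
print(alt_ctl)
```

The null and alternative differ only in `model` and in whether the tree carries a `#1` label, so the two fits differ by a single free ω parameter.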

For reference, the outputs from CodeML for one gene family are attached. These include:

  • The b_free directory for the alternative model,
  • The M0 directory for the null model,
  • Results for the two groups, labeled as "argas" and "hard,"
  • A summary table (bfree.table.csv) containing the LRT results.

Based on my understanding of the two-ratio branch model, the results for the two branches should differ. Could you kindly clarify if I might be misunderstanding the model, or if there is another explanation for these observations?

Thank you very much for your time and assistance.

Best regards,
Wenqiang

OG0001481.zip

Wenqiang

Dec 4, 2024, 7:32:49 PM
to PAML discussion group
For instance, the LRT gives the following results for Tree A and Tree B:

Test    Family     Alt. model  lnL (alt.)    Null model  lnL (null)    LRT p-value  w0       w1       dS
Tree A  OG0001481  b_free.A    -3609.250709  M0          -3614.540765  0.001143107  0.0276   0.07608  1.2796
Tree B  OG0001481  b_free.B    -3607.14045   M0          -3614.540765  0.000119496  0.02682  0.41626  0.1684
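The p-values in the table above can be reproduced from the log-likelihoods alone. The sketch below assumes the usual setup for this comparison: the two-ratio model has one more free ω than M0, so the statistic 2ΔlnL is compared to a chi-square distribution with 1 degree of freedom. For 1 degree of freedom the chi-square tail probability equals `erfc(sqrt(x / 2))`, so only the standard library is needed. The function name is illustrative.

```python
from math import erfc, sqrt


def lrt_pvalue_df1(lnl_alt, lnl_null):
    """Likelihood ratio test with 1 degree of freedom.

    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null)
    and p_value is the chi-square (df=1) upper-tail probability.
    """
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negative values
    return stat, erfc(sqrt(stat / 2.0))


# Log-likelihoods for family OG0001481, taken from the table.
stat_a, p_a = lrt_pvalue_df1(-3609.250709, -3614.540765)
stat_b, p_b = lrt_pvalue_df1(-3607.14045, -3614.540765)

print(f"Tree A: 2*dlnL = {stat_a:.6f}, p = {p_a:.9f}")
print(f"Tree B: 2*dlnL = {stat_b:.6f}, p = {p_b:.9f}")
```

This test asks whether ω on the labelled branch differs from the background ω, not whether it exceeds 1. If w1 in the table is the ω of the labelled branch, both estimates (0.07608 and 0.41626) are below 1.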

So both branches connected to the root show significant positive selection. Can I trust both results, or should I keep only the one with the larger log-likelihood (Ln) value?

Wenqiang

Sishuo Wang

Dec 4, 2024, 9:52:08 PM
to PAML discussion group
Dear Wenqiang,

It's an interesting question. By "identical results", do you mean that exactly the same genes are detected on both branches among all the genes you analyzed? In your example, the p-values differ but are both < 0.05 (0.001143107 vs. 0.000119496). Is this what you meant?

best,
sishuo

Wenqiang

Dec 5, 2024, 2:39:28 AM
to PAML discussion group
Thanks for the quick reply!

Tests on tree A identified 621 families with p < 0.05, and tests on tree B identified 625 families with p < 0.05. 455 gene families are significant in both tests. The example I gave is one of these 455 families.
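To put the size of this overlap in context, here is a rough sketch that compares it with what two independent tests would be expected to share. It assumes both tests were run on the same 3,000 families (the approximate figure from the first post) and treats the two result sets as independent draws. That independence is an assumption made only for the comparison: the two tests share the same alignments and the same M0 null fit, so some correlation is expected regardless of biology.

```python
from math import comb, log10

N_FAMILIES = 3000  # assumption: both tests covered the same ~3,000 families
HITS_A = 621       # families with p < 0.05 on tree A
HITS_B = 625       # families with p < 0.05 on tree B
SHARED = 455       # families significant in both tests

# Expected overlap if the two sets were independent.
expected = HITS_A * HITS_B / N_FAMILIES
fold = SHARED / expected


def log10_hypergeom_tail(N, K, n, k_min):
    """log10 of P(X >= k_min) for X ~ Hypergeometric(N, K, n).

    Uses exact integer arithmetic, then takes logs, so very small
    probabilities do not underflow to zero.
    """
    numerator = sum(
        comb(K, k) * comb(N - K, n - k) for k in range(k_min, min(K, n) + 1)
    )
    return log10(numerator) - log10(comb(N, n))


log10_p = log10_hypergeom_tail(N_FAMILIES, HITS_A, HITS_B, SHARED)

print(f"expected overlap under independence: {expected:.1f}")
print(f"observed / expected: {fold:.2f}")
print(f"log10 P(overlap >= {SHARED}): {log10_p:.1f}")
```

A large excess over the expected overlap shows that the two tests are strongly correlated. On its own it does not distinguish a shared biological signal from a methodological cause.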

Wenqiang

Sishuo Wang

Dec 11, 2024, 10:29:51 PM
to PAML discussion group
Dear Wenqiang,

Thank you!

So, if I understand correctly, it is not the case that "we observed that the rapidly evolving genes identified on branch A are exactly the same as those identified on branch B"? Perhaps there is something I misunderstood.

best,
sishuo

Wenqiang

Dec 12, 2024, 3:18:07 AM
to PAML discussion group
Thanks for the reply!

My initial description was misleading. The numbers of positively selected families on each branch given in my previous message are the accurate ones.

Overall, the overlap in positively selected genes between the two branches is relatively high. I'm wondering whether it is a true biological signal or a statistical artifact. As the two branches are connected to the root, it might be difficult for PAML to infer the ancestral state at the root. Does codeml have lower sensitivity around the root node?

Best regards,
Wenqiang

