Taxon sampling for PAML

Yihang Zhao

Feb 21, 2025, 4:19:15 PM
to PAML discussion group
Hi,

I'm working on a project focused on Sauropsida species. When choosing sequences to include in the dataset, I have two questions:

1) Can I include two identical sequences from two closely related species, or will that bias the results estimated by PAML?

2) Different Sauropsida genera have very different numbers of sequenced species. Will it bias PAML's estimates if I pick sequences from multiple species in one group but only one sequence from other groups? For example, I have 2-3 sequences each from the genera Hydrophis, Lacerta, Varanus, and Protobothrops, but for most other groups only one sequence is available per genus.

Thanks in advance for answering my question!

Best
Yihang

Yihang Zhao

Feb 21, 2025, 4:35:01 PM
to PAML discussion group
Oh, and one more question related to my previous one: since I plan to investigate some questions about snakes using my Sauropsida dataset, I have disproportionately more snake sequences than sequences from other groups (lizards, turtles, birds, etc.). Will this overrepresentation of snakes bias the estimates from PAML? Is there a general rule about how many species to include for each group to avoid biased estimates caused by overrepresenting one group? For example, should I keep roughly the same number of species per group, does it depend on how many species exist in a particular group, or is overrepresentation not an issue?

Best
Yihang