Dear PAML authors,
Thanks for developing this great package!
I'm running branch-site tests on 4 species at genome scale (about 8000 gene families). The branch of interest is the ancestor of three of the species. The rooted tree file looks like this:
((Hl_A,(Hl_B,Hl_C)) #1,HL)
The LRTs identified 300 gene families as being under positive selection.
I'm not sure whether 4 species are sufficient for this analysis, so I added two outgroup species from the same genus (within 150 MYA) and ran a similar analysis on ~5700 one-to-one gene families with unrooted trees. With the 6-species data, the LRTs find only 21 families under selection, and most of these do not overlap with the genes identified in the 4-species analysis. One example is shown in the following figure. The amino acid G in the last column is identified as positively selected in the 6-species model but not in the 4-species model. The codeml output is in the attachment: the bs.A directory holds the alternative model, bs.A1 holds the null model, and cmp.table.csv contains the LRT results.
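For anyone checking the numbers, here is a minimal Python sketch of how an LRT between the alternative (bs.A) and null (bs.A1) models can be computed from their log-likelihoods. It assumes one degree of freedom and a chi-square reference distribution. The function name `branch_site_lrt` and the log-likelihood values are hypothetical, for illustration only; they are not taken from the attached output, and this is not necessarily the exact procedure behind cmp.table.csv.

```python
import math


def branch_site_lrt(lnl_alt, lnl_null):
    """Return (statistic, p_value) for a 1-df likelihood ratio test.

    lnl_alt  -- log-likelihood under the alternative model
    lnl_null -- log-likelihood under the null model
    """
    # Test statistic: twice the log-likelihood difference. Clamp at 0,
    # since optimisation noise can make the difference slightly negative.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a single gene family.
    stat, p = branch_site_lrt(lnl_alt=-2345.10, lnl_null=-2348.02)
    print(f"2*dlnL = {stat:.2f}, p = {p:.4f}")
```

With thousands of families tested, the per-family p-values would normally also need a multiple-testing correction before counting significant families.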
Is this behavior expected? I'm wondering how many species I should include in the analysis, or whether I made a mistake somewhere.