Interpretation of p value of 0 in branch model b_neut vs b_free

35 views
Skip to first unread message

Kitty Murphy

unread,
Nov 24, 2021, 9:54:13 AM11/24/21
to The ETE toolkit
Hi ETE3 team, 

Could you clarify how to interpret a p value of 0 using the branch model test b_neut vs b_free? Initially I thought these would be interpreted as significant, but I realised for the same gene I would have an insignificant p value after running the LRT M0 vs b_free and then a significant value when running the second LRT b_neut vs b_free, which didn't make sense to me. 

After looking into it, it seems that in PAML outputs a p-value of 0 when the LRT is negative: https://www.biostars.org/p/166718/, however, as ETE3 doesn't output the LRT value it's not possible to decipher this.

Many thanks, 
Kitty


Kitty Murphy

unread,
Jan 20, 2022, 7:36:22 AM1/20/22
to The ETE toolkit
I also found this to be the case when comparing genes analysed using the branch and branch-site model, such that genes with a p-value of 0 using the branch model had a non-significant p value using the branch-site model. 

fransua

unread,
Jan 20, 2022, 9:42:09 AM1/20/22
to The ETE toolkit
Hi,
ETE3 returns p-value computed internally either using chi2 from scipy if installed or using built-in function (I recommend to install scipy).
p-value of 0 should mean significant.
btw, ETE3 is not removing files generated by PAML, you should be able to get the likelihoods from the main outfiles and reproduce the LRT.

finally I do not see inconsistencies in the first case you explain,  seems that b_free (the most complicated model) is better than M0 and than b_neut.
In the second case I don't understand what you did...

let me know if I can help further (would need more details :) )

cheers

Kitty Murphy

unread,
Jan 21, 2022, 9:42:18 AM1/21/22
to The ETE toolkit
Hi Fransua, 

Thank you for the explanation regarding how the p-value is computed, and good to know about the PAML files, I rarely look past the log file with the main results! 

I just assumed if the b_neut b_free LRT is significant (the branch of interest has omega significantly greater than 1), then the M0 b_free LRT should also be significant (the branch of interest is not evolving neutrally). I should probably look back at exactly how these models are working.. 

I'm still confused about the second case (significant in branch model (bneut,bfree) but not branch-site (bSA1,bsA)), perhaps I'm calling the models incorrectly? Below is my code for running the branch and branch-site models. 

BRANCH
ete3 evol -t species_tree.nw --alg $gene-clean.fasta -o $gene-branch --models M0 b_neut b_free --tests b_free,M0 b_free,b_neut --cpu 16 --mark ${marks[*]} >> $gene.branch.log

BRANCH-SITE
ete3 evol -t species_tree.nw --alg $gene-clean.fasta -o $gene-branch-site --models M0 M1 bsA bsA1 --tests M1,bsA bsA1,bsA --cpu 16 --mark ${marks[*]} >> $gene.branch-site.log

Let me know if there is anything else I can share. 

Thanks

fransua

unread,
Jan 21, 2022, 9:50:23 AM1/21/22
to The ETE toolkit
Hi Kitty,
it's a bit complicated without the outputs, but b_free vs b_neut does not mean positive selection, it just means that b_neut (omega = 1) is not working well, but b_free can be pointing at omega <1 :)
also (even if your omega is > 1 ) testing on branches (b_free b_neut) is very hard to extrapolate to tests on branches and sites, the number of degrees of freedom is very different. Meaning that even if overall a given branch has a high likelihood to be evolving at a omega rate higher than 1, it can be that at the level of sites their is not enough statistical power.

Kitty Murphy

unread,
Jan 21, 2022, 9:56:51 AM1/21/22
to The ETE toolkit
Thanks for the quick response! 

Yes, sorry, I am only interpreting b_free vs b_neut as positive selection if omega on the branch of interest is greater than 1.  

I see, this make sense! Thank you so much! 

Reply all
Reply to author
Forward
0 new messages