CODEML - Calculate dN/dS ratio by free-ratio model (model = 1; NSsites = 0)


Elaine

Apr 17, 2020, 4:13:59 AM
to PAML discussion group

Hi there everyone,

I am new to codeml, and I am calculating the dN/dS ratio for 42 orthologs (CDS) from 32 species.

  1. At the head of the result file (something like "mlc"), there are some question marks (?) in the sequences. Do they represent stop codons? Will they influence the dN/dS estimates?

  2. When I do the LRT, can I use the lnL values from the one-ratio model and the free-ratio model to get the p-value? If the p-value is < 0.05, does that mean the dN/dS ratio from the free-ratio model is reliable for each branch?

  3. Is the free-ratio model (model = 1; NSsites = 0) reliable for calculating the dN/dS ratio for these 42 branches?
  • I tried it, and some branches show a large omega value. I found a reply from Ziheng that says "That model is parameter rich and the estimates may involve large sampling errors. A very large omega for a very short branch, for example, does not mean much." Does anyone know what that means?
  • Or do I need to set each branch as foreground separately to get its ratio?
  • Are there any good starting points or ways to do this?


Thanks so much!
Elaine

Ziheng

Jul 12, 2020, 4:41:32 AM
to pamlso...@googlegroups.com
  1. At the head of the result file (something like "mlc"), there are some question marks (?) in the sequences. Do they represent stop codons? Will they influence the dN/dS estimates?

I am not sure. Could they be alignment gaps or ambiguity characters? What do you have in your data?
  2. When I do the LRT, can I use the lnL values from the one-ratio model and the free-ratio model to get the p-value? If the p-value is < 0.05, does that mean the dN/dS ratio from the free-ratio model is reliable for each branch?

No. A p-value < 0.05 means that the dN/dS ratio is variable among branches, or more precisely, that the null hypothesis of the same dN/dS for all branches is rejected. It does not mean that the branch-specific dN/dS values are reliably estimated.
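
For reference, here is a minimal sketch of that likelihood ratio test in Python. It assumes an unrooted, fully bifurcating tree, so that the free-ratio model has one omega per branch (2n - 3 branches for n species) and the test has 2n - 4 degrees of freedom. The function names and the lnL values are made up for illustration; they are not from any real run.

```python
import math

def chi2_sf_even_df(stat, df):
    """Upper-tail probability of a chi-square variable, for even df only.

    Uses the closed form exp(-h) * sum_{i<m} h^i / i!, with h = stat / 2 and
    m = df / 2, summing the terms in log space to avoid overflow.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if stat <= 0:
        return 1.0
    h = stat / 2.0
    m = df // 2
    return sum(math.exp(-h + i * math.log(h) - math.lgamma(i + 1)) for i in range(m))

def free_ratio_lrt(lnl_one_ratio, lnl_free_ratio, n_species):
    """LRT of the one-ratio model (null) against the free-ratio model."""
    n_branches = 2 * n_species - 3   # unrooted, fully bifurcating tree (assumed)
    df = n_branches - 1              # one omega per branch vs. a single omega
    stat = max(0.0, 2.0 * (lnl_free_ratio - lnl_one_ratio))
    return stat, df, chi2_sf_even_df(stat, df)

# Hypothetical lnL values, for illustration only.
stat, df, p = free_ratio_lrt(-25000.0, -24950.0, n_species=32)
print(f"2*delta lnL = {stat:.2f}, df = {df}, p = {p:.3g}")
```

A small p-value here only says that a single omega for all branches is rejected; it says nothing about how precise each branch estimate is.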
  3. Is the free-ratio model (model = 1; NSsites = 0) reliable for calculating the dN/dS ratio for these 42 branches?
  • I tried it, and some branches show a large omega value. I found a reply from Ziheng that says "That model is parameter rich and the estimates may involve large sampling errors. A very large omega for a very short branch, for example, does not mean much." Does anyone know what that means?
  • Or do I need to set each branch as foreground separately to get its ratio?
  • Are there any good starting points or ways to do this?
Depending on the data and the species phylogeny, it may not be possible to estimate the dN/dS ratios for the individual branches reliably.
You can run the same analysis multiple times to make sure that you get the MLEs, but the estimates may still not be reliable, simply because there is not much information in the data.
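
One way to set up such repeated runs is sketched below: a small Python helper that writes the control-file text for the same model with different starting values of omega, so the resulting lnL values can be compared. The helper name, file names, and starting values are hypothetical, and the other settings (F3x4 codon frequencies, estimated kappa) are only assumed choices, not a recommendation.

```python
def make_codeml_ctl(seqfile, treefile, outfile, model, init_omega):
    """Return the text of a codeml control file for a branch-model run."""
    settings = {
        "seqfile": seqfile,
        "treefile": treefile,
        "outfile": outfile,
        "runmode": 0,         # use the user-supplied tree
        "seqtype": 1,         # codon sequences
        "CodonFreq": 2,       # F3x4 (assumed choice)
        "model": model,       # 0 = one ratio, 1 = free ratio
        "NSsites": 0,         # no variation in omega among sites
        "fix_kappa": 0,       # estimate kappa
        "kappa": 2,           # starting value for kappa
        "fix_omega": 0,       # estimate omega
        "omega": init_omega,  # starting value for the optimisation
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"

# Hypothetical file names and starting values.
control_files = {
    f"free_ratio_w{w}.ctl": make_codeml_ctl(
        "genes.phy", "species.tre", f"mlc_free_w{w}", model=1, init_omega=w
    )
    for w in (0.1, 0.5, 1.5)
}
print(control_files["free_ratio_w0.5.ctl"])
```

If the runs converge to the same lnL, the optimisation has probably found the MLEs; if they differ, the run with the highest lnL is the one to keep.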
If you flip a coin 10 times (or even 5 times), you won't get a reliable estimate of the probability of heads. Here the branch length is like the total number of coin tosses: when it is small, you won't see many synonymous or nonsynonymous changes, and it won't be possible to get reliable estimates of their ratio.
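
The coin-toss analogy can be made concrete with a few lines of Python. This is only an illustration of sampling error for a fair coin, not a model of codon substitution; the function names are made up for the example.

```python
import math
import random

def proportion_se(p, n):
    """Standard error of the estimated probability of heads from n tosses."""
    return math.sqrt(p * (1.0 - p) / n)

def simulate_estimates(p, n, replicates, rng):
    """Estimate the probability of heads from n tosses, repeated many times."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(replicates)]

rng = random.Random(1)  # fixed seed so the illustration is repeatable
for n in (5, 10, 100, 1000):
    estimates = simulate_estimates(0.5, n, 2000, rng)
    print(n, round(proportion_se(0.5, n), 3), min(estimates), max(estimates))
```

With 5 or 10 tosses the estimates range widely around 0.5, and the spread shrinks only as the number of tosses grows, which is the situation for a short branch with very few changes.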
If you concatenate the 42 genes and then use the model, the estimates will have much smaller sampling errors. Obviously, they will then only represent averages over all codons and over all genes.
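
A minimal sketch of the concatenation step, assuming each gene is already a codon alignment held as a dictionary from species name to aligned sequence, with the same species in every gene. The function name, species names, and toy sequences are hypothetical.

```python
def concatenate_codon_alignments(alignments):
    """Join per-gene codon alignments into one alignment, species by species.

    `alignments` is a list of dicts mapping species name -> aligned sequence.
    """
    if not alignments:
        raise ValueError("no alignments given")
    species = sorted(alignments[0])
    parts = {name: [] for name in species}
    for index, gene in enumerate(alignments):
        if sorted(gene) != species:
            raise ValueError(f"gene {index} has a different species set")
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {index} is not aligned (unequal lengths)")
        if lengths.pop() % 3 != 0:
            raise ValueError(f"gene {index} length is not a multiple of 3")
        for name in species:
            parts[name].append(gene[name])
    return {name: "".join(chunks) for name, chunks in parts.items()}

# Toy data with hypothetical species names.
gene_a = {"sp1": "ATGAAA", "sp2": "ATGAAG"}
gene_b = {"sp1": "CCCGGG", "sp2": "CCAGGG"}
combined = concatenate_codon_alignments([gene_a, gene_b])
print(combined)
```

Checking that every gene length is a multiple of 3 keeps the reading frame intact across gene boundaries in the concatenated alignment.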
ziheng
