interpreting codeml output

4,121 views
Skip to first unread message

Stacy Hung

unread,
May 24, 2013, 2:19:13 PM5/24/13
to pamlso...@googlegroups.com
Hi,

I would like to calculate the dN/dS values for a large number of genes of interest.  For each gene, I have a codon alignment across 19 strains of one species.  It seems like codeml should give me the results I need, but when I run it through the command line the output is very convoluted and not easy to interpret.  Going through the output file, I don't see any value that would represent an "alignment-wide" omega value; rather I only see pairwise dN/dS values.

I've also tried obtaining the ratios using the BioPerl implementation of codeml, and can get this to work for runnmode = -2 (i.e. pairwise); however, I want to use codeml with runmode = 0 which produces an error (Can't call method "get_MLmatrix" on an undefined value) since I am only interested in a single omega for the entire alignment. 

Does anyone have any suggestions or information that can help me retrieve the values I am looking for?

Thanks,
Stacy

Ziheng

unread,
May 25, 2013, 7:09:35 AM5/25/13
to pamlso...@googlegroups.com
you can prepare an alignment, a tree file, and in the control file have model = 0, NSsites = 0.  This specifies the M0 (one-ratio) model, which estimates one w (dN/dS) ratio for the whole tree.
ziheng
 

Christina Castro

unread,
Jun 12, 2013, 3:41:20 PM6/12/13
to pamlso...@googlegroups.com
I'm having the same issue as Stacy. I can get the pairwise dN/dS values, but I would like the "alignment-wide" omega value. I set the parameters model=0, NSsites=0, but where exactly is the omega value in the output files?

Seth Kasowitz

unread,
Jun 13, 2013, 8:00:23 AM6/13/13
to pamlso...@googlegroups.com
The output file has a section that begins with
"Detailed output identifying parameters"

With model = 0 and NSsites = 0 you will just see two parameter estimates. They should be presented as

"kappa (ts/tv) = ________
omega (dN/dS) = ________"

This will be followed by the dN and dS for each branch. Depending on what you have the noisy parameter set to (e.g. 9) you may have to scroll very far down the file to find the omega for your tree.

If you still are unable to find the parameter estimates, check your control file and look at what runmode is set to. You want it set to 0 (user tree) in this case.

Seth

Christina Castro

unread,
Jun 13, 2013, 2:44:23 PM6/13/13
to pamlso...@googlegroups.com
Thanks for the quick reply Seth!

Hmm... that particular section is not listed anywhere in the output, probably why I can't find it.

I have noisy=9, runmode=0, model=0, NSsites=0. 

Is this a parameter setting issue or possibly something is wrong with the input files? My input files are generated from Pal2Nal, and PHYLIP for the tree files. They are not giving any errors, so it's just really perplexing to me.

Any more suggestions will be welcoming. 

Christina Castro

unread,
Jun 13, 2013, 4:46:33 PM6/13/13
to pamlso...@googlegroups.com
I'm also using PAML version 4.7 if that makes a difference.

Kevin Wyatt McMahon

unread,
Jul 10, 2013, 2:15:05 PM7/10/13
to pamlso...@googlegroups.com
I'm having the exact same issue: Everything seems to run fine (below is my mlc output file):


Before deleting alignment gaps
     
3    856


g14906_278                      MGSFRREKDE EEDGPTNAYQ NLEKTSVLQE TRTFNETPVN ARKCIHILTK ILYLINQGEQ LVAREATDCF FAMTKLFQSK DIVLRRMVYL GIKELSSIAE DVIIVTSSLT KDMTGKEDLY RAAAIRALCS ITDPTMLQAV ERYMKQCIVD KNAAVSCAAL VSSLRLASTA GDVVKRWANE AQEAMNSDNI MVQYHALGLL YHIRKSDRLA VSKLVNKLTR GSLKSPYAVC MLIRIACKLI EEEDLPAEEL SDSPLYTFIE TCLRHKSEMV IYEAAHAIVN IKNTNPRMLS PAFSILQLFC
GSPKATLRFA AVRTLNKVAM THPAAVTTCN LDLEGLITDS NRSVATLAIT TLLKTGAESS VERLMKQIST FVAEISDEFK VVVVQAICAL CAKYPRKHTV LMNFLSGMLR EEGGLEYKTS IVDTIITIIE ENADAKESGL SHLCEFIEDC EHVSLAVRIL HLLGKEGPFA ATPSKYIRFI YNRVILESPI VRAAAVTALA QFGASCPALL ANILVLLGRC QMDPDDEVRD RATYYLTILN TAKPELYKNY IIERENCSLA LLEKALSDHL NGDLQTHFDI SVVPKAAIVK HEISNDIMMV TSAPRAPKIT REEESAARLA QLPGIQVLGP
 IHRSTAPIQL TESETEYTVQ CIKHIFAQHV VFQFDCLNTL SEQFLENVRV ELTLPEGYTT RAIIPCPKLA YNELQTTFVI VEFPPDPSSS IEWLQAEDTF VLSAVNNLQD AVNT
------ --IIKILG-- ---------- ------LGAA NLSEK----V PEGTHL---- ------HTVH CSGTFRGGVE VLVRAKLALS EGVTLNLTVR STDQDVAELI TAAIG*
hd010221_2                      MGSFRREKDE EEDGPTNAYQ NLEKTSVLQE TRTFNETPVN ARKCIHILTK ILYLINQGEQ LVAREATDCF FAMTKLFQSK DIVLRRMVYL GIKELSSIAE DVIIVTSSLT KDMTGKEDLY RAAAIRALCS ITDPTMLQAV ERYMKQCIVD KNAAVSCAA
- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -LIRIACKLI EEEDLPAEEL SDSPLYTFIE TCLRHKSEMV IYEAAHAIVN MKNTNPRMLS PAFSILQLFC
GSPKATLRFA AV
-------- ---------- ---------- ---------- ---------- ---------- ---------- -VVVQAICAL CAKYPRKHTV LMNFLSGMLR EEGGLEYKTS IVDTIITIIE ENADAKESGL SHLS------ ---------- ---------- ---------- ---------- -----VTALA QFGASCPALL ANILVLLGRC QMDPDDEVRD RATYYLTILN TAKPELYKNY IIERENCSLA LLEKALSDHL NGDLQTHFDI SVVPKAAIVK HEISNDIMMV TSAPRAPKIT REEESAARLA QLPGIQVLGP
 IHRSTAPIQL TESETEYTVQ CIKHIFAQHV VFQFDCLNTL SEQFLENVRV ELTLPEGYTT RAIIPCPKLA YNELQTTFVI VEFPPDPSSS IATFGATLRF AVKDCDPVTG EPDSDESYDD EYMLEDLEVT VADQIQKTKK NNFQVAWDAA DSEGKSSYQI ILSIHIIIIC IICLQNGYKL RTRTFRGGVE VLVRAKLALS EGVTLNLTVR STDQDVAELI TAAIG
*
s865340_1_                      MGSFRREKDE EEDGPTNAYQ NLEKTSVLQE TRTFNETPVN ARKCIHILTK ILYLINQGEQ LVAREATDCF FAMTKLFQSK DIVLRRMVYL GIKELSSIAE DVIIVTSSLT KDMTGKEDLY RAAAIRALCS ITDPTMLQAV ERYMKQCIVD KNAAVSCAAL
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
 
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------






After deleting gaps. 159 sites
     
3    159


g14906_278                      MGSFRREKDE EEDGPTNAYQ NLEKTSVLQE TRTFNETPVN ARKCIHILTK ILYLINQGEQ LVAREATDCF FAMTKLFQSK DIVLRRMVYL GIKELSSIAE DVIIVTSSLT KDMTGKEDLY RAAAIRALCS ITDPTMLQAV ERYMKQCIVD KNAAVSCAA
hd010221_2                      MGSFRREKDE EEDGPTNAYQ NLEKTSVLQE TRTFNETPVN ARKCIHILTK ILYLINQGEQ LVAREATDCF FAMTKLFQSK DIVLRRMVYL GIKELSSIAE DVIIVTSSLT KDMTGKEDLY RAAAIRALCS ITDPTMLQAV ERYMKQCIVD KNAAVSCAA
s865340_1_                      MGSFRREKDE EEDGPTNAYQ NLEKTSVLQE TRTFNETPVN ARKCIHILTK ILYLINQGEQ LVAREATDCF FAMTKLFQSK DIVLRRMVYL GIKELSSIAE DVIIVTSSLT KDMTGKEDLY RAAAIRALCS ITDPTMLQAV ERYMKQCIVD KNAAVSCAA






Printing out site pattern counts




         
3         19  P


g14906_278                      ACDEFGHIKL MNPQRSTVY
hd010221_2                      ACDEFGHIKL MNPQRSTVY
s865340_1_                      ACDEFGHIKL MNPQRSTVY






   
15    5    9   13    5    5    1   12   11   14    6    6    3    7   10
   
9   13   10    5


AAML
(in paml version 4.7, January 2013)  Q29AE5.phy.fasta.transl.phy.cleaned
Model: Poisson for branches, ns =   3  ls = 159




Frequencies..
                                    A      R      N      D      C      Q      E      G      H      I      L      K      M      F      P      S      T      W      Y      V
g14906_278                    
0.09434 0.06289 0.03774 0.05660 0.03145 0.04403 0.08176 0.03145 0.00629 0.07547 0.08805 0.06918 0.03774 0.03145 0.01887 0.05660 0.08176 0.00000 0.03145 0.06289
hd010221_2                    
0.09434 0.06289 0.03774 0.05660 0.03145 0.04403 0.08176 0.03145 0.00629 0.07547 0.08805 0.06918 0.03774 0.03145 0.01887 0.05660 0.08176 0.00000 0.03145 0.06289
s865340_1_                    
0.09434 0.06289 0.03774 0.05660 0.03145 0.04403 0.08176 0.03145 0.00629 0.07547 0.08805 0.06918 0.03774 0.03145 0.01887 0.05660 0.08176 0.00000 0.03145 0.06289


Homogeneity statistic: X2 = 0.00000 G = 0.00000


Average                        0.09434 0.06289 0.03774 0.05660 0.03145 0.04403 0.08176 0.03145 0.00629 0.07547 0.08805 0.06918 0.03774 0.03145 0.01887 0.05660 0.08176 0.00000 0.03145 0.06289


# constant sites:    159 (100.00%)
ln
Lmax (unconstrained) = -449.294977


AA distances
(raw proportions of different sites)


g14906_278    
hd010221_2      
0.0000
s865340_1_      
0.0000  0.0000




TREE
#  1:  (2, 3, 1);   MP score: 0
lnL
(ntime:  3  np:  3):   -476.323339      +0.000000
   
4..2     4..3     4..1  
 
0.000004 0.000004 0.000004


tree length
=   0.00001


(2: 0.00000, 3: 0.00000, 1: 0.00000);


(hd010221_2: 0.00000, s865340_1_: 0.00000, g14906_278: 0.00000);


Time used:  0:00


You'll notice there is no omega or dNdS values.

cajawe

unread,
Jul 14, 2013, 5:20:22 PM7/14/13
to pamlso...@googlegroups.com
There are no omega values because you're analyzing an amino acid data set.  

You want to analyze a codon data set instead.

anavale...@gmail.com

unread,
Dec 26, 2013, 2:32:40 PM12/26/13
to pamlso...@googlegroups.com
Hi PAML users,

I'm having very similar problems with my codeml output file.

I want to anlayze the dN/dS values for each cluster of a ML tree, using a nucleotide coding alignment of 40 sequences. In the codeml.ctl file I set the parameters:

runmode=0
seqtype=1
model=2
NSsites=0

As input files I have tried with nucleotide alignments (stop codons removed) in phylip and fasta format, and newick format for tree file. I also removed the bootstrap values from the trees, the distances and added '#1', '#2', etc to each node.
I have checked the names of taxa in alignment and in the tree and they match.

My problem is
After the pairwise dN/dS values there is nothing but "NOTE: -1 means that NG86 is inapplicable" in my output file.
Also, in the command prompt it appears an strange question: "Species taxa_gene1?", which is one of the OTUs in the middle of the alignment.

Could anyone give me some advice? It would be very much wellcome!

Ana

Ziheng

unread,
Feb 1, 2014, 5:13:38 AM2/1/14
to pamlso...@googlegroups.com

"NOTE: -1 means that NG86 is inapplicable"
you can ignore this warning if your alignment is trustable.

the other error message means that the sequence/species name in the tree file does not match any of the names in the sequence file. the match has to be exact.
ziheng

Reply all
Reply to author
Forward
0 new messages