codeML: Interpreting output, identifying variance and significance


ja...@hi.is

Oct 9, 2017, 1:40:18 PM
to PAML discussion group
Hi PAML group,

I'm new to the program and am trying to get a handle on the PAML output. I have three (presumably) homologous sequences from three species for which I would like to calculate dN/dS ratios. So far I have generated a PHYLIP-formatted alignment in Clustal and then a tree file using PhyML. From there I've been testing the various models, but I'm having trouble identifying the relevant output (potentially because I've set noisy = 9).

Ideally, the output I would like to extract is dN, dS and dN/dS, plus some information on the significance and variation of these estimates. From what I can tell, the pairwise analysis comes closest to this. However, I'm not 100% sure what the value under Paras. and the three unnamed values following it are.

pairwise comparison (Goldman & Yang 1994)

seq seq        N        S       dN       dS    dN/dS   Paras.
  2   1    344.4    129.6   0.2743   0.0801   3.4238   0.6637   4.6911   3.4238   -918.567
  3   1    350.4    123.6   0.5944   0.3697   1.6079   1.6074   2.1959   1.6079  -1118.794
  3   2    353.4    120.6   0.5411   0.4553   1.1883   1.5579   1.8663   1.1883  -1121.761
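One way to get a handle on the unnamed columns is to parse the rows and test a guess against the numbers. The sketch below assumes the four trailing values are t (substitutions per codon), kappa, omega and lnL. That reading is an inference, not something confirmed in this thread: the third trailing value repeats dN/dS, the last one is negative like a log-likelihood, and the first one matches (N*dN + S*dS) divided by the number of codons. The field names and helper functions are made up for illustration.

```python
# Assumed column names; the last four (t, kappa, omega, lnL) are a guess
# that the consistency check below is meant to probe.
FIELDS = ["seq_i", "seq_j", "N", "S", "dN", "dS", "dNdS",
          "t", "kappa", "omega", "lnL"]

def parse_pairwise_row(line):
    """Split one row of the pairwise table into a dict keyed by FIELDS."""
    tokens = line.split()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(tokens)}")
    row = {name: float(tok) for name, tok in zip(FIELDS, tokens)}
    row["seq_i"], row["seq_j"] = int(tokens[0]), int(tokens[1])
    return row

def implied_t(row):
    """Substitutions per codon implied by N, S, dN and dS."""
    n_codons = (row["N"] + row["S"]) / 3.0
    return (row["N"] * row["dN"] + row["S"] * row["dS"]) / n_codons

ROWS = """\
  2   1    344.4    129.6   0.2743   0.0801   3.4238   0.6637   4.6911   3.4238  -918.567
  3   1    350.4    123.6   0.5944   0.3697   1.6079   1.6074   2.1959   1.6079 -1118.794
  3   2    353.4    120.6   0.5411   0.4553   1.1883   1.5579   1.8663   1.1883 -1121.761
"""

rows = [parse_pairwise_row(line) for line in ROWS.splitlines() if line.strip()]
for r in rows:
    # Compare the value recomputed from N, S, dN, dS with the reported column.
    print(r["seq_i"], r["seq_j"], round(implied_t(r), 4), r["t"])
```

If the recomputed and reported values agree to within rounding for every pair, the guess about the first trailing column is at least consistent with the data.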


However, as I understand it, the pairwise mode doesn't take my tree into account. After changing runmode to 0 (user tree), I cannot locate a summary table similar to the one produced by the pairwise analysis. This likely has something to do with my .ctl file not being set up appropriately, so I will post it here:

      seqfile = Family_692.nuc * sequence data filename
     treefile = Family_692_tree.txt      * tree structure file name
      outfile = Family_692_codeml_output           * main result file name

        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 2  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

*        ndata = 1
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
       aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
   aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                   * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

        model = 1
                   * models for codons:
                       * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                   * models for AAs or codon-translated AAs:
                       * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                       * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

      NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                   * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
                   * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
                   * 13:3normal>0

        icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
        Mgene = 0
                   * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
                   * AA: 0:rates, 1:separate

    fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2  * initial or fixed kappa
    fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate
        omega = .4 * initial or fixed omega, for codons or codon-based AAs

    fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
       Malpha = 0  * different alphas for genes
        ncatG = 8  * # of categories in dG of NSsites models

        getSE = 0  * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 1  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

   Small_Diff = .5e-6
    cleandata = 1  * remove sites with ambiguity data (1:yes, 0:no)?
*  fix_blength = -1  * 0: ignore, -1: random, 1: initial, 2: fixed
       method = 0  * Optimization method 0: simultaneous; 1: one branch a time

* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt.,
* 10: blepharisma nu.
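When a control file is this long, it can help to strip the comments and look only at the options that are actually set. The following is a minimal sketch, assuming the layout seen in the file above: `option = value`, with everything after a `*` treated as a comment. The function name `parse_ctl` is hypothetical and not part of PAML.

```python
def parse_ctl(text):
    """Return {option: value-as-string} from codeml control-file text."""
    settings = {}
    for raw in text.splitlines():
        # Everything after the first '*' is treated as a comment; a line that
        # starts with '*' (e.g. a commented-out option) becomes empty.
        line = raw.split("*", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings

# A few lines copied from the control file posted above.
EXAMPLE = """\
      seqfile = Family_692.nuc * sequence data filename
        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise
*        ndata = 1
        model = 1
        getSE = 0  * 0: don't want them, 1: want S.E.s of estimates
"""

settings = parse_ctl(EXAMPLE)
for key, value in settings.items():
    print(f"{key} = {value}")
```

Values are kept as strings on purpose, so file names and numbers are handled the same way and nothing is silently converted.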


I apologise if these are overly simple questions. I've tried to scour the Google group and source material, but haven't found the answers I was looking for. Hopefully you can help, as codeML seems very powerful and is a program I'd like to keep using in future.

 
Message has been deleted

Ziheng

Oct 29, 2017, 2:16:41 PM
to PAML discussion group
not sure what caused the ugly formatting with your post.
if you have only 3 sequences, there is only one tree so there is no need to use a program to infer the tree.
the tree should be something like
(1, 2, 3);
if you use model = 0 you get one dN/dS for the whole tree.
if you use model = 1 you get one dN/dS for each branch on the tree.
ziheng
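On the significance part of the question, one common approach (not spelled out in this thread) is a likelihood ratio test between the two fits: run model = 0 and model = 1 on the same tree and compare twice the difference in lnL with a chi-square distribution. The sketch below assumes the three-branch tree (1, 2, 3), so model = 1 has three dN/dS ratios against one, giving 2 degrees of freedom. The lnL values are hypothetical placeholders, and the closed-form tail probability used here only holds for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here needs a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def lrt(lnL_null, lnL_alt, df):
    """Return (test statistic, p-value) for nested models."""
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    return stat, chi2_sf_even_df(stat, df)

# Hypothetical log-likelihoods, NOT taken from the thread.
stat, p = lrt(lnL_null=-1500.0, lnL_alt=-1496.0, df=2)
print(f"2*delta lnL = {stat:.3f}, p = {p:.4f}")
```

With only three sequences the chi-square approximation may be rough, so a p-value from a sketch like this is better treated as a guide than as a firm result.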

Ziheng

Oct 29, 2017, 2:22:27 PM
to PAML discussion group

some of the options look strange (fix_kappa = 1, kappa = 0.333, clock = 1), although i am not sure whether they were chosen intentionally.
perhaps use fix_kappa = 0, clock = 0.
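The values quoted in this reply (fix_kappa = 1, kappa = 0.333, clock = 1) differ from the control file posted above, so it is worth checking which file is actually being run. A small sketch of such a check follows, assuming the settings have already been read into a dictionary of strings. The function name and the dictionary layout are hypothetical, and the "suggested" values are only the two named in the reply.

```python
def flag_settings(settings):
    """Return notes for options that differ from the values the reply suggests."""
    suggested = {"fix_kappa": "0", "clock": "0"}  # from the reply above
    notes = []
    for key, want in suggested.items():
        have = settings.get(key)
        if have is not None and have != want:
            notes.append(f"{key} = {have}; the reply suggests {key} = {want}")
    return notes

# Settings as quoted in the reply versus settings as posted in the thread.
flagged = flag_settings({"fix_kappa": "1", "kappa": "0.333", "clock": "1"})
posted = flag_settings({"fix_kappa": "0", "kappa": "2", "clock": "0"})

print(flagged)
print(posted)
```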


Ziheng

Oct 29, 2017, 8:13:08 PM
to PAML discussion group
(A) also the alignment may need to be redone. 
you have a stop codon TGA in seq. # 29 (ursMar1), at nucleotide site 127, and you have a codon -TG inside dasNov3, with an out-of-frame deletion.
codeml reads and interprets this as ?TG, but the alignment may be suspicious, so you should check.

(B) I also note that codeml has a bug that causes the number of parameters to be counted incorrectly when fix_kappa = 1. I have fixed this and updated paml4.9f.tgz. This may not be an issue if you fix the control file with fix_kappa = 0.
ziheng
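The two alignment problems in point (A), an in-frame stop codon and a codon that is only partly gapped, can be screened for before running codeml. The sketch below assumes the sequences are already aligned, in frame from the first nucleotide, and use the universal genetic code (icode = 0). The toy alignment and the function name are made up for illustration and are not the data discussed in the thread.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # universal code, as with icode = 0

def check_codons(name, seq):
    """Yield (name, 1-based nucleotide site, codon, problem) for suspicious codons."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    for start in range(0, usable, 3):
        codon = seq[start:start + 3]
        gaps = codon.count("-")
        if codon in STOP_CODONS:
            yield (name, start + 1, codon, "stop codon")
        elif 0 < gaps < 3:
            # One or two gap characters inside a codon: out-of-frame indel.
            yield (name, start + 1, codon, "partial gap")

# Toy alignment, NOT the sequences from the thread.
alignment = {
    "seqA": "ATGGCCTGAAAA",
    "seqB": "ATG-TGCCCAAA",
    "seqC": "ATG---CCCAAA",
}

problems = [hit for name, seq in alignment.items()
            for hit in check_codons(name, seq)]
for hit in problems:
    print(hit)
```

A whole-codon gap such as `---` is left alone, since it keeps the reading frame intact. A terminal stop codon would also be reported by this check and would need to be trimmed or judged separately.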
