How to calculate Ka/Ks value for large amount of orthologous genes?

1,381 views
Skip to first unread message

Lingyun Chen

unread,
Jul 21, 2013, 7:33:13 PM7/21/13
to pamlso...@googlegroups.com
Dear professor,

I want to use the Codeml to calculate the Ka/Ks values for each pair of orthologous gene from two species. I have more than 2, 000 pairs.
Could you tell me how I can run the Codeml? so I can calculate the Ka/Ks for the 2, 000 pairs of genes quickly (no need to analysis the genes one by one). Maybe, a script?
Thank you.
Best regards.

Yours,
Ling-Yun Chen

Feifei Zhang

unread,
Jul 23, 2013, 12:16:36 PM7/23/13
to pamlso...@googlegroups.com
Hi,

I think the number of gene pairs is not a problem. You can put them all in one single file and run codeml once. See PAML FAQ page 10:

How can I get codeml to accept more that one data set at a time?
You use the control variable ndata.
By default, the line is commented out with the asterisk * at the beginning of the line. Make sure
you remove it.


I do think the number of species is too small. Also see FAQ page 9:

How many species are needed?
I suppose the absolute minimum is 4 or 5 if the sequence divergence is optimal. 10 would
be good, while 20 would be much better. This will depend on how divergent the sequences
are.


Hope this helps.

Feifei

Lingyun Chen

unread,
Jul 26, 2013, 3:35:23 AM7/26/13
to pamlso...@googlegroups.com
Dear Feifei,

Thanks for your useful suggestions.
Could you tell me the format of the file, which includes multiple datasets?
For example, how can i separate the data set 1 (for example, rbcl from species 1 and 2), data set 2 (matk from species 1 and 2), data set 3....?

'I do think the number of species is too small. Also see FAQ page 9:'
However, a paper (Zhang et al., 2013, BMC Genomics,14:329; ) just used two species to detect the positively selected genes
'For the remaining orthologs, synonymous substitution rates (Ks) and non-synonymous rates (Ka) were estimated using a maximum-likelihood method [46] implemented by yn00 in the PAML toolkit'.

I am confused by the paper.

Could you help me?

Best regards.

Sincerely yours,
Ling-Yun

Ziheng

unread,
Sep 5, 2013, 6:19:20 PM9/5/13
to pamlso...@googlegroups.com
Example.
Ziheng yang


2 6
a tct ttt
b tcc ttt


2 9
a tct ttt aaa
b tcc ttt aag

Shiliang HU

unread,
Oct 16, 2013, 6:01:33 AM10/16/13
to pamlso...@googlegroups.com
Dear Prof.Yang,

For calculating the Ks/Ks for genes, there are 80 genes for me,  I I followed your idea to  make sequence file like :

but how to make the tree file ?,as we need to prepare the tree files right?


Shiliang HU

unread,
Oct 28, 2013, 1:29:41 PM10/28/13
to pamlso...@googlegroups.com
 Dear Ling-Yun Chen,

have you solve your problem? hope I can learn some thing from you.

how to contact with you ?

Best Regards
HU


Ziheng

unread,
Dec 3, 2013, 4:54:37 PM12/3/13
to pamlso...@googlegroups.com
I think the you use ndata in the control file to specify the number of alignments.
The tree can be simply
(1,2);

Or yn00 does not need a tree if that is what you are using.
The document is in the doc folder.
Ziheng

Lingyun Chen

unread,
Jun 20, 2015, 3:52:39 PM6/20/15
to pamlso...@googlegroups.com
I resolved the problems by using perl script. You can email me lych...@qq.com. If you need helps
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted
Message has been deleted

clin...@umn.edu

unread,
Jan 23, 2018, 10:59:46 AM1/23/18
to PAML discussion group
I used a simple perl script to run codeml. 


system "rm -f codeml.ctl"; 

if($ARGV[0]=~/([^\/]+$)/){
 $output=$1;
 }

open (FILE,">codeml.ctl")||die $!;

print FILE " seqfile = $ARGV[0] * sequence data filename
 treefile = common_Rbu.trees * tree structure file name
 outfile = $output.Rbu_null_codeml * main result file name

 noisy = 0   * 0,1,2,3,9: how much rubbish on the screen
 verbose = 0   * 1: detailed output, 0: concise output
 runmode = 0   * 0: user tree;  1: semi-automatic;  2: automatic
 * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

 seqtype = 1   * 1:codons; 2:AAs; 3:codons-->AAs
 CodonFreq = 2   * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
 model = 2   * codonml:  0:one, 1:b, 2:2 or more dN/dS ratios for branches

 NSsites = 2   * 0:one w;1:neutral;2:positive; 3:discrete;4:freqs;
 * 5:gamma;6:2gamma;7:beta;8:beta&w;9:betaγ
 * 10:beta&1+gamma; 11:beta&1>normal; 12:0&2normal; 13:3normal
 icode = 0   * 0:standard genetic code; 1:mammalian mt; 2-10:see below
 Mgene = 0   * codonml: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff

 fix_kappa = 1   * 1: kappa fixed, 0: kappa to be estimated
 kappa = 1   * initial or fixed kappa
 fix_omega = 1   * 1: omega or omega_1 fixed, 0: estimate 
 omega = 1  * initial or fixed omega, for codons or codon-based AAs

 ncatG = 3   * # of categories in dG of NSsites models

 getSE = 0   * 0: don't want them, 1: want S.E.s of estimates

 Small_Diff = 3e-7
 cleandata = 0  * remove sites with ambiguity data (1:yes, 0:no)?
 method = 0  * 0: simultaneous; 1: one branch at a time.\n";
 
close FILE;

system "./codeml.exe";

system "mv $output.Rbu_null_codeml codeml_result_file"; 

I cannot upload the example data. So if you want. Please write to me lych...@qq.com
Reply all
Reply to author
Forward
0 new messages