How to calculate Ka/Ks value for large amount of orthologous genes?

Lingyun Chen

unread,

Jul 21, 2013, 7:33:13 PM7/21/13

to pamlso...@googlegroups.com

Dear professor,

I want to use the Codeml to calculate the Ka/Ks values for each pair of orthologous gene from two species. I have more than 2, 000 pairs.

Could you tell me how I can run the Codeml? so I can calculate the Ka/Ks for the 2, 000 pairs of genes quickly (no need to analysis the genes one by one). Maybe, a script?

Thank you.

Best regards.

Yours,

Ling-Yun Chen

Feifei Zhang

unread,

Jul 23, 2013, 12:16:36 PM7/23/13

to pamlso...@googlegroups.com

Hi,

I think the number of gene pairs is not a problem. You can put them all in one single file and run codeml once. See PAML FAQ page 10:

How can I get codeml to accept more that one data set at a time?

You use the control variable ndata.

By default, the line is commented out with the asterisk * at the beginning of the line. Make sure

you remove it.

I do think the number of species is too small. Also see FAQ page 9:

How many species are needed?

I suppose the absolute minimum is 4 or 5 if the sequence divergence is optimal. 10 would

be good, while 20 would be much better. This will depend on how divergent the sequences

are.

Hope this helps.

Feifei

Lingyun Chen

unread,

Jul 26, 2013, 3:35:23 AM7/26/13

to pamlso...@googlegroups.com

Dear Feifei,

Thanks for your useful suggestions.
Could you tell me the format of the file, which includes multiple datasets?
For example, how can i separate the data set 1 (for example, rbcl from species 1 and 2), data set 2 (matk from species 1 and 2), data set 3....?

'I do think the number of species is too small. Also see FAQ page 9:'
However, a paper (Zhang et al., 2013, BMC Genomics,14:329; ) just used two species to detect the positively selected genes
'For the remaining orthologs, synonymous substitution rates (Ks) and non-synonymous rates (Ka) were estimated using a maximum-likelihood method [46] implemented by yn00 in the PAML toolkit'.

I am confused by the paper.

Could you help me?

Best regards.

Sincerely yours,
Ling-Yun

Ziheng

unread,

Sep 5, 2013, 6:19:20 PM9/5/13

to pamlso...@googlegroups.com

Example.
Ziheng yang

2 6
a tct ttt
b tcc ttt

2 9
a tct ttt aaa
b tcc ttt aag

Shiliang HU

unread,

Oct 16, 2013, 6:01:33 AM10/16/13

to pamlso...@googlegroups.com

Dear Prof.Yang,

For calculating the Ks/Ks for genes, there are 80 genes for me, I I followed your idea to make sequence file like :

but how to make the tree file ?,as we need to prepare the tree files right?

Shiliang HU

unread,

Oct 28, 2013, 1:29:41 PM10/28/13

to pamlso...@googlegroups.com

Dear Ling-Yun Chen,

have you solve your problem? hope I can learn some thing from you.

how to contact with you ?

Best Regards
HU

Ziheng

unread,

Dec 3, 2013, 4:54:37 PM12/3/13

to pamlso...@googlegroups.com

I think the you use ndata in the control file to specify the number of alignments.
The tree can be simply
(1,2);

Or yn00 does not need a tree if that is what you are using.
The document is in the doc folder.
Ziheng

Lingyun Chen

unread,

Jun 20, 2015, 3:52:39 PM6/20/15

to pamlso...@googlegroups.com

I resolved the problems by using perl script. You can email me lych...@qq.com. If you need helps

Message has been deleted

clin...@umn.edu

unread,

Jan 23, 2018, 10:59:46 AM1/23/18

to PAML discussion group

I used a simple perl script to run codeml.

system "rm -f codeml.ctl";

if($ARGV[0]=~/([^\/]+$)/){

$output=$1;

}

open (FILE,">codeml.ctl")||die $!;

print FILE " seqfile = $ARGV[0] * sequence data filename

treefile = common_Rbu.trees * tree structure file name

outfile = $output.Rbu_null_codeml * main result file name

noisy = 0 * 0,1,2,3,9: how much rubbish on the screen

verbose = 0 * 1: detailed output, 0: concise output

runmode = 0 * 0: user tree; 1: semi-automatic; 2: automatic

* 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

seqtype = 1 * 1:codons; 2:AAs; 3:codons-->AAs

CodonFreq = 2 * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table

model = 2 * codonml: 0:one, 1:b, 2:2 or more dN/dS ratios for branches

NSsites = 2 * 0:one w;1:neutral;2:positive; 3:discrete;4:freqs;

* 5:gamma;6:2gamma;7:beta;8:beta&w;9:betaγ

* 10:beta&1+gamma; 11:beta&1>normal; 12:0&2normal; 13:3normal

icode = 0 * 0:standard genetic code; 1:mammalian mt; 2-10:see below

Mgene = 0 * codonml: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff

fix_kappa = 1 * 1: kappa fixed, 0: kappa to be estimated

kappa = 1 * initial or fixed kappa

fix_omega = 1 * 1: omega or omega_1 fixed, 0: estimate

omega = 1 * initial or fixed omega, for codons or codon-based AAs

ncatG = 3 * # of categories in dG of NSsites models

getSE = 0 * 0: don't want them, 1: want S.E.s of estimates

Small_Diff = 3e-7

cleandata = 0 * remove sites with ambiguity data (1:yes, 0:no)?

method = 0 * 0: simultaneous; 1: one branch at a time.\n";

close FILE;

system "./codeml.exe";

system "mv $output.Rbu_null_codeml codeml_result_file";

I cannot upload the example data. So if you want. Please write to me lych...@qq.com

Reply all

Reply to author

Forward