error in codeml data analysis

Banisha Phukela

Jan 1, 2024, 7:47:18 AM
to PAML discussion group

I have been trying to run a codeml analysis on my data file of 82 sequences to estimate dN/dS values. However, it reports an unexpected error in the sequence file, and I don't understand whether the problem is with the sequence file itself. Could you please help? I am also sharing my sequence data file with you.

Ambiguity character definition table:

T (1): T
C (1): C
A (1): A
G (1): G
U (1): T
Y (2): T C
R (2): A G
M (2): C A
K (2): T G
S (2): C G
W (2): T A
H (3): T C A
B (3): T C G
V (3): C A G
D (3): T A G
- (4): T C A G
N (4): T C A G
? (4): T C A G
ns = 82         ls = 3117
Reading sequences, sequential format..
Reading seq # 1: Aqcoe6G084200

Error in sequence data file: Q at 1010 seq 1.
Make sure to separate the sequence from its name by 2 or more spaces.

Janet Young

Jan 1, 2024, 2:40:11 PM
to PAML discussion group

Hi Banisha,


Make sure the two numbers on the first line match (a) the number of sequences and (b) the number of bases in each sequence.


The “3117” at the top of your file looks wrong, so PAML doesn’t know where the first sequence ends.


Also, it looks like your sequences are not aligned. Your input sequence file should be codon-aligned; otherwise your results will make no sense. There is more information here
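The two checks above can be sketched in Python. This is only a rough sanity check, not PAML's own parser: the function name `check_alignment` is made up for illustration, and it assumes the simplest sequential layout, with one sequence per line and the name separated from the sequence by two or more spaces.

```python
import re

def check_alignment(text):
    """Return a list of problems found in a sequential PHYLIP-style alignment.

    Assumption: one record per line, name and sequence separated by 2+ spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # First line holds the number of sequences (ns) and the sequence length (ls).
    ns, ls = (int(tok) for tok in lines[0].split()[:2])
    records = lines[1:]
    problems = []
    if len(records) != ns:
        problems.append(f"header says {ns} sequences, file has {len(records)}")
    for line in records:
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2:
            problems.append(f"no two-space gap after the name in: {line[:20]!r}")
            continue
        name, seq = parts[0], parts[1].replace(" ", "")
        if len(seq) != ls:
            problems.append(f"{name}: length {len(seq)}, header says {ls}")
    # A codon alignment should have a length divisible by 3.
    if ls % 3 != 0:
        problems.append(f"alignment length {ls} is not a multiple of 3")
    return problems
```

An empty list means the header and the sequences agree. It does not prove the sequences are correctly codon-aligned, only that they all have the stated length.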


All the best,



Banisha Phukela

Jan 2, 2024, 10:34:30 AM
to PAML discussion group
Hi Janet,

Thanks, I aligned the file and it worked.
Now the error is in the tree file. It says that I need to define branches (I am running a branch-site model, so I think I should define the foreground branches).
Is this done manually in the tree, or are there other ways?


Janet Young

Jan 2, 2024, 3:27:47 PM
to PAML discussion group
Here's the documentation; read the section on "Tree file format and representations of tree topology".

It can be done manually, or programmatically, e.g. in R/ape, and I'm sure there are similar Python tools.
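As a minimal sketch of the programmatic route in plain Python: the helper `mark_foreground` below is hypothetical, and it assumes a topology-only Newick string (no branch lengths) in which the foreground branches are tips. It appends a `#1` label after each named tip, which is how I understand the branch labels described in that documentation section; please check the result against the documentation before running codeml.

```python
import re

def mark_foreground(newick, tips, label="#1"):
    """Append a branch label after each named tip in a topology-only Newick string."""
    for tip in tips:
        # Match the tip name only as a whole token: preceded by "(", "," or
        # whitespace, and followed by ",", ")" or whitespace.
        pattern = r"(?<![^\s(,])" + re.escape(tip) + r"(?![^\s,)])"
        newick, count = re.subn(pattern, lambda m: f"{tip} {label}", newick)
        if count != 1:
            raise ValueError(f"expected exactly one tip named {tip!r}, found {count}")
    return newick

# Usage with a made-up four-taxon tree:
labelled = mark_foreground("((A,B),(C,D));", ["A", "C"])
print(labelled)
```

Labelling internal branches or whole clades needs a real tree parser (such as ape in R), which is why this sketch is limited to tips.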

Banisha Phukela

Jan 3, 2024, 10:14:39 AM
to PAML discussion group
Hey there,

I want to calculate dN/dS values for my dataset, but they were completely missing from my output files. I am sharing my control file and output files with you all. Please let me know if I need to change my control file and run the analysis again.


Janet Young

Jan 3, 2024, 12:10:00 PM
to PAML discussion group
I haven't looked in detail at your input and output, but I think I can see the dN/dS. In the aln.out file you will find a line that says "MLEs of dN/dS (w) for site classes (K=4)", and underneath that are the dN/dS values and the proportions of sites in each class.

Banisha Phukela

Jan 3, 2024, 12:37:49 PM
to PAML discussion group
Yes, I can see it in my output file. But I wanted to know the values for a few specified foreground branches.
MLEs of dN/dS (w) for site classes (K=4)

site class             0        1       2a       2b
proportion       0.54413  0.40235  0.03077  0.02275
background w     0.15118  1.00000  0.15118  1.00000
foreground w     0.15118  1.00000  8.14856  8.14856
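For anyone who wants to pull this table out of the output file automatically, here is a rough Python sketch. The function name `parse_site_classes` is made up, and it assumes the table has exactly the layout quoted above.

```python
HEADER = "MLEs of dN/dS (w) for site classes"
ROW_KEYS = ("proportion", "background w", "foreground w")

def parse_site_classes(text):
    """Return {row name: {site class: value}} for the branch-site table."""
    lines = text.splitlines()
    # Find the header line; raises StopIteration if the table is absent.
    start = next(i for i, ln in enumerate(lines) if ln.startswith(HEADER))
    classes, rows = [], {}
    for ln in lines[start + 1:]:
        stripped = ln.strip()
        if stripped.startswith("site class"):
            classes = stripped.split()[2:]  # e.g. ['0', '1', '2a', '2b']
        for key in ROW_KEYS:
            if stripped.startswith(key):
                values = [float(v) for v in stripped[len(key):].split()]
                rows[key] = dict(zip(classes, values))
        if len(rows) == len(ROW_KEYS):
            break
    return rows

example = """MLEs of dN/dS (w) for site classes (K=4)

site class             0        1       2a       2b
proportion       0.54413  0.40235  0.03077  0.02275
background w     0.15118  1.00000  0.15118  1.00000
foreground w     0.15118  1.00000  8.14856  8.14856
"""

table = parse_site_classes(example)
print(table["foreground w"]["2a"])
```

In this table, classes 2a and 2b are the ones where the foreground w differs from the background w; together they cover about 5.4% of sites here.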

Janet Young

Jan 3, 2024, 2:58:48 PM
to PAML discussion group
It's a bit unclear what you really want to learn from your sequences. You used model=2, meaning that codeml assumes there are two classes of branches, foreground and background, so it is not analyzing each branch individually. You also have NSsites=2, meaning the sites fall into three dN/dS classes (reported as the four site classes 0, 1, 2a and 2b in the branch-site output). The free-ratio analysis (model=1) with NSsites=0 might be closer to what you want; I'm not sure. See the FAQs, in the section "How do I test positive selection along specific lineages?", for more information, as well as the paper it references.

It is worth taking the time to read a paper or two to get a better understanding of what PAML can actually do: there are several possible analyses. That will help you think about whether you are actually interested in understanding how dN/dS varies between sites, or more interested in dN/dS along individual branches (free-ratio analysis).
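To make the difference between these analyses concrete, here is a small Python sketch that writes out only the control-file options that distinguish them. The `render_ctl` helper and the file names are hypothetical; a real codeml control file contains further options not shown here, and the null/alternative pair for the branch-site test reflects my reading of the PAML documentation rather than anything in the shared control file.

```python
# Option values that differ between the analyses discussed above.
SETTINGS = {
    # One dN/dS per branch, no variation among sites.
    "free_ratio": {"model": 1, "NSsites": 0},
    # Branch-site model with foreground w estimated.
    "branch_site_alt": {"model": 2, "NSsites": 2, "fix_omega": 0, "omega": 1},
    # Matching null model with foreground w fixed at 1.
    "branch_site_null": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},
}

def render_ctl(analysis, seqfile, treefile, outfile):
    """Return a partial codeml control file as text (illustrative only)."""
    options = {
        "seqfile": seqfile,
        "treefile": treefile,
        "outfile": outfile,
        "seqtype": 1,  # 1 = codon sequences
    }
    options.update(SETTINGS[analysis])
    return "\n".join(f"{key} = {value}" for key, value in options.items()) + "\n"

# Hypothetical file names, for illustration only.
print(render_ctl("free_ratio", "aln.phy", "tree.nwk", "free_ratio.out"))
```

The branch-site settings still need a tree with labelled foreground branches, whereas the free-ratio run uses an unlabelled tree.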
