failure in reconstructing ancestor sequences


dvd

Jan 12, 2024, 11:35:02 AM
to PAML discussion group
Dear friends, 

I am unable to get ancestral sequences for my alignment + tree pair using codeml.
My input is an alignment of 25 sequences and the associated tree with the same IDs.
The alignment seems to be formatted correctly: codeml was complaining at first, but
now I think I get the program to start, though it throws the error below. I bet the
problem comes from the formatting of the tree.

The files I attach are:
codeml.ctl itself (I tried different sets of parameters; let me know which
ones help to reconstruct ancestors),
the 2 input files:
test_aa.fa
test.phy_phyml_tree.txt
the output file:
results.out

 7         runmode | runmode                0.00
  4         seqtype | seqtype                2.00
 16           model | model                  2.00
 20         NSsites | NSsites                0.00
 22           icode | icode                  0.00
 23           Mgene | Mgene                  1.00
  9           clock | clock                  0.00
 37     fix_blength | fix_blength            0.00
 11           getSE | getSE                  1.00
 12    RateAncestor | RateAncestor=1         1.00
 15         verbose | verbose                0.00
  6       cleandata | cleandata              1.00
AAML in paml version 4.9j, October 2019
ns = 25   ls = 130
Reading sequences, sequential format..
Reading seq # 1: genomic      
Reading seq # 2: 11816      
Reading seq # 3: 5743      
Reading seq # 4: 414      
Reading seq # 5: 2863      
Reading seq # 6: 6316      
Reading seq # 7: 1931      
Reading seq # 8: 9173      
Reading seq # 9: 20557      
Reading seq #10: 57247      
Reading seq #11: 918      
Reading seq #12: 11134      
Reading seq #13: 892      
Reading seq #14: 81254      
Reading seq #15: 8863      
Reading seq #16: 15511      
Reading seq #17: 10410      
Reading seq #18: 19311      
Reading seq #19: 12924      
Reading seq #20: 14744      
Reading seq #21: 24898      
Reading seq #22: 108292      
Reading seq #23: 6632      
Reading seq #24: 30824      
Reading seq #25: 13058      

Sites with gaps or missing data are removed.

    12 ambiguity characters in seq. 1
     8 ambiguity characters in seq. 2
     8 ambiguity characters in seq. 3
     8 ambiguity characters in seq. 4
     8 ambiguity characters in seq. 5
     8 ambiguity characters in seq. 6
     8 ambiguity characters in seq. 7
     8 ambiguity characters in seq. 8
     8 ambiguity characters in seq. 9
     8 ambiguity characters in seq. 10
     8 ambiguity characters in seq. 11
     8 ambiguity characters in seq. 12
     8 ambiguity characters in seq. 13
     8 ambiguity characters in seq. 14
     8 ambiguity characters in seq. 15
     8 ambiguity characters in seq. 16
     8 ambiguity characters in seq. 17
     8 ambiguity characters in seq. 18
     8 ambiguity characters in seq. 19
     8 ambiguity characters in seq. 20
     8 ambiguity characters in seq. 21
     8 ambiguity characters in seq. 22
     8 ambiguity characters in seq. 23
     8 ambiguity characters in seq. 24
     8 ambiguity characters in seq. 25
12 sites are removed.  10 31 32 33 34 60 61 73 107 108 116 117
Sequences read..
Counting site patterns..  0:00
Compressing,     45 patterns at    118 /    118 sites (100.0%),  0:00
Collecting fpatt[] & pose[],     45 patterns at    118 /    118 sites (100.0%),  0:00
Counting frequencies..

     2400 bytes for distance
    14400 bytes for conP
        2 bytes for fhK
  5000000 bytes for space

TREE #  1
(((((((((((15, 13), 2), 20), (18, 23)), 1), 14), (25, 6)), ((22, 21), 24)), ((17, 16), 19)), ((12, (10, 11)), ((7, 8), 9))), 5, (4, 3));   MP score: 36
   165600 bytes for conP, adjusted


Reading matrix from jones.dat
error when opening file jones.dat
tell me the full path-name of the file?

results.out
test_aa.phy
codeml.ctl
test.phy_phyml_tree.txt

Sandra AC

Jan 23, 2024, 2:17:27 PM
to PAML discussion group
Hi there,

It seems that you are not specifying the location of the `jones.dat` file. Note that you need to include the `aaRatefile` option in your control file. 

You can find all the available matrices in the `dat` directory that you should have received when you downloaded the PAML software. If you cannot find it, you can go to the `dat` directory in the PAML GitHub repository. Please download the matrix that you want to use for your analysis and save the file in your preferred location. Then type the absolute or relative path to this file in the `aaRatefile` option in your control file. That should fix the problem :) The PAML documentation also covers all the control file settings, along with other notes for ancestral sequence reconstruction analyses.
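
In case it is useful, here is a minimal Python sketch (not part of PAML) that checks whether the `aaRatefile` entry in a control file points to an existing file before you launch codeml. The helper names `read_ctl` and `check_aa_rate_file` and the `dat/jones.dat` path in the sample are assumptions for illustration only, so adjust them to your own setup. It assumes the usual `key = value` layout of the control file, with `*` starting a comment, and that relative paths are resolved from the directory where codeml is run.

```python
from pathlib import Path


def read_ctl(text):
    """Parse 'key = value' lines; anything after '*' is treated as a comment."""
    options = {}
    for line in text.splitlines():
        line = line.split("*", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def check_aa_rate_file(options, workdir="."):
    """Return a short diagnostic string for the aaRatefile option."""
    value = options.get("aaRatefile")
    if not value:
        return "aaRatefile is not set"
    path = Path(value)
    if not path.is_absolute():
        # Relative paths are checked against the run directory.
        path = Path(workdir) / path
    if path.is_file():
        return f"ok: {path}"
    return f"not found: {path}"


# Sample control file content; the aaRatefile path is a hypothetical location.
SAMPLE_CTL = """
seqfile = test_aa.phy
treefile = test.phy_phyml_tree.txt
outfile = results.out
seqtype = 2                  * 2: amino acids
model = 2                    * 2: empirical
aaRatefile = dat/jones.dat   * hypothetical path, adjust as needed
RateAncestor = 1
"""

print(check_aa_rate_file(read_ctl(SAMPLE_CTL)))
```

If this reports `not found`, either copy the matrix next to your control file or put the full path in `aaRatefile`.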

Hope this helps!
S.