codeml stalled


Amber Nashoba

Mar 5, 2024, 5:45:31 PM
to PAML discussion group
On my desktop I am running codeml site models for three different data sets with 6, 10, and 16 species and up to 53 genes. My machine has a 2.3 GHz six-core i7 with 16 GB of memory. Each run batched through the genes fine for the first several hours, but each has now been stuck on a gene for the past 22 hours.
-To my knowledge, there is no particular quality of those genes that should be causing an error.
-There are no error messages appearing in the terminal window.
-Sequences are named following the proper conventions and their lengths are correct multiples of 3.
Thanks in advance for any help

Sandra AC

Mar 7, 2024, 4:53:27 AM
to PAML discussion group
Hi Amber,

It is quite hard to troubleshoot what may be going on without the input files and control files you are using, or the output file or screen output generated by CODEML. Could you please share this info with us so that we can see which settings you have specified and whether there are any issues with the format of your input files (alignment and tree)?

All the best,
S.

Amber Nashoba

Mar 7, 2024, 1:44:19 PM
to PAML discussion group
That is part of the problem. There is no output to show... it just stalls. No error message, just no progress. Think of the old beachball when your desktop freezes. Puttering around, I have found twice that if I remove the sequences that have lots of ? from the gene it stalls on, the gene will sometimes complete the run. The most recent stall has no such issue; its alignment looks fairly normal.
I have attached the sequence, tree, and control files.
OG0013713.raxml.bestTree
OG0013713_8a_Ctl_File.txt
OG0013713_RepOrthologues.fa
OG0013713_01278_Ctl_File.txt
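
For anyone wanting to screen for sequences with lots of ? before a run, the check described above could be sketched roughly as follows. This is a minimal Python sketch and not part of PAML: the function names `read_fasta` and `flag_sparse_sequences` are hypothetical, and the 0.5 threshold is an arbitrary assumption rather than a recommended cut-off.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (input order kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the sequence name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def flag_sparse_sequences(records, threshold=0.5, missing_chars="?"):
    """Return name -> fraction of missing characters for sequences above threshold.

    threshold and missing_chars are assumptions; adjust them to your data.
    """
    flagged = {}
    for name, seq in records.items():
        if not seq:
            continue
        fraction = sum(seq.count(c) for c in missing_chars) / len(seq)
        if fraction > threshold:
            flagged[name] = fraction
    return flagged
```

To use it, read the alignment with `read_fasta(open(path).read())` and inspect the returned dictionary before deciding which sequences to drop.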

Sandra AC

Mar 11, 2024, 1:00:20 PM
to PAML discussion group
Hi Amber,

I have only just had time to check your input files, apologies for the delay. It seems that you are not using the correct format (please see the PAML Wiki for details about the required format). You may want to read our latest protocol (Álvarez-Carretero et al., 2023) before you run your analysis to get familiar with the input files and the tests that you can run with CODEML. There you will find all the details regarding input files, examples of tests that you can run, and results interpretation.

In short, your sequence file is in FASTA format and your tree file still has branch lengths. You need to format these input files so that your alignment file is in PHYLIP format and your tree file contains a PHYLIP header (`<num_taxa> <num_trees>`, e.g., `5 1` in your case) followed by the tree in Newick format without branch lengths or any labels other than those used by CODEML (e.g., labels to identify foreground branches under the branch and branch-site models). Please refer to the PAML Wiki or our protocol for further details. When checking your input files, I also kept only the relevant options in the control file to run the site models in batch (M0, M1a, M2a, M7, and M8, which you had also tried to run in your control file), as you had other options that are not relevant (please check our GitHub repository, which is meant to be used alongside our protocol). You seem to have a STOP codon in seq #3 (Mac.mul_NP_001035330.1) as per the warning in CODEML, so you may want to check the sequences before you run CODEML. Please note that you should run CODEML only once you have finished filtering your data and you have a well-curated alignment and tree file.
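
The two formatting changes described above (FASTA to PHYLIP, and a Newick tree without branch lengths under a `<num_taxa> <num_trees>` header) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the script used to produce the attached files: all function names are hypothetical, the PHYLIP output is the sequential layout with two spaces between name and sequence, and only branch lengths are stripped, so any other labels in the tree would still need to be removed by hand.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (input order kept)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def fasta_to_phylip(records):
    """Write records as sequential PHYLIP: header line, then one line per taxon."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records.items():
        # Two spaces separate the name from the sequence (assumed layout).
        lines.append(f"{name}  {seq}")
    return "\n".join(lines) + "\n"


def strip_branch_lengths(newick):
    """Remove ':<number>' branch lengths, including scientific notation."""
    return re.sub(r":[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?", "", newick.strip())


def tree_with_header(newick, n_taxa, n_trees=1):
    """Prefix the cleaned tree with the '<num_taxa> <num_trees>' header line."""
    return f"{n_taxa} {n_trees}\n{strip_branch_lengths(newick)}\n"
```

Passing `len(records)` as `n_taxa` keeps the tree header consistent with the alignment; it is still worth checking that the taxon names in the two files match exactly.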

Once you filter your dataset and make the required formatting changes, you should be able to run CODEML. I have attached the modified files that I used to troubleshoot your issue in case you want to use them as templates :)
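
As part of that filtering, a scan for STOP codons like the one CODEML warned about could look roughly like this. It is a minimal Python sketch, not a PAML tool: the function names are hypothetical, it assumes the standard genetic code (TAA, TAG, TGA), and it assumes every sequence starts in reading frame 1.

```python
# Stop codons under the standard genetic code (an assumption about the data).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_stop_codons(seq):
    """Return 1-based codon positions of stop codons, reading from frame 1."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [
        i // 3 + 1
        for i in range(0, usable, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


def scan_alignment(records):
    """Map sequence name -> stop codon positions, for sequences that have any."""
    hits = {}
    for name, seq in records.items():
        positions = find_stop_codons(seq)
        if positions:
            hits[name] = positions
    return hits
```

Here `records` is any dictionary of sequence name to aligned sequence; codons containing gaps or ? are simply not reported.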

Hope this helps!
S.
OG0013713_RepOrthologues.phy
OG0013713.tree
OG0013713_01278_Ctl_File.ctl