Segmentation fault from codeml in the presence of too many (?) ambiguous codons

Jean-Baka Domelevo Entfellner

Jan 19, 2021, 2:15:45 PM
to PAML discussion group
Dear Ziheng, dear all,

I am running into trouble with codeml 4.10.0 (the current GitHub HEAD, last commit from Sep 2020) on 64-bit Debian GNU/Linux (Linux 5.9.0-2-amd64 #1 SMP Debian 5.9.6-1 (2020-11-08) x86_64 GNU/Linux). I repeatedly get a segmentation fault right after the message "Counting codons..", whatever the verbosity level (I tried 0 to 3).

Reading one of Ziheng's answers to a similar (?) issue, I have tried different compilation options:
- compiling according to the original Makefile
- cc -o codeml -O3 codeml.c tools.c -lm
- cc -o codeml -O2 codeml.c tools.c -lm
- cc -o codeml -g codeml.c tools.c -lm

All of these builds behave identically and end with the same segfault.

The crash does not happen when I use the option cleandata = 1, but then the number of codon sites shrinks from 80 to 5, which is far too few to give robust results.

In this specific example I have 60 sequences of 240 nt, and I am trying to fit a simple model (model = 0, NSsites = 0; I have tried values 0, 1 and 2 for CodonFreq, all yielding the same error).

The tail of my execution log is as follows:

stop codon TAG in seq. #  59 (LR882593_l), nucleotide site 193
 1 columns are converted into ??? because of stop codons
Press Enter to continue

Sequences read..
Counting site patterns..  0:01
Compressing,     79 patterns at     80 /     80 sites (100.0%),  0:01
Collecting fpatt[] & pose[],     79 patterns at     80 /     80 sites (100.0%),  0:01
158 ambiguous codons are seen in the data:
 ??? AMC AAS ACR WTC CYC CCK SCR CKC MRG GAS RCY ARC ATR CCR CMG CRG GAM RYY RRC AWR CCS GCR CWG ARG AYY GRC AMT AWG MAG AGS ASC YAG RAC GGY TAS ATM RTC SAR RCC GYG --- AC- SCG CSG CAR MGG GRM AWC RAG AYC SRG RAS RCM AAM RKC SAG GMT MTC CYY GGS MSG AKM GRT A-- RWG KCC GYC GRA RMM -AC AT- CAS CGR SGG TAY YTC RYC RGC RGG YGG AAY RSC RRT RRG WRC RST RMT TTY AYT RYM TRT CYK SRC KAC ACM MRC AKG CKY SYK GKG KCA CSR CGY KKC KMT RKG YCC MRY GTY GCM TCY RGA MAT YAR SGA RTG WCC TYC GMG WCS AWS ATY YCG CYG SSA CKG WCY RAR CSS YKC GMM AWY RMC -GC GSA TRK TGW AYM WWG WKM YGA SYM RWC YAT WGG MYS MMY MWR MSW STG MWS GRR MRM KWS SMM AGR ACK
Counting codons..
Segmentation fault
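
For anyone wanting to check where such codons come from before running codeml, here is a minimal sketch that tallies the distinct ambiguous codons in an alignment. It is not part of PAML: the function name ambiguous_codons and the toy sequences are hypothetical, and it assumes the sequences are already loaded as in-frame strings and that any codon containing a character other than A, C, G or T counts as ambiguous (which matches the kinds of entries listed in the log, such as ???, --- and AC-).

```python
from collections import Counter

UNAMBIGUOUS = set("ACGT")

def ambiguous_codons(seqs):
    """Count codons that contain any character outside A, C, G, T.

    Assumption: sequences are in frame; trailing bases that do not
    fill a whole codon are ignored.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if not set(codon) <= UNAMBIGUOUS:
                counts[codon] += 1
    return counts

# Toy data, purely illustrative (not taken from the real alignment).
toy = ["ATGAMCAAS---", "ATGAMCCCKAC-"]
counts = ambiguous_codons(toy)
print(len(counts), "distinct ambiguous codons")
for codon, n in counts.most_common():
    print(codon, n)
```

Running the same function per sequence, rather than on the whole alignment, would show whether the ambiguity is spread evenly or concentrated in a few sequences.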

Any idea how I could solve this issue, please?

Many thanks,
   JB

Ziheng

Mar 13, 2021, 11:50:26 AM
to PAML discussion group
There is a limit to the number of ambiguous codons in the program. It is either 128 or 256, depending on whether I have used signed or unsigned char for a codon, but it looks like it is 128.
I suspect there might be some gross errors or a mismatch between your data and the way the program interprets them. Why do you have so many ambiguous codons? They do not look like sequences generated by any sequencing technology. Do you use Y, R, W etc. to represent heterozygotes or uncertainties in the reads? They look very strange since you have only 80 sites in the alignment.
best, ziheng
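
As an illustration of why the char type would set such a limit, here is a minimal sketch using Python's ctypes module, whose fixed-width integer types wrap silently in the same way as an 8-bit C char. It is an assumption-laden illustration, not code from the PAML source: it assumes each ambiguous codon is identified by a zero-based index stored in a single 8-bit char, and it uses the count of 158 reported in the log above.

```python
import ctypes

N_AMBIGUOUS = 158  # count reported in the codeml log in this thread

# Assumption: each ambiguous codon gets a zero-based index stored in one
# 8-bit char. ctypes does no overflow checking, so values wrap as in C.
for index in (127, 128, N_AMBIGUOUS - 1):
    as_signed = ctypes.c_int8(index).value
    as_unsigned = ctypes.c_uint8(index).value
    print(f"index {index}: signed char {as_signed}, unsigned char {as_unsigned}")

fits_signed = N_AMBIGUOUS <= 128
fits_unsigned = N_AMBIGUOUS <= 256
print("fits in signed char:", fits_signed)
print("fits in unsigned char:", fits_unsigned)
```

Under these assumptions, index 128 stored in a signed char becomes -128, so 158 ambiguous codons would overflow a signed char but still fit in an unsigned one. A negative array index of this kind would be consistent with a segmentation fault, although the thread does not confirm that this is the actual mechanism.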