Special Characters in Tcoffee .fasta Interpretation

Alex Men Tong

Jun 24, 2024, 8:50:38 PM

to Tcoffee

Hi!

I work with Peter Yoon and am attempting to implement a DALI to Tcoffee workflow.

Tcoffee runs perfectly locally, but when we process the same files on our server, Tcoffee seems to produce non-ASCII characters in the first merge step, causing a fatal error when merging the final MSA. Our files and scripts are all UTF-8 encoded, and the error only appears upon the first Tcoffee call in a Singularity container.

A workaround is to isolate each merge step and iterate the merging; however, this increases the processing time by 100x. The characters are printable in the ISO-8859-1 (Latin-1) encoding. Neither using a previous version nor adding sanitization functions fixes the issue.
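To make the symptom concrete, here is a minimal Python sketch for locating such bytes in an output file. It is not part of T-Coffee or of our pipeline; the function name `find_non_ascii` and the sample bytes are made up for illustration. It reports every byte at or above 0x80 together with its Latin-1 rendering.

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte value, Latin-1 character) for each byte >= 0x80."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:
                # Every single byte maps to exactly one character in Latin-1.
                hits.append((line_no, col, byte, bytes([byte]).decode("latin-1")))
    return hits


# Hypothetical sample: a FASTA-like buffer with one stray 0xE9 byte.
sample = b">seq1\nACDE\xe9FG\n>seq2\nMKV\n"
for line_no, col, byte, char in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: 0x{byte:02x} ({char!r} in Latin-1)")
```

Reading the file in binary mode (for example `open(path, "rb").read()`) avoids a decoding error before the check even runs.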

I was wondering whether we are exceeding Tcoffee's processing limit, as with the MAXNPID error in previous versions, or whether there are any encoding logic errors.

Thanks!

Alex

Cedric Notredame

Jun 25, 2024, 2:04:03 AM

to tco...@googlegroups.com

Dear Alex

I do not think it has to do with the max-PID. If you could send one of the faulty outputs, I will investigate.

Not much thought went into the encoding. I am just using standard ASCII. There is also not much checking on the input side, so it could be that some character slipping through causes damage at some point.
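Since the input side is lightly checked, a pre-flight check outside T-Coffee may help rule out bad input. The following is only a sketch under assumptions: `check_fasta` and the `ALLOWED` alphabet (ASCII letters plus `-`, `.` and `*`) are hypothetical choices, not something T-Coffee defines or enforces.

```python
import string

# Assumed alphabet: ASCII letters plus common gap/stop symbols. Adjust as needed.
ALLOWED = set(string.ascii_letters + "-.*")


def check_fasta(text: str):
    """Return a list of (line number, problem description) for a FASTA string."""
    problems = []
    seen_header = False
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(">"):
            seen_header = True
            if not line.isascii():
                problems.append((line_no, "non-ASCII character in header"))
        elif not seen_header:
            problems.append((line_no, "sequence data before first header"))
        else:
            bad = sorted(set(line.strip()) - ALLOWED)
            if bad:
                problems.append((line_no, f"unexpected characters: {bad}"))
    return problems
```

An empty list means the text passed these particular checks; it says nothing about whether T-Coffee itself will accept the file.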

Cheers

Cedric

--

Dr Cedric Notredame, PhD
ORCID - https://orcid.org/0000-0003-1461-0988

Alex Men Tong

Sep 9, 2024, 5:28:38 PM

to tco...@googlegroups.com, Peter Hyungjun Yoon, Kenneth Loi

Dear Cedric,

Sincere apologies for the late email.

I’ve attached some files from a test run on 16 .fasta sequences, together with the corresponding T-Coffee slurm.out output. Below is a screenshot showing some of the non-printable characters. The folder also includes the script we used to run T-Coffee in this instance; it simply combines many pairwise alignments to a reference into an MSA. Our HPC OS is Ubuntu 22.04.4 LTS, and we are currently using T-Coffee Version_13.46.0.919e8c6b (though other versions lead to the same issue). Notably, the issue persists on the HPC regardless of whether we use a pre-compiled copy, compile it ourselves, or use a container provided on GitHub. On my PC, however, I can run it without issue, although running only locally severely limits our usage.

Our current workaround is to remove the “faulty” files we find and rerun T-Coffee, but this is not optimal. These files have no issues or special characters initially.
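For illustration only, the "find the faulty files" part of that workaround could look roughly like the sketch below. It is not the attached script; the function name `split_clean_and_faulty` and the default `*.aln` pattern are assumptions, and "faulty" here simply means the file contains at least one non-ASCII byte.

```python
from pathlib import Path


def split_clean_and_faulty(directory, pattern="*.aln"):
    """Partition matching files into pure-ASCII ones and ones with other bytes."""
    clean, faulty = [], []
    for path in sorted(Path(directory).glob(pattern)):
        # bytes.isascii() is True only if every byte is below 0x80.
        if path.read_bytes().isascii():
            clean.append(path.name)
        else:
            faulty.append(path.name)
    return clean, faulty
```

The faulty list can then be excluded (or regenerated) before rerunning the merge.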

I hope this will help you in your investigation and please let me know if you have any other requests/inquiries! Thank you for your time.

Best,

Alex

Screenshot 2024-09-09 at 12.36.24 PM.png

T_Coffee_debugging.zip
