T-coffee expresso configuration question


David Mathog

Jan 15, 2020, 4:22:39 PM
to tco...@googlegroups.com
Greetings,

We have T-COFFEE_distribution_Version_13.39.0.d675aed.tar.gz installed.
It is activated as an Lmod module, which was configured as shown below
(the script used to configure it is here:
ftp://newsaf.bio.caltech.edu/pub/software/linux_or_unix_tools/module_generate_from_directory.sh
):


TOPDIR=/usr/common/modules/el7/x86_64/software/t-coffee/13.39.0-CentOS-vanilla
module_generate_from_directory.sh \
t-coffee \
13.39.0 \
CentOS/vanilla \
$TOPDIR \
"A collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures." \
"http://www.tcoffee.org/"
cat >>/usr/common/modules/el7/x86_64/modules/all/t-coffee/13.39.0-CentOS-vanilla.lua <<'EOD'
-- added manually
prepend_path("PATH", root .. "/bin/linux")
setenv("DIR_4_TCOFFEE",root)
setenv("PLUGINS_4_TCOFFEE",root .. "/plugins/linux")
setenv("TMP_4_TCOFFEE","/tmp/TCOFFEE/tmp")
setenv("LOCKDIR_4_TCOFFEE","/tmp/TCOFFEE/lockdir")
setenv("CACHE_4_TCOFFEE","/tmp/TCOFFEE/cache")
setenv("PDB_DIR","/pdb")
setenv("NO_REMOTE_PDB_DIR","1")
EOD

The contents of /pdb match the download site: the top level consists of
directories named with two-character hashes such as "b3", and those
directories contain files like "1b30.pdb.gz". There is no "unreleased"
data locally.

The intent is NOT to run any local BLAST or other searches against that
PDB database, only to retrieve structures from it when needed.
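
As a rough illustration of that layout, the lookup can be sketched as a
small shell function. The name pdb_local_path is made up for this
example, and taking the directory name from the 2nd and 3rd characters
of the ID is an assumption based on the "b3" / "1b30.pdb.gz" example
above; it is not taken from the T-Coffee sources.

```shell
# Hypothetical helper (not part of T-Coffee): print where a PDB entry
# would be expected under the hashed layout described above.
# Assumption: the directory name is characters 2-3 of the lowercased ID.
# The body runs in a subshell, so its variables do not leak to the caller.
pdb_local_path() (
    id=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    root=${2:-/pdb}    # default matches the PDB_DIR setting above
    hash=$(printf '%s' "$id" | cut -c2-3)
    printf '%s/%s/%s.pdb.gz\n' "$root" "$hash" "$id"
)

pdb_local_path 1B30    # prints /pdb/b3/1b30.pdb.gz
```

Something like `zcat "$(pdb_local_path 1b30)" | head` would then be a
quick way to confirm that a given entry really is present locally.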

However, running Expresso with this setup causes problems:

module load t-coffee
t_coffee -mode expresso -seq /tmp/three.pfa -email mat...@caltech.edu >three.out 2>&1

The output file "three.out" contains a very large number of warnings
like:

19541 -- WARNING: PDB_ENTRY_TYPE_FILE must be set to the location of <pdb>/derived_data/pdb_entry_type.txt when using NO_REMOTE_PDB_DIR=1
19541 -- WARNING: Cannot find pdb_entry_type.txt; 3CHNC is assumed to be valid; add ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_entry_type.txt in /tmp/TCOFFEE/cache/// to automatically check name status
19541 -- WARNING: UNREALEASED_FILE must be set to the location of your unrealeased.xml file as downloaded from http://www.rcsb.org/pdb/rest/getUnreleased when using NO_REMOTE_PDB_DIR=1
19541 -- WARNING: UNREALEASED_FILE must be set to the location of your unrealeased.xml file as downloaded from http://www.rcsb.org/pdb/rest/getUnreleased when using NO_REMOTE_PDB_DIR=1
19541 -- WARNING: Cannot find unrealeased.xml; 3CHNC is assumed to be released;
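
With this many warnings it can help to collapse the log into distinct
messages and their counts. A minimal sketch using standard tools; the
function name summarize_warnings is made up here, and the "<pid> -- "
prefix it strips is assumed from the excerpt above.

```shell
# Hypothetical helper: count each distinct WARNING line in a T-Coffee log.
# Assumes lines look like "<pid> -- WARNING: ..." as in the excerpt above.
summarize_warnings() {
    # 1. keep only warning lines
    # 2. drop the leading process ID so identical messages compare equal
    # 3. count duplicates and list the most frequent message first
    grep 'WARNING:' "$1" |
        sed 's/^[0-9][0-9]* -- //' |
        sort | uniq -c | sort -rn
}

# Usage: summarize_warnings three.out | head
```

Because most messages embed a template ID such as 3CHNC, the summary
also shows which PDB entries were being looked up.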

Each of the three input sequences also generates a:

>one No Template Selected

message. That is odd because these three are derived from pir:a1hu,
each with tiny modifications. Unmodified pir:a1hu matches perfectly
with Swissprot P01876.2, for which there are PDB structures:

https://www.ncbi.nlm.nih.gov/protein/P01876.2?report=genbank&log$=prottop&blast_rank=1&RID=

So I think something must be misconfigured here. Any idea what that
might be?

The input and output files are attached.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Attachments: three.pfa, three.out

David Mathog

Jan 15, 2020, 6:03:27 PM
to tco...@googlegroups.com

Figured out part of it.

cd /pdb
wget ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_entry_type.txt
wget http://www.rcsb.org/pdb/rest/getUnreleased
ln -s getUnreleased unrealeased.xml
ln -s getUnreleased unreleased.xml

Then add to the module definition:

setenv("PDB_ENTRY_TYPE_FILE","/pdb/pdb_entry_type.txt")
setenv("PDB_UNREALEASED_FILE","/pdb/unreleased.xml")

(Yes, "UNREALEASED", not "UNRELEASED".)
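
For anyone not using Lmod: a setenv(...) line in a modulefile simply
sets an environment variable when the module is loaded, so the same
settings can be exported directly in the shell before running t_coffee.
The sketch below does that with the paths from this thread and then
checks that the two file variables point at readable files.
check_file_var is a made-up helper, and PDB_UNREALEASED_FILE is copied
from the setenv line above (the warning text itself says
UNREALEASED_FILE, so which name a given T-Coffee version reads is an
assumption here).

```shell
# Shell equivalent of the setenv() lines above (paths as used in this thread).
export PDB_DIR=/pdb
export NO_REMOTE_PDB_DIR=1
export PDB_ENTRY_TYPE_FILE=/pdb/pdb_entry_type.txt
export PDB_UNREALEASED_FILE=/pdb/unreleased.xml

# Hypothetical helper: report whether the variable named in $1 is set and
# points at a readable file. Runs in a subshell so it leaks no variables.
check_file_var() (
    name=$1
    eval "value=\${$name:-}"    # indirect lookup of the named variable
    if [ -z "$value" ]; then
        echo "$name: not set"
        exit 1
    fi
    if [ ! -r "$value" ]; then
        echo "$name: $value is not readable"
        exit 1
    fi
    echo "$name: ok ($value)"
)

for v in PDB_ENTRY_TYPE_FILE PDB_UNREALEASED_FILE; do
    check_file_var "$v" || echo "  -> fix this before running t_coffee"
done
```

Putting the export lines in ~/.bashrc or in a wrapper script around
t_coffee would have the same effect as the module definition.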

Then this runs without all the warnings:

module purge
module load t-coffee
t_coffee -mode expresso -seq /tmp/three.pfa -email mat...@caltech.edu >three.out 2>&1


The alignment is a little off though, because one of the small changes
is before the part which has a structure:

CLUSTAL FORMAT for T-COFFEE Version_13.39.0.d675aed [http://www.tcoffee.org] [MODE: expresso ], CPU=0.00 sec, SCORE=100, Nseq=3, Len=353

one    ASPTSPKVFPLSLCSTQPDGNVVIACLVQGFFPQEPLSVTWSESGQGVTA
two    ----ASPTSPFPLSLCSTQPDGNVVIACLVQGFFEPLSVTWSESGQGVTA
three  ASPTSPKVFPLSLCSTQPDGNVVIACLVQGFFPQEPLSVTWSESGQGVTA
:. . *:.*. ..: : :. ****************

should be:

one    ASPTSPKVFPLSLCSTQPDGNVVIACLVQGFFPQEPLSVTWSESGQGVTA
two    ASPTSP  FPLSLCSTQPDGNVVIACLVQGFF  EPLSVTWSESGQGVTA
three  ASPTSPKVFPLSLCSTQPDGNVVIACLVQGFFPQEPLSVTWSESGQGVTA
       ******::************************::****************

I don't know how Expresso works, but one might imagine that it would
fall back to some other alignment mode between conserved domains. If it
is doing so, then there is a bug somewhere, since the alignment in this
region was way off.
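
To put a number on "way off", one could count the columns in which two
aligned rows carry the same residue. A minimal sketch with awk;
count_identical is a made-up helper, the rows are the first 50 columns
shown above, and the hand-corrected row is written here with "-" for
the gaps.

```shell
# Hypothetical helper: count columns where two aligned rows hold the
# same residue; gap characters ("-") never count as a match.
count_identical() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        len = (length(a) < length(b)) ? length(a) : length(b)
        n = 0
        for (i = 1; i <= len; i++) {
            x = substr(a, i, 1)
            if (x != "-" && x == substr(b, i, 1)) n++
        }
        print n
    }'
}

one=ASPTSPKVFPLSLCSTQPDGNVVIACLVQGFFPQEPLSVTWSESGQGVTA
two_expresso=----ASPTSPFPLSLCSTQPDGNVVIACLVQGFFEPLSVTWSESGQGVTA
two_by_hand=ASPTSP--FPLSLCSTQPDGNVVIACLVQGFF--EPLSVTWSESGQGVTA

echo "Expresso: $(count_identical "$one" "$two_expresso") identical columns"
echo "by hand:  $(count_identical "$one" "$two_by_hand") identical columns"
```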

Regards,

Miles Pemberton

Feb 23, 2022, 9:26:40 AM
to Tcoffee
Hi David,

I am having a similar problem trying to run T-Coffee Expresso locally.

I have installed the software as specified in the main documentation and am running the same command line that the web server uses:

t_coffee -in=sequence.fasta -mode=expresso -blast=LOCAL -pdb_db=/projects/x/y/z/pdb_seqres.fasta -evaluate_mode=t_coffee_slow -output=score_html clustalw_aln fasta_aln score_ascii phylip -maxnseq=150 -maxlen=2500 -case=upper -seqnos=off -outorder=input -run_name=result -multi_core=4 -quiet=stdout

I am getting repeated warnings of:

WARNING: Could not download http://www.rcsb.org/pdb/rest/getUnreleased
WARNING: Cannot find unrealeased.xml; 1abcA is assumed to be released;

Was this your fix for that?


    cd /pdb
    wget ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_entry_type.txt
    wget http://www.rcsb.org/pdb/rest/getUnreleased
    ln -s getUnreleased unrealeased.xml
    ln -s getUnreleased unreleased.xml

    Then add to the module definition:

    setenv("PDB_ENTRY_TYPE_FILE","/pdb/pdb_entry_type.txt")
    setenv("PDB_UNREALEASED_FILE","/pdb/unreleased.xml")

If so, could you please provide more detail on how to do this? For example, what do you mean by 'add to the module definition'? Or perhaps someone could suggest what else might be wrong with my input?

Thanks in advance.

Regards,

Miles

Athanasios Baltzis

Feb 24, 2022, 3:54:04 AM
to tco...@googlegroups.com
Hi David,

Thank you for reporting this bug. We have pushed a fix to GitHub (https://github.com/cbcrg/tcoffee/tree/0125a58be083d3e3a41368761f70387c967f26f9). You should now be able to run Expresso locally without any warnings.

Please feel free to contact us if you encounter any additional issues.

Best,

--
Athanasios Baltzis
PhD Fellow in Bioinformatics | Data scientist
Notredame's lab - Comparative Bioinformatics Group
Centre for Genomic Regulation (CRG), Barcelona (Spain)