Problem with t-coffee finding local pdb files as templates


Esteban López Tavera

Aug 23, 2022, 11:06:19 AM
to Tcoffee
Dear T-coffee team,

I'm running T-Coffee locally (T-COFFEE Version_13.45.0.4846264) in expresso mode, using a database I built from a small set of sequences. I have the corresponding PDB files, some from structure models and some crystal structures, which I'd like to use as templates.

My command looks like this:
t_coffee -in=my_input_seqs.fa -mode=expresso\
 -blast=LOCAL\
 -pdb_db=<Absolute path to my db>\
 -pdb_type=dnm\
 -output=score_html,clustalw_aln,fasta_aln -outorder=input\
 -run_name=test -cache=$PWD -pdb_min_sim 85 -pdb_min_cov 90


I have set the environment variables accordingly:
  • PDB_DIR=<absolute path to where the template pdb files are located>
  • NO_REMOTE_PDB_DIR=1
  • PDB_ENTRY_TYPE_FILE=<Path to a txt file with my template ids, all as prot diffraction, in the same format as the actual pdb_entry_type>
The pdb files are named in the format id_in_the_database.pdb
I also tried copying all the pdb files to the current directory.

Yet, I'm still getting the following after it fails to retrieve any template:

**<PROTEIN_ID> [PDB NOT RELEASED or WITHDRAWN]
<Sequence_ID> No Template Selected

And as a consequence, for all my sequences I get:

Method  cannot be applied to [<sequence_id1> vs <sequence_id2>], proba_pair will be used instead.

Could you help me figure out how to use my pdb files as templates?

Best,
Esteban




Athanasios Baltzis

Aug 24, 2022, 4:56:11 AM
to tco...@googlegroups.com
Dear Esteban,

I cannot reproduce the issue you mentioned using the latest tcoffee commit (https://github.com/cbcrg/tcoffee). Please update your tcoffee version.

Best,
Athanasios Baltzis
PhD Fellow in Bioinformatics | Data scientist
Notredame's lab - Comparative Bioinformatics Group
Centre for Genomic Regulation (CRG), Barcelona (Spain)
  


--
You received this message because you are subscribed to the Google Groups "Tcoffee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tcoffee+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tcoffee/8e1a95e6-938b-4f4c-9e3f-6d575ef4704cn%40googlegroups.com.

Esteban López Tavera

Aug 24, 2022, 10:03:10 AM
to Tcoffee
Thank you, Athanasios.

I have updated it, but the problem persists.

This is the code that I'm running:
export PDB_DIR="/Users/esteban/t_coffee_test/test_pdbs"
echo PDB_DIR=$PDB_DIR
echo PDB_DIR contents: $(ls $PDB_DIR)
export NO_REMOTE_PDB_DIR=1
t_coffee -version

t_coffee -in=reduced_CBP21_nopipe_ren.fa -mode=expresso -blast=LOCAL\
 -pdb_db="/Users/esteban/t_coffee_test/test_db/test_db.fa"\
 -pdb_type=dnm\
 -output=score_html,clustalw_aln,fasta_aln -outorder=input\
 -run_name=test -cache=update -pdb_min_sim 50 -pdb_min_cov 90


And this is the first chunk of output that I'm getting:
PDB_DIR=/Users/esteban/t_coffee_test/test_pdbs
PDB_DIR contents: AOM11.pdb CBP21.pdb CELS2.pdb QIS10.pdb SC10D.pdb
PROGRAM: T-COFFEE Version_13.45.60.cd84d2a (Version_13.45.60.cd84d2a)

PROGRAM: T-COFFEE Version_13.45.60.cd84d2a (Version_13.45.60.cd84d2a)
-full_log          S    [0]
-genepred_score    S    [0]     nsd
-run_name          S    [1]     test
-mem_mode          S    [0]     mem
-extend            D    [1]     1
-extend_mode       S    [0]     very_fast_triplet
-max_n_pair        D    [0]     10
-seq_name_for_quadruplet    S    [0]     all
-compact           S    [0]     default
-clean             S    [0]     no
-do_self           FL    [0]     0
-do_normalise      D    [0]     1000
-template_file     S    [1]     EXPRESSO
-setenv            S    [0]     0
-export            S    [0]     0
-template_mode     S    [0]
-flip              D    [0]     0
-remove_template_file    D    [0]     0
-profile_template_file    S    [1]     EXPRESSO
-in                S    [1]     reduced_CBP21_nopipe_ren.fa
-seq               S    [0]
-aln               S    [0]
-method_limits     S    [0]
-method            S    [1]     sap_pair
-lib               S    [0]
-profile           S    [0]
-profile1          S    [0]
-profile2          S    [0]
-pdb               S    [0]
-relax_lib         D    [0]     1
-filter_lib        D    [0]     0
-shrink_lib        D    [0]     0
-out_lib           W_F    [0]     no
-out_lib_mode      S    [0]     primary
-lib_only          D    [0]     0
-outseqweight      W_F    [0]     no
-seq_source        S    [0]     ANY
-cosmetic_penalty    D    [0]     0
-gapopen           D    [0]     0
-gapext            D    [0]     0
-fgapopen          D    [0]     0
-fgapext           D    [0]     0
-nomatch           D    [0]     0
-newtree           W_F    [0]     default
-tree              W_F    [0]     NO
-usetree           R_F    [0]
-tree_mode         S    [0]     nj
-distance_matrix_mode    S    [0]     ktup
-distance_matrix_sim_mode    S    [0]     idmat_sim1
-quicktree         FL    [0]     0
-outfile           W_F    [0]     default
-maximise          FL    [1]     1
-output            S    [1]     score_html    clustalw_aln    fasta_aln
-len               D    [0]     0
-infile            R_F    [0]
-matrix            S    [0]     default
-tg_mode           D    [0]     1
-profile_mode      S    [0]     cw_profile_profile
-profile_comparison    S    [0]     profile
-dp_mode           S    [0]     linked_pair_wise
-ktuple            D    [0]     1
-ndiag             D    [0]     0
-diag_threshold    D    [0]     0
-diag_mode         D    [0]     0
-sim_matrix        S    [0]     vasiliky
-transform         S    [0]
-extend_seq        FL    [0]     0
-outorder          S    [1]     input
-inorder           S    [0]     aligned
-seqnos            S    [0]     off
-case              S    [0]     keep
-cpu               D    [0]     0
-ulimit            D    [0]     -1
-maxnseq           D    [0]     -1
-maxlen            D    [0]     -1
-sample_dp         D    [0]     0
-weight            S    [0]     default
-seq_weight        S    [0]     no
-align             FL    [1]     1
-mocca             FL    [0]     0
-domain            FL    [0]     0
-start             D    [0]     0
-len               D    [0]     0
-scale             D    [0]     0
-mocca_interactive    FL    [0]     0
-method_evaluate_mode    S    [0]     default
-color_mode        S    [0]     new
-aln_line_length    D    [0]     0
-evaluate_mode     S    [0]     triplet
-get_type          FL    [0]     0
-clean_aln         D    [0]     0
-clean_threshold    D    [1]     1
-clean_iteration    D    [1]     1
-clean_evaluate_mode    S    [0]     t_coffee_fast
-extend_matrix     FL    [0]     0
-prot_min_sim      D    [0]     0
-prot_max_sim      D    [100]     100
-psiJ              D    [0]     3
-psitrim_mode      S    [0]     regtrim
-psitrim_tree      S    [0]     codnd
-psitrim           D    [100]     100
-prot_min_cov      D    [90]     90
-pdb_type          S    [1]     dnm
-pdb_min_sim       D    [50]     50
-pdb_max_sim       D    [100]     100
-pdb_min_cov       D    [90]     90
-pdb_blast_server    W_F    [0]     EBI
-blast             W_F    [1]     LOCAL
-pdb_db            W_F    [1]     /Users/esteban/t_coffee_test/test_db/test_db.fa
-protein_db        W_F    [0]     uniref50
-method_log        W_F    [0]     no
-struc_to_use      S    [0]
-cache             W_F    [1]     update
-print_cache       FL    [0]     0
-align_pdb_param_file    W_F    [0]     no
-align_pdb_hasch_mode    W_F    [0]     hasch_ca_trace_bubble
-external_aligner    S    [0]     NO
-msa_mode          S    [0]     tree
-et_mode           S    [0]     et
-master            S    [0]     no
-blast_nseq        D    [0]     0
-lalign_n_top      D    [0]     10
-iterate           D    [0]     0
-trim              D    [0]     0
-split             D    [0]     0
-trimfile          S    [0]     default
-split             D    [0]     0
-split_nseq_thres    D    [0]     0
-split_score_thres    D    [0]     0
-check_pdb_status    D    [0]     0
-clean_seq_name    D    [0]     0
-seq_to_keep       S    [0]
-dpa_master_aln    S    [0]
-dpa_maxnseq       D    [0]     0
-dpa_min_score1    D    [0]
-dpa_min_score2    D    [0]
-dpa_keep_tmpfile    FL    [0]     0
-dpa_debug         D    [0]     0
-multi_core        S    [0]     templates_jobs_relax_msa_evaluate
-n_core            D    [0]     1
-thread            D    [0]     1
-max_n_proc        D    [0]     1
-lib_list          S    [0]
-prune_lib_mode    S    [0]     5
-tip               S    [0]     none
-rna_lib           S    [0]
-no_warning        D    [0]     0
-run_local_script    D    [0]     0
-proxy             S    [0]     unset
-email             S    [0]
-clean_overaln     D    [0]     0
-overaln_param     S    [0]
-overaln_mode      S    [0]
-overaln_model     S    [0]
-overaln_threshold    D    [0]     0
-overaln_target    D    [0]     0
-overaln_P1        D    [0]     0
-overaln_P2        D    [0]     0
-overaln_P3        D    [0]     0
-overaln_P4        D    [0]     0
-exon_boundaries    S    [0]
-display           D    [0]     100

INPUT FILES
    Input File (S) reduced_CBP21_nopipe_ren.fa  Format fasta_seq
    Input File (M) sap_pair

Identify Master Sequences [no]:

Master Sequences Identified
Looking For Sequence Templates:

    Template Type: [EXPRESSO] Mode Or File: [EXPRESSO] [Start
!    Process: >AMO81812 [LOCAL/blast//Users/esteban/t_coffee_test/test_db/test_db.fa][COMPUTE CACHE]
        **CBP21 [PDB NOT RELEASED or WITHDRAWN]
        **AOM11 [PDB NOT RELEASED or WITHDRAWN]
        **CBP21 [PDB NOT RELEASED or WITHDRAWN]
        **AOM11 [PDB NOT RELEASED or WITHDRAWN]
         >AMO81812 No Template Selected
!    Process: >QXJ62528 [LOCAL/blast//Users/esteban/t_coffee_test/test_db/test_db.fa][COMPUTE CACHE]
        **AOM11 [PDB NOT RELEASED or WITHDRAWN]
        **AOM11 [PDB NOT RELEASED or WITHDRAWN]
         >QXJ62528 No Template Selected
!    Process: >QXX84031 [LOCAL/blast//Users/esteban/t_coffee_test/test_db/test_db.fa][COMPUTE CACHE]
        **CBP21 [PDB NOT RELEASED or WITHDRAWN]
        **AOM11 [PDB NOT RELEASED or WITHDRAWN]
        **CBP21 [PDB NOT RELEASED or WITHDRAWN]
        **AOM11 [PDB NOT RELEASED or WITHDRAWN]
         >QXX84031 No Template Selected

...
Then further down, many lines like this, one for each pair:

pid 15360 -- Method  cannot be applied to [AMO81812 vs QXJ62528], proba_pair will be used instead

So it seems that t_coffee is correctly assigning the templates, but then it fails to find the corresponding structures, even though they are in PDB_DIR.

Best,
Esteban 

Cedric Notredame

Aug 24, 2022, 10:25:56 AM
to tco...@googlegroups.com

Dear Esteban, Dear all

I think I know what the issue is: Expresso assumes you are working with public PDB entries and checks them against a list downloaded from the PDB and stored in ~/.t_coffee/cache/pdb_entry_type.txt. Since your PDBs are not public, it complains.

There are two workarounds:

1- The correct one, a bit more involved: use 3D-Coffee instead of Expresso. 3D-Coffee is like Expresso without BLAST. You will need to build your own template file that binds sequences to PDBs:

><seq name> _P_ <PDB>

as many lines as needed
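A minimal sketch of generating such a template file, assuming you keep a hypothetical two-column mapping file (sequence name, then PDB ID) that you prepare by hand; the thread does not say how the sequence-to-structure mapping is obtained. The function name make_template_file is illustrative. The second column is copied verbatim, so it could also be a path to a local .pdb file, but whether T-Coffee resolves a path there is an assumption to check against the T-Coffee documentation.

```bash
#!/usr/bin/env bash
# Sketch: write a 3D-Coffee template file with one "><seq name> _P_ <PDB>"
# line per sequence. The mapping file format (seq name, PDB ID) is a
# hypothetical convention, not something T-Coffee defines.

make_template_file() {
  local mapping=$1 out=$2
  # Skip blank lines (NF < 2) and comment lines starting with '#';
  # emit ">seq _P_ pdb" for every remaining pair.
  awk 'NF >= 2 && $1 !~ /^#/ { printf ">%s _P_ %s\n", $1, $2 }' "$mapping" > "$out"
}
```

For example, a mapping.txt containing the lines `AMO81812 CBP21` and `QXJ62528 AOM11` (IDs borrowed from the BLAST hits in Esteban's log purely for illustration) would produce `>AMO81812 _P_ CBP21` and `>QXJ62528 _P_ AOM11` with `make_template_file mapping.txt my_templates.txt`.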

Then run t_coffee -in=reduced_CBP21_nopipe_ren.fa -template_file <your template file> -method sap_pair

You can add other structural pairwise methods; mustang_pair and TMalign_pair are the main ones I would recommend. In our hands, TMalign_pair + sap_pair is often the best.
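A sketch of a 3D-Coffee run combining the two methods Cedric recommends, passed as a comma-separated list to -method. The function name run_3dcoffee and the file arguments are placeholders, and it assumes the TM-align and SAP binaries are installed where T-Coffee can find them. When t_coffee is not on the PATH, the function only prints the command it would run.

```bash
#!/usr/bin/env bash
# Sketch: run 3D-Coffee with TMalign_pair + sap_pair.
# Input and template file names are supplied by the caller (placeholders).

run_3dcoffee() {
  local seqs=$1 templates=$2
  local cmd=(t_coffee -in="$seqs" -template_file="$templates"
             -method=TMalign_pair,sap_pair)
  if command -v t_coffee >/dev/null 2>&1; then
    "${cmd[@]}"
  else
    # Dry run: print the command line when t_coffee is not installed.
    echo "${cmd[*]}"
  fi
}
```

Usage: `run_3dcoffee reduced_CBP21_nopipe_ren.fa my_templates.txt`, where my_templates.txt is a hypothetical template file in the `><seq name> _P_ <PDB>` format.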

OR, quick and dirty

2- Update ~/.t_coffee/cache/pdb_entry_type.txt by concatenating a list of all your PDBs, following the original format (one entry per line):

111d    nuc     diffraction
111l    prot    diffraction
111m    prot    diffraction
112d    nuc     diffraction
112l    prot    diffraction

You may want to run the cat in your script, so that your entries are added back even if pdb_entry_type.txt gets updated. Note that I have not had time to check this; you may want to add TWO PDBs manually first and see if it works.
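A minimal sketch of the "quick and dirty" workaround that can be re-run safely from a pipeline script: it appends one `<id> prot diffraction` line per local .pdb file, lowercases the ID (Esteban reports below that the list needs lowercase IDs), and skips IDs already present. It assumes files are named `<ID>.pdb` and that a tab separator is acceptable; mirror whatever separator your copy of pdb_entry_type.txt uses. The function name add_local_pdb_entries is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: idempotently register local PDB files in Expresso's entry list.
# Default path is the one mentioned in the thread; override with arg 2.

add_local_pdb_entries() {
  local pdb_dir=$1
  local entry_file=${2:-$HOME/.t_coffee/cache/pdb_entry_type.txt}
  local f id
  touch "$entry_file"
  for f in "$pdb_dir"/*.pdb; do
    [ -e "$f" ] || continue                         # no .pdb files at all
    id=$(basename "$f" .pdb | tr '[:upper:]' '[:lower:]')
    # Append only if no existing line has this ID as its first field.
    if ! awk -v id="$id" '$1 == id { found = 1 } END { exit !found }' "$entry_file"; then
      printf '%s\tprot\tdiffraction\n' "$id" >> "$entry_file"
    fi
  done
}
```

Calling `add_local_pdb_entries "$PDB_DIR"` right before each t_coffee run keeps the entries in place even if the cached list is refreshed, and running it twice does not create duplicate lines.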


All of these decisions were taken before AF2... In the next release I will make Expresso more tolerant of unreleased PDBs. Thanks for pointing this out.

Cheers,

Cedric


PS: I am also preparing a fix for the missing binaries. More on this soon.

-- 
##########################################
Dr Cedric Notredame, PhD
Group Leader
Notredame's lab - Comparative Bioinformatics Group
Bioinformatics and Genomics Programme
Room 440.03

Centre de Regulació Genòmica (CRG)
Dr. Aiguader, 88
08003 Barcelona
Spain

Ph#     + 34 93 316 02 71
Fax#    + 34 93 316 00 99
Mobile# + 34 66 250 47 82

email   cedric.n...@crg.eu
url     www.tcoffee.org
blog    cedricnotredame.blogspot.com
ORC-ID: 0000-0003-1461-0988
###########################################

Esteban López Tavera

Aug 24, 2022, 11:16:48 AM
to Tcoffee
Thank you, Cedric!

I tried the dirty workaround (for now) and it seems to be running nicely now.

Just for reference, I had to make sure all pdbs have ids no longer than 5 characters. I guess that's related to what you just mentioned about expresso expecting public PDBs. Also, in the pdb_entry_type.txt file, the pdb ids need to be lowercase.
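Based on Esteban's observation that IDs longer than 5 characters did not work, a small pre-flight check can flag offending files before a run (AF2 models often carry longer names). This is a sketch: the 5-character limit comes only from this report, not from T-Coffee documentation, and check_pdb_ids is a hypothetical helper. Renaming the files (and updating the sequence database to match) is left to you.

```bash
#!/usr/bin/env bash
# Sketch: report .pdb files whose ID (file name minus .pdb) exceeds
# 5 characters. Returns non-zero if any offending file is found.

check_pdb_ids() {
  local pdb_dir=$1 f id bad=0
  for f in "$pdb_dir"/*.pdb; do
    [ -e "$f" ] || continue
    id=$(basename "$f" .pdb)
    if [ "${#id}" -gt 5 ]; then
      echo "too long (${#id} chars): $id"
      bad=1
    fi
  done
  return "$bad"
}
```

Usage: `check_pdb_ids "$PDB_DIR" || echo "rename these before running Expresso"`.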

Indeed, most of my pdb files come from AF2, lol.

Thanks again!

Best,
Esteban