more trouble with ete3 evol throwing errors before starting


Brandon Kieft

Mar 23, 2017, 11:25:09 PM
to The ETE toolkit
Hello all,

New here. I've been doing fine creating gene and species trees with ete3 build, both for individual genes of interest and concatenated genes for building species trees. I have also been able to replicate the cookbook recipes.

However, when I try to take one of my gene trees/alignments and run ete3 evol with it, I can never get any output and always get errors. Again, I can replicate the ete3 evol cookbook recipes just fine.

I've tried dozens of iterations, but I'll explain the most basic workflow that leads me to my most common end-point failure.

I start with an unaligned multifasta file containing the gene sequence of interest for my 4 species (I've tried both aa and nts, but here I'll provide nts since that's what I've seen in the cookbook). This is called cluster214.ffn.

Then I run the following:

ete3 build -w standard_fasttree -n cluster214.ffn -o output_tree

This aligns the sequences (cluster214.ffn.final_tree.fa) and builds a tree (cluster214.ffn.final_tree.nw). This process also creates a "used_alg" fasta file (cluster214.ffn.final_tree.used_alg.fa); I've tried using that file in the next script as well, to no avail.

Then I run the following:

ete3 evol -t cluster214.ffn.final_tree.nw --alg cluster214.ffn.final_tree.fa -o evol_results1/ --models M0 --cpu 8

I expect this to take my aligned fasta file and my tree and use codeml to infer the null model.

However, what I get is the following error, with no evol_results1 directory even getting created.

Using: /home/kieftb/anaconda2/bin/ete3_apps/bin/Slr
Using: /home/kieftb/anaconda2/bin/ete3_apps/bin/codeml
Traceback (most recent call last):
  File "/home/kieftb/anaconda2/bin/ete3", line 11, in <module>
    load_entry_point('ete3==3.0.0b36', 'console_scripts', 'ete3')()
  File "/home/kieftb/anaconda2/lib/python2.7/site-packages/ete3-3.0.0b36-py3.5.egg/ete3/tools/ete.py", line 92, in main
    _main(sys.argv)
  File "/home/kieftb/anaconda2/lib/python2.7/site-packages/ete3-3.0.0b36-py3.5.egg/ete3/tools/ete.py", line 238, in _main
    args.func(args)
  File "/home/kieftb/anaconda2/lib/python2.7/site-packages/ete3-3.0.0b36-py3.5.egg/ete3/tools/ete_evol.py", line 850, in run
    tree.link_to_alignment(args.alg, alg_format='paml')
  File "/home/kieftb/anaconda2/lib/python2.7/site-packages/ete3-3.0.0b36-py3.5.egg/ete3/evol/evoltree.py", line 295, in link_to_alignment
    leaf.sequence = translate(leaf.nt_sequence)
  File "/home/kieftb/anaconda2/lib/python2.7/site-packages/ete3-3.0.0b36-py3.5.egg/ete3/evol/utils.py", line 170, in translate
    for nt3 in newcod[2]:
IndexError: list index out of range

Possible issues:
(1) I have a species in there whose gene is much shorter than the other three because it is divergent. That puts a lot of gaps in my alignment. However, adding a trim step to the build process does not resolve any issues. Also, doing a manual alignment with muscle then trying the process yields the same error.
(2) My newick tree file has branch lengths included, which is not the case for the cookbook recipe I was following (http://etetoolkit.org/cookbook/ete_evol_lysozyme_branch.ipynb).

I've looked at the other 3 questions here on this topic but none of the issues seem to be resolved.

Thanks!
Brandon


cluster214.ffn
cluster214.ffn.final_tree.nw
cluster214.ffn.final_tree.fa
cluster214.ffn.final_tree.used_alg.fa

françois SERRA

Mar 24, 2017, 5:04:00 AM
to eteto...@googlegroups.com
Hi Brandon,
It seems the problem is that the alignment was not done on a codon basis, so the total number of columns in the alignment is not a multiple of three, e.g.:
>Planktomarina_temperata
ATG ACC AAT GGA CGC GTA AAC CCC CAA TTC ACC CTC GAA GAT CAA GGC ATC ACT GGG CTG GGG ACT GTC TAT TAT AAC CTG CTG GAA CCC ACG CTC ATC GAA CAG GCT TTG GCA CGC AAA GAA GGC GAG CTT GGT AAG GGC GGC GCC TTC TTG GTC TCC ACT GGC AAA TTC ACG GGC CGA TCT CCA AAA GAC AAA CAT GTG GTC AAA ACT GCC TCT GTG GCT GAT AGC ATT TGG TGG GAC AAC AAT GCC GAA ATG TCA GAG GCG GGT TTT GAG GCC CTC TTT GAG GAT ATG ATC GCT CAT ATG CAG GGG CGC GAT TAT TTT GTA CAA GAC CTC TTT GGC GGC GCC GAT CCC GCC A-- ACC GCC TCG ATG TGC GCA TGG TGA CAG AGC TGG CCT GGC ATG GGC TGT TTA TTC GCC ATA TGT TGC GCC GGC CTG ATG CGG AGG AGC TGG AAG AAT TTA CAG CCG ATT GGA CTG TAA TCA ATT GCC CCT CCT TCC AAG CAA ATC CTG AGC GCC ACA ATT GCC GCT CTG AAA CGG TTA TCG CGA TGA ATT TCG ACC GCC GGA TCA TCC TGA TTG GCG GCA CGG AAT ACG CGG GCG AAA ACA AAA AAT CTG TCT TCT CAC TGC TCA ATT ACT TAT TGC CTG AAA AAG GCA TAA TGC CGA TGC ATT GCT CGG CCA ATC ATG CCC CTG GAA ACC CCG TGG ATA CGG CAG TTT TCT TTG GAT TGT CGG GCA CGG GAA AAA CCA CAC TTT CAG CGG ATC CCT CCC GCG TAT TGA TTG GCG ATG ATG AGC ATG GCT GGT CTG ATC GCG GCA CCT TCA ATT TTG AAG GGG GAT GCT ATG CCA AAA CCA TCA ACC TCA GCG CAG AGG CCG AAC CAG AAA TCT TCG CCA CCA CCT CGA AAT TCG GCA CGG TCA TTG AGA ATA TGG TCT ATG ATC CCG AAA CCA AAG AGC TGG ATT TCG ATG ATG ACA GCC TAA CGG CCA ATA TGC GCT GCG CTT ACC CTT TGG AAT ACA TCT CAA ACG CTT CAC CAA CGG CGC TTG GCG GGC ATC CGA AGA ATA TCA TCA TGC TGA CC- --T GTG ACA GCT TTG GCG TTT TGC CAC CGA TTG CGC GGT TGA CGC CGG CCC AGG CCA TGT ATC ACT TTT TGT CGG GCT TCA CCG CC- --A AAG TGG CCG GGA CAG AGC GCG GGG TGA CAG AAC CGC AGC CGA CCT TCT CAA CCT GCT TCG GCG CCC CCT TCA TGC CGC GCC GTC CTG AGG TTT ATG GCA ACC TCC TGC GCG AGA AAA TCG CCA AAC ATG GCG CAA CCT GCT GGT TGG TGA ATA CCG GTT GGA CAG GTG GCG CAT ATG GAA AAG GCA GCC GTA TGC CGA TCC GCG CCA CCC GCT CGT TGC TCA GCG CCG CGC TCG ATG GCA CGC TGG CCA ATG GCA CAT TTC GCA CTG ATC CAA ATT TTG GCT TTG AAG TGC CAA CCT CAG TGC CAG GCG TGG CAG ATC TCT TGC TCG AGC CCA GAC GGA CAT GGG AAG ACA AAG 
ACG CCT ATG ACG CAC AGG CCG AAA AGC TTG TCG CGA TGT TCA GCG AAA ACT TCC AGC AGT ATC TTC CCT ATA TTG ACG AAG ACG TGC GTG CGA TTG CCT TGG GCG GTT AA
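
A quick way to check an alignment for this problem before running ete3 evol is sketched below in plain Python (no ETE calls). The helper names `read_fasta` and `codon_problems` are made up for illustration. It flags sequences whose length is not a multiple of three and codons that mix gaps with nucleotides, like the `A--` and `CC-` triplets visible above.

```python
def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict, dropping spaces."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.replace(" ", "")
    return records


def codon_problems(seq):
    """Return a list of human-readable codon-frame problems in one aligned sequence."""
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length %d is not a multiple of 3" % len(seq))
    # Only walk over complete triplets; a trailing partial codon is reported above.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        gaps = codon.count("-")
        if 0 < gaps < 3:
            problems.append("codon %d (%s) mixes gaps and nucleotides" % (i // 3 + 1, codon))
    return problems
```

To use it, read the aligned FASTA file into a string, pass it to `read_fasta`, and call `codon_problems` on each sequence. An empty list means the sequence passes both checks.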


The solution is to get a translation of your sequences into protein (attached) and to align it with:
ete3 build -a cluster214_aa.ffn -n cluster214.ffn -o mixed_types/ -w standard_fasttree --clearall --nt-switch-threshold 0.0
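
If you need to produce the protein file yourself, the translation step could look roughly like the sketch below. It is plain Python using the standard genetic code. The function name `translate` and the choice to drop a terminal stop codon are assumptions made for illustration, not what ete3 does internally. It expects unaligned, in-frame coding sequences.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt_seq, strip_stop=True):
    """Translate an in-frame nucleotide sequence into a protein string."""
    seq = nt_seq.upper().replace("-", "").replace(" ", "")
    if len(seq) % 3 != 0:
        # Fail loudly instead of silently dropping a partial codon.
        raise ValueError("sequence length %d is not a multiple of 3" % len(seq))
    # Codons with ambiguous bases (e.g. N) are translated as X.
    protein = "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3)
    )
    if strip_stop and protein.endswith("*"):
        protein = protein[:-1]
    return protein
```

For example, `translate("ATGACCAATGGACGC")` gives `MTNGR`, the first five residues of the Planktomarina_temperata sequence shown earlier.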

And run the ete evol command again:

ete3 evol -t cluster214.ffn.final_tree.nw --alg cluster214_aa.ffn.final_tree.used_alg.fa -o evol_results1/ --models M0 --cpu 8

cheers

francois

--
You received this message because you are subscribed to the Google Groups "The ETE toolkit" group.
To unsubscribe from this group and stop receiving emails from it, send an email to etetoolkit+unsubscribe@googlegroups.com.
To post to this group, send email to eteto...@googlegroups.com.
Visit this group at https://groups.google.com/group/etetoolkit.
For more options, visit https://groups.google.com/d/optout.



--

François Serra

*Postdoctoral Fellow*

Structural Genomics Team - CNAG - Centre Nacional d’Anàlisi Genòmica
Structural Genomics Group - CRG - Centre de Regulació Genòmica
Parc Científic de Barcelona – Torre I
Baldiri Reixac, 4
08028 Barcelona
Tel: +34 934 020 828


cluster214_aa.ffn

Brandon Kieft

Mar 24, 2017, 1:16:36 PM
to eteto...@googlegroups.com
Hello Francois,

Thanks so much, this seemed to be the solution! I actually did try the mixed types before, but I think I didn't set the switch threshold parameter so it was still failing even though it made it a little farther into the script.

I do have a couple more questions for you though if you have some time.

1. The branch-site models (bs...) don't seem to work for me. When I replace "M0" with, e.g., "bsA" in the evol command you sent, it throws the exception: ERROR: model bsA failed, problem with outfile: evol_results/bsA/out. Are these models still being developed?

2. I've produced a species tree with ete build for 5 species using 250 shared, single-copy orthologs (found using OrthoVenn). Is there a way with ete evol to calculate dN/dS ratios for each gene and test null vs. alternative models (e.g. M7 vs. M8) across sites in each gene? I suppose I could write a little program that iterates through each individual orthologous group and runs the ete evol script separately, but maybe there's an automated way? And, if so, can I use the concatenated newick tree or do I need to build a separate tree for each gene?

Thanks for taking the time to help me out!

Brandon

François Serra

Mar 24, 2017, 1:43:05 PM
to eteto...@googlegroups.com
Hi again Brandon,
I answer below.


On 24/03/17 18:14, Brandon Kieft wrote:
1. The branch-site models (bs...) don't seem to work for me. When I replace "M0" with, e.g., "bsA" in the evol command you sent, it throws the exception: ERROR: model bsA failed, problem with outfile: evol_results/bsA/out. Are these models still being developed?

Yes, these models are still supported, and so is their documentation :)

have a look there: http://etetoolkit.org/cookbook/ete_evol_lysozyme_branch-site.ipynb

If you want to use branch or branch-site models, you first have to define which branch(es) to mark.


2. I've produced a species tree with ete build for 5 species using 250 shared, single-copy orthologs (found using OrthoVenn). Is there a way with ete evol to calculate dN/dS ratios for each gene and test null vs. alternative models (e.g. M7 vs. M8) across sites in each gene? I suppose I could write a little program that iterates through each individual orthologous group and runs the ete evol script separately, but maybe there's an automated way? And, if so, can I use the concatenated newick tree or do I need to build a separate tree for each gene?

You will have to execute an instance of ete evol for each of your trees.
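
A small wrapper for that loop might look like the sketch below. It only uses the ete3 evol flags that already appear in this thread (-t, --alg, -o, --models, --cpu). The directory layout (one `<gene>.nw` tree next to a `<gene>.fa` codon alignment) and the function names are assumptions for illustration, so adapt them to how your files are actually named.

```python
import subprocess
from pathlib import Path


def evol_command(tree, alignment, outdir, models=("M7", "M8"), cpu=1):
    """Build the ete3 evol argument list for one gene."""
    return [
        "ete3", "evol",
        "-t", str(tree),
        "--alg", str(alignment),
        "-o", str(outdir),
        "--models", *models,
        "--cpu", str(cpu),
    ]


def run_all(gene_dir, out_root, models=("M7", "M8"), cpu=1):
    """Run ete3 evol once per tree found in gene_dir (hypothetical layout)."""
    for tree in sorted(Path(gene_dir).glob("*.nw")):
        alignment = tree.with_suffix(".fa")
        if not alignment.exists():
            print("skipping %s: no matching alignment" % tree.name)
            continue
        outdir = Path(out_root) / tree.stem
        # check=True stops the loop on the first failing gene.
        subprocess.run(evol_command(tree, alignment, outdir, models, cpu), check=True)
```

Calling `run_all("genes/", "evol_results/", cpu=8)` would then launch one ete3 evol run per gene, each writing to its own output directory.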
Check this cookbook to understand the hypothesis testing you want to run: http://etetoolkit.org/cookbook/ete_evol_hiv-env_site.ipynb
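
For the M7 vs. M8 comparison from the question, the test is a likelihood ratio test with 2 degrees of freedom, because M8 has two more free parameters than M7. The arithmetic is sketched below, assuming you already have the two log-likelihoods. The function name is made up and the numbers in the usage note are invented, not taken from any real run. For 2 degrees of freedom the chi-square tail probability reduces to exp(-x/2), so no extra library is needed.

```python
import math


def lrt_pvalue_df2(lnl_null, lnl_alt):
    """P-value of a likelihood ratio test with 2 degrees of freedom.

    lnl_null: log-likelihood of the null model (e.g. M7)
    lnl_alt:  log-likelihood of the alternative model (e.g. M8)
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0:
        # The alternative did not fit better, so there is no evidence against the null.
        return 1.0
    # Chi-square survival function for df = 2 is exp(-x / 2).
    return math.exp(-statistic / 2.0)
```

With invented values lnL(M7) = -1000.0 and lnL(M8) = -997.0, the statistic is 6 and the p-value is exp(-3), about 0.05.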

cheers





I realize that I could catch the errors better and give more informative messages; I will try to improve this in the future.

cheers

francois


Brandon Kieft

Mar 24, 2017, 1:59:12 PM
to eteto...@googlegroups.com
Thanks again! I'll try your suggestions and report back if I find any error messages that could be improved.

Brandon