What is the suggested way to build a phylogenetic tree from individual genomes using PhyloPhlAn?

Cedric Laczny

Aug 13, 2014, 9:42:18 AM
to phylophl...@googlegroups.com
Hello,

I have a collection of a few hundred genomes in individual FASTA files; some of them are finished genomes, some are in draft form.
I would like to build a phylogenetic tree for these genomes using PhyloPhlAn and the "-u" option.
As far as I understood, Phylophlan requires the genomes not as nucleotides but as peptides.
What is the suggested way to create this input when it is not readily available, i.e., when ".faa" files are not available, e.g., from NCBI?

So far, I have used prodigal (with the "-a" option) to output the translated CDS sequences into per-genome .faa files and put these into the input/ folder as suggested in the Phylophlan documentation.
PhyloPhlAn performed the usearch run and generated .b6o files for every genome. So far, so good, apparently.
However, the execution of the program crashed immediately after this with the following message:

Traceback (most recent call last):
  File "/PATH/TO/PHYLOPHLAN/phylophlan.py", line 777, in <module>
    gens2prots(inps, projn)
  File "/PATH/TO/PHYLOPHLAN/phylophlan.py", line 301, in gens2prots
    assert vv not in prots2ups
AssertionError

And I have no idea what might be the issue.

Looking forward to your input.

Thank you very much in advance.

Best,

Cedric

Nicola Segata

Aug 13, 2014, 6:03:20 PM
to phylophl...@googlegroups.com
Hi Cedric,
 thanks for getting in touch.

Your approach to generate proteomes using prodigal looks good to me. We are going to add support for using the genomes directly as input, but it is not implemented yet.
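
The Prodigal step discussed above could be scripted roughly as follows. This is a minimal sketch, not part of PhyloPhlAn: it assumes `prodigal` is on the PATH and that each genome is a separate `.fasta` file. The helper names `prodigal_command` and `translate_genomes`, the `.genes` suffix, and the directory layout are hypothetical. Only the Prodigal options `-i` (input), `-a` (protein translations) and `-o` (gene coordinates) are used.

```python
import subprocess
from pathlib import Path


def prodigal_command(genome, out_dir):
    """Build the Prodigal call that writes <genome stem>.faa into out_dir."""
    genome = Path(genome)
    out_dir = Path(out_dir)
    return [
        "prodigal",
        "-i", str(genome),                             # nucleotide FASTA input
        "-a", str(out_dir / (genome.stem + ".faa")),   # translated proteins
        "-o", str(out_dir / (genome.stem + ".genes")), # gene coordinates
    ]


def translate_genomes(genome_dir, out_dir, pattern="*.fasta"):
    """Run Prodigal once per genome and return the .faa paths it was asked to write."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for genome in sorted(Path(genome_dir).glob(pattern)):
        cmd = prodigal_command(genome, out_dir)
        subprocess.run(cmd, check=True)  # raises if Prodigal exits with an error
        written.append(Path(cmd[cmd.index("-a") + 1]))
    return written
```

The resulting `.faa` files would then go into the PhyloPhlAn input folder, after the ID and asterisk checks discussed later in the thread.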

My guesses for the cause of the error are:
1. Did you interrupt a previous PhyloPhlAn run and restart it? Some files may have been corrupted; you should remove all the project files from data/ and output/ and re-run from scratch.
2. Are the protein IDs in your input files unique? No duplicate protein names are allowed.
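
The second point can be checked before running PhyloPhlAn. The sketch below is not PhyloPhlAn code; the function names `fasta_ids` and `duplicated_ids` are hypothetical, and it assumes the protein ID is the first whitespace-delimited word of each FASTA header.

```python
from collections import defaultdict


def fasta_ids(path):
    """Yield the ID (first word after '>') of every record in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].split()
                yield header[0] if header else ""


def duplicated_ids(faa_paths):
    """Map each protein ID seen more than once to the files it occurs in."""
    seen = defaultdict(list)
    for path in faa_paths:
        for protein_id in fasta_ids(path):
            seen[protein_id].append(str(path))
    return {pid: files for pid, files in seen.items() if len(files) > 1}
```

An empty result means every ID is unique both within and across the given files.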

Let me know if this helps solve the problem.
thanks
Nicola

Cedric Laczny

Aug 14, 2014, 1:45:52 AM
to phylophl...@googlegroups.com
Hi Nicola,

thank you for your quick reply and your input. Good to hear that a translation functionality will be integrated. This will help to resolve uncertainties that might arise from doing that step independently. In particular, prodigal (and maybe other tools too) appends an asterisk to indicate the translation stop, and PhyloPhlAn complained about it being a "non-letter", although it seems to be a legitimate character in the FASTA amino acid code. Hence, I decided to remove it from every protein sequence before running PhyloPhlAn.
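
The asterisk removal described here could look roughly like this. It is a minimal sketch with a hypothetical function name, `strip_stop_asterisks`. It removes every `*` from sequence lines, not only a trailing one, and leaves header lines untouched.

```python
def strip_stop_asterisks(fasta_text):
    """Return FASTA text with '*' removed from sequence lines only."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)                   # header lines are kept as-is
        else:
            cleaned.append(line.replace("*", ""))  # drop stop-codon markers
    return "\n".join(cleaned) + ("\n" if cleaned else "")
```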

I have now made sure that the protein IDs are unique _per_ file *and* unique _across_ files. I also used a small subset of the few hundred genomes this time.
I am happy to see that it no longer crashes with the earlier error message, and I can find a Newick file in the output/ folder.
However, I now get the following error message:
Optimize all lengths: LogLk = -146359.151 Time 14.24
Total time: 15.54 seconds Unique: 23/23 Bad splits: 0/20

Traceback (most recent call last):
  File "/PATH/TO/PHYLOPHLAN/phylophlan.py", line 794, in <module>
    circlader(projn,pars['integrate'],tax)
  File "/PATH/TO/PHYLOPHLAN/phylophlan.py", line 620, in circlader
    tree, xtree_rr ] )
  File "/PATH/TO/PYTHONLIB/lib/python2.7/subprocess.py", line 493, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/PATH/TO/PYTHONLIB/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/PATH/TO/PYTHONLIB/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
And the PhyloXML file is not found in the output/ folder.
Due to this error, I am also wondering whether the produced Newick file is correct.

Looking forward to your input.

Best,

Cedric

Nicola Segata

Aug 14, 2014, 1:59:14 AM
to phylophl...@googlegroups.com
Well done!
The error is due to some missing dependency in the "circlader" submodule. This module is only doing a rerooting of the tree and a graphical plotting.

The real output is the newick file which is independent from the subsequent re-rooting and plotting. So you are safe using that tree in your research. If needed, you can re-root it with other tools. For plotting, one option you can consider is GraPhlAn (https://bitbucket.org/nsegata/graphlan/wiki/Home) which we are going to integrate more tightly with PhyloPhlAn in the near future.

cheers
Nicola

Cedric Laczny

Aug 14, 2014, 2:11:16 AM
to phylophl...@googlegroups.com
Alright, thank you. Could you please let me know what dependency I would have to resolve? It is not apparent to me from the output, since it is complaining about a missing file/directory and not more.
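
As general background on this kind of message: when `OSError: [Errno 2]` is raised from inside `subprocess`, it usually means that the program being launched could not be found, not that an input file is missing. The sketch below (Python 3, hypothetical helper name `call_or_explain`, not PhyloPhlAn code) wraps such a call so that the error names the command that could not be started.

```python
import errno
import subprocess


def call_or_explain(argv):
    """Run argv; if the program itself cannot be found, say which one."""
    try:
        return subprocess.call(argv)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            raise RuntimeError("cannot find program to run: %r" % argv[0]) from exc
        raise
```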

Regarding GraPhlAn, that was also the next step I considered. Looking at the documentation, I did not find information about visualizing a Newick file but only information about plotting based on an XML file. Which point did I miss there?
Since these are pretty much my first steps with this kind of analysis, I am not totally clear about what the purpose of the rerooting is or which other tools to use for this step. In particular if this is a required step to use GraPhlAn, I should use a proper tool and not just any tool, I suppose.

Thank you very much for your help! Great support!

Best,

Cedric


--
You received this message because you are subscribed to a topic in the Google Groups "PhyloPhlAn-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/phylophlan-users/btG1rCS7yPY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to phylophlan-use...@googlegroups.com.
To post to this group, send email to phylophl...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/phylophlan-users/f438ecde-a593-4eff-9cad-d715b19fa65a%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Nicola Segata

Aug 14, 2014, 2:20:55 AM
to phylophl...@googlegroups.com
It seems that the error is connected with calling the script "./taxcuration/tree_reroot.py", which needs "./pyphlan" in the PYTHONPATH.
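
One way to make a directory such as "./pyphlan" visible to a child Python process is to prepend it to PYTHONPATH in the environment passed to that process. This is a minimal sketch with a hypothetical helper name, `env_with_pythonpath`; it is not how PhyloPhlAn itself sets things up.

```python
import os


def env_with_pythonpath(extra_dir, base_env=None):
    """Return a copy of the environment with extra_dir prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [os.path.abspath(extra_dir)]
    if env.get("PYTHONPATH"):
        parts.append(env["PYTHONPATH"])  # keep whatever was already set
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.call`, or the same path can be exported in the shell before starting the run.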

However, I would continue with the nwk file. Rerooting usually needs a bit of manual curation, and PhyloPhlAn may be sub-optimal here. You could use Archaeopteryx for rerooting. Usually, what you want to do is to manually place your outgroup genome as the outgroup, or root the tree between bacteria and archaea (if you have both domains). Rerooting depends on the task.

GraPhlAn does accept newick trees. Check out the first figure here https://bitbucket.org/nsegata/graphlan/wiki/Home

cheers
Nicola

Cedric Laczny

Aug 14, 2014, 4:56:03 AM
to phylophl...@googlegroups.com
Great, thanks.
It apparently was related to how I called phylophlan. I did not exactly follow the directory structure included/suggested with the package since I tried to set it up for multiple users on our cluster. Turns out that the initial way I did it was not optimal.
I have changed it now such that it is closer to the directory structure of the package while allowing easy usage by other users and the error did not appear on the subset. Both files (nwk and xml) were created this time.

Regarding GraPhlAn, indeed I missed the nwk-input point in the workflow by only looking at the examples provided. Thank you very much for the clarification.

Again, fast and great support! Thank you very much!

Looking forward to the pictures.

Best,

Cedric

