Align fungal ITS1 sequences for a phylogenetic tree?

Émilie Tremblay

Jul 21, 2016, 8:54:44 AM
to Qiime 1 Forum
Hi,
I need to generate phylogenetic trees to analyse my NGS datasets, but I read that aligning ITS sequences is not possible because the region is too variable. Yet some people do recommend using an aligner such as PyNAST for this.
Can someone provide me with more information, please?
Thanks a lot

Andrew Krohn

Jul 22, 2016, 7:29:35 PM
to Qiime 1 Forum
I always align my ITS data to get "phylogenetic signal", with the understanding that such output is completely invalid. I would never submit such data for publication, and if you inspect a tree built from an ITS alignment, you wouldn't either.

Ghost-tree is a possible way to produce phylogenies from ITS data (https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-016-0153-6). You can clone the repository here: https://github.com/JTFouquier/ghost-tree

This tool only needs to be run once on your reference data set to produce a tree that can be used with any data you run against it. There are some prebuilt .nwk files in the repository, but they are from old versions of UNITE, from before the mass cleansing of chimeric sequences from that database. I recommend running it yourself against your own reference file.

One major limitation of ghost-tree is that it only works with data from a closed-reference analysis. This is super annoying and problematic, since ITS data can be so variable that closed-reference picking produces very little output (I lose ~90% of my data this way). I noticed that the limitation comes primarily from the way sequences from SILVA are identified against UNITE: once you have built a ghost-tree, its tips carry the sequence IDs from your reference database (the first column of the database taxonomy file), and those IDs have to match the OTU IDs in your OTU table. This is exactly the output you get with closed-reference analysis, hence it works.

With de novo picking, or the de novo component of open-reference picking, the OTU ID is something less informative, such as "denovo1". So you need to take the taxonomy string assigned to each de novo OTU, find the matching entry in your database taxonomy file, and then either replace "denovo1" with that database sequence ID, OR replace the sequence ID in the ghost-tree.nwk file with "denovo1" (and so on for every OTU). Once this is done, your ghost-tree will behave just like a real tree and generate a phylogeny that is not all mixed up. I have noticed that the branch lengths in my ghost-trees are often absurdly short, though the trees are reasonably well organized.
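
The taxonomy-matching step described above can be sketched in Python. This is a minimal illustration, not the akutils script: it assumes both the database taxonomy file and your OTU taxonomy assignments are tab-separated, with the ID in the first column and the taxonomy string in the second, and it counts only identical taxonomy strings as a match. The IDs and taxonomies in the example (ref1, denovo1, ...) are hypothetical.

```python
import csv
import io


def load_taxonomy(handle):
    """Read a tab-separated 'ID<TAB>taxonomy string' file into a dict.

    Assumption: ID in column 1, taxonomy string in column 2, no quoting.
    """
    table = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2 and row[0].strip():
            table[row[0].strip()] = row[1].strip()
    return table


def map_otus_to_reference(otu_taxonomy, ref_taxonomy):
    """Map each OTU ID to a reference sequence ID with an identical taxonomy.

    If several reference sequences share a taxonomy string, the first one in
    file order is used (an arbitrary choice). OTUs with no exact match are
    left out, which is one place where data gets lost.
    """
    first_id_for_tax = {}
    for seq_id, tax in ref_taxonomy.items():
        first_id_for_tax.setdefault(tax, seq_id)
    return {
        otu_id: first_id_for_tax[tax]
        for otu_id, tax in otu_taxonomy.items()
        if tax in first_id_for_tax
    }


# Hypothetical example data, made up for illustration only.
ref_file = io.StringIO(
    "ref1\tk__Fungi;p__Ascomycota;c__Sordariomycetes\n"
    "ref2\tk__Fungi;p__Basidiomycota;c__Agaricomycetes\n"
)
otu_file = io.StringIO(
    "denovo1\tk__Fungi;p__Basidiomycota;c__Agaricomycetes\n"
    "denovo2\tk__Fungi;p__Ascomycota\n"
)
mapping = map_otus_to_reference(load_taxonomy(otu_file), load_taxonomy(ref_file))
print(mapping)  # {'denovo1': 'ref2'}
```

Note that two de novo OTUs with the same taxonomy string map to the same reference ID: if you rename OTU table rows this way, those rows would have to be merged, and a single tree tip can carry only one OTU name.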

To get around this limitation, I wrote a script that is essentially a search-and-replace job and makes your de novo or open-reference analysis compatible with ghost-tree: https://github.com/alk224/akutils-v1.2/wiki/preprocess_otus_for_ghost-tree.sh
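
For a search-and-replace on a Newick file, the replacement has to match whole tip labels; a plain text substitution of "ref1" would also rewrite "ref10". Below is a minimal Python sketch of renaming tip labels, assuming a single-line Newick string with unquoted labels that contain no spaces. It is not the akutils implementation, and the labels in the example are hypothetical.

```python
import re

# A tip label follows '(' or ',' and ends at ':', ',', ')' or ';'.
# Internal node labels (which follow ')') and branch lengths (which follow
# ':') are therefore never touched. Assumes unquoted labels with no spaces.
TIP_LABEL = re.compile(r"(?<=[(,])([^(),:;\s]+)(?=[:,);])")


def rename_newick_tips(newick, mapping):
    """Return `newick` with every tip label found in `mapping` replaced.

    `mapping` is old label -> new label; labels not in it are left alone.
    """
    return TIP_LABEL.sub(lambda m: mapping.get(m.group(1), m.group(1)), newick)


tree = "((ref1:0.1,ref10:0.2):0.05,ref2:0.3);"
print(rename_newick_tips(tree, {"ref1": "denovo1", "ref2": "denovo7"}))
# ((denovo1:0.1,ref10:0.2):0.05,denovo7:0.3);
```

Matching on the Newick delimiters rather than on the bare string is what keeps "ref10" intact while "ref1" is renamed.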

To make this work, you will need to install akutils (http://alk224.github.io/akutils-v1.2/).

You will still lose a lot of data with ghost-tree (~50%), since many sequences will not have a deep enough taxonomic assignment to find an acceptable match in SILVA. Let me know if you have any questions.
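
To get a rough idea of how much of your data might survive before committing to ghost-tree, you can check how deeply each OTU is classified. This is a minimal Python sketch, assuming QIIME/UNITE-style taxonomy strings with k__, p__, ... s__ prefixes separated by semicolons. The thread does not say which rank is deep enough for a SILVA match, so `min_depth` is a placeholder you would have to choose. The result also counts OTUs, not reads.

```python
# Assumed rank prefixes, in order from kingdom to species.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def assigned_depth(taxonomy):
    """Count the leading ranks that carry an actual name.

    Stops at the first rank that is missing or empty (e.g. 'g__'), so
    'Unassigned' has depth 0.
    """
    depth = 0
    levels = [level.strip() for level in taxonomy.split(";")]
    for prefix, level in zip(RANK_PREFIXES, levels):
        if level.startswith(prefix) and len(level) > len(prefix):
            depth += 1
        else:
            break
    return depth


def fraction_resolved(otu_taxonomy, min_depth):
    """Fraction of OTUs classified down to at least `min_depth` ranks."""
    if not otu_taxonomy:
        return 0.0
    kept = sum(1 for tax in otu_taxonomy.values() if assigned_depth(tax) >= min_depth)
    return kept / len(otu_taxonomy)


# Hypothetical OTU assignments, made up for illustration only.
otus = {
    "denovo1": "k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;"
               "f__Nectriaceae;g__Fusarium;s__Fusarium_oxysporum",
    "denovo2": "k__Fungi;p__Basidiomycota;c__;o__",
    "denovo3": "Unassigned",
}
print(fraction_resolved(otus, min_depth=6))  # 0.3333333333333333
```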

I have attached two PDFs with tree graphics (made with phyloseq) for two trees built from the same data set: one from a MAFFT alignment followed by a FastTree phylogeny, the other from ghost-tree. Apologies for the colors; R kind of sucks at choosing a decent palette automatically. The MAFFT tree is clearly a huge mess. The ghost-tree clearly has a bunch of sequences removed and the aforementioned short branch lengths, but the ascomycetes clearly separate from the basidios (yay!). However, you can also see the few glomero and zygo sequences nested between these groups when they should actually be basal. Still, ghost-tree makes a dramatic improvement. For my analysis it removed so much data that I didn't have enough remaining depth to use it, but what a nifty tool.
Phylum_tree - mafft aln.pdf
Phylum_tree - ghosttree.pdf