Bacterial ITS analysis workflow


sdpapet

Jan 27, 2014, 10:17:22 AM
to qiime...@googlegroups.com
Hello, I have a data set from bacterial ITS 454 amplicon sequencing. I have been using Qiime to work on my 16S rRNA data, but I have never done any ITS analysis.

I have several questions:
1> Does Qiime support ITS analysis?

2> What database should I choose? Is the default one, the Greengenes database, good for ITS analysis? Which database do you recommend for ITS analysis?

3> As you know, the ITS region is highly variable, so it doesn't align well. I can't build trees, so there won't be any UniFrac analysis, and I can't use the "pick_de_novo_otus.py" workflow. What should I do, step by step?

4> Lastly, I would like to know the species-level cutoff. For 16S rRNA we normally use 97%. What is the cutoff for ITS at the species level?

Thanks

Tony Walters

Jan 27, 2014, 10:51:56 AM
to qiime...@googlegroups.com
Hello,

1. Yes.

2. You can use the UNITE database as your reference sequences. It can be downloaded from this page: http://qiime.org/home_static/dataFiles.html
You will need to make sure that you are pointing to the UNITE files (rep_set and taxonomy) rather than the default Greengenes files, so that they are used during the OTU picking and taxonomic assignment steps.

3. If you use pick_open_reference_otus.py, you can suppress the alignment and tree building with --suppress_align_and_tree.
Also, you can point to the UNITE rep set with --reference_fp. To make it use the UNITE taxonomy mapping file for the assign_taxonomy.py step, you would need to specify a parameters.txt file with the --parameter_fp option, containing lines like these:
assign_taxonomy:id_to_taxonomy_fp X
assign_taxonomy:reference_seqs_fp Y
where X is the filepath to the taxonomy/97_otu_taxonomy.txt file, and Y is the filepath to the rep_set/97_otus.fasta file from the UNITE files.
You could do the process step by step if you wanted to use that approach as well (e.g. pick_otus.py, pick_rep_set.py, assign_taxonomy.py, and make_otu_table.py).

4. Generally we use 97% for ITS as well, but there isn't necessarily an exact relationship between percent identity and species-level separation (and that's setting aside the question of what exactly a microbial "species" is).
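For point 3, the two routes (the open-reference workflow with a parameters file, or the individual scripts) can be sketched in Python as below. This is a minimal sketch, assuming QIIME 1.x scripts on the PATH; it only builds the parameters-file text and the command lines and does not run anything. All file and directory names (seqs.fna, ref/rep_set.fasta, ref/taxonomy.txt, params.txt, otus/, taxonomy/) are hypothetical placeholders, and the reference files must match your marker (UNITE suits fungal ITS only). The intermediate names follow QIIME 1's usual `<name>_otus.txt` and `<name>_tax_assignments.txt` conventions, which are worth checking against your installed version. The parameters file uses the `id_to_taxonomy_fp` key, matching assign_taxonomy.py's `--id_to_taxonomy_fp` option.

```python
import shlex
from pathlib import Path


def params_file_text(taxonomy_fp, ref_seqs_fp):
    """Contents of a QIIME 1 parameters file pointing assign_taxonomy at custom references.

    Each line is "script:option value", where option is the script's long
    option name without the leading dashes.
    """
    return (
        f"assign_taxonomy:id_to_taxonomy_fp {taxonomy_fp}\n"
        f"assign_taxonomy:reference_seqs_fp {ref_seqs_fp}\n"
    )


def open_reference_command(seqs_fp, out_dir, ref_seqs_fp, params_fp):
    """Open-reference OTU picking with alignment and tree building switched off."""
    return [
        "pick_open_reference_otus.py",
        "-i", seqs_fp,
        "-o", out_dir,
        "--reference_fp", ref_seqs_fp,
        "--parameter_fp", params_fp,
        "--suppress_align_and_tree",
    ]


def step_by_step_commands(seqs_fp, ref_seqs_fp, taxonomy_fp, similarity=0.97):
    """The same analysis as individual scripts; no alignment or tree step."""
    # Assumed QIIME 1 naming: pick_otus.py writes <stem>_otus.txt.
    otu_map = f"otus/{Path(seqs_fp).stem}_otus.txt"
    rep_set = "otus/rep_set.fna"
    # Assumed QIIME 1 naming: assign_taxonomy.py writes <stem>_tax_assignments.txt.
    tax = "taxonomy/rep_set_tax_assignments.txt"
    return [
        ["pick_otus.py", "-i", seqs_fp, "-o", "otus/", "-s", str(similarity)],
        ["pick_rep_set.py", "-i", otu_map, "-f", seqs_fp, "-o", rep_set],
        ["assign_taxonomy.py", "-i", rep_set, "-r", ref_seqs_fp,
         "-t", taxonomy_fp, "-o", "taxonomy/"],
        ["make_otu_table.py", "-i", otu_map, "-t", tax, "-o", "otu_table.biom"],
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with your own files.
    print(params_file_text("ref/taxonomy.txt", "ref/rep_set.fasta"), end="")
    print(shlex.join(open_reference_command(
        "seqs.fna", "open_ref_otus/", "ref/rep_set.fasta", "params.txt")))
    for cmd in step_by_step_commands("seqs.fna", "ref/rep_set.fasta", "ref/taxonomy.txt"):
        print(shlex.join(cmd))
```

Save the printed parameters text as params.txt before running the open-reference command; each command list can be run with `subprocess.run(cmd, check=True)`. The `-s 0.97` value is the 97% clustering cutoff from point 4.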
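To make the 97% figure in point 4 concrete, here is a simplified percent-identity check in Python. It assumes the two sequences are already aligned to the same length, skips columns where both have a gap, and counts a gap opposite a base as a mismatch. This is an illustration only, not how uclust (QIIME 1's default OTU picker) computes identity from its own pairwise alignments.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # gap in both sequences: not informative, skip the column
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def same_otu(a: str, b: str, threshold: float = 0.97) -> bool:
    """True if the two sequences meet the clustering threshold (97% by default)."""
    return percent_identity(a, b) >= threshold
```

With 100 compared positions, 3 differences give exactly 0.97 and still fall in the same cluster at this threshold, while 4 differences (0.96) do not.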

-Tony


--
 
---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Pet Chiang

Jan 27, 2014, 10:56:20 AM
to qiime...@googlegroups.com
Hi Tony,

Thank you.

Regarding the second question, on the database: are you sure I can use the UNITE database? I am working on bacterial ITS (16S ITS). I checked the UNITE website, and it seems to be a fungal database (18S ITS)?

Ben

Tony Walters

Jan 27, 2014, 11:11:10 AM
to qiime...@googlegroups.com
Oops, my mistake. No, you can't use the UNITE database in this case, nor the Greengenes database (the sequences won't extend into the ITS region).

It's slightly trickier for you to do this analysis, as I don't think there are any bacterial/archaeal ITS databases (but correct me if I'm wrong; it would be good to know where they are). You can't use assign_taxonomy.py in this case unless you put together a set of reference ITS sequences and a taxonomy mapping file (in the same format as the Greengenes files). You could still cluster the sequences, look for clusters that differ significantly between your samples (e.g. with group_significance.py), and then BLAST the representative sequence for each such cluster on the NCBI site to see if its taxonomy can be identified.
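Both options in this reply can be sketched in Python: writing a custom reference FASTA plus a Greengenes-style id-to-taxonomy file (one `ID<TAB>k__...; p__...; ...` line per sequence), and pulling the representative sequences of selected OTUs into FASTA text for an NCBI BLAST query. This is a minimal sketch; the records are hypothetical, the seven `k__` to `s__` rank prefixes are an assumption modeled on the Greengenes files, and the extraction step assumes the OTU ID is the first word of each header in the rep set file, as in pick_rep_set.py output.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed Greengenes-style rank prefixes (kingdom through species).
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def taxonomy_line(seq_id: str, lineage: List[str]) -> str:
    """One line of an id-to-taxonomy file: ID<TAB>k__...; p__...; ..."""
    if len(lineage) > len(RANK_PREFIXES):
        raise ValueError("lineage has more ranks than prefixes")
    ranks = [prefix + name for prefix, name in zip(RANK_PREFIXES, lineage)]
    return f"{seq_id}\t{'; '.join(ranks)}"


def build_reference(records: Iterable[Tuple[str, str, List[str]]]) -> Tuple[str, str]:
    """Return (fasta_text, taxonomy_text) for a custom reference set."""
    fasta, tax = [], []
    for seq_id, seq, lineage in records:
        fasta.append(f">{seq_id}\n{seq}")
        tax.append(taxonomy_line(seq_id, lineage))
    return "\n".join(fasta) + "\n", "\n".join(tax) + "\n"


def parse_fasta(text: str) -> Dict[str, str]:
    """Map the first word of each FASTA header to its (possibly multi-line) sequence."""
    seqs: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def seqs_for_blast(rep_set_text: str, otu_ids: Iterable[str]) -> str:
    """FASTA text for the selected OTUs' representative sequences (unknown IDs skipped)."""
    seqs = parse_fasta(rep_set_text)
    return "".join(f">{otu_id}\n{seqs[otu_id]}\n" for otu_id in otu_ids if otu_id in seqs)
```

The two texts from `build_reference` would then be saved as the reference sequences and taxonomy mapping files that assign_taxonomy.py is pointed at.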

Sorry there isn't an easy answer on this one. Putting together a bacterial/archaeal ITS database (actually an SSU-ITS-LSU database, as the primers often sit some distance into the SSU/LSU regions) is something that needs to be done, but as far as I know, nobody has done it yet.

Pet Chiang

Jan 27, 2014, 11:13:07 AM
to qiime...@googlegroups.com
Thank you.

Ben