fungal ITS analysis.


Daniel Laubitz

Oct 20, 2017, 3:43:26 PM
to Qiime 1 Forum
Hello Dear QIIME Friends,

I got fastq files for ITS analysis. Normally I do bacterial V4 16S rRNA analysis, and this is my very first time with fungal ITS. I would like to draw on your wisdom and experience.

So my questions are:
- What database should I use? (I use Silva or GG for 16S.)
- Should I use the standard QIIME settings?
- Do you have any useful params.txt files for ITS that you could share with me (please!)?
- Is there anything specific to ITS, or different from 16S?

Yes, I know these are very general questions, but as I mentioned, it's going to be my first time (!) and for now I do not have any specific questions.

I just finished join_paired_ends.py and split_libraries_fastq.py. Now it's time for OTU picking.

I would really appreciate your help!

Thanks
Daniel

Dixi Modi

Oct 21, 2017, 10:26:10 PM
to Qiime 1 Forum
Hi Daniel,

The UNITE database is the reference database for ITS.

I downloaded it from the following website.


I tried to run pick_open_reference_otus.py using its_12_11 as the reference, but it did not work.


I also need help.


TonyWalters

Oct 22, 2017, 5:10:57 AM
to Qiime 1 Forum
Hello,

I would get the latest UNITE QIIME database for ITS: https://unite.ut.ee/repository.php
There were some non-ASCII characters in certain releases, but I'm not sure whether the most recent ones have that issue. If they do, it will show up as an error during taxonomy assignment; we have a custom script for cleanup in that case.
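
As an illustration of the kind of cleanup mentioned above, here is a minimal Python sketch that reports and strips non-ASCII characters from taxonomy lines. It is not the custom script referred to in the post; the function names and the sample taxonomy lines are made up for the example.

```python
def find_non_ascii_lines(lines):
    """Return (line_number, line) pairs for lines containing non-ASCII characters."""
    return [(number, line)
            for number, line in enumerate(lines, start=1)
            if not line.isascii()]


def strip_non_ascii(line):
    """Return the line with every non-ASCII character dropped."""
    return line.encode("ascii", errors="ignore").decode("ascii")


# Hypothetical taxonomy lines, for illustration only.
taxonomy = [
    "SH001\tk__Fungi;p__Ascomycota",
    "SH002\tk__Fungi;s__Sebacina_sp\u00e9",
]

for number, line in find_non_ascii_lines(taxonomy):
    print(f"line {number} contains non-ASCII characters")

cleaned = [strip_non_ascii(line) for line in taxonomy]
```

Dropping characters is the bluntest option; if the affected names matter to you, it may be better to inspect the reported lines and fix them by hand.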

There are two ways to use the UNITE data: either use their trimming software (ITSx) to trim the sequences down to just the ITS region (this removes PCR overhangs that fall in the SSU/LSU), or skip the trimming and use the reference files in the /developer/ directory, which are not trimmed back to the ITS region.

The "dynamic" reference files have centroids at different clustering percent identities, and the developers of UNITE have gotten better differentiation of fungal taxa with it.

Standard QIIME settings are fine, with the exception that one cannot build trees from the ITS region, so you have to stick to non-phylogenetic metrics for beta and alpha diversity. E.g., use Bray-Curtis for beta diversity (or anything except UniFrac), and use any alpha diversity metric except PD.
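
To make "non-phylogenetic" concrete: Bray-Curtis needs only the OTU counts of two samples and no tree. Below is a minimal pure-Python sketch of the standard formula (sum of absolute count differences divided by the total count of both samples). The function name and the example counts are illustrative; this is not QIIME's own implementation.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples' OTU count vectors.

    Both vectors must list the same OTUs in the same order.
    Returns 0.0 for identical samples and 1.0 for samples sharing no OTUs.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must have the same length")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("both samples are empty")
    difference = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return difference / total


# Made-up counts for three OTUs in two samples.
sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]
print(bray_curtis(sample_1, sample_2))
```

Because nothing here depends on how the OTUs are related to each other, the metric is unaffected by the difficulty of aligning ITS sequences.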

I hope this helps,
Tony

andrefcjp

Oct 22, 2017, 9:50:09 PM
to Qiime 1 Forum
Hi Daniel,

There is a specific protocol for ITS analysis at the link below. Please note that there are some differences compared to 16S.
http://nbviewer.jupyter.org/github/biocore/qiime/blob/1.9.1/examples/ipynb/Fungal-ITS-analysis.ipynb

In your specific case, after split libraries, I think you need the command below, i.e. without a phylogenetic tree.
!pick_open_reference_otus.py -i its-soils-tutorial/seqs.fna -r its_12_11_otus/rep_set/97_otus.fasta -o otus/ -p its-soils-tutorial/params.txt --suppress_align_and_tree

At the end, your command might look like this:
!core_diversity_analyses.py -i otus/otu_table_mc2_w_tax.biom -o cdout/ -m its-soils-tutorial/map.txt -e 353 --nonphylogenetic_diversity

I hope this will help you.

Andre

Dixi Modi

Oct 22, 2017, 11:32:18 PM
to Qiime 1 Forum
Hi Daniel,

Thanks for the link. I found the link below quite useful, as it has the command for the most recent version of UNITE, as suggested by Tony above.


Hope this helps you.

Dixi Modi

Oct 22, 2017, 11:35:20 PM
to Qiime 1 Forum
Hi Tony,

Thanks a lot for clarifying this. 

But I still have trouble running the command. I think my data set is huge and my CPU is not able to handle the load. I am using VirtualBox for my analysis, and my seqs.fna after split libraries is ~13 GB.

Daniel Laubitz

Oct 23, 2017, 12:52:45 PM
to Qiime 1 Forum
Thank you all for all the information. I really appreciate your help.
I will try to download UNITE and run my data set.
I will let you know.
Daniel

Daniel Laubitz

Oct 24, 2017, 4:47:44 PM
to Qiime 1 Forum
Hi, 
I ran pick_open_reference_otus.py (see below) and it completed; however, the log file shows GG13_8 values being used. I did pass the paramsITS.txt file (attached) as well as -r $PATH/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta.
Is that OK, or should I change something else?
Daniel 

time pick_open_reference_otus.py -i /seqs_daniel.fna -o ITS_otus -r /macqiime/sh_qiime_release_10102017/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta -p /paramsITS.txt --suppress_align_and_tree

paramsITS.txt
log_20171024132411.txt

Dixi Modi

Oct 24, 2017, 7:10:00 PM
to Qiime 1 Forum
Hi Daniel,

This looks OK to me. I have a few questions:

1) Did you discard the unjoined fasta files after joining the paired-end sequences, and then run the split libraries command?
2) How big is your seqs.fna file?

If I am not misinterpreting your log file, you got a biom table. How long did the pick OTUs command run for?


I am having a hard time with my data size!

Will appreciate your help.

Thanks.

TonyWalters

Oct 25, 2017, 12:31:18 AM
to Qiime 1 Forum
Hello Daniel,

In the log file, it prints the default values from .qiime_config first, but these are overwritten by the parameters file values that come next. In the commands for reference-based OTU picking and assign taxonomy, it does use the UNITE reference files:

# Pick Reference OTUs command 
pick_otus.py -i /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/demultiplexed/seqs_daniel.fna -o /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/ITS_otus/step1_otus -r /macqiime/sh_qiime_release_10102017/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta -m uclust_ref --enable_rev_strand_match --suppress_new_clusters

assign_taxonomy.py -o /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/ITS_otus/blast_assigned_taxonomy -i /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/ITS_otus/rep_set.fna --reference_seqs_fp /macqiime/sh_qiime_release_10102017/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta --id_to_taxonomy_fp /macqiime/sh_qiime_release_10102017/sh_taxonomy_qiime_ver7_dynamic_10.10.2017.txt --assignment_method blast
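
The attached paramsITS.txt is not reproduced in this thread, so the following is only a guess at an equivalent file, inferred from the options visible in the logged commands above. It uses QIIME 1's `script_name:parameter_name value` parameter-file format; the helper name `build_its_params` is hypothetical, and the paths are the ones shown in the log.

```python
def build_its_params(reference_fasta, taxonomy_txt):
    """Return the text of a QIIME 1 parameters file for ITS OTU picking.

    The four settings are inferred from the logged pick_otus.py and
    assign_taxonomy.py commands, not copied from the original attachment.
    """
    lines = [
        "pick_otus:enable_rev_strand_match True",
        "assign_taxonomy:assignment_method blast",
        f"assign_taxonomy:reference_seqs_fp {reference_fasta}",
        f"assign_taxonomy:id_to_taxonomy_fp {taxonomy_txt}",
    ]
    return "\n".join(lines) + "\n"


release_dir = "/macqiime/sh_qiime_release_10102017"
params_text = build_its_params(
    f"{release_dir}/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta",
    f"{release_dir}/sh_taxonomy_qiime_ver7_dynamic_10.10.2017.txt",
)
print(params_text, end="")
```

Writing `params_text` to a file and passing it with `-p` should then give taxonomy assignment against UNITE rather than the Greengenes defaults, provided the paths match your own installation.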

Dixi Modi

Oct 25, 2017, 8:41:38 PM
to Qiime 1 Forum
Hi Tony,

When running split libraries, do we also have to include the unjoined files left over after joining paired ends?

Also, should chimeras and ITS sequences be removed before or after picking OTUs?

I am so confused.

Will appreciate your help.

TonyWalters

Oct 26, 2017, 12:53:18 AM
to Qiime 1 Forum
Hello,

I would recommend against using both unjoined and joined data. If you can get most of your reads to join, use the joined reads only. If the stitching process does not yield many reads, then I would skip stitching altogether and just use the R1 reads (unless you happen to have better quality on R2).
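
One way to judge whether "most of your reads" joined is to compare the number of records in the joined FASTQ with the number in the original R1 file. The sketch below assumes plain four-line FASTQ records; the function names are hypothetical, and the post gives no cutoff, so the decision about what fraction is "enough" is left to you.

```python
def count_fastq_records(lines):
    """Count records in a FASTQ stream, assuming four lines per record.

    Empty lines (e.g. a trailing newline) are ignored.
    """
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; not plain FASTQ?")
    return n_lines // 4


def joined_fraction(joined_lines, r1_lines):
    """Fraction of the original R1 reads that ended up in the joined file."""
    total = count_fastq_records(r1_lines)
    if total == 0:
        raise ValueError("R1 input contains no reads")
    return count_fastq_records(joined_lines) / total


# Tiny in-memory example: 4 input reads, 3 of which joined.
record = ["@read", "ACGT", "+", "IIII"]
r1 = record * 4
joined = record * 3
print(f"{joined_fraction(joined, r1):.0%} of reads joined")
```

With real data you would pass open file handles instead of lists, e.g. `joined_fraction(open(joined_path), open(r1_path))`, where both paths are your own files.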

For chimera checking, the order of the process depends upon which software you use, e.g. with usearch (or vsearch as a plugin) you do it before OTU picking: http://qiime.org/tutorials/chimera_checking.html

Daniel Laubitz

Oct 26, 2017, 5:55:41 PM
to Qiime 1 Forum
Thank you Tony!

Roger Huerlimann

Nov 9, 2017, 8:50:11 PM
to Qiime 1 Forum
Hi Tony,

Just out of curiosity, why can't ITS be used for phylogenetic metrics?

Regards,
Roger

Colin Brislawn

Nov 10, 2017, 12:33:14 AM
to Qiime 1 Forum
Hello Roger,

"why ITS can't be used for phylogenetic metrics?"

It totally can! But do you trust the results?

Let's look at the process that OTU centroids go through to become part of a phylogenetic tree:
OTU centroids -> MSA (multiple sequence alignment) with PyNAST, MUSCLE, MAFFT, or Clustal Omega -> ML (maximum likelihood) tree building with FastTree2 -> Newick-format .tre file

How well does each of these steps work with different kinds of amplicons?

That's a hard question, because the final result depends on all of the other steps, and also because establishing the 'right' answer is hard. In the case of ITS, which has high biological variability in length and complexity, doing a 'correct' MSA is hard (if not impossible). While you can run any modern MSA program and get a result from your OTU centroids, it's not fully clear whether you can feed that into the rest of the pipeline and get trustworthy results.
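
A quick way to see the length variability mentioned above in your own data is to summarize the sequence lengths of your OTU centroids. This is a minimal sketch with a hand-rolled FASTA reader; the function names and the toy sequences are illustrative, and a wide spread only hints at alignment difficulty rather than proving anything about tree quality.

```python
from statistics import median


def fasta_lengths(lines):
    """Return the sequence length of each record in a FASTA stream."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_summary(lengths):
    """Return the minimum, median, and maximum of a non-empty list of lengths."""
    return {"min": min(lengths), "median": median(lengths), "max": max(lengths)}


# Toy centroids, for illustration only.
fasta = [">otu1", "ACGT", "AC", ">otu2", "ACGTACGTAC", ">otu3", "ACG"]
print(length_summary(fasta_lengths(fasta)))
```

For real data, pass an open handle to your representative-set FASTA in place of the `fasta` list.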

Your mileage may vary.

Let us know what you find!
Colin


PS. I have totally done de novo OTU picking + PyNAST + FastTree2 using ITS and 18S sequences. Let me know how it works out for you.

Roger Huerlimann

Nov 15, 2017, 4:45:31 AM
to Qiime 1 Forum
Hi Colin,

That makes sense. Thanks!

Regards,
Roger