fungal ITS analysis.

Daniel Laubitz

Oct 20, 2017, 3:43:26 PM

to Qiime 1 Forum

Hello Dear QIIME Friends,

I got fastq files for ITS analysis. Normally I do bacterial V4 16S rRNA analysis, and this is my very first time with fungal ITS. I just want to draw on your wisdom and experience.

So my questions are:

- what database should I use? (I use Silva or GG for 16S)

- should I use the standard QIIME settings?

- do you have any useful param.txt files for ITS that you could share with me (please!)?

- is there anything specific to ITS, or different from 16S?

Yes, I know these are very general questions, but as I mentioned, it's going to be my first time (!) and for now I do not have any specific questions.

I just finished join_paired_ends.py and split_libraries_fastq.py. Now it's time for OTU picking.

I would really appreciate your help!

Thanks

Daniel

Dixi Modi

Oct 21, 2017, 10:26:10 PM

to Qiime 1 Forum

Hi Daniel,

The UNITE database is the reference database for ITS.

I downloaded it from the following website.

http://qiime.org/home_static/dataFiles.html

I tried to run pick_open_reference_otus.py using its_12_11 as the reference, but it did not work.

I also need help.

TonyWalters

Oct 22, 2017, 5:10:57 AM

to Qiime 1 Forum

Hello,

I would get the latest UNITE QIIME database for ITS: https://unite.ut.ee/repository.php

There were some non-ASCII characters in certain releases, but I'm not sure if the most recent ones have that issue. If they do, it will show up as an error during taxonomy assignment; we have a custom script for cleanup in that case.
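
For the non-ASCII issue, a minimal Python sketch of a cleanup pass could look like the following. This is not the custom script mentioned in this post; the function names are hypothetical, and the approach (simply dropping every character outside the ASCII range) is an assumption about what an acceptable cleanup looks like.

```python
def strip_non_ascii(text):
    """Return text with every non-ASCII character removed."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def clean_file(in_path, out_path):
    """Copy in_path to out_path, dropping non-ASCII characters.

    Returns the number of lines that were changed.
    """
    changed = 0
    # latin-1 maps every byte to a character, so reading never fails
    # regardless of the file's real encoding.
    with open(in_path, encoding="latin-1") as src, \
            open(out_path, "w", encoding="ascii") as dst:
        for line in src:
            cleaned = strip_non_ascii(line)
            if cleaned != line:
                changed += 1
            dst.write(cleaned)
    return changed
```

It could be applied to the taxonomy text file of a UNITE release, writing to a new file so the original stays untouched. A non-zero return value would suggest the release does contain the problematic characters.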

There are two ways to use the UNITE data: either use their trimming software (ITSx) to trim down to just the ITS region (this gets rid of the PCR overhangs that are in the SSU/LSU), or skip the trimming and use the reference files in the /developer/ directory, which are not trimmed back to the ITS region.

The "dynamic" reference files have centroids at different clustering percent identities, and the developers of UNITE have gotten better differentiation of fungal taxa with it.

Standard QIIME settings are fine, with the exception that one cannot build trees from the ITS region, so you have to stick to non-phylogenetic metrics for beta and alpha diversity. E.g., use Bray-Curtis for beta diversity (or anything except UniFrac), and use any alpha diversity metric except PD.
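
To illustrate why Bray-Curtis works without a tree, here is a small Python sketch that computes it from OTU counts alone. The function name and the toy counts are made up for illustration; in practice QIIME computes this from the OTU table for you.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    counts_a and counts_b map OTU IDs to read counts. The result is
    0.0 for identical samples and 1.0 for samples sharing no OTUs.
    """
    otus = set(counts_a) | set(counts_b)
    diff = sum(abs(counts_a.get(o, 0) - counts_b.get(o, 0)) for o in otus)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return diff / total


# Toy counts, made up for illustration only.
sample_1 = {"OTU_1": 10, "OTU_2": 5, "OTU_3": 0}
sample_2 = {"OTU_1": 4, "OTU_2": 5, "OTU_4": 6}
print(bray_curtis(sample_1, sample_2))  # prints 0.4
```

Only the counts enter the calculation, so no alignment or tree is needed. UniFrac and PD, by contrast, need branch lengths from a phylogeny.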

I hope this helps,

Tony

andrefcjp

Oct 22, 2017, 9:50:09 PM

to Qiime 1 Forum

Hi Daniel,

There is a specific protocol for ITS analysis at the link below. Please note that there are some differences compared to 16S.
http://nbviewer.jupyter.org/github/biocore/qiime/blob/1.9.1/examples/ipynb/Fungal-ITS-analysis.ipynb

In your specific case, after split libraries, I guess you need the command below, i.e. without a phylogenetic tree.

!pick_open_reference_otus.py -i its-soils-tutorial/seqs.fna -r its_12_11_otus/rep_set/97_otus.fasta -o otus/ -p its-soils-tutorial/params.txt --suppress_align_and_tree

At the end, your command might be this one:
!core_diversity_analyses.py -i otus/otu_table_mc2_w_tax.biom -o cdout/ -m its-soils-tutorial/map.txt -e 353 --nonphylogenetic_diversity

I hope this will help you.

Andre

Dixi Modi

Oct 22, 2017, 11:32:18 PM

to Qiime 1 Forum

Hi Daniel,

Thanks for the link. I found the link below quite useful as it has the command for the most recent version of UNITE as suggested by Tony above.

http://geoffreyzahn.com/getting-started-with-qiime-for-fungal-its-cleaning-its-reads-and-picking-otus/

Hope this helps you.

Dixi Modi

Oct 22, 2017, 11:35:20 PM

to Qiime 1 Forum

Hi Tony,

Thanks a lot for clarifying this.

But I still have trouble running the command. I think my data is too large for my CPU to handle. I am using VirtualBox for my analysis, and my seqs.fna after split libraries is ~13 GB.

Daniel Laubitz

Oct 23, 2017, 12:52:45 PM

to Qiime 1 Forum

Thank you all for all the information. I really appreciate your help.

I will try to download UNITE and run my data set.

Will let you know.

Daniel

Daniel Laubitz

Oct 24, 2017, 4:47:44 PM

to Qiime 1 Forum

Hi,

I ran pick_open_reference_otus.py (see below) and it went through; however, the log file shows GG13_8 values being used. I did pass the paramsITS.txt file (attached) as well as -r $PATH/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta

Is that ok, or should I change something else?

Daniel

time pick_open_reference_otus.py -i /seqs_daniel.fna -o ITS_otus -r /macqiime/sh_qiime_release_10102017/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta -p /paramsITS.txt --suppress_align_and_tree

paramsITS.txt

log_20171024132411.txt

Dixi Modi

Oct 24, 2017, 7:10:00 PM

to Qiime 1 Forum

Hi Daniel,

This looks OK to me. I have a few questions:

1) Did you discard the unjoined fasta files after joining the paired-end sequences and then run the split libraries command?

2) How big is your seqs.fna file?

If I am not misinterpreting your log file, you got a BIOM table. How long did the pick OTUs command run for?

I am having a hard time with my data size!

I would appreciate your help.

Thanks.

TonyWalters

Oct 25, 2017, 12:31:18 AM

to Qiime 1 Forum

Hello Daniel,

In the log file, the default values from .qiime_config are printed first, but these are overridden by the values from the parameters file, which come next. The commands for reference-based OTU picking and taxonomy assignment do use the UNITE reference files:

# Pick Reference OTUs command

pick_otus.py -i /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/demultiplexed/seqs_daniel.fna -o /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/ITS_otus/step1_otus -r /macqiime/sh_qiime_release_10102017/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta -m uclust_ref --enable_rev_strand_match --suppress_new_clusters

assign_taxonomy.py -o /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/ITS_otus/blast_assigned_taxonomy -i /Users/gibugs/Dropbox/Documents/Illumina/EmaleeEveEisenhauer_ITS/ITS_otus/rep_set.fna --reference_seqs_fp /macqiime/sh_qiime_release_10102017/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta --id_to_taxonomy_fp /macqiime/sh_qiime_release_10102017/sh_taxonomy_qiime_ver7_dynamic_10.10.2017.txt --assignment_method blast
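
The assign_taxonomy.py options visible in the logged command correspond to parameters-file entries of the form `script_name:option value`. As a sketch only (this is not the attached paramsITS.txt, whose contents are not shown in the thread), a small Python helper could generate such entries. The helper name is hypothetical; the option names and paths are copied from the logged command.

```python
def build_params(ref_fasta, ref_taxonomy):
    """Return parameters-file text pointing assign_taxonomy.py at a
    custom reference, one 'script_name:option value' entry per line."""
    entries = [
        ("assign_taxonomy:reference_seqs_fp", ref_fasta),
        ("assign_taxonomy:id_to_taxonomy_fp", ref_taxonomy),
        ("assign_taxonomy:assignment_method", "blast"),
    ]
    return "".join("%s %s\n" % (key, value) for key, value in entries)


# Paths as they appear in the logged command.
release_dir = "/macqiime/sh_qiime_release_10102017"
params_text = build_params(
    release_dir + "/sh_refs_qiime_ver7_dynamic_10.10.2017.fasta",
    release_dir + "/sh_taxonomy_qiime_ver7_dynamic_10.10.2017.txt",
)
print(params_text, end="")
```

Saving the printed text to a file and passing it with -p is what lets these values replace the .qiime_config defaults shown at the top of the log.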

Dixi Modi

Oct 25, 2017, 8:41:38 PM

to Qiime 1 Forum

Hi Tony,

When running split libraries, do we have to include the unjoined files too after joining paired ends? And do we remove chimeras and ITS sequences before or after picking OTUs?

I am so confused.

I would appreciate your help.

TonyWalters

Oct 26, 2017, 12:53:18 AM

to Qiime 1 Forum

Hello,

I would recommend against using both unjoined and joined data. If you can get most of your reads to join, use the joined reads only. If the stitching process does not yield many reads, then I would skip stitching altogether and just use the R1 reads (unless you happen to have better quality on R2).
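
As a rough sketch of that decision in Python: count the records in the joined FASTQ file, compare with the number of input read pairs, and pick accordingly. The function names are hypothetical, and the 0.5 cutoff is an arbitrary stand-in for "most of your reads", not a recommended threshold.

```python
def count_fastq_records(path):
    """Count records in an uncompressed FASTQ file (4 lines per record)."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def choose_reads(n_joined, n_input_pairs, min_join_fraction=0.5):
    """Return (choice, fraction), where choice is 'joined' or 'R1'.

    min_join_fraction is an arbitrary illustrative cutoff.
    """
    if n_input_pairs <= 0:
        raise ValueError("n_input_pairs must be positive")
    fraction = n_joined / n_input_pairs
    choice = "joined" if fraction >= min_join_fraction else "R1"
    return choice, fraction


# Made-up counts for illustration.
print(choose_reads(900, 1000))
print(choose_reads(200, 1000))
```

With real data, both counts could come from count_fastq_records applied to the joined output and to the original R1 file. The quality caveat about R2 still has to be judged separately.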

For chimera checking, the order of the process depends upon which software you use, e.g. with usearch (or vsearch as a plugin) you do it before OTU picking: http://qiime.org/tutorials/chimera_checking.html

Daniel Laubitz

Oct 26, 2017, 5:55:41 PM

to Qiime 1 Forum

Thank you Tony!

Roger Huerlimann

Nov 9, 2017, 8:50:11 PM

to Qiime 1 Forum

Hi Tony,

Just out of curiosity, what is the reason why ITS can't be used for phylogenetic metrics?

Regards,
Roger

Colin Brislawn

Nov 10, 2017, 12:33:14 AM

to Qiime 1 Forum

Hello Roger,

> why ITS can't be used for phylogenetic metrics?

It totally can! But do you trust the results?

Let's look at the process that OTU centroids go through to become part of a phylogenetic tree.

OTU centroids -> MSA (multiple sequence alignment) with PyNAST, MUSCLE, MAFFT, or Clustal Omega -> ML (maximum likelihood) tree building with FastTree2 -> Newick-format .tre file

How well does each of these steps work with different kinds of amplicons?

That's a hard question, both because the final result depends on all of the other steps and because establishing the 'right' answer is hard. In the case of ITS, which has high biological variability in length and complexity, doing a 'correct' MSA is hard (if not impossible). While you can run any modern MSA program and get a result from your OTU centroids, it's not fully clear whether you can feed that into the rest of the pipeline and get trustworthy results.
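
One quick way to see the length variability for yourself is to summarize the lengths of your OTU centroids before attempting an alignment. The Python sketch below is only an illustration; the function names and the toy records are made up, and a wide spread does not by itself prove that an alignment is bad.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_summary(lengths):
    """Return (min, median, max) of a non-empty list of lengths."""
    ordered = sorted(lengths)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[0], median, ordered[-1]


# Toy records, made up for illustration only.
toy = [">otu1", "ACGTACGT", ">otu2", "ACG", "TTGA", ">otu3", "ACGTACGTACGTACGT"]
print(length_summary(read_fasta_lengths(toy)))  # prints (7, 8, 16)
```

For real data, pass an open file handle for the representative-set FASTA instead of the toy list. A large gap between the minimum and maximum is a hint that many alignment columns will be mostly gaps.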

Your mileage may vary.

Let us know what you find!

Colin

PS. I have totally done de novo OTU picking + PyNAST + FastTree2 using ITS and 18S sequences. Let me know how it works out for you.

Roger Huerlimann

Nov 15, 2017, 4:45:31 AM

to Qiime 1 Forum

Hi Colin,

That makes sense. Thanks!

Regards,
Roger
