filter_alignment in pick_open_reference_otus with ITS sequences failed

Caitlin O

Aug 9, 2016, 6:31:38 PM

to Qiime 1 Forum

Hi,

I have run into the error "An empty fasta file was provided. Did the alignment complete successfully? Did PyNAST discard all sequences due to too-stringent minimum length or minimum percent ID settings?" I've read a few other posts, so I know this issue with tree building from ITS sequences seems pretty common, but I'm wondering why I only hit it once. I have 350 sequences split into 4 "sections" (3 of 100 sequences and 1 of 50) since the barcodes were reused. I ran the three 100-sequence sections through open-reference OTU picking without this error, but the last part of the data set failed with it. Any idea why this might have happened? I used the exact same script, changing only the corresponding folder names. I do not plan on using the trees for anything, but I would like to understand, since I haven't run into this error before (although this is my first time analyzing ITS sequences and using pick_open_reference_otus).

Here is the script I used:

pick_open_reference_otus.py -i ITS_301_350/demultiplexed_seqs.fna -p /home/qiime/Documents/Caitlin/parallel_all_params -r /home/qiime/Documents/ITSdb.findley.fasta -a -O 4 -o Pick_open_OTUs/301_350

with my parameters file having:

pick_otus:enable_rev_strand_match True

pick_otus:max_accepts 1

pick_otus:max_rejects 8

pick_otus:stepwords 8

pick_otus:word_length 8

beta_diversity:metrics bray_curtis,unweighted_unifrac,weighted_unifrac

alpha_diversity:metrics shannon,chao1,observed_species

I know unweighted and weighted UniFrac use a tree. I do not plan on using those metrics for this data set; I just haven't gotten around to editing them out of the parameters file :)

Jai Ram Rideout

Aug 10, 2016, 2:54:41 PM

to Qiime 1 Forum

Hi Caitlin,

I'm not sure why some of your ITS data aligned and some didn't. PyNAST (as it is hooked up in QIIME) isn't compatible with ITS data. It's possible that each of the "successful" datasets had a small number of sequences that happened to align and pass the filters in filter_alignment.py. Regardless, building ITS phylogenetic trees is problematic (searching the forum will yield plenty of previous discussion on the topic). The ghost-tree tool (paper) may be helpful in performing phylogenetically-aware analyses with ITS data in QIIME.
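One way to see how many sequences survived alignment and filtering in each run is to count the FASTA records in the output files. The following is a minimal Python sketch, not part of QIIME; the helper names count_fasta_records and count_fasta_file are made up for illustration, and you would supply the paths to your own alignment outputs.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting header lines (those starting with '>')."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Count FASTA records in the file at `path` (hypothetical helper)."""
    with open(path) as handle:
        return count_fasta_records(handle)
```

Calling count_fasta_file on the aligned and the filtered alignment file from each of the four runs should show whether the three "successful" runs kept only a handful of sequences. A count of zero would be consistent with the empty-file error.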

If you're not interested in performing phylogenetically-aware analyses on your data, pass --suppress_align_and_tree to pick_open_reference_otus.py. If you're using core_diversity_analyses.py after that, use the --nonphylogenetic_diversity option. I recommend checking out QIIME's ITS tutorial for more details.
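As a rough illustration of where those two options go, here is a small Python sketch that only assembles the command lines (it does not run QIIME). The function names, the example file names, and the sampling depth are placeholders chosen for this example; the two flags are the ones mentioned above, and the remaining options are the usual input, output, mapping-file, and sampling-depth options of each script.

```python
import shlex


def pick_otus_command(input_fna, output_dir, reference_fp=None, params_fp=None):
    """Build a pick_open_reference_otus.py call that skips alignment and tree building."""
    cmd = ["pick_open_reference_otus.py", "-i", input_fna, "-o", output_dir,
           "--suppress_align_and_tree"]
    if reference_fp:
        cmd += ["-r", reference_fp]
    if params_fp:
        cmd += ["-p", params_fp]
    return cmd


def core_diversity_command(biom_fp, mapping_fp, output_dir, sampling_depth):
    """Build a core_diversity_analyses.py call that uses only non-phylogenetic metrics."""
    return ["core_diversity_analyses.py", "-i", biom_fp, "-m", mapping_fp,
            "-o", output_dir, "-e", str(sampling_depth),
            "--nonphylogenetic_diversity"]


def as_shell(cmd):
    """Render an argument list as a single shell-quoted command string."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Placeholder paths; substitute your own files.
print(as_shell(pick_otus_command("seqs.fna", "otus_out", reference_fp="ITS_refs.fasta")))
print(as_shell(core_diversity_command("otu_table.biom", "map.txt", "cdiv_out", 100)))
```

Printing the commands rather than executing them makes it easy to review them before pasting into a terminal.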

Best,

Jai

Caitlin O

Sep 15, 2016, 2:25:01 PM

to Qiime 1 Forum

Hi Jai,

Sorry, I thought I had replied to this a while ago and thanked you for your help, but apparently I didn't! Thank you for helping me with this!

I have now run into another related issue. I finally found time to go through the taxa summaries from the pick_open_reference_otus.py runs I was asking about here, and for some reason my OTUs were classified using Greengenes. In my command I had used "-r ITSdb.findley.fasta" as shown above, but when I looked at the log, it showed that the default gg_13_8_otus/... files were used instead. Am I missing something in my script?

Thanks,

Caitlin

Jai Ram Rideout

Sep 16, 2016, 12:21:57 PM

to Qiime 1 Forum

Hi Caitlin,

Passing reference sequences via -r to pick_open_reference_otus.py only specifies the reference sequences to use in the OTU picking step of the workflow. To change the reference sequences and taxonomy used during the taxonomy assignment step (assign_taxonomy.py), you'll need to create a parameters file (more info on parameters files can be found here) and pass it to pick_open_reference_otus.py using -p.

Create a text file with these lines in it:

assign_taxonomy:reference_seqs_fp <path to your ITS reference sequences>

assign_taxonomy:id_to_taxonomy_fp <path to your ITS reference taxonomy>

Replace <path to your ITS reference sequences> with the path to your ITS reference sequences. These will probably be the same sequences you're passing to pick_open_reference_otus.py via -r, so you'd use ITSdb.findley.fasta. Next, replace <path to your ITS reference taxonomy> with the path to your reference taxonomy file -- this file maps reference sequence IDs to taxonomy strings. More info on this type of file can be found here. Greengenes, Silva, and UNITE databases that are compatible with QIIME have these taxonomy files. If you're using a custom ITS database (it looks like you are), you'll need to create this file to get taxonomic assignments with QIIME.
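If you do build the taxonomy file yourself, it can help to confirm that every reference sequence ID has a taxonomy entry before running the workflow. Below is a minimal Python sketch under two assumptions: the sequence ID is the first whitespace-delimited token of each FASTA header, and the taxonomy file is tab-delimited with the sequence ID in the first column. All function names are hypothetical helpers, not part of QIIME.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first token after '>') from FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def taxonomy_ids(lines):
    """Return the set of IDs from the first tab-delimited column of a taxonomy map."""
    ids = set()
    for line in lines:
        if line.strip():
            ids.add(line.rstrip("\n").split("\t", 1)[0].strip())
    return ids


def missing_taxonomy(fasta_lines, taxonomy_lines):
    """List reference sequence IDs that have no entry in the taxonomy map."""
    return sorted(fasta_ids(fasta_lines) - taxonomy_ids(taxonomy_lines))


def assign_taxonomy_params(reference_seqs_fp, id_to_taxonomy_fp):
    """Return the two parameters-file lines described above as one string."""
    return ("assign_taxonomy:reference_seqs_fp {}\n"
            "assign_taxonomy:id_to_taxonomy_fp {}\n").format(
                reference_seqs_fp, id_to_taxonomy_fp)
```

For example, passing open file handles for the reference FASTA and the taxonomy file to missing_taxonomy returns the IDs that still need a taxonomy string, and the output of assign_taxonomy_params can be appended to an existing parameters file.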

Let me know how it goes!

Best,

Jai

Caitlin O

Sep 19, 2016, 3:09:48 PM

to Qiime 1 Forum

It did work, thank you for your help!

Jai Ram Rideout

Sep 19, 2016, 6:15:46 PM

to Qiime 1 Forum

Great, glad it's working!

Jai
