usearch or improperly formatted input for pick_open_reference_otus.py

Amir Ariff

Aug 23, 2017, 2:11:44 AM

to Qiime 1 Forum

Hello,

I've been using QIIME for a while now without any issues. However, I've recently run into problems with a specific dataset when using the script pick_open_reference_otus.py. When I run the command on the samples individually, everything works fine. When I concatenate the sequences into a single file and run the script on it, the following error occurs:

Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_open_reference_otus.py", line 453, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_open_reference_otus.py", line 432, in main
    minimum_failure_threshold=minimum_failure_threshold)
  File "/usr/lib/python2.7/dist-packages/qiime/workflow/pick_open_reference_otus.py", line 713, in pick_subsampled_open_reference_otus
    close_logger_on_success=False)
  File "/usr/lib/python2.7/dist-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError:

*** ERROR RAISED DURING STEP: Pick Reference OTUs

Command run was:
 pick_otus.py -i ../3.chimeras/no-chimeras/chorti-lenca.fasta -o remove_lenca_test/chorti-lenca/step1_otus -r /data/amir/SILVA_128_QIIME_release/rep_set/rep_set_all/97/97_otus.fasta -m usearch61_ref --enable_rev_strand_match --suppress_new_clusters

Command returned exit status: 1

Stdout:

Stderr
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 897, in main
    otu_prefix=otu_prefix, HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1800, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1844, in usearch61_ref_cluster
    raise ApplicationError('Error running usearch61. Possible causes are '
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

I've checked that usearch is up to date, and it works with other datasets (and with these samples individually). My system specifications are below, and the generated log files are attached. The only thing I can think of is a memory issue, which I'm not sure how to circumvent. I'm running this on a fairly large shared server that hasn't had problems with larger datasets, and I've tried turning off pick_otus:enable_rev_strand_match.
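Since the error message also names an "improperly formatted input file" as a possible cause, it can be worth ruling that out for the concatenated file before focusing on memory. The following is only a minimal sketch: the function name `check_fasta` and the specific checks (duplicate IDs, empty records, a '>' in the middle of a sequence line, which can happen when files without a trailing newline are joined) are illustrative assumptions, not a description of what usearch61 or QIIME validate internally. It is written to run on both Python 2.7 and Python 3.

```python
from collections import Counter


def check_fasta(lines):
    """Return a list of problems found in an iterable of FASTA lines.

    Hypothetical helper: the checks below are common concatenation
    mistakes, not the actual rules applied by usearch61.
    """
    problems = []
    seen = Counter()
    current_id, current_len = None, 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if current_id is not None and current_len == 0:
                problems.append("record %r has no sequence" % current_id)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            if not current_id:
                problems.append("line %d: header without an ID" % number)
            seen[current_id] += 1
            current_len = 0
        elif line.strip():
            if current_id is None:
                problems.append("line %d: sequence data before first header" % number)
            if ">" in line:
                problems.append("line %d: '>' inside a sequence line "
                                "(files joined without a newline?)" % number)
            current_len += len(line.strip())
    if current_id is not None and current_len == 0:
        problems.append("record %r has no sequence" % current_id)
    for seq_id, count in seen.items():
        if count > 1:
            problems.append("ID %r appears %d times" % (seq_id, count))
    return problems


# In-memory demo; for a real file, pass an open file handle instead.
demo = [">s1_1\n", "ACGT>s2_1\n", "TTAA\n", ">s1_1\n", "ACGT\n"]
for problem in check_fasta(demo):
    print(problem)
```

An empty result does not prove the file is acceptable to usearch61; it only removes a few of the more common suspects.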

System information

==================

Platform: linux2

Python version: 2.7.6 (default, Oct 26 2016, 20:30:19) [GCC 4.8.4]

Python executable: /usr/bin/python

QIIME default reference information

===================================

For details on what files are used as QIIME's default references, see here:

https://github.com/biocore/qiime-default-reference/releases/tag/0.1.3

Dependency versions

===================

QIIME library version: 1.9.1

QIIME script version: 1.9.1+dfsg-1biolinux4

qiime-default-reference version: 0.1.3

NumPy version: 1.13.1

SciPy version: 0.19.1

pandas version: 0.20.2

matplotlib version: 2.0.2

biom-format version: 2.1.4

h5py version: 2.2.1 (HDF5 version: 1.8.11)

qcli version: 0.1.0

pyqi version: 0.3.2

scikit-bio version: 0.2.3

PyNAST version: 1.2.2

Emperor version: 0.9.51

burrito version: 0.9.1

burrito-fillings version: 0.1.1

sortmerna version: SortMeRNA version 2.0, 29/11/2014

sumaclust version: SUMACLUST Version 1.0.01

swarm version: Swarm 1.2.20 [Feb 1 2015 09:42:15]

gdata: Not installed.

QIIME config values

===================

For definitions of these settings and to learn how to configure QIIME, see here:

http://qiime.org/install/qiime_config.html

http://qiime.org/tutorials/parallel_qiime.html

blastmat_dir: /usr/share/ncbi/data

cluster_jobs_fp: None

pick_otus_reference_seqs_fp: /usr/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta

jobs_to_start: 1

pynast_template_alignment_blastdb: None

qiime_scripts_dir: /usr/lib/qiime/bin/

working_dir: .

pynast_template_alignment_fp: /usr/share/qiime/data/core_set_aligned.fasta.imputed

python_exe_fp: python

temp_dir: /tmp/

assign_taxonomy_reference_seqs_fp: /usr/share/qiime/data/gg_13_8_otus/rep_set/97_otus.fasta

blastall_fp: blastall

seconds_to_sleep: 60

assign_taxonomy_id_to_taxonomy_fp: /usr/share/qiime/data/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt

There seem to be similar reports in the forums with this error line, but mine has stalled at the first step, whereas others seem to have gotten further. The same error occurs if I attempt to run the script pick_otus.py on the dataset:

amir@MDHS-NIX-028[4.otus_open_ref] qiime > pick_otus.py -i ../3.chimeras/no-chimeras/lenca.fasta -m usearch61 -o usearch61_picking [11:06AM]
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 871, in main
    HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1690, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
    'provided')
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

amir@MDHS-NIX-028[4.otus_open_ref] qiime > pick_otus.py -i ../3.chimeras/no-chimeras/lenca.fasta -m usearch61 -o usearch61_picking -M 1200 [11:27AM]
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 871, in main
    HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1690, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
    'provided')
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

amir@MDHS-NIX-028[4.otus_open_ref] qiime > pick_otus.py -i ../3.chimeras/no-chimeras/lenca.fasta -m usearch61 -o usearch61_picking -M 2000 [11:40AM]
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 871, in main
    HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1690, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
    'provided')
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

log_20170823103005.txt

log_20170823094618.txt

log_20170823100458.txt

abundance_sorted.log

Colin Brislawn

Aug 23, 2017, 12:33:56 PM

to Qiime 1 Forum

Hello Amir,

Thanks for posting all these log files. Let me see how I can help.

> The only thing i can think of is a memory issue, which I'm not sure how to circumvent (i'm running this on a pretty large shared server, which hasn't had problems with larger data sets, and i've tried turning off pick_otus:enable_rev_strand_match ).

I think you are right; this could absolutely be a memory issue.

While your server has lots of memory, the free version of usearch61 is a 32-bit build and can use at most 4 GB. This can be very limiting. It would also explain why your script worked on subsets of the data but failed on the full dataset.

You could try one of the OTU picking methods that is not limited to 4 GB, such as -m uclust or -m sortmerna_sumaclust. Both should scale much better to large datasets.

Let me know what you find!

Colin

Amir Ariff

Aug 23, 2017, 8:05:56 PM

to Qiime 1 Forum

Hi Colin,

Thanks for clarifying that. I've actually managed to run usearch61 on a larger dataset of ~130 samples, and this set is only ~15 samples (though the depth and number of sequences might differ, so the larger set could have been less memory intensive?).
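Sample count alone says little about how much data usearch61 has to hold, so comparing the number of sequences and total bases in the two input files may be more informative. Here is a rough sketch of such a count. The function name `fasta_stats` is made up for illustration, and the per-sample tally assumes sequence IDs follow the `SampleID_N` pattern that QIIME's demultiplexing typically produces. It reports input size only and does not predict how much memory usearch61 will actually need.

```python
from collections import Counter


def fasta_stats(lines):
    """Return (n_sequences, n_bases, per_sample_counts) for FASTA lines.

    Assumption: IDs look like 'SampleID_N'; everything before the last
    underscore is treated as the sample name.
    """
    n_bases = 0
    per_sample = Counter()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            per_sample[seq_id.rsplit("_", 1)[0]] += 1
        else:
            n_bases += len(line)
    return sum(per_sample.values()), n_bases, per_sample


# In-memory demo; for a real file, pass an open file handle instead.
demo = [">A_1\n", "ACGT\n", "AC\n", ">A_2\n", "GG\n", ">B_1\n", "TTT\n"]
n_seqs, n_bases, per_sample = fasta_stats(demo)
print("sequences: %d, bases: %d" % (n_seqs, n_bases))
for sample, count in sorted(per_sample.items()):
    print("%s\t%d" % (sample, count))
```

Running this on both the ~130-sample file and the ~15-sample file would show whether the smaller study is in fact the larger input.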

Indeed, I've managed to run it easily with uclust, but I was quite intent on running it with usearch61. Is there a way to circumvent the 4 GB limit? I've seen suggestions to 'divide the queries into smaller pieces', but I'm unsure how I would do that and whether it would affect the data analysis.

Cheers,

Amir

Colin Brislawn

Aug 24, 2017, 1:57:00 PM

to Qiime 1 Forum

Hello Amir,

Yes, dividing your data, running each piece, then combining the output works in theory. In practice, this do-it-yourself map-reduce method is really messy, and key steps that have to happen on the full dataset may still be too big for usearch61.
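For illustration only, the "divide" half of that approach might look like the sketch below. The function name `split_fasta_records` and the round-robin assignment are assumptions made for this example. It does nothing about the hard part described above: merging the per-chunk clusters into one consistent set of OTUs is not addressed, and clustering each chunk separately can give different OTUs than clustering everything at once.

```python
def split_fasta_records(lines, n_chunks):
    """Distribute FASTA records round-robin into n_chunks lists of lines.

    Hypothetical helper: each record (header plus its sequence lines)
    stays together; lines before the first header are dropped.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1  # a new record starts here
        if index < 0:
            continue  # nothing to attach this line to yet
        chunks[index % n_chunks].append(line)
    return chunks


# In-memory demo; each chunk could then be written to its own file.
demo = [">a_1\n", "ACGT\n", ">a_2\n", "GGCC\n", "TT\n", ">b_1\n", "AAAA\n"]
for number, chunk in enumerate(split_fasta_records(demo, 2)):
    headers = [line.strip() for line in chunk if line.startswith(">")]
    print("chunk %d: %s" % (number, ", ".join(headers)))
```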

May I ask how you chose usearch61 as your OTU picking method? Maybe we can find another option which meets your criteria.

Colin

Amir Ariff

Aug 25, 2017, 2:45:43 AM

to Qiime 1 Forum

Hi Colin,

What I've been trying to do is use a few different picking methods and compare the results. Since usearch61 is used quite frequently in the literature, I wanted to include it to see whether my results differ significantly between methods. I'm analysing gut microbiota from healthy adults with open-reference picking, which I think makes usearch61 a reasonable choice.

If, as we're establishing, there is a memory limitation that can't be circumvented (short of purchasing usearch), I'm quite happy to go forward with the uclust data that I've already generated. What do you recommend?

Colin Brislawn

Aug 25, 2017, 10:40:09 AM

to Qiime 1 Forum

Hello Amir,

Ah, a benchmark. That is really good to do. I wish more people compared methods like this.
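As a starting point for such a comparison, the OTU maps from each method can be summarized side by side. The sketch below assumes the usual QIIME 1 OTU map layout (one OTU per line, tab-separated, OTU ID first and member sequence IDs after it). The function name `summarize_otu_map` and the choice of summary numbers are illustrative, and the two small maps in the demo are made-up data, not results from this dataset.

```python
def summarize_otu_map(lines):
    """Summarize an OTU map given as an iterable of tab-separated lines.

    Assumed layout: 'otu_id<TAB>seq_id_1<TAB>seq_id_2...'.
    Lines without an OTU ID or without any members are skipped.
    """
    sizes = []
    for raw in lines:
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue
        sizes.append(len(fields) - 1)
    return {
        "otus": len(sizes),
        "sequences": sum(sizes),
        "singletons": sum(1 for size in sizes if size == 1),
        "largest": max(sizes) if sizes else 0,
    }


# Made-up example maps standing in for the output of two methods.
method_a = ["otu1\ta_1\ta_2\tb_1\n", "otu2\tb_2\n"]
method_b = ["denovo0\ta_1\ta_2\n", "denovo1\tb_1\n", "denovo2\tb_2\n"]
for name, otu_map in (("method_a", method_a), ("method_b", method_b)):
    summary = summarize_otu_map(otu_map)
    print("%s: %s" % (name, sorted(summary.items())))
```

Counts like these only show how finely each method splits the data; comparing downstream diversity results would still be needed to judge whether the biological conclusions change.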

You can use usearch61 with large datasets outside of QIIME, but the QIIME workflow scripts would need the licensed version to handle large datasets elegantly. Moving forward with uclust is a reasonable choice.

The program vsearch is an open-source alternative to usearch that implements many of the same algorithms. Because it's fast and free, it's gaining popularity as a replacement for usearch. VSEARCH will also have a QIIME 2 plugin, so benchmarking vsearch may be worthwhile.

https://github.com/torognes/vsearch
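If you do try it, a standalone de novo clustering run could be driven from Python roughly as sketched below. The vsearch options used (`--cluster_fast`, `--id`, `--centroids`, `--uc`, `--threads`) are real vsearch options, but the helper names, the output file names and the 97% identity are assumptions chosen for illustration. This is plain de novo clustering, not a reproduction of QIIME's open-reference workflow, so the results are not a like-for-like replacement for pick_open_reference_otus.py.

```python
import os
import subprocess


def vsearch_cluster_command(fasta_fp, out_dir, identity=0.97, threads=1):
    """Build an argument list for a de novo vsearch clustering run.

    Hypothetical helper: output file names are arbitrary choices.
    """
    return [
        "vsearch",
        "--cluster_fast", fasta_fp,
        "--id", str(identity),
        "--centroids", os.path.join(out_dir, "centroids.fasta"),
        "--uc", os.path.join(out_dir, "clusters.uc"),
        "--threads", str(threads),
    ]


def run_vsearch(fasta_fp, out_dir, **kwargs):
    """Run vsearch; requires the vsearch binary on PATH and an existing out_dir."""
    subprocess.check_call(vsearch_cluster_command(fasta_fp, out_dir, **kwargs))


# Show the command without running it.
print(" ".join(vsearch_cluster_command("seqs.fasta", "vsearch_out", threads=4)))
```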

Let me know how I can help,

Colin

Amir Ariff

Aug 27, 2017, 10:16:24 PM

to Qiime 1 Forum

Thanks very much, Colin.

I'll just go ahead with the uclust results for now, but I'll give vsearch a go later on as well.
