usearch or improperly formatted input for pick_open_reference_otus.py


Amir Ariff

Aug 23, 2017, 2:11:44 AM
to Qiime 1 Forum

Hello,
I've been using Qiime for a while now without any issues, but I've recently run into problems with one dataset when using the script pick_open_reference_otus.py. When I run the command on each sample individually, everything works fine. However, when I concatenate the sequences into a single file and run the script on it, the following error occurs:

Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_open_reference_otus.py", line 453, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_open_reference_otus.py", line 432, in main
    minimum_failure_threshold=minimum_failure_threshold)
  File "/usr/lib/python2.7/dist-packages/qiime/workflow/pick_open_reference_otus.py", line 713, in pick_subsampled_open_reference_otus
    close_logger_on_success=False)
  File "/usr/lib/python2.7/dist-packages/qiime/workflow/util.py", line 122, in call_commands_serially
    raise WorkflowError(msg)
qiime.workflow.util.WorkflowError: 

*** ERROR RAISED DURING STEP: Pick Reference OTUs
Command run was:
 pick_otus.py -i ../3.chimeras/no-chimeras/chorti-lenca.fasta -o remove_lenca_test/chorti-lenca/step1_otus -r /data/amir/SILVA_128_QIIME_release/rep_set/rep_set_all/97/97_otus.fasta -m usearch61_ref --enable_rev_strand_match --suppress_new_clusters
Command returned exit status: 1
Stdout:

Stderr
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 897, in main
    otu_prefix=otu_prefix, HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1800, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1844, in usearch61_ref_cluster
    raise ApplicationError('Error running usearch61. Possible causes are '
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

I've checked that usearch is up to date, and it works with other datasets (and with these samples individually). Below are my specifications, and the log files generated are attached. The only explanation I can think of is a memory issue, which I'm not sure how to circumvent: I'm running this on a fairly large shared server that hasn't had problems with larger datasets, and I've tried turning off pick_otus:enable_rev_strand_match.
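The error message lists an improperly formatted input file as one possible cause, and the failing input here was built by concatenating per-sample files, so it may be worth ruling that out before blaming memory. Below is a minimal Python sketch (the function name check_fasta and the particular checks are my own assumptions, not part of QIIME) that flags a few problems concatenation can introduce: sequence IDs repeated across samples, empty records, and a file that lacked a trailing newline, which glues its last sequence onto the next file's header.

```python
from collections import Counter

# Characters accepted in sequence lines: IUPAC nucleotide codes plus gap
# symbols, in either case. Anything else (e.g. a stray ">") is reported.
ALLOWED = set("ACGTUNRYKMSWBDHV-.acgtunrykmswbdhv")


def check_fasta(lines):
    """Return a list of problems found in an iterable of FASTA lines.

    Only a few concatenation-related problems are checked: sequence data
    before the first header, headers with no sequence, unexpected
    characters in sequence lines, and duplicate sequence IDs.
    """
    problems = []
    seen = Counter()
    current_id, header_line, seq_len = None, 0, 0
    for n, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None and seq_len == 0:
                problems.append("line %d: record %r has no sequence" % (header_line, current_id))
            # The ID is the first whitespace-separated token of the header.
            current_id = (line[1:].split() or [""])[0]
            header_line, seq_len = n, 0
            seen[current_id] += 1
        elif current_id is None:
            problems.append("line %d: sequence data before the first header" % n)
        else:
            bad = set(line) - ALLOWED
            if bad:
                problems.append("line %d: unexpected characters %s" % (n, "".join(sorted(bad))))
            seq_len += len(line)
    # The last record has no following header to trigger the empty check.
    if current_id is not None and seq_len == 0:
        problems.append("line %d: record %r has no sequence" % (header_line, current_id))
    for seq_id, count in sorted(seen.items()):
        if count > 1:
            problems.append("ID %r appears %d times" % (seq_id, count))
    return problems
```

It can be run directly on the file handle, e.g. opening ../3.chimeras/no-chimeras/chorti-lenca.fasta and passing the handle to check_fasta; an empty list means none of these particular problems were found, not that usearch61 will accept the file.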


System information
==================
         Platform: linux2
   Python version: 2.7.6 (default, Oct 26 2016, 20:30:19)  [GCC 4.8.4]
Python executable: /usr/bin/python

QIIME default reference information
===================================
For details on what files are used as QIIME's default references, see here:

Dependency versions
===================
          QIIME library version: 1.9.1
           QIIME script version: 1.9.1+dfsg-1biolinux4
qiime-default-reference version: 0.1.3
                  NumPy version: 1.13.1
                  SciPy version: 0.19.1
                 pandas version: 0.20.2
             matplotlib version: 2.0.2
            biom-format version: 2.1.4
                   h5py version: 2.2.1 (HDF5 version: 1.8.11)
                   qcli version: 0.1.0
                   pyqi version: 0.3.2
             scikit-bio version: 0.2.3
                 PyNAST version: 1.2.2
                Emperor version: 0.9.51
                burrito version: 0.9.1
       burrito-fillings version: 0.1.1
              sortmerna version: SortMeRNA version 2.0, 29/11/2014
              sumaclust version: SUMACLUST Version 1.0.01
                  swarm version: Swarm 1.2.20 [Feb  1 2015 09:42:15]
                          gdata: Not installed.

QIIME config values
===================
For definitions of these settings and to learn how to configure QIIME, see here:

                     blastmat_dir: /usr/share/ncbi/data
                  cluster_jobs_fp: None
      pick_otus_reference_seqs_fp: /usr/lib/python2.7/dist-packages/qiime_default_reference/gg_13_8_otus/rep_set/97_otus.fasta
                    jobs_to_start: 1
pynast_template_alignment_blastdb: None
                qiime_scripts_dir: /usr/lib/qiime/bin/
                      working_dir: .
     pynast_template_alignment_fp: /usr/share/qiime/data/core_set_aligned.fasta.imputed
                    python_exe_fp: python
                         temp_dir: /tmp/
assign_taxonomy_reference_seqs_fp: /usr/share/qiime/data/gg_13_8_otus/rep_set/97_otus.fasta
                      blastall_fp: blastall
                 seconds_to_sleep: 60
assign_taxonomy_id_to_taxonomy_fp: /usr/share/qiime/data/gg_13_8_otus/taxonomy/97_otu_taxonomy.txt

There seem to be similar reports on the forum with this error message, but mine fails at the first step, whereas others seem to have gotten further. The same error occurs if I run pick_otus.py directly on the dataset:

amir@MDHS-NIX-028[4.otus_open_ref]  qiime > pick_otus.py -i ../3.chimeras/no-chimeras/lenca.fasta -m usearch61 -o usearch61_picking
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 871, in main
    HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1690, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
    'provided')
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided
amir@MDHS-NIX-028[4.otus_open_ref]  qiime > pick_otus.py -i ../3.chimeras/no-chimeras/lenca.fasta -m usearch61 -o usearch61_picking -M 1200
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 871, in main
    HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1690, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
    'provided')
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided
amir@MDHS-NIX-028[4.otus_open_ref]  qiime > pick_otus.py -i ../3.chimeras/no-chimeras/lenca.fasta -m usearch61 -o usearch61_picking -M 2000
Traceback (most recent call last):
  File "/usr/lib/qiime/bin/pick_otus.py", line 1004, in <module>
    main()
  File "/usr/lib/qiime/bin/pick_otus.py", line 871, in main
    HALT_EXEC=False)
  File "/usr/lib/python2.7/dist-packages/qiime/pick_otus.py", line 1690, in __call__
    HALT_EXEC=HALT_EXEC
  File "/usr/lib/python2.7/dist-packages/bfillings/usearch.py", line 1969, in usearch61_denovo_cluster
    'provided')
burrito.util.ApplicationError: Error running usearch61. Possible causes are unsupported version (current supported version is usearch v6.1.544) is installed or improperly formatted input file was provided

log_20170823103005.txt
log_20170823094618.txt
log_20170823100458.txt
abundance_sorted.log

Colin Brislawn

Aug 23, 2017, 12:33:56 PM
to Qiime 1 Forum
Hello Amir,

Thanks for posting all these log files. Let me see how I can help.

> The only thing i can think of is a memory issue, which I'm not sure how to circumvent (i'm running this on a pretty large shared server, which hasn't had problems with larger data sets, and i've tried turning off pick_otus:enable_rev_strand_match ).
I think you are right; this could absolutely be a memory issue.

While your server has lots of memory, the free (32-bit) version of usearch can only use 4 GB of RAM. This can be very limiting. It also explains why the script worked on each sample individually but failed on the full, concatenated data set.

You could try using one of the OTU picking methods that is not limited to 4 GB, like -m uclust or -m sortmerna_sumaclust. Both should scale much better for large data sets.
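If the plan is to run the same input through several methods, the commands can be generated in one place. This is a minimal sketch that only builds argument lists; the function name and output layout are hypothetical, and the three -m values are the ones named in this thread, assumed to be what QIIME 1.9's pick_open_reference_otus.py accepts.

```python
import os

# -m values named in this thread; assumed to be the choices accepted by
# pick_open_reference_otus.py in QIIME 1.9.
OPEN_REF_METHODS = ("uclust", "usearch61", "sortmerna_sumaclust")


def open_ref_commands(input_fp, reference_fp, out_dir, methods=OPEN_REF_METHODS):
    """Build one pick_open_reference_otus.py argv list per OTU picking method.

    Each method writes to its own subdirectory of out_dir so the runs can be
    compared side by side afterwards.
    """
    commands = []
    for method in methods:
        commands.append([
            "pick_open_reference_otus.py",
            "-i", input_fp,
            "-r", reference_fp,
            "-o", os.path.join(out_dir, method),
            "-m", method,
        ])
    return commands
```

Each list could then be passed to subprocess.check_call; with only the free usearch available, dropping "usearch61" from methods keeps the large input away from the 4 GB limit.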

Let me know what you find!
Colin

Amir Ariff

Aug 23, 2017, 8:05:56 PM
to Qiime 1 Forum
Hi Colin,

Thanks for clarifying that. I've actually managed to run usearch61 on a larger dataset of ~130 samples, and this set is only ~15 samples (though the sequencing depth and number of sequences might differ, so the larger set could have been less memory-intensive?).
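Whether the ~15-sample set is actually heavier than the ~130-sample set can be checked by counting records and unique sequences in each input. As I understand it, QIIME's usearch61 method dereplicates reads before clustering, so the number of unique sequences is probably a better proxy for memory use than the number of samples; treat that as a rule of thumb, not a measured relationship. The sketch below is hypothetical helper code, not part of QIIME.

```python
def iter_fasta(lines):
    """Yield (id, sequence) pairs from FASTA lines; sequences may span lines.

    Lines before the first header are ignored.
    """
    seq_id, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def sequence_stats(lines):
    """Return (records, unique_sequences, total_bases) for FASTA lines.

    Sequences are upper-cased before counting, so "acgt" and "ACGT" are
    treated as the same unique sequence.
    """
    records, bases, unique = 0, 0, set()
    for _, seq in iter_fasta(lines):
        records += 1
        bases += len(seq)
        unique.add(seq.upper())
    return records, len(unique), bases
```

Running sequence_stats on both concatenated files and comparing the unique-sequence counts would show which dataset is likely to be larger for usearch61.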

Indeed, I've managed to run it easily with uclust, but I was quite intent on running it with usearch61. Is there a way to circumvent the 4 GB limit? I've seen suggestions to 'divide the queries into smaller pieces', but I'm unsure how I would do that and whether it would affect the data analysis.
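A minimal sketch of "dividing the queries into smaller pieces": split the (id, sequence) records from any FASTA parser into chunks of a fixed size and write each chunk to its own file. The function names, file naming scheme and chunk size are illustrative assumptions. This only divides the work cleanly for reference-based steps such as the usearch61_ref step in the traceback, where each read is essentially matched against the reference on its own; de novo clustering of separate chunks will not give the same OTUs as clustering everything together.

```python
import os


def split_records(records, chunk_size):
    """Yield lists of at most chunk_size (id, sequence) records, in input order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def format_fasta(records):
    """Render (id, sequence) records as FASTA text, one line per sequence."""
    return "".join(">%s\n%s\n" % (seq_id, seq) for seq_id, seq in records)


def write_chunks(records, chunk_size, out_dir, prefix="chunk"):
    """Write each chunk to out_dir/<prefix>_NNN.fasta and return the paths."""
    paths = []
    for i, chunk in enumerate(split_records(records, chunk_size)):
        path = os.path.join(out_dir, "%s_%03d.fasta" % (prefix, i))
        with open(path, "w") as handle:
            handle.write(format_fasta(chunk))
        paths.append(path)
    return paths
```

Each chunk file would then be run through pick_otus.py separately, and the per-chunk outputs combined afterwards.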

Cheers,

Amir

Colin Brislawn

Aug 24, 2017, 1:57:00 PM
to Qiime 1 Forum
Hello Amir,

Yes, dividing your data, running each piece, then combining the output works in theory. In practice, this do-it-yourself map-reduce approach is really messy, and key steps that have to run on the full data set may still be too big for usearch61.
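The "combine the output" half is only simple for the closed-reference step. Assuming each chunk was picked against the same reference with --suppress_new_clusters, so that OTU IDs are reference sequence IDs, the per-chunk OTU maps (tab-separated: OTU ID, then member sequence IDs) can be merged by OTU ID, as in this hypothetical sketch. The de novo steps of open-reference picking, where new OTU IDs are generated per run and the failures from all chunks need to be clustered together, are the part that does not merge this way and may still hit the memory limit.

```python
from collections import OrderedDict


def merge_otu_maps(otu_maps):
    """Merge QIIME-style OTU maps (OTU ID, then member sequence IDs, tab-separated).

    Members of the same OTU ID from different maps are concatenated, so this
    is only meaningful when all maps share one OTU ID namespace, e.g.
    closed-reference maps whose OTU IDs are reference sequence IDs.
    """
    merged = OrderedDict()
    for otu_map in otu_maps:
        for raw in otu_map:
            line = raw.rstrip("\r\n")
            if not line.strip():
                continue
            fields = line.split("\t")
            merged.setdefault(fields[0], []).extend(fields[1:])
    return merged


def format_otu_map(merged):
    """Render a merged map back to tab-separated OTU map text."""
    return "".join("%s\t%s\n" % (otu_id, "\t".join(ids)) for otu_id, ids in merged.items())
```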

May I ask how you chose usearch61 as your OTU picking method? Maybe we can find another option which meets your criteria.

Colin

Amir Ariff

Aug 25, 2017, 2:45:43 AM
to Qiime 1 Forum
Hi Colin,

What I've been trying to do is use a few different picking methods and compare the results. Since usearch61 is used quite frequently in the literature, I wanted to include it to see whether my results differ significantly between methods. I'm analysing gut microbiota from healthy adults with an open-reference approach, which I think makes usearch61 a reasonable choice.

If, as we're establishing, there is a memory limitation that can't be circumvented (short of purchasing usearch), I'm quite happy to go forward with the uclust data I've already generated. What do you recommend?

Colin Brislawn

Aug 25, 2017, 10:40:09 AM
to Qiime 1 Forum
Hello Amir,

Ah, a benchmark. That is really good to do. I wish more people compared methods like this.

You can use usearch61 with large datasets outside of QIIME, but the QIIME workflow scripts would need the licensed version to handle large datasets elegantly. Moving forward with uclust is a reasonable choice.

The program vsearch is an open-source implementation of usearch. Because it's fast and free, it's gaining popularity as an alternative to usearch. VSEARCH will also have a Qiime 2 plugin, so benchmarking vsearch may be worthwhile.
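For the benchmark, a vsearch run can be scripted outside QIIME 1. This is a minimal sketch of plain de novo clustering (dereplicate, then cluster by abundance), not a reproduction of QIIME's usearch61 or open-reference workflow; the function name, output file names and the 0.97 identity threshold are assumptions, and the flags should be checked against vsearch --help for the installed version.

```python
def vsearch_denovo_commands(input_fp, out_prefix, identity=0.97):
    """Return argv lists for a basic vsearch dereplicate-then-cluster run.

    Step 1 collapses identical reads and records their abundance (--sizeout);
    step 2 clusters the unique reads at the given identity, reading those
    abundances back in (--sizein) and writing centroids plus a .uc cluster map.
    """
    derep_fp = out_prefix + "_derep.fasta"
    return [
        ["vsearch", "--derep_fulllength", input_fp,
         "--output", derep_fp, "--sizeout"],
        ["vsearch", "--cluster_size", derep_fp,
         "--id", str(identity), "--sizein",
         "--centroids", out_prefix + "_centroids.fasta",
         "--uc", out_prefix + "_clusters.uc"],
    ]
```

Because vsearch is not subject to the free usearch's 4 GB cap, the full concatenated file can be used as input_fp rather than per-sample pieces.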

Let me know how I can help,
Colin

Amir Ariff

Aug 27, 2017, 10:16:24 PM
to Qiime 1 Forum
Thanks very much, Colin.

I'll just go ahead with the uclust results for now, but I'll give vsearch a go later on as well.

Cheers,

Amir