Usearch error

Andy

Oct 22, 2012, 8:18:53 PM

to qiime...@googlegroups.com

I am having trouble getting Usearch to run on the virtual box. Below is my command line and error message. I have attempted to reinstall the program a few different times. I have also tried different fasta files and different reference databases. I followed Jose's four point installation guide and that seemed to be ok.

Thanks in advance for any help you can provide,
Andy

qiime@qiime-VirtualBox:~/Desktop/Shared_Folder$ pick_otus.py -i 16S_subsampled.fna -m usearch --word_length 64 --db_filepath=gg_12_10.fasta -m usearch -o usearch_results

Traceback (most recent call last):
File "/home/qiime/qiime_software/qiime-1.5.0-release/bin/pick_otus.py", line 575, in <module>
    main()
File "/home/qiime/qiime_software/qiime-1.5.0-release/bin/pick_otus.py", line 484, in main
    log_path=log_path,HALT_EXEC=False)
File "/home/qiime/qiime_software/qiime-1.5.0-release/lib/qiime/pick_otus.py", line 983, in __call__
    HALT_EXEC=HALT_EXEC)
File "/home/qiime/qiime_software/qiime-1.5.0-release/lib/qiime/pycogent_backports/usearch.py", line 1420, in usearch_qf
    'provided')
cogent.app.util.ApplicationError: Error running usearch. Possible causes are unsupported version (current supported version is usearch v5.2.32) is installed or improperly formatted input file was provided

Jose Navas

Oct 22, 2012, 8:21:15 PM

to qiime...@googlegroups.com

Hi Andy,

Can you send the output of the command:

usearch --version

Cheers,

--
Jose Navas

Andy

Oct 22, 2012, 8:23:07 PM

to qiime...@googlegroups.com

Output is usearch v5.2.32

Jai Ram Rideout

Oct 22, 2012, 8:32:29 PM

to qiime...@googlegroups.com

Hi Andy,

This is a bug in QIIME and will be fixed in the next release. Your
usearch install / version looks good. For now, the filepath for
--db_filepath needs to be absolute, so you could pass something like
--db_filepath $PWD/gg_12_10.fasta as a workaround.
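
For anyone scripting this step, here is a minimal Python sketch of the same workaround: resolve the reference database path to an absolute one before handing it to pick_otus.py. The helper names (absolute_db_path, pick_otus_args) are made up for illustration and are not part of QIIME; the flags are only the ones that appear in the command earlier in this thread.

```python
import os


def absolute_db_path(db_filepath, cwd=None):
    """Resolve db_filepath against cwd (default: the current directory)."""
    base = os.getcwd() if cwd is None else cwd
    # os.path.join keeps db_filepath unchanged when it is already absolute.
    return os.path.normpath(os.path.join(base, db_filepath))


def pick_otus_args(input_fasta, db_filepath, output_dir, cwd=None):
    """Build the argument list for the pick_otus.py call shown in this thread."""
    return [
        "pick_otus.py",
        "-i", input_fasta,
        "-m", "usearch",
        "--word_length", "64",
        "--db_filepath", absolute_db_path(db_filepath, cwd),
        "-o", output_dir,
    ]
```

The list returned by pick_otus_args could be passed to subprocess.call; the sketch deliberately stops short of running anything.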

If this does not fix the issue, can you please try running the script
with -v (verbose) and send us the output? This will show us exactly
where it is failing.

Thanks,
Jai

Flo

Apr 8, 2013, 4:30:13 AM

to qiime...@googlegroups.com

Hi,

I'm currently analysing 454 16S data using QIIME. I successfully analysed my data using usearch, Greengenes, and alpha and beta diversity metrics, and now I want to analyse a bigger dataset: 4.8 GB of combined sequences (the pool of the data I have already analysed).
The problem is that usearch has been running for 3 days now, and I don't know whether it is still working or whether there is a problem.
In the usearch folder I can see 4 files: derep.log, derep.uc, len_sorted.fasta and sorten.log. These files were last modified 2 days ago.
In the Linux system monitor I can see that usearch is using 2.8 GB (of 6 GB total) but no CPU.
Could you please give me any advice, or tell me how to check that usearch is still working on my data?

Thanks in advance

FC

Tony Walters

Apr 8, 2013, 7:10:00 AM

to qiime...@googlegroups.com

Hello Flo,

It may have crashed. Can you post the contents of the log files, as well as the results of running the head and tail commands on the .uc file(s)?

Open a terminal, change to the output directory, and type:

head derep.uc

tail derep.uc
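
A complementary check is to look at how long ago each file in the output directory was last written. Below is a rough Python sketch of that idea. The function names and the idle threshold are assumptions for illustration, and old timestamps only suggest a stall; they do not prove a crash.

```python
import os
import time


def file_ages(directory, now=None):
    """Map each regular file in directory to seconds since its last modification."""
    now = time.time() if now is None else now
    ages = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            ages[name] = now - os.path.getmtime(path)
    return ages


def looks_stalled(directory, max_idle_seconds, now=None):
    """True when even the most recently written file is older than the threshold."""
    ages = file_ages(directory, now)
    return bool(ages) and min(ages.values()) > max_idle_seconds
```

For example, looks_stalled("usearch_qf_results", 6 * 3600) would flag a run whose newest output is more than six hours old. Pair this with a look at CPU usage before killing anything.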

Flo

Apr 8, 2013, 11:24:35 AM

to qiime...@googlegroups.com

Hi Tony,

I found it. The log file contains: ---Fatal error--- Out of memory, mymalloc(20200072), curr 4.08e+09 bytes

Here is the complete log file:

usearch --cluster /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/len_sorted.fasta --slots 16769023 --sizeout --uc /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.uc --w 64 --seedsout /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/dereplicated_seqs.fasta --maxrejects 500 --log /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.log --minlen 64 --derep_subseq
Started Fri Apr 5 17:37:32 2013
Version 5.2.32
6.0Gb RAM

Hash size 16769023 (16.8M)
Algorithm:
De novo clustering
Fast search (U-sorting) enabled
Similarity computed from a global alignment

Word-counting heuristics:
Word length for database index 64
Step words 8
Bump fraction 50%

U-sorted search termination:
Max accepts 1
Max rejects 500
Word count rejection enabled

Accept criteria:
Min id 100.0%

Query set:
Filename /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/len_sorted.fasta
File size 4.8Gb
Sequence type nucelotide

Database:
Initially empty

Substitution scores:
Match score 1.0
Mismatch score -2.0

Gap penalties:
     10.00 Open penalty (internal gaps)
      0.50 Open penalty (end gaps)
      1.00 Ext. penalty (internal gaps)
      0.50 Ext. penalty (end gaps)

Fast alignment heuristics:
HSP seed word length 5
One-hit HSP seeding
Exact word seeds (neighborhoods disabled)
Min HSP length 32
Band radius 16

Sat Apr 6 02:42:04 2013
usearch --cluster /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/len_sorted.fasta --slots 16769023 --sizeout --uc /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.uc --w 64 --seedsout /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/dereplicated_seqs.fasta --maxrejects 500 --log /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.log --minlen 64 --derep_subseq
Elapsed time: 09:04:33

---Fatal error---
Out of memory, mymalloc(20200072), curr 4.08e+09 bytes

Here are the head derep.uc and tail derep.uc results:

flo@flo:/media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results$ head derep.uc
# usearch --cluster /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/len_sorted.fasta --slots 16769023 --sizeout --uc /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.uc --w 64 --seedsout /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/dereplicated_seqs.fasta --maxrejects 500 --log /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.log --minlen 64 --derep_subseq
# version=5.2.32
# Tab-separated fields:
# 1=Type, 2=ClusterNr, 3=SeqLength or ClusterSize, 4=PctId, 5=Strand, 6=QueryStart, 7=SeedStart, 8=Alignment, 9=QueryLabel, 10=TargetLabel
# Record types (field 1): L=LibSeed, S=NewSeed, H=Hit, R=Reject, D=LibCluster, C=NewCluster, N=NoHit
# For C and D types, PctId is average id with seed.
# QueryStart and SeedStart are zero-based relative to start of sequence.
# If minus strand, SeedStart is relative to reverse-complemented seed.
S    0    572    *    *    *    *    *    FE11.Py.243_150023133 H2A12VL02G1ZU3 orig_bc=CGAGACGCGC new_bc=CGAGACGCGC bc_diffs=0    *
S    1    572    *    *    *    *    *    1227_70119763 H2988UE02HATVL orig_bc=CAGACGTCTG new_bc=CAGACGTCTG bc_diffs=0    *
flo@flo:/media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results$ tail derep.uc
H    4881022    341    100.0    .    0    0    341M40I    FE11.Prairie.57_122211 H3OODHQ01BD613 orig_bc=CGCTATCTAT new_bc=CGCTATCTAT bc_diffs=0    FE11.Labour.42_100186158 H4C3CFM02F7RAY orig_bc=CGCGACATCT new_bc=CGCGACATCT bc_diffs=0
S    5039980    341    *    *    *    *    *    FE11.Prairie.43_20217857 H3OODHQ02IUTKE orig_bc=CGACGTGACT new_bc=CGACGTGACT bc_diffs=0    *
H    4694548    341    100.0    .    0    0    341M54I    FE11.Labour.8_20217846 H3OODHQ02F49PM orig_bc=CAGTAGACGT new_bc=CAGTAGACGT bc_diffs=0    FE11.Non.Labour.41_40029906 H3QWBSS01AY590 orig_bc=CGTGAGCTGA new_bc=CGTGAGCTGA bc_diffs=0
S    5039981    341    *    *    *    *    *    FE11.Prairie.7_170003 H3OODHQ01BKORM orig_bc=CGTCTAGTAC new_bc=CGTCTAGTAC bc_diffs=0    *
H    4755853    341    100.0    .    0    0    52I341M    FE11.Py.47_170004153 H2H89AO01CKYV8 orig_bc=CGCGACATCT new_bc=CGCGACATCT bc_diffs=0    FE11.Prairie.51_70197734 H3QWBSS02GUPF3 orig_bc=CGTATATGCT new_bc=CGTATATGCT bc_diffs=0
H    4970581    341    100.0    .    0    0    341M21I    FE11.Labour.16_40231186 H3QWBSS01EXG29 orig_bc=CAGTAGACGT new_bc=CAGTAGACGT bc_diffs=0    FE11.Labour.31_20199113 H3OODHQ02HRPD7 orig_bc=CATCGAGCAG new_bc=CATCGAGCAG bc_diffs=0
S    5039982    341    *    *    *    *    *    FE11.Prairie.43_20149718 H3OODHQ02ISHQC orig_bc=CGACGTGACT new_bc=CGACGTGACT bc_diffs=0    *
H    4823967    341    100.0    .    0    0    47I341M    FE11.Prairie.33_10192681 H3OODHQ01AOI1U orig_bc=CGAGCTCATG new_bc=CGAGCTCATG bc_diffs=0    FE11.Labour.3_30153560 H3OODHQ02FNSNC orig_bc=CATCTACTGA new_bc=CATCTACTGA bc_diffs=0
S    5039983    341    *    *    *    *    *    FE11.Labour.33_20109327 H3OODHQ02FO2CP orig_bc=CGAGCTCATG new_bc=CGAGCTCATG bc_diffs=0    *
H    4694974    341    100.0
flo@flo:/media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results$

Thanks

Tony Walters

Apr 8, 2013, 12:25:34 PM

to qiime...@googlegroups.com

Hello Flo,

You might have to get the 64 bit version of usearch to allocate sufficient memory to complete this processing.

I don't know if it would help to call the command directly, but you could run this command:

usearch --cluster /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/len_sorted.fasta --slots 16769023 --sizeout --uc /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.uc --w 64 --seedsout /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/dereplicated_seqs.fasta --maxrejects 500 --log /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.log --minlen 64 --derep_subseq

and see if it crashes as well.

-Tony

Flo

Apr 9, 2013, 3:07:48 AM

to qiime...@googlegroups.com

Hi Tony,

It crashes again!

00:00 20Mb Creating table 16769023 (16.8M) slots ......done
08:50:54 4.1Gb 98.7% 5049977 clusters, avg size 1.9, avg id. 100.0%
Out of memory mymalloc(20240072), curr 4.08e+09 bytes

usearch --cluster /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/len_sorted.fasta --slots 16769023 --sizeout --uc /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.uc --w 64 --seedsout /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/dereplicated_seqs.fasta --maxrejects 500 --log /media/Data/QIIME/SharedFolder/Qiime/Global/usearch_qf_results/derep.log --minlen 64 --derep_subseq

---Fatal error---
Out of memory, mymalloc(20240072), curr 4.08e+09 bytes

jrvalverde

Apr 9, 2013, 3:44:08 AM

to qiime...@googlegroups.com

The problem is the version of Usearch you are using. You need to buy the 64 bit version if you want to deal with such a "huge" dataset (it should be "huge" as I've been able to successfully run Usearch on Illumina data with up to 2 million reads).

Warning, long explanation below. Read only if you want to understand what is going on and how to diagnose this kind of problem.

Look at the log. Usearch reports that it sees 6 GB of memory available:

> Started Fri Apr 5 17:37:32 2013
> Version 5.2.32
> 6.0Gb RAM

but further down it fails:

> ---Fatal error---
> Out of memory, mymalloc(20200072), curr 4.08e+09 bytes

It says that it is trying to allocate memory (mymalloc) and cannot. The block it is asking for is not that "big" (about 20 MB, between 2^24 and 2^25 bytes), but the memory it already holds is 4.08 * 10^9 bytes, which is very close to 2^32 (about 4.29 * 10^9). If you sum both quantities, you get about 4.10 * 10^9, within 5% of 2^32. The 4.08e9 figure is rounded, and a 32-bit process can rarely use every last byte of its address space, so the allocation fails.
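
To make that arithmetic easy to repeat on other logs, here is a small Python sketch that pulls the two numbers out of the error line and compares their sum with 2^32. The regular expression is an assumption based only on the single error message quoted in this thread, and the function names are invented for illustration.

```python
import re

# Matches e.g. "Out of memory, mymalloc(20200072), curr 4.08e+09 bytes"
ERROR_PATTERN = re.compile(r"mymalloc\((\d+)\), curr ([0-9.eE+]+) bytes")


def parse_out_of_memory(line):
    """Return (requested_bytes, current_bytes), or None if the line does not match."""
    match = ERROR_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def headroom_to_32bit_limit(requested, current):
    """Bytes left below 2**32 if the requested block were granted."""
    return 2**32 - (current + requested)
```

With the values from this log the headroom comes out positive but small, roughly 0.19e9 bytes. Since the reported current usage is rounded and a 32-bit process normally cannot use its whole address space, that is close enough to the ceiling to explain the failure.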

The reason these calculations are relevant is the version of Usearch being used: the standard, free version is built for 32-bit architectures. That means that, depending on the machine and system it runs on, it can allocate at most ((2^32) - 1) or (((2^32) / 2) - 1) bytes (the latter value is for a signed index, which has half the range, and the minus one accounts for zero).

Therefore, since 2^31 ~= 2.15e+9 and you have already allocated more than that, your upper limit is actually 2^32. But since the program is trying to get very close to that amount of memory, and most likely slightly over it, your version of Usearch cannot address the memory this dataset needs and therefore cannot finish the run.

Now, 4.29e+9 bytes corresponds to 4 GB of memory. So you have enough memory to run your version of Usearch (it sees 6 GB), but your version cannot use it all. That usearch can "see" more memory does not mean it can use it (for instance, on my systems it correctly reports that 99 GB are available, but it can still only use 4 GB).

Usearch comes in two versions: the free one is for 32-bit architectures, and the commercial one is for 64-bit. If you run the standard, free version you cannot use more than 4 GB of memory (which is enough for about 2-4 million Illumina reads). If you need more memory (more than 2^32 bytes, or 4 GB) then you must buy the commercial 64-bit version, which can address up to 2^64 bytes (16 exbibytes, or about 1.8e+19 bytes).
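
The limits above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration, and it ignores whatever the operating system reserves for itself inside a process's address space.

```python
GIB = 2**30  # bytes in one gibibyte


def address_limit_bytes(pointer_bits, signed_index=False):
    """Largest addressable size for a given pointer width.

    A signed index loses one bit to the sign, which halves the range.
    """
    usable_bits = pointer_bits - 1 if signed_index else pointer_bits
    return 2**usable_bits


def usable_memory_bytes(installed_bytes, pointer_bits):
    """Memory a single process can use: installed RAM capped by the address limit."""
    return min(installed_bytes, address_limit_bytes(pointer_bits))
```

For instance, usable_memory_bytes(6 * GIB, 32) gives 4 * GIB, matching the case in this thread: the machine has 6 GB, but a 32-bit build can address at most 4 GB of it.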

Time to consider giving some money to help altruistic developers who do not mind sharing their work (like RC Edgar) earn their bread.

Flo

Apr 9, 2013, 4:27:00 AM

to qiime...@googlegroups.com

Hi,

Thanks for your answer. I think I'm going to buy the 64-bit version! But the usearch website (http://www.drive5.com/usearch/manual/reducing_memory.html) says that we can reduce the memory requirement by splitting the dataset.

So would it be correct to pick OTUs with usearch on 3 subsets and then concatenate the usearch results and the fna dataset? Does that make sense?

Thanks a lot

Flo

Tony Walters

Apr 9, 2013, 6:20:04 AM

to qiime...@googlegroups.com

Hello Flo,

It would be a lot of work (in terms of parsing out results to merge the data) to try and break it apart for usearch. You would have to manually call the commands (as described in OTUPipe, http://www.drive5.com/usearch/manual/otupipe.html).
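
For reference, the splitting itself is the easy part. The Python sketch below reads FASTA records and deals them round-robin into a chosen number of subsets. The function names are hypothetical, and the sketch does nothing about the hard part described above, namely reconciling cluster and OTU IDs across the separate usearch runs.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line.strip())
    # Emit the final record, which has no following header to trigger it.
    if header is not None:
        yield header, "".join(chunks)


def split_round_robin(records, n_parts):
    """Distribute records over n_parts lists, one record at a time."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts
```

Round-robin assignment is just one possible choice. It keeps the subsets similar in size, but identical reads can land in different subsets, so dereplication results from each subset would still need merging afterwards.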

We're working on some changes (probably available in the development version of QIIME within the next couple of weeks) that would alter the way OTU IDs are created during usearch_ref OTU picking with no new clusters. This skips the enumeration step in OTUPipe, on which we based the usearch 5.X implementation, so if your data were split up you could merge the results with the merge_otu_maps.py script. The disadvantage is that this approach could discard novel sequences that are not in the reference dataset.

Getting the 64 bit version is another option.

-Tony

Flo

Apr 17, 2013, 10:29:22 AM

to qiime...@googlegroups.com

Thanks
