Differences between usearch/uclust in otu-picking methods

tric...@uni-bremen.de

Jan 21, 2014, 9:46:09 AM
to qiime...@googlegroups.com
Hi,

According to http://qiime.org/scripts/pick_otus.html, either usearch or uclust can be chosen as the clustering method.
However, the usearch manual states that usearch was not primarily designed for clustering; it is essentially a fast way to search query sequences against a defined database using unique word counts.
Uclust, if I understood the manual correctly, was designed to work like usearch, but starting with an empty database (and adding new cluster centroids to that database as it goes).
So, could someone please clarify the differences between usearch and uclust in the QIIME pipeline?
Thank you!
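The uclust behaviour described above (start with an empty database, add a new centroid whenever a sequence matches nothing) can be illustrated with a minimal greedy centroid-clustering sketch. This is an assumption-laden illustration, not uclust's implementation: difflib's SequenceMatcher ratio stands in for the alignment identity uclust actually computes, and the function names and the 0.97 default threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in similarity score in [0, 1]; NOT the alignment identity uclust computes.
    return SequenceMatcher(None, a, b).ratio()


def greedy_cluster(seqs, threshold=0.97):
    """Greedy incremental clustering that starts from an empty centroid database.

    seqs: iterable of (seq_id, sequence) pairs, processed in the given order.
    Returns (centroids, assignments), where assignments maps seq_id -> centroid id.
    """
    centroids = []    # the "database" is empty at the start
    assignments = {}
    for sid, seq in seqs:
        best_id, best_sim = None, threshold
        # Compare against every existing centroid and keep the most similar
        # one that meets the threshold.
        for cid, cseq in centroids:
            sim = similarity(seq, cseq)
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            # No centroid is close enough: this sequence seeds a new cluster
            # and is added to the database.
            centroids.append((sid, seq))
            assignments[sid] = sid
        else:
            assignments[sid] = best_id
    return centroids, assignments
```

Because each sequence is compared only against the centroids accepted so far, the outcome depends on the order in which sequences are processed.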

Jai Ram Rideout

Jan 21, 2014, 1:07:51 PM
to qiime...@googlegroups.com
Hello,

You can use either uclust or usearch to pick OTUs using any of the OTU-picking strategies in QIIME (i.e., closed reference, open reference, or de novo). Please see http://qiime.org/tutorials/otu_picking.html for more details.
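To make the difference between the three strategies concrete, here is a toy single-pass sketch. It is a simplification under stated assumptions, not QIIME's code: the strategy names, function names and threshold are hypothetical, difflib's SequenceMatcher ratio stands in for real alignment identity, and QIIME's actual open-reference workflow involves additional steps.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in similarity in [0, 1]; not the identity measure uclust/usearch use.
    return SequenceMatcher(None, a, b).ratio()


def best_hit(seq, targets, threshold):
    """Return the id of the most similar target at or above threshold, or None."""
    best_id, best_sim = None, threshold
    for tid, tseq in targets:
        sim = similarity(seq, tseq)
        if sim >= best_sim:
            best_id, best_sim = tid, sim
    return best_id


def assign_otus(seqs, reference, strategy, threshold=0.97):
    """Toy OTU assignment for 'closed', 'open', or 'de_novo' (hypothetical names).

    Returns (otus, failures): otus maps centroid id -> list of member ids;
    failures lists reads that closed-reference picking could not place.
    """
    if strategy not in ("closed", "open", "de_novo"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    # De novo starts from an empty database; the reference strategies start
    # from the provided reference sequences.
    targets = [] if strategy == "de_novo" else list(reference)
    allow_new = strategy != "closed"
    otus, failures = {}, []
    for sid, seq in seqs:
        hit = best_hit(seq, targets, threshold)
        if hit is None:
            if not allow_new:
                failures.append(sid)     # closed reference: unmatched read is not clustered
                continue
            targets.append((sid, seq))   # open/de novo: read seeds a new OTU
            hit = sid
        otus.setdefault(hit, []).append(sid)
    return otus, failures
```

With one reference sequence and one read that does not resemble it, closed-reference picking reports that read as a failure, while open-reference and de novo picking turn it into a new OTU.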

Both tools can operate in de novo mode (where you start with an empty database) or reference-based mode (where you already have a database). Using usearch via pick_otus.py will perform extra quality filtering compared to uclust. See this tutorial for more details:


-Jai

--
 
---
You received this message because you are subscribed to the Google Groups "Qiime Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qiime-forum...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

tric...@uni-bremen.de

Jan 22, 2014, 6:50:54 AM
to qiime...@googlegroups.com

Hi there,

thanks for your answer.

I am still wondering what the differences between the two algorithms are.
In R. Edgar's online documentation, usearch is described as basically a search algorithm that queries sequences against a database, using unique word counts as the similarity criterion.
Uclust, if I understood it correctly, basically takes the usearch algorithm and adds a clustering heuristic on top of it.
So, if I am correct, usearch was not initially designed for sequence clustering. I was therefore wondering how usearch works in QIIME, especially for de novo clustering, because it wasn't initially designed to work with empty databases, was it?
Thanks for clarifying, I'm puzzled :))
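The "unique word count" criterion mentioned above can be sketched as follows: split each sequence into its distinct words (k-mers) and rank database sequences by how many words they share with the query. This is an illustration only; the word length k=8 and the function names are assumptions, and real usearch versions use their own word lengths and further heuristics.

```python
def unique_words(seq, k=8):
    """Set of distinct length-k words (k-mers) occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_targets(query, database, k=8):
    """Rank database entries by number of unique words shared with the query.

    database: dict mapping target id -> sequence.
    Returns a list of (target_id, shared_word_count), highest count first.
    """
    qwords = unique_words(query, k)
    scores = [(tid, len(qwords & unique_words(seq, k))) for tid, seq in database.items()]
    # Stable sort: targets with equal counts keep their original order.
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Counting shared words is much cheaper than aligning, so a ranking like this can decide which targets are worth aligning at all.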

Tony Walters

Jan 22, 2014, 3:07:14 PM
to qiime...@googlegroups.com
Hello,

The names of the software/algorithm can create some confusion.

You'll notice near the beginning of the uclust paper (http://bioinformatics.oxfordjournals.org/content/26/19/2460) the sentence "UCLUST is a new clustering method that exploits USEARCH to assign sequences to clusters." Since USEARCH and UCLUST are also the names of the software packages, you can see how references to the different aspects of the software get muddled. Also, on this page describing a usearch command, http://www.drive5.com/usearch/manual/cluster_smallmem.html
you'll see the sentence "Clusters sequences using a variant of the UCLUST algorithm designed to minimize memory use."

In any case, uclust and the clustering aspect of usearch (i.e., usearch with --cluster_smallmem or --cluster_fast, as implemented in QIIME's application controllers) both use a heuristic to reduce processing time: they use the words matching the reference sequences (or the seeds, if open-reference/de novo OTU picking is used) to minimize the number of alignments performed between the query sequences and the reference database. On top of this, usearch also has chimera-detection functionality, which is implemented in QIIME for usearch 5.x and 6.1, as well as some other functionality that isn't implemented in QIIME (see http://www.drive5.com/usearch/manual/algorithms.html).
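The word-based heuristic described above can be sketched like this: rank candidates cheaply by shared words, then run an expensive global alignment only on the top few. Everything here is a simplified assumption rather than usearch's algorithm: k=4, threshold and max_candidates are hypothetical (usearch has its own accept/reject rules, such as its maxaccepts/maxrejects options), and identity is computed with a plain Needleman-Wunsch alignment as matched columns over alignment length.

```python
def words(seq, k):
    """Set of distinct length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; identity = identical columns / columns."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical and total columns.
    i, j, ident, cols = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            ident += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return ident / cols if cols else 0.0


def search(query, database, k=4, threshold=0.97, max_candidates=2):
    """Return ((target_id, identity) or None, number_of_alignments_done)."""
    qw = words(query, k)
    # Cheap step: rank all targets by shared unique words.
    ranked = sorted(database.items(), key=lambda item: len(qw & words(item[1], k)), reverse=True)
    best, alignments = None, 0
    # Expensive step: align only the top-ranked candidates.
    for tid, tseq in ranked[:max_candidates]:
        alignments += 1
        ident = global_identity(query, tseq)
        if ident >= threshold and (best is None or ident > best[1]):
            best = (tid, ident)
    return best, alignments
```

The saving comes from the cutoff: with max_candidates=1 only one alignment is performed no matter how large the database is, at the risk of missing a good hit that shares few words with the query.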

So the short answer is that you can use either one for clustering, and there are some additional options if you're using usearch. The clustering algorithms and default parameters (e.g., word size) are not identical between uclust and the different versions of usearch. The recommended values for these parameters come from Robert Edgar (the author of the software) and were based on results from mock community datasets. If you want to get into the nitty-gritty details of exactly how the clustering algorithm differs in a particular version of uclust/usearch, you'll probably want to talk to Robert Edgar, as I don't know myself.

I hope this helps,
Tony

