vsearch cluster_fast (to cluster data and calculate consensus sequences) produces vastly different results than USEARCH


gloria...@gmail.com

Feb 15, 2017, 2:39:40 PM
to VSEARCH Forum
Hi,

I have Fluidigm data from fungal isolates. The data have been demultiplexed by sample and then by amplicon.
What I want to do is to produce clusters and for each cluster I also need the consensus sequence.

I ran a test with a small dataset that I have. The reads in this dataset have already been trimmed, stitched together, and converted to FASTA format.

USEARCH version is usearch/8.0.1517
VSEARCH version is vsearch/2.4.0

The test run was performed on a cluster. The node has 24 cores and 384 GB of RAM.

The USEARCH run produces 693 clusters when I specify 99.5% identity.
The VSEARCH run produces 453 clusters when I specify the same 99.5% identity.

I checked the output files of both runs and compared them side-by-side. They are very different. I wonder why.
Could you please take a look at this issue and help me move forward with this analysis?

Thanks.

--Gloria

==================================================================

The USEARCH command was:

usearch -cluster_fast  $inputfile  -id 0.995 \
-centroids    ${filename}.centroids \
-uc           ${filename}.clusters \
-consout      ${filename}.consesus \
-alnout       ${filename}.aln \
-clusters     $clustdir/${filename}.c- \
-msaout       $msadir/${filename}.c- \



The VSEARCH command was:

vsearch -cluster_fast  $inputfile  -id 0.995 \
-centroids    ${filename}.centroids \
-uc           ${filename}.clusters \
-consout      ${filename}.consesus \
-alnout       ${filename}.aln \
-clusters     $clustdir/${filename}.c- \
-msaout       $msadir/${filename}.c- \



Torbjørn Rognes

Feb 16, 2017, 5:49:46 AM
to VSEARCH Forum
Hi!

The vsearch and usearch programs do not always produce the same results. Both use heuristic clustering algorithms, but there are some differences between them. The exact algorithm followed by usearch is not public.

Often vsearch will be more sensitive and detect similarities between sequences that usearch may overlook. This can result in fewer clusters for vsearch. This is probably a more correct result, as in principle all sequences more than 99.5% similar should be clustered in this case.

- Torbjørn
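
For anyone who wants to put numbers on the difference between two runs, here is a minimal sketch that counts clusters and lists cluster sizes from the -uc output files. It assumes the standard tab-separated UC format, in which each cluster has one "C" record whose third column holds the cluster size. The function names and the file names in the commented usage lines are hypothetical; they are not part of vsearch or usearch.

```shell
# Count clusters in a UC file: every cluster has exactly one "C" record.
count_clusters() {
    awk -F'\t' '$1 == "C" { n++ } END { print n + 0 }' "$1"
}

# Print cluster sizes (column 3 of the "C" records), largest first.
cluster_sizes() {
    awk -F'\t' '$1 == "C" { print $3 }' "$1" | sort -rn
}

# Example usage (hypothetical file names, one per program):
# count_clusters usearch_run.clusters
# count_clusters vsearch_run.clusters
# cluster_sizes  vsearch_run.clusters | head
```

If vsearch is merging sequences that usearch left apart, one would expect fewer "C" records and somewhat larger sizes at the top of the vsearch list, but that is something to check against the actual output rather than assume.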

gloria...@gmail.com

Feb 16, 2017, 12:32:14 PM
to VSEARCH Forum
Thank you very much.

--Gloria
