Hi,
I have Fluidigm data from fungal isolates. The data have been demultiplexed by sample and then by amplicon.
I want to cluster the reads and obtain a consensus sequence for each cluster.
I ran a test with a small dataset. The reads in this dataset have already been trimmed, stitched together and converted to FASTA format.
USEARCH version is usearch/8.0.1517
VSEARCH version is vsearch/2.4.0
Both test runs were performed on a cluster node with 24 cores and 384 GB of RAM.
The USEARCH run produces 693 clusters when I specify 99.5% identity.
The VSEARCH run produces 453 clusters at the same 99.5% identity.
I compared the output files of both runs side by side, and they are very different. I wonder why.
Could you please take a look at this issue and help me move forward with this analysis?
Thanks.
--Gloria
==================================================================
The USEARCH command was:
usearch -cluster_fast $inputfile -id 0.995 \
-centroids ${filename}.centroids \
-uc ${filename}.clusters \
-consout ${filename}.consesus \
-alnout ${filename}.aln \
-clusters $clustdir/${filename}.c- \
-msaout $msadir/${filename}.c-
The VSEARCH command was:
vsearch -cluster_fast $inputfile -id 0.995 \
-centroids ${filename}.centroids \
-uc ${filename}.clusters \
-consout ${filename}.consesus \
-alnout ${filename}.aln \
-clusters $clustdir/${filename}.c- \
-msaout $msadir/${filename}.c-
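
One way to compare the two runs is to tally clusters directly from the .uc files. This is a minimal sketch, assuming the standard tab-separated UC layout in which each cluster has exactly one record of type C, with the cluster size in the third column. The function names are hypothetical helpers, not part of USEARCH or VSEARCH.

```shell
# Count clusters in a UC file: one "C" record is written per cluster.
count_clusters() {
    awk -F'\t' '$1 == "C" { n++ } END { print n + 0 }' "$1"
}

# Summarise the cluster-size distribution.
# Column 3 of a "C" record holds the number of sequences in that cluster.
# Output lines are "<number of clusters> <cluster size>".
cluster_sizes() {
    awk -F'\t' '$1 == "C" { print $3 }' "$1" | sort -n | uniq -c
}
```

Because both commands above write to the same output names, this assumes each run's files were kept in a separate directory. With made-up directory names usearch_out and vsearch_out, `count_clusters usearch_out/${filename}.clusters` and `count_clusters vsearch_out/${filename}.clusters` would print one count per run, and `cluster_sizes` would show whether the difference comes mostly from singletons.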