OTU Assembly question

Laurie Agosto

Oct 16, 2019, 12:38:47 PM

to VSEARCH Forum

Hello!

I'm fairly new to vsearch and bioinformatics, and I'm looking for a command that shows me all the sequences that were grouped under each OTU. I've tried --fastapairs as well as --alnout, but neither tells me how each OTU was assembled or which sequences belong to it. As far as I understand, each OTU should contain many sequences. If anyone could point me in the right direction, I would greatly appreciate it!

Thank you so much!

Colin Brislawn

Oct 16, 2019, 2:35:40 PM

to VSEARCH Forum

Hello Laurie,

Thanks for using vsearch. I think I have just the command you are looking for:
--clusters string
Output each cluster to a separate fasta file using the prefix string and a ticker (0, 1, 2, etc.) to construct the path and filenames.

Here's how you might use it.

vsearch --cluster_fast seqs.fna --id 0.97 --centroids otus.fna --clusters otu_number_

That will give you your OTU centroids in otus.fna, plus one file per cluster containing every read that was assigned to that centroid. The file names are the prefix followed by the ticker, which starts at 0:
otu_number_0
otu_number_1
otu_number_2
otu_number_3
otu_number_4

Each of these is a FASTA file, even though it does not have the .fna file extension.
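
To get a quick overview of how many reads ended up in each OTU, you could count the FASTA header lines in every cluster file. This is only a rough sketch: it assumes the otu_number_ prefix from the command above, and the helper name count_cluster_seqs is made up for illustration.

```shell
# Print "<cluster file><TAB><number of sequences>" for every file that
# starts with the given prefix. count_cluster_seqs is a hypothetical
# helper name, not part of vsearch.
count_cluster_seqs() {
    prefix=$1
    for f in "$prefix"*; do
        # Skip the unexpanded glob pattern when no file matches.
        [ -f "$f" ] || continue
        # Every FASTA record starts with a ">" header line.
        n=$(grep -c '^>' "$f" || true)
        printf '%s\t%s\n' "$f" "$n"
    done
}

# Assumes the cluster files were written with the prefix otu_number_
count_cluster_seqs otu_number_
```

The files sort lexically rather than numerically, so otu_number_10 is listed before otu_number_2.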

Colin

Torbjørn Rognes

Oct 17, 2019, 6:23:01 AM

to VSEARCH Forum

In addition to what Colin nicely described, you could also use the --msaout option to specify a file where multiple alignments of all sequences in each cluster are written. They are written cluster by cluster, starting with the centroid sequence denoted by a star (*) and ending with a consensus sequence.
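
As a sketch, --msaout can be added to the same kind of clustering run shown earlier in the thread. The file names seqs.fna, otus.fna and clusters_msa.fna are placeholders, and the guard simply skips the run when vsearch or the input file is not available.

```shell
# Assumed file names: seqs.fna (input reads), otus.fna (centroids),
# clusters_msa.fna (per-cluster multiple alignments).
if command -v vsearch >/dev/null 2>&1 && [ -f seqs.fna ]; then
    vsearch --cluster_fast seqs.fna --id 0.97 \
        --centroids otus.fna \
        --msaout clusters_msa.fna
else
    echo "vsearch or seqs.fna not found; skipping" >&2
fi
```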

- Torbjørn
