Hello everyone :)
I am doing a metabarcoding project.
I have 50 samples, which I demultiplexed and preprocessed separately.
At first I clustered each sample separately, so I got one FASTA file of OTUs per sample. The problem is that I need the OTU identifiers to be consistent across samples. When clustering separately, OTU_1 of sample A is not the same sequence as OTU_1 of sample B (they even differ in taxonomic classification).
I would like to get a single list of OTUs and know which OTUs are shared between samples and which are not.
To do that, I have decided to concatenate the 50 FASTA files and cluster them all together with the following command:
vsearch --cluster_fast all_samples.fasta \
--id 0.97 \
--centroids centroids.fasta \
--uc clusters.uc \
--relabel OTU_ \
--sizeout
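For the concatenation step, a minimal sketch of one way to keep track of which sample each sequence came from. It assumes the per-sample files are named like sampleA.fasta (hypothetical names, toy sequences) and that vsearch reads a ;sample=NAME annotation in query headers when it builds an OTU table, which is worth checking against the manual of the installed version:

```shell
#!/bin/sh
set -eu

# Toy per-sample files in a scratch directory (hypothetical names and sequences).
workdir=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$workdir/sampleA.fasta"
printf '>seq1\nACGT\n' > "$workdir/sampleB.fasta"

# Start from an empty output file, then append each sample with its name
# added to every header line as ";sample=NAME".
: > "$workdir/all_samples.fasta"
for f in "$workdir"/sample*.fasta; do
    sample=$(basename "$f" .fasta)
    # Only lines starting with ">" are modified; sequence lines pass through.
    sed "/^>/ s/\$/;sample=${sample}/" "$f" >> "$workdir/all_samples.fasta"
done

cat "$workdir/all_samples.fasta"
```

With labelled headers, sequences that share an ID across samples (seq1 above) stay distinguishable after concatenation.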
Then I would map each original sample against the centroids obtained above with:
vsearch --usearch_global sample1.fasta \
--db centroids.fasta \
--id 0.97 \
--otutabout sample1_otutable.txt
With this I will get an OTU table for each sample, and then I will combine them in R to get a single table.
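A minimal sketch of that merging step, written here with awk rather than R. It assumes each per-sample table has a "#OTU ID" header line followed by one tab-separated count column, and that OTUs without hits in a sample are simply absent from that sample's file; the file names and counts below are toy values:

```shell
#!/bin/sh
set -eu

# Toy per-sample OTU tables (hypothetical counts).
workdir=$(mktemp -d)
printf '#OTU ID\tsample1\nOTU_1\t12\nOTU_2\t3\n' > "$workdir/sample1_otutable.txt"
printf '#OTU ID\tsample2\nOTU_2\t7\nOTU_3\t1\n' > "$workdir/sample2_otutable.txt"

awk -F'\t' -v OFS='\t' '
    # First line of each file: register a new sample column.
    FNR == 1 { nsamples++; sample[nsamples] = $2; next }
    {
        # Remember each OTU once, in order of first appearance.
        if (!($1 in seen)) { seen[$1] = 1; otus[++notus] = $1 }
        count[$1, nsamples] = $2
    }
    END {
        line = "#OTU ID"
        for (s = 1; s <= nsamples; s++) line = line OFS sample[s]
        print line
        for (i = 1; i <= notus; i++) {
            line = otus[i]
            for (s = 1; s <= nsamples; s++) {
                key = otus[i] SUBSEP s
                # OTUs missing from a sample get a count of 0.
                line = line OFS ((key in count) ? count[key] : 0)
            }
            print line
        }
    }
' "$workdir"/sample*_otutable.txt > "$workdir/merged_otutable.txt"

cat "$workdir/merged_otutable.txt"
```

The result has one row per OTU and one column per sample, so the OTU identifiers are shared across all samples.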
As I mentioned before, my goal is to know which OTUs are shared between samples and which are not. Do you think this workflow is correct?
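To make the goal concrete, a sketch of how shared and sample-specific OTUs could be listed once a combined table exists. It assumes a tab-separated table with OTUs as rows and samples as columns, and treats any count above 0 as "present"; the table below is a toy example:

```shell
#!/bin/sh
set -eu

# Toy combined OTU table (hypothetical counts).
workdir=$(mktemp -d)
printf '#OTU ID\tsample1\tsample2\nOTU_1\t12\t0\nOTU_2\t3\t7\nOTU_3\t0\t1\n' \
    > "$workdir/merged_otutable.txt"

awk -F'\t' -v OFS='\t' '
    # Header line: remember the sample name of each column.
    NR == 1 {
        for (c = 2; c <= NF; c++) name[c] = $c
        print "OTU", "n_samples", "samples"
        next
    }
    {
        n = 0
        present = ""
        for (c = 2; c <= NF; c++) {
            if ($c + 0 > 0) {
                n++
                if (present != "") present = present ","
                present = present name[c]
            }
        }
        # One line per OTU: how many samples contain it, and which ones.
        print $1, n, present
    }
' "$workdir/merged_otutable.txt" > "$workdir/otu_occurrence.txt"

cat "$workdir/otu_occurrence.txt"
```

OTUs with n_samples greater than 1 are shared; those with n_samples equal to 1 occur in a single sample.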
Thank you very much in advance :)