Hello, I am running a meta-analysis using a closed-reference approach.
I used mothur's make.contigs command to combine my fastq files into a fasta file, and then used that file as the input for vsearch.
Here are the commands I'm using:
./vsearch --fastx_uniques mothur.fasta --sizein --sizeout --fasta_width 0 --uc all.derep.uc --fastaout new.mothur.fasta
vsearch v2.22.1_linux_x86_64, 376.5GB RAM, 96 cores
https://github.com/torognes/vsearch
Dereplicating file newmeta.trim.contigs.good.renamed.fasta 100%
89035602982 nt in 219831151 seqs, min 35, max 625, avg 405
Sorting 100%
86773373 unique sequences, avg cluster 2.5, median 1, max 527191
Writing FASTA output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%
./vsearch --usearch_global new.mothur.fasta --db sintax___BEEx_FL-TS.fa --id 0.97 --strand both --sizein --sizeout --uc 97new-hits.uc --notmatched 97new-miss.fasta --dbmatched 97new.otus.fasta --biomout 97new.biom --mothur_shared_out newmeta.original.shared
The problem I am having is that the shared file produced by --mothur_shared_out is 533 GB. Is there a way to merge the matched hits at the species level before writing the shared file? I believe this would greatly reduce the file size.
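To show what I mean, here is a rough, untested sketch of the kind of post-processing I'm imagining, written against the --uc hits file rather than the shared file. The file names, the ";size=N" annotations, and the SINTAX-style "tax=...,s:Species" headers in the reference are assumptions based on my commands above; it also only produces overall totals per species, since recovering per-sample counts would additionally need the dereplication map in all.derep.uc.

#!/usr/bin/env python3
# Sketch only: collapse usearch_global hits to species-level totals instead of
# one column per reference sequence.
# Assumptions (may not match your files exactly):
#   - 97new-hits.uc is the --uc output; "H" records carry the query label in
#     column 9 and the target (reference) label in column 10.
#   - query labels carry ";size=N" from --sizein/--sizeout.
#   - reference headers are SINTAX-style, e.g. "...;tax=d:...,g:...,s:Some_species".

import re
from collections import defaultdict

def get_size(label):
    """Read abundance from a ';size=N' annotation; default to 1 if absent."""
    m = re.search(r";size=(\d+)", label)
    return int(m.group(1)) if m else 1

def get_species(label):
    """Pull the species field out of a SINTAX-style tax= annotation."""
    m = re.search(r"[,;]s:([^,;]+)", label)
    return m.group(1) if m else "unclassified"

species_counts = defaultdict(int)

with open("97new-hits.uc") as uc:
    for line in uc:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "H":          # keep only hit records
            continue
        query, target = fields[8], fields[9]
        species_counts[get_species(target)] += get_size(query)

with open("97new.species_totals.tsv", "w") as out:
    for species, count in sorted(species_counts.items()):
        out.write(f"{species}\t{count}\n")

If vsearch (or a mothur command) can do this kind of species-level collapsing natively before writing the shared file, I would much rather use that than roll my own script.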