I was processing some data with VSEARCH and have a question about dereplication and rereplication. Here is a summary of what I did to the data, with the goal of removing chimeras:
- sort by length (--sortbylength)
- dereplication (--derep_fulllength)
- reference-based chimera search (--uchime_ref)
- de novo chimera search (--uchime_denovo)
- rereplication (--rereplicate)
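
As a rough sketch, the steps above correspond to commands along these lines. The file names, the `run_pipeline` wrapper, and the use of `--sizeout` during dereplication are assumptions made for illustration, not the exact invocation used here.

```shell
#!/bin/sh
# Sketch of the five steps. All file names are hypothetical placeholders.
run_pipeline() {
    input=$1     # raw reads in FASTA format
    ref_db=$2    # reference database for the reference-based chimera search

    # 1. sort by length
    vsearch --sortbylength "$input" --output sorted.fasta &&
    # 2. dereplicate; --sizeout (an assumption) writes ;size=N into each header
    vsearch --derep_fulllength sorted.fasta --sizeout --output derep.fasta &&
    # 3. reference-based chimera search, keeping the non-chimeras
    vsearch --uchime_ref derep.fasta --db "$ref_db" --nonchimeras nonchim_ref.fasta &&
    # 4. de novo chimera search on what is left
    vsearch --uchime_denovo nonchim_ref.fasta --nonchimeras nonchim.fasta &&
    # 5. rereplicate the remaining sequences
    vsearch --rereplicate nonchim.fasta --output rereplicated.fasta
}

# Run only when vsearch is installed and both arguments were given.
if command -v vsearch >/dev/null 2>&1 && [ "$#" -eq 2 ]; then
    run_pipeline "$1" "$2"
fi
```
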
When I run step 5 (rereplication), the output file does not look right: a header that appears once in the original input file (before dereplication) appears 2252 times in the rereplicated, chimera-removed file. Also, the number of chimeras detected plus the number of non-chimeras reported is less than the total number of sequences in the input file.
Is there a cluster file that --rereplicate refers to that I don't see? Is the information needed for rereplication being lost when I dereplicate the file?
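
To check the counts at each stage, something like the following could be run on any of the intermediate files. It is only a sketch: it assumes the headers carry `;size=N` abundance annotations, and the sample file it creates is made up for illustration (replace `$sample` with a real file).

```shell
#!/bin/sh
# Made-up sample standing in for a dereplicated FASTA with ;size= annotations.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>seqA;size=3
ACGTACGT
>seqB;size=1
ACGTTTGT
>seqC;size=2
GGGTACGT
EOF

# Number of records (header lines) in the file.
records=$(grep -c '^>' "$sample")

# Sum of the ;size=N abundances: the number of reads the records represent.
reads=$(awk '/^>/ && match($0, /;size=[0-9]+/) {
    total += substr($0, RSTART + 6, RLENGTH - 6)
} END { print total + 0 }' "$sample")

# Headers that occur more than once, most frequent first.
duplicates=$(grep '^>' "$sample" | sort | uniq -c | awk '$1 > 1' | sort -rn)

echo "records: $records"
echo "reads represented: $reads"
echo "repeated headers: ${duplicates:-none}"
rm -f "$sample"
```

After dereplication a file holds fewer records than reads, so comparing the record count with the summed abundances shows which of the two a reported total refers to.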