vsearch v2.0.3_osx_x86_64, 64.0GB RAM, 24 cores
I can't make sense of the total reported at the end of the chimera log.
Input sequences for derep step: 2286032
vsearch --derep_full full_tagclean.fasta --output full_derep.fasta --log=vsearch_log --sizeout --minuniquesize 2 2>derep_log.txt
Sequence count after dereplication: 113957
vsearch -cluster_fast full_derep.fasta -id 0.97 --sizein --sizeout --relabel OTU_ --centroids otus.fna 2> cluster_log.txt
Sequence count of otus.fna: 6897
I ran a chimera check on these:
vsearch --uchime_denovo otus.fna --nonchimeras otus_checked.fna --sizein --chimeras chimeras.fasta 2> chimera_log.txt
Sequence count of otus_checked.fna: 1817
And finally:
vsearch -usearch_global full_tagclean.fasta -db otus_checked.fna -strand plus -id 0.97 -uc otu_table_mapping.uc 2> usearch_global_log.txt
The chimera log says:
Reading file otus.fna 100%
2864388 nt in 6897 seqs, min 254, max 464, avg 415
Masking 100%
Sorting by abundance 100%
Counting unique k-mers 100%
Detecting chimeras 100%
Found 5074 (73.6%) chimeras, 1817 (26.3%) non-chimeras,
and 6 (0.1%) borderline sequences in 6897 unique sequences.
Taking abundance information into account, this corresponds to
105518 (6.2%) chimeras, 1589751 (93.7%) non-chimeras,
and 774 (0.0%) borderline sequences in 1696043 total sequences.
I don't understand how 1696043 can be the total. My input to the dereplication step was larger than this (2286032), and my sequence count after dereplication (113957) is different as well.
From which step does vsearch take the abundance information?
Any help would be highly appreciated.
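One way to check which abundances are being used would be to sum the ;size=N annotations in the headers of the file given to --uchime_denovo. Below is a minimal Python sketch of that idea, not part of vsearch. It assumes the headers carry the annotation in the form written by --sizeout (for example >OTU_1;size=5) and that a header without an annotation counts as abundance 1. The name total_abundance is made up for illustration.

```python
import re

# Matches the abundance annotation written by --sizeout, e.g. ";size=123".
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def total_abundance(fasta_lines):
    """Return (record count, summed size annotations) for FASTA lines.

    Assumption of this sketch: a header with no size annotation
    contributes an abundance of 1.
    """
    n_records = 0
    total = 0
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        n_records += 1
        match = SIZE_RE.search(line[1:].strip())
        total += int(match.group(1)) if match else 1
    return n_records, total


# Small in-memory example with made-up headers.
example = [">OTU_1;size=5", "ACGT", ">OTU_2;size=3;", "ACGA", ">OTU_3", "ACGG"]
print(total_abundance(example))  # (3, 9)
```

To run it on a real file, pass an open file handle, e.g. total_abundance(open("otus.fna")). If the abundances come from the annotations carried through dereplication and clustering, the summed value for otus.fna would be expected to equal the total in the chimera log.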
Thanks,
Sanjeev
Dereplication log (derep_log.txt):
Reading file full_tagclean.fasta 100%
946256408 nt in 2286032 seqs, min 201, max 490, avg 414
Dereplicating 100%
Sorting 100%
703946 unique sequences, avg cluster 3.2, median 1, max 78547
Writing output file 100%
113957 uniques written, 589989 clusters discarded (83.8%)
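The figures in this log appear to account for the total in the chimera log. With --minuniquesize 2, only clusters of size 1 are discarded, so each of the 589989 discarded clusters should correspond to exactly one read. A quick arithmetic check in Python, using only numbers copied from the two logs (the variable names are just for illustration):

```python
# Figures copied from the dereplication and chimera logs.
input_reads = 2_286_032        # reads in full_tagclean.fasta
unique_seqs = 703_946          # unique sequences found by dereplication
uniques_written = 113_957      # uniques kept (size >= 2)
clusters_discarded = 589_989   # uniques dropped by --minuniquesize 2
chimera_log_total = 1_696_043  # "total sequences" in the chimera log

# Kept and discarded uniques add up to all uniques.
assert uniques_written + clusters_discarded == unique_seqs

# Every discarded cluster has size 1, so it removes exactly one read.
reads_kept = input_reads - clusters_discarded
print(reads_kept)                       # 1696043
print(reads_kept == chimera_log_total)  # True

# The three abundance classes in the chimera log sum to the same total.
assert 105_518 + 1_589_751 + 774 == chimera_log_total
```

If this reading is right, the abundance comes from the size annotations written at the dereplication step and passed on by --sizein/--sizeout during clustering, which would explain why the total is smaller than the raw input but much larger than the number of unique sequences.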
I also have an off-topic question about the "borderline" sequences: why are they not discarded if the tool is uncertain whether they are chimeric?