Hi guys,
This is an alternative approach using vsearch to reproduce what I've done using usearch:
for i in trimed/*R1*.gz; do
    # printf is used because bash's echo does not expand "\n" without -e
    printf '\n%s\n' "$i" >> log.txt
    vsearch --fastq_mergepairs "$i" \
        --reverse "${i/_R1/_R2}" \
        --fastqout "$(basename "${i%%_S*}")_merged.fastq" \
        --eeout \
        --fastq_maxdiffs 2 \
        --fastq_maxns 0 \
        --fastq_minlen 100 \
        --fastq_maxmergelen 160 \
        --fastq_minovlen 20 \
        --fastq_maxee 1.0 \
        --threads 32 2>> log.txt
done
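As a quick check of how the loop above derives its file names, here is a minimal sketch of the two bash parameter expansions it relies on. The input file name is hypothetical (the real names are not shown in this post), and the `${i/_R1/_R2}` form needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical forward-read path; only the trimed/ directory and the
# _S / _R1 naming pattern are taken from the loop above.
i="trimed/sampleA_S1_L001_R1_001.fastq.gz"

# Replace the first "_R1" with "_R2" to get the reverse-read path.
reverse="${i/_R1/_R2}"

# Remove the longest suffix starting at "_S", then drop the directory part.
sample="$(basename "${i%%_S*}")"
merged="${sample}_merged.fastq"

printf '%s\n' "$reverse" "$merged"
```

With this made-up name, the reverse file resolves to `trimed/sampleA_S1_L001_R2_001.fastq.gz` and the merged output to `sampleA_merged.fastq`. A sample name that itself contains `_S` would be cut earlier than intended.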
for i in merged/*.gz; do
    vsearch --derep_fulllength "$i" \
        --output "$(basename "$i" .fq)_dereped.fa" \
        --fasta_width 0 \
        --sizeout \
        --relabel "$(basename "$i" merged.fastq.gz)" \
        --threads 32 2>> log.txt
done
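The two `basename` calls in the dereplication loop behave differently, because `basename` only strips a suffix that actually matches the end of the name. A small sketch, assuming a merged file named after the pattern the merge step would produce (the name itself is hypothetical):

```shell
# Hypothetical input path; the real file names are not shown in this post.
i="merged/sampleA_merged.fastq.gz"

# ".fq" is not a suffix of the file name, so basename leaves the name unchanged.
out="$(basename "$i" .fq)_dereped.fa"

# "merged.fastq.gz" is a suffix, so only the sample prefix remains.
label="$(basename "$i" merged.fastq.gz)"

printf '%s\n' "$out" "$label"
```

Under that assumption the output file would be `sampleA_merged.fastq.gz_dereped.fa` and the relabel prefix `sampleA_`.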
for i in dereped/*.gz; do zcat $i >> unique.fa; done
Now that pre-processing is done and all my unique sequences are in one file, it is time to cluster them. This is where vsearch doesn't seem to perform well:
vsearch --cluster_fast reads.fa \
    --id 0.90 \
    --alnout otus_aln.txt \
    --threads 32 \
    --rowlen 161 \
    --centroids otus_centroids.txt \
    --profile otus_profile.txt \
    --msaout otus_msaout.txt \
    --uc otus_uclust.txt \
    --fasta_width 0
This command runs much slower than the equivalent one in usearch, and it produces ~6000 OTUs compared to 5 with usearch:
vsearch v2.0.5_linux_x86_64, 125.9GB RAM, 32 cores
Reading file reads.fa 100%
145220481 nt in 1137359 seqs, min 100, max 160, avg 128
Masking 100%
Sorting by length 100%
Counting unique k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 12430 Size min 1, max 159458, avg 91.5
Singletons: 6845, 0.6% of seqs, 55.1% of clusters
Multiple alignments 100%
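
For reference, the cluster counts in the vsearch log above can be split into singleton and non-singleton clusters with simple arithmetic. The sketch below only re-uses the two numbers printed in the log; reading the result as the "~6000" figure is an assumption.

```shell
# Numbers copied from the vsearch log above.
clusters=12430
singletons=6845

# Clusters that contain at least two sequences.
non_singletons=$((clusters - singletons))

echo "$non_singletons"
```

That leaves 5585 clusters with two or more members.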
--------------------------------------
usearch8.1.1861_i86linux32 -cluster_otus reads.fa -minsize 2 -otu_radius_pct 10 -otus otus_10.fa -relabel Otu
usearch v8.1.1861_i86linux32, 4.0Gb RAM (132Gb total), 32 cores
(C) Copyright 2013-15 Robert C. Edgar, all rights reserved.
WARNING: OTU radius > 3% not recommended
00:01 46Mb 100.0% 5 OTUs, 4 chimeras (0.1%)
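
One difference visible in the two commands is that the usearch call passes `-minsize 2`, while the vsearch call shows no abundance filter. As a rough sketch of what such a filter does, the awk snippet below keeps only FASTA records whose header carries `size=N` with N >= 2. The example file and its contents are made up, and it assumes unwrapped or wrapped FASTA with `;size=N` annotations as written by `--sizeout`; it is not a claim about how either tool applies the filter internally.

```shell
# Tiny made-up FASTA with ";size=N" abundance annotations.
cat > example_unique.fa <<'EOF'
>seqA;size=12
ACGTACGTAC
>seqB;size=1
ACGTTCGTAC
>seqC;size=2
ACGTACGAAC
EOF

# On each header, decide whether to keep the record; print lines while keep is set.
awk '/^>/ {
         keep = 0
         if (match($0, /size=[0-9]+/))
             keep = (substr($0, RSTART + 5, RLENGTH - 5) + 0 >= 2)
     }
     keep' example_unique.fa > example_minsize2.fa

# Count the records that survived the filter.
kept=$(grep -c '^>' example_minsize2.fa)
echo "$kept"
```

Here two of the three records (seqA and seqC) pass the filter.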
Does anyone know the reason for such a difference?
Thanks,
Kirill