Runtime for sumaclust with pick_otus.py

Daniel Hwang

Jul 29, 2016, 12:19:45 PM
to Qiime 1 Forum
Hi,

I am running `pick_otus.py` with the sumaclust method on a `seqs.fna` file created from fasta files preprocessed with `pandaseq`.

The `seqs.fna` file was created by merging the `pandaseq` fastq results with QIIME's `multiple_split_libraries.py`, giving me a `seqs.fna` with 346841 sequences (1387366 lines total).

I've run uclust and swarm on the same `seqs.fna` data, and both finished in under 5 hours. My sumaclust run, however, had been going for 18+ hours when I stopped it.

Is there something I should check that would explain why sumaclust takes so long when run through `pick_otus.py`?

Thanks

Colin Brislawn

Jul 29, 2016, 7:03:07 PM
to Qiime 1 Forum
Hello Daniel,

Can you tell us more about the parameters you used with sumaclust, including the number of threads? My understanding is that sumaclust uses an exact search algorithm, which is more accurate but slower than uclust. This could explain the difference in speed.

You could also try making a 'toy' data set consisting of the first 10,000 lines of your seqs.fna file, and using that to benchmark the algorithms.
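One way to build such a toy file is sketched below in Python. It is only an illustration: the function names, the `toy.fna` output name, and the default record count are assumptions, not part of QIIME. It cuts on record boundaries rather than on a raw line count, so the last sequence in the subset is never truncated, even if some records span several lines.

```python
from itertools import islice


def iter_fasta_records(lines):
    """Yield each FASTA record as a list of lines (header plus sequence lines)."""
    record = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield record
            record = []
        # Skip blank lines; keep header and sequence lines.
        if line.strip():
            record.append(line)
    if record:
        yield record


def write_toy_fasta(src_path, dst_path, n_records=1000):
    """Copy the first n_records records from src_path to dst_path.

    n_records is an arbitrary illustrative default. Returns the number
    of records actually written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for record in islice(iter_fasta_records(src), n_records):
            dst.writelines(record)
            written += 1
    return written
```

Calling `write_toy_fasta("seqs.fna", "toy.fna")` would then give a small input that can be passed to `pick_otus.py` with each method in turn, so the run times can be compared in minutes rather than hours.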

I hope that helps!
Colin
