Sumaclust makes no OTUs


Andreanna

Apr 24, 2017, 11:54:33 AM
to Qiime 1 Forum
Hi,

I am using Qiime 1.9.1 on my Mac OSX 10.11.6 and trying to call OTUs using Sumaclust. However, when I run Sumaclust I get no OTUs.

I saw a similar previous post here: https://groups.google.com/forum/#!searchin/qiime-forum/sumaclust%7Csort:relevance/qiime-forum/02BcMb1dpxE/x_8eqaRKxcgJ

I followed the suggestions and downloaded and installed Sumaclust 1.0.00 version (I also initially had the wrong version). I tried the new version with the example dataset 10000.fasta (I changed the line endings from Windows to Unix) and ran this command:

$pick_otus.py -m sumaclust -i 10000.fasta -o Sumaclust_Qiime_output

But, it still returns no OTUs and gives no errors or warnings.
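For anyone repeating the line-ending conversion mentioned above, here is a minimal Python sketch. The function names `to_unix_newlines` and `convert_file` are hypothetical, and this is only one of several ways to do it; it is not what QIIME or Sumaclust do internally.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and bare CR line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as bytes, normalize line endings, write to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_newlines(data))
```

Calling `convert_file("10000.fasta", "10000_unix.fasta")` would leave the original file untouched and write the converted copy under the second (assumed) name.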

I am attaching the output from $print_qiime_config.py, and the output from the pick_otus.py command.

I also tried running Sumaclust from the command line directly (i.e. not with Qiime) and it worked fine (it output 9908 fasta sequences into a file). There were no errors or warnings when I compiled Sumaclust.

I can't think of anything else to try at this point, and would appreciate any insights!

Thanks,
Andreanna


Qiime Config.txt
10000_otus.log

Greg Caporaso

Apr 25, 2017, 5:18:54 PM
to Qiime 1 Forum
Hi Andreanna,
Can you share the sumaclust command that worked for you? That would help us determine what might be different. 

Andreanna

Apr 26, 2017, 5:50:18 AM
to Qiime 1 Forum
Hi Greg,

Thanks so much for your help - I can imagine you are super busy!

Either of these commands works for running sumaclust outside of qiime:

sumaclust 10000.fasta
-- This prints the fasta output to the screen

sumaclust -B 10000_Biom.txt -O 10000_OTUmap.txt -F 10000_OTUs.fasta 10000.fasta
-- This produces the Biom, OTU mapping, and fasta outputs as files. They all seem to be formatted appropriately as far as I can tell, since I'm new to this. I have attached them in case they are incorrect and that is causing some trouble within qiime.

Also, just to note, sumaclust seems to run at the same speed whether through qiime or not, so it doesn't seem to be getting hung up at any particular step when I run it through qiime. Also, when running through qiime, I've tried specifying the full path to the infile, just in case qiime isn't able to find it for some reason, and that doesn't help - I still get no OTUs.

Cheers,
Andreanna

10000_Biom.txt
10000_OTUmap.txt
10000_OTUs.fasta

Andreanna

Apr 26, 2017, 5:57:47 AM
to Qiime 1 Forum
I guess I should also mention that I am able to pick OTUs via Uclust through qiime, so it seems to be something going wrong in particular with sumaclust....

Cheers,
Andreanna

Jai Ram Rideout

Apr 26, 2017, 7:28:39 PM
to Qiime 1 Forum
Hi Andreanna,

This is really odd. Is MacQIIME an option for you? I installed MacQIIME 1.9.1 and was successfully able to run the pick_otus.py command with Jenya's 10000.fasta file (no Windows->Unix line conversion necessary either).

If that's not an option we can continue debugging.

Best,
Jai

Andreanna

Apr 27, 2017, 6:27:17 AM
to Qiime 1 Forum
Hi Jai,

I can definitely try MacQiime, and will let you know how it goes.

After I wrote yesterday, I thought I would have a look at the pick_otus.py script to see whether there was some corruption/problem with my version of the script, and whether as a workaround it would be possible for me to just run sumaclust outside of qiime and use those results for the next step (without having to parse them first). I didn't have time to make much progress, but I did search for the script and found that I have 6 copies of it, and several of them have different file sizes. They are located at:

/miniconda/pkgs/qiime-1.9.1-np110py27_0/bin/pick_otus.py (this script is 200 bytes)
/miniconda/pkgs/qiime-1.9.1-np110py27_0/lib/python2.7/site-packages/qiime/pick_otus.py (82kb)
/miniconda/pkgs/qiime-1.9.1-np110py27_0/lib/python2.7/site-packages/qiime/parallel/pick_otus.py (26kb)
/miniconda/envs/qiime1/bin/pick_otus.py (190 bytes)
/miniconda/envs/qiime1/lib/python2.7/site-packages/qiime-1.9.1-py2.7.egg-info/scripts/pick_otus.py (54kb)
/miniconda/pkgs/qiime-1.9.1-np110py27_0/lib/python2.7/site-packages/qiime-1.9.1-py2.7.egg-info/scripts/pick_otus.py (54kb)
(The last two files appear to be the same after a quick glance)

I wanted to see if any of these were in my $PATH, and after I execute source activate qiime1 at the command line I have the following locations there:
/miniconda/envs/qiime1/bin:
/miniconda/jar:
/miniconda/bin:

So it seems to call the 190 byte script directly. This contains four (really two) lines of code:
#!/miniconda/envs/qiime1/bin/python
# EASY-INSTALL-SCRIPT: 'qiime==1.9.1','pick_otus.py'
__requires__ = 'qiime==1.9.1'
__import__('pkg_resources').run_script('qiime==1.9.1', 'pick_otus.py')

Could having these multiple files be causing the problem? Maybe in my installation it's calling the wrong one? It looks like I only have two copies of each of the other scripts (e.g. make_otu_table.py)

Either way, I will go ahead and install Macqiime now so that we can all get on with our lives : )

More soon,
Andreanna








Andreanna

Apr 27, 2017, 8:41:13 AM
to Qiime 1 Forum
Hi again,

Unfortunately no luck with MacQiime either. When I activate Macqiime in my terminal and then type the command below, qiime behaves as it does with my other installation. It produces the log file but no OTUs and does not give any errors or warnings.

$macqiime
$pick_otus.py -i 10000.fasta -m sumaclust -o ./Sumaclust_output

I deleted the previous attempts and I can tell by the 'Date modified' timestamp that this is the output from Macqiime.

When I run $print_qiime_config.py it looks like I am using the macqiime version and it has the updated version of sumaclust installed. I'm attaching the output from both commands.

Feeling a bit paranoid, I was wondering if perhaps the sumaclust it is using was the version I installed independently (which I still had in my path) instead of the one that came with macqiime. So, I exited macqiime, changed the name of the sumaclust executable in my path and typed $which sumaclust. This produced no results, so sumaclust doesn't appear to be anywhere in my usual path anymore. I then 'reactivated' macqiime and typed $which sumaclust and it returned /macqiime/anaconda/bin/sumaclust. So it is apparently looking at the correct executable. I reran the command above and still got no results. Sigh. However, I can run the sumaclust that was installed with macqiime just fine using this command: $sumaclust -B 10000_Biom.txt -O 10000_OTUmap.txt -F 10000_OTUs.fasta 10000.fasta

Typing $which pick_otus.py returns /macqiime/anaconda/bin/pick_otus.py so it also seems to be looking in the right place and not at my original copies from my other distribution of qiime. I now have a total of 9 pick_otus.py scripts, so it looks like only 3 more were installed with macqiime:

/macqiime/anaconda/lib/python2.7/site-packages/qiime/pick_otus.py (82kb)
/macqiime/anaconda/bin/pick_otus.py (54kb)
/macqiime/anaconda/lib/python2.7/site-packages/qiime/parallel/pick_otus.py (26kb)
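The `which` checks described above only report the first match on the search path. As a rough sketch (the helper name `find_all_on_path` is hypothetical, and this is written for a current Python 3 rather than the Python 2.7 that ships with QIIME 1), the following lists every executable with a given name in lookup order, which can reveal a shadowed copy:

```python
import os


def find_all_on_path(name, path=None):
    """Return every executable file called `name` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    for tool in ("sumaclust", "pick_otus.py"):
        print(tool, find_all_on_path(tool))
```

The first entry in each list is the one the shell would run; any later entries are copies that would only be used if the earlier ones were removed or renamed.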

I'm out of ideas for now. If it's possible to just run sumaclust outside of qiime and then use one of the results files for future steps, I would be happy to do that, since it appears to be working...

Thanks,
Andreanna




MacQiime Config.txt
MacQiime_10000_otus.log

Jai Ram Rideout

Apr 27, 2017, 7:56:01 PM
to Qiime 1 Forum
Hi Andreanna,

Nice debugging! Thanks for all the detailed info. Unfortunately we have no idea what is going wrong on this end -- I would have tried the same steps that you did, and it doesn't make any sense why sumaclust + QIIME isn't working for you. We're reaching out to another developer who may be able to help you further.

Best,
Jai

Andreanna

Apr 28, 2017, 7:50:09 AM
to Qiime 1 Forum
Hi Jai,

Bummer to hear that you don't know what's going on either : )  Thanks for asking your colleague - I'll keep my fingers crossed that he or she may have some insights.

In the meantime, I've been thinking...Since this problem seems to occur with two separate distributions, and seems to be within python, then it seems like both pythons must be referring/calling to some underlying thing (for lack of a better word) on my computer that is somehow wrong. Could there be some library (or package/module or something) that's not included in the distribution that I could check? Maybe I have the wrong version or I have it in the wrong place?

Also, I have been trying to narrow down where along the line the problem is occurring. Putting a print command into pick_otus.py I can see that it seems to be parsing the options correctly. It prints the text below, which apparently is also printed to the log file

OtuPicker parameters:
denovo_otu_id_prefix:denovo
exact:False
l:True
prefilter_identical_sequences:True
similarity:0.97
threads:1

I was hoping (as a last ditch effort) that I could print the command that is passed to sumaclust to verify that it's formatted correctly, then I could try to look at the sumaclust output to make sure it's there (presumably this goes to a temporary directory somewhere). My suspicion is that those things are fine because sumaclust appears to run, which would then narrow down the problem to the script that parses the output... However, I can't seem to figure out how qiime passes the command to sumaclust (is it in the pick_otus.py script?) and what script is called to parse the output...my python skills are a bit crude. If it's not too much trouble, could you explain that a little bit? If it's a bother, that's fine. I appreciate that this problem seems to be specific to my computer and you have already spent a bunch of time trying to help me sort this out.

Thanks,
Andreanna



Céline Mercier

Apr 28, 2017, 3:00:38 PM
to Qiime 1 Forum
Hi Andreanna, 

I'm Sumaclust's main developer. I'm not sure what the problem is here, but it could be an issue with an old version of Sumaclust (possibly 1.0).

If Qiime for some reason uses a deprecated binary with that version, it could explain an empty output.

A solution could be to find and delete all Sumaclust binaries from your system except the right one that is 1.0.00 (or greater but not 1.0). You can check the version with 'sumaclust -h'.

Another solution would be to print the Sumaclust help (the C source one) from Qiime, to see which version of the binary is being used, but I haven't found how to do that.

Andreanna

Apr 28, 2017, 3:10:19 PM
to Qiime 1 Forum
Hi Céline,

Thanks for joining in.

Unfortunately, I have already tried that. The version being used by qiime is the 1.0.00 downloaded from a link on their website (verified that qiime is using this version by printing the config file, and I've checked my path to make sure there are no other versions installed). Subsequently, I also installed a second qiime distribution (macqiime) that comes with sumaclust 1.0.00. In both cases sumaclust runs fine independently, but returns 0 OTUs when run through qiime.

Thanks,
Andreanna

Greg Caporaso

May 2, 2017, 4:36:34 PM
to Qiime 1 Forum
Hi Andreanna,
Thanks for the reply. It does sound like your sumaclust binary is working ok. I've contacted the developer of the QIIME wrapper for sumaclust to see if she has any ideas about what the issue might be. She or I will follow up with you here.

Thanks for your patience! 

Greg

Greg Caporaso

May 5, 2017, 5:43:36 PM
to Qiime 1 Forum
Hi Andreanna,
The developer of the QIIME wrapper for sumaclust (Jenya) followed up and said that she'll get back to you here, but is tied up with something at the moment. Just wanted to let you know that we haven't forgotten about you.

Greg

Andreanna

May 6, 2017, 3:10:14 AM
to Qiime 1 Forum
Hi Greg,

Thank you so much for continuing to look into my issue. I haven't had any further luck sorting it out, so I will be very glad to get Jenya's help.

I sincerely appreciate it.

Andreanna

Jenya Kopylov

May 8, 2017, 7:15:36 AM
to Qiime 1 Forum
Hi Andreanna,

Could you paste the output of each command below? Thanks!

$ sumaclust -h
$ which pick_otus.py
$ sumaclust -f -l -O 10000_otus.txt -t 0.97 -p 1 -R 1 SumaClustExactMatchFilterVQseO1.fasta

Jenya




Is 10000_otus.txt empty?
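For readers who want to script this check, here is a small sketch that assembles the same direct sumaclust call, echoes it, and reports whether the OTU map came out empty. The helper names are hypothetical, the flag comments are taken from the `sumaclust -h` text, and nothing here is claimed to match how the QIIME wrapper builds its command.

```python
import os
import subprocess


def build_sumaclust_command(fasta_path, otu_map_path, threshold=0.97, threads=1):
    """Assemble the argument list for the direct sumaclust call shown above."""
    return [
        "sumaclust",
        "-f",                  # FASTA output deactivated
        "-l",                  # reference sequence length is the shortest
        "-O", otu_map_path,    # write the OTU map to this file
        "-t", str(threshold),  # score threshold for clustering
        "-p", str(threads),    # number of threads
        "-R", "1",             # maximum ratio between sequence counts
        fasta_path,
    ]


def run_and_check(fasta_path, otu_map_path):
    """Run sumaclust, echo the command and its messages, report if the OTU map is empty."""
    cmd = build_sumaclust_command(fasta_path, otu_map_path)
    print("Running:", " ".join(cmd))
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    print(result.stdout)
    print(result.stderr)
    is_empty = (not os.path.exists(otu_map_path)) or os.path.getsize(otu_map_path) == 0
    return result.returncode, is_empty
```

`run_and_check("SumaClustExactMatchFilterVQseO1.fasta", "10000_otus.txt")` requires a `sumaclust` binary on the search path; a non-zero return code or `is_empty == True` would point at the binary rather than at QIIME.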


SumaClustExactMatchFilterVQseO1.fasta

Andreanna

May 8, 2017, 10:06:55 AM
to Qiime 1 Forum
Hi Jenya,

Thanks for taking the time to help. Here's the info you requested:

$ sumaclust -h

------------------------------------------------------------
 SUMACLUST Version 1.0.00
------------------------------------------------------------
 Synopsis : star clustering of sequences.
 Usage: sumaclust [options] <dataset>
------------------------------------------------------------
 Options:
 -h       : [H]elp - print <this> help

 -l       : Reference sequence length is the shortest.

 -L       : Reference sequence length is the largest.

 -a       : Reference sequence length is the alignment length (default).

 -n       : Score is normalized by reference sequence length (default).

 -r       : Raw score, not normalized.

 -d       : Score is expressed in distance (default : score is expressed in similarity).

 -t ##.## : Score threshold for clustering. If the score is normalized and expressed in similarity (default),
            it is an identity, e.g. 0.95 for an identity of 95%. If the score is normalized
            and expressed in distance, it is (1.0 - identity), e.g. 0.05 for an identity of 95%.
            If the score is not normalized and expressed in similarity, it is the length of the
            Longest Common Subsequence. If the score is not normalized and expressed in distance,
            it is (reference length - LCS length).
            Only sequences with a similarity above ##.## with the center sequence of a cluster
            are assigned to that cluster. Default: 0.97.

 -e       : Exact option : A sequence is assigned to the cluster with the center sequence presenting the
            highest similarity score > threshold, as opposed to the default 'fast' option where a sequence is
            assigned to the first cluster found with a center sequence presenting a score > threshold.

 -R ##    : Maximum ratio between the counts of two sequences so that the less abundant one can be considered
            as a variant of the more abundant one. Default: 1.0.

 -p ##    : Multithreading with ## threads using openMP.

 -s ####  : Sorting by ####. Must be 'None' for no sorting, or a key in the fasta header of each sequence,
            except for the count that can be computed (default : sorting by count).

 -o       : Sorting is in ascending order (default : descending).

 -g       : n's are replaced with a's (default: sequences with n's are discarded).

 -B ###   : Output of the OTU table in BIOM format is activated, and written to file ###.

 -O ###   : Output of the OTU map (observation map) is activated, and written to file ###.

 -F ###   : Output in FASTA format is written to file ### instead of standard output.

 -f       : Output in FASTA format is deactivated.

------------------------------------------------------------
 Argument : the nucleotide dataset to cluster
------------------------------------------------------------
 http://metabarcoding.org/sumatra
------------------------------------------------------------




$ which pick_otus.py

/miniconda/envs/qiime1/bin/pick_otus.py





$ sumaclust -f -l -O 10000_otus.txt -t 0.97 -p 1 -R 1 SumaClustExactMatchFilterVQseO1.fasta

===========================================================
 SUMACLUST version 1.0.00
 Alignment using SSE2 instructions.
===========================================================
Reading dataset...
Discarded 81 sequences that did not contain only 'AaTtGgCc' characters.
5040 sequences
Indexing dataset... : Done
Sorting sequences by count...
Maximum ratio between the counts of two sequences to connect them: 1.000000
Clustering sequences when similarity >= 0.970000
Aligning and clustering...
Done : 100 %       2145 clusters created.                       
Printing results in OTU table format...
Done.


I've attached the output file from this run, and there are OTUs there. So, it seems that when I run sumaclust outside of Qiime it is able to make OTUs just fine. When I run a similar command through Qiime ($pick_otus.py -m sumaclust -s 0.97 --threads 1 -o Test -i SumaClustExactMatchFilterVQseO1.fasta), I get some text about the settings in the log file but no OTUs are called, and the otus.txt file is empty. I'm attaching the log file from the run in Qiime as well. I've also tried to attach the OTUs file, but it doesn't seem to work.
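The "Discarded 81 sequences" line in the output above can be cross-checked independently. The sketch below lists FASTA records whose sequence contains anything other than the characters named in that message ('AaTtGgCc'). The function names are hypothetical, and whether the count matches Sumaclust's exactly is an assumption, not something verified here.

```python
def iter_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_acgt_headers(lines):
    """Return headers of records whose sequence has characters outside AaTtGgCc."""
    allowed = set("AaTtGgCc")
    return [h for h, seq in iter_fasta(lines) if not set(seq) <= allowed]
```

With a file, this would be used as `with open(path) as fh: print(len(non_acgt_headers(fh)))`.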

Please don't hesitate to let me know if there's anything else I can try.

Thanks,
Andreanna





10000_otus.txt
InQiime_SumaClustExactMatchFilterVQseO1_otus.log

Jai Ram Rideout

May 12, 2017, 5:38:21 PM
to Qiime 1 Forum
Hi Andreanna,

We've asked Jenya to follow up with you when she has the chance -- just wanted you to know that this topic hasn't been forgotten.

Best,
Jai

Jenya Kopylov

May 14, 2017, 1:58:02 AM
to Qiime 1 Forum
Hi Andreanna,

Perhaps there is some issue writing the dereplicated sequences to a temporary directory, which Sumaclust then tries to cluster. Could you please try the command below:

pick_otus.py -i SumaClustExactMatchFilterVQseO1.fasta -m sumaclust -o Sumaclust_Qiime_output_suppress_exact_match --suppress_prefilter_exact_match

and let me know if the resulting log file is still empty?


Thanks a lot!
Jenya

Andreanna

May 15, 2017, 5:35:58 AM
to Qiime 1 Forum
Hi again,

Thanks Jai and Jenya for continuing to work with me on this.

I've run the command ($pick_otus.py -i SumaClustExactMatchFilterVQseO1.fasta -m sumaclust -o Sumaclust_Qiime_output_suppress_exact_match --suppress_prefilter_exact_match), and unfortunately the result is the same as before: some text in the log file but no OTUs picked. I'm attaching the output in case it would be helpful.

Please don't hesitate to let me know if you have any other ideas - I'm happy to try just about anything short of kicking my computer or chucking it off the roof.

Thanks again,
Andreanna
Sumaclust_Qiime_output_suppress_exact_match.zip

Jenya Kopylov

May 16, 2017, 3:40:53 AM
to Qiime 1 Forum
Hi Andreanna,

Let's try two new options. Can you try,

1) The sumaclust command using the attached 10000_clean.fasta file

$ pick_otus.py -m sumaclust -i 10000_clean.fasta -o sumaclust_clean

2) The swarm command using the 10000_clean.fasta file

$ pick_otus.py -i 10000_clean.fasta -m swarm -o swarm_clean

Are you seeing results in the log file?


Jenya


10000_clean.fasta

Andreanna

May 16, 2017, 3:38:53 PM
to Qiime 1 Forum
Hi Jenya,

I ran the commands you sent and am attaching the results.

$pick_otus.py -m sumaclust -i 10000_clean.fasta -o sumaclust_clean

This produced the usual text in the log file, but no OTUs.


$ pick_otus.py -i 10000_clean.fasta -m swarm -o swarm_clean

This produced both text in the log file and also 2969 OTUs.
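A quick way to confirm counts like the one above is to tally the OTU map directly. This sketch assumes the usual QIIME 1 OTU map layout (one OTU per line, tab-separated, OTU ID first and member sequence IDs after it); the function name `summarize_otu_map` is hypothetical.

```python
def summarize_otu_map(lines):
    """Return (number of OTUs, number of member sequences) for a tab-separated OTU map."""
    n_otus = 0
    n_seqs = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        n_otus += 1
        n_seqs += len([f for f in fields[1:] if f])
    return n_otus, n_seqs
```

An empty OTU map, as produced by the failing sumaclust runs in this thread, would give `(0, 0)`.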

I hope that helps!
Andreanna

sumaclust_clean.zip
swarm_clean.zip

Andreanna

May 22, 2017, 12:54:50 PM
to Qiime 1 Forum
Any other ideas to try? I've just submitted a grant and am hoping to make progress on this data analysis again.

Thanks,
Andreanna

Jai Ram Rideout

May 22, 2017, 6:34:35 PM
to Qiime 1 Forum
Hi Andreanna,

Greg, Jenya, and I talked about this offline. We haven't had luck in reproducing this error and are out of ideas -- can you try running this script in the QIIME VirtualBox, Amazon EC2, or on some other machine with a QIIME installation? That would help you move forward with your time-sensitive analyses. Alternatively, is there another OTU picker you'd be interested in trying instead of sumaclust? Sorry that we haven't had luck in helping you with this issue.

Best,
Jai

Andreanna

May 30, 2017, 2:19:31 PM
to Qiime 1 Forum
 Hi again Jai,

After a bit of a struggle, I was able to get Qiime loaded onto a Linux computer here and get Sumaclust to work with the 10000_clean.fasta that Jenya sent earlier.  : )  So now I have a workaround! I will probably always wonder what is wrong with my laptop, lurking back in the shadows somewhere (and when it will strike again), but I am really grateful for all of your effort to get this sorted out for me. You have really gone above and beyond, and I can't tell you how much I appreciate it.

All the best,
Andreanna

jfg

May 31, 2017, 9:47:00 AM
to Qiime 1 Forum
Hey QIIMErs, 

   I also ran into this problem and eventually (...) fixed it by removing non-alphanumerics from the header, except the opening '>' and underscores, as in the example below:

from:
>DW02_7356;uchime_denovo=0.0000;;size=528;
to:
>DW02_7356uchime_denovo00000size528
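The header rewrite shown in this example could be done along these lines. This is only a sketch of one way to get from the "from" form to the "to" form; the function names are hypothetical and it is not necessarily how the fix was actually applied.

```python
import re


def clean_header(line):
    """Strip everything except letters, digits and underscores from a FASTA header line."""
    line = line.rstrip("\r\n")
    if not line.startswith(">"):
        return line  # sequence lines are returned unchanged
    return ">" + re.sub(r"[^A-Za-z0-9_]", "", line[1:])


def clean_fasta_lines(lines):
    """Yield cleaned lines for every line of a FASTA file."""
    for line in lines:
        yield clean_header(line)
```

Note the trade-offs: annotations such as `size=528` stop being parseable as key/value pairs, any whitespace-separated description is merged into the ID, and two headers that differ only in punctuation would collide after cleaning.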


   Changing machines didn't work for me, but UCLUST worked fine and so did Sumaclust with the test files. Leaving it to the initiated to speculate as to why that should be...

 Thanks to Jai & Greg & Jenya and co. for helping troubleshoot.

Céline Mercier

May 31, 2017, 10:07:36 AM
to Qiime 1 Forum
Hi!
That's a separate issue, possibly the one where a character the parser did not recognize could corrupt the output. I fixed it in version 1.0.20, released in February 2016, but I don't think that's the version QIIME uses, so yes, you have to be careful with that!
And in that particular case it could actually be an issue with the double ';;', since the parser uses ';' to parse the header!
Céline
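To illustrate the ';;' point, the sketch below splits a header on ';' and reports any empty fields, which is what a doubled separator produces. It is a stand-in written for illustration (the function name `empty_fields` is hypothetical) and does not reproduce Sumaclust's actual parser.

```python
def empty_fields(header):
    """Return the positions of empty fields when a header is split on ';'."""
    body = header.rstrip("\r\n").lstrip(">")
    if body.endswith(";"):
        body = body[:-1]  # a single trailing ';' is treated as a terminator, not a field
    parts = body.split(";")
    return [i for i, part in enumerate(parts) if part.strip() == ""]
```

For the header quoted earlier in the thread, `>DW02_7356;uchime_denovo=0.0000;;size=528;`, the third field (index 2) is empty, so scanning a FASTA file with this function would flag that record.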