Swarm parameter file still using uclust (macqiime 1.9.1)

Sam

Apr 19, 2016, 9:09:31 PM
to Qiime 1 Forum
Hi all,

I am having a bit of trouble running swarm in de novo OTU picking. I am running this command:

pick_de_novo_otus.py -o denovo_splitfastq_uninhab_swarm -i seqs.fna -p param_files/swarm_params.txt


I wrote swarm_params.txt in TextWrangler, and it contains only one line:


pick_otus:otu_picking_method swarm


However, the output seems to be identical (the same number of OTUs picked) to the uclust output I get when I don't include a parameter file. I also notice that during the run the terminal title shows swarm for only a second, and then uclust appears instead for most of the run time. Does anyone know what I am doing wrong here and how I can get swarm to pick my OTUs?


Thanks very much!


Sam


TonyWalters

Apr 19, 2016, 9:36:35 PM
to Qiime 1 Forum
Sam, the uclust you are seeing is probably from the taxonomic assignment step, which uses uclust by default. Can you post the log files that are being created just so we can double check that it's using the right parameters?

Something else you could do is run the OTU picking step stand-alone, i.e., call pick_otus.py -m swarm and pick_otus.py -m uclust, run make_otu_table.py on each resulting seqs_otus.txt file, and compare the OTU counts there.
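As a quick check before building the OTU tables, the OTU maps themselves can be counted. This is a minimal sketch, assuming the usual QIIME 1 OTU map layout (one OTU per line, tab-separated, OTU ID first and then the sequence IDs assigned to it). The function name count_otus and the example sequence IDs are made up for illustration.

```python
def count_otus(otu_map_lines):
    """Return (number of OTUs, number of sequences) from OTU-map lines.

    Assumes each non-empty line is: OTU_ID <tab> seq_id <tab> seq_id ...
    """
    n_otus = 0
    n_seqs = 0
    for line in otu_map_lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        n_otus += 1
        n_seqs += len(fields) - 1  # everything after the OTU ID is a sequence ID
    return n_otus, n_seqs


# Hypothetical two-OTU map, standing in for a real seqs_otus.txt file.
example = [
    "denovo0\tWER.MS515F_1\tMER.MS515F_7\n",
    "denovo1\tMER.MS515F_2\n",
]
print(count_otus(example))
```

For a real run, pass an open file handle instead of the example list, once for the swarm output and once for the uclust output. If the two OTU counts differ, the two methods really did cluster differently.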

Sam

Apr 19, 2016, 10:16:34 PM
to Qiime 1 Forum
I have attached the log file. I can't see a difference in the biom summary files between uclust and swarm attempts when using pick_de_novo_otus.py

When I run pick_otus.py and make_otu_table.py as you suggested, I do see a difference in the biom summary files:

uclust:

Num samples: 2
Num observations: 2942
Total count: 111110
Table density (fraction of non-zero values): 0.556

Counts/sample summary:
 Min: 41553.0
 Max: 69557.0
 Median: 55555.000
 Mean: 55555.000
 Std. dev.: 14002.000
 Sample Metadata Categories: None provided
 Observation Metadata Categories: None provided

Counts/sample detail:
 WER.MS515F: 41553.0
 MER.MS515F: 69557.0 

swarm:

Num samples: 2
Num observations: 12687
Total count: 111110
Table density (fraction of non-zero values): 0.509

Counts/sample summary:
 Min: 41553.0
 Max: 69557.0
 Median: 55555.000
 Mean: 55555.000
 Std. dev.: 14002.000
 Sample Metadata Categories: None provided
 Observation Metadata Categories: None provided

Counts/sample detail:
 WER.MS515F: 41553.0
 MER.MS515F: 69557.0

So I am not quite sure what is going on.... It would be good to get this working with de_novo so I can process samples faster if possible.

Thanks for the help!

Sam
log_20160420013918.txt

TonyWalters

Apr 19, 2016, 10:23:47 PM
to Qiime 1 Forum
Okay, so the log file does indicate that it was using swarm. Does the other log show uclust (or no method at all)?

Can you run biom summarize-table on the OTU table created by the workflow for both swarm and uclust (e.g. on this file, uninhabited_habitats/Silva119_denovo_splitfastq_uninhab_swarm3/otu_table.biom, and on whatever file was created by the uclust pick_de_novo_otus.py call)?


Sam

Apr 20, 2016, 6:53:47 AM
to Qiime 1 Forum
uclust log is uploaded. It looks like it doesn't specify the method.

The summary tables look a little different:

uclust pick_de_novo_otus summary table:

Num samples: 2
Num observations: 2942
Total count: 111110
Table density (fraction of non-zero values): 0.556

Counts/sample summary:
 Min: 41553.0
 Max: 69557.0
 Median: 55555.000
 Mean: 55555.000
 Std. dev.: 14002.000
 Sample Metadata Categories: None provided
 Observation Metadata Categories: taxonomy

Counts/sample detail:
 WER.MS515F: 41553.0
 MER.MS515F: 69557.0

swarm pick_de_novo_otus summary table:

Num samples: 2
Num observations: 12687
Total count: 111110
Table density (fraction of non-zero values): 0.509

Counts/sample summary:
 Min: 41553.0
 Max: 69557.0
 Median: 55555.000
 Mean: 55555.000
 Std. dev.: 14002.000
 Sample Metadata Categories: None provided
 Observation Metadata Categories: taxonomy

Counts/sample detail:
 WER.MS515F: 41553.0
 MER.MS515F: 69557.0

I have run summarize_taxa_through_plots.py, and the output of the two runs looks very slightly different, which I guess means it has run swarm. I was expecting swarm to identify quite a lot fewer OTUs than uclust (see ), so I think this is what threw me off. Unless it still hasn't run properly... what do you think?
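The OTU counts in the two summaries above can also be pulled out and compared directly. This is a minimal sketch that assumes the "Num observations:" line format shown in this thread; the helper name num_observations is made up for illustration, and the two strings are short excerpts of the summaries posted here.

```python
import re


def num_observations(summary_text):
    """Extract the OTU count from `biom summarize-table` style text."""
    match = re.search(r"^Num observations:\s*(\d+)", summary_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Num observations' line found")
    return int(match.group(1))


# Excerpts of the two summaries posted in this thread.
uclust_summary = "Num samples: 2\nNum observations: 2942\nTotal count: 111110\n"
swarm_summary = "Num samples: 2\nNum observations: 12687\nTotal count: 111110\n"

uclust_otus = num_observations(uclust_summary)
swarm_otus = num_observations(swarm_summary)
print("uclust: {} OTUs, swarm: {} OTUs".format(uclust_otus, swarm_otus))
```

Both tables hold the same total count, so the same reads were simply partitioned into a different number of OTUs.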

Thanks for the help!
log_20160419234556.txt

TonyWalters

Apr 20, 2016, 7:04:46 AM
to Qiime 1 Forum
Well, I think we've resolved the question of swarm running and giving different results. As far as the number of OTUs goes, you might add this parameter to the params.txt file: pick_otus:swarm_resolution 2

The default is 1; I would try 2 or 3 and see how that alters the OTU counts.
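To make it easy to try several values, the parameter file can be generated rather than edited by hand. This is a minimal sketch using only the two pick_otus settings mentioned in this thread; the helper name build_swarm_params is made up for illustration, and the output path in the comment is just the one from the original command.

```python
def build_swarm_params(resolution=1):
    """Return the text of a QIIME 1 parameter file that selects swarm."""
    lines = [
        "pick_otus:otu_picking_method swarm",
        "pick_otus:swarm_resolution {}".format(resolution),
    ]
    return "\n".join(lines) + "\n"


params_text = build_swarm_params(resolution=2)
print(params_text, end="")

# To use it, write the text to the file passed with -p, for example:
# with open("param_files/swarm_params.txt", "w") as handle:
#     handle.write(params_text)
```

Writing the file from a script also avoids any stray formatting that a text editor might add.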

Colin Brislawn

Apr 20, 2016, 4:56:01 PM
to Qiime 1 Forum
Hello Sam,

I'll let Tony continue to troubleshoot this with you. I wanted to quickly comment on the difference between Swarm and uclust. 

> I was expecting swarm to identify quite a lot fewer OTU's than uclust

In general, I would expect the opposite, with swarm making more OTUs.

As you already know, swarm does not use a fixed similarity threshold. Rather, it links reads that differ by a single difference into a 'mountain range' (this is single-linkage clustering), then divides the 'mountain range' into 'separate mountains' by cutting along the 'valleys.' This second step means that reads with as little as a 2 bp difference can be 'separate peaks', i.e. different OTUs.

This means that swarm can have much higher resolution than uclust, and in the process, will also produce more OTUs. I think your result is totally reasonable. 
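The 'mountain range' idea can be illustrated with a toy model. The sketch below is not swarm's actual implementation: it assumes equal-length reads, counts only substitutions (real swarm also counts indels), and models the valley cut with one simple rule, namely that a read may only pull in a neighbour that is no more abundant than itself. The function names and the five reads with their abundances are invented for illustration.

```python
from collections import deque


def hamming(a, b):
    """Number of mismatches between two equal-length reads (toy distance)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def toy_swarm(abundance, d=1, break_valleys=True):
    """Cluster reads by chaining neighbours within d differences.

    With break_valleys=True, a chain stops wherever abundance would rise
    again, so each abundance peak ends up in its own OTU.
    """
    order = sorted(abundance, key=lambda s: (-abundance[s], s))
    unassigned = set(order)
    otus = []
    for seed in order:  # most abundant unassigned read starts a new OTU
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        members = [seed]
        queue = deque([seed])
        while queue:
            sub = queue.popleft()
            for cand in order:
                if cand not in unassigned or hamming(sub, cand) > d:
                    continue
                if break_valleys and abundance[cand] > abundance[sub]:
                    continue  # abundance rises again: this is a valley, cut here
                unassigned.remove(cand)
                members.append(cand)
                queue.append(cand)
        otus.append(members)
    return otus


# Two abundance peaks (AAAA and ATTT) joined by a chain of rare reads.
reads = {"AAAA": 100, "AAAT": 10, "AATT": 2, "ATTT": 50, "TTTT": 5}
print(len(toy_swarm(reads, break_valleys=False)))  # pure single linkage
print(len(toy_swarm(reads)))                       # with the valley cut
```

In this toy data, pure single linkage chains all five reads into one cluster, while the valley cut separates the two peaks. That extra splitting step is one reason a swarm run can report more OTUs than a fixed-threshold method.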

I hope that helps! 
Colin
