Re: OTUpipe and Denoise_wrapper

Serena Thomson

Feb 6, 2012, 8:32:54 AM

to Qiime Forum

Hi there,

I wonder if you could provide clarification on the differences between
OTUpipe and denoise_wrapper.py:

I understand that OTUpipe (USEARCH) works on the fasta file and
denoise_wrapper on the flowgrams. Therefore is USEARCH actually
denoising the data?

Is one better than the other?

The instructions for using OTUpipe include using the output from
split_libraries as your input fasta file. If USEARCH is doing
something quite different to denoising, could you use the inflated,
denoised output as your input or would this be over-processing?

The literature suggests that UCHIME outperforms ChimeraSlayer.
Is there therefore a way of incorporating just the UCHIME element with
the denoise_wrapper.py script within MacQIIME?

Thanks

Serena

Jose Carlos Clemente

Feb 6, 2012, 10:48:20 AM

to qiime...@googlegroups.com

Hi Serena,

According to the author of OTUpipe, this tool yields a number of
OTUs very close to the expected one when run on a mock community:

http://drive5.com/usearch/perf/mock_results.html

In our results, we have seen beta diversity to be essentially the same
as with the denoiser, and the relative alpha diversity (i.e. does group A
have higher alpha diversity than group B?) is conserved. There are
concerns about OTUpipe being overly aggressive when discarding small
OTUs, so if you are interested in rare taxa you might want to modify
the default parameters, as described here:

http://qiime.org/svn_documentation/tutorials/otupipe.html

Denoiser (denoise_wrapper.py) is computationally more expensive and
can be prohibitive if your number of sequences is large; in those
cases OTUpipe is a good solution. I wouldn't recommend running OTUpipe
on anything but the output of split_libraries.py, as we haven't
tested other uses.

Concerning UCHIME, we are working on a solution to integrate it
independently of OTUpipe.

Jose

Mattias

Feb 15, 2012, 6:25:27 AM

to qiime...@googlegroups.com

Hello Jose,

I would like to hear your opinion about running the QIIME Denoiser and OTUpipe on the same dataset. I did this for a small dataset: first split_libraries, then denoise_wrapper, and then pick_otus_through_otu_table with the usearch option. Do you think it makes sense to combine those two tools? I thought it does, because the Denoiser targets the flowgrams (for homopolymers etc.) and the OTUpipe scripts specifically look at chimeric sequences. But I just want to be sure.
And yes, I also observed that OTUpipe is quite strict, lowering the total number of OTUs from 700 to 100 compared to a 'denoise-only' run.

Thanks for your opinion.

Greetz,

Mattias

Jose Carlos Clemente

Feb 15, 2012, 9:46:20 AM

to qiime...@googlegroups.com

Hi Mattias,
As you described it, you are missing inflate_denoiser_output.py after
denoise_wrapper.py and before pick_otus_through_otu_table.py. Did you
forget to mention it, or to use it?
You can use the OTUpipe scripts after denoise_wrapper.py if you want to
do chimera checking with usearch/uchime, but make sure to deactivate
the other options, i.e. don't use error correction, don't remove OTUs with
fewer than 4 sequences, etc. All the options are described here:

http://qiime.org/tutorials/otupipe.html

Jose

Mattias de Hollander

Feb 15, 2012, 10:07:12 AM

to qiime...@googlegroups.com

Hello Jose,

Yep, you are right, the inflate command just slipped my mind.

Regarding disabling the other filtering options, I am not sure how to do that. I also looked at the pick_otus.py documentation and there are options for specifying the percent identity for error correction (-j) and the minimum number of seqs per OTU (-g), but I don't see how to disable them. Or does --cluster_size_filtering do that when set to False? Thanks for these suggestions, as I was not yet doing this...

Thanks,

Mattias

Girish

Feb 15, 2012, 10:18:06 AM

to qiime...@googlegroups.com

Hi Mattias,

Here is the link that you are looking for:

http://qiime.org/tutorials/otupipe.html

Regards,
Girish

Mattias de Hollander

Feb 15, 2012, 10:57:05 AM

to qiime...@googlegroups.com

Thanks, but on that page I don't see how I can disable step 4 (filtering noisy sequences). Or am I missing something?

I tried the --cluster_size_filtering False option, but I get this error:

pick_otus.py: error: Positional argument detected: False

Or has this been fixed in QIIME 1.4.0? (I am still on a dev version from December.)

Thanks,

Mattias

Tony Walters

Feb 15, 2012, 11:38:35 AM

to qiime...@googlegroups.com

Hello Mattias,

Just pass --cluster_size_filtering (with no False after it); that should disable the check.

-Tony
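
To see why a trailing False is rejected, here is a minimal sketch using Python's standard argparse module. It is not QIIME's own option parser; the default of True and the store_false behaviour are assumptions made for illustration, and only the flag name comes from the thread. A flag of this kind takes no value, so anything written after it is left over as a stray positional argument.

```python
import argparse

# Hypothetical stand-in for the real script's parser (illustration only).
parser = argparse.ArgumentParser(prog="pick_otus_sketch")
# Assumption: filtering is on by default and the bare flag switches it off.
parser.add_argument("--cluster_size_filtering", action="store_false", default=True)

# No flag given: filtering stays enabled.
default_opts = parser.parse_args([])

# Bare flag: filtering is disabled.
flag_opts = parser.parse_args(["--cluster_size_filtering"])

# Flag followed by "False": the flag consumes nothing, so "False" is
# left over as an unexpected positional argument.
value_opts, leftover = parser.parse_known_args(["--cluster_size_filtering", "False"])

print(default_opts.cluster_size_filtering)
print(flag_opts.cluster_size_filtering)
print(leftover)
```

In this sketch the presence of the flag is the whole message: there is no "True" or "False" to supply.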

Mattias de Hollander

Feb 15, 2012, 12:10:36 PM

to qiime...@googlegroups.com

Hi,

Didn't think of that. I suppose this works now, because clustered_seqs.fasta now contains a lot more sequences, although the contents of clustered_error_corrected.fasta are still the same. Is that file perhaps generated but not used?

Mattias

Mattias de Hollander

Feb 15, 2012, 12:21:36 PM

to qiime...@googlegroups.com

Having a second look at it: does this only affect the minimum cluster size, or does it also disable the cluster error detection in OTUpipe (the first round of clustering at 97%)?

Thanks!

Tony Walters

Feb 15, 2012, 12:28:59 PM

to qiime...@googlegroups.com

Hello Mattias,

It doesn't affect the initial clustering step, but rather is a filter
on these clusters once chimera detection has discarded potential
chimeras (any cluster below the minimum size is then filtered out).

-Tony
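
The ordering described above (chimeras discarded first, then small clusters filtered out) can be pictured with a short Python sketch. The function name, the cluster IDs, and the data layout are hypothetical; the minimum size of 4 is taken from the "fewer than 4 sequences" default mentioned earlier in the thread. This is a toy illustration, not the OTUpipe implementation.

```python
def filter_clusters(cluster_sizes, chimeric_ids, min_size=4, size_filtering=True):
    """Drop flagged chimeras first, then (optionally) clusters below min_size."""
    # Step 1: remove clusters that chimera detection flagged.
    kept = {cid: n for cid, n in cluster_sizes.items() if cid not in chimeric_ids}
    # Step 2: apply the size filter only to what survived step 1.
    if size_filtering:
        kept = {cid: n for cid, n in kept.items() if n >= min_size}
    return kept


# Toy data: cluster ID -> number of sequences in that cluster.
clusters = {"c1": 120, "c2": 3, "c3": 40, "c4": 1}
chimeras = {"c3"}

with_filter = filter_clusters(clusters, chimeras)
without_filter = filter_clusters(clusters, chimeras, size_filtering=False)

print(with_filter)
print(without_filter)
```

With the size filter switched off, the small clusters c2 and c4 survive; the initial clustering itself is untouched either way.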

Mattias

Feb 20, 2012, 11:20:54 AM

to qiime...@googlegroups.com

I still have some questions, because the difference in the number of OTUs is quite big.

If I run split_libraries and then pick_otus_through_otu_table with usearch (i.e. OTUpipe) and default options, I get around 117 OTUs. If I use denoised data with pick_otus, I get 786 OTUs. Can this decrease in the number of OTUs really be explained by the chimeras? Or should I adjust the abundance_skew option?
If I run pick_otus_through_otu_table on denoised data with the --cluster_size_filtering option, noisy sequences are still removed (step 4). But does this make sense, given that inflate_denoiser_output.py has written the centroid sequence multiple times according to the cluster size? According to Jose I should also disable error correction, but I don't see any option for that. Or should I set --percent_id_err to 1.0?
If I disable the de novo chimera search, I end up with 454 OTUs. What is the best way to find the optimal value for abundance_skew if I use the de novo mode? (It's 16S data.)

This chimera filtering seems to be very important, possibly even more so than denoising.

Mattias
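
The inflation step referred to above (each centroid written out as many times as its cluster has members) can be sketched in Python as follows. The function name, the record labels, and the data layout are hypothetical, and the real inflate_denoiser_output.py inputs and read labels are not reproduced here; this only illustrates why every inflated read in a cluster carries an identical sequence.

```python
def inflate(centroids, cluster_sizes):
    """Write each centroid sequence once per member of its cluster."""
    records = []
    for cid, seq in centroids.items():
        for copy in range(cluster_sizes[cid]):
            # Hypothetical label scheme: centroid ID plus a running index.
            records.append((f"{cid}_{copy}", seq))
    return records


# Toy data: two denoised centroids and their cluster sizes.
centroids = {"cen1": "ACGTACGT", "cen2": "ACGTTCGT"}
sizes = {"cen1": 3, "cen2": 1}

inflated = inflate(centroids, sizes)
for label, seq in inflated:
    print(label, seq)
```

Because the copies are identical, an abundance-based filter applied afterwards effectively sees the denoiser's cluster sizes rather than raw read counts.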

Tony Walters

Feb 20, 2012, 12:16:51 PM

to qiime...@googlegroups.com

Hello Mattias,

1. If you disable size filtering, or set the value to 2 (-g 2), how does this change the overall number of OTUs?

2. Jose meant to disable the cluster size filtering (-l option).

3. The abundance skew value of 2 was chosen because Edgar optimized it to filter chimeras from a specific mock community. The best value could vary depending upon PCR conditions (the annealing temperature and the conservation of the various sequences amplified will modulate the number of chimeras generated, but we don't have a way to estimate how these should be taken into account). I know this isn't a direct answer to the question, but you could increase the value depending on how strict you felt your PCR conditions were.

Best regards,

Tony Walters
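
As a rough picture of what the abundance skew controls, here is a small Python sketch. It assumes that the skew is the minimum ratio between a candidate parent's abundance and the abundance of the sequence being tested in de novo mode; the function name and the toy numbers are hypothetical, and this is not UCHIME's actual code.

```python
def candidate_parents(query_abundance, abundances, abundance_skew=2.0):
    """Return IDs abundant enough to be considered parents of the query."""
    threshold = abundance_skew * query_abundance
    return [sid for sid, n in abundances.items() if n >= threshold]


# Toy data: sequence ID -> abundance; the query has abundance 10.
abundances = {"seqA": 100, "seqB": 30, "seqC": 12}

parents_default = candidate_parents(10, abundances)
parents_stricter = candidate_parents(10, abundances, abundance_skew=4.0)

print(parents_default)
print(parents_stricter)
```

Under this assumption, raising the skew leaves fewer candidate parents for any given query, so fewer sequences can be explained as chimeras of more abundant ones.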

JQL

Mar 19, 2012, 11:29:31 AM

to Qiime Forum

Hi,

I am new to OTUpipe. From what I have learned so far, it seems that
one should avoid running both OTUpipe and the denoiser, or at
least that doing so is not recommended?

Is the noise filtering in USEARCH the same as, or similar to, denoise_wrapper.py?

thanks
John

Tony Walters

Mar 19, 2012, 12:23:18 PM

to qiime...@googlegroups.com

Hello John,

OTUpipe would best be described as "pseudo-denoising" (by way of filtering out low-abundance clusters) plus chimera checking. Unlike denoise_wrapper, OTUpipe does not use flowgram (or quality score) data; it uses only fasta sequences, so it's quite different.

The documentation for usearch/OTUpipe is here: http://www.drive5.com/usearch/usearch_docs.html if you want more information about how it works.

If you were to use denoise_wrapper, it would be fine to use OTUpipe for chimera checking with pick_otus.py, but you would probably want to disable the cluster size filtering with the --cluster_size_filtering parameter.
