denoise_wrapper.py


sdpapet

Jun 30, 2016, 5:04:07 PM
to Qiime 1 Forum
Hello, this is my first time using the QIIME denoising scripts to denoise my 454 data. I followed the instructions here (http://qiime.org/tutorials/denoising_454_data.html).

1. I ran denoise_wrapper.py, for example:
denoise_wrapper.py -v -i run1.sff.txt -f run1/seqs.fna -o run1/denoised/ -m run1_mapping.txt

The output gives me three fasta files: centroids.fasta, singletons.fasta, and denoised_seqs.fasta.

2. According to the workflow, I need to run inflate_denoiser_output.py (e.g., inflate_denoiser_output.py -c centroids.fna -s singletons.fna -f seqs.fna -d denoiser_mapping.txt -o denoised_seqs.fna).

I have several questions about running inflate_denoiser_output.py.

First, this script seems to put the singletons and centroids together. Why do we need to do this (run inflate_denoiser_output.py)? Can I use denoised_seqs.fasta (the file from denoise_wrapper.py) directly to build the OTU table? What is the function of this file? Does this file = centroids.fasta + singletons.fasta?

Second, after I ran inflate_denoiser_output.py, I checked the number of sequences in the output. I was surprised that the number of seqs in the output fasta file is the same as the number of seqs in the fasta before denoising (seqs.fna). I don't know why this would happen. I used to use AmpliconNoise to denoise, and the read counts before and after denoising were quite different.

Third, the instructions suggest picking OTUs from the output of inflate_denoiser_output.py with the optimal option (http://qiime.org/tutorials/denoising_454_data.html). Can anyone tell me what the difference is between using the optimal option and not? I have never used it before. Does this mean I must use --optimal if my input files come from the QIIME denoising workflow (i.e., denoise_wrapper.py + inflate_denoiser_output.py)?

Thanks,
Ben

sdpapet

Jul 1, 2016, 12:50:00 PM
to Qiime 1 Forum
Can anyone help with this?

Thanks,
Ben

sdpapet

Jul 4, 2016, 8:57:32 AM
to Qiime 1 Forum
Hello, does anyone have the same problem as me?

"the number of seqs in the output fasta file is same as the number of seqs in the fasta befoe denoising (seqs.fna)."

Jens Reeder

Jul 11, 2016, 7:24:25 PM
to Qiime 1 Forum
Hi Ben,

Let me try to answer your questions one-by-one:

1. We need to run inflate_denoiser_output.py to put the data back into a shape that fits the QIIME workflow, in particular the OTU-picking step. The denoiser reduces your input sequences to just one read per cluster: the centroid for clusters with more than one member, and a singleton for clusters with just one member. Since the OTU-picking steps in QIIME need the original abundances, the inflate step replicates each centroid as many times as its cluster size. Then, in the OTU-picking step, they will all be collapsed into one OTU. So, in short, you can't use the denoised_seqs.fasta file directly for your QIIME analysis.

2. That is expected; the number must be identical, for the reason described in 1.

3. As far as I remember, the --optimal flag makes uclust search for the best possible match of a read against all OTUs. Without --optimal, it works in a greedy fashion and picks the first good hit. Except for small data sets, we never use --optimal.
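The greedy-versus-optimal distinction can be illustrated with a small sketch. This is a simplified toy stand-in, not uclust's actual algorithm; the identity function, the reference OTUs, and the threshold are all invented for illustration.

```python
# Toy sketch of greedy vs. "optimal" OTU assignment (assumption: a simplified
# stand-in for uclust's behavior, not its real implementation).

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def pick_otu(read, refs, threshold=0.9, optimal=False):
    best = None
    for ref_id, ref_seq in refs:
        score = identity(read, ref_seq)
        if score >= threshold:
            if not optimal:
                return ref_id           # greedy: stop at the first good hit
            if best is None or score > best[1]:
                best = (ref_id, score)  # optimal: remember the best hit seen
    return best[0] if best else None

refs = [("otuA", "ACGTACGTAA"), ("otuB", "ACGTACGTAC")]
read = "ACGTACGTAC"
assert pick_otu(read, refs, optimal=False) == "otuA"  # first hit above threshold
assert pick_otu(read, refs, optimal=True) == "otuB"   # exact match wins
```

The example shows why the two modes can disagree: greedy search accepts otuA (90% identity, above the threshold) before it ever examines the exact match otuB, while the optimal mode scans everything and keeps the best score.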

Jens


sdpapet

Jul 11, 2016, 8:10:35 PM
to Qiime 1 Forum
Hi Jens,

Thanks. I guess your denoising workflow is different from Quince's AmpliconNoise. I used to run his workflow a lot, and the reads were reduced a lot after denoising.

Here are the results from my fasta files.

Before denoising:
224788 : ./Split_Library_Output/seqs.fna (Sequence lengths (mean +/- std): 380.1737 +/- 90.5738)
224788 : Total

After denoising:
224788 : denoised_AlkB.fna (Sequence lengths (mean +/- std): 434.8327 +/- 87.3534)
224788 : Total

So, your workflow changed the quality of each read but not the quantity? It seems the average length increased by about 50 bp after denoising. Interesting, I thought the length of a read couldn't change once sequencing is done.

~Ben

Jens Reeder

Jul 12, 2016, 1:57:16 PM
to Qiime 1 Forum
The two methods used by AmpliconNoise and the QIIME denoiser are actually quite similar. Both denoise by using a clustering approach at the flowgram level.
There is no error correction at the individual read level; instead, reads are compared at the flowgram level to detect possible flowgram-to-sequence conversion errors.
I suggest you have a look at the paper if you are interested in the details: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2945879/
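To make "flowgram-to-sequence conversion" concrete, here is a toy sketch. This is a heavily simplified model of 454 flowgrams invented for illustration, not the denoiser's code: one signal intensity per nucleotide flow, rounded to an integer homopolymer length.

```python
# Toy model of 454 flow-to-sequence conversion (assumption: simplified
# illustration; real flowgrams and base calling are more involved).
# Each flow value is the signal for one base in the flow cycle; rounding it
# gives the homopolymer length. Noisy values near x.5 are exactly where
# conversion errors (over/under-called homopolymers) come from.

FLOW_ORDER = "TACG"  # standard 454 flow cycle

def flowgram_to_seq(flows):
    seq = []
    for i, intensity in enumerate(flows):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        seq.append(base * int(round(intensity)))  # homopolymer of that base
    return "".join(seq)

# round(1.02)=1 T, round(0.05)=0 A's, round(1.1)=1 C, round(2.1)=2 G's
assert flowgram_to_seq([1.02, 0.05, 1.1, 2.1]) == "TCGG"
```

Clustering at the flowgram level lets the denoiser compare the underlying intensities (1.1 vs. 1.4) rather than the already-rounded sequences, which is why similar reads with different homopolymer calls can end up in the same cluster.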

Technically, this means each read gets assigned to a centroid flowgram/read that is chosen as the representative for the whole cluster. These are the reads that you see in centroids.fasta and singletons.fasta.
Their number is usually much lower than the initial number of reads.

However, as I said earlier, to feed the output of denoising back into the QIIME workflow, which expects the correct initial abundances of the reads, we have to inflate the denoiser output back to its original size.
Each read is replaced with the sequence of its cluster centroid. Usually the cluster centroid is of high quality and uses pretty much all available cycles from the sequencing run, and thus tends to be longer than the average read.
This is confirmed by the numbers you posted for your data.
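The inflate step described above can be sketched in a few lines. This is a toy stand-in for inflate_denoiser_output.py, with invented read IDs and sequences; it only illustrates the bookkeeping, not the real script.

```python
# Minimal sketch of the inflate step (assumption: toy data, not the actual
# inflate_denoiser_output.py code). Every read ID in a cluster is replaced by
# its centroid's sequence, so the read count is unchanged while the mean
# length shifts toward the (typically longer) centroids.

def inflate(centroids, denoiser_mapping):
    """centroids: {centroid_id: sequence};
    denoiser_mapping: {centroid_id: [read IDs in that cluster]}."""
    return {
        read_id: centroids[cid]
        for cid, members in denoiser_mapping.items()
        for read_id in members
    }

centroids = {"c1": "ACGTACGT", "c2": "TTGG"}        # full-length centroids
mapping = {"c1": ["r1", "r2", "r3"], "c2": ["r4"]}  # cluster memberships
inflated = inflate(centroids, mapping)

assert len(inflated) == 4            # read count preserved, as in the thread
assert inflated["r2"] == "ACGTACGT"  # each read now carries its centroid's seq
```

This matches the numbers in the thread: four reads in, four reads out, but every output sequence is a centroid copy, so the mean length reflects the centroids rather than the raw reads.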

Jens



