Hello All,
I have a dataset that I am classifying based on a non-marker functional gene, so I am using the pick_de_novo workflow here. UCLUST is run at 97% similarity to pick OTUs (default parameters). When I use the data filtered at a Phred quality of 30, I have ~400,000 reads in total across 50 samples, and the number of observations in the biom summary is approximately 12,000. When I checked the alpha_rarefaction curves, it seemed that the sequencing depth was not sufficient.
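To illustrate how a few extra read errors can inflate a de novo OTU count, here is a toy greedy centroid clustering sketch. It is not UCLUST's actual algorithm (UCLUST uses alignment-based identity and its own sorting and search heuristics); it assumes equal-length, pre-aligned reads and a simple position-by-position identity, and the helpers `identity`, `greedy_cluster` and `mutate` are hypothetical. A read that falls below 97% identity to every existing centroid seeds a new cluster, so a rare read carrying a few extra errors can end up as its own singleton OTU.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions; toy version assuming equal-length, pre-aligned reads."""
    if len(a) != len(b):
        raise ValueError("toy identity() assumes equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(reads: list[str], threshold: float = 0.97) -> list[list[str]]:
    """Greedy centroid clustering: unique sequences are visited by decreasing
    abundance; each joins the first cluster whose centroid (first member) it
    matches at >= threshold, otherwise it becomes a new centroid."""
    counts = Counter(reads)
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    clusters: list[list[str]] = []
    for seq in ordered:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.extend([seq] * counts[seq])
                break
        else:  # no centroid was close enough: seed a new cluster
            clusters.append([seq] * counts[seq])
    return clusters


def mutate(seq: str, positions: list[int]) -> str:
    """Hypothetical helper: introduce substitution 'errors' at the given positions."""
    bases = list(seq)
    for i in positions:
        bases[i] = "A" if bases[i] != "A" else "C"
    return "".join(bases)


ref = "ACGT" * 25                          # 100 bp toy "true" sequence
two_errors = mutate(ref, [0, 1])           # 98% identical to ref
four_errors = mutate(ref, [0, 1, 2, 3])    # 96% identical to ref
reads = [ref] * 10 + [two_errors, four_errors]

clusters = greedy_cluster(reads)
print(len(clusters))  # 2: the 4-error read becomes its own singleton cluster
```

The 2-error read is absorbed into the abundant cluster, while the 4-error read drops below the 97% cut-off and is reported as a separate singleton.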
I decided to reanalyse the dataset with a Phred quality threshold of 25 to increase the number of available reads. After that, I ran the same pick_de_novo workflow with the same parameters as before. The new biom summary shows nearly 2 million reads, but with 335,000 observations (of which 267,000 are singletons!!). I went on with the core_diversity analysis. The taxonomy assignments have a similar profile even at the species level (only a few additional taxa appear in the summary plots for the Phred 25 run, which has more reads).
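For reference, a Phred score Q corresponds to a per-base error probability of 10^(-Q/10), so Q30 means 1 error in 1,000 bases and Q25 about 1 in 316. The sketch below assumes a hypothetical 250 bp read length (the post does not give one) and treats every base as sitting exactly at the threshold with independent errors, so it is a rough, pessimistic estimate rather than a model of the actual data.

```python
def phred_to_error_prob(q: float) -> float:
    """Per-base error probability implied by a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def expected_errors(q: float, read_length: int) -> float:
    """Expected erroneous bases per read if every base sat exactly at score q."""
    return read_length * phred_to_error_prob(q)


def prob_error_free(q: float, read_length: int) -> float:
    """Probability of a read with zero errors, assuming independent errors at score q."""
    return (1 - phred_to_error_prob(q)) ** read_length


READ_LENGTH = 250  # hypothetical read length; not stated in the post

for q in (30, 25):
    print(
        f"Q{q}: P(error) = {phred_to_error_prob(q):.5f}, "
        f"expected errors/read = {expected_errors(q, READ_LENGTH):.2f}, "
        f"P(error-free read) = {prob_error_free(q, READ_LENGTH):.3f}"
    )
```

Under these assumptions, lowering the threshold from 30 to 25 raises the per-base error rate by a factor of 10^0.5 (about 3.16), and reads carrying more errors are more likely to fall below the 97% cut-off against every centroid.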
The number of uclust clusters increased nearly 28-fold, whereas the number of reads increased only 5-fold, when I lowered the Phred quality threshold for the initial read filtering from 30 to 25. Clustering with cd-hit-est shows similar behaviour. What would you suggest here?
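The fold changes quoted above can be checked directly from the approximate figures in this post; the short sketch below only does that arithmetic.

```python
# Approximate figures quoted in the post
reads_q30, otus_q30 = 400_000, 12_000
reads_q25, otus_q25, singletons_q25 = 2_000_000, 335_000, 267_000

read_fold = reads_q25 / reads_q30                  # 5.0
otu_fold = otus_q25 / otus_q30                     # ~27.9
singleton_fraction = singletons_q25 / otus_q25     # ~0.797
non_singleton_q25 = otus_q25 - singletons_q25      # 68,000
non_singleton_fold = non_singleton_q25 / otus_q30  # ~5.7 (vs. the Q30 total, singletons included)

print(f"reads x{read_fold:.1f}, OTUs x{otu_fold:.1f}, "
      f"singletons {singleton_fraction:.1%} of Q25 OTUs, "
      f"non-singleton Q25 OTUs x{non_singleton_fold:.1f} vs Q30 total")
```

Setting singletons aside, about 68,000 OTUs remain in the Q25 run, roughly 5.7 times the Q30 total (the Q30 singleton count is not given here), so most of the jump in OTU count comes from singletons.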
I am using QIIME 1.8 with all associated executables properly configured.
Thanks,
Hazem