How do we know our results are reasonable, from OTU picking to relative abundance?

Hiunvun Chin

Jan 17, 2017, 10:32:47 PM

to Qiime 1 Forum

Hi everyone,
Maybe I haven't read enough.
How can I make sure my results are correct after running the whole procedure, from OTU picking through to relative abundances with taxonomy information?
Two groups of people processed the same seqs.fna file with similar methods, but got different results. How can we tell which result is better? The methods we used are briefly shown here:

pick_otus.py (default: uclust)
pick_rep_set.py (default: pick the first seq)
assign_taxonomy.py -i rep_set.fna -r silva123_97_18S.fasta -t silva123_97_taxonomy_all_levels.txt -m rdp --similarity 0.8 --rdp_max_memory 24000
make_otu_table.py -i seqs_otus.txt -t taxonomy_results/rep_set_tax_assignments.txt -o otu_table.biom
filter_otus_from_otu_table.py -i otu_table.biom -o otu_table_no_singletons.biom -n 2
summarize_taxa.py -i otu_table.biom -o taxonomy_summaries/
plot_taxa_summary.py -i taxonomy_summaries/otu_table_L6.txt -o taxonomy_plot_L6/
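There is no ground truth to check against, but the size of the disagreement between the two groups can at least be quantified. Below is a minimal sketch that reads two taxa-summary tables (for example each group's otu_table_L6.txt) and computes a per-sample Bray-Curtis dissimilarity between the two runs. It assumes a tab-separated layout with one header line of sample IDs (comment lines before the header are tolerated) and one row per taxon; the function names are hypothetical helpers, not part of QIIME.

```python
"""Quantify how much two pipeline runs disagree, sample by sample.

Assumption: each input is a tab-separated taxa summary with a header line
(first column = taxon, remaining columns = sample IDs) and one row of
relative abundances per taxon. '#' lines before the data are treated as
headers; the last one seen wins.
"""


def read_taxa_table(lines):
    """Return {sample_id: {taxon: relative_abundance}} from table lines."""
    header = None
    profiles = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if header is None or line.startswith("#"):
            header = fields
            profiles = {sample: {} for sample in header[1:]}
            continue
        taxon, values = fields[0], fields[1:]
        for sample, value in zip(header[1:], values):
            profiles[sample][taxon] = float(value)
    return profiles


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles.

    0 means identical profiles, 1 means no shared taxa. A taxon missing
    from one profile counts as abundance 0 there.
    """
    taxa = set(p) | set(q)
    shared = sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in taxa)
    total = sum(p.values()) + sum(q.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def compare_runs(lines_a, lines_b):
    """Per-sample Bray-Curtis between run A and run B (shared samples only)."""
    run_a = read_taxa_table(lines_a)
    run_b = read_taxa_table(lines_b)
    return {sample: bray_curtis(run_a[sample], run_b[sample])
            for sample in sorted(set(run_a) & set(run_b))}


if __name__ == "__main__":
    # Made-up two-taxon tables for one sample, for illustration only.
    group_a = ["#OTU ID\tS1", "k__A;g__x\t0.6", "k__A;g__y\t0.4"]
    group_b = ["#OTU ID\tS1", "k__A;g__x\t0.5", "k__A;g__z\t0.5"]
    print(compare_runs(group_a, group_b))
```

With real files, pass open file handles (e.g. `compare_runs(open(path_a), open(path_b))`). A value near 0 means the two pipelines agree for that sample; it says nothing about which one is closer to the truth.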

Another thing: there are so many parameters to tune, such as the OTU picking method, the taxonomy assignment method, the confidence level, the similarity threshold, etc. How do we know whether results generated with different methods and parameters are sensible and comparable? If different researchers carried out analyses that are not completely identical, how can their results be compared? Or should we only compare our own data?
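When two parameter choices give different profiles, it can help to see which taxa drive the difference and whether it shrinks at a coarser taxonomic level. The sketch below assumes each run is already a `{taxon: relative_abundance}` dict for one sample with semicolon-separated taxonomy strings; `collapse_to_level` and `largest_differences` are hypothetical helper names, and the taxa in the example are placeholders.

```python
from collections import defaultdict


def collapse_to_level(profile, level):
    """Sum a {taxon: abundance} profile up to its first `level` ranks.

    Assumes ';'-separated taxonomy strings, so level=2 keeps 'rank1;rank2'.
    """
    collapsed = defaultdict(float)
    for taxon, value in profile.items():
        ranks = [rank.strip() for rank in taxon.split(";")]
        collapsed[";".join(ranks[:level])] += value
    return dict(collapsed)


def largest_differences(run_a, run_b, top=10):
    """Rank taxa by the absolute abundance difference between two runs.

    A taxon absent from one run is treated as 0 there. Returns
    [(taxon, abundance_a, abundance_b, a_minus_b), ...], largest first,
    with ties broken alphabetically.
    """
    rows = []
    for taxon in set(run_a) | set(run_b):
        a = run_a.get(taxon, 0.0)
        b = run_b.get(taxon, 0.0)
        rows.append((taxon, a, b, a - b))
    rows.sort(key=lambda row: (-abs(row[3]), row[0]))
    return rows[:top]


if __name__ == "__main__":
    # Placeholder genus-level profiles of one sample from two parameter sets.
    run_a = {"k__A;p__B;g__x": 0.7, "k__A;p__B;g__y": 0.3}
    run_b = {"k__A;p__B;g__x": 0.5, "k__A;p__C;g__z": 0.5}
    for row in largest_differences(run_a, run_b):
        print(row)
    # Re-compare at the second rank to see whether the runs agree better there.
    print(largest_differences(collapse_to_level(run_a, 2),
                              collapse_to_level(run_b, 2)))
```

This only compares runs that share a reference taxonomy; if the two groups used different databases, the taxon names themselves may not match.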

I would be grateful for your opinions.
Chin

Stefan Janssen

Jan 18, 2017, 12:42:14 AM

Hi Chin,

sorry to disappoint you, but we never know the ground truth :-(
As you noticed, every step involves choosing algorithms and configuring their parameters. But even if you knew all the details of those algorithms, it would still not be clear which one to use. It depends on your biological question, the wet-lab protocols used, ...
As a guideline, you might want to take a look at our platform QIITA: https://qiita.ucsd.edu/
There, we try to tackle the problem you mentioned: how to combine different studies from different researchers. Our approach is to use a strictly defined pipeline for all datasets, with only a few parameters to tune. You may want to adopt some of those best-practice decisions. But keep in mind that you might get much better results for your specific experiment with settings that QIITA does not offer.
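A small step in the spirit of a fixed pipeline is to write every setting down and diff the two groups' settings before comparing their results. This sketch assumes both groups record their choices in the QIIME 1 parameter-file style (one `script_name:option_name value` per line, `#` for comments); `read_params` and `diff_params` are hypothetical helpers, and the example settings are illustrative only.

```python
def read_params(lines):
    """Parse 'script:option value' lines into {'script:option': value}.

    Blank lines and lines starting with '#' are ignored; an option with
    no value is stored as an empty string.
    """
    params = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def diff_params(params_a, params_b):
    """Return (key, value_a, value_b) for every setting that differs.

    None means the setting is absent from that group's file, i.e. the
    script default was presumably used.
    """
    return [(key, params_a.get(key), params_b.get(key))
            for key in sorted(set(params_a) | set(params_b))
            if params_a.get(key) != params_b.get(key)]


if __name__ == "__main__":
    group_a = read_params([
        "# group A settings",
        "pick_otus:otu_picking_method uclust",
        "assign_taxonomy:confidence 0.8",
    ])
    group_b = read_params([
        "pick_otus:otu_picking_method uclust",
        "assign_taxonomy:confidence 0.5",
        "pick_otus:similarity 0.97",
    ])
    for key, a, b in diff_params(group_a, group_b):
        print(f"{key}: A={a} B={b}")
```

An empty diff does not guarantee identical results (software versions and reference files matter too), but a non-empty one points to the first places to look.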

Does this help?

Hiunvun Chin

Jan 18, 2017, 11:17:44 AM

Hi Stefan,
Thank you very much for the reply.
My concern is that if we build on wrong evidence, we will drift further and further from the truth.
Yes, you are right. It depends on the biological question, and it requires us to be more careful when dealing with machine-generated results.
I hope more work can be done on this issue.
I will look into QIITA. Thank you again.

Chin